Going from raw sensor data to a decision advantage
Addressing compute, power, storage, transmission, and latency bottlenecks
The goal of monitoring and controlling assets across space, air, land, and sea is to reach a decision advantage quickly and cost-efficiently: the 5Ws of What, When, Why, Where, and Who.
Going from raw sensor data to the 5Ws, however, is a time-consuming and costly process. Each asset may carry 20-300 sensors, with each sensor collecting 10,000-100,000 data points per second, ultimately creating terabytes of data to be moved and analyzed every day (a back-of-envelope calculation follows the list below). Let’s look at the infrastructure and human-capital challenges in transforming raw sensor data into decisions:
1. Edge compute/power: Some processing and battery power is needed at the edge (the point of data collection) to collect the raw sensor data. Depending on the context, some pre-processing may be required at the edge to reduce the amount of raw data sent to the cloud. There are weight, space, and cost limits on how much power and compute can be available at the edge, especially in SWaP-C (size, weight, power, and cost) constrained environments.
2. Bandwidth: Intermittent connectivity and high costs, especially in remote or dedicated environments, make streaming large datasets difficult. The number of use cases grows faster than the available bandwidth, creating a capacity crunch.
3. Cloud compute and storage: Training AI models on, and managing, large-scale datasets is costly and time-consuming.
4. Human capital: Every hour of sensor data collected takes data teams 40+ hours to analyze. Much of that time goes into basic pre-processing, in which teams ultimately discard most of the raw data or transform it into some other usable form.
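To make the scale concrete, here is a rough back-of-envelope calculation; the sensor count, sample rate, and 4-byte sample size are illustrative assumptions at the high end of the ranges above, not measurements from any particular platform:

```python
# Rough estimate of daily raw data volume for a single asset.
# Assumptions (illustrative only): 300 sensors, 100,000 samples/second each,
# 4 bytes per sample, continuous 24-hour collection.
SENSORS = 300
SAMPLE_RATE_HZ = 100_000
BYTES_PER_SAMPLE = 4
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = SENSORS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * SECONDS_PER_DAY
terabytes_per_day = bytes_per_day / 1e12

print(f"~{terabytes_per_day:.1f} TB of raw data per asset per day")
# -> ~10.4 TB/day at the high end of the ranges above
```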
How do we deal with the challenge of collecting and analyzing terabytes of data per day?
There are several ways to address this:
1. Have larger edge compute/storage
2. Rely on bandwidth improvements
3. Deploy more powerful cloud systems
4. Take a step back and recognize that the root of all these problems is more data
At this point, let us ask two questions:
Q1. Does more data mean more information?
Q2. Can AI collect only useful data?
It turns out that more data does not necessarily mean more information, because much real-world sensor data is redundant. You can read more here. At Lightscline, we are leveraging AI to exploit this redundancy and collect only the relevant data upfront. This helps solve downstream tasks such as anomaly detection and classification 100x faster, with a smaller data, compute, and network footprint.
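As a minimal sketch of why redundancy matters (a toy illustration of the general idea, not Lightscline’s actual method or API): a periodic vibration-like signal keeps its dominant frequency, often the feature that drives anomaly detection or classification, even after 99% of the samples are dropped.

```python
import numpy as np

# Toy illustration of redundancy in periodic sensor data (not Lightscline's
# actual method): a 120 Hz vibration tone sampled at 100 kHz, then naively
# subsampled 100x. The dominant frequency survives the 99% data reduction.
fs = 100_000                                   # original sample rate (Hz)
t = np.arange(0, 1.0, 1 / fs)                  # one second of data
x = np.sin(2 * np.pi * 120 * t) + 0.1 * np.random.randn(t.size)

def dominant_freq(sig, rate):
    spectrum = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(sig.size, d=1 / rate)
    return freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin

x_small = x[::100]                             # keep only 1% of the samples
print(dominant_freq(x, fs))                    # ~120 Hz
print(dominant_freq(x_small, fs / 100))        # still ~120 Hz from 1% of the data
```

Naive uniform subsampling like this only works when the signal content sits below the reduced Nyquist rate; the point is simply that far fewer samples can carry the decision-relevant information.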
The key point is that, unlike with information-theoretic approaches, performance does not keep improving once we collect more than a certain amount of raw data. This means we can get to the 5Ws with just 1 GB of data instead of 100 GB of raw data, with 100x implications for AI infrastructure (compute, power, storage, transmission, and latency) and human-capital requirements.
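To put a number on the transmission piece alone (the 10 Mbps link speed is an assumption chosen for illustration):

```python
# Time to move a day's data over a constrained link (illustrative 10 Mbps link).
LINK_MBPS = 10
link_bytes_per_s = LINK_MBPS * 1e6 / 8              # megabits/s -> bytes/s

def hours_to_transmit(gigabytes):
    return gigabytes * 1e9 / link_bytes_per_s / 3600

print(f"100 GB raw:   {hours_to_transmit(100):.1f} h")      # ~22.2 h
print(f"1 GB reduced: {hours_to_transmit(1) * 60:.0f} min")  # ~13 min
```

Similar 100x factors carry through to the storage, compute, and human-capital requirements described above.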
A variety of tasks, such as anomaly detection, classification, and signal intelligence, can be solved with Lightscline, which can be deployed within 10 minutes on-prem or on a customer’s cloud platform. Additionally, by lowering the compute, power, and transmission footprint, Lightscline enables applications that are currently out of reach, such as training models on edge devices, 100x faster CI/CD pipelines, ML inference on multi-channel sensor data, and running ML inference on wearables and satellites.
You can see this in action here, and learn more about Lightscline here.