This column is authored by Srikanth Devarajan, Technical Director, NTT DATA Inc
From a distance, implementing Internet of Things (IoT) analytics looks like any other analytics implementation. As you get closer, however, dissimilarities emerge. For example, IoT analytics are often distributed to “edge” sites, and such distributions rely on technologies that are not commonly employed elsewhere. Hence, it is essential for business intelligence (BI) and analytics leaders to adopt a new set of best practices to manage IoT analytics.
The Key Challenges
1. Deciding where to start with IoT analytics. This has always been a struggle for business intelligence (BI) and analytics teams, and I have seen traditional teams wrestle with it when it comes to the Internet of Things. Sometimes teams are not even sure which technologies are required.
2. Distribution to “edge” sites. Many IoT analytics applications are distributed to edge sites, where they are trickier to deploy, manage, and support. Edge sites are locations far from corporate or cloud data centers; they may lack constant connectivity and often come with other bottlenecks. Examples include power plants, airplanes, heavy equipment, cars, and other connected vehicles.
3. Shortage of expertise. Teams may lack deep expertise in streaming analytics, time series data management, and other technologies used in IoT analytics.
Where to Start IoT Analytics
It is recommended that analytical models be developed in the cloud or at a central corporate location. For most IoT applications, BI and analytics support operational decision making. This is often implemented as a two-stage process. The first stage is iterative: the business problem and historical data are evaluated to build the following:
1. Analytical models
2. Data discovery applications
3. BI reporting models
This stage usually includes activities such as data exploration, data preparation, and development of the models themselves. Because the process is iterative, it typically takes days to construct the models, then test, improve, and deploy the applications.
The second stage occurs after the models are deployed and become operational. New data from sources such as sensors and business applications is fed into the models on a recurring basis. If it is a reporting application, the appropriate reports are generated on a schedule. If it is a discovery application, the new data is made available to decision makers through visualizations and similar tools.
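The two stages can be sketched in miniature. The readings and the three-sigma anomaly model below are hypothetical stand-ins for whatever model the first stage actually produces; the point is the split between central training and recurring operational scoring.

```python
import statistics

# --- Stage 1 (central, iterative): build a model from pooled historical data ---
# Hypothetical temperature readings consolidated from several edge sites.
historical = [70.1, 69.8, 70.4, 71.0, 69.5, 70.2, 70.8, 69.9]

def train_model(readings):
    """Fit a simple anomaly model: remember the mean and standard deviation."""
    return {"mean": statistics.mean(readings), "stdev": statistics.stdev(readings)}

# --- Stage 2 (operational): score new data as it arrives ---
def score(model, reading):
    """Flag readings more than three standard deviations from the mean."""
    return abs(reading - model["mean"]) > 3 * model["stdev"]

model = train_model(historical)
normal = score(model, 70.3)   # a typical reading: not anomalous
alarm = score(model, 85.0)    # far outside the training range: anomalous
```

In practice the trained model would be serialized and shipped to the runtime environment, but the division of labor is the same: slow, collaborative model building in one place; fast, repeated scoring wherever new data lands.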
The first stage is almost always implemented centrally, for the following reasons.
1. Models typically require data from multiple locations for training and testing. Data from a single sensor or location (such as one truck or car) is rarely sufficient. Moreover, the models might need internal data, such as accounting information, or third-party data, such as weather. Such data is easier and cheaper to acquire and consolidate centrally.
2. The BI teams developing analytical models must be able to collaborate with each other and with the technical teams, or consult decision makers in person.
3. It is less expensive to host models in a few cloud locations or to provision BI software in one or two corporate locations.
Run-time Distribution to IoT Edge Sites
Some IoT analytic applications need to be distributed because the processing must take place on the devices, or at the sites, where the data is generated. These devices or applications may be far from corporate or cloud data centers. Moreover, some devices run autonomously; they cannot stop functioning just because a remote server is down. The wide area network carrying the data can also be slow, exposing the device to latency.
I have seen proposed IoT architectures that imply all sensor data should be sent immediately to the cloud for processing. This should be avoided. The best practice is to filter and condition sensor data before transmitting it to a central location, or to store and process it locally, so that action can be taken without involving the cloud or a central corporate site. Transmitting every bit of sensor data to a corporate or cloud data center is impractical or impossible in most cases, because high-volume networks or sufficient bandwidth may not be available. For example, an oil rig may be out of range of cellular towers.
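A minimal sketch of that edge-side pattern, assuming a deadband filter and a hypothetical local shutdown action (both names are illustrative, not from any particular platform). A real deployment would use a streaming framework, but the logic is the same: suppress redundant readings before transmission, and act locally on critical values without waiting for the cloud.

```python
def edge_filter(readings, deadband=0.5, critical=100.0):
    """Condition sensor data at the edge.

    Only forward readings that change by at least `deadband` since the last
    transmitted value, and take a local action on critical readings rather
    than waiting for a round trip to a central site.
    """
    to_transmit = []
    local_actions = []
    last_sent = None
    for r in readings:
        if r >= critical:
            # Hypothetical local action taken without cloud involvement.
            local_actions.append(("shutdown_valve", r))
        if last_sent is None or abs(r - last_sent) >= deadband:
            to_transmit.append(r)
            last_sent = r
    return to_transmit, local_actions

readings = [20.0, 20.1, 20.2, 21.0, 21.1, 105.0, 20.9]
sent, actions = edge_filter(readings)
# Seven raw readings shrink to four transmitted values, and the critical
# spike triggers one local action.
```

Even this toy deadband cuts the transmitted volume roughly in half; on a bandwidth-constrained site such as an oil rig, that difference is what makes the application feasible at all.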
Summing it up: if factors such as application availability, low-latency response, and network throughput matter, bring the analytics to the data, not the data to the analytics.
Tools and Training
Procure software tools and build in-house expertise in software distribution, configuration management, and remote monitoring for edge analytics. Most IoT analytical applications use familiar types of advanced analytics platforms or data discovery tools, and the algorithms are largely similar: graphical dashboards, tabular reports, data discovery, regression, neural networks, and other features found in marketing, finance, or other BI applications. However, there are areas where IoT analytics expertise differs. The following are some examples.
Some IoT applications use event stream processing platforms to process sensor data in near real time, so BI experts must understand “fast data” concepts. Event streams are time series data, which is stored most efficiently in column stores designed for that purpose; this contrasts with the relational databases that underpin traditional BI platforms. Kalman filters and Fourier transforms are analytical techniques common in IoT applications but rarely found in other BI and analytics work.
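To make the Kalman filter mention concrete, here is a scalar version smoothing a noisy sensor stream. The readings and the noise variances are made-up illustrative values; production filters track multi-dimensional state and tune these variances to the sensor.

```python
def kalman_1d(measurements, process_var=1e-3, measurement_var=0.25):
    """Minimal scalar Kalman filter for a noisy, slowly changing signal."""
    estimate, error = measurements[0], 1.0   # initial state and uncertainty
    smoothed = [estimate]
    for z in measurements[1:]:
        error += process_var                      # predict: uncertainty grows
        gain = error / (error + measurement_var)  # update: weigh the new reading
        estimate += gain * (z - estimate)         # blend prediction and reading
        error *= (1 - gain)                       # uncertainty shrinks after update
        smoothed.append(estimate)
    return smoothed

noisy = [5.2, 4.8, 5.1, 4.9, 5.3, 4.7]   # hypothetical jittery sensor values
out = kalman_1d(noisy)
# `out` hugs the underlying level near 5 instead of jumping with each reading.
```

This kind of recursive, per-reading update is exactly what suits streaming IoT data: each new measurement refines the estimate in constant time, with no need to reload history from a relational store.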
Analytics may also support decision automation, where an IoT application generates control signals that trigger actuators in physical devices, a concept outside the realm of traditional BI and analytics.