The Challenge and Importance of Data Flow in Distributed IoT Systems

It’s a dirty world out there.

Inside your organization, IT systems and business applications run in a controlled network of interfaces, rules, and data policies. That’s reality.

But as industry competition and commoditization risks loom, you embrace IoT to enable new services, optimize operations, and increase revenue opportunities for you and your customers. That’s digital transformation.

Your plan calls for connecting thousands of devices streaming huge volumes of wide varieties of data from the outside world at high velocity into your critical enterprise systems. That’s a problem.

The result is often an incoming IoT data flow monitor with preset thresholds firing off alerts when limits are exceeded, delivering modest cost savings to the organization. The torrent is shunted to a data lake for analytics tools and data scientists to dredge for deeper insights. What they usually find is a data swamp, filled with bad data from cheap sensors, harsh environments, flaky networks, and logic errors in the distributed systems through which the data passed before reaching your servers. Little value can be extracted from this dirty mess. Forget machine learning: garbage in, garbage out.

In some cases, raw data flows directly into enterprise systems that cannot detect or correct bad values and must process the data as delivered, causing inconsistencies between applications and costly business errors.

The Importance of Centralized IoT Data Management and Data Flow

Centralized IoT data management is critical for achieving IoT value in the enterprise. Without a consistent, clean source of trusted data for downstream systems and users, the ROI of any connected industrial system will be limited. The IoT data flow, starting from the point data enters your system, must ensure that only clean, trusted data is provided to the applications that request it. This enables analytics tools to produce actionable insights and reduces the time and effort data scientists must spend training the system for continuous learning.
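To make this concrete, the sketch below attaches trust metadata to each reading so a query layer can hand applications trusted data only. It is a minimal illustration in Python; the record shape and field names (trusted, flag_reason, flag_source) are assumptions for the example, not a specific product schema.

    from dataclasses import dataclass
    from typing import Iterable, Optional

    @dataclass
    class Reading:
        """One sensor reading plus the trust metadata recorded at ingestion."""
        device_id: str
        metric: str          # e.g. "temperature"
        value: float
        timestamp: float     # epoch seconds
        trusted: bool = True
        flag_reason: Optional[str] = None  # why the reading was marked untrusted
        flag_source: Optional[str] = None  # which rule or reviewer made the call

    def trusted_only(readings: Iterable[Reading]) -> list[Reading]:
        """Hand downstream consumers only the readings flagged as trusted."""
        return [r for r in readings if r.trusted]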

Creating an Industrial IoT Data Flow for the Enterprise

The diagram below shows the control and flow of data using Bright Wolf Strandz as the central data modeling, management, and processing component of an industrial IoT system. As in most systems, real-time conditions can be monitored with configurable limits and alert mechanisms, but data is also contextualized, stored, and made available to a distributed network of systems, applications, and teams for deeper learning and continuous optimization — which is where real enterprise value is created.

1. Raw data from sensors and other sources is ingested and stored.

2. Rules and policies are applied, flagging data as trusted (clean) or untrusted (dirty) and recording the reason for and source of each decision (a runnable sketch of steps 1 through 4 follows this list).

3. Only trusted data is included in results returned to queries from users, analysis teams, enterprise systems, machine learning, and analytics tools.

4. Analysis teams and systems can request the original data set and/or untrusted data for review at any time. Trusted data can be retroactively flagged as untrusted (dirty) if an error is later found in a rule or data source (ex: temp sensor placed too close to heater, or threshold set incorrectly), removing that data from queries requiring trusted data only (ex: historical temp trends).

5. Data analysis results can be cycled back into the system, creating a virtuous cycle of optimization and continuous learning.
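Here is a minimal in-memory sketch of steps 1 through 4 in Python, reusing the Reading shape from the earlier example (redefined so the sketch stands alone). The rule, store, and function names (plausible_temp, STORE, reflag) are assumptions for illustration, not Strandz APIs; a production system would persist readings and flagging decisions durably rather than in a list.

    import time
    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class Reading:  # same shape as the earlier sketch
        device_id: str
        metric: str
        value: float
        timestamp: float
        trusted: bool = True
        flag_reason: Optional[str] = None
        flag_source: Optional[str] = None

    # A rule inspects a reading and returns a reason string if the reading
    # should be distrusted, or None if it passes.
    Rule = Callable[[Reading], Optional[str]]

    def plausible_temp(r: Reading) -> Optional[str]:
        # Hypothetical rule: distrust temperatures outside the sensor's rated range.
        if r.metric == "temperature" and not (-40.0 <= r.value <= 125.0):
            return "temperature outside sensor's rated range"
        return None

    RULES: dict[str, Rule] = {"plausible_temp": plausible_temp}
    STORE: list[Reading] = []  # stand-in for the central data store

    def ingest(reading: Reading) -> None:
        # Steps 1 and 2: store the raw reading, apply each rule, and record
        # the reason and source of any untrusted decision.
        for name, rule in RULES.items():
            reason = rule(reading)
            if reason is not None:
                reading.trusted = False
                reading.flag_reason = reason
                reading.flag_source = name
                break
        STORE.append(reading)

    def query_trusted(metric: str) -> list[Reading]:
        # Step 3: queries serve trusted data only.
        return [r for r in STORE if r.metric == metric and r.trusted]

    def reflag(match: Callable[[Reading], bool], reason: str, source: str) -> int:
        # Step 4: retroactively mark previously trusted readings as untrusted,
        # e.g. after learning a sensor sat too close to a heater.
        count = 0
        for r in STORE:
            if r.trusted and match(r):
                r.trusted, r.flag_reason, r.flag_source = False, reason, source
                count += 1
        return count

    if __name__ == "__main__":
        ingest(Reading("dev-1", "temperature", 22.5, time.time()))
        ingest(Reading("dev-1", "temperature", 900.0, time.time()))  # gets flagged
        print(len(query_trusted("temperature")))  # -> 1
        # Later review finds the sensor was placed too close to a heater:
        reflag(lambda r: r.device_id == "dev-1",
               "sensor placed too close to heater", "field review")
        print(len(query_trusted("temperature")))  # -> 0
        # Step 5: analysis results cycle back in, e.g. by registering new
        # rules in RULES as teams learn which data to distrust.

Note the design choice the list implies: nothing is deleted. Raw values stay in the store and the trust decision is recorded alongside them, which is exactly what makes step 4 possible, since trust can be revised later without losing history.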

System Architecture Defines Data Flow From the Beginning

For an industrial IoT system to produce value through proper data flow in production, it must be architected with a central data management plan from the prototype stage. This avoids costly misunderstandings between business and IT teams and prevents real technical failures and unforeseen limitations as you scale. If you’re just getting started with an IoT project and have questions, or are facing challenges with your existing system, let us know how we can help.
