Six Lessons From An IoT Analytics at Scale Implementation

Published in Cognizant AI · Jun 25, 2021

By Rajaram Venkataramani and Rohini Chandrasekhar

Focus on the data user, wrangle no data until you have to, and use a decentralized data architecture to minimize costs and maximize business value.

Performing machine learning (ML) analytics on massive datasets from the Internet of Things (IoT) can generate big payoffs. Analyzing data about tracks, signals and other assets is expected to deliver significant savings and increased efficiency to one of our clients, a rail infrastructure operator, through improved preventive maintenance as well as reduced repair costs and service outages.

However, this success required overcoming significant challenges. These included storing and processing more than 15 TB of data in the cloud and on-premises, performing high-volume analytic processing on both structured and unstructured data, and providing the needed reports as quickly and inexpensively as possible.

Here are six lessons we learned about designing the data architecture, making the required data available in a form ML models can use, managing the application programming interfaces (APIs) that link data sources with reporting tools, and ensuring the insights are easily accessible by field personnel.

Lesson 1: Leverage the Cloud for Its Scalability and Analytic Tools

Delivering very large-scale analytics at reasonable speed and expense would have been impossible without the scalable, cost-effective storage, compute and networking infrastructure of a leading hyperscaler, as well as its analytic, data management and reporting tools. We chose Microsoft Azure for its price-performance, Databricks to manage data processing and model engineering, and Microsoft’s Power BI for its sophisticated reporting and analytics.

Using the cloud also allowed us to try new approaches and to “fail fast,” drawing on its pay-as-you-go capabilities rather than investing in fixed on-premises infrastructure.

Lesson 2: Use a Loosely Coupled Data Infrastructure Organized Around the Business

We used a loosely coupled data mesh (a network of distributed data providers and consumers, linked by APIs) in which no data provider or consumer needs detailed knowledge of the other components in order to share data with them. This loose coupling means we can easily change any of the data sources, APIs or reporting tools to meet new business needs without having to rearchitect the entire system, giving us greater flexibility and agility while minimizing costs. The APIs can also tap intelligent auto-management features in the underlying infrastructure to ensure the necessary compute and network resources are available for data movement, analysis or machine learning models.
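To make the loose coupling concrete, here is a minimal, contract-first sketch of the pattern we are describing. The names (DataProduct, TrackConditionApi, build_wear_report) and the schema are illustrative assumptions, not the client’s actual platform; the point is that consumers depend only on an agreed interface, so providers can be swapped without rearchitecting.

```python
from typing import Protocol

import pandas as pd


class DataProduct(Protocol):
    """Minimal contract a data provider publishes to the mesh (hypothetical)."""

    def fetch(self, start: str, end: str) -> pd.DataFrame:
        """Return records for the requested time window."""
        ...


class TrackConditionApi:
    """Hypothetical provider serving track-condition data over a REST API."""

    def __init__(self, base_url: str):
        self.base_url = base_url

    def fetch(self, start: str, end: str) -> pd.DataFrame:
        # A real provider would call its own API or store here; we return
        # an empty frame with the agreed schema to keep the sketch runnable.
        return pd.DataFrame(columns=["section_id", "timestamp", "wear_mm"])


def build_wear_report(source: DataProduct, start: str, end: str) -> pd.DataFrame:
    """Consumer code: works with any provider that honors the contract."""
    df = source.fetch(start, end)
    return df.groupby("section_id", as_index=False)["wear_mm"].max()
```

Because the report code never references a specific store or service, replacing TrackConditionApi with a different provider is a local change rather than a rearchitecture.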

We also found it useful to organize the operational and analytic data by business domains, which include assets such as signals and tracks and processes such as repair and construction. Organizing, naming and managing the data by such business domains made it easier to identify the data needed for specific reports, and to understand how and when to move and process it. It also helped each business unit to control how its data was shared with and used by other business units. As we discuss below, we did the same thing with our APIs, which can call either datasets or business functions, such as “predicted mean time to failure for tracks or signaling equipment.”
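As a sketch of what domain-oriented organization can look like in practice (the domain names, dataset paths, owners and function names below are hypothetical examples, not the client’s actual catalog):

```python
# Hypothetical domain catalog: each business domain owns its datasets and the
# business functions its APIs expose. All names are illustrative only.
DOMAIN_CATALOG = {
    "track": {
        "owner": "track-engineering",
        "datasets": ["track/geometry_measurements", "track/inspection_images"],
        "functions": ["track/predicted_mean_time_to_failure"],
    },
    "signaling": {
        "owner": "signaling-operations",
        "datasets": ["signaling/relay_telemetry"],
        "functions": ["signaling/predicted_mean_time_to_failure"],
    },
}


def datasets_for_report(domains: list[str]) -> list[str]:
    """Resolve only the datasets a report needs, by the domains it touches."""
    return [ds for d in domains for ds in DOMAIN_CATALOG[d]["datasets"]]


# e.g. a track-wear report touches only the "track" domain's datasets
print(datasets_for_report(["track"]))
```

A catalog like this makes ownership explicit and lets each report resolve only the datasets its domains actually expose.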

Lesson 3: Wrangle Only the Data You Need When You Need It

As much as 80 percent of the work in ML projects goes into the up-front collection, management and processing of data, such as creating features the ML models can use. This is another area where our loosely coupled data mesh helped reduce cost and delay, by allowing us to move and process only the data we needed, as we needed it, rather than migrating and transforming all the data at once or enabling real-time movement for every type of data. We scheduled the movement of data from cloud repositories to reporting tools, and the provisioning of the underlying infrastructure, based on the insights and alerts we needed to generate from it.

One example is the many terabytes of track image data this client generates every day. Rather than upload and process all this data, we do that work — and incur those costs — only for images of the sections of track for which we have evidence of potential problems.
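A simplified sketch of that filtering step, with hypothetical column names and section IDs rather than the client’s actual pipeline, looks something like this:

```python
import pandas as pd


def select_images_to_process(image_index: pd.DataFrame,
                             flagged_sections: set[str]) -> pd.DataFrame:
    """Keep only image records for track sections already flagged as
    potentially defective, so the rest is never uploaded or processed."""
    return image_index[image_index["section_id"].isin(flagged_sections)]


# Usage sketch: sections flagged by sensor analytics drive the image workload.
flagged = {"SEC-0412", "SEC-0977"}
index = pd.DataFrame([
    {"section_id": "SEC-0412", "blob_path": "img/0412/001.tif"},
    {"section_id": "SEC-0555", "blob_path": "img/0555/001.tif"},
])
to_process = select_images_to_process(index, flagged)  # only SEC-0412 remains
```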

Lesson 4: Stick to Batch Data Movements Wherever Possible

Analyzing data in real time may sound worthwhile, but the unpredictable, spiky nature of such data movement can send your cloud compute and network costs through the roof. We found that 80 percent of the data our business users needed could instead be moved in batch processes we could schedule and manage to minimize costs. The remaining 20 percent that needs pricier, real-time movement was data that required immediate attention, such as an impending failure in a switching signal that could foul traffic for hundreds of miles.
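In practice this came down to classifying each feed by urgency and defaulting to batch. A minimal illustration of that decision, with invented feed names (not the client’s actual feeds):

```python
from dataclasses import dataclass


@dataclass
class Feed:
    name: str
    requires_immediate_action: bool  # e.g. an impending signal failure


def movement_mode(feed: Feed) -> str:
    """Default every feed to scheduled batch movement; pay for streaming
    only where a delay would have operational consequences."""
    return "streaming" if feed.requires_immediate_action else "nightly_batch"


feeds = [
    Feed("track_geometry", requires_immediate_action=False),
    Feed("signal_health", requires_immediate_action=True),
]
plan = {f.name: movement_mode(f) for f in feeds}
# {'track_geometry': 'nightly_batch', 'signal_health': 'streaming'}
```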

Even with pre-planned batch migrations we had to understand our data needs — such as which reporting tools needed which data, and how long we had to keep the data — to minimize costly traffic spikes.

Lesson 5: Build Your APIs Around Business Domains

APIs are the universal connectors that link the client’s data sources with its reporting applications, and like the data itself, we designed and managed them based on the business domains they serve, such as track or signaling functions. This helped ensure we developed the right APIs to meet the needs of each decision maker, and only the APIs we needed, to minimize cost and delay.

We designed, developed, registered and deployed the APIs using common processes, whether those APIs returned data or a business function, such as an alert to schedule preventive maintenance on a section of track. This makes it easier for developers to find and reuse existing APIs rather than waste time and effort reinventing them.
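As an illustrative sketch only (the framework choice, routes and payloads below are our assumptions, not the client’s actual API), a domain-scoped service might expose both kinds of endpoint side by side:

```python
# Hypothetical domain-scoped service exposing a data endpoint and a
# business-function endpoint for the "track" domain (FastAPI used for brevity).
from fastapi import FastAPI

app = FastAPI(title="track-domain-api")


@app.get("/track/sections/{section_id}/measurements")
def get_measurements(section_id: str, start: str, end: str):
    """Data API: raw geometry measurements for one track section."""
    return {"section_id": section_id, "start": start, "end": end, "records": []}


@app.get("/track/sections/{section_id}/predicted-mttf")
def predicted_mttf(section_id: str):
    """Business-function API: the model's predicted mean time to failure,
    which a caller can use to schedule preventive maintenance."""
    return {"section_id": section_id, "predicted_mttf_days": 180}
```

Registering both under the same domain prefix is what lets developers discover them together and reuse them rather than rebuild them.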

This domain-centric model is a change from the traditional practice of categorizing APIs based on technical attributes, the data sources they call or the applications they serve. Implementing it required forming an API team to coordinate the data and application teams, ensuring the APIs were well defined and reusable and that we developed only the ones we needed.

Lesson 6: Make These Insights Mobile

The final piece of our success formula was building a mobile technology stack that made the insights and alerts generated by our analytics available to workers on mobile devices in the field, using the same decision-support and reporting tools they would use in the office. Among the lessons we learned was the need to encrypt this sensitive infrastructure data, and to present it in easily accessible and understandable formats for workers in harsh environments or with poor connectivity. Listening to the “data customer” and tailoring information to their needs, such as providing information about only the areas of track near their location, helped drive adoption of the tool and improve users’ productivity.
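That location-based tailoring can be as simple as filtering alerts by distance from the worker. A self-contained sketch, in which the radius, field names and alert structure are illustrative assumptions rather than the client’s actual app logic:

```python
from math import asin, cos, radians, sin, sqrt


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def alerts_near_worker(alerts, worker_lat, worker_lon, radius_km=25.0):
    """Show only the alerts within the worker's radius, so the mobile app
    presents a short, relevant list instead of the whole network."""
    return [a for a in alerts
            if distance_km(a["lat"], a["lon"], worker_lat, worker_lon) <= radius_km]
```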

It’s All About the Reports — and the Business

The common theme across our best practices is to make the technical infrastructure as invisible as possible, handling details such as API management and decisions about when and where to process data behind the scenes. As you tackle the challenges of ML-enabled analysis of IoT data at scale, we recommend starting with the business user and what they need to know, and letting those needs dictate the technical choices.

About the Authors

Rajaram Venkataramani is Chief Architect — AI, Analytics and Cloud within Cognizant’s AI practice and Rohini Chandrasekhar is Manager, AI Offerings and Solutions.
