Time Series analytics in Snowflake for Manufacturing / IoT industry use cases

The manufacturing community is living through exciting times: the pace of change in the industry is high, and manufacturers face a diverse set of challenges, including delivering products on time, innovating to stay competitive, rising costs and labor shortages. These challenges have led organizations to find new ways to leverage technology, data and analytics on their journey to Industry 4.0, Smart Manufacturing, supply chain digitalization or simply better overall performance. A key component of this improvement is bringing together Information Technology (IT) and Operational Technology (OT) data, often referred to as IT/OT convergence. Removing these acronyms, and unfortunately bringing in several others, it is about bringing ERP and other “transaction” oriented data together with manufacturing shop floor or “time series” related data. At many manufacturers, these two environments (IT and OT) have been separated for years, often decades, in siloed systems that require manual effort to bring data together for root cause analysis and other diagnostic or reporting processes.

Time series data is, as the name implies, a sequence of data values recorded over time. Examples include the inlet pressure on a pump, the volume level in a storage tank, the RPM of a Formula 1 car engine, and even the Snowflake stock price on any given trading day.

From a data complexity perspective, time series data looks very simple: essentially a timestamp, a signal identifier and a value. However, ingesting, storing and, most importantly, using this kind of data in combination with the other data organizations need to make good decisions has not been easy. In particular, integrating time series data into common data warehouse and data lake platforms has been difficult. Some challenges with time series data:

  1. Time series data is found in specialized devices in remote locations, and by the millions in production facilities like refineries, spread across a myriad of DAQ, PLC, DCS, SCADA and IoT edge devices.
  2. The descriptors, or metadata, are also a key component. The metadata that describes the time series data, such as units, location, asset hierarchies, value types (UDTs), operating bands, etc., must be accurate, maintained and kept up to date.
  3. Most relational databases have struggled with performance on the volumes of historical streaming data. Deciding which values to store, for how long, and which are needed immediately (in real time) at the edge versus over time for analytics solutions are all considerations. How the data will be used, and for which use cases, is another key consideration when balancing storage against reporting or analytics requirements.
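
To make the "timestamp, signal identifier and value" shape and the role of metadata more concrete, here is a minimal Python sketch. The class names, field names, the signal ID and the asset path are all hypothetical and chosen only for illustration; real historians and edge devices use their own tag models.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """One time series sample: when, which signal, what value."""
    ts: datetime
    signal_id: str
    value: float


@dataclass(frozen=True)
class SignalMeta:
    """Descriptors for a signal (all field names are illustrative assumptions)."""
    signal_id: str
    asset_path: str
    unit: str
    low_limit: float   # lower edge of the operating band
    high_limit: float  # upper edge of the operating band


def enrich(readings, meta_by_id):
    """Attach metadata to each reading and flag values outside the operating band."""
    rows = []
    for r in readings:
        m = meta_by_id[r.signal_id]
        rows.append({
            "ts": r.ts,
            "asset_path": m.asset_path,
            "value": r.value,
            "unit": m.unit,
            "in_band": m.low_limit <= r.value <= m.high_limit,
        })
    return rows


# Hypothetical pump signal with an assumed operating band of 1.0 to 6.0 bar.
meta = {
    "P101.inlet_pressure": SignalMeta(
        "P101.inlet_pressure", "Plant1/Area2/Pump101", "bar", 1.0, 6.0
    ),
}
readings = [
    Reading(datetime(2024, 3, 1, 8, 0, tzinfo=timezone.utc), "P101.inlet_pressure", 4.2),
    Reading(datetime(2024, 3, 1, 8, 1, tzinfo=timezone.utc), "P101.inlet_pressure", 6.8),
]
enriched = enrich(readings, meta)
```

The readings themselves stay narrow, while units, location and operating bands live in a separate lookup. If that lookup is stale or wrong, the `in_band` flag is wrong too, which is why keeping the metadata maintained matters as much as the values.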

In addition to these examples, event data, such as sales transactions, may be grouped or aggregated into consistent time intervals, e.g. monthly time series buckets, to support forecasting models. Different use cases require different operations or data engineering capabilities to support the mathematical models applied.
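
As a rough illustration of grouping event data into consistent intervals, the sketch below sums sales transactions into calendar-month buckets. The function name and the sample amounts are made up for the example, and it uses only the Python standard library.

```python
from collections import defaultdict
from datetime import datetime


def monthly_totals(events):
    """Sum (timestamp, amount) events into calendar-month buckets keyed by 'YYYY-MM'."""
    totals = defaultdict(float)
    for ts, amount in events:
        totals[ts.strftime("%Y-%m")] += amount
    # Sort by key so the buckets come back in chronological order.
    return dict(sorted(totals.items()))


# Illustrative sales transactions (invented values).
sales = [
    (datetime(2024, 1, 5, 9, 30), 120.0),
    (datetime(2024, 1, 28, 16, 10), 80.0),
    (datetime(2024, 2, 2, 11, 0), 50.0),
    (datetime(2024, 2, 2, 11, 5), 25.0),
]
buckets = monthly_totals(sales)
```

Note that months with no events are simply absent from the result. Many forecasting models expect an explicit zero for such months, so a real pipeline would likely fill those gaps before modeling.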

Some of the considerations in analyzing time series data are how to handle missing data, outliers and inconsistent time intervals, and how to tie time series data to “event” or transaction data types. Before selecting the appropriate models to apply, data engineering effort is often required to align the data into a more usable format, while preserving the granular data so that it can easily be “rendered” in the different forms the many potential use cases need.
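
One possible way to "render" irregular samples onto a regular grid without touching the granular data is sketched below. It carries the last observation forward and marks a grid point as missing when the latest sample is too old. The function name, the `max_age` parameter and the sample values are assumptions for illustration; other gap-filling strategies (interpolation, averaging within the interval) may suit a given use case better.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def to_regular_grid(samples, start, end, step, max_age):
    """Render irregular (ts, value) samples onto a fixed time grid.

    Each grid point takes the most recent sample at or before it, or None
    when that sample is older than max_age (treated as missing data).
    The input list itself is not modified.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    times = [ts for ts, _ in ordered]
    grid = []
    t = start
    while t <= end:
        i = bisect_right(times, t) - 1  # index of the last sample with ts <= t
        if i >= 0 and t - times[i] <= max_age:
            grid.append((t, ordered[i][1]))
        else:
            grid.append((t, None))
        t += step
    return grid


# Irregularly spaced raw samples (invented values).
raw = [
    (datetime(2024, 3, 1, 8, 0, 2), 10.0),
    (datetime(2024, 3, 1, 8, 0, 58), 11.0),
    (datetime(2024, 3, 1, 8, 4, 10), 12.0),
]
grid = to_regular_grid(
    raw,
    start=datetime(2024, 3, 1, 8, 1),
    end=datetime(2024, 3, 1, 8, 5),
    step=timedelta(minutes=1),
    max_age=timedelta(seconds=90),
)
```

Because the grid is derived on demand, the same raw samples can be rendered again with a different step or staleness threshold for another use case.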

This convergence is not a simple implementation or app, mainly because of the legacy systems and organizational structures that separate IT and OT. The two groups often have different objectives, different reporting lines and even separate career paths, which has led to political challenges. Beyond that, the systems, applications and historically even the programming languages in these organizations have been different, requiring different skill sets. However, over the past several years this has begun to change with the use of cloud computing, more stable connectivity and even more data being available from shop floor sensors.

Data is being used to drive decisions and improvements in a number of key manufacturing areas. The implementation steps many manufacturers go through are:

  1. Gain visibility by bringing in shop floor data from sensors (SCADA, PLC, etc.), MES, historians, QM or LIMS systems, MRO, and even product return and customer service information related to product quality
  2. Identify root cause(s), impacts and variable importance, typically through advanced analytics techniques applying AI/ML, but possibly also through statistical process control (SPC) methods
  3. Support decisions, recommended actions and predictions of product quality, yield output and equipment maintenance
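
As a small example of the SPC methods mentioned in step 2, the sketch below computes control limits for an individuals (I) chart and flags new points that fall outside them. It estimates sigma from the average moving range divided by 1.128, the usual d2 constant for ranges of two consecutive points. The function names and measurement values are invented for illustration, and a production SPC setup would typically apply additional run rules.

```python
from statistics import mean


def individuals_limits(baseline):
    """Return (LCL, center line, UCL) for an individuals control chart.

    Sigma is estimated as the average moving range of consecutive points
    divided by d2 = 1.128.
    """
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    center = mean(baseline)
    sigma = mean(moving_ranges) / 1.128
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(values, lcl, ucl):
    """Return the indexes of points that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Baseline measurements from a period assumed to be in control (invented values).
baseline = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.0]
lcl, center, ucl = individuals_limits(baseline)

# New measurements checked against the baseline limits.
new_values = [10.1, 10.4, 10.9, 9.3]
flagged = out_of_control(new_values, lcl, ucl)
```

With these numbers the limits land at roughly 9.48 and 10.54, so the third and fourth new values are flagged for investigation.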

This requires capabilities including rapid ingestion of both transaction and streaming time series data, with as little latency as possible, to support these business use cases. It also requires storing the data securely, allowing for better results from advanced analytics models.

The time series data is critical in this analysis, providing historical context to develop the models and streaming into the deployed models as it becomes available, so that timely actions can be performed. An important capability is to link this time series data to a recorded value from an event or transaction type system, which may be the quality results recorded in a QM or LIMS system, schedules from an MES application, or other time series related data such as control or diagnostics codes. Capabilities such as Snowflake's recently released ASOF JOIN help bring these types of data sets together quickly and easily. (https://docs.snowflake.com/en/sql-reference/constructs/asof-join?_ga=2.178720612.1984317519.1711548346-207824126.1635783707&_fsi=yyG4eiKC)
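
To illustrate the idea behind an as-of match, the plain-Python sketch below pairs each event (for example a lab result) with the most recent sensor reading at or before the event's timestamp. This only illustrates the matching logic; it is not Snowflake's implementation or its SQL syntax, and the function name and sample data are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def asof_join(events, readings):
    """Attach to each (ts, payload) event the latest reading with ts <= event ts.

    Events that precede every reading are paired with None.
    """
    ordered = sorted(readings, key=lambda r: r[0])
    times = [ts for ts, _ in ordered]
    joined = []
    for ev_ts, payload in events:
        i = bisect_right(times, ev_ts) - 1  # last reading at or before the event
        match = ordered[i] if i >= 0 else None
        joined.append((ev_ts, payload, match))
    return joined


# Invented oven temperature readings (time series side).
temperatures = [
    (datetime(2024, 3, 1, 8, 0), 180.0),
    (datetime(2024, 3, 1, 8, 4), 181.5),
    (datetime(2024, 3, 1, 8, 10), 183.0),
]
# Invented quality results recorded in a lab system (event side).
lab_results = [
    (datetime(2024, 3, 1, 7, 0), "batch-A pass"),
    (datetime(2024, 3, 1, 8, 5), "batch-B pass"),
    (datetime(2024, 3, 1, 8, 20), "batch-C fail"),
]
joined = asof_join(lab_results, temperatures)
```

The timestamps on the two sides never line up exactly, which is the typical situation when shop floor signals meet transaction data; the as-of match resolves this by taking the closest preceding reading instead of requiring equality.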

Smart manufacturing and Industry 4.0 initiatives rely on the use of multiple types of data from the convergence of IT and OT systems. From visibility to the application of ML techniques, the ability to efficiently and quickly store, combine and analyze these different types of data is critical in supporting the modern manufacturing organization.
