Unleashing the power of Real-Time Machine Learning to accelerate the production of toothpaste at Haleon

Introduction

Oleksandr Teslenko
Trusted Data Science @ Haleon
Aug 21, 2023


In today’s world, the demand for toothpaste is ever-growing, with consumers seeking a diverse range of products tailored to their specific preferences. In response, Haleon’s toothpaste factories face the challenge of producing an array of toothpaste variants, including popular brands like Sensodyne, Parodontax, and Aquafresh. Despite the wide variety of flavours and characteristics, many of these toothpaste products share similar manufacturing processes. This similarity presents a unique opportunity for optimisation and efficiency.

To address this challenge and improve toothpaste production across all variants, Haleon’s Data Science Team has developed a prediction-based system. This system aims to proactively detect specific stages in the manufacturing process that lead to optimal product outcomes in terms of quality and characteristics while optimising the process in terms of time, cost, and resources. A crucial aspect of this approach is providing operators with real-time data and predictive insights, empowering them to manufacture each batch as close as possible to the reference process.

In this article, we will take an in-depth look at the architecture, design, and implementation of the machine learning system.

Toothpaste production process overview and problem formulation

The manufacturing process for toothpaste is a carefully orchestrated sequence of steps aimed at achieving the desired product characteristics, and it consists of many different stages. The main steps include adding ingredients to the mixers and blending them, carefully controlled temperature adjustments, changes in pressure and mixing speed, cooling, and other critical activities that ensure the toothpaste is formulated correctly.

Despite its precision, the manual nature of toothpaste production has drawbacks. The primary one is extended manufacturing time caused by human-related delays in executing each step. These delays accumulate and can significantly reduce the overall efficiency of the production process, affecting resource utilisation and adding unnecessary cost.

These delays are often caused by a lack of real-time visibility and monitoring. Operators must monitor various parameters manually to determine accurately when each step is complete. As a result, the production process is susceptible to delays, waste, and inefficiency.

To address these challenges, a data-driven approach can significantly improve the manufacturing process. A real-time monitoring and step-prediction tool can give operators the critical insights they need to run each batch successfully.

Mixing storage tank for toothpaste production

System architecture overview

At the heart of the application lies a robust architecture designed to meet specific requirements: low latency, high reliability, and scalability. Operators must have real-time access to updates without any disruptions during the manufacturing process. Furthermore, as production demands grow, the system must easily scale to support additional mixers and manufacturing sites.

The architecture of the system comprises several interconnected components:

  1. Sensors readings collector: The system begins with the Sensors readings collector. This software collects readings every 5 seconds from sensors installed on the mixers. Sensor data includes critical parameters such as temperature, pressure, and weight, which significantly influence toothpaste quality and consistency.
  2. Azure Data Lake Gen2: The collected sensor readings are aggregated and sent to the Azure Data Lake Gen2 at regular intervals, approximately every minute. The data is stored in CSV format, ensuring compatibility and easy processing for subsequent stages.
  3. Databricks Delta Live Tables: This component serves as the ETL (Extract, Transform, Load) pipeline, transforming the raw data into a structured format. New files from Azure Data Lake Gen2 are processed as they arrive, and the results are written to an output table. As a result, the data is always up to date and readily available for predictive analysis.
  4. Azure Functions: Azure Functions are used to run predictive models every minute. The predicted stage of the process is written back to the Databricks table.
  5. Power BI: Predictions and sensor readings are visualised and analysed in Power BI, which serves as the platform for operators to gain insights and make informed decisions based on real-time data. The dashboards provide a comprehensive view of the manufacturing process, highlighting critical parameters and potential anomalies.
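To make the hand-off between the collector and the data lake more concrete, here is a minimal sketch of how 5-second readings could be batched into one CSV payload per minute before upload. The article does not describe the collector's code or file layout, so the `Reading` tuple, the column names, and the `batch_by_minute` helper are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical reading format: (timestamp, mixer_id, sensor_name, value).
Reading = tuple[datetime, str, str, float]


def batch_by_minute(readings: list[Reading]) -> dict[datetime, str]:
    """Group raw readings into one CSV payload per wall-clock minute."""
    batches: dict[datetime, list[Reading]] = defaultdict(list)
    for reading in readings:
        # Truncate the timestamp to the start of its minute to form the batch key.
        minute = reading[0].replace(second=0, microsecond=0)
        batches[minute].append(reading)

    payloads: dict[datetime, str] = {}
    for minute, rows in sorted(batches.items()):
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(["timestamp", "mixer_id", "sensor", "value"])
        for ts, mixer_id, sensor, value in sorted(rows):
            writer.writerow([ts.isoformat(), mixer_id, sensor, value])
        payloads[minute] = buffer.getvalue()  # one file's worth of data
    return payloads
```

With readings every 5 seconds, each payload holds 12 data rows per sensor, and each payload would become one file in the landing directory.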

Key elements for managing real-time data

The system incorporates key features that contribute to its success in real-time data processing:

a. Databricks Auto Loader: Efficient data ingestion is crucial for seamless system functioning. Databricks Auto Loader continuously scans a directory in the Azure Data Lake Storage, automatically loading new files as they arrive. Cloud event notification systems facilitate efficient file discovery and incremental loading. Moreover, a checkpoint mechanism tracks processed files, handling failures and ensuring data accuracy without the need for a separate control table.
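Auto Loader is configured rather than hand-written, but the checkpoint idea behind it can be illustrated in plain Python. The sketch below is a toy stand-in, not Auto Loader's implementation: `ingest_new_files`, the set-of-file-names checkpoint, and the JSON serialisation helpers are assumptions chosen for illustration.

```python
import json
from typing import Callable, Iterable


def ingest_new_files(
    listing: Iterable[str],
    processed: set[str],
    process: Callable[[str], None],
) -> list[str]:
    """Process every listed file that is not yet recorded in the checkpoint."""
    new_files = sorted(set(listing) - processed)
    for name in new_files:
        process(name)
        # Mark a file as done only after it was processed successfully, so a
        # crash part-way through leaves the remaining files to be retried.
        processed.add(name)
    return new_files


def dump_checkpoint(processed: set[str]) -> str:
    """Serialise the checkpoint so it can be persisted between runs."""
    return json.dumps(sorted(processed))


def load_checkpoint(payload: str) -> set[str]:
    """Restore a checkpoint previously written by dump_checkpoint."""
    return set(json.loads(payload))
```

Because a file enters the checkpoint only after `process` returns, a failure leaves the unprocessed files to be picked up on the next run, which gives at-least-once processing without a separate control table.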

b. Azure Functions: The use of Azure Functions enables the system to run the prediction model with high efficiency and parallelism. These serverless computing units run the prediction model at regular intervals, typically every minute. Each model instance corresponds to a specific mixer, enabling parallel execution for multiple mixers across different manufacturing sites, and allowing operators to monitor and optimise production processes for individual batches in real-time.
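The per-mixer design can be pictured with a small in-process analogue. In the real system Azure Functions handles scaling and parallelism; here a `ThreadPoolExecutor` merely stands in for it, and `run_per_mixer` and the toy predictor are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence


def run_per_mixer(
    batches: dict[str, Sequence[float]],
    predict: Callable[[Sequence[float]], str],
) -> dict[str, Optional[str]]:
    """Run `predict` independently for each mixer; a failure affects only that mixer."""
    results: dict[str, Optional[str]] = {}
    with ThreadPoolExecutor() as pool:
        futures = {mixer: pool.submit(predict, batch) for mixer, batch in batches.items()}
        for mixer, future in futures.items():
            try:
                results[mixer] = future.result()
            except Exception:  # isolate the failure instead of aborting every mixer
                results[mixer] = None
    return results


if __name__ == "__main__":
    def toy_predict(readings: Sequence[float]) -> str:
        # Hypothetical stand-in for the real step model.
        return "mixing" if max(readings) < 50 else "heating"

    print(run_per_mixer({"m1": [40.0, 42.0], "m2": [55.0], "m3": []}, toy_predict))
    # {'m1': 'mixing', 'm2': 'heating', 'm3': None}
```

The empty batch for `m3` makes the toy model raise, yet `m1` and `m2` still receive predictions, which is the property that one-instance-per-mixer buys.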

Data Science Model

For step detection in the toothpaste production process (the main steps are listed in the “Toothpaste production process overview and problem formulation” section), we used fastDTW, an approximate Dynamic Time Warping method. Dynamic Time Warping is a widely used algorithm in time series analysis that measures the similarity between two temporal sequences that may vary in time or speed.
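As an illustration of what DTW computes, here is the textbook dynamic-programming form for two 1-D signals. Absolute difference as the local cost is an assumption, since the article does not state the cost function. This exact version is quadratic in time and memory, which is what the approximate fastDTW variant avoids.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact DTW distance between two 1-D signals (absolute-difference cost)."""
    inf = float("inf")
    # acc[i][j] = cheapest alignment of a[:i] with b[:j].
    acc = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    acc[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the best of: repeat a point of b, repeat a point of a, or advance both.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[len(a)][len(b)]
```

For example, `dtw_distance([0, 1, 0, 0], [0, 0, 1, 0])` is `0.0` even though a point-by-point comparison differs by 2, because DTW aligns the shifted peak. This tolerance to timing differences is why it suits batches that run faster or slower than the reference.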

The fastDTW variant is an approximate version of DTW that significantly reduces computation time while maintaining high accuracy. For accurate step detection, historical data templates for each parameter in the toothpaste production process are prepared. These templates capture the expected signal patterns for every manufacturing step. By analysing past production runs, we can create templates that represent the ideal behaviour for each parameter during each step.

Example of historical data template
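The article does not say how the templates are built. One simple option, shown here purely as an assumption, is to stretch each historical run of a step to a common length by linear interpolation and then average the runs point by point; `resample` and `build_template` are hypothetical names.

```python
from typing import Sequence


def resample(series: Sequence[float], length: int) -> list[float]:
    """Linearly interpolate `series` onto `length` evenly spaced points."""
    if length < 2 or len(series) < 2:
        raise ValueError("need at least two points in and out")
    step = (len(series) - 1) / (length - 1)
    out = []
    for k in range(length):
        pos = k * step
        lo = min(int(pos), len(series) - 1)
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


def build_template(runs: Sequence[Sequence[float]], length: int) -> list[float]:
    """Average several historical runs of one step after stretching them to equal length."""
    if not runs:
        raise ValueError("need at least one historical run")
    stretched = [resample(run, length) for run in runs]
    return [sum(column) / len(column) for column in zip(*stretched)]
```

For example, `build_template([[0, 10], [0, 5, 10]], 3)` gives `[0.0, 5.0, 10.0]`. Pointwise averaging can blur steps whose timing varies a lot between runs; DTW-based averaging schemes exist for that case.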

During the production process, as new data signals are collected from the sensors on mixers, the fastDTW algorithm compares these online data signals to the historical templates. If the distance between the online signal and the template is below a predefined threshold, it indicates that the step has been detected. The operator is then notified in real-time, enabling them to proceed to the next step promptly.
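The detection rule described above can be sketched for a single sensor as follows. The trailing-window comparison, the window length equal to the template length, and the `step_detected` name are assumptions. Exact DTW is used for brevity where the production system uses fastDTW, and a real system would likely combine several parameters.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact DTW with an absolute-difference local cost."""
    inf = float("inf")
    acc = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    acc[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            acc[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]
            )
    return acc[len(a)][len(b)]


def step_detected(online: Sequence[float], template: Sequence[float], threshold: float) -> bool:
    """Report a step once the latest readings are within `threshold` of its template."""
    if len(online) < len(template):
        return False  # not enough data yet to compare against the template
    recent = online[-len(template):]  # trailing window of the live signal
    return dtw_distance(recent, template) < threshold
```

With a hypothetical heating template `[20.0, 40.0, 60.0, 60.0]` and a threshold of `5.0`, a live signal ending in `40.0, 60.0, 60.0` after a flat start is detected, while a signal that stays at `20.0` is not.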

The model and the challenges we faced while developing it will be discussed in depth in the next article.

MLOps Setup: Empowering Data Science Workflow

The success of the system relies heavily on an efficient MLOps setup that covers model deployment, continuous integration and deployment, and monitoring. The key components of the MLOps setup are:

  • Data Science Model Deployment: Unlike conventional ML projects, which centre on model training, this project revolves around a distance calculation algorithm. Therefore, to keep model deployment agile and automated, we package the data science model as a Python package. This package is invoked within Azure Functions, enabling seamless integration with other services. When updates are required, Azure DevOps pipelines are triggered to build and package the new version automatically.
  • CI/CD Automation: The CI/CD process runs on Azure DevOps pipelines. These pipelines enforce code quality checks and run a suite of tests, including unit, integration, end-to-end, and model accuracy tests. Once all tests pass, the updated model version is released. Continuous deployment ensures that the application is deployed easily and consistently across development, test, and production environments.
  • Versioning: Data versioning is achieved through Databricks, providing traceability and easy access to historical data for analysis. Model versioning is managed through Python package versioning, enabling efficient tracking and updates.
  • Monitoring: Azure Application Insights is used to collect logs and monitor the overall health of the application. In addition, a custom Azure Functions app monitors data freshness and model predictions in real time. For comprehensive data quality and data drift monitoring, the Azure ML Monitoring tool was integrated. This choice lets us consolidate metrics from Azure Application Insights, Azure Functions, and Azure ML Monitoring into a holistic view of system performance.
  • Infrastructure as Code: Infrastructure management is simplified through the use of Terraform. Most of the application’s infrastructure is defined as code, enabling consistent and reproducible setup and updates across different environments.
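The data-freshness check mentioned under Monitoring could be as small as comparing each mixer's newest reading timestamp with the current time. The three-minute tolerance, the `stale_mixers` name, and the input shape are assumptions; since files land roughly every minute, a lag of a few minutes is one plausible alert threshold.

```python
from datetime import datetime, timedelta, timezone


def stale_mixers(
    latest_reading: dict[str, datetime],
    now: datetime,
    max_lag: timedelta = timedelta(minutes=3),
) -> list[str]:
    """Return the mixers whose newest reading is older than `max_lag`."""
    return sorted(mixer for mixer, ts in latest_reading.items() if now - ts > max_lag)


if __name__ == "__main__":
    now = datetime(2023, 8, 21, 12, 0, tzinfo=timezone.utc)
    latest = {
        "mixer-1": now - timedelta(seconds=70),  # arrived on schedule
        "mixer-2": now - timedelta(minutes=12),  # feed has stalled
    }
    print(stale_mixers(latest, now))  # ['mixer-2']
```

In practice the returned list would feed an alert, so that operators know when the dashboard is showing outdated data rather than a real process state.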

Conclusion and Next Steps

The current version of the application provides operators with process data and predictive model insights every minute, enabling near real-time monitoring and optimisation of the production process. The application is currently running in pilot mode at one of Haleon’s sites. Its goal is to help operators reduce toothpaste production time.

With the system proving its worth at the current site, the Data Science team has set their sights on expanding its reach to more locations. By adding more sites, the system will foster consistency and standardisation across Haleon’s toothpaste production, optimising operations on a broader scale.

Additionally, the next steps, including increasing the data frequency from once a minute to every 5 seconds, supervised model training, and advanced monitoring, reflect the team’s commitment to continuous improvement and innovation.
