Cloud-native solutions for zero unplanned downtime in QSR Supply Chain and Restaurant Operations Equipment

Amit Sharma
Engineered @ Publicis Sapient
13 min read · Jul 29, 2020


Over the past decade, there have been monumental advances in the automation of food supply chains and restaurant operations in the QSR landscape. This momentum is only expected to increase with the recent pandemic, which has accelerated trends that had already been evolving for years. Some of these trends include:

● Heightened public awareness of hygiene in food ingredients, prep and transport

● Downward pressure on costs matched by an upward pressure on commodity prices

● Demand for a greater diversity of menu items with shorter run-times for novel menu items

● Requirement for faster response and adaptability to demand in the supply chain

● Incorporation of fresher ingredients into recipes that have a shorter time to market

In order to cater to these market demands, automation solutions ranging from simple electro-mechanical devices to robots with intelligent hardware and software are being deployed. The types of equipment are a diverse mix, including:

● Optical sensors and automatic process-control systems for quality and safety inspections of food

● Automatic control equipment for food chilling and freezing

● Meat processing equipment for intelligent cutting, deboning and trimming

● Gripping equipment for non-rigid food such as fish fillets and whole fish

● Post-harvest grading equipment for products such as fruits and vegetables

These types of automation solutions lead to many advantages for QSRs and their supply chains, including improved production quality, product consistency, worker safety, food safety and compliance with legislation.

Along with the benefits of the equipment that drives automation comes the need to maintain that equipment. Equipment breakdowns in the supply chain or in operations can lead to unplanned downtime, loss of valuable resources, and damage to the customer experience and loyalty. This makes predictive maintenance of equipment vital to the functioning of a QSR.

A seminal study¹ claimed that a “functional predictive maintenance program” can reduce maintenance costs by 30 percent, reduce downtime by 45 percent, and eliminate breakdowns by as much as 75 percent.

Until recently, the processes that would have made predictive maintenance possible either didn’t widely exist in the QSR industry or were far too expensive to be considered practical.

With the right blend of math, sensors and the power of cloud computing, equipment can be fixed before it breaks, valuable resources can be spared, and unplanned downtime can be rendered obsolete. A QSR’s success in this area hinges on how well it can solve the challenges that accompany the journey to achieve its goals of predictive maintenance.

Challenges with “Predicting” Equipment Maintenance

Volume and Source of Data

Data is growing with every passing day, as every event generates data: machine logs, parts data and manuals, sensor readings and maintenance (repair) records of different sizes and scales are all available to process. This results in tens of terabytes (perhaps petabytes) of data in multiple formats, originating from incongruent systems with varying technology proficiencies, and it is multiplied for large QSRs with hundreds of restaurants. These formats include structured and unstructured documents, images, streaming data, parts records and others. Additionally, for a problem such as predictive maintenance, the data might need to be retained over a period of years. A common storage option to host these disparate types at scale needs to be identified. Cost can be a huge factor for data at this volume, so finding a solution with both scale and archiving capability is essential.

Document Data and Extraction

Equipment maintenance documents and manuals, such as records and administrative data, contain complex terminology that requires time and effort for semantic analysis and processing. Often, this processing is manual, which is error-prone and requires specific expertise. To process this information more reliably, additional data formats and software components might be required to enable data exchange with legacy systems.

Predictive Data & Meaningful Insights

A key focus area for predictive maintenance is using data to prevent failures and to identify windows upfront for performing the proper maintenance or ordering parts. Some of the challenges with traditional data analytics include ingesting data in batches from multiple sources, storing the data in a data warehouse, and often waiting days or even months after episodes or encounters occur to identify patterns and gather insights. Additionally, identifying relevant patterns takes time and effort with complex programming models, adding to the cost of the pursuit.

Data Privacy and Security

Restaurant data could consist of sensitive information that should be protected and maintained in compliance with local, state and federal laws. Further, any data breach could lead to serious consequences for vendors, franchisees and the QSR’s corporate organization.

The Framework to Solve the Challenges

Our recommendation is to adopt a cloud-native solution that offers a number of services and capabilities to address the challenges we outlined. The framework view below shows a high-level logical architecture of the predictive maintenance platform. This architecture can be used to gather deep insights from data sets ranging from fault logs and maintenance data to parts/machine information and usage history, and to provide predictive analytics that identify maintenance needs proactively and cost-efficiently.

Framework

The following sections provide the mechanics of the framework:

Streaming/Real-time Data — IoT Sensors/Smart Kitchens/Devices

The framework enables streaming/real-time data ingestion from multiple sources in disparate formats and stores the information in centralized storage. Data from streaming devices such as smart ovens, smart commercial refrigerators and heating and cooling systems is easy to collect and process in the cloud. Streaming data includes operations data originating from sensors on the machines or other monitoring systems, as well as operations events from data sources such as record/back-office systems.
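As a concrete illustration, here is a minimal sketch of pushing one smart-kitchen sensor reading into a cloud stream. It assumes an AWS Kinesis stream named equipment-telemetry (a hypothetical name, created ahead of time) and uses the boto3 SDK; other clouds offer equivalent streaming APIs.

```python
import json
import time

import boto3  # AWS SDK for Python

# Hypothetical stream; create it in Kinesis before running this.
STREAM_NAME = "equipment-telemetry"

kinesis = boto3.client("kinesis", region_name="us-east-1")

def publish_reading(equipment_id: str, metric: str, value: float) -> None:
    """Send one sensor reading (e.g., an oven temperature) to the stream."""
    record = {
        "equipment_id": equipment_id,
        "metric": metric,
        "value": value,
        "timestamp": time.time(),
    }
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(record).encode("utf-8"),
        # Partitioning by equipment keeps each machine's events ordered.
        PartitionKey=equipment_id,
    )

publish_reading("oven-042", "chamber_temp_c", 213.5)
```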

Centralized Storage on Cloud

All the data, including raw copies of source system data and transformed data, needs to go into a single store for prediction and visualization. In the cloud world, this is called the data lake. A cloud data lake service automatically crawls data sources, identifies data formats, and then suggests schemas and transformations, saving time and complexity. Maintenance logs, part removals and part repair history from the maintenance and engineering systems are collected periodically, based on the capability of each system. No data is turned away and all data types are supported.

For example, if you upload a series of parts maintenance records, logs and maintenance manuals, a fully-managed extract, transform and load (ETL) tool can scan these documents to identify the schema and data types present in these files. The data is kept in its raw form and transformed only when it is ready to be used. This metadata is then stored in a catalog to be used in subsequent transforms and queries.

Once you define where your lake is located, the cloud solution collects and catalogs this data, moves it into object storage for secure access, and finally cleans and classifies it using Machine Learning algorithms. Object storage is a natural fit here and normally comes with archival options to make sure cost doesn’t add up while history is retained. Additionally, user-defined tags, parts metadata and maintenance data are stored in a NoSQL key-value document database to add business-relevant context to each data set and for OLTP use.
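To make this tangible, the sketch below lands a raw maintenance log in an S3 staging bucket, attaches business context in DynamoDB, and starts a Glue crawler to infer the schema into the catalog. The bucket, table and crawler names are hypothetical and assume those resources already exist.

```python
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
glue = boto3.client("glue")

BUCKET = "qsr-data-lake-staging"      # hypothetical staging bucket
CRAWLER = "maintenance-logs-crawler"  # hypothetical, pre-created Glue crawler

# 1. Land the raw file untouched; transform only when ready to use it.
with open("fryer-017-maintenance.log", "rb") as f:
    s3.put_object(Bucket=BUCKET, Key="raw/maintenance/fryer-017.log", Body=f)

# 2. Attach business-relevant context in a NoSQL key-value document store.
dynamodb.Table("parts-metadata").put_item(Item={
    "object_key": "raw/maintenance/fryer-017.log",
    "equipment_id": "fryer-017",
    "restaurant_id": "store-1123",
    "data_class": "maintenance_log",
})

# 3. Let the managed crawler infer format and schema into the data catalog.
glue.start_crawler(Name=CRAWLER)
```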

Data Extraction from Documents/Batch Data

Parts documents and manuals contain complex terminology and relationships that require specific expertise and manual analysis. The cloud’s natural language processing services make it easy to use Machine Learning to extract relevant maintenance information from unstructured text. You can quickly and accurately gather information such as part condition, quantity, strength and frequency from a variety of sources, including mechanical notes and records. Cloud solutions can also extract text and data from scanned documents and images with little to no complexity. This is not a one-time job, however, and needs to run continuously; managed services can form a data extraction pipeline for further analysis of maintenance patterns, making orchestration easy and integrating with cloud ML to keep predictions current.
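For example, a managed NLP service such as Amazon Comprehend (other clouds offer equivalents) can pull entities out of free-text maintenance notes. The sketch below is a minimal illustration with a made-up note, not a full extraction pipeline.

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

note = (
    "Replaced the compressor relay on walk-in freezer WF-3 after repeated "
    "over-temperature alarms; scheduled a follow-up check in 30 days."
)

# Detect entities (quantities, dates, part references, ...) in the note.
response = comprehend.detect_entities(Text=note, LanguageCode="en")

for entity in response["Entities"]:
    print(entity["Type"], entity["Text"], round(entity["Score"], 2))
```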

Predictive Data Using AI/ML

While real-time data is useful for identifying usage trends or maintenance needs, the ability to predict based on historical data is key to tackling maintenance proactively.

Machine Learning is a pillar for this. For labeled data, you use supervised Machine Learning, where you feed the features and their corresponding labels into an algorithm in a process called training. During training, the algorithm gradually determines the relationship between features and their labels. This relationship is called the model, and in Machine Learning the model is often very complex. For unlabeled data, you use unsupervised learning, where the goal is to identify meaningful patterns that the machine must learn from the unlabeled data set itself. For predictive maintenance, both types of use cases are possible depending on the data source.
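The toy sketch below contrasts the two modes on synthetic, hypothetical sensor features (average temperature, vibration, hours since service): a supervised classifier trained on labeled failure history, and unsupervised clustering that surfaces usage patterns in unlabeled data. It uses scikit-learn purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Hypothetical features per machine: [avg_temp, vibration, hours_since_service]
X = rng.normal(size=(500, 3))

# Supervised: labels record whether each machine failed within 30 days.
y = (X[:, 2] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 1).astype(int)
model = RandomForestClassifier(n_estimators=100).fit(X, y)
print("P(failure) for one machine:", model.predict_proba(X[:1])[0, 1])

# Unsupervised: no labels, just look for natural usage clusters.
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))
```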

A cloud-based ML solution is a fully-managed service that gives developers and data scientists the ability to build, train and deploy Machine Learning (ML) models quickly. It removes the heavy lifting from each step of the Machine Learning process, in contrast to traditional ML development, which is a complex, expensive, iterative process made even harder by the lack of integrated tools for the entire workflow. Such a service makes it easy to deploy your trained model into production with a single click so that you can start generating predictions for real-time or batch data. This framework uses Machine Learning to deliver highly accurate predictions by combining time series data with additional variables. These solutions require no Machine Learning experience to get started; you only need to provide historical data, plus any additional data that you believe may impact your decisions.

For companies with no ML experience, Cloud AutoML from GCP and AWS SageMaker Autopilot could be great options. But there are caveats to this.

A significant point is to make sure that the trained model tests your hypothesis and reaches a conclusion. With Machine Learning, you need to refine a hypothesis and repeat with different feature sets (these carry the predictive power), regularizations or hyperparameters, with the goal of minimizing the loss. The reference architecture below covers how you can set up continuous pipelines to rinse and repeat on the ML model.
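A minimal sketch of that rinse-and-repeat loop, again using scikit-learn on synthetic data: try candidate feature sets and regularization strengths, and keep whichever combination minimizes the cross-validated loss. A production setup would run this loop inside the cloud’s continuous training pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))             # hypothetical features
y = (X[:, 0] - X[:, 3] > 0).astype(int)   # hypothetical failure label

best = None
for features in [[0, 1], [0, 3], [0, 1, 2, 3]]:  # candidate feature sets
    for C in [0.01, 0.1, 1.0, 10.0]:             # regularization strengths
        model = LogisticRegression(C=C)
        # Negative log loss: higher is better, so we maximize the score.
        score = cross_val_score(
            model, X[:, features], y, scoring="neg_log_loss", cv=5
        ).mean()
        if best is None or score > best[0]:
            best = (score, features, C)

print(f"lowest loss {-best[0]:.3f} with features {best[1]} and C={best[2]}")
```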

Real-time Data Analytics and Visualization

Streaming analytics helps businesses understand their needs in real time and adjust their actions to better serve their beneficiaries. Cloud streaming solutions provide the capabilities to analyze streaming data, gain actionable insights and respond to maintenance events in real time. Managed systems scale automatically to match the volume and throughput of your incoming data and taper down with it, making sure that elasticity doesn’t come at a premium. They handle Big Data well, which is exactly what predictive maintenance needs.
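The sketch below is a deliberately simplified, framework-agnostic picture of the kind of windowed check a streaming analytics job runs: keep a sliding window of recent readings and raise a maintenance alert when the window average drifts out of band. The window size and threshold are hypothetical; managed services express the same logic declaratively over streams such as Kinesis or Pub/Sub.

```python
from collections import deque

WINDOW = 20      # readings per sliding window
LIMIT_C = 8.0    # hypothetical safe ceiling for a walk-in fridge (deg C)

window: deque = deque(maxlen=WINDOW)

def on_reading(temp_c: float) -> None:
    """Called for each sensor reading as it arrives on the stream."""
    window.append(temp_c)
    if len(window) == WINDOW and sum(window) / WINDOW > LIMIT_C:
        print("ALERT: sustained over-temperature, schedule maintenance")

# Simulated stream: a slowly failing compressor drives the average up.
for i in range(100):
    on_reading(5.0 + i * 0.05)
```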

Once the data is processed, it can be used for visualization to gain deeper insights into patterns and trends. BI tools or a cloud-powered visualization service make it easy to deliver insights into maintenance trends.

Data Access Security and Compliance

Any data storage and processing requires adherence to stringent regulatory and compliance frameworks. You need to be able to control access to the data following the principle of least privilege. Cloud providers offer several capabilities to help meet these compliance requirements for data privacy and security while storing and processing any type of data.

Data access should be granted on the least-privilege principle, and cloud systems excel here: you can grant access with predefined or custom roles and policies to limit any exposure to data leaks or hacks.

Cloud systems come with encryption at rest by default. There is also provision for customers to manage their own encryption keys using customer-managed keys (CMKs).

Data in object storage is stored in chunks, with each chunk further encrypted in layers using key-encryption keys (KEKs) and data-encryption keys (DEKs). Cloud platforms also provide ways to encrypt data in transit using TLS, two-way SSL and so on.
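For instance, on AWS, writing an object encrypted with a customer-managed key is a one-line option on the upload, and the SDK already talks to the service over TLS. The bucket name and key alias below are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="qsr-data-lake-staging",         # hypothetical bucket
    Key="raw/maintenance/fryer-017.log",
    Body=b"raw log bytes ...",
    # Encrypt at rest with a customer-managed KMS key instead of the default.
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/qsr-data-lake-key",  # hypothetical CMK alias
)
```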

Reference Architecture and Best Practices

Cloud-native architectures can be built with different cloud providers; AWS, GCP and Azure are the leaders in this space. We have captured reference architectures for AWS and GCP below to aid understanding.

With AWS Cloud

The following reference architecture shows predictive maintenance on the AWS Cloud, mapped to the major areas of the logical architecture above:

AWS
  1. Operations data, maintenance logs, part removals and part repair history are collected and uploaded into AWS based on each system's capability. The frequency of data collection varies based on the connectivity package installed on the system.
  2. The operations system provides the scheduled and actual operations information. This is critical for correlating delays/issues and usage patterns with component failures.
  3. The IoT system sends fault logs and data in real time using streams. Amazon Kinesis is used as the streaming solution, with Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose handling data ingestion and delivery into the data lake.
  4. The data lake is built on object storage to capture structured and unstructured data from batch and streaming systems. Amazon S3 is a perfect choice for this. Data is captured in a staging bucket before being prepped.
  5. AWS Glue is used for ETL and schema discovery from the data lake. Glue can also push the data to DynamoDB for OLTP use and Redshift for OLAP use.
  6. Identifying information or other compliance-sensitive data must be dealt with before data from the lake is used. Access to data is managed with AWS KMS and with IAM roles and policies on the buckets and services.
  7. Amazon SageMaker trains the model to correlate issues and usage with fault data, maintenance logs, part removals and operations data. Feature engineering identifies the most significant and most predictive signals and components. Features define the usage pattern of the equipment and the parts that are about to wear out. This can be applied to kitchen equipment, plumbing, etc. Amazon EMR can be used if a company already has a Hadoop or Spark infrastructure to train or run the models.
  8. Models are tuned periodically to improve predictions and reduce false positives.
  9. A front-end or mobile app can be built to view the online predictions (a minimal invocation sketch follows this list). AWS Amplify can be used with AWS AppSync to build a mobile application on top of API Gateway and Lambda. Customers can opt for Elastic Beanstalk if they do not need the mobile app capabilities. These applications can send in-app notifications to beneficiaries about upcoming maintenance or unavoidable hiccups, and can be further extended to send notifications to, or set up appointments with, maintenance companies.
  10. Data can be visualized using BI tools or sent to the maintenance dashboard to view the KPIs.
  11. All the infrastructure should be automated with Terraform and Jenkins CI/CD pipelines.
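As referenced in step 9, the sketch below shows the call an API-layer Lambda might make to a deployed SageMaker endpoint to fetch an online prediction. The endpoint name and payload shape are hypothetical and depend on how the model was trained and deployed.

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

# Hypothetical feature vector for one fryer: usage hours, temp drift, cycles.
payload = {"instances": [[1420.0, 0.7, 8931]]}

response = runtime.invoke_endpoint(
    EndpointName="equipment-failure-predictor",  # hypothetical endpoint
    ContentType="application/json",
    Body=json.dumps(payload),
)

prediction = json.loads(response["Body"].read())
print("predicted failure probability:", prediction)
```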

With Google Cloud Platform

The following reference architecture shows predictive maintenance on GCP, mapped to the major areas of the logical architecture above:

GCP
  1. Operations data, maintenance logs, part removals and part repair history are collected and uploaded into GCP based on each system's capability. The frequency of data collection varies based on the connectivity package installed on the system.
  2. The operations system provides the scheduled and actual operations information. This is critical for correlating delays/issues and usage patterns with component failures.
  3. The IoT system sends fault logs and data in real time using streams. Cloud Pub/Sub is used as the streaming solution for data ingestion into the data lake (a publishing sketch follows this list). Cloud IoT Core works seamlessly with Cloud Pub/Sub to bring in this type of data.
  4. The data lake is built on object storage to capture structured and unstructured data from batch and streaming systems. Cloud Storage is the perfect choice for this. Data is captured in a staging bucket before being prepped.
  5. Cloud Dataflow and Cloud Composer are used for ETL and schema discovery from the data lake. They can also push the data to Cloud Firestore for OLTP use and BigQuery for OLAP use.
  6. Identifying information or other compliance-sensitive data must be dealt with before data from the lake is used. Access to data is managed with Cloud KMS/CMKs and with Cloud IAM roles and policies on the storage buckets and services.
  7. Cloud AI Platform trains the model to correlate issues and usage with fault data, maintenance logs, part removals and operations data. Feature engineering identifies the most significant and most predictive signals and components. Features define the usage pattern of the equipment and the parts that are about to wear out. This can be applied to kitchen equipment, plumbing, etc. Cloud Dataproc can be used if a company already has a Hadoop or Spark infrastructure to train or run the models. Kubeflow Pipelines are also an option for continuous training and tuning of ML models for advanced companies. BigQuery ML is a really interesting prospect if you are heavy on BigQuery storage.
  8. Models are tuned periodically to improve predictions and reduce false positives.
  9. A front-end or mobile app can be built to view the online predictions. App Engine can be used with Firebase to build a mobile or web application.

GKE can be used to build these microservices-based APIs for Mobile/Web on containers if the company already has clusters running and managed.

Flutter can be a cross-platform choice and works well with Firebase and AutoML.

These applications can send in-app notifications to beneficiaries about upcoming maintenance or unavoidable hiccups. They can be further extended to send notifications to, or set up appointments with, maintenance companies.

  10. Data can be visualized using BI tools or sent to the maintenance dashboard to view the KPIs. A cloud data visualization service (such as Data Studio) can build the desired views from the BigQuery data sets.

  11. All the infrastructure should be automated with Terraform and Jenkins CI/CD pipelines.
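As referenced in step 3, here is a minimal sketch of publishing one sensor reading to Pub/Sub with the google-cloud-pubsub client. The project and topic names are hypothetical.

```python
import json
import time

from google.cloud import pubsub_v1

PROJECT_ID = "qsr-predictive-maintenance"  # hypothetical project
TOPIC_ID = "equipment-telemetry"           # hypothetical topic

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

reading = {
    "equipment_id": "oven-042",
    "metric": "chamber_temp_c",
    "value": 213.5,
    "timestamp": time.time(),
}

# publish() returns a future; result() blocks until the message is acked
# and yields the server-assigned message ID.
future = publisher.publish(topic_path, data=json.dumps(reading).encode("utf-8"))
print("published message id:", future.result())
```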

Conclusion

In a future that will be increasingly automated via intelligent equipment, IoT and robotics, predictive maintenance is vital for supply chain and operations. It allows for preemptive corrective actions, less equipment downtime, lower parts and labor costs, and improved crew and customer safety, and it will ultimately allow QSRs to consistently delight their customers and increase loyalty.

Our recommended solution framework for predictive maintenance is designed to enable QSRs to bring to bear the best of what today’s cloud-native technologies have to offer. The framework can be adapted to any cloud, and we believe that this solution, upon implementation, will prepare QSRs to make optimal decisions in planning, repairing and maintaining the equipment used in their supply chain and operations.

References

1. Operations & Maintenance Best Practices: A Guide to Achieving Operational Efficiency, U.S. Department of Energy Federal Energy Management Program, August 2010.

Authors

Ravi Evani — Vice President, Engineering

Amit Sharma — Senior Architect
