Edge AI in a Smarter Chick-fil-A

by Keith Schaefer & Alain Nicolet

The Black Box problem

At Chick-fil-A, we are working to build a smarter restaurant. We want to connect, analyze and automate our restaurants as much as possible to help our business scale, increase our restaurant capacity, and make the lives of our restaurant Operators and their employees easier.

When you bite into a delicious Chick-fil-A sandwich, know that it has been on a journey. It starts with breading (in-restaurant) and then frying or grilling. From there, it enters a process called “hot holding”, which is the transition between cooking and food prep. Every product has a maximum time that it can live in hot holding before it must be discarded. Since all hot products follow this process, it is very important that we manage timers that track each batch of product throughout its lifecycle, from cooked to hot holding to served. This ensures the high quality and food safety you expect from Chick-fil-A.
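The hot-holding rule above reduces to a small data structure: each batch records when it entered hot holding and its product's maximum hold time, and it must be discarded once that limit passes. The sketch below is a hypothetical illustration, not Chick-fil-A's implementation. The class and function names are ours, and the post does not give real per-product hold limits, so `max_hold` is supplied by the caller.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class HeldBatch:
    """One batch of cooked product sitting in hot holding (hypothetical model)."""
    product: str
    entered_hold: datetime
    max_hold: timedelta  # product-specific limit; real values are not given in the post

    @property
    def expires_at(self) -> datetime:
        return self.entered_hold + self.max_hold

    def remaining(self, now: datetime) -> timedelta:
        # Clamp at zero so an expired batch reports no time left rather than negative time.
        return max(self.expires_at - now, timedelta(0))

    def must_discard(self, now: datetime) -> bool:
        return now >= self.expires_at


def batches_to_discard(batches: list[HeldBatch], now: datetime) -> list[HeldBatch]:
    """Return the batches whose hold time has run out, earliest expiry first."""
    expired = [b for b in batches if b.must_discard(now)]
    return sorted(expired, key=lambda b: b.expires_at)
```

For example, `HeldBatch("spicy", start, timedelta(minutes=20))` (20 minutes is a placeholder) reports `0:15:00` remaining five minutes after `start`.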

In the back of every Chick-fil-A restaurant is a timer panel, often affectionately referred to as the “black box” or “scoreboard timer”. Its job is simple: to keep track of the quality timers for all the different products we sell. The timer that exists in most restaurants today was custom developed and manufactured just for Chick-fil-A. While it has served us well for many years, it is (figuratively and literally) a black box. To operate and scale more successfully, we need data.

The legacy timer that exists in most restaurants

The average Chick-fil-A restaurant does more than twice the sales of any other quick service restaurant in the US. This makes for a very hectic kitchen environment and a lot of restaurant Team Members looking for as much optimization as possible in their workflows. A resulting principle for our smart restaurant work is to always…

“make the right thing the easy thing to do”.

The black box timer solution does not support this goal — it requires multiple taps by restaurant team members to start timers and an excruciating three-second press-and-hold to cancel an existing timer. To quote Kimberly “Sweet Brown” Wilkins: “Ain’t nobody got time for that!”

In addition, it provides exactly zero systematic access to data.

Moving in the right direction

The team’s first step was to build a simple touchscreen timer application: a native ChromeOS app that runs on a touchscreen tablet in kiosk mode.

This solves the basic usability issues with the old timer. It also solves the black box problem, since we’re building all of this on our custom Restaurant Internet of Things (IoT) platform.

By using our IoT platform, all of the data from our new timer is now automatically available to other devices in the restaurant via our pub/sub messaging framework, and it automatically flows into our data lake for querying, reporting, and other downstream analytics workloads. We also get device on-boarding and identity management “for free”.

Timer application running in ChromeOS kiosk mode

Our timer application enables us to display data to restaurant team members and even use voice prompts to tell them when it’s time to discard what remains. By publishing MQTT events (MQTT is a lightweight publish/subscribe messaging protocol) when the screen is tapped, we are also able to synchronize timers across different stations in the restaurant as needed.
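To make the synchronization idea concrete, here is a minimal sketch of a tap turning into a published event that other stations apply to their local view of running timers. The topic scheme, payload fields, and class names are hypothetical, since the post does not describe the real message format. With an MQTT client such as paho-mqtt, the `(topic, payload)` pair would be handed to `client.publish(topic, payload)`, and a subscribing station would call `apply()` from its message callback. The broker side is left out so the sketch runs on its own.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

# Hypothetical topic scheme; the post does not describe the real topic names.
TOPIC_TEMPLATE = "restaurant/timers/{product}"


def timer_event(product: str, action: str, ts: float, station: str) -> tuple[str, str]:
    """Build the (topic, JSON payload) pair published when a timer is tapped."""
    if action not in ("start", "cancel"):
        raise ValueError(f"unknown action: {action!r}")
    payload = json.dumps({"product": product, "action": action, "ts": ts, "station": station})
    return TOPIC_TEMPLATE.format(product=product), payload


@dataclass
class StationTimers:
    """A station's local copy of running timers, updated from subscribed events."""
    running: dict[str, float] = field(default_factory=dict)  # product -> start time

    def apply(self, payload: str) -> None:
        event = json.loads(payload)
        if event["action"] == "start":
            self.running[event["product"]] = event["ts"]
        else:
            # Cancelling a timer that this station never saw is harmless.
            self.running.pop(event["product"], None)
```

Because every station applies the same event stream, a tap at one station starts or cancels the matching timer everywhere that subscribes to the topic.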

Scanning Pans with 3D Cameras

Remembering to tap a screen is infinitely more difficult than not having to do it at all. Managing timers is a very important task, and a forgotten tap results in inaccurate timers and bad data, so we needed a simpler process for our busy restaurants. We therefore developed a solution that starts and stops all of our timers automatically.

First, we installed a new bracket next to the chicken holding station. Chicken is handed off in stainless steel pans, so this bracket becomes a natural hand-off point in the process of cooking chicken and bringing it up to the line for assembly. This is important because it fits naturally into our restaurant team members’ workflow.

Bracket with / without pans

Each transfer pan has been laser-burned with a barcode across its top edges.

Above the pan / bracket area we have mounted a 3D camera. Any time a pan is placed in the bracket, we are able to detect the pan and read the barcode to find out which pan has been “scanned in”. We can then start the correct timer for the pan (for example, spicy chicken).
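Once a barcode has been read, starting “the correct timer” is a lookup from the pan’s barcode to the product it carries. The sketch below assumes a hypothetical registry of pan IDs and a hypothetical payload format. It takes raw barcode payloads as bytes, which is what pyzbar returns: `[d.data for d in pyzbar.pyzbar.decode(image)]`. Because the code runs along the pan’s edges, a single frame may yield the same code more than once, so the first registered match wins.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical registry: barcode payload -> product carried by that pan.
PAN_REGISTRY = {
    "PAN-0001": "filet",
    "PAN-0002": "spicy",
}


def identify_pan(payloads: list[bytes], registry: dict[str, str]) -> tuple[str, str] | None:
    """Return (pan_id, product) for the first registered barcode read in a frame."""
    for raw in payloads:
        pan_id = raw.decode("ascii", errors="ignore").strip()
        if pan_id in registry:
            return pan_id, registry[pan_id]
    return None  # nothing readable, or a pan we don't know about


def start_timer_event(payloads: list[bytes], registry: dict[str, str], now: datetime) -> dict | None:
    """Build a (hypothetical) timer-start record for the scanned pan, if any."""
    match = identify_pan(payloads, registry)
    if match is None:
        return None
    pan_id, product = match
    return {"pan": pan_id, "product": product, "started": now.isoformat()}
```

Unknown or unreadable codes return `None`, leaving the decision to retry or alert to the caller.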

Since we have a 3D camera, we are also able to count how much chicken is in each pan, giving us a very rich data set for making our chicken production more dynamic!

The Setup

How did we do it?

First, we selected the Intel RealSense D415 camera, which has a price point of about $150. It comes equipped with some fantastic features, like Intel’s Vision Processing Unit (VPU). The VPU is a specialized chip that does the heavy lifting of matching the stereo images to calculate the depth of every pixel. Because most of that work happens on board the camera, we need much less external compute next to the station. This is important since the restaurant is a wonderful environment, but very hostile towards computers.

3D Camera attached to NUC compute device (for prototyping)

The barcode scanning is fairly trivial: we use pyZBar to scan each image for barcodes. It’s also humorous that the Natural History Museum maintains the Python wrapper for ZBar, but it works! However, knowing when to scan (when a pan is actually in the bracket, not just in view of the camera) is a little trickier.

We proved out the idea of detecting the pan with some simple ‘pixel counting’ code. We literally hand-tagged the pixels where we expect the top lip of the pan to be. We then obtain the depth value for each pixel and see how many of them are at the right distance from the camera. Since we have our camera and pan mounted on heavy duty stainless steel with almost no wiggle room, this is a valid way of solving the problem, and it works well in our lab.
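The pixel-counting check described above fits in a few lines. In this minimal sketch, the depth frame is assumed to have already been converted to millimetres, with 0 meaning “no reading”. The function name, the 15 mm tolerance, and the 80% threshold are hypothetical tuning values, not figures from the post. The fixed expected distance is also what makes the method fragile: if the camera moves by more than the tolerance, every tagged pixel misses.

```python
import numpy as np


def pan_in_bracket(depth_mm, lip_pixels, expected_mm, tolerance_mm=15.0, min_fraction=0.8):
    """Pixel-counting check for a pan resting in the bracket.

    depth_mm:    (H, W) depth frame in millimetres; 0 means "no reading".
    lip_pixels:  (N, 2) integer array of hand-tagged (row, col) lip positions.
    expected_mm: camera-to-lip distance when a pan is seated.
    Returns True when at least `min_fraction` of the tagged pixels lie within
    `tolerance_mm` of the expected distance.
    """
    rows, cols = lip_pixels[:, 0], lip_pixels[:, 1]
    depths = depth_mm[rows, cols].astype(np.float64)
    at_lip = (depths > 0) & (np.abs(depths - expected_mm) <= tolerance_mm)
    return bool(np.count_nonzero(at_lip) >= min_fraction * len(lip_pixels))
```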

Sample of the full solution in our lab

But what if a camera gets bumped and turns slightly? Or an installer measures wrong and mounts the camera bracket 2 inches too high? This fundamentally breaks the ‘pixel counting’ method. We want a much more robust solution before rolling out to thousands of different restaurants with different kitchen configurations all across the country.

Machine Learning Meets Chicken

We decided the more robust solution would be to develop a machine learning model to detect when a pan was in the bracket.

Training Data

To gather training data, we set up our 3D camera and started capturing some datasets.

First we captured ‘no pan’ images and tagged them accordingly. To mimic the real restaurant environment, we simulated sandwich assembly nearby, but without a pan in the bracket.

Then we put a pan in the bracket under the camera and did the same thing to capture several thousand images tagged ‘pan’. An image, in this case, is actually a 1280 x 720 array of depth values.

Neural Network Implementation

We used Keras as a wrapper around TensorFlow to create the Convolutional Neural Network (CNN) that we wanted to train. We landed on a very simple CNN architecture with just two convolutional layers, one dense layer, and one dropout layer.
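To see why a network this small is cheap to run, it helps to walk a 96 x 96 single-channel input through it and count the weights. The post gives only the layer types. The filter counts, the 2 x 2 max pooling after each convolution, the 128-unit dense layer, and the one-unit sigmoid output below are our assumptions, and the layer names are made up for this sketch. The arithmetic follows Keras defaults: “valid” padding, stride 1, and one bias per filter or unit.

```python
def walk(layers, shape):
    """Propagate an input shape through a layer list; return (output_shape, n_params).

    Conventions match Keras defaults: 'valid' padding and stride 1 for convolutions,
    non-overlapping 2x2 pooling that rounds down, and one bias per unit/filter.
    """
    total = 0
    for kind, arg in layers:
        if kind == "conv3x3":
            h, w, c = shape
            total += (3 * 3 * c + 1) * arg          # kernel weights + biases
            shape = (h - 2, w - 2, arg)
        elif kind == "pool2x2":
            h, w, c = shape
            shape = (h // 2, w // 2, c)             # pooling has no weights
        elif kind == "flatten":
            h, w, c = shape
            shape = (h * w * c,)
        elif kind == "dense":
            total += (shape[0] + 1) * arg           # weights + biases
            shape = (arg,)
        elif kind == "dropout":
            pass  # dropout only zeroes activations during training; no weights
        else:
            raise ValueError(f"unknown layer kind: {kind}")
    return shape, total


# Hypothetical configuration: widths, pooling, and the output layer are guesses.
ASSUMED_LAYERS = [
    ("conv3x3", 32), ("pool2x2", None),
    ("conv3x3", 64), ("pool2x2", None),
    ("flatten", None), ("dense", 128), ("dropout", 0.5),
    ("dense", 1),  # single sigmoid output: pan / no pan
]

shape, params = walk(ASSUMED_LAYERS, (96, 96, 1))
print(shape, params)  # (1,) 3984001
```

Under these assumptions, nearly all of the roughly 4 million parameters sit in the dense layer after flattening. Without the two pooling layers, the same network would grow to about 69 million parameters, which is why pooling or a small input size matters for model size.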

Most CNN training tools expect image files such as JPEGs or PNGs. The RealSense camera provides us with images, but we determined that the depth data was actually a cleaner source of data for what we need. To feed a matrix of depth values to our CNN, we do some preprocessing magic to convert the depth values to a grayscale PNG that we can use for training. This works very naturally with the Keras tools for CNN training.
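The post doesn’t spell out the preprocessing, but one plausible version clips each depth frame to a working range, maps it linearly onto 0 to 255 (closer surfaces brighter), and downsamples it to the 96 x 96 training size. The near and far distances below are hypothetical, as are the function names. The nearest-neighbour resize is a simple stand-in for whatever resizing the Keras tools applied.

```python
import numpy as np


def depth_to_grayscale(depth_mm, near_mm=300.0, far_mm=1200.0):
    """Map a depth frame in millimetres to an 8-bit grayscale image.

    Closer surfaces become brighter: near_mm -> 255, far_mm -> 0.
    Missing readings (0) are treated as background, and everything is clipped
    to [near_mm, far_mm] so outliers don't stretch the contrast.
    """
    depth = depth_mm.astype(np.float32)            # work on a float copy
    depth[depth == 0] = far_mm
    depth = np.clip(depth, near_mm, far_mm)
    scaled = (far_mm - depth) / (far_mm - near_mm)
    return np.round(scaled * 255).astype(np.uint8)


def downscale_nearest(img, size=96):
    """Nearest-neighbour resize to size x size (does not preserve aspect ratio)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]
```

With Pillow installed, `Image.fromarray(gray).save("pan.png")` would write the resulting `uint8` array as an 8-bit grayscale PNG.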

Grayscale image of pan in bracket

For training, we scaled the images down to 96 x 96 pixels and ran two epochs of 600 batches each (32 images per batch), with 30 validation batches. This trained very quickly (even on a Mac with no GPU support) and was incredibly accurate.

The trained model takes up only 27 MB of disk space, evaluates in 0.005 seconds (5 ms), and uses only about 15% of the available processor on a low-powered Intel NUC with no GPU. This is far more robust than our ‘pixel counting’ method, and it was actually quicker to develop. In hindsight, we would have started here. The model is not perfect, so we plan to keep capturing training data that includes common edge cases and retrain the model to make it even more robust.
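Because an imperfect model can flicker between “pan” and “no pan” on consecutive frames, a system like this needs some way to avoid starting the same timer twice. One common approach is to require several agreeing frames before acting. The post does not say whether the team does this. The class below is a hypothetical sketch, and the frame counts are placeholder values.

```python
class PanPresenceDebouncer:
    """Turn noisy per-frame 'pan present?' decisions into one 'pan placed' event."""

    def __init__(self, on_frames: int = 5, off_frames: int = 5):
        self.on_frames = on_frames      # consecutive 'present' frames needed to trigger
        self.off_frames = off_frames    # consecutive 'absent' frames needed to re-arm
        self._present_run = 0
        self._absent_run = 0
        self.in_bracket = False

    def update(self, present: bool) -> bool:
        """Feed one frame's decision; return True exactly once when a pan settles in."""
        if present:
            self._present_run += 1
            self._absent_run = 0
        else:
            self._absent_run += 1
            self._present_run = 0
        if not self.in_bracket and self._present_run >= self.on_frames:
            self.in_bracket = True
            return True  # time to read the barcode and start the timer
        if self.in_bracket and self._absent_run >= self.off_frames:
            self.in_bracket = False  # pan removed; ready for the next one
        return False
```

With `on_frames=3`, a single stray “pan” prediction is ignored, while a pan that stays in view for three frames triggers exactly one scan until it is removed.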

Depth to Volume

Since we already have a set of high-resolution depth values for the pan, we can also convert that depth map into the volume of chicken in the pan. Once we have that volume, we can convert it to an estimated weight for each type of chicken.
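A minimal sketch of the depth-to-volume step, under several stated assumptions: a reference depth frame of the empty pan is captured once, a boolean mask marks the pan interior, and the depth stream’s focal lengths `fx` and `fy` (in pixels, from the camera intrinsics) are known. Under a pinhole camera model, a pixel seen at depth Z covers about (Z/fx) x (Z/fy) mm², so summing height times footprint gives volume. The function names, the 3 mm noise floor, and the per-product bulk density are hypothetical; the post does not describe its actual conversion.

```python
import numpy as np


def chicken_volume_ml(depth_mm, empty_pan_mm, mask, fx, fy, min_height_mm=3.0):
    """Estimate the volume (mL) above the empty-pan surface inside `mask`.

    depth_mm, empty_pan_mm: (H, W) depth frames in mm (0 = no reading).
    mask: boolean (H, W) array selecting the pan interior.
    fx, fy: depth-stream focal lengths in pixels.
    """
    depth = depth_mm.astype(np.float64)
    base = empty_pan_mm.astype(np.float64)
    valid = mask & (depth > 0) & (base > 0)
    height = np.where(valid, base - depth, 0.0)    # mm above the empty-pan surface
    height[height < min_height_mm] = 0.0           # drop sensor noise and negative heights
    footprint = (depth / fx) * (depth / fy)        # mm^2 covered by each pixel (pinhole model)
    return float(np.sum(height * footprint)) / 1000.0  # mm^3 -> mL


def estimated_weight_g(volume_ml, bulk_density_g_per_ml):
    """Convert volume to weight with a per-product bulk density (a calibration value)."""
    return volume_ml * bulk_density_g_per_ml
```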

Now we’re cooking with peanut oil! We know exactly how much chicken was cooked and when, as well as how much is wasted when a pan expires. This is now an incredibly rich source of data that we can use, together with historical forecasts and other real-time signals, to build a system that tells restaurant Team Members exactly how much chicken they need to cook and when. This helps us deliver the high-quality, fresh chicken sandwich that you are used to at your local Chick-fil-A!

What’s Next?

We are still in the early days with this solution, but it has proven valuable in our early pilot restaurants. We plan to roll it out to one of our markets (30 or so restaurants) before the end of the year, and then roll out to more restaurants next year.