AHA 2.0

Published in chick-fil-atech · Nov 1, 2022

by Keith Schaefer & Daniel Hager

You know those articles, the ones the R&D wing of a big company puts out purporting to be doing really cool stuff with technology? You read the article, get excited about the idea, then years later you think to yourself, “Hey, whatever happened to that?” When you go looking, there’s nothing.

I hope you read our last article on Edge AI in a Smarter Chick-fil-A (if you didn’t, we suggest starting there first) and are here to follow up and see what happened. When we wrote the first article in the series, it was probably a toss-up whether the project would ultimately succeed or not. However, I am happy to report that something very similar to what we initially pursued is indeed installed in every Chick-fil-A restaurant across the US, Canada and Puerto Rico!

Why We Built the Automated Holding Assistant

This solution, which we call AHA internally, is delivering on a two-part goal:

  1. Making it easier for Chick-fil-A Team Members in our restaurants to sell high quality chicken products by accurately tracking hold times.
  2. Enabling our restaurant teams with analytical insights on the cooking and holding processes so that they can use data to improve.

We did this by marrying our business process for hot holding to a new technology solution. Team Members scan in pans of chicken using a combination of barcodes etched into the pans and a 3D camera, which automatically manages food quality timers and captures work-in-progress chicken inventory data.

The legacy Intel NUC + 3D Camera solution

Some Challenges We Overcame

We learned a lot about the importance of precision in computer vision models through this exercise. Even though we had a great deal of control over things like camera-to-pan distance and bracket designs, there are still many environmental variables across 2,800+ Chick-fil-A restaurants. Some of our restaurant kitchens are so small that the hot holding towers (which our solution is attached to) are installed in tight corners, and some are so large that they are essentially two kitchens needing two separate instances of the system that don’t talk to each other.

We found that we had to iterate heavily on our initial proof of concept (from the last article), which included:

· Learning how to maintain a tablet that was rugged enough to survive a particularly harsh part of our kitchens and yet cost-efficient enough to scale to all our restaurants (with at least two stations per restaurant).

· Redesigning our sheet metal brackets twice to protect components better and prevent misconfigurations in the field like tablets blocking the camera’s view.

· Restructuring the image classifier model a few times to increase accuracy as more environmental factors in view of the camera threw off the original version.

· Stress testing our pans to see how long barcodes lasted before being essentially rubbed off by dishwashers and rough handling.

· Finding a more easily detected barcode format to increase the chances of the camera locating one despite glare, hands on the pans, and chicken overflowing the edges while scanning.

· Adding traditional computer vision techniques like de-noising and blurring to assist ZBar barcode scanning, which struggled to detect our grey-on-silver barcodes.

· Diagnosing and working around ZBar finding erroneous barcodes in the pattern of the floor tile.

· Purchasing and weighing a whole lot of chicken (and then eating some of it) multiple times as we tried to correlate depth readings to weight (a simplified sketch of that idea follows this list). And so much more!
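
To make that last point concrete, here is a minimal sketch of the kind of calibration involved: fit a simple linear model between an averaged pan-fill depth and the weight measured on a scale. The numbers below are purely illustrative, not our production calibration values.

```python
# Illustrative only: correlating depth-camera readings with measured chicken weight.
import numpy as np

# Each sample pairs an average pan fill depth in millimeters (empty-pan depth
# minus measured depth) with the weight of chicken placed on a scale (grams).
fill_depth_mm = np.array([0.0, 8.0, 15.0, 22.0, 30.0, 38.0])
weight_g = np.array([0.0, 410.0, 790.0, 1150.0, 1580.0, 2010.0])

# Fit a simple linear model: weight ~ slope * depth + intercept.
slope, intercept = np.polyfit(fill_depth_mm, weight_g, deg=1)

def estimate_weight(depth_mm: float) -> float:
    """Estimate work-in-progress weight from an averaged depth reading."""
    return max(0.0, slope * depth_mm + intercept)

print(f"~{estimate_weight(25.0):.0f} g of chicken at 25 mm of fill")
```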

The Beauty and Challenge of Digital Meeting Physical

As a personal sidenote, this combination of physical and software design was such a blast for my engineering mind. There’s great satisfaction in seeing your code affect things in the physical world. Beyond that, building a solution where you control both the physical and software aspects creates endless possibilities.

For example, if our pan-scanning process is struggling in a new restaurant layout, we have options. We could change the physical sheet metal design, add new configurations in the software, write new code, modify our installation or calibration techniques, or leverage some combination of the above. The experience of evaluating various solutions and choosing the one that will be most reliable across thousands of restaurants is thrilling!

After our initial prototype was deployed in some restaurants to prove value, we moved to performing a Failure Mode and Effects Analysis (FMEA) on the solution. This is a great tool for software projects, especially when there are physical components in play. In short, we asked ourselves what architectural changes we would need to make this solution reliable enough to support 24-hour operations in every Chick-fil-A restaurant.

The tablet solution and holding pans in the real restaurant world

Minimizing Complexity

This exercise helped us conclude that having a NUC to run the RealSense camera, a Chrome tablet to show the UI, and our Restaurant Compute platform to broker messaging communications introduced simply too many failure points for the solution. Any failure mode that forces the Restaurant Team Member to change their behavior (e.g. tapping the screen to start timers instead of scanning the pans) is a big no-no.

What We Changed

In an effort to simplify and minimize dependencies, we first worked with a few ChromeOS experts to see if we could get the Intel RealSense camera working in Chrome, removing the need for additional hardware (the Intel NUC). We were ultimately unable to solve this challenge.

We also re-evaluated our use of Chrome tablets as finding something rugged enough to handle our kitchen environment was challenging. We eventually landed on a fairly robust tablet that supports Linux and that gave us a lot of flexibility in how we implemented our interactions with peripheral devices (like the camera).

Our research revealed that the Intel Atom processor in our Linux tablet was able to run our TensorFlow model quickly enough to process the camera frames at a pretty good rate, ensuring a responsive system.
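
To give a sense of what that frame processing looks like, here is a minimal sketch assuming the pyrealsense2 bindings for the camera; the classifier call is a stand-in for the real model, and the stream settings are illustrative.

```python
# Sketch of a RealSense frame loop with per-frame timing; classify_frame is a
# placeholder for the actual image classifier (e.g., a TFLite invoke).
import time
import numpy as np
import pyrealsense2 as rs

def classify_frame(bgr_image: np.ndarray) -> str:
    """Stand-in for the real classifier."""
    return "pan_present"  # hypothetical label

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)

try:
    while True:
        frames = pipeline.wait_for_frames()
        color = frames.get_color_frame()
        if not color:
            continue
        image = np.asanyarray(color.get_data())

        start = time.perf_counter()
        label = classify_frame(image)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{label} ({elapsed_ms:.1f} ms/frame)")
finally:
    pipeline.stop()
```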

At this point, we had a self-contained ecosystem with as few physical and architectural points of failure as possible. As long as the tablet has power and the USB cable and camera don’t fail (both pretty rare occurrences), the restaurant has a working system. It still uses the edge MQTT broker to help with syncing data between multiple stations (tablet/camera/bracket combos) and with exfiltrating data to the cloud, but that is no longer a failure point that would disrupt the team’s workflow.

What We Kept

We were able to keep our Python backend code that drives the camera behavior, the backend messaging communications, and the Angular application we developed as the user interface.

Some Other Changes

We also made the move from TensorFlow to TFLite. Running TensorFlow on the Intel Atom tablets required compiling it from source. We had not automated this process, which meant it was easy for our TensorFlow dependency to get out of date and painful to recompile when applying security patches. The move to TFLite improved our model performance and removed the need for the custom compilation work.
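
For readers unfamiliar with that workflow, a minimal sketch of the TensorFlow-to-TFLite path looks roughly like the following; the model path and input are placeholders, not our actual model.

```python
# One-time conversion from a SavedModel to a TFLite flatbuffer, then inference
# with the TFLite interpreter. Paths and shapes are placeholders.
import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")  # hypothetical path
tflite_model = converter.convert()
with open("classifier.tflite", "wb") as f:
    f.write(tflite_model)

# On the tablet, only the lightweight runtime is needed; tf.lite.Interpreter can
# be swapped for tflite_runtime.interpreter.Interpreter there.
interpreter = tf.lite.Interpreter(model_path="classifier.tflite")
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

frame = np.zeros(input_detail["shape"], dtype=input_detail["dtype"])  # stand-in frame
interpreter.set_tensor(input_detail["index"], frame)
interpreter.invoke()
print("class scores:", interpreter.get_tensor(output_detail["index"]))
```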

With the visibility we had into AHA performance across the chain, we also identified a small number of restaurants where system reliability was sub-optimal. We traced the trouble to barcode scanning (barcodes are etched into the stainless-steel pans for pan identification). After experimenting with some different techniques, we learned that adding a simple Gaussian blur gave us a significant improvement in scanning for these struggling restaurants, and also a small improvement across the remainder of the chain.
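
For illustration, the preprocessing idea looks roughly like this, assuming OpenCV for image processing and pyzbar as the Python binding for ZBar; the kernel sizes and parameters here are illustrative rather than our tuned values.

```python
# Sketch of de-noising + Gaussian blur ahead of ZBar decoding.
import cv2
from pyzbar import pyzbar

def scan_pan_barcode(bgr_image):
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    # Smooth out high-frequency glare and speckle so the low-contrast
    # grey-on-silver etching decodes more reliably.
    denoised = cv2.fastNlMeansDenoising(gray, None, 10)
    blurred = cv2.GaussianBlur(denoised, (5, 5), 0)
    return [r.data.decode("utf-8") for r in pyzbar.decode(blurred)]

frame = cv2.imread("pan_frame.png")  # hypothetical captured frame
print(scan_pan_barcode(frame))
```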

The Power of Analytics

One of the key benefits of implementing this system is capturing data for analytics so that our Operators can improve their restaurant operations.

All the data we capture about pans, scans, and timers flows through an internal, in-restaurant MQTT message bus and eventually up to the cloud.
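
As a rough sketch of what publishing a scan event onto that bus might look like (using the paho-mqtt client; the topic name and payload fields are illustrative, not our actual schema):

```python
# Publish a pan-scan event to the in-restaurant edge MQTT broker (illustrative schema).
import json
import time
import paho.mqtt.client as mqtt

client = mqtt.Client(client_id="aha-station-1")  # paho-mqtt 1.x style constructor
client.connect("localhost", 1883)  # the in-restaurant edge broker
client.loop_start()

event = {
    "pan_id": "PAN-0042",          # decoded from the etched barcode (hypothetical)
    "product": "grilled_fillet",   # hypothetical product code
    "scanned_at": int(time.time()),
    "hold_timer_seconds": 20 * 60, # illustrative hold time
}
client.publish("restaurant/aha/pan-scans", json.dumps(event), qos=1)

client.loop_stop()
client.disconnect()
```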

From there, we become a downstream subscriber of the data we published. We load the data into both a PostgreSQL database and our enterprise Data Lake. The PostgreSQL database backs a custom reporting site for Operators, with reports that let them see their operational status in near real time and compare themselves to other restaurants in the chain.

We use the Data Lake datasets to back reports for a chain-wide view across this business process, which is used by Chick-fil-A, Inc. Staff to assess the overall operational health of all restaurants across the chain. We also use the Data Lake data to track the operational health of the AHA system through an OEE score we developed. This is how we know if changes to the system are making it better, and how we monitor if performance is degrading over time.
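
For reference, OEE is conventionally calculated as the product of availability, performance, and quality ratios. The sketch below uses that textbook formula with purely illustrative numbers; it is not the exact weighting behind our AHA score.

```python
# Textbook OEE: availability x performance x quality, each a ratio in [0, 1].
def oee(availability: float, performance: float, quality: float) -> float:
    return availability * performance * quality

# Illustrative numbers: station online 97% of the day, 92% of pans scanned
# successfully on the first try, 99% of scans decoded to a valid pan id.
print(f"OEE = {oee(0.97, 0.92, 0.99):.2%}")
```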

What’s next?

We aren’t done yet! AHA continues to be a living product and code base, and we have more work to do.

Model Refinement

The next step in the ongoing deep dive is to make the system even more bulletproof by evaluating whether an object detector can replace the remainder of our early-stage “pixel counting” code, which currently identifies where in the image the pan is. In our testing so far, it looks promising that an object detector based on EfficientDet-Lite0 will be able to run fast and accurately enough to identify both when a pan is in the bracket and where the pan is in the image.
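
As a sketch of what that could look like, here is a minimal TFLite object-detection call assuming an EfficientDet-Lite0 model exported to TFLite; the model file is hypothetical, and output tensor ordering varies by export, so it should be checked against get_output_details().

```python
# Sketch: locate the pan in a frame with an EfficientDet-Lite0 TFLite detector.
import cv2
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="pan_detector_efficientdet_lite0.tflite")
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
in_h, in_w = map(int, input_detail["shape"][1:3])

def detect_pan(bgr_frame: np.ndarray, score_threshold: float = 0.5):
    """Return (normalized box, score) for the strongest detection, or None."""
    resized = cv2.resize(bgr_frame, (in_w, in_h))
    batch = np.expand_dims(resized.astype(input_detail["dtype"]), axis=0)
    interpreter.set_tensor(input_detail["index"], batch)
    interpreter.invoke()

    outputs = interpreter.get_output_details()
    # Typical EfficientDet-Lite exports emit boxes, classes, scores, count;
    # verify the order for your model before relying on these indices.
    boxes = interpreter.get_tensor(outputs[0]["index"])[0]   # [N, 4] ymin, xmin, ymax, xmax
    scores = interpreter.get_tensor(outputs[2]["index"])[0]  # [N]
    best = int(np.argmax(scores))
    if scores[best] < score_threshold:
        return None
    return boxes[best], float(scores[best])
```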

User Interfaces

We are also working on rebuilding the main set of user interfaces in the kitchen to make all of this operational data as visible and useful to Team Members as possible. We already experience a bit of screen proliferation, and as the number of technology solutions in the restaurant grows, that problem is only getting worse. Rather than adding more screens, our goal is to simplify the existing kitchen production system interfaces and tailor them to the Team Member’s needs at each workstation, while also combining and displaying data in smarter ways.

Conclusion

Not all R&D projects end up as successful products. If you’re lucky, you get a handful of times in your career when a greenfield project finds its way through the gauntlet of trials to the point where it adds real value to people’s lives. In this case, the timer solution we have deployed throughout the Chick-fil-A organization is doing just that.

We set out with a goal to make “the right thing the easy thing,” and we have concluded that AHA delivers on that goal, resulting in a better experience for customers (by way of speed of service and consistent quality) and reduced complexity in Team Members’ jobs. We have also enabled Restaurant Operators to better identify areas for improvement in their operations and further drive important metrics like food quality and waste reduction.

I’m thrilled and humbled that our team’s work directly impacts the lives of thousands of Team Members doing incredibly hard jobs in our Chick-fil-A kitchens across the country, hopefully making things just a bit easier for them in the short term while gathering data to help make their jobs significantly easier in the long term.
