Why We Put a Supercomputer in a Grocery Store

Intelligent Retail Lab
6 min read · Aug 23, 2019

By Jeff Moody

At the Intelligent Retail Lab (IRL), our mission is to explore how Artificial Intelligence (AI) can be used to improve the in-store experience for our customers and associates. While applying AI to e-commerce is now table stakes, there haven’t been many explorations of how AI can be applied inside a real-world retail environment like a grocery store. That’s why we built IRL, our in-store AI lab. To learn what challenges a physical retail space faces, and to start building the technology that helps solve those challenges and make the shopping experience better, we started with sensors.

Initially, we installed 40 cameras inside a single aisle. Over time, we tested additional sensors: load cells, depth sensors, and shelf pushers, to name a few. Realizing that the extra capacity could help us prove out a variety of use cases, we set our sights on building an AI factory. Future-proofing the lab with added sensors from the outset meant we could easily productionize tests and leverage real-time data from the store. Our first area of focus, understanding when products are no longer in stock on the shelf, will provide a myriad of benefits for our customers, associates and business.

Among all of the sensors we’d been exploring to accomplish this task, we found one which gave us the best price-per-sensor to usefulness-of-data ratio: the cameras. As a result, we started working on getting cameras to face all of the shelves in the store. With more cameras came more challenges in being able to process and manage all of that data, so we began exploring how best to process the product imagery from our cameras.

Given that each camera pushes out between 6 and 9 megabytes of data per second (roughly 50 to 75 megabits per second) and we have cameras throughout the entire store, we functionally couldn’t push all of that data into the Cloud to be processed. Even if we could, the latency (the time it takes data to get from point A to point B) would add several minutes of delay between the camera “seeing” something in-store, the AI interpreting that information, and a message reaching an in-store associate to let them know that your favorite drink or breakfast cereal is out of stock. We explored a few options to solve this problem, and given that this really hasn’t been done before, we opted to put a large cluster of general-purpose computers in-store. Enter: the data center.
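To make the bandwidth arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The per-camera rates come from the paragraph above; the camera count is a purely hypothetical number chosen for illustration, since the store-wide total isn't stated here.

```python
# Back-of-the-envelope bandwidth estimate for in-store cameras.
# CAMERA_COUNT is a hypothetical value for illustration only.

BITS_PER_BYTE = 8
SECONDS_PER_DAY = 24 * 60 * 60


def megabytes_to_megabits(mb_per_s: float) -> float:
    """Convert a rate in megabytes/second to megabits/second."""
    return mb_per_s * BITS_PER_BYTE


def aggregate_gbps(cameras: int, mb_per_s: float) -> float:
    """Total uplink needed, in gigabits/second, for `cameras` streams."""
    return cameras * megabytes_to_megabits(mb_per_s) / 1000


def terabytes_per_day(mb_per_s: float) -> float:
    """Data produced by a single camera in one day, in terabytes."""
    return mb_per_s * SECONDS_PER_DAY / 1_000_000


CAMERA_COUNT = 1000  # assumption, not a figure from the article

if __name__ == "__main__":
    for rate in (6, 9):
        print(
            f"{rate} MB/s per camera = {megabytes_to_megabits(rate)} Mb/s; "
            f"{CAMERA_COUNT} cameras need {aggregate_gbps(CAMERA_COUNT, rate)} Gb/s; "
            f"one camera produces {terabytes_per_day(rate):.2f} TB/day"
        )
```

Even at the low end, a single camera produces roughly half a terabyte per day, which is why shipping every frame to the Cloud wasn’t practical.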

Clustering 100 servers so that they can share processing power, memory and storage, in order to analyze all of the data generated inside IRL as close to real time as possible, was no small task for the IRL DevOps team. As a group of engineers with experience in operating, managing, and maintaining computer systems, combined with a healthy dash of software development, we set out to scale from 5 servers to 100 as seamlessly as possible. With our goals set on having a fully functioning 100-node “supercomputer” in the store, we looked to the technologies our team knew best to make this possible. By using Chef Software’s Chef Infra application for IT systems automation, we have shortened the time it takes to get a server running and into our cluster from days of manual commands to minutes of running Chef Infra code.

With the ability to scale our AI platform from one server to dozens quickly, we decided to focus our efforts. As the old saying goes, “The only way to eat an elephant is one bite at a time.” Instead of trying to understand the state of all products on all shelves in the store at once, we started by focusing on a few different parts of the store and then expanded from there. To do so, we had to build a system different from anything anyone inside the IRL Engineering group had built before. We ran into obstacles with the most obvious software solutions: each was too memory-intensive, too computationally intensive, or too network-intensive to function correctly beyond just a few shelves of products. We needed a platform that could handle all of our data and help the computer separate the “interesting” data from the “uninteresting” (specifically, keeping photos and videos from before and after the state of products on a shelf changed). After a lot of hard work and investigation into the ecosystem of software platforms that could do this, we ultimately settled on the open source GStreamer platform for data processing.
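The idea of separating “interesting” from “uninteresting” footage can be illustrated with a simple frame-differencing filter. This is only a sketch of the general idea in plain Python, not IRL’s actual GStreamer pipeline; the function names, the flattened grayscale frame representation, and the threshold value are all assumptions made for illustration.

```python
from typing import List, Sequence

# A frame is modeled as a flat sequence of grayscale pixel values (0-255).
# This representation is an assumption for the sketch, not the real format.
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def changed_frame_indices(frames: Sequence[Frame], threshold: float = 10.0) -> List[int]:
    """Return each index i where frame i differs noticeably from frame i-1.

    Frames around these indices are the 'interesting' ones: the shelf
    looked different before and after. Everything else can be dropped.
    """
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold
    ]


if __name__ == "__main__":
    full_shelf = [100, 100, 100, 100]
    half_empty = [100, 100, 0, 0]  # two "pixels" changed when a product left
    stream = [full_shelf, full_shelf, half_empty, half_empty]
    print(changed_frame_indices(stream))
```

In this toy stream only the transition from the full shelf to the half-empty shelf is flagged, so only the frames on either side of that change would need to be kept and passed on for analysis.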

GStreamer, and the storage and recovery of this data, presented their own suite of obstacles to overcome. While we’ve been using the Kubernetes platform to run our AI workloads and the ancillary services that get data into and out of the AI models, dealing with terabytes of data per day on physical servers in a physical grocery store is not a simple or straightforward task. We on the DevOps team explored many, many options to keep all of our work running on a single platform, but ultimately discovered that, short of a Herculean amount of effort, we were using the wrong tool for the job.

Once we had accepted that the type of work done by our media processing engine wasn’t a good fit for Kubernetes, we looked for similar technologies that would let us combine Continuous Deployment methodologies with application configuration as code and immutable software artifacts for easy roll-back if necessary. We came upon Chef Habitat. Chef Habitat has given us essentially all the benefits of the Docker + Kubernetes workflow that we apply to our other systems, while removing the constraints built into Docker so that the media processing engine can work more naturally. The constraints built into Docker (and extended by Kubernetes) provide a very useful system for solving a lot of modern technology problems around IT systems automation, software delivery, software configuration, and software service discovery, but they are not a silver bullet for all IT problems. Chef Habitat, similarly, cannot solve all problems, but it provides solutions to a lot of the same problems in different ways.

While our work in the Intelligent Retail Lab on understanding what’s on the shelf is still in progress, we’ve been busy laying the foundations to make expanding that focus easier in the future. We now have a data center capable of processing 1.6 TB of raw images per second (the equivalent of about 3 years’ worth of music each second), solid software platforms in Chef Habitat and Kubernetes, scalable systems management with Chef Infra under the Effortless Config design patterns, and cameras located all over the store. With these pieces, we’ve started building and training the AI models to make sure that certain sections of our Lab are in stock, and we’re working to add more and more sections as time progresses. In addition, these platforms allow us to take focused pieces of IRL technology and start testing their effectiveness in other stores.
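As a sanity check on the music comparison, the following short sketch converts 1.6 TB into playback time. The 128 kbps bitrate is an assumption (a common MP3 encoding rate), as is the use of decimal terabytes; the article doesn’t say which figures were used.

```python
# How much music fits in 1.6 TB?
# Assumptions: 128 kbps audio, decimal units (1 TB = 10**12 bytes).

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_of_audio(total_bytes: float, bitrate_kbps: float = 128) -> float:
    """Continuous playback time, in years, for audio at the given bitrate."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return total_bytes / bytes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    one_second_of_images = 1.6e12  # 1.6 TB, the per-second figure above
    print(f"{years_of_audio(one_second_of_images):.2f} years of music")
```

Under those assumptions the result comes out a little over three years, which lines up with the “about 3 years” figure.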

From the team at the Intelligent Retail Lab — Walmart’s artificial intelligence lab located within a fully-operating grocery store.