Computer Vision Case Study: Amazon Go

There’s been a bit of buzz around Amazon Go, Amazon’s new checkout-free grocery and convenience store. It truly takes the idea of convenience to a new level. Before exploring what happens behind the scenes, it helps to get an idea of what the store offers a consumer. Let’s do a quick comparison of a conventional shopping routine against one that someone may experience at an Amazon Go store:

Amazon Go

  1. Quick access to everyday products, especially groceries and convenience goods
  2. Turnstile entry. Consumer scans in with the Amazon app on a smartphone
  3. Consumer goes around the store, picks up items, adds to bag, shops like normal
  4. Consumer exits

Conventional Store

  1. Quick access — same products, groceries, convenience stuff
  2. Enter store and start shopping
  3. Pick up items in the same fashion
  4. Wait in line for a cashier or self-checkout
  5. Take items out of bag, scan them, put back in bag
  6. Consumer exits

If we think about the number of steps as a measure of convenience, it becomes pretty apparent that Amazon Go cuts out two of the six steps, or roughly a third of the routine. The lines are minimized and the process of checking out (arguably the most time-consuming and annoying part of shopping) is completely cut out. One caveat: since the store has a limited capacity and the cameras need to be able to track individuals, it is possible that you may have to wait in line OUTSIDE of the store.

You may be a little skeptical about the security and the accuracy of the whole process. I know I was. So, what is going on behind the scenes? How can Amazon offer such convenience? Are there still employees? How do the cameras know what I took and how much to charge me?

You asked (errrr, I did, at least), Amazon answered:

So how does it work? We used computer vision, deep learning algorithms and sensor fusion, much like you’d find in self-driving cars. We call it “Just Walk Out” technology. Once you’ve got everything you want, you can just go. When you leave, our “Just Walk Out” technology adds up your virtual cart and charges your Amazon account. Your receipt is sent straight to the app.

You scan in at the turnstiles with the app on your smartphone, and the cameras and computers do the rest of the cart totaling and checkout for you. There are still employees in the store, but their primary focus has shifted from checkout and the register to customer assistance, preparing fresh food, and store maintenance and operations.

[Image caption: But we all know you’ll hang on to the cupcake]

Sounds impressive, right? Basically, Amazon has developed a camera system that can recognize individuals, track them around the store, know which account is linked to each customer, understand exactly which products and how many of each are put in your bag, and tally it all up with high confidence. There were some big words in that statement, though. What is actually going on?

Computer Vision, AI, Machine Learning

As a brief description, computer vision is the field of getting a machine to see like a human. This means the technology needs to replicate the work of the eyes and the brain’s visual cortex. It takes more than simply recording images and video; the machine must recognize, identify, and understand the objects that it tracks. Getting a machine to accurately and confidently identify different features in an image requires machine learning.

The system likely relies on a multi-layered convolutional neural network. There’s so much to talk about here, so I’m going to save that topic for another day. Basically, each frame passes through multiple layers of neurons. Each neuron is assigned a specific filter and checks for a feature, like an edge or a bend, in the image. Every filter is slid over the image one small patch at a time and produces a confidence value for each patch. The filters and weights are adjusted during training, and the network generally becomes more accurate and more confident as the data sets grow.
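
To make the "sliding filter" idea concrete, here is a minimal sketch in plain Python. The toy image, the hand-written vertical-edge filter, and the function name `convolve2d` are all my own assumptions for illustration. A real network learns its filter values from data and stacks many such layers.

```python
# Sketch of a single convolutional filter. Image and filter values are made up.

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map.

    Like most deep learning libraries, this computes a cross-correlation:
    the kernel is not flipped before it is applied.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the patch currently under the filter.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x6 grayscale image: dark (0) on the left, bright (1) on the right.
IMAGE = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Hypothetical filter that responds to dark-to-bright vertical edges.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = convolve2d(IMAGE, VERTICAL_EDGE)
for row in response:
    print(row)
```

The response is largest where the patch straddles the dark-to-bright boundary and zero over the flat regions, which is the sense in which a filter "checks for a feature."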


Once a machine can recognize objects in a frame, it’s a whole new task to track and identify a person picking up various items, even multiple items at a time, in a busy store. The machine must understand movement from frame to frame in real time.
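
I don't know how Amazon actually tracks shoppers, but one of the simplest ways to link detections from one frame to the next is nearest-centroid matching. The sketch below assumes each person has already been reduced to an (x, y) point per frame. The function name `match_detections` and the `max_distance` threshold are hypothetical.

```python
import math


def match_detections(tracks, detections, max_distance=50.0):
    """Greedily match existing tracks to new detections by centroid distance.

    tracks:     dict of track_id -> (x, y) from the previous frame
    detections: list of (x, y) centroids found in the current frame
    Returns a dict of track_id -> (x, y) for the current frame. Detections
    with no track within `max_distance` start a new track, and tracks with
    no nearby detection are dropped.
    """
    # Every (distance, track, detection) candidate, closest pairs first.
    pairs = sorted(
        (math.dist(position, detection), track_id, index)
        for track_id, position in tracks.items()
        for index, detection in enumerate(detections)
    )

    updated = {}
    used_tracks, used_detections = set(), set()
    for distance, track_id, index in pairs:
        if distance > max_distance:
            break  # all remaining pairs are even farther apart
        if track_id in used_tracks or index in used_detections:
            continue
        updated[track_id] = detections[index]
        used_tracks.add(track_id)
        used_detections.add(index)

    # Anything left over is treated as a person entering the scene.
    next_id = max(tracks, default=-1) + 1
    for index, detection in enumerate(detections):
        if index not in used_detections:
            updated[next_id] = detection
            next_id += 1
    return updated


previous_frame = {0: (10, 10), 1: (100, 100)}
current_detections = [(104, 98), (12, 13), (300, 300)]
current_frame = match_detections(previous_frame, current_detections)
print(current_frame)
```

Shoppers 0 and 1 keep their IDs because they barely moved, and the far-away detection becomes a new track. A production tracker would also need a persistent ID counter, appearance features, and a way to handle people crossing paths or being hidden from a camera.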


As a second step of security and verification, the store is equipped with sensors. I’m hypothesizing here, but it’s likely some combination of scales and pressure sensors, allowing the computer to detect when and from where an item is removed. If every item has a set weight, the reading on the scale should drop by a known, predictable amount, making it possible to know if someone grabs two of the same item and to have secondary confirmation of which items are selected.
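
Sticking with my hypothesis, the weight check could be as simple as dividing the change in the shelf reading by the item's known weight. The function name `infer_items_removed`, the gram values, and the 5 g tolerance below are all assumptions I made up for the sketch.

```python
def infer_items_removed(weight_before_g, weight_after_g, unit_weight_g, tolerance_g=5.0):
    """Estimate how many identical items left a shelf from two scale readings.

    Returns a positive count for items taken, a negative count for items put
    back, or None if the change is not close to a whole number of units.
    """
    if unit_weight_g <= 0:
        raise ValueError("unit_weight_g must be positive")
    delta = weight_before_g - weight_after_g
    count = round(delta / unit_weight_g)
    # Reject readings that do not line up with a whole number of items.
    if abs(delta - count * unit_weight_g) > tolerance_g:
        return None
    return count


# Two 400 g items taken, with a little sensor noise.
taken = infer_items_removed(2400, 1602, unit_weight_g=400)

# One item put back on the shelf.
returned = infer_items_removed(1600, 2001, unit_weight_g=400)

# A 150 g change does not match any whole number of 400 g items.
mismatch = infer_items_removed(2400, 2250, unit_weight_g=400)

print(taken, returned, mismatch)
```

A `None` result is the interesting case: it would signal that the scale and the cameras disagree, so the system should not trust either reading on its own.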


The next layer of confidence is built on customer history. The more you shop at Amazon Go, the more informed the computer will be on your shopping habits and history. It will gain additional confidence that the items in your bag are correct because it can verify that you’ve had a similar transaction in the past. Pretty cool, right?
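
Amazon hasn't said how purchase history feeds into the system, so treat this as pure speculation on my part. One plausible approach is to use past purchases as a prior that re-weights the camera's guesses when two products look alike. The function name `rerank_with_history`, the product names, and the smoothing value are hypothetical.

```python
def rerank_with_history(vision_scores, purchase_counts, smoothing=1.0):
    """Re-weight vision confidence scores by how often the shopper bought each item.

    vision_scores:   dict of product -> confidence from the camera system
    purchase_counts: dict of product -> number of past purchases by this shopper
    smoothing:       pseudo-count so never-bought items keep a nonzero prior
    Returns a dict of product -> normalized combined score (sums to 1).
    """
    products = list(vision_scores)
    total = sum(purchase_counts.get(p, 0) for p in products) + smoothing * len(products)

    combined = {}
    for product in products:
        prior = (purchase_counts.get(product, 0) + smoothing) / total
        combined[product] = vision_scores[product] * prior

    norm = sum(combined.values())
    return {product: score / norm for product, score in combined.items()}


# The cameras cannot tell two similar cans apart, but this shopper
# has bought the diet version eight times before.
scores = rerank_with_history(
    vision_scores={"cola": 0.5, "diet_cola": 0.5},
    purchase_counts={"diet_cola": 8},
)
print(scores)
```

With these made-up numbers, history pushes the tie toward the diet version. The smoothing term matters, because without it the system could never charge you for something you are buying for the first time.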

While there aren’t currently any Amazon Go stores in NY, I’m pretty excited to check one out the next time I visit Seattle, assuming it’s open to the public by then (or if I get hired by Amazon…)