#70: Karpathy on Tesla Autopilot at CVPR’21, Distill’s hiatus, and tattling on Flemish Scrollers using computer vision

Leon Overweel
Published in Dynamically Typed
6 min read · Jul 18, 2021

Hey everyone, welcome to Dynamically Typed #70. For today’s edition I summarized Andrej Karpathy’s talk about updates to Tesla’s Autopilot system at CVPR ’21. Also in the productized AI section, I have links to a LEGO-detection app and to the automated chip design methodology used to design Google’s latest generation of TPUs. For ML research, I wrote about Distill’s one-year hiatus and the open-sourcing of DeepMind’s AlphaFold model. Finally, for climate change AI I covered the launch of the CCAI Wiki, and for cool things I found a project that tattles on Flemish politicians for looking at their phones during debates. Happy Sunday!

Productized Artificial Intelligence 🔌

Karpathy on Tesla Autopilot at CVPR ‘21

Tesla’s head of AI Andrej Karpathy gave a keynote at the CVPR 2021 Workshop on Autonomous Driving with updates on the company’s Autopilot self-driving system. Just like his talk last year at Scaled ML 2020, this was a great watch if you’re interested in productized AI. The talk kicks off with the value that “incremental autonomy” is already providing today, in the form of automatic emergency braking, traffic control warnings (“there’s a red light ahead!”), and pedal misapplication mitigation (PMM), which stops the driver from flooring the accelerator when they meant to hit the brakes.

[Image: examples of “incremental autonomy”]

Karpathy then goes into details of the next generation of Autopilot: Tesla has “deleted” the radar sensor from recent new cars and is now relying on vision alone. “If our [human] neural network can determine depth and velocity, can synthetic neural nets do it too? Internally [at Tesla], our answer is an unequivocal yes.” This is backed by the fact that the new vision-only approach for Autopilot has higher precision and recall than the previous sensor fusion approach.

Where does the Autopilot team get a large and diverse enough dataset to train a vision model like this? From the million-car fleet, of course! There are now 221 manually implemented triggers running on the Tesla fleet to detect scenarios that they may want to look at for training data. (Could “inactive traffic lights on the back of a moving truck” be the 222nd?) Once collected, these images are labeled offline with a combination of human annotators, the old radar sensors, and very large neural nets that would be too slow to deploy in the cars but are very useful in this offline setting.

The loop of the Tesla Data Engine is then: (1) deploy models in shadow (“ghost”) mode; (2) observe their predictions; (3) fine-tune triggers for collecting new training data; (4) create new unit tests out of wrong predictions; (5) add similar examples to the dataset; (6) retrain; and repeat. At 1.5 petabytes, the final dataset for this first release of the new Autopilot system went through this shadow mode loop seven times. It contains six billion labeled objects across one million 10-second videos.
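
To make those steps concrete, here’s a toy Python paraphrase of the loop. Everything in it is a hypothetical stand-in (the Clip and Dataset classes, the trigger callables, auto_label, retrain); it’s a sketch of the idea, not Tesla’s code.

```python
# Toy paraphrase of the Data Engine loop; all names here are made up for illustration.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Clip:
    video: str                   # pointer to a 10-second fleet video
    prediction: str              # what the shadow-mode model thought it saw
    label: Optional[str] = None  # filled in later by offline auto-labeling + human cleanup

@dataclass
class Dataset:
    train_clips: list = field(default_factory=list)
    unit_tests: list = field(default_factory=list)

def data_engine_iteration(shadow_clips: list, triggers: list,
                          auto_label: Callable, retrain: Callable,
                          dataset: Dataset):
    # Steps (1) and (2) happened in the cars: models ran in shadow mode and their
    # predictions were logged. Step (3): triggers pick out the interesting clips.
    interesting = [clip for clip in shadow_clips if any(t(clip) for t in triggers)]
    for clip in interesting:
        clip.label = auto_label(clip)        # labeled offline, never in the car
        if clip.label != clip.prediction:
            dataset.unit_tests.append(clip)  # (4) wrong predictions become unit tests
        dataset.train_clips.append(clip)     # (5) and also new training examples
    return retrain(dataset)                  # (6) retrain, then run the loop again
```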

The neural network trained on this data has a ResNet-ish backbone for basic image processing, which branches into “heads,” then “trunks,” and then “terminal” detectors. This amortizes learning into different levels, and allows multiple engineers to first work on different heads in parallel and then sync up to retrain the backbone. I hadn’t heard of this structure for letting a large (50-ish person) team collaborate on one big neural network before — very cool.
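
Here’s a minimal PyTorch sketch of that pattern: one shared backbone whose features feed several task-specific heads. The head names, channel counts, and choice of ResNet-50 are my own illustrative assumptions, not Tesla’s actual network.

```python
# Shared-backbone, multi-head layout: a sketch of the pattern, not Tesla's architecture.
import torch
import torch.nn as nn
import torchvision

class MultiHeadPerception(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared "backbone": a ResNet trunk that every task reuses.
        resnet = torchvision.models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool + fc

        # Task-specific "heads": separate engineers can iterate on these in parallel
        # and only sync up when the shared backbone needs retraining.
        self.object_head = nn.Conv2d(2048, 9, kernel_size=1)  # e.g. per-cell box parameters
        self.lane_head = nn.Conv2d(2048, 4, kernel_size=1)    # e.g. lane-line classes
        self.depth_head = nn.Conv2d(2048, 1, kernel_size=1)   # e.g. per-pixel depth

    def forward(self, images):
        features = self.backbone(images)  # shape (B, 2048, H/32, W/32)
        return {
            "objects": self.object_head(features),
            "lanes": self.lane_head(features),
            "depth": self.depth_head(features),
        }

model = MultiHeadPerception()
outputs = model(torch.randn(1, 3, 224, 224))
print({name: tensor.shape for name, tensor in outputs.items()})
```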

And finally, on the deployment side, Tesla is now also vertically integrated: they built their own FSD (“Full Self Driving”) Computer, with their own neural engine.

Karpathy wrapped up by re-emphasizing auto-labeling: using a much heavier model than you could ever run in production to take a first stab at labeling data offline, which a human then only needs to clean up a bit, is very powerful. And his overall conclusion remained in line with Tesla’s overall stance on self-driving: no fleet, no go.

Quick productized AI links 🔌

  • 🧱 Brickit is an iOS app that uses computer vision to identify LEGO bricks in a big pile and then shows you a list of projects (with instructions!) you can build with those bricks. The most impressive part is that it can detect so many small objects across so many different classes in one photo. I’d guess it does this by tiling the image or sliding a window over the photo, and then running the smaller crops through some custom model powered by Core ML and the iPhone’s neural engine; but I can’t find much information about how the app works exactly. (I sketch the tiling idea in code after these links.) Brickit is a great example of productized AI: its core functionality is enabled by highly complex machine learning, but it abstracts this away into a simple user interface.
  • 🖥 Google AI researchers Azalia Mirhoseini and Anna Goldie published a Nature paper on their AI-powered computer chip design methodology, which uses “an edge-based graph convolutional neural network architecture capable of learning rich and transferable representations of the chip.” Trained on a dataset of 10,000 chip floorplans, the method replaces “months of intense effort” for humans and comes up with a better end result. I covered this research when it first came out in April 2020, but the big news now is that it has been productionized: Mirhoseini and Goldie have used it to design the next generation of Google’s Tensor Processing Units (TPUs)!
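
As promised above, here’s a toy Python sketch of the tiling idea I speculated Brickit might use. The detect_bricks callable is a placeholder for whatever on-device model actually runs per tile; none of this reflects Brickit’s real implementation.

```python
# Sliding-window tiling for detecting many small objects in one large photo (a sketch).
from typing import Callable

def detect_with_tiling(image_width: int, image_height: int,
                       detect_bricks: Callable[[int, int, int, int], list],
                       tile: int = 640, overlap: int = 64) -> list:
    """Run a detector on overlapping tiles and map its boxes back to full-image coordinates."""
    detections = []
    step = tile - overlap
    for y in range(0, max(1, image_height - overlap), step):
        for x in range(0, max(1, image_width - overlap), step):
            w = min(tile, image_width - x)
            h = min(tile, image_height - y)
            for det in detect_bricks(x, y, w, h):  # boxes relative to the tile's origin
                det = dict(det)
                det["x"] += x                      # shift back to full-image coordinates
                det["y"] += y
                detections.append(det)
    # A real pipeline would also run non-maximum suppression here to merge the
    # duplicate detections that overlapping tiles produce.
    return detections
```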

Machine Learning Research 🎛

  • ⏸ Distill, my favorite machine learning journal, is going on hiatus. Maybe I jinxed this last month when I hoped that the founding of Anthropic, a new AI safety research company started by many of the people behind Distill, wouldn’t impact their work on the journal. Oops. Over the past five years, Distill’s innovations of being web-only (not forcing articles to fit into two-column static PDFs) and of explicitly caring about publishing explainers and artifacts have pushed AI explainability to a whole new level. I’ll miss this feed of highly-polished interactive articles a lot, but I also understand the editorial team’s decision: they found that their mentorship, article template, community, and dedicated authors were more central to the excellent quality of work on Distill than the fact that Distill is its own journal. They think the future of Distill-style articles is self-publication, “either on one-off websites or on a hypothetical ‘Distill Arxiv.’” See the editorial team’s blog post for more of their thoughts on this and some of the other considerations (volunteer burnout also played a role).
  • 🧬 AlphaFold, DeepMind’s protein folding neural network that represented a breakthrough in structural biology, is now open-source. The model’s paper, Highly accurate protein structure prediction with AlphaFold by Jumper et al. (2021), was published in Nature, and the code is on GitHub at deepmind/alphafold. Lots of people in the community had been asking for this.

Artificial Intelligence for the Climate Crisis 🌍

Cool Things ✨

The Flemish Scrollers
  • 👀 Belgian artist Dries Depoorter launched a project called The Flemish Scrollers that watches daily live streams of the Flemish parliament and uses computer vision to detect when Belgian politicians are looking at their phone instead of paying attention. Whenever this happens, @FlemishScroller tattles on Twitter by tweeting a video clip and tagging the distracted politicians. Pretty funny!
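
I don’t know how Depoorter actually built this, but the overall shape of such a pipeline is easy to imagine. Everything in the sketch below (the function names, the crude “phone near face” heuristic, the tweet text) is made up for illustration.

```python
# Speculative sketch of a Flemish Scrollers-style bot; every callable is a placeholder.
import time

def monitor_parliament_stream(get_frame, detect_phones, detect_faces,
                              identify_politician, post_tweet,
                              poll_interval_s: float = 1.0):
    """Poll the livestream and tweet whenever a recognized politician is on their phone."""
    while True:
        frame = get_frame()               # grab the latest frame from the live stream
        phones = detect_phones(frame)     # e.g. a generic object detector
        for face in detect_faces(frame):  # e.g. a face detection model
            # If a detected phone sits near a detected face, assume that person is scrolling.
            if any(abs(phone["x"] - face["x"]) < 200 for phone in phones):
                handle = identify_politician(face)  # face recognition against known politicians
                if handle:
                    post_tweet(f"Dear @{handle}, please stay focused!", frame)
        time.sleep(poll_interval_s)
```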

Thanks for reading! If you enjoyed this issue of Dynamically Typed, consider subscribing to get a new issue delivered straight to your inbox every second Sunday.

Originally published July 18th, 2021, at https://dynamicallytyped.com.
