From KITTI to KITT: delivering training data for Level 5 automated vehicles

No one can have missed the rise of artificial intelligence in the last couple of years, and in particular the huge strides made in machine learning. The impact is being felt across many different industries, but perhaps none more so than the automotive sector. OK, this is a cliché, but I was a huge fan of Knight Rider in my younger days (some time ago now!). KITT (Knight Industries Two Thousand) was the stuff of science fiction: a cool Pontiac Firebird with a wonderfully British, butler-like AI personality and self-driving capabilities (and, of course, the awesome strobing red light on the front). It now seems that we can see a path to all of us having access to a KITT, although it is less likely that we’ll each own one and more likely that our mobility will be delivered as a service: scheduled intelligently and operated autonomously, freeing us to focus on other activities during our daily commutes, the journeys to that meeting across the city, or the trips to drop the kids to their play dates.

Progress in deep learning in the past decade has been driven by three combined forces: sufficiently powerful hardware and on-tap cloud computation, great frameworks like TensorFlow and Caffe, and lots and lots of training data. At the heart of autonomous driving systems are sensor sub-systems that build a model of the environment immediately surrounding the vehicle. This model includes the terrain, road markings, surface and weather conditions, and the locations of other vehicles, pedestrians and other road users. The data used to build this model comes from banks of different sensors, including laser scanning (LIDAR), stereo video capture and radar, with all of this data being combined in a sensory soup and fed to the deep learning systems to identify the various entities around the vehicle. The task of identification is non-trivial to say the least, but with advances in convolutional neural networks (CNNs), deep learning systems can now be built that do a very reliable job of figuring out that a clump of pixels is, in fact, a small dog running out onto the road, or that a cluster of 3D points matches the shape of a pedestrian pushing a pram. In fact, these CNNs are often better than a human observer at recognition.

That’s the good news. However, to be this good, the neural networks need to be trained with vast quantities of ground truth training data. A visual system needs to be able to recognise a pedestrian in different weather conditions, in different clothing, of varying ages, from any angle, moving in lots of different ways, and anywhere from fully unoccluded to almost totally hidden behind another part of the scene. It does this using the remarkable power of deep learning networks to generalise, but also by building on a vast set of training examples that sufficiently covers the myriad circumstances the vehicle might encounter on a journey. Today, most of this ground truth data is taken from the real world: photos and videos captured by fleets of vehicles making many, many journeys through the roads and streets of the world. These photos and videos are annotated by hand by armies of human labellers, who meticulously identify the different items visible so that these can be presented to the machine learning systems en masse during training. Assuming enough of these examples are presented, we expect that the systems will learn and can subsequently reliably identify these things in the real world as they are encountered.

On the road to Level 5 automation (where a vehicle is expected to be fully autonomous, requiring no intervention by a human driver in any driving circumstance), a lot of training data is required. How much is an open question, but most of the manufacturers in the transport space are now actively gathering training data by deploying fleets of vehicles equipped with LIDAR and camera arrays, and acquiring and storing petabytes of recordings to be used in the training of future vision systems.

You might well imagine that requiring humans in the loop for this labelling task (to create the ground truth data) is a bottleneck, and you’d be right. This is why we recently announced an investment in a great team in Karlsruhe, Germany, called @understanddotai. The team, led by founders Marc and Philip, has its roots in the Karlsruhe Institute of Technology (KIT). KIT is a leading centre of research in autonomous vehicles and created one of the original open source autonomous vehicle training data sets, the famous KITTI data set. What Marc, Philip and team recognised was that the task of labelling could be significantly optimised through automation. They set out to build a new pipeline approach to data labelling, using custom deep learning models and very cool proprietary vision algorithms, to reduce by orders of magnitude the number of human labellers required, and ultimately to fully automate many aspects of this pipeline. We believe that only through automation can we build the repositories of training data required to deliver the vision systems for Level 5 self-driving vehicles. The team is already working with many of the world’s top manufacturers and, with this investment, led by our good friends at LEA Partners in Karlsruhe, we hope to help the team enable my dream of one day having my very own KITT. No pressure, Marc and Philip!