Machine Learning, and Deep Learning even more so, is so popular now that it is being referred to as AI itself. Happily, your startup just got funded or your new team budget was approved! Now you will be doing Deep Learning as well. You already had fun with Keras, ImageNet, etc. That is truly exciting! Now here are a few things to consider when getting started for real in your business. I’ll illustrate my suggestions with some anecdotes from my time working on self-driving cars at comma.ai with George Hotz early last year.
1) Don’t let data leave your engineers hanging
Deep Learning is a data-first science. The whole reason for your team or startup to exist is to make sense of a dataset. Think about it: you can only ship that sweet AI Bitcoin Chatbot if you first make sense of text! You can only automate multimedia collage for your next Snapchat Stories clone if you understand images, video, and so on.
You should not consider data handling just a side aspect of your work. Don’t do a poor job at it. For example, if it takes you “just 15 minutes” to prepare and load a dataset, those are 15 minutes you have to wait every single time you make a better modeling decision or spot a bug in your TensorFlow code.
The rules here are simple. Version your dataset and preprocess everything once and use it several times. Tools like Celery, Luigi and others are your friends. If you are in a large team where tasks need to be submitted to a cluster, consider a data server solution that feeds batches to the model training workers. Please, please! Do not let your team members have to wait for the whole dataset to be downloaded to a worker before they even know there is a bug in their model.
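To make "version your dataset and preprocess once" concrete, here is a minimal sketch using only the Python standard library. The function names (`dataset_version`, `load_preprocessed`) and the pickle cache layout are my own assumptions for illustration, not something from Celery or Luigi; a real pipeline tool would give you the same idea with scheduling on top.

```python
import hashlib
import pickle
from pathlib import Path


def dataset_version(raw_files):
    """Return a short hash that changes whenever any raw file's name or bytes change."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in raw_files):
        digest.update(path.name.encode())
        # Fine for a sketch; for very large files you would hash in chunks.
        digest.update(path.read_bytes())
    return digest.hexdigest()[:12]


def load_preprocessed(raw_files, preprocess, cache_dir):
    """Run `preprocess` once per dataset version and reuse the cached result afterwards."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"dataset-{dataset_version(raw_files)}.pkl"
    if cache_file.exists():
        # Cache hit: the slow preprocessing step is skipped entirely.
        return pickle.loads(cache_file.read_bytes())
    data = [preprocess(p.read_text()) for p in sorted(Path(p) for p in raw_files)]
    cache_file.write_bytes(pickle.dumps(data))
    return data
```

The version hash ends up in the cache file name, so a changed raw file produces a new cache entry instead of silently reusing stale data.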
Fun Story. comma.ai probably has the second or third largest driving dataset out there. In the early days of comma.ai, to train the model that drives the car, several hours of video were loaded onto a big beefy machine with more than 700 GB of memory. Every time he needed to train on more data, George would Prime Now another 100 GB of RAM. I was hired to work on better versions of that model, but I didn’t want to have to wait 15 minutes to load up the data. Instead, I assembled a simple ZMQ server from pieces of open source projects. Data no longer left us hanging: we could scale to larger training sets and cheaper machines. The model training was bound only by the GPU and its researcher.
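I can't reproduce that ZMQ server here, so as a stand-in here is a minimal in-process sketch of the same idea: a background thread loads batches ahead of time into a bounded queue, so the training loop rarely waits on data. The name `prefetch_batches` and the `max_prefetch` parameter are hypothetical, and a real data server would push batches over the network rather than through a `queue.Queue`.

```python
import queue
import threading


def prefetch_batches(batch_source, max_prefetch=4):
    """Yield batches from `batch_source` while a background thread loads ahead."""
    buffer = queue.Queue(maxsize=max_prefetch)  # bounded, so memory use stays capped
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            for batch in batch_source:
                buffer.put(batch)  # blocks while the buffer is full
        finally:
            # Always signal the end, even if the source raised an exception.
            buffer.put(done)

    # Daemon thread: it will not keep the process alive if the consumer stops early.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()
        if batch is done:
            return
        yield batch
```

You would wrap whatever iterable produces your batches, for example `for batch in prefetch_batches(my_batches): train_step(batch)`, where `my_batches` and `train_step` are placeholders for your own code.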
2) Start with something you can visualize
Fun Story. In my exit interview I asked George for advice on how to be more productive as an engineer (trust me, he’s the most productive engineer I’ve ever met, and I took every single opportunity to learn as much as I could from him). He suggested always starting by building things that would help me visualize what I’m doing. George had followed this advice himself previously. Also, all of George’s IPython Notebooks had sliding widgets to quickly visualize how parameters affect results while prototyping.
3) Define your validation/hard cases dataset early on
I put the fun visualization stuff as number 2 to give you a break after freaking you out about preparing data. But if you want to avoid being just a monkey at a typewriter, randomly adding more layers to your neural networks, you have to define how to measure progress. Ask which metrics correlate well with better deliverables and which data you should be tracking. This might go beyond the simple “random 10% of the data left out for validation”. A validation dataset ideally has the same statistical properties as production. That same production can be tracked for hard, edge and failure cases that will make up future validation sets. Therefore, your validation set may keep evolving, and you should version it just like your training set.
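Here is a minimal sketch of what "random holdout plus hard cases, versioned" could look like. Everything in it (`build_validation_set`, the ID format, the 12-character hash) is an assumption made for illustration; the only requirement is that sample IDs are sortable and JSON-serializable, such as strings.

```python
import hashlib
import json
import random


def build_validation_set(sample_ids, hard_case_ids, holdout_fraction=0.1, seed=0):
    """Combine a seeded random holdout with hard cases flagged in production.

    Returns (training_ids, validation_ids, version), where `version` is a
    short hash of the validation IDs so every experiment can record it.
    """
    rng = random.Random(seed)  # fixed seed keeps the holdout reproducible
    hard = set(hard_case_ids)
    candidates = sorted(set(sample_ids) - hard)
    holdout = rng.sample(candidates, int(len(candidates) * holdout_fraction))
    # Hard cases always go to validation, never to training.
    validation = sorted(set(holdout) | hard)
    training = sorted(set(sample_ids) - set(validation))
    version = hashlib.sha256(json.dumps(validation).encode()).hexdigest()[:12]
    return training, validation, version
```

Whenever production surfaces a new failure case, you add its ID to `hard_case_ids`, rebuild, and get a new version string to log next to your metrics.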
Fun Story. I learned that for the hard cases and validation set of a self-driving car you can use those moments when you had to take control back while driving. But there is no better validation test than sending an experienced controls engineer out on the road to minutely judge the quality of your self-driving system. If you’re in that business, try poaching one from Tesla (I love you Elon, I’m just kidding! And props to Tesla).
4) Premature scaling is the main cause of death for early-stage startups
“Don’t try to teach me that, I’m pretty sure I listen to more startup podcasts than you!” says you. True, but the new thing here is that you should consider GPUs and training hardware as employees in that equation. Once you hire or buy more than you need, you will spend a lot of energy finding a use for your surplus. Managing clusters can be hard, and large-scale HPC for deep learning is a research topic in itself. My suggestion here is to always make sure all your GPUs are constantly in use before you buy another one. You can spend as much as Google once you are as productive and profitable as Google.
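One way to check "are all my GPUs in use?" before buying another one is to poll `nvidia-smi`. The sketch below assumes NVIDIA GPUs with `nvidia-smi` on the PATH; the helper names and the 50% `busy_threshold` are arbitrary choices of mine, and some GPUs report `[N/A]` for utilization, which this simple parser does not handle.

```python
import subprocess


def parse_gpu_utilization(csv_text):
    """Parse one utilization percentage per line into a list of ints."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_gpu_utilization():
    """Ask nvidia-smi for the current utilization (in percent) of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi fails instead of parsing garbage
    )
    return parse_gpu_utilization(result.stdout)


def idle_gpus(utilizations, busy_threshold=50):
    """Return the indices of GPUs whose utilization is below the threshold."""
    return [i for i, u in enumerate(utilizations) if u < busy_threshold]
```

A single reading is only a snapshot, so you would want to call `idle_gpus(query_gpu_utilization())` periodically, say from a cron job, and look at the trend over days before making any purchase decision.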
If your team and company are already big enough, be serious about who you hire to work on the infrastructure. If you hire 10x researchers and leave them hanging, in the best case they will build the infrastructure that is good enough for them, in the worst case they will simply quit! That is not what you want.
Fun Story. Once, I left the office without putting all my GPUs to work, and Niel (comma’s VP of Phone App) gave me such a disappointed look that I developed OFG (Overnight Free GPU) phobia. It is a common issue nowadays.
Yeah! Working on AI can be both challenging and really fun. Make sure you are a bit thoughtful about handling resources and visualizations and you should be fine. But let me know in the comments if I missed any of your main concerns. As a sane community, let’s make sure together that AI backs up all its hype!