Why I made an open source model zoo

Eric Florenzano
Apr 24, 2016


“The Matrix is the best movie ever created.” It’s a line I say to get a reaction, because that’s fun and it usually does, but it also happens to be my honest opinion. The truth of the matter is, since the time I first saw it, I’ve been fascinated by virtual reality and how it will affect our society.

In college, circa 2007, I took an advanced computer vision class, which was both fascinating and sad. I was saddened to learn that, at the time, the field seemed to consist mostly of parlor tricks and hacks that barely worked — each algorithmic tool in the computer vision toolbelt was painstakingly hand-coded and tested until it mostly worked. Mostly.

But today I work in the nascent virtual reality industry, and through that lens I've noticed how widely and successfully computer vision is now applied: position tracking, geometry reconstruction, pose estimation, and more. How is all this possible using those parlor tricks we learned about in school?

Stanford recently open sourced its latest machine learning class, CS231n: Convolutional Neural Networks for Visual Recognition. Not only are all the videos and lecture notes online, but the homework is self-contained, so you can work through it on your own without being enrolled. I devoured this entire course. It's absolutely fascinating. Please check it out.

If you’re reading an article with this title, though, you probably know that something interesting happened in 2012: a renewed interest in convolutional neural networks (CNNs) took hold, one that has not yet ended and is reshaping the field of computer vision. I wanted to try my own hand at building a CNN, and chose to attempt stereo reconstruction from a monoscopic camera: in essence, to extract 3D from 2D.

I set up a simple Keras model and started it training. At first it produced nothing but static. After a while the static began to take shape, a ghostly form emerging. A day later you could tell that shape was starting to resemble the input photo. After a second day of training it was still making progress. The next morning I woke up to find my computer shut down. There had been a power surge overnight, and the latest saved weights file was corrupt. Time to start over.
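For concreteness, here is a minimal sketch of the kind of setup described above, written against the current Keras API: a small convolutional encoder-decoder that regresses a per-pixel depth map from a single image. It is not the actual model from this project; the architecture, sizes, and filenames are placeholders.

```python
import numpy as np
from tensorflow.keras import layers, models

def build_depth_net(height=128, width=128):
    inp = layers.Input(shape=(height, width, 3))
    # Encoder: downsample while increasing channel depth
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    # Decoder: upsample back to the input resolution
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D()(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D()(x)
    # One output channel: a per-pixel depth estimate in [0, 1]
    out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

if __name__ == "__main__":
    model = build_depth_net()
    # Placeholder data; in practice these would be photos and matching depth maps
    images = np.random.rand(8, 128, 128, 3).astype("float32")
    depths = np.random.rand(8, 128, 128, 1).astype("float32")
    model.fit(images, depths, epochs=2, batch_size=4)
    # The lone artifact of all that training time
    model.save_weights("depth_net.weights.h5")
```

That single weights file at the end is the only artifact of days of training, and exactly the thing a power surge can take out.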

This time, I decided, let’s start with a pre-trained VGG or AlexNet and fine-tune from there — that would surely cut training time and probably produce better results. There were lots of models out there, but the weights lived on random Dropbox accounts, all of which seemed to be over capacity. I managed to find one weights file, but there were no instructions on how to initialize the model with it. I eventually discovered the weights had to be converted before they could be loaded at all.
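For comparison, the pre-trained starting point I was hunting for is nearly a one-liner in today's Keras, where the ImageNet VGG16 weights ship with keras.applications (which did not exist in that form in early 2016). A sketch, not the actual code from this project:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Pre-trained convolutional features, no classifier head
base = VGG16(weights="imagenet", include_top=False, input_shape=(128, 128, 3))
base.trainable = False  # freeze the pre-trained layers; train only the new head

# A tiny decoder head on top of the frozen features (placeholder sizes)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(base.output)
x = layers.UpSampling2D(size=(32, 32))(x)  # VGG16 downsamples by 32x
out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)

model = models.Model(base.input, out)
model.compile(optimizer="adam", loss="mse")
```

Freezing the VGG base means only the small new head has to train, which is the whole point of starting from pre-trained weights.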

This whole process is busted. There should be an easy way to periodically save weights and have them shuttled to the cloud, safe and available for later. Once they’re in the cloud, there should be a standard way of loading them back, easy enough to bootstrap your own efforts. This should be simple stuff.
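Hand-rolling that today looks something like the callback below, which checkpoints after every epoch and copies the file off-machine. This is only an illustration of the glue involved; the bucket and key names are made up, and any object store would do.

```python
import boto3
from tensorflow.keras.callbacks import Callback

class CloudCheckpoint(Callback):
    """Save weights after every epoch and copy them off-machine."""

    def __init__(self, filepath, bucket, key_prefix):
        super().__init__()
        self.filepath = filepath
        self.bucket = bucket
        self.key_prefix = key_prefix
        self.s3 = boto3.client("s3")

    def on_epoch_end(self, epoch, logs=None):
        self.model.save_weights(self.filepath)  # local copy
        # Off-machine copy, one object per epoch so old versions survive
        key = "{}/epoch-{:03d}.weights.h5".format(self.key_prefix, epoch)
        self.s3.upload_file(self.filepath, self.bucket, key)

# Usage (bucket name is made up):
# model.fit(images, depths, epochs=10,
#           callbacks=[CloudCheckpoint("depth_net.weights.h5",
#                                      "my-weights-bucket", "depth-net")])
```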

Now it is simple: introducing Gradientzoo, an open source model zoo with integrations so far for Keras, Lasagne, TensorFlow, and plain Python. This is definitely an MVP: all it does is provide a unified API that saves and loads weight files, keeps several versions of them, and periodically cleans out old versions based on your project settings.
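To make "a unified API with versions" concrete, here is a hypothetical sketch of the idea using only the local filesystem. The class and method names are made up for illustration and are not Gradientzoo's actual client interface; see the project itself for that.

```python
import os
import time

class LocalWeightZoo(object):
    """Toy version store: one timestamped weights file per save, per model name.
    Illustrative only; NOT the Gradientzoo client API."""

    def __init__(self, root="weight-zoo"):
        self.root = root

    def save(self, model, name):
        version_dir = os.path.join(self.root, name)
        os.makedirs(version_dir, exist_ok=True)
        path = os.path.join(version_dir, "{}.weights.h5".format(int(time.time())))
        model.save_weights(path)  # each save becomes a new version
        return path

    def load_latest(self, model, name):
        version_dir = os.path.join(self.root, name)
        latest = sorted(os.listdir(version_dir))[-1]  # newest timestamp wins
        model.load_weights(os.path.join(version_dir, latest))

# zoo = LocalWeightZoo()
# zoo.save(model, "depth-net")         # after (or during) training
# zoo.load_latest(model, "depth-net")  # bootstrap the next run
```

Swap the local directory for a hosted backend and you have the shape of the service: the weights outlive any one machine, and anyone can load them back with the same call.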

Everything in the project is fully open source, meaning you can run this model zoo on-premises as a private zoo, or you can run a public one on the open internet. But I hope you choose to support the one at gradientzoo.com, because I believe that if we unify around these trained models and are more open with them, and if it’s easier to get a running start by composing new models from existing pre-trained ones, we’ll progress much faster and further as an industry.

Let me know what you think — I’ll read all responses to this post and any e-mail that comes in at thoughts@gradientzoo.com. I’m especially interested to hear what you think about pricing and how to make the service sustainable!
