I sometimes see people refer to neural networks as just “another tool in your machine learning toolbox”. They have some pros and cons, they work here or there, and sometimes you can use them to win Kaggle competitions. Unfortunately, this interpretation completely misses the forest for the trees. Neural networks are not just another classifier; they represent the beginning of a fundamental shift in how we write software. They are Software 2.0.
The “classical stack” of Software 1.0 is what we’re all familiar with — it is written in languages such as Python, C++, etc. It consists of explicit instructions to the computer written by a programmer. …
Update Oct 18, 2017: AlphaGo Zero was announced. This post refers to the previous version. 95% of it still applies.
I had a chance to talk to several people about the recent AlphaGo matches with Ke Jie and others. In particular, most of the coverage was a mix of popular science + PR, so the most common questions I’ve seen were along the lines of “To what extent is AlphaGo a breakthrough?”, “How do researchers in AI see its victories?”, and “What implications do the wins have?”. …
The accepted papers at ICML have been published. ICML is a top Machine Learning conference, and one of the most relevant to Deep Learning, although NIPS has a longer DL tradition and ICLR, being more focused, has a much higher DL density.
I thought it would be fun to compute some stats on institutions. Armed with a Jupyter Notebook and some regexes, we look for all of the institution mentions, add up their counts, and sort. Modulo a few annoyances: …
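Leaving the annoyances aside, here is a minimal sketch of the kind of tally described above, assuming the affiliations of all accepted papers have been scraped into one plain-text file (the filename and the short institution list here are hypothetical):

```python
import re
from collections import Counter

# Hypothetical list of institution names to look for; a real list
# would need to handle aliases (e.g. "Google" vs. "Google Brain").
institutions = ["Google", "Stanford", "MIT", "Berkeley", "CMU", "DeepMind"]

# Assumes the affiliations of all accepted papers were scraped
# into a single plain-text file.
with open("icml_accepted.txt") as f:
    text = f.read()

counts = Counter()
for name in institutions:
    # Word-boundary match so e.g. "MIT" does not fire inside "Submitted".
    counts[name] = len(re.findall(r"\b" + re.escape(name) + r"\b", text))

for name, n in counts.most_common():
    print(name, n)
```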
Have you looked at Google Trends? It’s pretty cool: you enter some keywords and see how Google searches for those terms vary through time. I thought, hey, I happen to have this arxiv-sanity database of 28,303 (arxiv) Machine Learning papers from the last 5 years, so why not do something similar and take a look at how Machine Learning research has evolved over that period? The results are fairly fun, so I thought I’d post them.
(Edit: machine learning is a large area. …
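As a rough sketch of how such a trend could be computed from a papers database like the one described above (the record layout and the example keyword are made up for illustration):

```python
import re
from collections import Counter

# Invented records standing in for the arxiv-sanity database:
# each paper has a publication month and some searchable text.
papers = [
    {"date": "2014-06", "text": "We train a deep convolutional network ..."},
    {"date": "2016-11", "text": "Generative adversarial networks can ..."},
    {"date": "2016-11", "text": "A recurrent model for translation ..."},
]

def monthly_fraction(papers, keyword):
    """Fraction of papers per month whose text mentions `keyword`."""
    totals, hits = Counter(), Counter()
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    for p in papers:
        totals[p["date"]] += 1
        if pattern.search(p["text"]):
            hits[p["date"]] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}

print(monthly_fraction(papers, "generative adversarial"))
```

Plotting the returned per-month fractions gives the Google-Trends-style curves.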
I thought it would be fun to cross-reference the ICLR 2017 (a popular Deep Learning conference) decisions (which fall into 4 categories: oral, poster, workshop, reject) with the number of times each paper was added to someone’s library on arxiv-sanity. ICLR 2017 decision making involves a number of area chairs and reviewers who decide the fate of each paper over a period of a few months, while arxiv-sanity involves one person working 2 hours once a month (me), and a number of people who use it to tame the flood of papers out there. It is a battle between top down and bottom up. …
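As a sketch of the cross-referencing itself, assuming we have the ICLR decision and the arxiv-sanity library-add count for each paper (both mappings below are invented for illustration):

```python
from collections import defaultdict
from statistics import median

# Invented example data: the ICLR decision and the number of
# arxiv-sanity "add to library" events, keyed by paper id.
decisions = {"paper1": "oral", "paper2": "poster", "paper3": "reject"}
library_adds = {"paper1": 120, "paper2": 35, "paper3": 8}

# Group the bottom-up signal (library adds) by the top-down outcome.
adds_by_decision = defaultdict(list)
for pid, decision in decisions.items():
    adds_by_decision[decision].append(library_adds.get(pid, 0))

for decision in ("oral", "poster", "workshop", "reject"):
    adds = adds_by_decision.get(decision)
    if adds:
        print(decision, "median adds:", median(adds), "papers:", len(adds))
```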
The first time I tried out Virtual Reality was a while ago, sometime in the late 1990s. I was quite young so my memory is a bit hazy, but I remember a research-lab-like room full of hardware and wires, and in the middle a large chair with a big helmet that came down over your head. I was put through some standard 3-minute demo where you look around, things move around you, they scare you by dropping you down, the basics. The display was low-resolution, had strong ghosting artifacts, the response lag was long, and the whole thing was quite a terrible experience. I remember finally lifting off the helmet and feeling simultaneously highly nauseated and… extremely excited. This was the future! …
When we offered CS231n (the Deep Learning class) at Stanford, we intentionally designed the programming assignments to include the explicit calculations involved in backpropagation at the lowest level. The students had to implement the forward and the backward pass of each layer in raw numpy. Inevitably, some students complained on the class message boards:
“Why do we have to write the backward pass when frameworks in the real world, such as TensorFlow, compute them for you automatically?”
This is seemingly a perfectly sensible appeal: if you’re never going to write backward passes once the class is over, why practice writing them? Are we just torturing the students for our own amusement? Some easy answers could make arguments along the lines of “it’s worth knowing what’s under the hood as an intellectual curiosity”, or perhaps “you might want to improve on the core algorithm later”, but there is a much stronger and more practical argument, to which I wanted to devote a whole post…
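To make the exercise concrete, here is a minimal sketch of the kind of layer the assignments ask for: a fully-connected layer in raw numpy with an explicit forward and backward pass (the function names and shapes are illustrative, not the actual CS231n starter code):

```python
import numpy as np

def affine_forward(x, W, b):
    """Fully-connected layer: out = x @ W + b. Caches inputs for backprop."""
    out = x.dot(W) + b
    cache = (x, W)
    return out, cache

def affine_backward(dout, cache):
    """Given the upstream gradient dout, return gradients w.r.t. x, W, b."""
    x, W = cache
    dx = dout.dot(W.T)      # (N, D): route the gradient back to the input
    dW = x.T.dot(dout)      # (D, M): gradient w.r.t. the weights
    db = dout.sum(axis=0)   # (M,): bias gradient sums over the batch
    return dx, dW, db

# Tiny usage example with random data.
x = np.random.randn(4, 3)   # batch of 4 inputs, 3 features each
W = np.random.randn(3, 2)
b = np.zeros(2)
out, cache = affine_forward(x, W, b)
dx, dW, db = affine_backward(np.ones_like(out), cache)
```

The backward pass is just the chain rule written out by hand, which is exactly the mechanical fluency the assignments aim to build.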
Over the last few weeks we heard from several excellent guests, including Selina Tobaccowala from SurveyMonkey, Patrick Collison from Stripe, Nirav Tolia from Nextdoor, Shishir Mehrotra from Google, and Elizabeth Holmes from Theranos. The topic of discussion was scaling a company beyond the Tribe phase to the Village/City phases.
My favorite among these was the session with Patrick (video), which I found to be rich with interesting points and mental models. In what follows I will try to do a brain dump of some of these ideas in my own words.
I found Patrick’s slight resentment of new organizational structure ideas refreshing and amusing. We hear a lot about disruption, thinking from first principles, being contrarian, etc. Patrick was trying to make the point that the typical organizational structure of a company that we’ve converged on (e.g. hierarchies), and the way things are generally done, are actually quite alright. In fact, we know a lot about how to organize groups of people towards a common goal, and one must resist the temptation to be too clever with it. He also conceded that a lot of things have changed over the last several years (especially with respect to technology), but made the point that human psychology has, in comparison, stayed nearly constant, which means we can more confidently draw on past knowledge in these areas. …