I sometimes see people refer to neural networks as just “another tool in your machine learning toolbox”. They have some pros and cons, they work here or there, and sometimes you can use them to win Kaggle competitions. Unfortunately, this interpretation completely misses the forest for the trees. Neural networks are not just another classifier, they represent the beginning of a fundamental shift in how we write software. They are Software 2.0.

The “classical stack” of Software 1.0 is what we’re all familiar with — it is written in languages such as Python, C++, etc. It consists of explicit instructions to the computer written by a programmer. …



Update Oct 18, 2017: AlphaGo Zero was announced. This post refers to the previous version. 95% of it still applies.

I had a chance to talk to several people about the recent AlphaGo matches with Ke Jie and others. …


The accepted papers at ICML have been published. ICML is a top Machine Learning conference, and one of the most relevant to Deep Learning, although NIPS has a longer DL tradition and ICLR, being more focused, has a much higher DL density.

I thought it would be fun to compute some stats on institutions. Armed with a Jupyter Notebook and some regexes, I look for all of the institution mentions, add up their counts, and sort. Modulo a few annoyances (a rough sketch of the counting scheme follows the list below):

  • I manually collapse e.g. “Google”, “Google Inc.”, “Google Brain”, and “Google Research” into one category, and similarly “Stanford” and “Stanford University”.
  • I only count one unique mention of an institution per paper, so if a paper has 20 people from a single institution it still counts as a single mention. This way we get a better sense of how many of the conference’s papers each institution was involved in. …
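A minimal sketch of this counting scheme, assuming the affiliations have already been extracted into a list of raw strings per paper (the alias map and example data below are made up for illustration; the real collapsing was done by hand):

```python
from collections import Counter

# Hypothetical alias map used to collapse variants into one canonical name.
ALIASES = {
    "google": "Google", "google inc.": "Google",
    "google brain": "Google", "google research": "Google",
    "stanford": "Stanford", "stanford university": "Stanford",
}

def canonical(name):
    return ALIASES.get(name.strip().lower(), name.strip())

def count_institutions(papers):
    """papers: list of lists of raw affiliation strings, one inner list per paper."""
    counts = Counter()
    for affiliations in papers:
        # Collapse aliases, then count each institution at most once per paper.
        unique = {canonical(a) for a in affiliations}
        counts.update(unique)
    return counts.most_common()

# Toy example:
papers = [["Google Brain", "Google Inc.", "Stanford University"],
          ["Stanford", "MIT"]]
print(count_institutions(papers))
# [('Stanford', 2), ('Google', 1), ('MIT', 1)]
```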


Have you looked at Google Trends? It’s pretty cool: you enter some keywords and see how Google searches for them vary through time. I thought: hey, I happen to have this arxiv-sanity database of 28,303 (arxiv) Machine Learning papers from the last 5 years, so why not do something similar and take a look at how Machine Learning research has evolved over that time? The results are fairly fun, so I thought I’d post them.
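The basic measurement is simple. A minimal sketch of the idea, assuming each paper comes with a submission month and its full text or abstract (the data format below is an assumption, not the actual arxiv-sanity schema):

```python
from collections import Counter

def keyword_trend(papers, keyword):
    """papers: iterable of (year_month, text) pairs, e.g. ("2016-03", "...abstract...").
    Returns, for each month, the fraction of papers that mention the keyword."""
    mentions = Counter()
    totals = Counter()
    kw = keyword.lower()
    for month, text in papers:
        totals[month] += 1
        if kw in text.lower():
            mentions[month] += 1
    return {m: mentions[m] / totals[m] for m in sorted(totals)}
```

Normalizing by the number of papers per month matters, since the total volume of submissions has grown substantially over the period.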

(Edit: machine learning is a large area. …


I thought it would be fun to cross-reference the ICLR 2017 (a popular Deep Learning conference) decisions (which fall into 4 categories: oral, poster, workshop, reject) with the number of times each paper was added to someone’s library on arxiv-sanity. …
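The cross-referencing itself is a small join. A sketch of the aggregation, assuming each paper record carries its ICLR decision and its arxiv-sanity library-add count (the field names here are made up for illustration):

```python
from collections import defaultdict

def adds_by_decision(papers):
    """papers: list of dicts like {"title": ..., "decision": "oral", "library_adds": 12}.
    Returns the mean number of arxiv-sanity library adds per decision category."""
    groups = defaultdict(list)
    for p in papers:
        groups[p["decision"]].append(p["library_adds"])
    return {d: sum(adds) / len(adds) for d, adds in groups.items()}
```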


The first time I tried out Virtual Reality was a while ago, somewhere in the late 1990s. I was quite young so my memory is a bit hazy, but I remember a research-lab-like room full of hardware, wires, and in the middle a large chair with a big helmet that came down over your head. …


When we offered CS231n (the Deep Learning class) at Stanford, we intentionally designed the programming assignments to include explicit calculations involved in backpropagation at the lowest level. The students had to implement the forward and backward pass of each layer in raw numpy. Inevitably, some students complained on the class message boards:

“Why do we have to write the backward pass when frameworks in the real world, such as TensorFlow, compute them for you automatically?”

This is a seemingly sensible appeal: if you’re never going to write backward passes once the class is over, why practice writing them? Are we just torturing the students for our own amusement? Some easy answers could make arguments along the lines of “it’s worth knowing what’s under the hood as an intellectual curiosity”, or perhaps “you might want to improve on the core algorithm later”, but there is a much stronger and more practical argument, to which I wanted to devote a whole post…
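For concreteness, a minimal sketch of the kind of layer the students implement (an affine transform followed by a sigmoid, with its backward pass, in raw numpy); the function names are illustrative, not the actual assignment API:

```python
import numpy as np

def affine_sigmoid_forward(x, W, b):
    z = x.dot(W) + b                 # affine transform
    out = 1.0 / (1.0 + np.exp(-z))   # sigmoid nonlinearity
    cache = (x, W, out)              # stash values needed for the backward pass
    return out, cache

def affine_sigmoid_backward(dout, cache):
    x, W, out = cache
    dz = dout * out * (1 - out)      # local sigmoid gradient: s * (1 - s)
    dx = dz.dot(W.T)                 # gradient w.r.t. input
    dW = x.T.dot(dz)                 # gradient w.r.t. weights
    db = dz.sum(axis=0)              # gradient w.r.t. bias
    return dx, dW, db
```

Writing these by hand is exactly what forces you to confront, say, the `out * (1 - out)` term and what it does to gradients when the sigmoid saturates.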


Over the last few weeks we heard from several excellent guests, including Selina Tobaccowala from Survey Monkey, Patrick Collison from Stripe, Nirav Tolia from Nextdoor, Shishir Mehrotra from Google, and Elizabeth Holmes from Theranos. The topic of discussion was scaling beyond the tribe phase to the Village/City phases of a company.

My favorite among these was the session with Patrick (video), which I found to be rich with interesting points and mental models. In what follows I will try to do a brain dump of some of these ideas in my own words.

I found Patrick’s slight resentment of new organizational structure ideas refreshing and amusing. We hear a lot about disruption, thinking from first principles, being contrarian, etc. Patrick was trying to make the point that the typical organizational structure of a company that we’ve converged on (e.g. hierarchies etc.), and the way things are generally done, are actually quite alright. That in fact we know a lot about how to organize groups of people towards a common goal, and that one must resist the temptation to be too clever with it. He also conceded that a lot of things have changed over the last several years (especially with respect to technology), but made the point that human psychology has, in comparison, stayed nearly constant, which means we can more confidently draw on past knowledge in these areas. …
