Is Deep Learning “Software 2.0”?


Andrej Karpathy wrote an article about what he calls “Software 2.0”. Karpathy (Director of AI at Tesla) argues that Neural Networks (or Deep Learning) are a new kind of software. I agree that there is indeed a trend towards “teachable machines” as opposed to the more conventional programmable machines; however, I take issue with some of the benefits that Karpathy cites to back up his thesis.

Certainly Deep Learning is already eating the Machine Learning world, with advances across the board. Karpathy mentions several well-known ones: visual recognition, speech recognition, speech synthesis, machine translation, robotics and games. He frames his argument around a sea change in computing: perhaps it is time to think about a new kind of software (I guess the kind that you teach like a dog instead of program).

Karpathy lists the benefits of “Software 2.0” (ignoring, by the way, the obvious benefit that teaching may be easier than programming). He writes about architectural features of these systems that appear to be a step above current conventional software. I recommend you read his post, then come back here for my commentary.

Each of his bullet points would have been better phrased as a question rather than a statement, because the jury is still out on whether any of his points are even true. Let me go down a partial list:

Computationally homogeneous — This is interesting, but not valid, considering that digital systems are also essentially computationally homogeneous if we look at them from the perspective of universal gates (i.e. NAND and NOR). In fact, newer Deep Learning silicon is not homogeneous and uses specialized cores; some chips optimize for 3x3 convolutions, as an example. Deep Learning may seem homogeneous today, but that’s because it is still early. Taking inspiration from nature, one should expect the usual diversification into specialized and modular parts. The brain, for example, isn’t homogeneous: it has many parts and many different kinds of neurons.
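To make the homogeneity comparison concrete, here is a minimal sketch (NumPy only; every name in it is my own hypothetical illustration, not anything from Karpathy’s post) of the two “uniform primitive” views side by side: a plain feed-forward net as a repeated matmul-plus-ReLU, and digital logic as a repeated NAND gate.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def nand(a, b):
    # Universal gate: any digital circuit can be composed from this one op.
    return 1 - (a & b)

def mlp(x, weights):
    # Every layer is the same primitive: an affine map followed by ReLU.
    for W, b in weights:
        x = relu(x @ W + b)
    return x

# Toy usage: three layers built from the single matmul+ReLU primitive.
rng = np.random.default_rng(0)
weights = [(rng.normal(size=(4, 8)), np.zeros(8)),
           (rng.normal(size=(8, 8)), np.zeros(8)),
           (rng.normal(size=(8, 2)), np.zeros(2))]
print(mlp(rng.normal(size=(1, 4)), weights))                  # one uniform primitive, stacked
print(nand(np.array([0, 0, 1, 1]), np.array([0, 1, 0, 1])))   # the other uniform primitive
```

Viewed at this level of abstraction, both worlds look homogeneous; the specialization only appears once you look at the hardware that actually runs them.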

Easy to bake into silicon — Not exactly true, in that the major risk for ASIC designers is committing to an architecture that could be obsoleted in a few months. It’s not that baking a network into silicon is hard; the commitment that “baking in” implies is the real problem. The hard part is knowing that you have the key components for Deep Learning, and that’s not easy in such a fast-moving field. That’s why programmable GPUs may hold their value over ASICs for longer than expected.

Constant running time — Not true for more complex networks that conditionally traverse different paths (see: Conditional Logic is the New Hotness). It is indeed true that simple networks have bounded computation. However, it is entirely possible to have iterative components (see MCTS in AlphaGo) with large variance in running time.
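As a rough illustration of input-dependent running time, here is a hedged sketch (NumPy only; the early-exit scheme and all names are my own illustration, not a specific published architecture) of a network with cheap intermediate classifiers that stops computing as soon as one of them is confident:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_forward(x, blocks, exit_heads, threshold=0.9):
    # Running time varies per example: confident inputs skip later blocks.
    layers_used = 0
    for block, head in zip(blocks, exit_heads):
        x = np.maximum(0.0, x @ block)    # one more block of compute
        layers_used += 1
        probs = softmax(x @ head)         # cheap intermediate classifier
        if probs.max() > threshold:       # confident enough: stop early
            return probs, layers_used
    return probs, layers_used

rng = np.random.default_rng(1)
blocks = [rng.normal(size=(8, 8)) for _ in range(4)]
exit_heads = [rng.normal(size=(8, 3)) for _ in range(4)]
probs, used = early_exit_forward(rng.normal(size=8), blocks, exit_heads)
print(f"exited after {used} of {len(blocks)} blocks")
```

The per-example cost here depends on the data, which is exactly the property that breaks the “constant running time” claim.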

Constant memory use — Not true for networks that are dynamically constructed on the fly. See for example: https://arxiv.org/abs/1704.05526
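The paper linked above assembles a different module network for every question; as a simpler stand-in for the same point, here is a hedged sketch (NumPy only, hypothetical names, not the paper’s method) of a graph whose size, and therefore stored-activation memory, grows with the input:

```python
import numpy as np

def unrolled_rnn(tokens, W_in, W_h):
    # The computation graph is built per input: one step per token.
    h = np.zeros(W_h.shape[0])
    activations = []          # kept for backprop; grows with the input length
    for x in tokens:
        h = np.tanh(W_in @ x + W_h @ h)
        activations.append(h)
    return h, activations

rng = np.random.default_rng(2)
W_in, W_h = rng.normal(size=(16, 4)), rng.normal(size=(16, 16))
short_seq = [rng.normal(size=4) for _ in range(3)]
long_seq = [rng.normal(size=4) for _ in range(50)]
print(len(unrolled_rnn(short_seq, W_in, W_h)[1]))   # 3 stored activations
print(len(unrolled_rnn(long_seq, W_in, W_h)[1]))    # 50 stored activations
```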

Highly portable — Not true. Deep Learning is more portable and modular than classic ML, but it is definitely missing several features of modular systems. I have an analysis of this here: https://medium.com/intuitionmachine/the-end-of-monolithic-deep-learning-86937c86bc1f . Karpathy’s two bullet points are also related to the issue of modularity.

It is easy to pick up — This is typical of maturing technology. As software professionals, we don’t need to understand the quantum physics that underlies semiconductors. We don’t even need to know how an Arithmetic Logic Unit (ALU) is constructed. We work at an abstraction level that makes sense for the task at hand. A lot of Deep Learning is currently taught with heavy mathematics, but that’s not going to be true in the future. A more intuitive level of software development will have to arise, one that is closer to how we teach today.

It is better than you — I agree with this sentiment. A lot of innovative discoveries are being found by brute force methods. Deep Learning may lead to the ‘last invention of man’.

Again, I reiterate that I agree with Karpathy that teachable machines are indeed “Software 2.0”. What is clearly debatable is whether these new kinds of systems are different from other universal computing machinery. They are, of course, made of the same stuff, that is, information processing: at its core, computation, memory and storage. However, Deep Learning, in addition to supporting universal computation, has the capability of learning by induction. This capability implies that the modularity of the system will be different from that of a system programmed by hand. However, a different modularity (how you organize your software) does not imply that fundamental hardware characteristics disappear. Software 2.0 cannot transcend the laws of physics, and glossing over this is the flaw in Karpathy’s arguments. In fact, when you begin to model Deep Learning in physical terms (and not pure mathematics), you begin to realize the flaws of the machine learning orthodoxy.

What about the problems with Deep Learning with regard to software? There is a lot you need to consider, because Deep Learning systems are intrinsically ‘intuition machines’ and thus have behavior that is starkly different from Software 1.0. The way I address this is through the Deep Learning Canvas: a checklist of what you should think about when you develop these kinds of systems.

We are still in the early innings of Deep Learning development, and there are plenty of issues that need to be addressed to evolve this new kind of computing into something that has the same features as Software 1.0, but better across the board. It is an open question that many firms like Google and Uber are working on, but there are still a lot of missing pieces.

Deep Learning will eventually become Software 2.0. To give credit to Andrej Karpathy, by coining the term “Software 2.0” he has given a name to a kind of software development that many have had implicitly in their heads. It has also added fuel to the Deep Learning hype machine, which will take a couple of years to truly mature. In the meantime, this is still a very complex subject. If you are seeking more clarity on the many issues brought up here, then go seek out some wisdom and get my book:

Exploit Deep Learning: The Deep Learning AI Playbook