As a beginner in AI, what library should I focus on: Keras, TensorFlow, or PyTorch?

Marki
5 min read · Oct 13, 2020


Answered by Sridhar Mahadevan, PhD Computer Science:

First, I want to say that I have the greatest respect both for the amazing engineering talent at Google and for the superb AI group there, many of whom are close colleagues and friends (including my former PhD students).

So it pains me to say this, but as someone who has used TensorFlow for over a year and recently moved to PyTorch, I see no contest: TF is obsolete! PT is the way forward. It is everything TF should have been and is not, and I for one no longer use TF.

To understand why, a Google search (or a Bing search, so my friends at Microsoft don’t get annoyed!) can itself provide excellent answers :-). I suggest searching for “TensorFlow vs. PyTorch”: you will find in-depth analyses of static (TF) vs. dynamic (PT) graph representations, and other useful discussions. Ultimately, though, the choice of a deep learning framework comes down to personal taste, like one’s choice of a wristwatch.
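To make the static-vs-dynamic distinction concrete, here is a minimal sketch, not from the original answer; it assumes TF 1.x, the version being compared here:

```python
# TensorFlow 1.x: declare a symbolic graph first, then execute it in a session.
import tensorflow as tf  # assumes TF 1.x

x = tf.placeholder(tf.float32, shape=())  # a symbolic node, no value yet
y = x * x + 1.0                           # extends the graph; computes nothing
with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: 3.0}))  # 10.0, evaluated only here

# PyTorch: operations execute eagerly, so ordinary Python control flow works.
import torch

a = torch.tensor(3.0)
b = a * a + 1.0  # computed immediately
if b > 5:        # branch on an actual value, mid-computation
    b = b - 1.0
print(b.item())  # 9.0
```

In PT the graph is built on the fly as the code runs, which is why debugging it feels like debugging ordinary Python.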

Having worn dozens of wristwatches over the past 40+ years, I now wear an Apple Watch. Why? Because 1) it is highly accurate, 2) it is elegantly designed, with customizable watch faces, and 3) it lets me not only tell the time but also read email, answer the phone, check the weather, see my text messages, ask Siri for information, use Google Maps, and a million other things. The Apple Watch renders all other watches obsolete, the same way today’s smartphones render the BlackBerry-style phones of old obsolete. However, I dare say you will find plenty of folks who swear by their old-fashioned analog watches, the ones whose hands you have to keep adjusting to local time whenever you travel (and I travel a lot across time zones).

I use a hand-built GPU machine at home with three Nvidia Titan Xp cards, a 20-core i9 processor, and 128 GB of RAM; in other words, a seriously powerful machine. When I run TF, it feels sluggish. It is sometimes extremely slow out of the starting gate (I take that to be the time required to build the initial graph, or perhaps some other bookkeeping). It prints incomprehensible and mostly annoying startup messages, which to me feel like staring into the depths of machine code, the way one programmed computers 50+ years ago. Keras + TF is an improvement (a sketch follows below), but you’re still stuck with TF underneath.
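For reference, the kind of high-level API Keras layers on top of TF looks something like this (a minimal sketch, not from the original answer; the layer sizes are illustrative):

```python
from tensorflow import keras

# Keras hides the graph/session bookkeeping behind a declarative API.
model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()  # prints the layer stack rather than raw graph internals
```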

Contrast that with PT. It is an elegantly designed framework. It uses abstraction properly, and the code I’ve seen written in PT is beautiful. On my hand-built machine, PT is much, much faster. It prints no annoying messages. It’s off to the races from the get-go, and, as the esteemed British Prime Minister Theresa May often said, it just “gets on with the job.” To me, there’s no contest: PT is the way forward.
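To give a taste of what “uses abstraction properly” means in practice, here is a hypothetical minimal model; the class name and layer sizes are illustrative, not from the original answer:

```python
import torch
import torch.nn as nn

# nn.Module is a small, composable abstraction for building models.
class TinyNet(nn.Module):
    def __init__(self, in_dim=784, hidden=128, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

model = TinyNet()
logits = model(torch.randn(32, 784))  # runs eagerly; step through with pdb
print(logits.shape)                   # torch.Size([32, 10])
```

Because execution is eager, the forward pass is just Python, so ordinary print statements and debuggers work on it.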

All that being said, let me return to the original question. If you are a “beginner in AI,” why in the world are you learning any of these libraries? This might seem a strange question, but if you are beginning to learn AI, then learn AI! Read Russell and Norvig’s superb text, Artificial Intelligence: A Modern Approach. Understand the problem of machine learning and its various formulations. And so on.

AI is made up of over two dozen subfields, machine learning is just one of them, and deep learning is just one subfield of ML. In other words, by focusing your energies on learning TF, you are spending a lot of time on what is actually a small subset of AI. Even today, with all the hoopla over DL (and I get that you want to work on what everyone says is the “exciting” part of AI), it remains a fringe area within AI. This may shock some of the folks who work in DL and think they are the center of the AI universe. DL tends to focus on a particular formulation of AI problems, usually where data is overwhelming and structure is almost absent: data is just vectors (of images, say), everything is “vector’ese,” and there is almost no discussion of relations, or the sort of deep structure that humans tend to think of the world as being made of.

It never surprises me to find that researchers who work in one area think the world revolves around it. That’s true in any field. My dear father is close to 90 and works on deciphering an ancient language spoken around 2500 BC. When I visit him in India and talk to him about his latest research (oh yes, he’s still writing papers, and for 30+ years every weekend phone call to him has usually begun with a discussion of some recent paper of his), you come away with an overwhelming sense of why this is, for him, the most important problem in the entire world. It’s not just that he’s cracking a fascinating mystery. The solution he is proposing has deep ramifications that touch even modern India, and his work has actually been opposed by some politicians who happen to favor a different solution (you might think that solving an ancient mystery like this would have no political ramifications, but then, you don’t know India!).

So, do yourself a favor and pull back, at least a little. Don’t spend every waking hour hacking TF or PT, no matter how tempting they may be. Learn AI! Understand the large number of subfields and what people in them do. Read the classic papers. Understand the “frame problem,” the “no free lunch” theorem in ML, what “behavior-based robotics” is all about, the major issues in natural language processing, what “random projections” are, and so on. There is an ocean of work in AI that is not DL, and it is every bit as exciting as DL. And then, yes, if you must, certainly learn TF or PT. :-)
