Google Colab

The First Few Steps

Paolo Perrotta
The Pragmatic Programmers



Google Colab is a computational notebook — a system that mixes code, text, and data in the same document. In this article, you’ll get your first experience with Colab. You’ll also learn about some of the strengths and weaknesses of Colab, and computational notebooks in general.

It’s easier to understand computational notebooks if you see them in motion. Check out this quick hands-on tutorial. At the end of the tutorial, you’ll find a link back to this article. See you in a short while!

Tutorial: Colab in 20 Minutes.

Welcome back! So you've had first-hand experience of a computational notebook. How was it? Maybe make a short mental list of things you liked and disliked about it. We’ll talk about that topic in a minute.

But first, let’s see where the idea of computational notebooks comes from.

Computational Notebooks

In the 1980s, Donald Knuth proposed a new way of managing computer programs: instead of having code on one side and documentation on the other, he proposed putting both in the same document. Such a document would be the best of both worlds: it could be read by a human, but also executed by a computer. Knuth called that concept literate programming.

Literate programming didn’t exactly take the world by storm, but it did get some traction in the field of scientific computing. As you’ve seen in the tutorial, this idea fits data scientists (and other kinds of scientists) like a glove.

Fast forward to the early 2000s. In those years, a system called IPython brought Knuth’s idea to a new generation of scientists and researchers. In time, IPython evolved into a project called Jupyter, whose notebooks became the de facto standard for machine learning experiments. Big software vendors started building their own notebook systems based on Jupyter. One of them was Google Colab.

Colab shares a few strengths and weaknesses with other computational notebooks. Let’s talk about those, beginning with the strengths.

Computational Notebooks: the Good

In the tutorial, you learned first-hand why computational notebooks are popular among data scientists. If you do data science and machine learning, then you probably want to write and run code — but you also want to examine data, maybe draw charts, and jot down notes. Those notes might include text, tables, and a fair share of math. With a computational notebook, you can put all those elements in the same document.

Computational notebooks also tend to be friendly and easy to use. For example, if you work in machine learning, you probably write a lot of code — but writing code is not the purpose of your job. You might not want to learn the complex, intimidating tools used by professional programmers. With a computational notebook, you don’t need to bother with large development environments and esoteric utilities: you write a bit of Python, you click a button, and boom! You’re done. If you need to write short snippets of code, a notebook gets the job done with minimal fuss.

As good as they are, however, notebooks have their own share of weaknesses. Let’s talk about those.

Computational Notebooks: the Bad

The main downside of notebooks is the same as that of many other friendly tools: they make it easy to do simple things, but they can also make it hard to do complicated things.

To be clear, I don’t mean that you cannot write complex algorithms in a notebook. In the tutorial, you used a neural network to detect human faces. That’s not a simple algorithm for sure. However, the structure of the code was simple: most of the time, we just did one thing after the other.

Computer systems don’t always follow a straight path from A to B, like the cells in a notebook. They tend to be tangled networks of files that include code, tests, and data. To write a complex system you might need to manage revisions, share code with other people, debug, write automated tests, and put code on production machines. All those things are hard to do in a computational notebook. Even in our simple tutorial, things quickly became awkward when we started managing errors and running cells out of order.
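To see why running cells out of order gets awkward, here is a minimal sketch in plain Python. It is not how Colab works internally: it only mimics a notebook by running snippets of code against one shared dictionary of variables. The cell names, the `prices` data, and the `run_cells` helper are all made up for this illustration.

```python
# Each "cell" is a snippet of code. All cells share one namespace,
# the way cells in a notebook share one running session.
# (Cell names and data are hypothetical, for illustration only.)
cells = {
    "load": "prices = [10, 20, 30]",
    "scale": "prices = [p * 2 for p in prices]",
    "report": "total = sum(prices)",
}


def run_cells(order):
    """Run the named cells in the given order and return the final total."""
    namespace = {}  # stands in for the notebook's session state
    for name in order:
        exec(cells[name], namespace)
    return namespace["total"]


# The same three cells, executed in three different ways.
top_to_bottom = run_cells(["load", "scale", "report"])
scale_run_twice = run_cells(["load", "scale", "scale", "report"])
scale_skipped = run_cells(["load", "report"])

print(top_to_bottom, scale_run_twice, scale_skipped)
```

The code on the page is identical in all three runs, but the result depends on which cells were executed and how many times. That hidden state is what makes a notebook harder to reason about than a script that always runs from the first line to the last.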

I told you that computational notebooks aren’t the best way to manage code. To be fair, they aren’t the best way to manage text either. You probably don’t want to put loads of text in a notebook, no matter how carefully you format it. That’s why this article is half Medium post, half Colab notebook: when I tried to put all this information in a notebook, it became exhausting to read.

We went through some of the strengths and weaknesses of computational notebooks in general. Now let’s look specifically at Google Colab.

Jack of All Trades, Master of Some

I made an argument that computational notebooks shine when you put code and text together — but they’re not the ideal tool to write either code or text. Does Google Colab fix those shortcomings?

In short: not quite, but it does take a big step in that direction, in particular when it comes to writing and sharing code. For example, Colab comes with features like autocompletion and code inspection: hover your mouse over a piece of code in Colab, and you’ll see its documentation.
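You can get a feel for the kind of information involved by asking Python directly from a code cell. The sketch below is an illustration, not a description of how Colab builds its tooltips. It uses the standard `inspect` module, and the `describe` and `normalize` functions are hypothetical names invented for the example.

```python
import inspect


def describe(obj):
    """Return the signature and the first docstring line of a callable."""
    signature = str(inspect.signature(obj))
    doc = inspect.getdoc(obj) or ""
    summary = doc.splitlines()[0] if doc else "(no documentation)"
    return f"{obj.__name__}{signature}: {summary}"


def normalize(values, scale=1.0):
    """Rescale values so that they sum to `scale`."""
    total = sum(values)
    return [scale * v / total for v in values]


print(describe(normalize))
```

The takeaway is that a one-line docstring already gives an inspection tool something useful to show, so it pays to write one even for throwaway notebook functions.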

Colab also includes a simple debugger and an interactive command-line interpreter.

Finally, my favorite feature: Colab has a simple versioning system that allows you to revert to older revisions of the notebook, give names to versions, and even compare two versions to see what changed.

All in all, Google Colab is still a computational notebook, and it comes with many of the strengths and weaknesses of other notebooks. However, it does make many of the weaknesses less painful. Computational notebooks have a lot of catching up to do if they want to be as convenient as the tools of professional developers — but Colab is certainly a step in the right direction.

Oh, and don’t forget those sweet GPU-backed runtimes! To be honest, that’s probably the killer feature of Colab for most of us. Thanks to those free GPUs, Colab isn’t just a good choice for professional data scientists: it’s also a no-brainer for students, hobbyists, and AI artists. As I write this article, in late 2021, the Internet seems to be bursting with great Colab notebooks that you can copy, tweak, and play with.
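If you want to check whether the runtime you are connected to actually has a GPU, here is a small sketch that uses only the standard library. It assumes that a GPU-backed runtime exposes NVIDIA's `nvidia-smi` command-line tool, which may not hold in every environment. The `gpu_summary` helper is a hypothetical name.

```python
import shutil
import subprocess


def gpu_summary():
    """Return nvidia-smi's list of GPUs, or None if no NVIDIA GPU is visible."""
    # Assumption: a GPU runtime has the nvidia-smi tool on its PATH.
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L lists the GPUs that the driver can see
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        # The tool exists, but it cannot talk to a driver or a GPU.
        return None
    return result.stdout.strip() or None


summary = gpu_summary()
print(summary if summary else "No NVIDIA GPU visible in this runtime.")
```

If the check comes back empty in Colab, the runtime type is probably set to CPU, and you can change it in the notebook's runtime settings.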

If you’re curious, here is a great list of notebooks that you can start from. Have fun!

Check out Programming Machine Learning, my zero-to-hero introduction to machine learning, from the basics to deep learning. Go here for the ebook, here for the paper book, and come to the forum if you have questions or comments!
