Machine Learning: Google Colab - Why, When and How to Use It
Machine Learning (ML) is one of the most prominent trends in Computer Science, and anyone with an inclination towards Data Science seems to be learning about ML and how to apply it to solve real-world problems or to create a marketable product.
With the release of programming libraries such as Keras, scikit-learn, TensorFlow and PyTorch, to name a few, developing ML models has become much simpler. However, people still struggle to set up coding environments that can handle these libraries, and many have computers that take a very long time to run machine learning operations on even moderately large datasets.
If you are facing these problems, or are just looking for a better environment for ML coding, fear not, for Google has your back!
What is Google Colab? Why Should You Use It?
In 2018, Google launched a platform called ‘Google Colaboratory’ (commonly known as ‘Google Colab’ or just ‘Colab’). Colab is a cloud-based platform built on the Jupyter Notebook framework and designed mainly for ML and deep learning work. Several features set it apart from other coding environments.
One of the main benefits of Colab is that most of the common machine learning libraries, such as TensorFlow, Keras, scikit-learn, OpenCV, NumPy and pandas, come pre-installed. This means you can open a notebook and start coding without setting anything up. Libraries that are not pre-installed can be installed using standard terminal commands. The syntax of the command stays the same, but you must add an exclamation mark (!) at the start so that the notebook treats the line as a terminal command rather than Python code.
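As a rough illustration: in a Colab cell you would type a shell command such as `!pip install some-package` (the package name is a placeholder). Because the `!` form only works inside a notebook, the sketch below uses plain Python to do something roughly equivalent. It checks whether a library is already importable and, if not, builds the pip command you would run. The helper names `is_installed` and `pip_install_command` are hypothetical and not part of Colab.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the module can be imported in this runtime."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package):
    """Build a pip command roughly equivalent to `!pip install <package>`.

    Using sys.executable targets the same Python the notebook is running.
    """
    return [sys.executable, "-m", "pip", "install", package]


# 'json' ships with Python, so it is always available.
print(is_installed("json"))

# A made-up module name, used only to show the "missing" case.
missing = "surely_not_a_real_module_xyz"
if not is_installed(missing):
    # In a notebook cell you would instead write: !pip install <package>
    print("python " + " ".join(pip_install_command(missing)[1:]))
```

Checking before installing avoids re-running an install for the many libraries Colab already provides.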
Another advantage is that the Colab environment is independent of the computing power of your own computer. Since it is a cloud-based system, as long as you have internet connectivity you can run even heavy machine learning operations from a relatively old computer that would not ordinarily be able to handle them locally. Additionally, Google offers a GPU (Graphics Processing Unit) and a TPU (Tensor Processing Unit) for free. These hardware accelerators let you run heavy machine learning operations on large datasets much faster than a typical local machine without such hardware.
While Colab allows you to upload files to the runtime each time you use it, I find that re-uploading large datasets every time you restart your runtime can be frustrating. Colab offers a simple alternative: you can ‘mount’ Google Drive onto your notebook. This takes just two lines of code, which Colab inserts for you at the click of a button, and lets you read files that you have uploaded to Google Drive. You upload them once and then access them after any runtime restart simply by mounting Google Drive.
The figure alongside shows the buttons to upload your files or mount your Google Drive. The code required to mount your Google Drive is given later in the article.
Seamless Collaboration and Access
Like Google’s other online document editing platforms (Google Docs, Google Slides, Google Sheets, etc.), Colab offers sharing options that let you collaborate seamlessly with others on joint coding projects. One thing to keep in mind is that when you share a notebook, collaborators get the notebook but not your runtime: the variables and state produced by code you executed are not shared with them, so they need to run the cells themselves. Likewise, files that you upload from your computer to the runtime are not visible to other collaborators, so it is better to upload those files to Google Drive and access them from there so that everyone can see and use them.
If you have spent some time coding in a local environment, you can upload those files to Colab and continue working there. Similarly, you can always download your Colab notebooks in either the .py or the .ipynb format. You can also save your notebooks to Google Drive or directly to GitHub. Colab has an autosave feature similar to the one in Google Docs, Slides, etc., but you can save notebooks manually as well.
Tips and Tricks for Using Colab
When you first open Google Colab, you can immediately access a few pre-written tutorial notebooks to get started. These can be very useful if you have little or no prior experience with ML. You can also open a new notebook (Python 3 is supported) or return to your recent notebooks. Alternatively, if you are going through the basic tutorials on the TensorFlow or PyTorch websites, links to Colab versions of them are often provided on the sites themselves.
Once you have opened your notebook, click the ‘Connect’ button at the top right to attach it to a runtime before executing code. This step is optional: if you run a cell without connecting first, Colab automatically connects to a runtime and then executes the code.
In the toolbar at the top left, click on ‘Runtime’, select ‘Change Runtime Type’, and then choose a hardware accelerator (GPU or TPU) if you need more computational resources and speed than the standard CPU runtime offers. For most deep learning workloads the GPU is significantly faster than the default CPU runtime, and the TPU can be faster still for models suited to it, so these accelerators let you work with large volumes of data much more quickly without spending large sums of money on hardware of your own.
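After changing the runtime type, it can be worth confirming that your code actually sees the accelerator. The sketch below is one way to do that, assuming PyTorch or TensorFlow is installed; the function name `detect_accelerator` is hypothetical. It only checks for a GPU, since TPU detection depends on the framework and version, and it falls back to reporting the CPU if neither library is available.

```python
def detect_accelerator():
    """Return 'gpu' if PyTorch or TensorFlow can see a GPU, else 'cpu'."""
    try:
        import torch
        if torch.cuda.is_available():
            return "gpu"
    except ImportError:
        pass  # PyTorch is not installed; try TensorFlow next.

    try:
        import tensorflow as tf
        if tf.config.list_physical_devices("GPU"):
            return "gpu"
    except ImportError:
        pass  # TensorFlow is not installed either.

    return "cpu"


print(f"Running on: {detect_accelerator()}")
```

If this prints `cpu` after you selected a GPU runtime, the runtime change may not have taken effect yet.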
Keep in mind that when you restart your runtime you lose all the saved variable states and you will have to re-execute all the code that you have executed so far.
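One way to soften the cost of a restart is to write expensive results to disk and reload them instead of recomputing. The following is a minimal sketch using Python's `pickle` module. The helper names, the file name and the example values are all hypothetical, and a temporary directory stands in for a location that outlives the session, such as a folder in a mounted Google Drive.

```python
import pickle
import tempfile
from pathlib import Path


def save_state(state, path):
    """Write a dictionary of variables to disk."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(state, f)


def load_state(path, default=None):
    """Read the saved variables back, or return `default` if none exist."""
    if not path.exists():
        return default
    with path.open("rb") as f:
        return pickle.load(f)


# A temporary directory stands in for a persistent folder here.
checkpoint = Path(tempfile.mkdtemp()) / "checkpoint.pkl"

# Illustrative values only, not real results.
save_state({"epoch": 3, "best_accuracy": 0.91}, checkpoint)

# After a restart, reload the saved values instead of recomputing them.
restored = load_state(checkpoint, default={})
print(restored["epoch"])
```

Only load pickle files that you created yourself, since unpickling untrusted data can execute arbitrary code.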
In order to mount Google Drive, you can either write the following code:
from google.colab import drive
drive.mount('/content/gdrive')
Or you can click on the ‘Mount Drive’ button on the left. You will be shown a link to click on. After you grant permission from your Google account (this might require signing in), you will be given an authorization code to paste into the field provided in Colab, which verifies that you have allowed access to your Google Drive files.
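Once Drive is mounted, your files behave like ordinary files under the mount point. The sketch below assumes the `/content/gdrive` mount point used in the code above. The `MyDrive` folder name, the helper names and the `datasets/train.csv` file are assumptions for illustration, so check the file browser in your own notebook for the actual layout.

```python
from pathlib import Path

# Same mount point as passed to drive.mount(...).
MOUNT_POINT = Path("/content/gdrive")


def drive_path(relative, drive_folder="MyDrive"):
    """Build the full path to a file stored in the mounted Google Drive."""
    return MOUNT_POINT / drive_folder / relative


def is_drive_mounted():
    """Return True if the mount point exists as a directory."""
    return MOUNT_POINT.is_dir()


csv_path = drive_path("datasets/train.csv")  # hypothetical file
if is_drive_mounted() and csv_path.exists():
    with csv_path.open() as f:
        print(f.readline().strip())  # show the first line of the file
else:
    print(f"Not found (is Drive mounted?): {csv_path}")
```

Keeping the mount point in one constant means collaborators only need to change a single line if they mount Drive somewhere else.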
You can also toggle between light mode and dark mode, and adjust the settings in the Command Palette to change how your code looks. You can also choose whether to display line numbers next to the code.
If you want to document your code well, beyond simply inserting comments, you can insert entire text cells into the notebook, just as you can in Jupyter Notebooks.
Google Colab is one of the most useful free tools for people looking to venture into Data Science. Autosaved code linked with your Google Drive, coupled with its pièce de résistance, the free GPU and TPU, makes Colab a user-friendly platform that newcomers to Data Science will find very useful. For further info, you can visit Google’s FAQ page.