Example 3B: Short Format — Tooling

Derek Snow
7 min read · Dec 17, 2019


Screen capture of a website associated with awesome-google-colab on GitHub

Paul Romer, winner of the 2018 Nobel Prize in Economics. Romer has tried to make his work more transparent, and the Jupyter notebook was a good fit for him.

Introduction

Google Colab and Colab-like products such as Azure Notebooks, Kaggle, SageMaker, and DataPlatform Notebooks have brought data storage, processing, and sharing capabilities together in one online platform. I am happy with the description “Jupyter Notebooks in the cloud”. That is where it starts, but it is not where the project ends. Colab has recently added some experimental features such as code auto-completion and code snippets. The Colab team is constantly developing new features based on internal needs and community feedback. I foresee a future where tools like Colab become as important as tools like Excel. Colab is, first of all, revolutionary for the compute power it provides. In the latest round, Colab offers users 24 GB of RAM (after overloading the initial 12 GB), access to GPUs (NVIDIA’s new T4 GPUs), and the ability to connect to Google Drive for storage. As of now, these resources are completely free to anyone with a Google Account.
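To make the Drive and GPU points concrete, here is a minimal sketch of how a notebook cell might check for both. It assumes the `google.colab` package, which is only available inside a Colab runtime, and the `nvidia-smi` tool that ships with NVIDIA drivers; the helper names (`in_colab`, `mount_drive`, `gpu_summary`) and the `/content/drive` mount point are my own choices for illustration, not part of any official API. Outside Colab the helpers simply return None.

```python
import shutil
import subprocess


def in_colab() -> bool:
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only present inside Colab)
        return True
    except ImportError:
        return False


def mount_drive(mount_point: str = "/content/drive"):
    """Mount Google Drive when running in Colab; return the mount point or None."""
    if not in_colab():
        return None
    from google.colab import drive
    drive.mount(mount_point)  # asks the user to authorize access
    return mount_point


def gpu_summary():
    """Return the GPU list reported by `nvidia-smi -L`, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.stdout.strip() or None


if __name__ == "__main__":
    print("Running in Colab:", in_colab())
    print("Drive mounted at:", mount_drive())
    print("GPU:", gpu_summary())
```

In a Colab runtime with a GPU selected, the last line should list the attached card; on a laptop without an NVIDIA driver it prints None instead of failing.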

Examples

The use cases for Colab + Python are broad and include quantitative academic and industry analysis, dashboards, and web apps. See, for example, the use of Colab as a game, torrent tool, password cracker, scientific simulator, medical answer system, video restorer, image segmenter, face swapper, object detection system, API accessor, estimator, predictive maintenance tool, customer value estimator, forecaster, optimizer, classifier, and prediction tool.

Google Colab is as safe as your private Google Docs: no one else can access your private Colab notebooks. Colab as a project is a loss leader that drives adoption of Google’s GCP product and helps, along with TensorFlow, to further enhance Google’s reputation among data scientists and engineers. As a side note, I gladly adopt their other tools; as part of a younger generation, I find that user experience far outweighs other cloud criteria, and on top of that, where GCP lacks functionality Google is quick to catch up. I recently looked for an avenue to share Colab notebooks and could not find any, so I created awesome-google-colab and an unofficial sharing site, google-colab, which pushes the latest notebooks shared by the community to the following channels:

Motivation

Apart from its use for reproducibility, Colab has quickly become one of my favorite IDEs to interact with. Like any new product, it has had a few issues, but over the years most of these have been ironed out. The most important benefits that Colab has over Jupyter Notebooks are free compute and the ease of sharing and collaborating with other tinkerers. The latter is an important step towards the reproducibility of academic research and other quantitative processes. Colaboratory allows you to use and share “Jupyter notebooks” with others without having to download, install, or run anything on your own computer other than a browser.
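One common way to share a notebook is a link that opens a GitHub-hosted file directly in Colab. The sketch below builds such a link and a Markdown badge for a README. It assumes the `colab.research.google.com/github/<user>/<repo>/blob/<branch>/<path>` URL pattern and the badge image hosted by Colab; the function names and the `example-user/example-repo` repository are hypothetical placeholders.

```python
from urllib.parse import quote

# Assumed URL pattern for opening GitHub notebooks in Colab.
COLAB_BASE = "https://colab.research.google.com/github"
BADGE_SVG = "https://colab.research.google.com/assets/colab-badge.svg"


def colab_url(user: str, repo: str, path: str, branch: str = "master") -> str:
    """Build a link that opens a GitHub-hosted notebook in Colab."""
    clean_path = quote(path.lstrip("/"))  # keeps "/" and escapes spaces
    return f"{COLAB_BASE}/{user}/{repo}/blob/{branch}/{clean_path}"


def colab_badge(user: str, repo: str, path: str, branch: str = "master") -> str:
    """Return Markdown for an 'Open In Colab' badge that links to the notebook."""
    link = colab_url(user, repo, path, branch)
    return f"[![Open In Colab]({BADGE_SVG})]({link})"


if __name__ == "__main__":
    # Placeholder repository, used only to show the shape of the output.
    print(colab_url("example-user", "example-repo", "notebooks/demo.ipynb"))
    print(colab_badge("example-user", "example-repo", "notebooks/demo.ipynb"))
```

Pasting the badge line into a project README gives readers a one-click route from the repository to a running copy of the notebook.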

I recently used Google Drive, which lets you open Colab notebooks with a right-click ‘Open with Colaboratory’ function, to create and share files related to asset management in machine learning. It’s a large project with hundreds of models that rely on hours of computing time, which only a few years ago would have required hundreds of dollars of resources. I see a future where books are written in the Colab format. Projects such as the Python Data Science Handbook already make use of Colab’s linked index functionality. I expect many more books to take this format within the next year.

Colab and similar products can help with the reproducibility of code and, more importantly, of the code underlying academic results. From here forward, these free resources not only make code transparency easier but also make unpublished Python code highly suspect. There are no longer any limitations on sharing code and data, nor on accessing the code, the data, and the processing power needed to analyze the results. At this point some might complain about the notebook format. This is not an issue either, because Colab can run Python code that is edited in an online IDE, or code pushed to a drive and executed in Colab’s shell environment. If you want to pay for resources, I suggest you make use of AI Platform Notebooks, a platform for serious enterprise users with heavier requirements. For 99% of users, Colab should do the trick and additional resources are not needed. I have included the following extract from GitHub, which shows repositories that actively make use of Colab; hopefully this will inspire you to add Colab functionality to your GitHub projects.

awesome-google-colab

Course and Tutorial

· Python Data Science Notebook — Python Data Science Handbook: full text in Jupyter Notebooks

· ML and EDA — Functional, data science centric introduction to Python.

· Python Business Analytics — Python solutions to solve practical business problems.

· Deep Learning Examples — Try out deep learning models online on Google Colab

· Hvass-Labs — TensorFlow Tutorials with YouTube Videos

· MIT deep learning — Tutorials, assignments, and competitions for MIT Deep Learning related courses.

· NLP Tutorial — Natural Language Processing Tutorial for Deep Learning Researchers

· DeepSchool.io — Deep Learning tutorials in jupyter notebooks.

· Deep NLP Course — A deep NLP Course

· pyprobml — Python code for “Machine learning: a probabilistic perspective”

· MIT 6.S191 — Lab Materials for MIT 6.S191: Introduction to Deep Learning

· HSE NLP — Resources for “Natural Language Processing” Coursera course

· Real World NLP — Example code for “Real-World Natural Language Processing”

· Notebooks — Machine learning notebooks in different subjects optimized to run in Google Colaboratory

Text

· BERT — TensorFlow code and pre-trained models for BERT

· XLNet — XLNet: Generalized Autoregressive Pretraining for Language Understanding

· DeepPavlov Tutorials — An open source library for deep learning end-to-end dialog systems and chatbots.

· TF NLP — Projects, Practice, NLP, TensorFlow 2, Google Colab

· SparkNLP — State of the Art Natural Language Processing

· Deep Text Recognition — Text recognition (optical character recognition) with deep learning methods.

· BERTScore — Automatic Evaluation Metric based on BERT.

· Text Summarization — Multiple implementations for abstractive text summarization

· GPT-2 Colab — Retrain gpt-2 in colab

Image

· DeepFaceLab — DeepFaceLab is a tool that utilizes machine learning to replace faces in videos.

· CycleGAN and PIX2PIX — Image-to-Image Translation in PyTorch

· DeOldify — A Deep Learning based project for colorizing and restoring old images (and video!)

· Detectron2 — Detectron2 is FAIR’s next-generation research platform for object detection and segmentation.

· EfficientNet — PyTorch — A PyTorch implementation of EfficientNet

· Faceswap GAN — A denoising autoencoder + adversarial losses and attention mechanisms for face swapping.

· Neural Style Transfer — Keras Implementation of Neural Style Transfer from the paper “A Neural Algorithm of Artistic Style”

· Compare GAN — Compare GAN code

· hmr — Project page for End-to-end Recovery of Human Shape and Pose

Voice

· Spleeter — Deezer source separation library including pretrained models.

· TTS — Deep learning for Text to Speech

Reinforcement Learning

· Dopamine — Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

· Sonnet — TensorFlow-based neural network library

· OpenSpiel — Collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

· TF Agents — TF-Agents is a library for Reinforcement Learning in TensorFlow

· bsuite — Collection of carefully-designed experiments that investigate core capabilities of a reinforcement learning (RL) agent

· TF Generative Models — Implementations of a number of generative models in TensorFlow

· DQN to Rainbow — A step-by-step tutorial from DQN to Rainbow

Visualisation

· Altair — Declarative statistical visualization library for Python

· Altair Curriculum — A data visualization curriculum of interactive notebooks.

· bertviz — Tool for visualizing attention in the Transformer model

· TF Graphics — TensorFlow Graphics: Differentiable Graphics Layers for TensorFlow

· deepreplay — Generate visualizations as in my “Hyper-parameters in Action!”

Operational

· PySyft — A library for encrypted, privacy preserving machine learning

· Mindsdb — Framework to streamline use of neural networks

· Ranking — Learning to Rank in TensorFlow

· TensorNetwork — A library for easy and efficient manipulation of tensor networks.

· JAX — Composable transformations of Python+NumPy programs

· BentoML — A platform for serving and deploying machine learning models

Other

· Transfer learning NLP — Code for the tutorial on Transfer Learning in NLP held at NAACL 2019

· BDL Benchmarks — Bayesian Deep Learning Benchmarks

Finance

· RLTrader — A cryptocurrency trading environment using deep reinforcement learning and OpenAI’s gym

· TF Quant Finance — High-performance TensorFlow library for quantitative finance.

· TensorTrade — An open source reinforcement learning framework for robust trading agents

Artistic

· Rapping NN — Rap song writing recurrent neural network trained on Kanye West’s entire discography

· dl4g — Deep Learning for Graphics

Medical

· DocProduct — Medical Q&A with Deep Language Models


Thanks for reading. For updates on new projects, follow FirmAI on LinkedIn, and for further tips and tricks see www.google-colab.com.
