My ML Development Workflow

Raghuram Rayaprolu
3 min read · Oct 7, 2020


This is a workflow I have started using to experiment with machine learning models on Google’s Colab platform. Colab’s free resources (CPU, GPU and TPU) are handy for someone like me who is new to the ML space and whose basic processing needs do not justify paying for a subscription or setting up dedicated hardware.

I found the two detailed articles below very useful for understanding the approach:

  1. https://medium.com/r/?url=https%3A%2F%2Ftowardsdatascience.com%2Fgoogle-drive-google-colab-github-dont-just-read-do-it-5554d5824228
  2. https://medium.com/r/?url=https%3A%2F%2Ftowardsdatascience.com%2Fcolaboratory-drive-github-the-workflow-made-simpler-bde89fba8a39

However, I don’t see the need to pull the repository / codebase into the Colab runtime as they describe. Google Drive offers convenient persistent storage and can be accessed from a Colab notebook.

Below are the steps I followed to make this work for me:

Setup

Step 1: Open Google Colab and create an empty notebook

Step 2: Save the notebook in Google Drive using the ‘Save a copy in Drive’ option from the ‘File’ menu

Step 3: Use the code below to mount Google Drive and access it from the notebook

def setupGdrive(location):
    from google.colab import drive
    from os.path import join
    ROOT = '/content/drive'  # default mount point for the drive
    PROJ = 'My Drive/' + location
    drive.mount(ROOT)  # we mount the drive at /content/drive
    PROJECT_PATH = join(ROOT, PROJ)
    #%cd "{PROJECT_PATH}"  # uncomment to change the directory
    return PROJECT_PATH

PROJECT_PATH = setupGdrive('<folder of your choice>')

Step 4: Once the above is executed, the saved file (in step 2) can be seen at /content/drive/My Drive/Colab Notebooks/
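To confirm that the mount worked, one option is to list the notebooks in that folder. The snippet below is only a minimal sketch: list_notebooks is a hypothetical helper name (it is not part of Colab), and the path in the usage comment assumes the default location mentioned in step 4.

```python
from pathlib import Path


def list_notebooks(folder):
    """Return the sorted names of the .ipynb files directly inside `folder`."""
    folder = Path(folder)
    if not folder.is_dir():
        # Usually means the drive is not mounted yet or the path is misspelled.
        raise FileNotFoundError(f"Folder not found: {folder}")
    return sorted(p.name for p in folder.glob("*.ipynb"))


# In Colab, after mounting (path assumed from step 4):
# list_notebooks('/content/drive/My Drive/Colab Notebooks')
```

If the call raises FileNotFoundError, the mount step most likely has not been run in the current runtime.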

Regular Use

Step 1: Open Google Drive in a browser

Step 2: Locate the notebook, right-click it and select ‘Open with’ > ‘Google Colaboratory’

Step 3: Run the code provided in Step 3 of Setup. Google Drive is then mounted and ready for use just like any other folder

A Few Tips

  • Location of notebooks: I prefer to work out of a custom location rather than the default ‘Colab Notebooks’ folder, where files are saved when using ‘Save a copy in Drive’. So I generally move those files manually to another folder through the Google Drive UI and reopen them.
  • Persistent Data Storage: For all persistent storage needs, I use Google Drive. Most often I create a data folder in the same location as the notebook (I always create a separate subfolder for each project). I uncomment the %cd line in the code shared above (step 3 of Setup) so that the working directory changes to the project folder, and I can then refer to all the components (other notebooks, Python files, etc.) and the data using relative paths.
  • Temporary Data Storage: Sometimes the data is temporary in nature and I don’t want it saved in my Google Drive, clogging the free space I get from Google. As this is most often readily available data, I download it while executing my code. I save it under the absolute path ‘/content/’, which is cleaned up along with the runtime.
  • GitHub access: This is one of the primary reasons why I move my notebooks to a separate folder. I generally sync the notebooks to GitHub through a local repository that I maintain in Google Drive. Below are a few snippets of code I find useful when dealing with GitHub from the Colab runtime.
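Before turning to GitHub, the two storage tips above (persistent data in a data folder next to the notebook, temporary data under /content/) can be sketched roughly as follows. The helper names project_dirs and fetch_temp and the tmp_data folder name are my own assumptions for illustration; only the ‘data’ folder and the ‘/content/’ path come from the tips themselves.

```python
from pathlib import Path
import urllib.request


def project_dirs(project_path, scratch_root="/content"):
    """Return (data_dir, tmp_dir) and create both folders if they are missing.

    data_dir lives inside the project folder on Drive, so it persists.
    tmp_dir lives under the runtime's scratch area and vanishes with the runtime.
    """
    data_dir = Path(project_path) / "data"
    tmp_dir = Path(scratch_root) / "tmp_data"  # folder name is an assumption
    data_dir.mkdir(parents=True, exist_ok=True)
    tmp_dir.mkdir(parents=True, exist_ok=True)
    return data_dir, tmp_dir


def fetch_temp(url, tmp_dir, filename):
    """Download `url` into tmp_dir unless a file with that name is already there."""
    target = Path(tmp_dir) / filename
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target


# In Colab, with PROJECT_PATH returned by setupGdrive:
# data_dir, tmp_dir = project_dirs(PROJECT_PATH)
```

Skipping the download when the file already exists avoids fetching the same data again each time a cell is re-run within one runtime.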

General params for GitHub

GIT_USERNAME = "<git user name>" 
GIT_TOKEN = "<your git token>"
GIT_REPOSITORY = "<git repository>"
GIT_PATH = "https://"+GIT_TOKEN+"@github.com/"+GIT_USERNAME+"/"+GIT_REPOSITORY+".git"
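Because the token ends up inside the URL, and the notebook itself is synced to GitHub, a token typed directly into a cell could be committed by accident. One possible variation, offered as a sketch rather than the approach used above, is to build the same URL in a small function and prompt for the token at run time. The name build_git_url is hypothetical; quote and getpass are from the Python standard library.

```python
from urllib.parse import quote


def build_git_url(username, token, repository):
    """Build an HTTPS remote URL of the same form as GIT_PATH above."""
    # quote(..., safe="") percent-encodes characters that are not URL-safe,
    # such as '@' or '/', so they cannot break the URL structure.
    return (
        "https://" + quote(token, safe="")
        + "@github.com/" + quote(username, safe="")
        + "/" + quote(repository, safe="") + ".git"
    )


# Hypothetical usage in a notebook: prompt for the token instead of saving it.
# from getpass import getpass
# GIT_PATH = build_git_url("<git user name>", getpass("GitHub token: "), "<git repository>")
```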

New Repo Initialization

! git clone $GIT_PATH

Push changes to remote (GitHub)

Commit_Comments = '<your version comments>'
!git status
!git add .
!git commit -m "$Commit_Comments"
!git push
!git log
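The same sequence can also be driven from plain Python, which may be useful inside a .py file where the notebook’s ! syntax is not available. This is a minimal sketch under that assumption: git_sync_commands and run_git are hypothetical helper names, and repo_dir stands for the folder of the local repository kept in Google Drive.

```python
import subprocess


def git_sync_commands(message):
    """Return the git commands from the snippet above as argument lists."""
    return [
        ["git", "status"],
        ["git", "add", "."],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def run_git(commands, repo_dir):
    """Run each command inside repo_dir, stopping at the first failure."""
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit code;
        # note that `git commit` exits non-zero when there is nothing to commit.
        subprocess.run(cmd, cwd=repo_dir, check=True)


# Hypothetical usage:
# run_git(git_sync_commands("<your version comments>"), "<path to local repository>")
```

Passing the message as its own list element means quotes or spaces in the commit comment need no shell escaping.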
