My ML Development Workflow
This is a workflow I have started using to experiment with Machine Learning models on Google’s Colab platform. The free resources (CPU, GPU & TPU) of Colab are handy for someone like me who is new to the ML space and has basic processing power needs that do not justify investing in paid subscriptions or my own hardware setup.
I found the following two detailed articles very useful for understanding the approach:
- https://towardsdatascience.com/google-drive-google-colab-github-dont-just-read-do-it-5554d5824228
- https://towardsdatascience.com/colaboratory-drive-github-the-workflow-made-simpler-bde89fba8a39
However, I don’t see the need to pull the repository / codebase into the Colab runtime as described there. Google Drive offers convenient persistent storage and can be accessed directly from a Colab notebook.
Below are the steps I followed to make this work for me:
Setup
Step 1: Open Google Colab and create an empty notebook
Step 2: Save the notebook in Google drive using ‘Save a copy in Drive’ option from ‘File’ menu
Step 3: Use the code below to connect Google Drive and access it from the notebook
def setupGdrive(location):
    from google.colab import drive
    from os.path import join
    ROOT = '/content/drive'          # default mount point for the drive
    PROJ = 'My Drive/' + location    # project folder inside the drive
    drive.mount(ROOT)                # we mount the drive at /content/drive
    PROJECT_PATH = join(ROOT, PROJ)
    #%cd "{PROJECT_PATH}"            # uncomment to change the working directory
    return PROJECT_PATH

PROJECT_PATH = setupGdrive('<folder of your choice>')
Step 4: Once the above is executed, the notebook saved in Step 2 can be seen at /content/drive/My Drive/Colab Notebooks/
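To confirm the mount actually worked, a quick sanity check like the one below can be run in the next cell. This is only a sketch: it assumes the default save location, and the listing will of course differ for your Drive.

```python
import os

def check_mount(project_path):
    """Return True if the Drive project folder is visible on the filesystem."""
    return os.path.isdir(project_path)

# Default location used by 'Save a copy in Drive' (assumes the mount above succeeded)
PROJECT_PATH = '/content/drive/My Drive/Colab Notebooks'

if check_mount(PROJECT_PATH):
    print('Drive mounted, notebooks:', os.listdir(PROJECT_PATH))
else:
    print('Drive not mounted yet - run the setup code first')
```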
Regular Use
Step 1: Open Google Drive from browser
Step 2: Locate the notebook and right click to select Open with Google Colaboratory
Step 3: Run the code provided in Step 3 of Setup; once it finishes, Google Drive is mounted and ready for use just like any other folder
A Few Tips
- Location of notebooks: I prefer to work out of a custom location rather than the default ‘Colab Notebooks’ folder where files are saved when using ‘Save a copy in Drive’. So I generally move those files manually to another folder through the Google Drive UI and reopen them from there.
- Persistent Data Storage: For all persistent storage needs, I use Google Drive. Most often I create a data folder in the same location as the notebook (as I always create a separate subfolder for each project). I uncomment the %cd line in the code shared above (Step 3 of Setup) so that it changes the working directory; I can then refer to all the components (other notebooks, Python files, etc.) along with the data using relative paths.
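The effect of that %cd line can be sketched with plain Python, using a temporary directory as a stand-in for the Drive project folder (the layout below is hypothetical):

```python
import os
import tempfile

# Stand-in for the Drive project folder (in Colab this would be PROJECT_PATH):
#   project/
#     data/input.csv
project = tempfile.mkdtemp()
os.makedirs(os.path.join(project, 'data'))
with open(os.path.join(project, 'data', 'input.csv'), 'w') as f:
    f.write('a,b\n1,2\n')

# Equivalent of uncommenting the %cd line: make the project the working directory
os.chdir(project)

# Data and code are now reachable via relative paths
with open(os.path.join('data', 'input.csv')) as f:
    header = f.read().splitlines()[0]
print(header)  # a,b
```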
- Temporary Data Storage: Sometimes the data is temporary in nature and I don’t want it saved in my Google Drive, clogging the free space I get from Google. As this is most often readily available data, I download it while executing my code. I save it under the absolute path ‘/content/’, which gets cleaned up along with the runtime.
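A small helper along these lines keeps the distinction explicit; the function name and defaults are my own invention, and anything written outside /content/drive disappears when the Colab VM is recycled:

```python
import os

def save_temp(payload, name, base='/content'):
    """Write transient bytes under the runtime-local path (discarded with the VM)."""
    os.makedirs(base, exist_ok=True)
    path = os.path.join(base, name)
    with open(path, 'wb') as f:
        f.write(payload)
    return path
```

In a Colab cell, something like save_temp(downloaded_bytes, 'dataset.zip') would then land in /content/ rather than in Drive.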
- GitHub access: This is one of the primary reasons why I move my notebooks to a separate folder. I generally sync the notebooks up to GitHub through a local repository that I maintain in Google Drive. Below are a few snippets of code I find useful when dealing with GitHub from the Colab runtime
General params for GitHub
GIT_USERNAME = "<git user name>"
GIT_TOKEN = "<your git token>"
GIT_REPOSITORY = "<git repository>"
GIT_PATH = "https://"+GIT_TOKEN+"@github.com/"+GIT_USERNAME+"/"+GIT_REPOSITORY+".git"
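Since GIT_PATH embeds the personal access token, it is worth redacting it before printing or logging the remote URL in a shared notebook. A minimal sketch, with hypothetical placeholder values:

```python
GIT_USERNAME = "octocat"        # hypothetical values, for illustration only
GIT_TOKEN = "ghp_exampletoken"
GIT_REPOSITORY = "demo-repo"
GIT_PATH = "https://" + GIT_TOKEN + "@github.com/" + GIT_USERNAME + "/" + GIT_REPOSITORY + ".git"

def masked(url, token):
    """Redact the token before printing or logging the remote URL."""
    return url.replace(token, '****')

print(masked(GIT_PATH, GIT_TOKEN))  # https://****@github.com/octocat/demo-repo.git
```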
New Repo Initialization
! git clone $GIT_PATH
Push changes to remote (GitHub)
Commit_Comments = '<your version comments>'
!git status
!git add .
!git commit -m "$Commit_Comments"
!git push
!git log