Training Turi Create models in Google Colab
Training an image classification model using an Nvidia Tesla T4 GPU on Google Colab.
In late 2017, Google released one of its internal AI-development tools to the public: Colaboratory. Combining FREE cloud-based compute with shareable Jupyter notebooks, Colab strikes a resonant chord in the machine learning community because of how easy it makes things for literally anyone with an internet browser to play. When I first learned of the tool, my jaw dropped.
Pair that with the groundbreaking work Apple has done to make machine-learning-powered iOS apps a reality through its CoreML and Turi Create frameworks, and our community (developers, “data curious” engineers, data scientists) has a high ceiling in the coming years.
Turi Create + Google Colab
Just because these two tech giants have paved the way for developers to get started with building machine learning models doesn’t mean everything is a walk in the park. Getting Turi Create to train a model, with a GPU, on Colab was no small feat. In the rest of this post I share how I conquered the beast and trained an Image Classification model on a Tesla T4 GPU. At the end, I link you directly to the code so you can play around for yourself.
The Challenges
- Under the hood, Turi Create leverages both CUDA (Nvidia’s GPU computing platform) and mxnet-cuXX (the CUDA-enabled build of Apache’s MXNet deep learning framework) to train models on GPUs.
- Colab comes with CUDA 10 pre-installed at the time of this writing.
- Turi Create recommends using CUDA 8 and mxnet-cu80==1.1.0 for its GPU workloads. Check out this resource from their GitHub; the gist is sketched just below.
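For reference, Turi Create’s documented GPU setup boils down to swapping the default CPU build of MXNet for the CUDA 8 build (paraphrasing the linked GitHub doc, which assumes CUDA 8 is already on the machine):
# Turi Create's recommended GPU setup, assuming CUDA 8 is installed
!pip uninstall -y mxnet
!pip install mxnet-cu80==1.1.0
You can already see the tension: those instructions expect CUDA 8, but Colab ships CUDA 10.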
You could try to uninstall CUDA 10, reinstall all the right drivers and toolkits for CUDA 8, and then see if you can get things running.
That’s an option for sure! Google gives you some pretty serious control over the instance you are on. Personally, I went down that path and spent the better part of 2–3 days researching the right Debian packages and libraries I needed to replace.
Here’s a useful starting point for going down that route: https://medium.com/@nickzamosenchuk/training-the-model-for-ios-coreml-in-google-colab-60-times-faster-6b3d1669fc46
However, Google had already changed the runtime environment significantly since that article was written. I was not able to get Turi Create to properly use the GPU with CUDA 8 and mxnet-cu80. Each time I got to the model training portion, my runtime would crash due to “exploding” RAM. This seemed strange because the data I was working with was fairly small, roughly 500 MB. It was hard to diagnose because Google doesn’t expose how jobs are scheduled on their GPU nodes, nor how GPU memory is shared and allocated.
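If you hit similar crashes, it can at least help to see what the runtime reports. These are standard shell commands you can run from any Colab cell (my own debugging habit, not from any official guide):
# Show the GPU model, driver version, and current GPU memory usage
!nvidia-smi
# Show system RAM available on the instance
!free -h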
A Solution
Contrary to documentation, I was able to train a Turi Create model on a Colab Python 3 + GPU runtime by using the default CUDA 10:
# Check the CUDA version
!nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
and installing the corresponding mxnet-cu100 library with pip:
!pip install turicreate==5.4
# turicreate installs the CPU-only build of mxnet by default,
# which is the wrong version for a CUDA 10 GPU
!pip uninstall -y mxnet
# Install the CUDA 10-compatible build of mxnet
!pip install mxnet-cu100==1.4.0.post0
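Before moving on, it’s worth a quick sanity check that this build of MXNet can actually see the GPU. A minimal check of my own (not from the Turi Create docs):
import mxnet as mx

# Allocating a tiny array on the GPU fails fast if the CUDA/mxnet pairing is wrong
try:
    _ = mx.nd.ones((1,), ctx=mx.gpu(0)).asnumpy()
    print('mxnet-cu100 can see the GPU')
except mx.MXNetError:
    print('MXNet cannot use the GPU; re-check the CUDA and mxnet versions')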
When you begin the data preparation and model training portion, you will need to import the turicreate library and set the number of GPUs in the config:
import turicreate as tc

# Tell turicreate to use ALL available GPUs
tc.config.set_num_gpus(-1)
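From there, the rest of the workflow is standard Turi Create. Here is a minimal sketch of the image classification flow, assuming a hypothetical images/ folder with one sub-folder per class (all names and paths here are my own placeholders):
import os
import turicreate as tc

tc.config.set_num_gpus(-1)

# Hypothetical layout: images/cats/*.jpg, images/dogs/*.jpg, etc.
data = tc.image_analysis.load_images('images/', with_path=True)
# Derive each image's label from its parent folder name
data['label'] = data['path'].apply(lambda p: os.path.basename(os.path.dirname(p)))

# Hold out a slice of the data to sanity-check the model
train, test = data.random_split(0.8)

# Trains on the GPU thanks to set_num_gpus(-1) above
model = tc.image_classifier.create(train, target='label')

print(model.evaluate(test)['accuracy'])

# Export the CoreML artifact for use on iOS
model.export_coreml('ImageClassifier.mlmodel')
The export_coreml call at the end is what produces the .mlmodel artifact referenced in the next section.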
Slightly surprising, yet also encouraging, this solution paved the way to FREE GPU-accelerated model training in the cloud. Here is a link to the image classification model training code, hooked right up to Colab. Additionally, here is a repo of other example models that you could train on Colab.
Skafos.ai for Delivery
You now have a CoreML artifact trained with Turi Create in Colab. If integrated properly, this image classification model can run on an iOS device. Skafos.ai is an excellent solution for managing model versions and delivering updates over-the-air without re-submitting apps to the App Store. Check it out here.