How to run Deep Learning models on Google Cloud Platform in 6 steps

Abhinaya Ananthakrishnan
Google Cloud - Community
9 min read · Dec 12, 2018

Using the Deep Learning Virtual Machine

Google Cloud Platform is Google's suite of cloud computing services, which you can leverage to build large-scale solutions. The platform has recently gained a lot of popularity because of its easy access to GPUs. Additionally, Google gives you $300 worth of free credits, valid for one year, which, depending on the kind of processing you need to do, can last the full year.

However, working with raw instances can get really challenging, since there isn't a proper GUI and many packages have to be installed by hand. In this blog I am going to walk through an easy way to deploy a marketplace solution for running Deep Learning models. This setup also gives you a Jupyter Notebook GUI that can be used to view quick results.

Step 1: Set up a Google Cloud Account

The first thing you need to do is set up a Google Cloud account.

Go to https://cloud.google.com/ and sign in using your Gmail account. If you have a school or organization account it may lead to some collaboration issues in the future, so I would strongly recommend creating an account using your personal Gmail account. If you don't have a Gmail account, now may be a good time to create one.

While signing in, Google will ask you to share your credit card details. You can put them in without worry: your credit card won't be charged unless you have used up all of your $300 worth of credits.

Step 2: Create a project

Once you have signed in, it will take you to a console screen which should look something like this.

Console with Project

If a project isn't assigned to you automatically, go ahead and create one. Click the "Create New Project" icon on the banner and create a new project. The project ID is assigned automatically; however, you can change it if you like. I decided to stay with the default project ID assigned to me.
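If you prefer the command line, the same thing can be done with the Cloud SDK (installation is covered in Step 3). A minimal sketch, where my-dl-project-1234 is a hypothetical project ID of your choosing:

gcloud projects create my-dl-project-1234    # project IDs must be globally unique
gcloud config set project my-dl-project-1234 # make it the default for later commands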

Step 3: Deploy Deep Learning Virtual Machine

Now that you have an account and project, you can deploy a marketplace solution.

Your billing will only start once you deploy the solution.

To set up a deep learning marketplace solution, search for “Deep Learning VM” in the search bar. This should take you to the landing page for “Deep Learning VM”.

Deep Learning VM Landing Page

The advantage of using the Deep Learning VM is that we don't have to install Python or TensorFlow, since they come as part of a pre-packaged image developed by Google. Once you're on this page, just hit the "Launch Compute Engine" button. This page also shows the number of past deployments you have for this engine. It is 3 in my case.

Once you launch the compute engine, you will be taken to the configuration page, where you can set a name for the environment, select the zone for the machines and select the number of CPUs and GPUs you would want.

New Deep Learning VM Deployment Page

It is important to note the zone you select for your deployment, since the machine configurations available to you depend on it. For example, some zones have restrictions on the number of CPUs and GPUs you can access.

Depending on the kind of machine you choose, the billing estimate on the right-hand side changes accordingly.

Billing for Deep Learning VM

For example, if you choose 16 CPUs and 0 GPUs, you can see that you will be charged $392.36 per month if you use 730 hours per month. If you hit "Details" it will give you the breakdown of the billing. Generally, GPUs are more expensive than CPUs, so if you don't need GPUs, it is better to skip them altogether. You also need to request GPU quota in the zone of your deployment (which I will talk about in detail in Step 5).

For now, choose Zone: us-west1-b, 16 CPUs and GPUs as "None". The next thing to choose is the size of your hard drive. "Standard Persistent Disk" should be good for any project, but you can expand the disk size if you have a lot of data. Keep in mind that larger disk sizes lead to bigger bills, so it is best to be parsimonious about the requirements. They can always be modified later if required (covered in Step 6).

Once you have selected the config, hit "Deploy". Based on your selection, it may take 5 to 10 minutes for the deployment to set up. If you get an error after deploying, check whether you selected GPUs by mistake. Selecting GPUs without an approved quota will cause the deployment to fail. Just create another deployment without any GPUs and you should be good to go.
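For reference, a similar instance can also be created from the command line with Google's Deep Learning VM images. A minimal sketch, assuming the hypothetical instance name my-dl-vm and the tf-latest-cpu image family (image family names change over time, so check the Deep Learning VM documentation for the current ones):

gcloud compute instances create my-dl-vm \
    --zone=us-west1-b \
    --machine-type=n1-standard-16 \
    --image-family=tf-latest-cpu \
    --image-project=deeplearning-platform-release \
    --boot-disk-size=100GB    # remember: bigger disks mean bigger bills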

Now there are 3 different ways of running code on this VM. The easiest is the Jupyter Notebook GUI, which you reach at localhost:8080 on your machine through an SSH tunnel. To access it, you need to install the Google Cloud SDK and SSH into the VM.

You can install the Google Cloud SDK from https://cloud.google.com/sdk. Initialize the SDK and connect it to the Google account and project that you created. Initialization should start automatically after installation; if it doesn't, run the command gcloud init, and make sure that you connect with the same email and project ID as before.
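A minimal sketch of that initialization, reusing the hypothetical project ID from earlier:

gcloud init                                   # sign in and pick a default project interactively
gcloud config set project my-dl-project-1234  # or point at the project explicitly
gcloud config list                            # verify the active account and project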

Once you have the Google Cloud SDK installed and configured, copy the SSH command that shows up on your deployment page and paste it into the SDK shell. The SSH command will be under the header "Create an SSH connection to your machine" (see image below).

SSH to Deep Learning VM
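The command from the deployment page follows this general shape; a sketch with the hypothetical instance name my-dl-vm, where the -L flag forwards the VM's Jupyter port to localhost:8080 on your machine:

gcloud compute ssh my-dl-vm \
    --project=my-dl-project-1234 \
    --zone=us-west1-b \
    -- -L 8080:localhost:8080    # tunnel the remote Jupyter port to your laptop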

If you have successfully created an SSH connection, a PuTTY screen will pop up (image below).

PuTTY from SSH

Step 4: Access Jupyter Notebook GUI

Once you have your SSH connection set up, you are just one click away from your Jupyter GUI. Go back to the deployment manager and hit the localhost:8080 button.

Voilà! That will take you to the Jupyter Notebook instance that is deployed on 16 CPUs. You can use it like a notebook running on your own machine.

Cloud Jupyter GUI

You can also run Python batch jobs from the PuTTY terminal or from the SSH terminal on the Compute Engine VM, as sketched below. More on that in the next step.
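A minimal sketch of such a batch run, assuming a hypothetical training script train.py already sitting on the VM:

nohup python train.py > train.log 2>&1 &   # keeps running even if the SSH session closes
tail -f train.log                          # follow the output as the job runs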

Step 5: Add GPUs to Virtual Machine

Before we add GPUs, we need to request GPU quota in the same zone our instance is deployed in.

Just search for “Quotas” in the search bar and that should take you to the Quotas page under “IAM & admin”.

Here, under "Metrics", first select "None" to clear the list and then search for GPU. Based on the GPUs present in your zone, select the name of the GPU, and also select "GPUs (all regions)". For example, since our zone is us-west1-b, you can select "NVIDIA P100 GPUs" and "GPUs (all regions)".

Once you have selected both GPU metrics, hit the "Edit Quotas" button on top.

Make sure that the GPU you select is in the same zone as your instance, or else your deployment will not be able to access it.

This will generate a form where you need to share your personal phone number and the reason for the request. As soon as you submit the form, you will get an email from Google saying that your request is under process and will take 2–3 business days.

Although the email says 2–3 business days, in my case the request was approved within a couple of hours. Once your quota request is approved, you can edit your virtual machine to include more GPUs.
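You can also confirm the approved quota from the Cloud SDK; a sketch (the region description lists every quota metric with its limit and current usage):

gcloud compute regions describe us-west1 | grep -B 1 -A 1 GPUS   # look for NVIDIA_P100_GPUS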

Step 6: Change Virtual Machine configuration

To add the requested GPUs, you need to edit the instance that was created, on the Compute Engine page. Open the menu in your Google Cloud console and hit "Compute Engine".

Compute Engine -> VM instances

The VM instances page gives a list of the VMs we have created across the various solutions on Google Cloud Platform. An important thing to note here is that we should stop all instances when we don't want to be billed for the machines. Even if we don't run any code, Google charges us for running instances.

Hence it is important to stop all instances when we aren’t running anything.

VM instances page
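Stopping and restarting can also be done from the SDK; a sketch with the hypothetical instance name my-dl-vm (note that a stopped instance still incurs disk charges, just not compute charges):

gcloud compute instances stop my-dl-vm --zone=us-west1-b    # stop the compute billing
gcloud compute instances start my-dl-vm --zone=us-west1-b   # bring it back when needed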

Once you have stopped the instance, you can edit it. If your quota request is approved, you should be able to add more GPUs and deploy the solution again in no time.

Another neat trick is to enable "Enable connection to serial ports" and, under Access scopes, "Allow full access to all Cloud APIs", so that your instance can talk to your buckets and vice versa.

Once your config has been modified to add a GPU, you can run a deep learning model from either the Jupyter Lab UI or the PuTTY terminal. You will notice that it is much faster now that we have added GPUs to the system. This also means the bill is higher, so keep checking the "Billing" page to make sure you don't run out of credits.
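Before kicking off a long job, it's worth verifying that the VM actually sees the new GPU; a sketch from the terminal (tf.test.is_gpu_available() is the check in the TensorFlow 1.x releases these images shipped with at the time):

nvidia-smi   # lists the attached NVIDIA GPUs and the driver version
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"   # True if TensorFlow can see a GPU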


Additional hacks

Move data from bucket to VM

You can copy data from your bucket to the instance you just created using GCP's gsutil tool.

You can run this command either from the PuTTY terminal or from the SSH terminal on the Compute Engine page.
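A minimal sketch, assuming a hypothetical bucket named my-data-bucket with a data folder inside it:

gsutil ls gs://my-data-bucket              # list the contents of the bucket
gsutil cp -r gs://my-data-bucket/data .    # recursively copy the folder onto the VM
gsutil cp results.csv gs://my-data-bucket  # copy results back the other way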

More details on gsutil are available at https://cloud.google.com/storage/docs/gsutil.

Clear the Jupyter notebook trash to free up disk space

Sometimes we end up storing large files or objects on the virtual Jupyter notebook, which can run the VM short of disk space. However, just deleting them from the notebook UI doesn't free up the space; the files are moved to a trash folder. Instead, we need to manually clear the trash from the Jupyter location in the VM.

To check how much space each folder is using, you can run du -sh * in the PuTTY terminal. Once you navigate to the jupyter home folder, you can force-delete the trash folder to free up space. In my case the command:

rm -rf /home/jupyter/.local/share/Trash

worked. Running df -h afterwards, I could see that the available space on /dev/sda1 had gone up.
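For reference, the full check-and-clean cycle from the terminal looks like this (paths as above):

df -h /                                   # free space on the boot disk before cleaning
du -sh /home/jupyter/.local/share/Trash   # how much space the trash is holding
rm -rf /home/jupyter/.local/share/Trash   # force-delete the trash folder
df -h /                                   # confirm the space was reclaimed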

Conclusion

I am sure there are many other ways to run Deep learning models on Google Cloud Platform without investing too much time in setting up an environment.

What has been your experience with GCP? Please share your comments and let me know if I can help solve your queries in any way.
