Visual Studio Code Jupyter Datalab

How to debug Datalab solutions in Visual Studio Code

Daz Wilkin
Sep 6, 2017 · 6 min read

Following up on yesterday’s post describing a way to aggregate log byte-counts using Stackdriver’s built-in logs-based metrics, I had another opportunity to enjoy Cloud Datalab. Spelunking the twists and turns of time-series is enhanced by this tool’s powerful rendering and visualization capabilities.

While I encourage Google Cloud Platform users to explore it, all (!?) the components are open source, so it’s possible to run Datalab locally and to use Visual Studio Code to debug Datalab solutions. So much for the theory; it was more difficult to find this documented in one place… until now.

The real magic in this solution is the Jupyter Notebook. We’ll combine it with Google’s Python library “datalab”, Visual Studio Code (I assume there’s a similar solution if you’re not yet using this impressive editor) and Visual Studio Code plugins by Don Jayamanne for Jupyter and Python.

Foundations

The foundation is to run Jupyter locally including Google’s Python “datalab” library. Don’t let the naming overly confuse you. Cloud Datalab is a solution provided by Google. It’s integrated with Google’s Cloud SDK and Compute Engine and provides an easy way to spin up GCE VMs that run containers that include Jupyter w/ Google’s Python library “datalab”. No muss, no fuss!

However, if you want to run the equivalent solution locally without Visual Studio Code, or you want to use Visual Studio Code to debug code, you can install the components yourself.

You’ll need Python 3.x installed. I’m running Linux and have Python 3.5.3. If you haven’t done so previously, you’ll need to install the python3-dev package to run Jupyter:

I recommend you use virtualenv to keep this project and the libraries separated from other work, though this is optional:

The virtualenv directory name (venv3) is entirely your prerogative. Because I’m generally using Python 2.x, I’ve added “3” to this to remind me that it’s different.
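If you want to confirm that the interpreter you’re using really is the one inside the virtualenv, a small check along these lines may help. It’s a minimal sketch: the function name in_virtualenv is my own, and it relies only on attributes of Python’s standard sys module.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; the stdlib venv module and
    # newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


# Report which interpreter is running and whether it is isolated.
print("interpreter:", sys.executable)
print("python major version:", sys.version_info[0])
print("inside a virtualenv:", in_virtualenv())
```

Run it with the python from your activated virtualenv; if the last line reports False, the environment is probably not active in that shell.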

Jupyter and Google’s datalab library are available from PyPI:

requirements.txt:

Then to install these:

Google’s instructions tell us to issue the following command to “enable datalab’s frontend in Jupyter”. You need only do this once after installation. Unless you delete and recreate your virtualenv, you need not repeat it:

Alright… that’s it for the foundation.
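Before going further, it can be worth checking that the libraries are importable from the virtualenv. The following is a sketch rather than an official check: the helper name missing_modules is hypothetical, and the module names “jupyter”, “datalab” and “matplotlib” are my assumption about what the installed packages expose.

```python
import importlib.util

# Assumed top-level module names for the packages this post relies on.
REQUIRED_MODULES = ("jupyter", "datalab", "matplotlib")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be located on the current import path."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


print("missing modules:", missing_modules() or "none")
```

If anything is reported as missing, re-run the install step from within the activated virtualenv.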

Now you should be able to:

test.py:

and:

No Sine? No working :-(

If you don’t see a window open with a sine curve, something has gone wrong. Please retrace your steps to determine what it was and try to correct it. You can always trash the virtualenv and start again.
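A minimal stand-in for such a test script might look like the following. It’s a sketch, assuming Matplotlib is installed in the virtualenv; the function names sine_points and plot_sine are my own, and the points are computed with the standard math module to keep dependencies small.

```python
import math


def sine_points(n=100, x_max=2 * math.pi):
    """Return n evenly spaced x values in [0, x_max] and their sines."""
    xs = [x_max * i / (n - 1) for i in range(n)]
    ys = [math.sin(x) for x in xs]
    return xs, ys


def plot_sine():
    """Plot one period of a sine curve in a Matplotlib window."""
    # Imported here so that sine_points can be used without Matplotlib.
    import matplotlib.pyplot as plt

    xs, ys = sine_points()
    plt.plot(xs, ys)
    plt.title("sin(x)")
    plt.xlabel("x")
    plt.ylabel("sin(x)")
    plt.show()
```

Add a call to plot_sine() at the bottom of the file (or in a notebook cell) to open the plot window.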

But there’s more… Jupyter includes a web server that provides a browser client with which you may interact with it. From your virtualenv, run “jupyter notebook” and you should see output similar to the following:

All being well, Jupyter should also open a browser window for you:

Jupyter client

You’ll see “test.py” listed here. Opening it here isn’t what we wish to do. Rather, we need to create a Jupyter notebook to host this Python code. Create a new Jupyter notebook:

Choose “Python 3”

Now you should have a blank notebook:

And then (!) you may copy the test.py code here:

And then click the “▶” (play) button to run it.

I think this is a really powerful tool!

OK, if you followed yesterday’s post, you may have code that interacts with Stackdriver Monitoring, querying the byte-count logs-based metrics and summarizing the results as a (pandas) dataframe. That code may be run here:

Visual Studio Code

Let’s switch gears and build upon these foundations to now use Visual Studio Code. I assume you have Visual Studio Code installed. You will need to add Don Jayamanne’s Python and Jupyter extensions:

There are various ways to install these but, from within Visual Studio Code, you may “CTRL-P” and then:

I find it most effective to exit and restart Visual Studio Code. You should do this from within the shell that’s running your virtualenv, so:

Visual Studio Code

You may open the “test.py” Python file created earlier.

NB Whereas in the Jupyter client, we could not run Python files directly but had to create notebooks, with Visual Studio Code we must open the Python file directly. The Jupyter notebooks (extension “ipynb”) can also be opened in Visual Studio Code but they are rendered as raw JSON and otherwise not useful to us for this exercise.

NB The plugin reflects Jupyter’s concept of “cells”. Jupyter cells represent distinct blocks of code that may be run independently. To demarcate cells in Visual Studio Code, use “#%%”. I recommend that you place at least one of these markers at the top of your code. Visual Studio Code will interpret it and provide you with the ability to run the cell.

Using “Run cell” is the recommended way to run Jupyter solutions in Visual Studio Code:

Code annotated with “#%%”

I annotated “test.py” with multiple code blocks here. You’ll note that, above each “#%%”, Visual Studio Code has inserted a “Run cell” command; clicking one of these runs the “cell” below it. Very neat!
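To make the idea of “#%%” markers concrete, here is a rough illustration of how a marked-up file divides into cells. This is not the extension’s actual implementation: split_cells and SAMPLE are hypothetical names, and treating any text before the first marker as belonging to no cell is a simplification on my part.

```python
# A small Python source annotated with cell markers (illustrative only).
SAMPLE = '''#%%
import math
xs = [i / 10 for i in range(63)]

#%%
ys = [math.sin(x) for x in xs]

#%%
print(max(ys))
'''


def split_cells(source, marker="#%%"):
    """Split source text into a list of cell bodies, one per marker line."""
    cells = []
    current = None
    for line in source.splitlines():
        if line.strip().startswith(marker):
            # A marker line starts a new cell.
            current = []
            cells.append(current)
        elif current is not None:
            # Lines before the first marker are ignored in this sketch.
            current.append(line)
    return ["\n".join(cell).strip() for cell in cells]


# Run the cells in order in one shared namespace, roughly as a kernel
# keeps state between cells.
namespace = {}
for number, cell in enumerate(split_cells(SAMPLE), start=1):
    print("running cell", number)
    exec(cell, namespace)
```

Because the cells share one namespace, the third cell can use the values that the first two defined, which is the behaviour you rely on when running cells one at a time.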

After “Run cell”, you should see the sine curve plotted as before:

And, of course, you may open yesterday’s Stackdriver example here too, prefix the code with “#%%” and “Run cell” to run your code within Visual Studio Code.

Conclusion

The documentation is sparse but hopefully this post will help you use Datalab locally and with Visual Studio Code. The combination of these tools is very powerful and provides a more compelling way to interact with large amounts of data. The examples used here included a simple Matplotlib test and a way to use Stackdriver’s logs-based metrics to summarize log byte-counts for Google Cloud Platform projects. Have fun!