Visual Studio Code Jupyter Datalab
How to debug Datalab solutions in Visual Studio Code
Following up on yesterday’s post describing a way to aggregate log byte-counts using Stackdriver’s built-in logs-based metrics, I had another opportunity to enjoy Cloud Datalab. Spelunking the twists and turns of time-series is enhanced by this tool’s powerful rendering and visualization capabilities.
While I encourage Google Cloud Platform users to explore it, all (!?) the components are open-sourced, so it’s possible to run Datalab locally and to use Visual Studio Code to debug Datalab solutions. So much for the theory; it was more difficult to find this documented in one place… until now.
The real magic in this solution is the Jupyter Notebook. We’ll combine it with Google’s Python library “datalab”, Visual Studio Code (I assume there’s a similar solution if you’re not yet using this impressive editor) and Visual Studio Code plugins by Don Jayamanne for Jupyter and Python.
The foundation is to run Jupyter locally including Google’s Python “datalab” library. Don’t let the naming overly confuse you. Cloud Datalab is a solution provided by Google. It’s integrated with Google’s Cloud SDK and Compute Engine and provides an easy way to spin up GCE VMs that run containers that include Jupyter w/ Google’s Python library “datalab”. No muss, no fuss!
However, if you want to run the equivalent solution locally without Visual Studio Code, or you want to use Visual Studio Code to debug code, you can install the components yourself.
You’ll need Python 3.x installed. I’m running Linux and have Python 3.5.3. If you haven’t previously, you’ll need to install python3-dev to run Jupyter:
sudo apt install python3-dev
I recommend you use virtualenv to keep this project and the libraries separated from other work, though this is optional:
cd /your/preferred/working/directory
virtualenv venv3
The virtualenv directory name (venv3) is entirely your prerogative. Because I’m generally using Python 2.x, I’ve added “3” to this to remind me that it’s different.
Jupyter and Google’s datalab library are available from PyPI. Create a requirements.txt that lists both packages, one per line:
jupyter
datalab
Then to install these:
pip3 install --requirement requirements.txt
Google’s instructions tell us to issue the following command to “enable datalab’s frontend in Jupyter”. You need only do this once after installation. Unless you delete and recreate your virtualenv, you need not repeat it:
jupyter nbextension install --py datalab.notebook --sys-prefix
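Before moving on, it can be worth confirming that both packages are visible to the interpreter inside the virtualenv. What follows is only a minimal sketch: the helper name `missing_modules` is mine, and I am assuming the import names are `notebook` (pulled in by the jupyter package) and `datalab` (the nbextension command above refers to `datalab.notebook`).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names: "notebook" for the Jupyter Notebook server and
    # "datalab" for Google's library.
    missing = missing_modules(["notebook", "datalab"])
    if missing:
        print("Not importable from this environment:", ", ".join(missing))
    else:
        print("Jupyter and datalab both look importable.")
```

If either name is reported as missing, the likely culprit is that pip3 installed into a different Python than the one in your virtualenv.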
Alright…. that’s it for the foundation.
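Now you should be able to install Matplotlib and run a quick test:
pip install matplotlib
Save the following as test.py and run it (for example, python3 test.py):
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np

x = np.linspace(0, 20, 100)
# Plot the sine curve and open a window showing it
plt.plot(x, np.sin(x))
plt.show()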
If you don’t see a window open with a sine curve, something has gone wrong. Please retrace your steps to determine what it was and try to correct it. You can always trash the virtualenv and start again.
But, there’s more… Jupyter includes a web server that provides a browser client with which you may interact with it. From your virtualenv, run:
jupyter notebook
and you should see output similar to the following:
Serving notebooks from local directory: .../datalab
0 active kernels
The Jupyter Notebook is running at: http://localhost:8888/...
Use Control-C to stop this server and shut down all kernels
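If you want to confirm from a second terminal that the server really is accepting connections, a plain TCP check is enough. This is a hedged sketch rather than anything Jupyter provides: `server_is_listening` is a name I made up, and the defaults assume the localhost:8888 address shown in the output above (Jupyter may choose a different port if that one is busy).

```python
import socket


def server_is_listening(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the host could not be resolved.
        return False


if __name__ == "__main__":
    print("Jupyter reachable:", server_is_listening())
```

A True result only tells you that something is listening on that port; the URL printed by Jupyter remains the authoritative address.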
All being well, Jupyter should also open a browser window for you.
You’ll see “test.py” listed there. Opening it isn’t what we wish to do. Rather, we need to create a Jupyter notebook to host this Python code, so create a new Jupyter notebook.
Now you should have a blank notebook.
And then (!) you may copy the test.py code into it.
And then click the “▶” (play) button to run it.
I think this is a really powerful tool!
OK, if you followed yesterday’s post, you may have code that interacts with Stackdriver Monitoring, querying the byte-count logs-based metrics and summarizing the results as a (pandas) dataframe. That code may be run in a notebook as well.
Visual Studio Code
Let’s switch gears and build upon these foundations to now use Visual Studio Code. I assume you have Visual Studio Code installed. You will need to add Don Jayamanne’s Python and Jupyter extensions.
There are various ways to install these but, from within Visual Studio Code, you may press “CTRL-P” and then enter:
ext install python
ext install jupyter
I find it most effective to exit and restart Visual Studio Code. You should do this from within the shell that’s running your virtualenv, so:
(venv3) you@host:~/datalab$ code --new-window .
You may open the “test.py” Python file created earlier.
NB Whereas in the Jupyter client, we could not run Python files directly but had to create notebooks, with Visual Studio Code we must open the Python file directly. The Jupyter notebooks (extension “ipynb”) can also be opened in Visual Studio Code but they are rendered as raw JSON and otherwise not useful to us for this exercise.
NB that the plugin reflects Jupyter’s concept of “cells”. Jupyter cells represent distinct blocks of code that may be run independently. In order to demarcate cells in Visual Studio Code, use “#%%”. I recommend, if nowhere else, you place one of these markers at the top of your code. Visual Studio Code will interpret it and provide you with the ability to run the cell.
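To make the marker concrete, here is a minimal sketch of a file split into three cells. It is not the annotated test.py from this post: I have deliberately kept it to the standard library (mirroring the 100 points between 0 and 20 from the test) so that it runs even if Matplotlib is not installed, and the variable names are my own.

```python
#%%
# Cell 1: build the sample points (standard library only, so no plotting here).
import math

xs = [20 * i / 99 for i in range(100)]  # 100 evenly spaced points from 0 to 20

#%%
# Cell 2: evaluate the sine at each point.
ys = [math.sin(x) for x in xs]

#%%
# Cell 3: summarise the curve.
print("points:", len(xs))
print("range of sin(x): %.3f to %.3f" % (min(ys), max(ys)))
```

Because “#%%” is an ordinary comment to Python, the same file still runs top to bottom from the command line; the markers only matter to the editor.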
Using “Run cell” is the recommended way to run Jupyter solutions in Visual Studio Code.
I annotated “test.py” with multiple code blocks here. You’ll note that, above each “#%%”, Visual Studio Code has inserted a “Run cell” command; clicking one of these will run the “cell” below it. Very neat!
After “Run cell”, you should see the sine curve plotted as before.
And, of course, you may open yesterday’s Stackdriver example here too, prefix the code with “#%%” and use “Run cell” to run it within Visual Studio Code.
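If the code you want to run lives in a notebook you created in the browser rather than in a .py file, you can pull the code cells out and mark them up with “#%%” yourself. The sketch below is an illustration under assumptions, not a tool from either extension: `notebook_to_cells` is a hypothetical helper, and it assumes the notebook format version 4 layout, in which the file has a top-level “cells” list and each cell carries a “cell_type” and a “source” that is either a string or a list of lines.

```python
import json


def notebook_to_cells(notebook_text):
    """Turn notebook JSON into Python source with one "#%%" marker per code cell."""
    notebook = json.loads(notebook_text)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines or one string
            source = "".join(source)
        chunks.append("#%%\n" + source.rstrip("\n") + "\n")
    return "\n".join(chunks)


if __name__ == "__main__":
    # Tiny hand-written notebook for illustration; a real one would be read from disk.
    example = json.dumps({
        "nbformat": 4,
        "cells": [
            {"cell_type": "markdown", "source": ["# Notes"]},
            {"cell_type": "code", "source": ["import math\n", "x = math.pi"]},
            {"cell_type": "code", "source": "print(x)"},
        ],
    })
    print(notebook_to_cells(example))
```

Write the returned text to a .py file, open it in Visual Studio Code, and each former notebook cell gets its own “Run cell” command.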
The documentation is sparse but hopefully this post will help you use Datalab locally and with Visual Studio Code. The combination of these tools is very powerful and provides a more compelling way to interact with large amounts of data. The examples used here included a simple Matplotlib test and a way to use Stackdriver’s logs-based metrics to summarize log byte-counts for Google Cloud Platform projects. Have fun!