JetBrains DataSpell — Using Jupyter Notebooks inside the IDE

Peter Allen
Analytics Vidhya
Published in
4 min read · Sep 13, 2021
DataSpell — a new IDE for Data folks

In my previous article I introduced DataSpell, the new IDE from JetBrains, and covered using it to connect to and query a database directly, without the need for a separate SQL editor. A few days on, having used it professionally, I can say I haven't needed to open my SQL editor once. The previous article can be read here:

The other feature I was excited to try was the built-in Jupyter notebook support. Getting started is really easy: just click File > Add Jupyter Connection, as shown in the image below. This opens a dialog that lets you start a new local connection, attach to an existing local connection, or even connect to a remote Jupyter server. One hint: make sure you have opened the working directory you want to use before starting a new local connection, and the connection will automatically start in that directory.

Starting a new Jupyter Connection

The next step is to add a Jupyter notebook, also from the File menu. This opens a new notebook inside the IDE, with no external console or browser window required (a terminal is also available in the IDE if needed).

To test out the functionality I wanted to do some typical basic usage and explore the dataset we connected to in the first article. The connection in the notebook is handled by two Google packages, which you'll need to install with the following command (you can run it in the built-in terminal):

pip install --upgrade 'google-cloud-bigquery[bqstorage,pandas]'

Once that is installed you can set up authentication to connect to the BigQuery database (if you don't yet have a project you want to use, see the following article for how to create one and download a key file).

When you have your project and its key file ready to go, the following code authenticates the connection and then runs a query to check that it is working against this specific database. Note that you will have to enter the path to your key file.

from google.cloud import bigquery
from google.oauth2 import service_account

key_path = "<ENTER PATH TO KEYFILE HERE>"

# Authenticate with the service-account key file
credentials = service_account.Credentials.from_service_account_file(
    key_path,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

bqclient = bigquery.Client(
    credentials=credentials,
    project=credentials.project_id,
)

# Download query results
query = """
SELECT
    date, country_name, subregion1_name, subregion2_name,
    new_confirmed, new_deceased,
    new_recovered, new_tested, cumulative_confirmed,
    cumulative_deceased, cumulative_recovered, cumulative_tested
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE country_name = 'Australia'
    AND subregion1_name IN ('Victoria', 'New South Wales')
LIMIT 10;
"""

dataframe = (
    bqclient.query(query)
    .result()
    .to_dataframe(create_bqstorage_client=True)
)

dataframe

The dataframe is printed in the editor, and there is a link to open it in its own tab in the IDE, as shown in the screenshot below.

Now that we have the connection, we can pull the results of any query we like straight into dataframes in our notebook.
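If you find yourself pulling several queries this way, the query-to-dataframe steps can be wrapped in a small helper. A minimal sketch (the `run_query` name is my own, not part of the BigQuery API):

```python
import pandas as pd


def run_query(client, sql: str) -> pd.DataFrame:
    """Run a SQL string and return the results as a pandas DataFrame.

    `client` is expected to be a google.cloud.bigquery.Client, i.e. anything
    exposing the same query().result().to_dataframe() chain used above.
    """
    return client.query(sql).result().to_dataframe(create_bqstorage_client=True)


# Example usage with the bqclient created earlier:
# vic_df = run_query(bqclient, "SELECT date, new_confirmed FROM `...` WHERE subregion1_name = 'Victoria'")
```

This keeps each notebook cell down to one line per query, which suits the exploratory style of a notebook.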

Next, I wanted to see how some basic visualisations look in the IDE. To keep it simple, I plotted the new confirmed cases by date in New South Wales and Victoria with the Seaborn plotting package, using the following code (the query used earlier provides this data; just remove the LIMIT 10):

import matplotlib.pyplot as plt
import seaborn as sns

# Widen the default figure so the time series is easier to read
sns.set(rc={'figure.figsize': (14, 6)})
dataframe_ts = dataframe.copy()

# lineplot returns a matplotlib Axes, which we use to label and tidy the plot
ax = sns.lineplot(data=dataframe_ts, x='date', y='new_confirmed', hue='subregion1_name')
ax.set(xlabel='Date', ylabel='New Confirmed Cases')
ax.legend(title=None)

This produced the following plot in the editor. The date axis is a bit hard to read, but it's not bad for the default settings, and it all worked seamlessly.

Conclusion:

So before I get carried away and do a full analysis of this dataset, I'll end this article here with my thoughts on using Jupyter notebooks inside DataSpell, in the following dot points:

  • Setting up a connection and notebook inside the IDE is easy and seamless
  • Functionality of the notebook is the same as using it in your browser
  • You get all of DataSpell's editing functionality out of the box, including autocomplete, code suggestions and auto-imports
  • Installing packages in the built-in terminal is very useful
  • Plots work just as expected from a notebook
