Nodebooks: Sharing Data Between Node.js & Python

Connecting to a Cloudant database for analysis (part 2)

Glynn Bird
Sep 12, 2017 · 3 min read

In part one of this series we saw how we could:

  • Use Node.js code inside of Jupyter notebooks by adding pixiedust_node
  • Use print and display in JavaScript and Python code to output and visualise data
  • Add npm modules into our notebook and build functions using callbacks or promises

In this blog, we’re going to share data between our Node.js and Python code and use notebooks to explore data in a Cloudant database.

Editor’s note: Sharing data between Node.js and Python cells is now even easier. For more, see this 2018 article:

Sharing data between Node.js & Python cells

Let’s say we have some data in a Node.js cell, in this case a time series with some waveform data in a variable called wave:

%%node
var wave = [];
for (var i = 0; i < 1000; i++) {
  var x = 2 * Math.PI * i / 360;
  var obj = {
    x: x,
    i: i,
    sin: Math.sin(x),
    cos: Math.cos(x),
    tan: Math.tan(x)
  };
  wave.push(obj);
}

We’ve already seen how to call print(wave); to output the data as JSON and display(wave); to send the data to PixieDust's visualisation engine, but there is a third way: store(wave, 'w');.

The store JavaScript function takes two parameters:

  1. The JavaScript variable to use (wave)
  2. The name of the Python variable you want to send it to (w)

If we run this snippet in a cell:

%%node
store(wave, 'w');

Then, in the next Python cell, we can access w, a pandas DataFrame ready for analysis. We can mix and match Node.js and Python code in the same notebook, sharing JavaScript data with our Python code!

# number of non-null values in each column of the DataFrame
print(w.count())
# maximum value of the sin wave
print(w['sin'].max())
# minimum value of the sin wave
print(w['sin'].min())
# average value of the tan wave
print(w['tan'].mean())
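
To check these figures outside the notebook, here is a minimal sketch that rebuilds the same series in plain Python, with no pandas or pixiedust_node involved. The list-of-dicts layout is only an assumption about how the rows look before they become a DataFrame; the variable names sin_values and tan_values are made up for this illustration.

```python
import math

# Rebuild the waveform from the Node.js cell as a list of dicts,
# one dict per row, mirroring the array of JavaScript objects.
wave = []
for i in range(1000):
    x = 2 * math.pi * i / 360
    wave.append({
        "x": x,
        "i": i,
        "sin": math.sin(x),
        "cos": math.cos(x),
        "tan": math.tan(x),
    })

# Pull out single "columns" (illustrative names, not part of the original post).
sin_values = [row["sin"] for row in wave]
tan_values = [row["tan"] for row in wave]

print(len(wave))                            # number of rows
print(max(sin_values))                      # maximum of the sin wave
print(min(sin_values))                      # minimum of the sin wave
print(sum(tan_values) / len(tan_values))    # mean of the tan wave
```

Because tan is unbounded near odd multiples of π/2, its mean is dominated by a handful of very large values, so treat that last figure with care.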

Accessing a Cloudant database from a notebook

To access data stored in a Cloudant database, we can use the cloudant-quickstart npm module:

npm.install('cloudant-quickstart')

With our Cloudant URL, we can start exploring the data in Node.js. First we make a connection to the remote Cloudant database:

%%node
// connect to Cloudant using cloudant-quickstart
var cloudantqs = require('cloudant-quickstart');
var cities = cloudantqs('https://reader.cloudant.com/cities');

Exploring the data using Node.js in a notebook

Now we have an object cities that we can use to access the database. If we know the IDs of documents, we can retrieve them singly:

%%node
cities.get('2636749').then(print);
// {"name": "Stowmarket", "country": "GB", "longitude": 0.99774, "latitude": 52.18893, "timezone": "Europe/London", "_id": "2636749", "population": 15394}

Or in bulk:

%%node
cities.get(['4562407','2636749','3530597']).then(print);
// [{"name": "York", "country": "US", "longitude": -76.72774, "latitude": 39.9626, "timezone": "America/New_York", "_id": "4562407", "population": 43718}, {"name": "Stowmarket", "country": "GB", "longitude": 0.99774, "latitude": 52.18893, "timezone": "Europe/London", "_id": "2636749", "population": 15394}, {"name": "Mexico City", "country": "MX", "longitude": -99.12766, "latitude": 19.42847, "timezone": "America/Mexico_City", "_id": "3530597", "population": 12294193}]

Instead of just calling print to output the JSON, we can bring PixieDust's display function to bear by passing it an array of data to visualise:

%%node
cities.get(['4562407','2636749','3530597']).then(display);
The cloudant-quickstart npm module querying a Cloudant database, from within the context of a Python Jupyter Notebook.

We can also query a subset of the data using the query function, passing it a Cloudant Query statement:

%%node
// fetch cities in the UK above latitude 54 degrees north
cities.query({country:'GB', latitude: { "$gt": 54}}).then(display);
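
To make the selector semantics concrete, here is a small Python sketch that applies the same kind of condition to documents already in memory. The matches helper is hypothetical and handles only plain equality and the $gt operator; it is not part of cloudant-quickstart or Cloudant, which evaluate the query in the database. The three documents are trimmed copies of the bulk-fetch output earlier in this post.

```python
# Three documents copied (with fewer fields) from the bulk-fetch output above.
docs = [
    {"_id": "4562407", "name": "York", "country": "US", "latitude": 39.9626},
    {"_id": "2636749", "name": "Stowmarket", "country": "GB", "latitude": 52.18893},
    {"_id": "3530597", "name": "Mexico City", "country": "MX", "latitude": 19.42847},
]


def matches(doc, selector):
    """Hypothetical helper: True if doc satisfies every condition in selector.

    Supports only plain equality and the $gt operator.
    """
    for field, condition in selector.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op != "$gt":
                    raise ValueError("unsupported operator: " + op)
                if not value > operand:
                    return False
        elif value != condition:
            return False
    return True


# Same shape as the query in the post: country equals GB AND latitude > 54.
north_of_54 = [d["name"] for d in docs
               if matches(d, {"country": "GB", "latitude": {"$gt": 54}})]
# A lower threshold, to show a selector that does match one of the samples.
north_of_52 = [d["name"] for d in docs
               if matches(d, {"country": "GB", "latitude": {"$gt": 52}})]

print(north_of_54)  # []
print(north_of_52)  # ['Stowmarket']
```

None of the three sample cities lies north of 54 degrees, so the first result is empty; all conditions in a selector must hold at once.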

Aggregating data

The cloudant-quickstart library also allows aggregations (sum, count, stats) to be performed in the Cloudant database.

Let’s calculate the sum of the population field:

%%node
cities.sum('population').then(print);
// 2694222973

Or compute the sum of the population, grouped by the country field:

%%node
cities.sum('population', 'country').then(print);
// {"BD": 25149982, "BE": 7224564, "BF": 2381615, "BG": 4401796, "BA": 1790001 ....
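
To show the shape of these two results, the following Python sketch computes a plain sum and a grouped sum over three documents held in memory. It only illustrates what the totals mean; the real aggregation happens in the Cloudant database, and sum_field is a hypothetical helper, not a cloudant-quickstart function. The documents are trimmed copies of the bulk-fetch output shown earlier.

```python
from collections import defaultdict

# Three documents copied (with fewer fields) from the bulk-fetch output above.
docs = [
    {"name": "York", "country": "US", "population": 43718},
    {"name": "Stowmarket", "country": "GB", "population": 15394},
    {"name": "Mexico City", "country": "MX", "population": 12294193},
]


def sum_field(docs, field, group_by=None):
    """Hypothetical helper: total of `field`, optionally grouped by another field."""
    if group_by is None:
        return sum(doc[field] for doc in docs)
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[group_by]] += doc[field]
    return dict(totals)


total = sum_field(docs, "population")
by_country = sum_field(docs, "population", "country")

print(total)       # 12353305
print(by_country)  # {'US': 43718, 'GB': 15394, 'MX': 12294193}
```

The ungrouped call returns a single number, while the grouped call returns one total per distinct value of the grouping field, matching the two output shapes above.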

We can even move the data to Python using the store function:

%%node
cities.all({limit:25000}).then(function(data) {
  store(data, 'y');
});
// y stored

Then we can access y in a Python cell:

y['population'].sum()
# 2694222973

The call to store works well for modest amounts of data, but because the whole result set has to be held in memory, it is not suitable for very large data sets.

What’s next?

In the next part, we’ll look to bring further npm modules into our notebooks, building our own custom visualisations.

Center for Open Source Data and AI Technologies

Things we made with data at IBM’s Center for Open Source…

Thanks to Mike Broberg and Teri Chadbourne

Glynn Bird

Written by

Developer @ IBM. https://glynnbird.com