Nodebooks: Sharing Data Between Node.js & Python

Connecting to a Cloudant database for analysis (part 2)

In part one of this series we saw how we could:

  • Use Node.js code inside of Jupyter notebooks by adding pixiedust_node
  • Use print and display in JavaScript and Python code to output and visualise data
  • Add npm modules into our notebook and build functions using callbacks or promises

In this post, we’re going to share data between our Node.js and Python code and use notebooks to explore data in a Cloudant database.

Editor’s note: Sharing data between Node.js and Python cells is now even easier. For more, see this 2018 article:

Sharing data between Node.js & Python cells

Let’s say we have some data in a Node.js cell, in this case a time series with some waveform data in a variable called wave:

%%node
var wave = [];
for (var i = 0; i < 1000; i++) {
  var x = 2 * Math.PI * i / 360;
  var obj = {
    x: x,
    i: i,
    sin: Math.sin(x),
    cos: Math.cos(x),
    tan: Math.tan(x)
  };
  wave.push(obj);
}

We’ve already seen how to call print(wave); to output the data as JSON and display(wave); to send the data to PixieDust's visualisation engine, but there is a third way: store(wave, 'w');.

The store JavaScript function takes two parameters:

  1. The JavaScript variable to use (wave)
  2. The name of the Python variable you want to send it to (w)

If we run this snippet in a cell:

%%node
store(wave, 'w');

Then in the next Python cell, we can access w, which is a Pandas DataFrame ready for analysis in Python. We can mix and match Node.js and Python code in the same notebook, sharing JavaScript data with our Python code!

# count of non-null values in each column of the DataFrame
print(w.count())
# maximum value of the sin wave
print(w['sin'].max())
# minimum value of the sin wave
print(w['sin'].min())
# average value of the tan wave
print(w['tan'].mean())
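
As a cross-check on the Python results, here is a minimal sketch that computes the same statistics directly in Node.js. It is plain JavaScript with no dependencies, so it runs outside a notebook too; inside a notebook we would prefix the cell with %%node and could swap console.log for print. The column helper and the stats object are names made up for this illustration and are not part of pixiedust_node.

```javascript
// Rebuild the same wave array as in the earlier cell
var wave = [];
for (var i = 0; i < 1000; i++) {
  var x = 2 * Math.PI * i / 360;
  wave.push({ x: x, i: i, sin: Math.sin(x), cos: Math.cos(x), tan: Math.tan(x) });
}

// Hypothetical helper: pull one field out of every row,
// much like selecting a single DataFrame column in Python
function column(rows, field) {
  return rows.map(function (row) { return row[field]; });
}

var sin = column(wave, 'sin');
var tan = column(wave, 'tan');

var stats = {
  count: wave.length,
  maxSin: Math.max.apply(null, sin),
  minSin: Math.min.apply(null, sin),
  meanTan: tan.reduce(function (a, b) { return a + b; }, 0) / tan.length
};

console.log(stats);
```

The tan mean is dominated by a handful of enormous values near the asymptotes, so it may not agree with the Pandas figure to the last digit; the count, maximum and minimum should line up.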

Accessing a Cloudant database from a notebook

To access data stored in a Cloudant database, we can use the cloudant-quickstart npm module:

npm.install('cloudant-quickstart')

With our Cloudant URL, we can start exploring the data in Node.js. First we make a connection to the remote Cloudant database:

%%node
// connect to Cloudant using cloudant-quickstart
var cloudantqs = require('cloudant-quickstart');
var cities = cloudantqs('https://reader.cloudant.com/cities');

Exploring the data using Node.js in a notebook

Now we have an object cities that we can use to access the database. If we know the IDs of documents, we can retrieve them singly:

%%node
cities.get('2636749').then(print);
// {"name": "Stowmarket", "country": "GB", "longitude": 0.99774, "latitude": 52.18893, "timezone": "Europe/London", "_id": "2636749", "population": 15394}

Or in bulk:

%%node
cities.get(['4562407','2636749','3530597']).then(print);
// [{"name": "York", "country": "US", "longitude": -76.72774, "latitude": 39.9626, "timezone": "America/New_York", "_id": "4562407", "population": 43718}, {"name": "Stowmarket", "country": "GB", "longitude": 0.99774, "latitude": 52.18893, "timezone": "Europe/London", "_id": "2636749", "population": 15394}, {"name": "Mexico City", "country": "MX", "longitude": -99.12766, "latitude": 19.42847, "timezone": "America/Mexico_City", "_id": "3530597", "population": 12294193}]

Instead of just calling print to output the JSON, we can bring PixieDust's display function to bear by passing it an array of data to visualise:

%%node
cities.get(['4562407','2636749','3530597']).then(display);

The cloudant-quickstart npm module querying a Cloudant database, from within the context of a Python Jupyter Notebook.

We can also query a subset of the data using the query function, passing it a Cloudant Query statement:

%%node
// fetch cities in UK above latitude 54 degrees north
cities.query({country:'GB', latitude: { "$gt": 54}}).then(display);
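
To make the selector easier to read, here is a rough sketch of what it means: a plain value is an equality test, { "$gt": 54 } is a greater-than test, and every field must match. The matches function below is a toy written for this post that covers only those two forms; it is not how Cloudant evaluates queries. The Northtown document is invented for the example.

```javascript
// Illustrative only: a tiny matcher for the two selector forms used above
// (plain-value equality and "$gt"). This is not Cloudant's implementation.
function matches(doc, selector) {
  return Object.keys(selector).every(function (field) {
    var condition = selector[field];
    if (condition !== null && typeof condition === 'object') {
      // operator form, e.g. { "$gt": 54 }
      return Object.keys(condition).every(function (op) {
        if (op === '$gt') {
          return doc[field] > condition[op];
        }
        throw new Error('Operator not covered by this sketch: ' + op);
      });
    }
    // a plain value means "field equals this value"
    return doc[field] === condition;
  });
}

var sample = [
  { name: 'Stowmarket', country: 'GB', latitude: 52.18893 },
  { name: 'York', country: 'US', latitude: 39.9626 },
  { name: 'Northtown', country: 'GB', latitude: 55.5 } // made-up document
];

var selector = { country: 'GB', latitude: { '$gt': 54 } };
var northern = sample.filter(function (doc) {
  return matches(doc, selector);
});

console.log(northern.map(function (doc) { return doc.name; }));
```

Stowmarket is in GB but south of 54 degrees and York is not in GB, so only the made-up northern document passes both tests.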

Aggregating data

The cloudant-quickstart library also allows aggregations (sum, count, stats) to be performed in the Cloudant database.

Let’s calculate the sum of the population field:

%%node
cities.sum('population').then(print);
// 2694222973

Or compute the sum of the population, grouped by the country field:

%%node
cities.sum('population','country').then(print)
// {"BD": 25149982, "BE": 7224564, "BF": 2381615, "BG": 4401796, "BA": 1790001 ....
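
For intuition, the grouped sum produces the same kind of result as the in-memory sketch below, which uses the three documents we fetched earlier. The sumBy function is a hypothetical helper written for this post; the real aggregation happens inside the Cloudant database rather than in our notebook.

```javascript
// The three city documents retrieved earlier, trimmed to the fields we need
var docs = [
  { name: 'York', country: 'US', population: 43718 },
  { name: 'Stowmarket', country: 'GB', population: 15394 },
  { name: 'Mexico City', country: 'MX', population: 12294193 }
];

// Hypothetical helper: total up valueField for each distinct groupField value
function sumBy(rows, valueField, groupField) {
  return rows.reduce(function (totals, row) {
    var key = row[groupField];
    totals[key] = (totals[key] || 0) + row[valueField];
    return totals;
  }, {});
}

var byCountry = sumBy(docs, 'population', 'country');
console.log(byCountry);
```

Each key in the result is a country code and each value is the summed population for that country, which is the same shape as the output of cities.sum('population','country').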

We can even move the data to Python using the store function:

%%node
cities.all({limit:25000}).then(function(data) {
  store(data, 'y');
});
// y stored

Then we can access y in a Python cell:

y['population'].sum()
# 2694222973

The call to store works well for modest amounts of data, but because the whole result set has to be held in memory, it is not suitable for very large data sets.

What’s next?

In the next part, we’ll look to bring further npm modules into our notebooks, building our own custom visualisations.
