Insights from designing Datahub-ui in D3.js
In a previous post, “D3 is now modular”, I talked about D3 modularity.
Now I want to share some modular patterns that I’ve been using for a while in my job as a datavis engineer at Planet OS. On top of Datahub, our environmental data platform, we build dashboards that we call Powerboards, mostly for clean energy companies such as wind farm operators.
I already wrote about a pipeline pattern that I was using for Cirrus.js, in a post titled “Why I Built Cirrus.js”.
I continued using this pattern for Datahub-ui, the internal UI toolkit we are developing to quickly build Powerboards. Here are some lessons I learned after almost two years of using this pattern.
Patterns vs Abstractions
I like this pipeline pattern because it’s just a way to structure code, in the same way that I like D3 because it encourages the use of declarative and functional patterns to describe how data should be mapped to graphics. I prefer to focus on patterns like MV* architecture and unidirectional data flow rather than on abstractions like digest cycles and Virtual DOM.
I like to be able to encapsulate functional components at a very low level. A component can be as simple as configuring a D3 scale:
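The snippet that followed this sentence is not reproduced here. As a rough sketch of what such a module could look like (not the Datahub-ui source): the names `scaleX` and `linearScale` and the config fields are hypothetical, and a small hand-written linear scale stands in for `d3.scaleLinear()` so the example runs without D3 installed.

```javascript
// Stand-in for d3.scaleLinear().domain(domain).range(range), written by hand
// so this sketch runs without D3 installed.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Hypothetical module: reads what it needs from the config and returns
// only the new items it wants to pass down the pipe.
function scaleX(config) {
  const values = config.data.map((d) => d.value);
  const domain = [Math.min(...values), Math.max(...values)];
  return { scaleX: linearScale(domain, [0, config.width]) };
}

const result = scaleX({
  width: 200,
  data: [{ value: 0 }, { value: 5 }, { value: 10 }],
});
console.log(result.scaleX(5)); // 100
```

The module knows nothing about the rest of the chart, which is what makes it easy to swap for another one later.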
These tiny modules are piped together using a simple pipeline function. Each module receives a clone of the global config object and can add items to it to pass them down the pipe.
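A pipeline function of this kind can be very small. The version below is an illustrative sketch under my own assumptions, not the Datahub-ui implementation: `pipeline`, `chartSize` and `barWidth` are hypothetical names, and the clone is a shallow one made with `Object.assign`.

```javascript
// Hypothetical pipeline helper: each module receives a shallow clone of the
// accumulated config and returns the items it wants to add.
function pipeline(...modules) {
  return (initialConfig) =>
    modules.reduce(
      (config, module) =>
        Object.assign({}, config, module(Object.assign({}, config))),
      initialConfig
    );
}

// Two tiny illustrative modules.
const chartSize = (config) => ({
  chartWidth: config.width - config.margin.left - config.margin.right,
});
const barWidth = (config) => ({
  barWidth: config.chartWidth / config.data.length,
});

const buildChart = pipeline(chartSize, barWidth);
const output = buildChart({
  width: 300,
  margin: { left: 20, right: 30 },
  data: [1, 2, 3, 4, 5],
});
console.log(output.chartWidth, output.barWidth); // 250 50
```

Because `barWidth` reads `chartWidth`, the order of the modules in the pipe is the only coupling between them.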
It All Starts with Data
One of the main tools we provide at Planet OS is called Datahub. It’s basically a consistent interface to access Earth science data. Most of these datasets are already available online, but are usually pretty hard to access and manipulate. Having a consistent interface really shines when you need to search and compare datasets, merge data from multiple sources and use common tools to directly use clean data in a simple format.
Embedding maps with multiple layers of raster and point data, streaming multidimensional timeseries, computing alerts by comparing finance data with weather forecasts: that’s the kind of data merging we do every day at Planet OS. Modularity at the dataset level is an important subject that I don’t see discussed very often in the D3 community.
In my experience, the data straight from an API is rarely in the best format for the way the charts may want to receive it. Sometimes a chart has to aggregate some values, split them into layers, group into a hierarchy, just in the process of mapping to graphical elements. When I design charts, I try to choose a data format that makes sense from the datavis pipeline standpoint. Then I write adapters to transform from data API format to chart input format. So the first modules in my pipeline are usually data adapters and validators.
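As an illustration, an adapter and a validator for a timeseries could look like the sketch below. The API response shape (parallel arrays of timestamps and values) and the function names `adaptTimeSeries` and `validateTimeSeries` are assumptions made for the example, not the Datahub API format.

```javascript
// Hypothetical API response shape: parallel arrays of ISO timestamps and values.
const apiResponse = {
  timestamps: ['2017-01-01T00:00:00Z', '2017-01-01T01:00:00Z', 'not a date'],
  values: [3.2, 4.1, 5.0],
};

// Adapter: API format -> chart input format ({ timestamp, value } records).
function adaptTimeSeries(response) {
  return response.timestamps.map((timestamp, i) => ({
    timestamp: new Date(timestamp),
    value: response.values[i],
  }));
}

// Validator: drop records the chart could not draw.
function validateTimeSeries(records) {
  return records.filter(
    (d) => !Number.isNaN(d.timestamp.getTime()) && Number.isFinite(d.value)
  );
}

const chartData = validateTimeSeries(adaptTimeSeries(apiResponse));
console.log(chartData.length); // 2
```

Keeping the two steps separate means a different API only needs a new adapter; the validator and everything after it stay the same.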
With each chart library, I also write a data generator. It can be used as dummy data for developing the charts, but also for testing various cases, for smoke testing, etc. Datahub-ui has generateTimestamps and generateTimeSeries, for example, which are built from lower-level data type generators.
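To give an idea of how generators can be assembled from lower-level ones, here is a sketch. The names `generateTimestamps` and `generateTimeSeries` come from the text above, but the signatures, the options and the `generateValues` helper are assumptions for illustration and may not match Datahub-ui’s actual API.

```javascript
// Lower-level generator: `count` timestamps spaced `stepMs` apart.
function generateTimestamps({
  start = '2017-01-01T00:00:00Z',
  count = 24,
  stepMs = 3600000,
} = {}) {
  const startMs = new Date(start).getTime();
  return Array.from({ length: count }, (_, i) => new Date(startMs + i * stepMs));
}

// Lower-level generator (hypothetical helper): `count` numbers in [min, max).
function generateValues({ count = 24, min = 0, max = 100 } = {}) {
  return Array.from({ length: count }, () => min + Math.random() * (max - min));
}

// Higher-level generator assembled from the two above.
function generateTimeSeries(options = {}) {
  const timestamps = generateTimestamps(options);
  const values = generateValues(
    Object.assign({}, options, { count: timestamps.length })
  );
  return timestamps.map((timestamp, i) => ({ timestamp, value: values[i] }));
}

const series = generateTimeSeries({ count: 3, min: 10, max: 20 });
console.log(series.length); // 3
```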
The next module you see in this example pipeline is the template. One trick that I often use when I start a new chart is to implement it in plain HTML/SVG/CSS first, without touching a single line of D3. Then I extract the base template from this prototype so I don’t have to generate it with D3. Since this base template is static, I just load it once and apply it to the DOM on init. Then I replace all the other pieces of my prototype with D3 code that generates them.
Here is part of my template module code:
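The original snippet is not reproduced here. A template module in this spirit could look like the following sketch, where the markup, the class names and the `applyTemplate` function are all illustrative assumptions. A plain object stands in for the DOM container so the example also runs outside a browser.

```javascript
// Static markup extracted from a hand-written HTML/SVG prototype (illustrative).
const template = [
  '<div class="widget-container">',
  '  <svg class="chart">',
  '    <g class="panel">',
  '      <g class="shapes"></g>',
  '      <g class="axis-x"></g>',
  '      <g class="axis-y"></g>',
  '    </g>',
  '  </svg>',
  '</div>',
].join('\n');

// Hypothetical template module: applies the markup once, on the first run.
function applyTemplate(config) {
  if (!config.templateApplied) {
    config.parent.innerHTML = template;
  }
  return { templateApplied: true };
}

// Stand-in for a DOM element; in a browser this would be the chart container.
const parent = { innerHTML: '' };
const firstRun = applyTemplate({ parent });
console.log(firstRun.templateApplied); // true
```

Later modules can then select `.shapes` or `.axis-x` inside the container and only generate the dynamic parts.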
Yes, I know that HTML in a variable is ugly. Maybe loading the template from an external file would make it a little cleaner. But even so, I realized that I have less trouble reading this than the equivalent description in D3, especially when I need multiple levels of nesting.
Generic vs Multi-specific
In this pipeline, you can see some items coming from the “common” namespace. That’s where I place items that are easy to reuse across multiple charts.
When I start a new chart, I usually copy one of my chart files and just replace the modules I need to change. I can, for example, replace an ordinal scale with a time scale, add a data validation layer, or change some default config. When I’m tempted to add a conditional switch somewhere in my code, I replace the whole module with a new one instead. After a couple of years of working like this, adding modules instead of overloading an existing one, I’m still surprised by the benefits. It was way more complicated before, when I was trying to be generic, adding conditionals everywhere, multiplying config options and fixing all the side effects every time I introduced a small change.
I really like the freedom to make a new pipeline, replace some modules, just make my specific chart work the way I want, then take the time to unify some modules if I need to. That’s what I call multi-specific instead of generic.
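The idea can be sketched with two pipelines that share one module from a “common” namespace and differ in a single specific module. All names below are hypothetical, and the `pipeline` helper is a simplified version that skips the cloning step.

```javascript
// Simplified pipeline helper (no cloning) to keep the sketch short.
const pipeline = (...modules) => (config) =>
  modules.reduce((acc, module) => Object.assign({}, acc, module(acc)), config);

// Shared module, the kind of item that would live in a "common" namespace.
const common = {
  chartWidth: (config) => ({ chartWidth: config.width - config.margin * 2 }),
};

// Two specific modules instead of one module with an `if (config.isTime)` switch.
const ordinalPositions = (config) => ({
  positions: config.data.map(
    (d, i) => (i + 0.5) * (config.chartWidth / config.data.length)
  ),
});
const timePositions = (config) => {
  const times = config.data.map((d) => d.time);
  const min = Math.min(...times);
  const max = Math.max(...times);
  return {
    positions: times.map((t) => ((t - min) / (max - min)) * config.chartWidth),
  };
};

const barChart = pipeline(common.chartWidth, ordinalPositions);
const lineChart = pipeline(common.chartWidth, timePositions);

const bars = barChart({ width: 120, margin: 10, data: ['a', 'b'] });
const line = lineChart({
  width: 120,
  margin: 10,
  data: [{ time: 0 }, { time: 50 }, { time: 100 }],
});
console.log(bars.positions); // [ 25, 75 ]
console.log(line.positions); // [ 0, 50, 100 ]
```

Neither position module needs to know the other exists, so a change to the time version cannot cause side effects in the bar chart.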
Modular File Structure
The pipeline pattern is not the only example of modularity in the Datahub-ui code. We also need a modular file structure. Here is an example of a boilerplate I often use:
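The boilerplate itself is not reproduced here. A common way to get this behaviour is a UMD-style wrapper, sketched below; this is a generic pattern and not necessarily the one used in Datahub-ui. The global name `datahubSketch` and the contents of the namespace are hypothetical.

```javascript
(function (root, factory) {
  if (typeof module === 'object' && module.exports) {
    // CommonJS: what npm and bundlers consume.
    module.exports = factory();
  } else {
    // Plain <script> tag: each file adds its part to one shared global namespace.
    root.datahubSketch = Object.assign(root.datahubSketch || {}, factory());
  }
})(typeof globalThis !== 'undefined' ? globalThis : this, function () {
  // Hypothetical contents contributed by this file.
  const utils = {
    merge: (a, b) => Object.assign({}, a, b),
  };
  return { utils };
});
```

In the script tag case, loading a second file written the same way extends `datahubSketch` rather than replacing it, which is what allows files to be loaded one by one.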
Using this wrapper, I can load files one by one with html script tags or I can use npm to distribute it as a bundle.
Splitting the code into multiple files, into namespaces, and into functions that are assembled to form higher-level functions are just some examples of the modular patterns I like to use. I also split my CSS files into at least two types. I keep a separation between the layout, meaning all the CSS that the charts need in order to work, and the theme, meaning all the colours and styling that you might eventually want to override. Even the semantic colouring of internal elements of the charts, like the colour of each line in a line chart, which is typically set in D3 with inline styles from accessor functions, is left to the CSS.
Whatever strategy is used for keeping the code modular, it also needs to expose a consistent API to the outside world. There’s a difference between the internal API, for example this pipeline pattern, and the external one. I like to design simple external APIs that don’t need much on initialization, usually just the DOM container. The options can be passed on initialization, but can also be set afterwards.
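An external API of that shape could be sketched as follows. The factory name `lineChart` and the methods `setConfig` and `setData` are hypothetical and chosen for the example; they are not a description of the Datahub-ui API. A plain object stands in for the DOM container.

```javascript
// Hypothetical chart factory: only the container is required up front.
function lineChart(initialConfig) {
  let config = Object.assign({ width: 300, height: 150 }, initialConfig);
  let data = [];

  function render() {
    // A real chart would run its pipeline here; this sketch only writes a summary.
    config.parent.textContent =
      `${data.length} points at ${config.width}x${config.height}`;
  }

  const api = {
    setConfig(newConfig) {
      config = Object.assign({}, config, newConfig);
      render();
      return api; // chainable
    },
    setData(newData) {
      data = newData;
      render();
      return api;
    },
  };
  return api;
}

// In a browser, `parent` would be a DOM element such as document.querySelector('.chart').
const parent = { textContent: '' };
lineChart({ parent }).setData([1, 2, 3]).setConfig({ width: 500 });
console.log(parent.textContent); // 3 points at 500x150
```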
One feature that can be challenging to get right is exposing events. I use d3-dispatch internally with a simple way to expose it to an .on() method for listening to events.
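A sketch of that wiring is shown below. So that it runs without installing d3-dispatch, `createDispatch` is a hand-written stand-in that mimics only two methods of the real dispatcher, `.on(type, callback)` and `.call(type, that, ...args)`. The `chart` factory and its `simulateHover` hook are hypothetical.

```javascript
// Minimal stand-in for d3.dispatch(...types), covering only .on(type, callback)
// and .call(type, that, ...args), so the sketch runs without d3-dispatch.
function createDispatch(...types) {
  const listeners = {};
  types.forEach((type) => {
    listeners[type] = null;
  });
  return {
    on(type, callback) {
      if (!(type in listeners)) throw new Error(`unknown type: ${type}`);
      listeners[type] = callback;
      return this;
    },
    call(type, that, ...args) {
      if (listeners[type]) listeners[type].apply(that, args);
    },
  };
}

function chart() {
  const dispatch = createDispatch('hover', 'click');
  const api = {
    // The public .on() simply forwards to the internal dispatcher.
    on(type, callback) {
      dispatch.on(type, callback);
      return api;
    },
    // Hypothetical hook that internal modules would call on user input.
    simulateHover(payload) {
      dispatch.call('hover', api, payload);
      return api;
    },
  };
  return api;
}

const received = [];
chart()
  .on('hover', (d) => received.push(d))
  .simulateHover({ value: 42 });
console.log(received.length); // 1
```

Only the event names declared when the dispatcher is created can be listened to, which keeps the external contract explicit.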
In the example pipeline, there’s an “eventsPanel” module that is in fact a rectangle on top of all other elements that catches all user events and collects the appropriate information to return to the event listener. For example, I don’t rely on SVG or DOM events for detecting mouseover. I grab the mouse position and look up what’s under the cursor directly from the data space. That trick works well, for example, when using canvas, or when you have a chart that combines a line and a stack of bars. The eventsPanel will grab all data available at the closest timestamp in a consistent way.
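The lookup in data space can be sketched like this. It is an illustration under stated assumptions rather than the eventsPanel code: timestamps are plain numbers, each layer is sorted by ascending timestamp, the time scale is linear, and `closestIndex`, `lookup` and the config fields are hypothetical names. In real D3 code, `d3.bisector` and `scale.invert()` would typically do this work.

```javascript
// Binary search for the index of the record closest to `time`.
// Assumes `records` is sorted by ascending timestamp and is not empty.
function closestIndex(records, time) {
  let lo = 0;
  let hi = records.length - 1;
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (records[mid].timestamp <= time) lo = mid;
    else hi = mid;
  }
  return time - records[lo].timestamp <= records[hi].timestamp - time ? lo : hi;
}

// Hypothetical lookup run on mousemove: pixel -> time -> data from every layer.
function lookup(mouseX, config) {
  const [t0, t1] = config.timeDomain;
  // Inverse of a linear time scale.
  const time = t0 + (mouseX / config.chartWidth) * (t1 - t0);
  return config.layers.map((layer) => {
    const d = layer.data[closestIndex(layer.data, time)];
    return { layer: layer.name, timestamp: d.timestamp, value: d.value };
  });
}

const config = {
  chartWidth: 200,
  timeDomain: [0, 100],
  layers: [
    { name: 'line', data: [{ timestamp: 0, value: 1 }, { timestamp: 50, value: 2 }, { timestamp: 100, value: 3 }] },
    { name: 'bars', data: [{ timestamp: 0, value: 7 }, { timestamp: 50, value: 8 }, { timestamp: 100, value: 9 }] },
  ],
};
const hovered = lookup(110, config);
console.log(hovered.map((d) => d.value)); // [ 2, 8 ]
```

Because the lookup never touches the rendered elements, it behaves the same whether the layers are drawn in SVG or on canvas.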
Both the internal modules and external APIs could be unit tested. I only wrote external API tests for Datahub-ui, because my main concern was to make sure the chart usage is clear and exposes a consistent contract with the outside world. I wrote a bit about unit testing D3 charts in a small book: Developing a D3.js Edge. But I also often use another form of testing: every time I develop a new feature, I add it to a giant visual test sheet. I can use it as a demo of all the features or to tweak the style with the designer, but it also makes it easy to visually find bugs that are otherwise hard to catch with unit tests.
Datahub-ui is open source, but we didn’t design it as a generic toolkit for everyone to use. It has all the features we need to connect to Datahub data and quickly build Powerboards, and in the near future it could become a self-serve tool for building one yourself. But we want to share it early, even as an internal tool, to show how using simple patterns is key to quickly delivering products at Planet OS. I invite you to look at what we are doing with Datahub and Powerboard.