How we build data visualizations for a global audience

Ryan Shackleton
Published in IHME Tech
9 min read · Feb 5, 2020

The Institute for Health Metrics and Evaluation (IHME) is an independent research organization whose mission is to improve the health of the world’s populations by providing the best information on population health. A simpler way to summarize what IHME does is:

We collect, process, and distribute big data for global population health.

Part of that mission is to distribute data in ways that inform a diverse user group including expert global health researchers, ministers of health, policymakers, and the general public. To that end, we provide a suite of more than twenty data visualization products that run the gamut from static charts to highly interactive tools with dozens of controls and chart types. The variation in our data visualizations reflects the diverse needs of our audiences, from the layperson who prefers a curated story to the expert user who wants the flexibility to ask their own questions of our population health datasets.

Data visualizations page at IHME
The first few entries in a list of IHME’s more than twenty data visualization tools.

On the Data Visualization team, we commonly get this question from users of our products:

How did you build that visualization?

The goal of this article is to answer that question in as much depth as we can, and to offer a starting point for anyone interested in building their own web-based visualization.

TL;DR

  • Rather than using an off-the-shelf data visualization platform, we build bespoke data visualizations using JavaScript, D3, and React, including two of our own open source libraries: IHME-UI and ScrollyTeller
  • Our back end technologies have historically used the LAMP stack, but we are moving toward microservices architectures with NGINX and Node/Express back ends

Who built that?

Our visualizations are created by the Data Visualization team, which currently consists of six developers, a technical product manager, a development manager, and a team lead. We are also supported by multiple technology teams including database developers that maintain our MySQL databases, an infrastructure team that manages the hardware and virtual environments we use to deploy our web technology, and a central computation team that handles organization-wide computational assets.

What we build is guided by requests from health data researchers at IHME, external collaborators at health ministries, academic institutions, and the general public, who also supply IHME’s data in the first place. In that sense, the entire IHME community has a hand in building the tools we create. Functionally speaking, the Data Visualization team at IHME operates as many software development teams do, using agile development practices, two-week sprints, and code reviews. We track work efforts using agile project management software and use Git for source control. Our developers are very much full stack in the sense that we write our own APIs, create and manage small databases, write the visualization code, and manage application deployments.

How we build it

A major difficulty in summarizing how we build visualizations is that it’s not entirely consistent across our twenty-plus applications. Like many organizations that have been building web applications for more than a decade, we have a mix of legacy code and newer technologies that we are moving toward. With that said, the following sections generalize the most common technologies we use on the front end and back end of our applications.

How do you make those impressive charts?

Most people asking “How did you build that?” are probably most focused on how we created the chart itself. In other words, “How did you convert rows of data into that great bar chart/tree map/line chart/scatter plot?”

A pyramid chart, or dual stacked bar chart from IHME’s GBD Compare tool.
A pyramid chart view of death rates by cause for selected countries in the GBD Compare visualization.

The short answer is: from scratch using JavaScript and CSS. The longer answer is: we use custom D3.js code in older applications, and commonly use D3.js in combination with React.js in our newer applications.

Vanilla JavaScript & D3.js

Some examples of D3.js code can be found on bl.ocks.org or ObservableHQ.com. Many of these code samples offer a good starting point for learning code patterns to build charts in JavaScript, but they generally aren’t modular enough, don’t handle component state very well, and don’t follow our code style practices (a modified Airbnb style), so they are not very usable for us as-is. Thus, many of our older D3.js components are coded from scratch and are not open source. This US map and a derived choropleth on ObservableHQ are examples with code patterns similar to the way some of our internal D3.js code is structured.

React with D3.js

To standardize some of our visualization code, the Data Visualization team created an open source repository called IHME-UI. Like many React-based data visualization frameworks, IHME-UI uses React to manage component rendering to the DOM, while leveraging D3.js for low-level chart scaling, layout, and map transformation functionality. Elijah Meeks has an excellent discussion of the tradeoffs of this approach in his article and book on the subject. Amelia Wattenberger also has an excellent instructional blog post on the topic.

The example below is from the IHME-UI demo files, and shows how a chart is composed from individual React components that represent different parts of the chart such as scales and symbols (lines in this case). An <AxisChart> component encloses <XAxis>, <YAxis>, and <MultiLine> components to compose a complete line chart. The <XAxis> and <YAxis> components leverage D3.js to compute transformations from x/y space to pixel space, and format ticks and tick labels. The <MultiLine> component uses D3.js’s SVG path function to place the lines in the appropriate position in pixel space within the SVG element.

An example from the demo files of IHME-UI showing React code to generate the chart.
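
To make the x/y-to-pixel transformation concrete, here is a minimal, dependency-free sketch in plain JavaScript. It is not IHME-UI or D3.js code: `linearScale` and `linePath` are hypothetical helpers that stand in for the kind of work a D3 linear scale and line generator do, and the data and chart size are made up for illustration.

```javascript
// Stand-in for a D3 linear scale: maps a value in the domain [d0, d1]
// to a pixel position in the range [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Builds the `d` attribute of an SVG <path> from data points,
// similar in spirit to a D3 line generator.
function linePath(points, xScale, yScale) {
  return points
    .map(({ x, y }, i) => `${i === 0 ? "M" : "L"}${xScale(x)},${yScale(y)}`)
    .join("");
}

// Hypothetical data and chart size, for illustration only.
const width = 400;
const height = 200;
const data = [
  { x: 2000, y: 10 },
  { x: 2010, y: 30 },
  { x: 2020, y: 20 },
];

const xScale = linearScale([2000, 2020], [0, width]);
// SVG y coordinates grow downward, so the pixel range is flipped.
const yScale = linearScale([0, 40], [height, 0]);

const d = linePath(data, xScale, yScale);
console.log(d); // M0,150L200,50L400,100
```

In a React-plus-D3 setup, a line component could compute a string like `d` this way and hand it to an SVG path element, leaving the DOM updates to React.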

IHME-UI was created primarily to unify the look, feel, and behavior of some IHME charts, but is far from complete in terms of its available chart types. For those interested in building some of the more common chart types using React, several other React-based charting libraries like Victory, Semiotic, React-Vis, and Recharts use similar approaches to customizing axes and chart symbols. Other React-based libraries like Nivo offer higher level implementations of individual charts where props determine axis and shape behaviors.

Scrollytelling

Scrolly-what? Several of our visualizations, such as the child mortality and tobacco control visualizations, are designed to guide the user through a story on a given topic, with the primary interaction being that the user scrolls to continue through the story. Whereas many of IHME’s visualizations are exploratory to allow experienced users to ask their own questions of data, scrollytelling visualizations are explanatory to appeal to users with less experience in a given global health topic.

A scrolling data story about mapping global child mortality rates.

To create these visualizations, our team wrote an open source library called ScrollyTeller, which provides a framework for dynamically creating a scrolling data story from configuration files and tabular data containing the story “narration”. See this link for a scrollytelling data story that explains how ScrollyTeller works.
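
As a rough illustration of the idea of driving a story from tabular narration, the sketch below works out which narration block is active for a given scroll position. It does not use ScrollyTeller's API: the row fields (`id`, `heightPx`, `text`) and the `activeBlock` function are assumptions made up for this example.

```javascript
// Hypothetical narration rows, as they might look after parsing a CSV file.
const narration = [
  { id: "intro", heightPx: 600, text: "Introduction text goes here." },
  { id: "map", heightPx: 900, text: "Map narration goes here." },
  { id: "outro", heightPx: 500, text: "Closing text goes here." },
];

// Returns the narration block under the given scroll offset, plus how far
// (0 to 1) the reader has scrolled through that block.
function activeBlock(rows, scrollTopPx) {
  let blockTop = 0;
  for (const row of rows) {
    const blockBottom = blockTop + row.heightPx;
    if (scrollTopPx < blockBottom) {
      const progress = Math.max(0, (scrollTopPx - blockTop) / row.heightPx);
      return { id: row.id, progress };
    }
    blockTop = blockBottom;
  }
  // Past the end of the story: stay on the last block, fully scrolled.
  const last = rows[rows.length - 1];
  return { id: last.id, progress: 1 };
}

console.log(activeBlock(narration, 0));    // { id: 'intro', progress: 0 }
console.log(activeBlock(narration, 1050)); // { id: 'map', progress: 0.5 }
```

A scroll handler could call a function like this on every scroll event and use the `id` and `progress` values to decide which chart state to draw.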

What about the back end?

As most web developers are well aware, the front end code that renders our visualizations couldn’t exist on the web without a significant amount of development infrastructure. Visualizations with interactive, dynamically changing charts and multiple views require robust web technology, with flexible back end APIs to organize and serve the data from databases, and infrastructure that handles varying amounts of web traffic, data caching, and so on. We won’t go into too much detail about how we deploy, but for anyone interested, we describe in general terms the development methods we use. In other words, what does our stack look like?

A typical stack: LAMP & JavaScript front end

The LAMP acronym might not mean much to those unfamiliar with web technologies. In our case, it stands for Linux, Apache, MySQL, PHP, which forms both the web server and the back end API for many of our older applications. In most cases, these projects are monorepos containing all of the code necessary to build and deploy the applications. An easy way to explore our stack is to break down the file structure of one of these monorepos for a typical project. The main components are:

  • An index.php file, which is the entry point into the web server, and routes web and API requests using Apache or sometimes the Slim PHP framework.
  • An api folder to host back end PHP files. The API connects to IHME’s MySQL databases, runs SQL queries, and exposes the results via HTTP routes consumable by the front end.
  • A php-templates folder to serve the base web components (usually just header, footer, body in HTML format).
  • A Docker folder + Jenkinsfile to support automated builds via Jenkins, a well known open source automation server. The Jenkinsfile is written in Groovy and orchestrates the containerization of the Linux/Apache web server environment, which is deployed via Rancher.
  • A src directory containing the front end JavaScript code, CSS, and any static resources such as images or icons. Most of our D3.js (and/or React) code that renders the data visualizations lives here.
  • A variety of .files and other configuration files that configure Composer (PHP dependency management), Node (JavaScript dependency management), and Webpack/Babel for transpiling and bundling our front end code.
  • A README.md file to tell our developers how to set up the project.
File structure for a generic IHME-application to illustrate API, source code, developer tools, and deployment.
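
The API layer described in the list above turns an HTTP route and its query parameters into a SQL query. The sketch below shows that step in JavaScript rather than PHP, purely to keep the examples in this article in one language. The table name, column names, and `buildQuery` helper are hypothetical and are not taken from IHME's code.

```javascript
// Hypothetical whitelist of filterable columns. Restricting filters to known
// names keeps user input out of the SQL text itself.
const FILTERABLE_COLUMNS = ["location_id", "year", "cause_id"];

// Turns query-string parameters into a parameterized SELECT statement.
// Values are returned separately so a database driver can bind them.
function buildQuery(table, params) {
  const clauses = [];
  const values = [];
  for (const column of FILTERABLE_COLUMNS) {
    if (params[column] !== undefined) {
      clauses.push(`${column} = ?`);
      values.push(params[column]);
    }
  }
  const where = clauses.length ? ` WHERE ${clauses.join(" AND ")}` : "";
  return { sql: `SELECT * FROM ${table}${where}`, values };
}

// A made-up request URL of the kind a front end might send.
const url = new URL("/api/deaths?year=2017&location_id=1", "http://localhost");
const query = buildQuery("deaths", Object.fromEntries(url.searchParams));

console.log(query.sql);    // SELECT * FROM deaths WHERE location_id = ? AND year = ?
console.log(query.values); // [ '1', '2017' ]
```

Keeping the values out of the SQL string and binding them as parameters is a common way to avoid SQL injection, whichever back end language is used.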

This type of architecture has served us well, and because many of our projects are structured in this way, we can get up and running relatively quickly with this stack. That said, many of our developers like to work exclusively in JavaScript and sometimes Python, which has prompted us to explore some different project setups.

A more modern stack: LEMN(?) stack & React front end

Many of our more recent projects have replaced Apache web servers with NGINX, and replaced PHP/Slim back end servers with Node/Express. Thus, many of our stacks use variations of Linux/NGINX/MySQL/Node. In most cases, we still stick to using monorepos, but break each of the services up into their own containers that run separate Node or Apache servers. The main components of this stack are:

  • An app directory containing the front end JavaScript code. The typical React/Redux file structure may look familiar here, and may vary depending on the project. Upon deployment, the app files are bundled and copied to a separate app container (using Docker/app.Dockerfile).
  • A variety of .files and other configuration files that configure Node (JavaScript dependency management), and Webpack/Babel for transpiling and bundling our front end code.
  • A server folder that hosts an Express backend. In this case, the backend is a separate Node/Express application that is built into its own container upon deployment (using Docker/api.Dockerfile).
  • A Docker folder + Jenkinsfile that contain code to containerize the Linux/NGINX web server environment, and configure NGINX to route (proxy) traffic to each of the app and api containers. Again, all of this is deployed via Rancher.
  • It’s worth noting that this configuration still uses a containerized version of Apache (hence the index.php file) for the web server, which exists primarily to conform with some existing Apache-based infrastructure at IHME.
  • A README.md file to tell our developers how to set up the project.
File structure for more modern applications using NGINX/Express/Node back ends.
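
The proxying step in this stack, where NGINX forwards each request to either the app or the api container, can be pictured as prefix-based routing. The sketch below models that decision in plain JavaScript. It is not an NGINX configuration, and the container names, ports, and `resolveUpstream` function are assumptions invented for illustration.

```javascript
// Hypothetical upstream containers; the names and ports are made up.
const ROUTES = [
  { prefix: "/api/", upstream: "http://api:3000" },
  { prefix: "/", upstream: "http://app:8080" },
];

// Picks the upstream whose prefix matches the request path, preferring the
// longest matching prefix (loosely mirroring how NGINX chooses among
// prefix locations).
function resolveUpstream(path) {
  const match = ROUTES
    .filter((route) => path.startsWith(route.prefix))
    .sort((a, b) => b.prefix.length - a.prefix.length)[0];
  return match ? match.upstream + path : null;
}

console.log(resolveUpstream("/api/data"));   // http://api:3000/api/data
console.log(resolveUpstream("/index.html")); // http://app:8080/index.html
```

Splitting traffic this way lets the front end bundle and the Express back end be built and deployed as separate containers behind a single public address.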

Microservices architecture in Local Burden of Disease visualizations

The map-based Local Burden of Disease visualization.

A notable exception to our usual development environment is the set of Local Burden of Disease visualizations, which are map based and are deployed using a microservices architecture with completely separate components in different code repositories. Because the same codebase is used to display many different health indicators, the code must also be highly configurable. Configuration is accomplished via a JSON-based configuration language that defines the dimensions and shape of the data, database specifications, and how different controls and chart components should be rendered in the view. This stack is partially open sourced as the Choroscope platform, which is summarized nicely here. On the front end, the Local Burden of Disease visualizations also use Leaflet.js, or more specifically React-Leaflet, to handle functionality such as basemaps, map layering, and zooming/panning.
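
To give a sense of what configuration-driven rendering can look like, here is a small sketch in which a JSON document declares the data dimensions and the controls that operate on them. The schema is entirely hypothetical: none of the field names, nor the `describeControls` function, come from Choroscope or IHME's configuration language.

```javascript
// Hypothetical configuration; field names are illustrative only.
const configJson = `{
  "indicator": "example_indicator",
  "dimensions": { "year": [2000, 2005, 2010], "sex": ["female", "male"] },
  "controls": [
    { "type": "slider", "dimension": "year" },
    { "type": "dropdown", "dimension": "sex" }
  ]
}`;

// Resolves each control against the dimension it refers to, so a view layer
// knows which options to offer and which one to select initially.
function describeControls(config) {
  return config.controls.map(({ type, dimension }) => {
    const options = config.dimensions[dimension];
    if (!options) {
      throw new Error(`Control refers to unknown dimension "${dimension}"`);
    }
    return { type, dimension, options, initial: options[0] };
  });
}

const controls = describeControls(JSON.parse(configJson));
console.log(controls.map((c) => `${c.type}:${c.dimension}`)); // [ 'slider:year', 'dropdown:sex' ]
```

With this kind of setup, showing a different health indicator means supplying a different configuration file rather than changing the application code.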

Got all that?

Hopefully this gives a flavor of the complexities of building, deploying, and maintaining the twenty-plus visualizations that IHME distributes. Please visit the IHME website for the latest on what the organization is up to, our data visualizations page for a complete list of our tools, or our careers page if you are interested in working with us.
