2Q17: How we built a dataviz of Google search interest in the German election
Wahl 2Q17 was a joint effort by data visualization freelancers Moritz Stefaner, Dominikus Baur and Christian Laesser, with the Google News Lab (Isa Sonnenfeld, Jörg Pfeiffer and Simon Rogers) and project advisor Alberto Cairo.
Our goal was to visualize the Google search interest around the German federal election at the end of September 2017.
Over the course of the project, we launched a lot of different smaller and bigger visualizations: from daily to yearly views of the top search terms for the candidates on our project site 2Q17.de, to embedded widgets on external sites, to special interfaces for live events and debates, and images and videos for social media.
→ You can find a good overview of everything we produced here.
After previously discussing our design process in detail, we now want to talk about what you can’t see in the finished project: all the various decisions that went into the actual implementation.
2Q17’s main site is based on a relatively simple feed-forward mechanism: we get data from Google Trends, convert it into our own format and store it in the backend, from which the frontend can grab and display it.
The frontend itself is a static React + mobx-based website built with webpack.
Yet, the devil is — as always — in the details. So let’s dive right in!
Backend
The bigger the project, the simpler you want to keep individual aspects. We’ve tried to stick to this premise when it came to the backend:
Starting out, we got daily snapshots of the data in its raw form directly from Google Trends (and sorry, there’s no public API available, for obvious reasons). Since this data still needed some treatment (see the “Text data needs gardening” section in the previous Medium post), we set up a dedicated Linux virtual machine on Google Compute Engine that ran various Python scripts daily to get the data into an app-compatible form. The resulting files would land on Google Cloud Storage and be accessed directly from the frontend, which itself was served from there.
One of the nice surprises was how well Google Cloud Storage performed as a super-simple web host. With the right DNS settings, you can very easily use it to host a static website. And the performance is great: even when tens of thousands of people hit the site after the Spiegel Online timeline article went live, there was no noticeable slowdown. We ended up hosting everything there, from the data to the site itself and the embeds.
Again in the spirit of keeping things simple we early-on decided against using a database. While databases are useful for manipulating data or retrieving very specific aspects of a data set, we had neither of these things: our backend scripts would create TSVs and the frontend would read them whole and display them.
Even as the complexity of the data grew — from daily snapshots, to weekly and monthly versions, and finally to “real-time” data sets at most four hours old — we were still fine.
There are two types of data files we’re working with: the “latest” versions, which contain the most recent daily or four-hourly data, and the “archive” versions, which contain data for every day, week and month in 2017 (up until the election).
Everything is stored as files on Cloud Storage and can be retrieved with a simple fetch operation. While the site loads the latest data automatically, changing the timeframe from days to something else triggers these on-demand requests. Similarly, fixed embeds with a specific date also load one of the archive files. Fortunately, these additional requests are pretty fast, so changing weeks or months only results in a very subtle delay.
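In code, this loading scheme boils down to building a file URL and parsing the returned TSV. A minimal sketch of the idea — the bucket URL and file naming below are hypothetical, since the article doesn’t show the real scheme:

```javascript
// Hypothetical bucket URL and file layout, for illustration only.
const BUCKET = 'https://storage.googleapis.com/example-2q17-bucket';

// Pick the "latest" file by default, or a specific archive file on demand.
function dataUrl(timeframe, date) {
  return date
    ? `${BUCKET}/archive/${timeframe}/${date}.tsv` // e.g. weekly/2017-09-18.tsv
    : `${BUCKET}/latest/${timeframe}.tsv`;
}

// Minimal TSV parser: first row is the header, remaining rows are records.
function parseTsv(text) {
  const [head, ...rows] = text.trim().split('\n').map((line) => line.split('\t'));
  return rows.map((row) =>
    Object.fromEntries(head.map((key, i) => [key, row[i]]))
  );
}

// Loading a data set is then a single fetch:
// fetch(dataUrl('weekly', '2017-09-18')).then((r) => r.text()).then(parseTsv);
```

With no database in the way, the frontend reads whole files and derives everything else in memory.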
Code support during the design process
When it comes to working with data, we’re always facing the (potential) break between ideas and data. Once you put the actual data into your designs, lofty visions might shrivel quickly. That’s also what makes datavis design closer to designing games than static graphics: it’s about creating rules and seeing what pictures the data draws with them. And in the end, the data always wins.
That is also why code support and quick prototyping is critical during the initial designs — to create shorter feedback loops between ideas and the resulting data-driven graphics. But code can also work as a playground where it becomes easier and faster to explore new ideas.
Working with a much-maligned type of chart definitely increases the motivation to get it right. That’s why we set up a dedicated prototyping environment just to play around with various types of word clouds and their animations. While word clouds are usually found as static graphics (thanks, Wordle!), they’re arguably much more expressive in animated form. In our case, the frequent switching between sets of words (e.g., from one candidate to the next or one day to the other) just had to be supported with suitable animations. But that opened up various questions:
Where do the search terms come from? Where do they go? How can we differentiate between search terms that stay on the stage and the new or leaving ones?
Thanks to our custom prototyping environment, which let us play around with different layouts and animations in short pieces of code, we could relatively quickly explore different ideas and visual metaphors:
This environment took care of layout-independent tasks (mainly transitioning between terms and rendering them) and let us quickly churn out variations manipulating every aspect of the DOM we could think of.
After implementing various ideas, we settled on an internal rule set for those transitions. The final word cloud shows the decisions we took and how those transitions can support the visualization:
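At the heart of such a rule set is a diff between the old and new set of terms: which terms stay on stage, which enter, which leave. A minimal sketch of that classification — the function name and shape are illustrative, not the project’s actual code:

```javascript
// Classify terms when switching word clouds (e.g. from one candidate to the
// next): staying terms animate to their new position, entering terms fly in,
// leaving terms fly out.
function diffTerms(prevTerms, nextTerms) {
  const prev = new Set(prevTerms);
  const next = new Set(nextTerms);
  return {
    staying: nextTerms.filter((t) => prev.has(t)),
    entering: nextTerms.filter((t) => !prev.has(t)),
    leaving: prevTerms.filter((t) => !next.has(t)),
  };
}

// diffTerms(['tv duell', 'wahl'], ['wahl', 'umfrage'])
// → staying: ['wahl'], entering: ['umfrage'], leaving: ['tv duell']
```

Each of the three groups can then be handed its own animation, which is exactly what makes the transitions readable.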
2Q17.de
Like the last few projects we did, 2Q17’s frontend is built on a combination of React + mobx.
One common problem for complex web applications is how to manage the internal state. JavaScript is flexible enough to easily create spaghetti code of the highest order (it definitely feels easier to produce than the properly organized variety).
So, using a state management library can help keep the code from delving into pasta territory. While mobx is less well known than its ideological siblings Redux and Flux, it is built on similar ideas: there’s a centralized state that controls how the application looks. Interactions, new data, etc. change the state, which is reflected in a re-rendered application.
Mobx is a lot less dogmatic when it comes to the actual implementation, though (that’s what mobx-state-tree is for), which gives it a nice, low-impact feel. It basically provides you with ways to create observable values, automatically computes other values based on them and re-renders React components when they’re affected by value changes. It also works great with React, which favors a “pure function” approach anyway, where components themselves should have as little state as possible and be just dumb containers fed from a global state.
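The observable/reaction mechanics that mobx automates can be hand-rolled in a few lines to show the idea. This is a deliberately tiny sketch, not mobx’s actual API — real mobx tracks dependencies for you instead of requiring manual subscriptions:

```javascript
// Miniature "observable" with manual subscriptions; mobx does the
// dependency tracking and re-rendering automatically.
function observable(value) {
  const listeners = [];
  return {
    get: () => value,
    set(next) { value = next; listeners.forEach((fn) => fn()); },
    subscribe: (fn) => listeners.push(fn),
  };
}

const candidate = observable('Merkel');

// A "computed" value derived from the observable...
const headline = () => `Searches for ${candidate.get()}`;

// ...and a "reaction" standing in for a React re-render.
let renders = 0;
candidate.subscribe(() => { renders += 1; });

candidate.set('Schulz'); // triggers the reaction; headline() now updates too
```

With mobx, any component that reads `candidate` inside its render function would be re-rendered on that `set` without any explicit wiring.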
In the 2Q17 case, our overall architecture is based on two main mobx “pipelines”: one for DATA, one for STATE. The data pipeline keeps track of data loading and processing (deriving values from the TSV files), while the state manages changes through interaction: selecting a different day or candidate or clicking on one of the pulse bubbles.
Both pipelines are merged in the dataAPI class, which creates props for the React components based on info from both. For example, dataAPI’s “wordcloudTerms” function filters the wordcloud data from the dataStore according to the currently selected candidate and timeframe in uiState and spits out an easily digestible array as prop for the React WordcloudComponent. If the values in uiState change due to clicks or taps, the wordcloudTerms function is re-evaluated and the WordcloudComponent is re-rendered with an updated array.
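Stripped of the mobx machinery, the dataAPI idea looks roughly like this. The field names and sample data below are illustrative, not the project’s actual structures:

```javascript
// Simplified stand-ins for the two pipelines.
const dataStore = {
  wordcloudData: [
    { candidate: 'Merkel', timeframe: 'daily', term: 'tv duell' },
    { candidate: 'Schulz', timeframe: 'daily', term: 'umfrage' },
  ],
};
const uiState = { candidate: 'Merkel', timeframe: 'daily' };

const dataAPI = {
  // In the real app this is a mobx computed value: it is re-evaluated
  // whenever the dataStore or uiState values it reads change.
  get wordcloudTerms() {
    return dataStore.wordcloudData
      .filter((d) => d.candidate === uiState.candidate &&
                     d.timeframe === uiState.timeframe)
      .map((d) => d.term);
  },
};

// The result is handed to React as a plain prop:
// <WordcloudComponent terms={dataAPI.wordcloudTerms} />
```

Changing `uiState.candidate` changes what the getter returns; with mobx, that change would also trigger the re-render automatically.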
Mobx automatically keeps track of all variables that a given React component relies on, and through this dark magic, all changes in either of the pipelines automatically re-render affected components. So, all a component sees of a state change is that its props have changed. Sticking with the WordcloudComponent as an example: it is initially rendered with just an empty array for its “terms” prop, which causes it to stay blank. Once the data pipeline is done loading the relevant TSV files and the dataAPI has processed them into a wordcloud-compatible format, mobx triggers a re-render on the component with the new — now filled — “terms” prop. This results in the new terms floating into view. Further changes to the terms (e.g., by switching candidates) trigger more re-renders with new data, and so on.
This automated re-rendering really is the most convenient aspect of mobx, since developers no longer have to keep track of which components are affected by a certain state change. We even went so far as to make as many components as possible stateless (easier to debug, easier to re-use, …), to take advantage of the centralized mobx state.
Performance
Facebook’s React framework is great for building complex web applications since it encourages modularity (Components) and enables a very convenient fire-and-forget rendering approach with its virtual DOM.
In reality though, there are still some pitfalls when it comes to performance. While building a simple, non-animated website with React is very easy, having something that performs well even with hundreds of animated elements floating around is decidedly harder.
Without going too much into detail: since React is relatively naive when it comes to re-rendering components, shouldComponentUpdate is a must to tell it explicitly when a re-render is necessary. By defining this function in a component, you as a developer can decide when the component should re-render. This is crucial when hundreds of components would otherwise be re-rendered each frame even though nothing (visible) has actually changed.
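The typical implementation is a shallow comparison of the incoming props, which is essentially what React’s PureComponent does. A sketch, assuming flat props objects:

```javascript
// Shallow equality: same keys, and each value strictly equal.
// Only reliable for flat props; nested objects compare by reference.
function shallowEqual(a, b) {
  const keys = Object.keys(a);
  if (keys.length !== Object.keys(b).length) return false;
  return keys.every((key) => a[key] === b[key]);
}

// Inside a component (sketch):
// shouldComponentUpdate(nextProps) {
//   return !shallowEqual(this.props, nextProps); // skip re-render if unchanged
// }
```

The trade-off: if props are mutated in place instead of replaced, the comparison sees no change and the component silently skips updates, so this works best with immutable data flow.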
Along those lines, minimizing the render functions by splitting everything up into tinier and tinier components might seem petty, but it does wonders for performance (especially when being very restrictive about re-rendering them). If you have a look at our word clouds, even single terms are actually two nested React components, to separate the position and scaling animations (which are cheap CSS transforms) from the more costly color and font animations:
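That split can be expressed as two style builders, one per nested component. The split itself is from the project; the exact properties and function names here are illustrative:

```javascript
// Outer component: transforms only. These are compositor-friendly and
// don't trigger layout, so they can animate every frame cheaply.
function outerStyle(term) {
  return { transform: `translate(${term.x}px, ${term.y}px) scale(${term.scale})` };
}

// Inner component: color and font changes, which force repaint (and, for
// font-size, relayout) -- so these should change as rarely as possible.
function innerStyle(term) {
  return { color: term.color, fontSize: `${term.fontSize}px` };
}
```

Because each style lives on its own component, a position change re-renders only the cheap outer wrapper and leaves the costly inner one untouched.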
While working on performance, you sometimes also learn something new about tried-and-true approaches:
Everything usually gets faster when work can be split into separate threads. Since this is only possible through Web Workers in JavaScript, at some point I created a version of the word cloud that performed the d3-force calculations in a web worker. Unfortunately, moving the data for hundreds of nodes between the background and foreground threads once per frame actually caused more overhead than performance gain (despite JSON.stringifying everything). So I ended up dumping this approach, with hopes for SharedArrayBuffer in future browsers.
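The bottleneck is the per-frame payload itself: every tick, all node positions have to cross the thread boundary via postMessage, which copies (or here, stringifies) the whole payload. A sketch of that message, with illustrative field names:

```javascript
// Per-frame message from the d3-force worker back to the page. Copying
// hundreds of these at ~60fps is what outweighed the parallelism gain.
function serializeNodes(nodes) {
  return JSON.stringify(nodes.map(({ id, x, y }) => ({ id, x, y })));
}

// In the page:   worker.postMessage(serializeNodes(nodes));
// In the worker: const nodes = JSON.parse(event.data); /* run force tick */
```

SharedArrayBuffer would let both threads read and write the same positions without any copying, which is why it looked like the eventual way out.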
Talking about tried-and-true approaches: one common thing I like to do when trying to make things fast is switching from the DOM/SVG to either Canvas or even WebGL. With browsers becoming better and faster, however, this year it actually made sense to stick with the DOM:
In the mobile word cloud, tags are entering from the left and exiting on the right side of the screen. And if you look closely, you can see that the tags are initially even outside of their container and twisted a bit, thanks to some rotateY- and perspective-CSS magic. These effects would be very hard to recreate with Canvas or WebGL, so we decided against dropping the DOM.
What helped us with performance in this regard is that word clouds become cluttered and hard to read with less space. That’s why the number of terms (and thus the number of costly-to-render elements) drops with screen width. You can try it yourself (and feel like a web developer) by resizing the window: the smaller the window gets, the more terms leave the stage and the fewer new ones are introduced.
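This comes down to a small mapping from screen width to term count. The relationship is from the project; the thresholds and counts below are made up for illustration:

```javascript
// Illustrative breakpoints only -- the article doesn't state the real values.
function termCount(screenWidth) {
  if (screenWidth < 480) return 20;  // phones: few, readable terms
  if (screenWidth < 1024) return 35; // tablets
  return 50;                         // desktops: the full cloud
}
```

A nice side effect: readability and render cost improve together, since fewer terms means both a cleaner layout and fewer animated DOM elements.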
Wrapping up
If somebody had shown us the eventual extent of the 2Q17 project when we started out — who knows if we would have actually done it! Dozens of different data visualizations working across phones, tablets and desktops, varying timeframes with close to real-time data, and half a year of sometimes quite intense effort led to a project that — looking back — became quite grand in its ambition.
Code was a helpful and sometimes frustrating companion along the way. By keeping things as simple as possible (skipping a complex backend, relying on React + mobx for a straightforward frontend architecture), we could postpone some of the inevitable complexity towards the end. Similarly, focusing optimization efforts only on parts of the code that weren’t going to change anymore (rather than on throwaway prototypes) eased the workload and made us throw away less code.
And so in the end, an abundance of discussions and experiments on both the design- and development-sides let us explore our very varied data source in detail and learn almost more than we ever wanted to know about Germany’s interest in their political candidates.
This article is a joint production by Dominikus Baur, Christian Laesser and Moritz Stefaner.
If you enjoyed this write-up, also make sure to check out:
- Christian’s Behind the scenes article
- Moritz’s reflections on lessons learned and the portfolio page at http://truth-and-beauty.net/projects/wahl-2q17
- http://2Q17.de for an archived version of the site.