Döner charts, eye roll gifs and word clouds — some things we learned visualizing Google Trends data around the German elections
Google Trends provides a fascinating data source, and I’ve had great fun working with it in the past. So when the Google News Lab reached out to think about ideas for a joint project around the German elections 2017, I was immediately interested.
How can we capture the search interest of people related to the parties, the elections, and candidates? What do people care about? How quickly does attention shift? These are all questions we can now answer empirically.
So I teamed up with my long-term collaborators Dominikus Baur and Christian Laesser, and together with the colleagues at Google News Lab (Isa Sonnenfeld, Jörg Pfeiffer and Simon Rogers) and project advisor Alberto Cairo started to explore possible directions for the project.
It soon became clear that we did not want to make a tool that merely shows “who’s ahead in the race”. In fact, we wanted to work against the notion that this would be all that counts.
Rather, we were aiming to provide a launch pad for observations and investigations into individual topics, and a guide to formulating new questions. We were, ourselves, curious and excited to see what this live data source would reveal.
Over the course of the project, we launched a lot of visualizations, small and large: daily to yearly views of the top searched terms for the candidates on our project site 2Q17.de, embedded widgets on external sites, special interfaces for live events and debates, and images and movies for social media.
If you did not follow the project while it was live:
→ Find a good overview of all the products we produced here.
→ There’s loads of experiments and design that did not make it into the final product, some of which you can explore in our upcoming design process article (coming soon!).
Here’s a few reflections on the things we found and learned!
Quantitative + qualitative = ❤
Studying a lot of trend curves in the early days of the project, it became clear that a driving principle of our work should be to bring quantitative aspects (Who gets how much attention?) and qualitative aspects (Why?) together. Nothing is more frustrating than seeing a spike in a graph and not understanding why it’s there!
This led us to use word clouds as the primary visual device for our project. A bold choice, as word clouds are an infamous chart type, often seen as the Crocs of data visualization: (somewhat) functional, but cheap, clunky, and tasteless.
However, if done right, word clouds can reveal an amazing depth of textual data at a glance, and quickly provide an overview of a weighted set of terms. Because we knew they represent a design cliché and can quickly flip into a messy “letter soup”, we put extra care into their design and behavior.
Text data needs gardening
So, working with text and qualitative data is great, but, as anyone who has ever worked with it knows, text data is super rich but also highly unstructured and messy. In our case, the data was not even traditional text, but Google search phrases, so we needed to find out how to treat it best. We ended up with a quite simple and lo-fi approach, in order to leave the original data as intact as we could.
Technically speaking, all searches run through a series of regular expressions. Some fix common misspellings:
petri → petry
oe?zdemir → özdemir
martin schult?z → schulz
others remove common German stop words:
\b(bei|zu|im|in|mit|und|für|von)\b → “”
and of course, we also had to filter out a few profanities.
We also removed the candidates’ own names from the queries, as things would become quite redundant otherwise. But we also kept a list of phrases to keep intact, such as “merkel gegen schulz” or “merkel muß weg”.
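To make the steps above a bit more concrete, here is a minimal Python sketch of such a cleaning pipeline. It is not our production code: the function name `clean_query`, the `candidate_names` parameter and the order of the steps are assumptions made for illustration, the rule lists only contain the examples mentioned above, and the profanity filter is left out.

```python
import re

# Phrases that should survive untouched (examples from the text above).
KEEP_INTACT = {"merkel gegen schulz", "merkel muß weg"}

# Misspelling fixes, copied from the examples above.
MISSPELLINGS = [
    (re.compile(r"petri"), "petry"),
    (re.compile(r"oe?zdemir"), "özdemir"),
    (re.compile(r"martin schult?z"), "schulz"),
]

# Common German stop words, copied from the example above.
STOP_WORDS = re.compile(r"\b(bei|zu|im|in|mit|und|für|von)\b")


def clean_query(query: str, candidate_names: list[str]) -> str:
    """Normalize one search phrase (hypothetical helper, assumed step order)."""
    q = query.lower().strip()
    if q in KEEP_INTACT:
        return q
    for pattern, replacement in MISSPELLINGS:
        q = pattern.sub(replacement, q)
    q = STOP_WORDS.sub("", q)
    # Remove the candidate's own name; longest variants first, so that
    # "martin schulz" is removed before the shorter "schulz".
    for name in sorted(candidate_names, key=len, reverse=True):
        q = re.sub(rf"\b{re.escape(name)}\b", "", q)
    # Collapse the whitespace left behind by the removals.
    return " ".join(q.split())
```

For example, `clean_query("Martin Schultz Alter", ["schulz", "martin schulz"])` reduces the query to `"alter"`, while `"merkel muß weg"` passes through unchanged.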
Understanding the mix of topics around the candidates was one of the most interesting parts of the project. Looking at the types of terms early on, we saw they could be grouped into just a handful of categories:
Search interests range from political topics to gossip (like Lindner’s hair), from short-lived memes to terms that have a constant level of attention (such as “refugees” for Merkel). We actually used this finding to improve the design of the site: in the tag cloud, we now group persons, media, places, and gossip separately.
One other corollary in this area: it seems that appearances and gossip topics are as important as political content. Politics and show business might be close together after all.
Different time frames surface different phenomena
Choosing the right perspective is major 🔑 in data visualization, and in our case, different temporal aggregations made quite a difference. We were looking at data from 5-minute intervals to monthly aggregations, and found different patterns on each level.
For instance, in a daily perspective, local campaign events were often the most searched term:
Weekly terms seemed to provide a lot of the actual discussed contents, cutting through the high-frequency noise.
Large time frames highlight evergreen search terms: “merkel muß weg”, “refugees”, “age”, “wife/husband” etc.
Accordingly, it can play a huge role how you aggregate the data.
Even choosing 5-minute or 10-minute slots for the live TV debate coverage made a difference in which terms surfaced and how volatile the data was, so it’s worthwhile to run a few tests with different aggregations.
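A quick way to run such tests is to bucket the same events at different widths and compare what comes out on top. The following Python sketch illustrates the idea; the function `top_terms_per_bucket` and the sample events are made up for this example and are not taken from the project’s data or code.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def top_terms_per_bucket(events, bucket_minutes, n=3):
    """Group (timestamp, term) pairs into fixed-width time buckets
    and return the n most frequent terms of each bucket."""
    width = timedelta(minutes=bucket_minutes)
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(Counter)
    for ts, term in events:
        # Floor the timestamp to the start of its bucket.
        start = epoch + ((ts - epoch) // width) * width
        buckets[start][term] += 1
    return {start: counts.most_common(n)
            for start, counts in sorted(buckets.items())}


# Invented sample data: placeholder terms, not real search queries.
events = [
    (datetime(2017, 9, 3, 20, 11), "topic a"),
    (datetime(2017, 9, 3, 20, 12), "topic a"),
    (datetime(2017, 9, 3, 20, 16), "topic b"),
    (datetime(2017, 9, 3, 20, 17), "topic b"),
    (datetime(2017, 9, 3, 20, 18), "topic b"),
]

five = top_terms_per_bucket(events, bucket_minutes=5, n=1)
ten = top_terms_per_bucket(events, bucket_minutes=10, n=1)
```

With 5-minute slots, “topic a” is the top term of the first slot and “topic b” of the second. With 10-minute slots, both fall into the same slot and “topic a” disappears behind “topic b”.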
Top or trending?
Likewise, a crucial decision was whether we wanted to show the top searched or the most trending terms. We debated this point many times during the project, and actually built a full mirrored version of the site to compare the two approaches.
In the end, we settled on showing the top searched terms, for the following reasons:
First, the basic premise of our project is that we show what Germany is searching with respect to the candidates, and our key visual should reflect exactly that — a summary profile of what people are interested in, and how much.
Second, running the site in test mode for a couple of weeks, we observed that the top trending dataset sometimes surfaces interesting, more specific stories (such as Merkel’s eye roll gif), but this comes at the expense of losing a few really obviously important terms for a day (such as G20 hamburg as a main topic, which lasted several days) and also bringing some really obscure and unrelated queries to the forefront.
Finally, sizes and areas are strongly perceived as visualizing amounts. I am certain that many people would misread a “trend” word cloud and assume the biggest terms are also the most searched ones, regardless of how many footnotes and explanations we would add there. So, also from an information design point of view, it’s actually quite problematic to visualize an indirect measure without showing the base amount, especially when using sizes and font weights.
That said — mixtures are possible, too. For instance, in the candidate cards, we show the top 7 trending terms among the top 15 searched ones, and this worked really well to find the most defining searches for a day and candidate, for instance when Alice Weidel made a big splash leaving a TV show. In this case, we don’t vary the font-size, but just hint at the ranking with subtle color and typographic differences.
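The mixed approach can be sketched in a few lines of Python. Note that the text above does not define how “trending” is measured, so the growth score below (today’s count relative to the previous day’s, with +1 smoothing) is an assumption for illustration only, as are the function name `defining_terms` and the placeholder data.

```python
from collections import Counter


def defining_terms(today, previous, top_n=15, trending_n=7):
    """Pick the most trending terms among the most searched ones."""
    # Step 1: restrict to the top searched terms of the day.
    top = [term for term, _ in today.most_common(top_n)]

    # Step 2: rank those by growth.
    # ASSUMPTION: "trending" = relative increase over the previous
    # period; +1 smoothing avoids dividing by zero for new terms.
    def growth(term):
        return (today[term] + 1) / (previous[term] + 1)

    return sorted(top, key=growth, reverse=True)[:trending_n]


# Invented counts with placeholder terms, not real search data.
today = Counter({"term a": 100, "term b": 80, "term c": 60, "term d": 1})
previous = Counter({"term a": 100, "term b": 90, "term c": 5})

result = defining_terms(today, previous, top_n=3, trending_n=2)
```

Here “term c” comes first because it grew the most among the top three, while “term d”, a rare query that doubled from almost nothing, never makes the list. Restricting the trend ranking to the top searched terms is what keeps obscure queries out.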
Spread & diffuse
It was really rewarding to see individual components get picked up across media and sites, beyond our own website.
Some media remixed our visual ideas and data; others, like ZEIT Online, embedded our widgets in live blogs:
SPIEGEL Online featured a great analysis of the timeline:
And the investigative journalism platform Correctiv looked at our word clouds in depth:
And, we were even on morning television 😬
Finally, some media even seemed to have used our data creatively to drive traffic to their site: Neue Osnabrücker Zeitung made whole candidate portraits based on the top search terms — a smart trick to answer the most common questions for readers, but also perfecting their SEO (Search Engine Optimization) strategy.
Thinking about how your products can live elsewhere than on your own site is definitely worthwhile. On the other hand, there are two things to consider. First, building a well-embeddable object is REALLY HARD; there’s a lot to consider technically, so our advice here is to start small and grow with the requirements. Second, even if you have a technical embedding mechanism in place, it does not mean people will automatically use it. A big part is to actually be in touch with potential users, talk to them and help them achieve their goals.
Hindsight is 20/20
We are overall really happy with the outcomes of the project. It was quite a ride and we really learned a lot along the way. Yet, here’s some things we might have looked into, were we to do it again.
Was the decision to focus on candidates a good one? One concern is that the focus on the personalities also surfaced a lot of superficial search terms, and left the political contents a bit behind. We did consider widening the scope to parties and topics, but then felt it might dilute the product too much. And the big advantage of a person-related query is that it can be formulated quite precisely, while searching for concepts or even parties is actually much trickier to get right. So I think the call was right, although I sometimes wonder how the product might have looked with another “data angle”.
One thing we should have planned for: an uneventful race, because this is what it was for a long stretch of time. It was really quiet until mid-August, and unfortunately, Martin Schulz never really managed to threaten Merkel. The last two weeks before the election became somewhat interesting, but we were hoping (and designing) for many more accents and turning points. It’s hard to design for prima facie unexciting data, unless you start to editorialize and pull out individual stories manually, for instance in direct collaboration with journalistic partners. So, this is something we might have done in a different setting or had we known how the data would develop.
As you can see in the portfolio page and the process post (coming soon), we produced a lot of different interfaces, designs and products. Our original plan was to launch a very lean first product very early on and then build up partnerships and look into specific collaborations on individual topics, but the project developed differently. Overall, we might have spread ourselves a bit thin, or overdesigned/-engineered some of the components. For instance, the top terms “skewer” (or Döner :) layout and the embedding took a lot of design and engineering resources, and in hindsight, we are not sure if they were ultimately worth it. Then again, as they say, in pretty much any project you could have achieved the same with 20% of the effort, but you never know which 20% until you’ve tried the other 80% :D
Last year, after Brexit and Trump, I was wondering out loud:
And, here in 2017, and after an election that saw a sweeping success of the far right, I am still wondering how we can bridge the gaps between the data-haves, and the data-have-nots, the ones who benefit from digitization and globalisation, and those who are left behind.
I don’t think we have the answer yet, but I hope that this project points in a few promising directions — by opening up data sources and making them available in a direct, accessible way, by working with qualitative contents beyond just “numbers and stats”, by challenging people’s assumptions, and inviting people to research and hypothesize and explore themselves.
We will need new seismic sensors for changing political landscapes, and I hope we were able to contribute a little to the journalistic digital toolbox in this area with this project. But, clearly, it’s a marathon, not a sprint.
→ Read more about the project at Truth & Beauty.
→ There’s loads of experiments and design that did not make it into the final product, some of which you can explore in our upcoming design process article.