How and why we made our first audio data visualisation

This week marks the fourth anniversary of the Saudi-led air raids on Yemen. Since the start of the aerial campaign on March 26, 2015, Yemen has been pounded by more than 19,000 air raids across the country.

To mark this event, we produced DEATH FROM ABOVE: Every Saudi coalition air raid on Yemen.


We started this project exactly one month ago, after our Yemen subject specialist came to us with a spreadsheet showing roughly 19,000 air raids.

This was compiled by the Yemen Data Project and is publicly available. The data set, which has been widely used by news organisations, contains a detailed cross-referenced account of every raid across the country since 2015.

Once we had the data, we began storyboarding our project’s main editorial goals. When working on any data-driven story, it is very important to define which elements of the data should be focused on. This process creates a lens through which to mould the story.

Project goals

  1. The main goal of this project was to document the devastation wreaked on Yemen by more than 19,000 air raids over the course of four years.
  2. Secondly, the project had to be built with reusability in mind so it could be updated as and when new data becomes available.
  3. Thirdly, the piece had to provide sufficient contextual elements to help readers grasp the extent of the human cost of the war.

Data analysis

The first point of action when dealing with a large data set such as this one is to conduct an initial exploration of the data. This is a large part of what data journalists mean when we say “interview your data”.

Google Sheets allowed us to collaborate on initial findings

During this phase, we listed all the questions that we wanted our data to answer. These ranged from big-picture questions such as “how many air raids were recorded each year?” to more specific ones such as “was there a decrease in raids on religious holidays or during ceasefires?”

We chose to upload our data into Google Sheets first, which allowed us to collaborate and to query the data from several points of view on how to interpret it.

After generating several worksheets of summaries, pivot tables, and follow-up questions, we knew that it was time to move our data into R for a more thorough analysis.

R’s Tidyverse suite of data wrangling packages.

R is a very powerful statistical programming language popular among data journalists. It contains many amazing data wrangling packages and allows you to create a reproducible data analysis process — something that we knew we would need in the future.

The basic steps involved in the data analysis were:

  1. Import the data and data wrangling packages
  2. Filter the data that we’re interested in
  3. Reformat columns as dates and integers
  4. Find answers to all the questions we asked
  5. Consolidate the output data required for the presentation

A step-by-step walkthrough of the process is available here: https://github.com/megomars/YemenRPreprocessing/blob/master/Processing.rmd
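
The analysis itself was done in R (see the walkthrough linked above). Purely as an illustration of the five steps, here is a minimal Python sketch that uses only the standard library. The column names (`date`, `governorate`, `raids`), the date format and the sample rows are hypothetical and are not taken from the Yemen Data Project spreadsheet.

```python
import csv
import io
from collections import Counter
from datetime import date, datetime

# Hypothetical sample rows; the real spreadsheet's columns and date format may differ.
SAMPLE_CSV = """date,governorate,raids
26/03/2015,Sanaa,3
26/03/2015,Saada,2
27/03/2015,Taiz,1
01/01/2016,Saada,4
"""


def load_rows(text):
    """Steps 1 and 3: import the data, then reformat columns as dates and integers."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": datetime.strptime(raw["date"], "%d/%m/%Y").date(),
            "governorate": raw["governorate"],
            "raids": int(raw["raids"]),
        })
    return rows


def filter_range(rows, start, end):
    """Step 2: keep only the rows inside the period of interest (inclusive)."""
    return [r for r in rows if start <= r["date"] <= end]


def raids_per_year(rows):
    """Step 4: answer one of the questions, here 'how many raids each year?'."""
    totals = Counter()
    for r in rows:
        totals[r["date"].year] += r["raids"]
    return dict(totals)


def raids_per_day(rows):
    """Step 5: consolidate a per-day total for the presentation layer."""
    totals = Counter()
    for r in rows:
        totals[r["date"].isoformat()] += r["raids"]
    return dict(sorted(totals.items()))


rows = filter_range(load_rows(SAMPLE_CSV), date(2015, 3, 26), date(2019, 1, 31))
print(raids_per_year(rows))
print(raids_per_day(rows))
```

Keeping each step in its own function is one way to get the reproducibility mentioned above: when new data arrives, the same pipeline can simply be run again.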

Resisting making “dots on a map”

In any data visualisation, there are many different ways to highlight specific elements of your story. If location is what matters most, then some sort of map probably makes sense. If time is the most crucial factor, then a timeline might be fitting.

Our 2018 data visualisation of air raids across Yemen (as seen on an iPhone 6/7)

One of the biggest temptations was to present the raids as circles on a map. We’ve done this before but resisted going down the same path for three main reasons.

First, the locations of the air raids are important but were not the most important aspect of our story.

Second, the areas of circles are generally difficult to compare and to depict accurately, especially where raids are concentrated.

Third, the shape of Yemen is rectangular, which makes the map very difficult to read, especially on mobile.

Usability vs Shareability vs Complexity

One of the toughest decisions with any interactive is to find the right balance between making something that is easy to use, easy to share and easy to understand.

This is where user experience (UX) design comes into play. Such an open format allows for an almost infinite number of designs and technical features with which to present your story. Whereas other formats (text articles, videos and photo galleries) have specific constraints built into their medium, interactives tend to be an open canvas. This can make what we do really expressive and creative but can also make the story hard to consume.

Another consideration is how to make the data visualisation shareable as native content on social media platforms. As part of our content delivery strategy, every interactive should have accompanying bespoke content (vertical videos for Instagram, cards for Twitter, and videos for Facebook).

Finally, when dealing with 19,000 data entries you really want to try to present as much meaningful information as possible without overcomplicating the visualisation. Despite constantly bugging people around the office to “please take a look at this and tell me what you see”, we still had reactions ranging from “I have no idea what’s going on here” to “oh, wow this is amazing”.

How the story was told on Instagram Stories: https://www.instagram.com/aj_labs/?hl=en

Why we chose audio as our vehicle

With every interactive comes the impulse to try something new or unique. We’ve always wanted to produce an audio data visualisation but we knew it would have to be for the right story.

The basic idea behind an audio visualisation is that if you closed your eyes you should be able to listen to the story unfold. Just as each data point in a data visualisation is mapped to a visual element, each data point in an audio data visualisation would need to be mapped to an individual audio file.

Generating the right audio proved to be quite tricky. Through several iterations, starting with piano notes, we quickly realised that there’s a fine line between communicating a story with sounds and creating something that is either too monotonous or too melodic.

After doing some basic arithmetic, we realised there was no way to play back 19,000 individual sounds in a reasonable time period. We actually listened to many different samples of speedcore music to find the maximum number of beats per minute we could use while keeping the playback length reasonable. Even at 250bpm, 19,000 beats is over an hour of playback.

The most feasible solution was to aggregate the data per day. This allowed us to get 19,000 sounds down to 1,408 (March 26, 2015 to January 31, 2019). With this more manageable number, we could better pace our audio to convey a meaningful story.

To design the actual audio tracks, we relied on the expertise of a sound designer who specialises in EDM.
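
The arithmetic behind that decision can be checked in a few lines. This is only a sketch: the function names are illustrative, and the one-sound-per-second pacing is taken from the playback length quoted later in this write-up (1,408 seconds).

```python
from datetime import date


def playback_minutes(beats, bpm):
    """Minutes needed to play `beats` sounds at `bpm` beats per minute."""
    return beats / bpm


def days_inclusive(start, end):
    """Number of calendar days from `start` to `end`, counting both ends."""
    return (end - start).days + 1


# One sound per raid: roughly 19,000 beats at 250bpm.
minutes_per_raid = playback_minutes(19_000, 250)

# One sound per day, assuming one second of audio for each day.
days = days_inclusive(date(2015, 3, 26), date(2019, 1, 31))
minutes, seconds = divmod(days, 60)

print(f"per raid: {minutes_per_raid:.0f} minutes")
print(f"per day:  {days} sounds, {minutes}:{seconds:02d} at one per second")
```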

The brief was to generate a set number of unique sounds with increasing levels of intensity. Once mapped to the data, these would provide an audio equivalent of the choropleth calendar that we had settled on for the data visualisation. Based on the daily range of 0 to 52 air raids, this turned out to be 12 distinct audio tracks in increments of five raids.

12 distinct audio tracks in 5 step increments.
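
One binning that yields exactly 12 tracks for a range of 0 to 52 is sketched below. It assumes that days with no raids get their own track (index 0) and that every further track covers five raids (1 to 5, 6 to 10, and so on). The write-up does not spell out the exact boundaries, so treat them as an assumption.

```python
STEP = 5              # raids covered by each track after the first
MAX_DAILY_RAIDS = 52  # highest daily total mentioned in the text


def track_index(raids):
    """Map a daily raid count to a track number (assumed binning).

    0 raids -> track 0, 1-5 -> track 1, 6-10 -> track 2, ... 51-55 -> track 11.
    """
    if raids < 0:
        raise ValueError("raid count cannot be negative")
    return (raids + STEP - 1) // STEP  # integer ceiling of raids / STEP


track_count = track_index(MAX_DAILY_RAIDS) + 1
print(track_count)
print([track_index(n) for n in (0, 1, 5, 6, 52)])
```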

Data-driven audio

To present 1,408 consecutive days of air-raid data (March 26, 2015 to January 31, 2019), we needed a way to programmatically stitch together all the individual sound files based on our data.

After a bit of research, we settled on FFmpeg, the Swiss Army knife of command-line video editing tools. Using this free software, we were able to write a few Terminal scripts which:

  1. Combined the audio files
  2. Generated a one-frame-per-second video to match the audio
  3. Blended the audio and video
  4. Trimmed yearly segments for each of the five years we were presenting

The scripts are available here: https://gist.github.com/megomars/ee6d82bd05ef1de22cbe007b2c6489bc.js

ffmpeg -f concat -i audioscript.txt -c copy output.mp3
ffmpeg -y -r 1/1 -f concat -safe 0 -i "videoscript.txt" -c:v libx264 -vf "fps=1,format=yuv420p" "out.mp4"
ffmpeg -i audiomix64.mp3 -i out.mp4 mixout.mp4
ffmpeg -ss 00:00:0.0 -i mixout.mp4 -c copy -t 00:04:41.0 output2015.mp4
ffmpeg -ss 00:04:41.0 -i mixout.mp4 -c copy -t 00:06:06.0 output2016.mp4
ffmpeg -ss 00:10:47.0 -i mixout.mp4 -c copy -t 00:06:05.0 output2017.mp4
ffmpeg -ss 00:16:52.0 -i mixout.mp4 -c copy -t 00:06:05.0 output2018.mp4
ffmpeg -ss 00:22:57.0 -i mixout.mp4 -c copy -t 00:00:31.0 output2019.mp4
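
The text files passed to FFmpeg’s concat demuxer list one `file '<name>'` entry per line, so they can be generated from the daily data. The sketch below builds such a list and derives the `-ss` and `-t` values used in the yearly trims above, assuming one second per day. The zero-padded file names (`00.mp3` to `11.mp3`) and the helper names are hypothetical; the actual scripts may have been produced differently.

```python
from datetime import date


def concat_list(track_indices):
    """Concat-demuxer input: one `file '<name>'` line for each day's track."""
    return "".join(f"file '{index:02d}.mp3'\n" for index in track_indices)


def year_segments(start, end):
    """Return (year, offset_seconds, length_seconds), at one second per day."""
    segments = []
    offset = 0
    for year in range(start.year, end.year + 1):
        first = max(start, date(year, 1, 1))
        last = min(end, date(year, 12, 31))
        length = (last - first).days + 1
        segments.append((year, offset, length))
        offset += length
    return segments


def hms(seconds):
    """Format a whole number of seconds as HH:MM:SS."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(concat_list([0, 3, 11]), end="")
for year, offset, length in year_segments(date(2015, 3, 26), date(2019, 1, 31)):
    print(f"ffmpeg -ss {hms(offset)} -i mixout.mp4 -c copy -t {hms(length)} output{year}.mp4")
```

Deriving the offsets from the dates avoids recalculating them by hand whenever the data set is extended.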

Silly technical quirks

The curse of the one-second audio file

Surprisingly, we really struggled to generate exactly one second of audio using Audition, Premiere, and even FFmpeg. No matter what we tried, even directly specifying the export as one second, we always ended up with an audio file that was either just shy of or just over one second. We ended up generating the complete audio track and then speeding it up by a tiny percentage to reach 23:28 (1,408 seconds).

This is supposed to generate a one second audio file:

ffmpeg -stream_loop -1 -i 11f.mp3 -vcodec copy -ss 00:00:00.000 -t 00:00:01.000 11.mp3
Not quite one second of audio
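
One plausible contributing factor, which this write-up does not confirm, is that MP3 audio is stored in frames of 1,152 samples (MPEG-1 Layer III), so a file’s length is a whole number of frames and one second is not a multiple of the frame length at common sample rates. Encoders also add their own delay and padding on top of this. The sketch below assumes a 44.1kHz sample rate and shows the two nearest frame-aligned durations, plus the speed factor needed if every one of the 1,408 clips came out at the longer length; both assumptions are illustrative only.

```python
import math

SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III frame size
SAMPLE_RATE_HZ = 44_100   # assumed; the clips' real sample rate is not stated


def nearest_durations(target_seconds, sample_rate=SAMPLE_RATE_HZ):
    """Closest frame-aligned durations just below and just above the target."""
    frame_seconds = SAMPLES_PER_FRAME / sample_rate
    frames = target_seconds / frame_seconds
    return math.floor(frames) * frame_seconds, math.ceil(frames) * frame_seconds


def speed_factor(actual_seconds, target_seconds):
    """Factor by which to speed up audio so that it lasts `target_seconds`."""
    return actual_seconds / target_seconds


shorter, longer = nearest_durations(1.0)
print(f"{shorter:.4f}s or {longer:.4f}s, never exactly 1s")

# Hypothetical: 1,408 clips that each run slightly long, squeezed into 23:28.
factor = speed_factor(1408 * longer, 1408)
print(f"speed up by a factor of {factor:.4f}")
```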

<canvas> maximum height

In one of our early prototypes, we wanted to show a single data visualisation from 2015 up until 2019, using D3 to draw on a canvas element. Using the canvas element instead of the default SVG can significantly improve the page’s performance. However, even at a height of 25px per day, our visualisation was going to end up being over 35,000px tall. It turns out that the maximum canvas height is 32,767px.
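
A quick check of those numbers, and of the tallest row that would still fit, is sketched below. The 32,767px limit is the figure quoted above; actual limits vary by browser, so it is treated here as an assumption.

```python
CANVAS_MAX_HEIGHT_PX = 32_767  # limit quoted in the text; browser-dependent
DAYS = 1_408                   # March 26, 2015 to January 31, 2019


def total_height(days, px_per_day):
    """Height of a canvas that stacks one row for each day."""
    return days * px_per_day


def max_row_height(days, limit_px):
    """Tallest whole-pixel row height that keeps the canvas within the limit."""
    return limit_px // days


planned = total_height(DAYS, 25)
fits = planned <= CANVAS_MAX_HEIGHT_PX
tallest_row = max_row_height(DAYS, CANVAS_MAX_HEIGHT_PX)

print(planned, fits, tallest_row)
```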

Scrubbable audio on mobile

In one of our later prototypes, we were hoping to build a draggable audio scrubber that would directly connect the audio and the data visualisation. Unfortunately, this proved incredibly tricky. On desktop, the solution probably would have worked fine, but on mobile, especially on very small screens, the draggable touch interface, which would play back the specific audio snippet based on scroll depth, was just too fiddly. Rather than spend precious time on polyfills and excessive cross-browser testing, we decided to render a standalone video and an accompanying data visualisation.

React + Nivo.rocks

To wrap up the whole interactive we spun up a boilerplate React application (create-react-app) and began creating the respective components using ReactStrap, React-Player and Nivo-Calendar. We’ve been using React to build all our interactives over the past eight months and have really enjoyed the benefits this reusable component development environment brings. One such component is Nivo.rocks’ D3.js calendar component which we used to quickly generate the data visualisation.

Nivo.rocks’ D3 calendar component
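
Nivo’s calendar component takes its data as an array of objects, each with a `day` (formatted as YYYY-MM-DD) and a numeric `value`. As a rough sketch of how per-day totals could be shaped into that form before being handed to the React app, the snippet below uses made-up totals and a hypothetical helper name; it is not the project’s actual export step.

```python
import json


def to_calendar_data(daily_totals):
    """Turn {"YYYY-MM-DD": count} into a list of {"day": ..., "value": ...} objects."""
    return [
        {"day": day, "value": value}
        for day, value in sorted(daily_totals.items())
    ]


# Made-up totals, purely for illustration.
example_totals = {"2015-03-27": 1, "2015-03-26": 5}
calendar_data = to_calendar_data(example_totals)
print(json.dumps(calendar_data, indent=2))
```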

Next steps

We hope that this retrospective serves at least one of three outcomes. Firstly, we hope it helps other data journalists navigate the sea of data visualisation techniques. Secondly, we hope that write-ups like this can help bridge the gap between technical and editorial teams by revealing exactly what goes into producing a story like this. Finally, we’d really love to learn from others’ experiences and suggestions for making better data-driven stories, especially with audio. We would also like to experiment further with how to better integrate timelines with audio while retaining visual and editorial functionality. Please follow us on Instagram and Twitter.

Written by Mohammed Haddad

Data, visual storytelling and experiments team @AJEnglish. https://t.co/OE18n7JIJp
