Sonification For Impact: Turning New York City COVID-19, Climate Data and Social Vulnerability Index Data into Sound
Sonification And Speech
If you have never heard the term sonification before, it is defined as the use of non-speech audio to convey information or perceptualize data. The process involves turning data sets into audible sound using pitch, duration and other properties. Under this restrictive definition, the technique by itself is limited to certain use cases. Scientists at MIT have turned the structure of the coronavirus into music, and a TED talk by blind astronomer Wanda Diaz Merced explains how she uses sonification to listen to the universe and detect new astronomical discoveries.
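To make that definition concrete, here is a minimal sketch of the core mapping most sonification tools perform (illustrative code only, not any particular tool's implementation): a data series is linearly scaled onto a range of MIDI note numbers, so higher values become higher pitches.

```python
def sonify_to_pitches(values, low_note=48, high_note=84):
    """Map each data value to a MIDI note number by linear scaling.

    Higher values produce higher pitches; low_note/high_note bound
    the output range (48 = C3, 84 = C6).
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero on constant data
    return [round(low_note + (v - lo) / span * (high_note - low_note))
            for v in values]

# A rising data series becomes a rising melody:
print(sonify_to_pitches([0, 25, 50, 75, 100]))  # [48, 57, 66, 75, 84]
```

From there, a synthesizer or sampler plays the note sequence with a chosen instrument, which is essentially what happens when a spreadsheet column becomes a melody.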
For the purpose of extending the impact of data storytelling to broader audiences, it is more useful to expand the definition to include additional elements of speech to build an easily understandable audio soundtrack that is data-driven and can give additional context, narration and explanation. GPS is an example of speech translating data to create a sonic map; a virtual voice communicates location-based data to lead you in the right direction.
In graphic design, a legend is the box, usually positioned at the top right or left of a graph, that gives context: it contains a small sample of each color on the graph along with a short description of what that color means. A key problem of sonification is how to replicate this legend effectively, giving context to the sounds without pictures or text. Adding speech at this point in the workflow seems to be the most effective means of describing the sounds for the listener.
Exclusively visual methods are fit for purpose for data professionals communicating with other data professionals but have had limited success in driving reach and engagement among broader audiences. I believe that instead of measuring and evaluating the usefulness of data-driven audio solely in contrast to its visual counterpart, it is useful to view methods for sonification and speech from the perspective of storytelling, with the goal of extending impact, reach and engagement. This year's coronavirus COVID-19 health crisis highlighted the urgent need to address these limitations at a time when access to information could have life or death consequences.
Building strategies to extend impact will require reframing the problem: to move from information design to data-driven storytelling. It will also require expanding the conversation to allow data scientists and designers to gain insights from journalists, audio content creators and the blind/visually impaired (BVI) community. Furthermore, it will require us to move beyond the limited definition of sonification.
The Decline Of The Dashboard
Sonification is still largely overlooked in the field of data visualization where spreadsheets, text, charts, maps and graphs are regarded as wholly sufficient. But data analytics is being disrupted by new technologies and approaches. Alan Smith writing in The Financial Times in 2019 stated that “Some might be inclined to dismiss sonification as a novelty, but a new generation of screenless devices with voice interfaces, such as Amazon’s Alexa, marks the end of silent interaction with computers. It is perhaps naive to think that data will continue to just be seen and not heard.”
In 2020, Gartner published their Top 10 Trends in Data and Analytics. Predicting the decline of the dashboard they argued that “By 2025, data stories will be the most widespread way of consuming analytics, and 75% of stories will be automatically generated using augmented analytics techniques.”
Augmented analytics is the use of enabling technologies such as machine learning and AI to assist with data preparation, insight generation and insight explanation, augmenting how people explore and analyze data in analytics and BI platforms. Examples include 'Lexio' by Narrative Science, a language-based augmented analytics product that turns business data into interactive plain-English stories, and 'Arria Answers' by Arria NLG, a conversational-AI platform that integrates with business intelligence dashboards and Amazon Alexa, giving businesses real-time access to key insights from data using natural, spoken language.
My research project created during a 2020 Fellowship at the Urban Systems Lab (USL) in New York City is part of my ongoing work. It does not offer solutions to the problem but aims to present an outline of tests and experiments that demonstrate how sonification and other audio techniques can be used in an effort to extend impact, reach and engagement in data-driven storytelling.
In my previous company, our team developed TwoTone, a free and open-source data sonification tool, with the support of the Google News Initiative. Using this browser-based web application, you can easily upload a dataset and turn it into sound or music. Projects we created included a 'Song of the Day' from Basque Country EUSTAT social behavior data, and we also sonified all of NYC's rodent complaints since 2010. Since its launch, I have continued to use the tool for academic research and educational purposes: in data and technology courses I teach at The New School (the Schools for Public Engagement and Parsons School of Design), in additional research projects I have been awarded such as the Parsons Design For Aging Research Fund, and as an online instructor for the Knight Center for Journalism in the Americas' first trilingual MOOC on Data Journalism and Data Visualization Using Free Tools. TwoTone will also be a key tool for our research team at Sonify in the 'Data-Driven Storytelling: Making Civic Data Accessible through Audio' project supported by the Knight Foundation.
Benefits of Data-Driven Audio
- Understanding — Makes visual information easier to understand with an audio narrative
- Accessibility — Extends access to users who are blind or visually impaired, and to seniors
- Distribution — Allows information to be shared across audio-first and screenless devices
Introduction to Research Project
Data sonification for impact using COVID-19, Climate and Social Vulnerability Index Data was the focus of my research project as a 2020 Faculty Fellow at the Urban Systems Lab, an interdisciplinary research, design and practice space at The New School. What began in January as a broader research project to examine and evaluate multiple technologies and methods (using visualization, virtual reality and audio tools) had transformed by March to an exploration of COVID-19 data.
As a New York City resident of 15 years and a new mother to a then five-month-old baby girl at the start of quarantine, my research needed to respond to the unprecedented moment I was living in and to the COVID-19 data unfolding in real time on a global scale, experienced through map-based data visualizations: mainly the COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU) and journalistic reports such as The New York Times Covid World Map. These visualization platforms became increasingly valuable at a national and local level for viewing infection, hospitalization and death data in the United States by county, state and zip code.
These highly detailed and interactive maps presented critical information to aid in understanding the spread of the virus but I felt that the exclusively visual rendering of the information could restrict both the type and size of the potential addressable audience for this material. Could these maps use an accompanying soundscape to share data sonification and insights through audio narration for journalistic data stories? Could this data be filtered to use sound to alert us when the infection rate was increasing in our city? Would it be useful to listen to the fluctuations, patterns and anomalies in data across the world, our country and other counties to monitor changes in real-time?
The decision to focus my research on sonification and audio techniques for data-driven storytelling to communicate findings in COVID-19 data was also propelled by two indirect events:
First, the Urban Systems Lab launched Social Vulnerability, COVID-19 and Climate publishing resources, including USL's own COVID-19 research. The USL team began asking the question: are those people who are most vulnerable to climate impacts in NYC also likely to be most impacted by COVID-19? The team published research such as COVID-19 Social Distancing and Air Pollution in New York City, and Urban parks as critical infrastructure, equity and access during COVID-19, using the USL Data Visualization Platform and providing critical reports in the form of blog posts and research papers. USL was also conducting a nationwide survey, 'Perception and Use of Urban Parks and Natural Areas During COVID-19 Social Distancing', a study on how people are using and perceiving urban parks and natural areas during the pandemic and how this may affect their mental and physical wellbeing. The study was led by USL with project partners Building Healthy Communities NYC, The New York State Health Foundation, and The Nature Conservancy. The resources being shared through our lab Slack channels and the public-facing USL website were inspiring, and the pace at which the lab was conducting analysis and publishing research was uplifting and hopeful. I was a part of this team and wanted to contribute my research efforts to exploring COVID-19 data during this uncertain time. I shared initial COVID-19 United States daily counts time series data sonification experiments with USL colleagues at one of our internal lab meetings, which sparked invaluable feedback and collaborations, including offers to explore adding sonifications to map-based visualizations and spatial data stories.
Second, the additional support of my data sonification research by the Knight Foundation through a civic data innovation project grant was announced in June of 2020. The project is a collaboration between my company Sonify, Inc., the Wichita Community Foundation, Envision and local newsrooms in Wichita, Kansas on a year-long project titled 'Data-Driven Storytelling: Making Civic Data Accessible through Audio'. The goal of this project is to learn how to use data-driven audio as a way to enable storytelling, to share news and information with people who are blind or visually impaired, and to examine how to communicate data with sound, with the potential for wider applications.
What does the societal impact of Coronavirus COVID-19 sound like in urban environments, specifically in my home of New York City that was once the epicenter of the virus? How might the technique of data sonification, the process of turning data into sound, provide a new sensory experience to better understand this data? What value does sonification add to a chart or graph? What can we learn from data by listening to it on its own or in tandem with a data visualization? Could a map of NYC have an accompanying soundtrack to communicate health disparities and structural inequities of the geographic spread of pandemic per zip code? How might sonification complement traditional visuals such as a line chart? Is a map or bar chart enough, or does sonification and visualizations with sound provide a useful and meaningful resource to understand multiple dimensions of how COVID-19 may impact communities? Can we expand the limited definition of sonification as non-speech audio to include other elements of speech and sound?
Research Steps & Process — Exploring the Data
Just prior to the pandemic, I had begun testing my sonification research experiments with climate data. In April 2020, with my focus now on COVID-19 data, I began to test the US Daily Counts dataset published by The New York Times on GitHub. I shared the initial sonification test below with my USL colleagues at an internal meeting:
Coronavirus COVID-19 data in the United States: new reported cases and deaths by day from January 21 to April 29, 2020, and in New York City from March 1 to April 29, 2020, with cases represented by piano and deaths by oscillator. We can hear the impact of the COVID-19 pandemic as the death toll quickly begins to rise along with the number of infections in April. When I produced this sonification on April 29, 2020, the total US daily count of coronavirus cases was 1,039,319 and the number of recorded deaths was 55,417. Since the NYT publishes regular updates to the repository, the figures now stand at 1,045,399 cases and 60,930 deaths. New York City was the epicenter of the pandemic at this time; according to the most recent data, it had 170,124 recorded cases and 17,597 deaths.
As I continued with daily counts tests, I began discussions with Christopher Kennedy, Assistant Director of the Urban Systems Lab, about leveraging my sonification practice for the USL COVID-19 research that was underway, and invited collaborations from the core lab team. With USL team feedback, I set out on a plan to sonify the New York Department of Health NYC Coronavirus Disease 2019 (COVID-19) Data, starting with time series data to sonify the number of NYC cases, hospitalizations and deaths. This would provide basic sonification examples and a comparison with the visual line charts (see images below) as published on the Health Department's COVID-19 data webpage. New York State COVID-19 Daily Counts of Cases, Hospitalizations and Deaths data, provided by the Department of Health and Mental Hygiene (DOHMH), was also made available on Open NY and updated daily. These basic examples were produced using TwoTone, and the output included video recordings of the sonification prototypes.
Bar Chart & Line Chart Sonification Examples
The Sound of Coronavirus Cases in New York City
Daily Counts (thru September)
- Single Instruments (turning data into sound)
Cases (Piano), Hospitalizations (Double Bass), Deaths (Church Organ)
- Multiple Instruments Combined (turning data into music)
- Adjusted Duration Experiments (increasing the tempo of the track)
Below are the sonification examples and corresponding DOH graphs. The first series is a test using single instruments, making it easy to understand each column of data (cases, hospitalizations and deaths) by listening to it individually, just as you would scan each column of the visual chart. The second series is a test adjusting the duration of single instruments, and the third combines the instruments while also adjusting the duration, which turns the data into music. I created these examples in September and October, using the latest COVID-19 NYC Daily Counts data available in the DOH repository.
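The adjusted-duration experiments come down to simple arithmetic: at one note per data point, track length is the number of points times the seconds per beat. A small hypothetical helper (not part of TwoTone, just an illustration of the trade-off) shows why raising the tempo shortens the listening time:

```python
def track_duration(n_points, bpm, notes_per_beat=1):
    """Total length in seconds of a sonified series at a given tempo.

    Each data point is one note; bpm is beats per minute, and
    notes_per_beat lets several points share a single beat.
    """
    return n_points / notes_per_beat * 60.0 / bpm

# ~200 daily data points at 120 BPM, one note per beat:
print(track_duration(200, 120))  # 100.0 seconds
# Doubling the tempo halves the listening time:
print(track_duration(200, 240))  # 50.0 seconds
```

This is the practical tension in any daily-counts sonification: slow tempos preserve detail but test the listener's patience, while fast tempos compress months of data into seconds at the cost of resolution.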
Sonifications of COVID-19 Daily Counts (Cases, Hospitalizations and Deaths), February 29, 2020 to September 15, 2020, Using Single Instruments
Sonifications of COVID-19 Daily Counts (Cases, Hospitalizations and Deaths), February 29, 2020 to September 15, 2020, Using Multiple Instruments and Adjusted Duration Experiments
Sonifications of COVID-19 Daily Counts (Cases), February 28, 2020 to October 7, 2020, Single Instrument and Adjusted Duration
COVID-19 Daily Counts by Borough with Audio Narration
Below is a test using sonification and audio narration for daily counts by borough. We can listen to the data borough by borough, with the total number of COVID-19 deaths both sonified and narrated.
Additional Sonification Tests
I also developed quick sonification tests building on the work of fellow USL researchers, following the coverage on the USL site as work was published. For example, USL colleague Avigail Vantu published an analysis in April on the availability of and access to COVID-19 testing kits in NYC, using data from the Department of Health and Mental Hygiene. Her analysis shows that on April 8th the number of positive cases ranged from as low as 8 to as high as 1,728 per zip code, and she published the top 5 zip codes with the largest number of people testing positive as of April 9th. Scraping this tabular data from her published research (see graphic below), I quickly developed a sonification giving each zip code its own acoustic signature.
Another quick sonification example using analysis by USL colleagues drew on Ahmed Mustafa and USL Director Timon McPhearson's work on COVID-19 Social Distancing and Air Pollution in New York City, which delves into complex air pollution and weather data, including satellite measurements of nitrogen dioxide (NO2), to understand a potential relationship between cleaner air and stay-at-home orders.
Excerpt from the authors' analysis:
What Does the Data Show?
Figure 1 shows the monthly average NO2 in 2019 and the first months of 2020. The figure reveals that NO2 concentrations tend to be lower in the spring and summer than in the fall and winter. This is in line with other studies that have detected similar seasonal patterns in Cabauw (The Netherlands) and Calcutta (India) (Demuzere et al., 2009; Mondal et al., 2000). During cold seasons, atmospheric stability, as a result of frequent inversion layer that happens when the upper air layer is warmer than a lower one, leads to the accumulation of pollutants (Tiwari et al., 2015). Although it is normal to see lower NO2 levels in March and April than January and February, there is a notable drop in 2020 compared to 2019. This can be “partly” correlated with the COVID-19 lockdown in NYC which began in mid-March due to COVID-19 stay-at-home orders. The annual drop in concentrations of air pollutants including NOx (Figure 2) in the USA, that is largely driven by federal and state implementation of air quality regulations (Sullivan et al., 2018), can easily confuse the relation between potentially cleaner air and the COVID-19 lockdown in NYC.
More importantly, variations in weather conditions are substantial determinants in NO2 and other air pollutants concentrations (Borge et al., 2019). For example, high wind speed causes the dispersal and dilution of pollutants. Wind can also blow NO2 from areas that have higher NO2 concentrations, e.g., industrial areas, to residential areas causing increased NO2 levels. Precipitation washes out the air and can relatively reduce pollutants in the air whereas air temperatures play an important role in the chemical reactions of pollutants in the air.
Geospatial Example in Collaboration with USL team — COVID-19 Neighborhood Impact Sonification
The above video demonstrates an animated, sonified geospatial data visualization of coronavirus COVID-19 impact by neighborhood in New York City, sonifying two social impact variables: first, the percentage of the population below the poverty line, and second, the percentage of the population infected with COVID-19. You will hear a sonic legend. The higher the percentage of positive COVID-19 cases, the higher the pitch; the higher the percentage of people living below the poverty line, the higher the distortion. We selected 10 zip codes in NYC to tell this story in sound. As we travel between zip codes, you will hear the sounds of a quiet New York City in quarantine representing the distance covered. Data sources include the New York Department of Health Coronavirus Disease COVID-19 Data by Zip Code and US Census Bureau American Community Survey Social Vulnerability Index data for the percentage of people living below poverty.
The purpose of this video excerpt is to present methods for communicating data with both audio and visuals. Narration, text-to-speech, ambient sounds and data sonification are combined with dynamic map-based graphics and text to demonstrate the concept that an audiovisual rendering of data can be more impactful for a non-technical audience than a silent visualization.
For this collaboration, I focused on a geospatial example with USL colleagues Christopher Kennedy, Claudia Tomateo and Pablo Herreros-Cantis, using the New York Department of Health Coronavirus Disease COVID-19 data by zip code and US Census Bureau American Community Survey Social Vulnerability Index data for the percentage of people living below poverty. The Urban Systems Lab had been analyzing a range of key social vulnerability indicators against the cumulative percentage of tests with a positive result per zip code in New York City, and was publishing this analysis on the USL site. This would be considered a custom sonification requiring the talents of a diverse team, bringing together their interdisciplinary design, research and practice in science, urban resilience, social equity and environmental justice, data analysis, visualization, computation, graphic design, animation and sonification.
COVID-19 Data Visualizations by The New York Times
Using the DOH COVID-19 Data by Zip Code, via the data repository of the Health Department, my USL collaborators and I set out to create a geo-located example using COVID-19 NYC data and Social Vulnerability Index (SVI) Data by zip code, specifically from the United States Census Bureau American Community Survey (ACS) 2018 data, 5-year estimates, using the percentage of people below poverty.
We decided for our initial experiment to focus on a basic 2D map to test our methodology, design a production workflow and create a locked video animation as shareable, embeddable content. This would provide our team with a good starting point for collaboration and produce a sonification example that could be iterated on. We focused on ten zip code data points to further constrain our initial test.
Pablo normalized the DOH and ACS datasets and recorded customized audio files using his bass guitar and Ableton, a digital audio workstation. He describes the steps: "As in the TwoTone platform, I used the pitch (notes) to represent the number of COVID-19 cases: the higher the pitch, the higher the percentage of the population affected by coronavirus. On top of that, I added a filter called a Chorus. The Chorus effect takes an audio signal and mixes it with one or more delayed, pitch-modulated copies of itself, creating an oscillation. The higher the percentage of the zip code's population living under poverty, the higher the influence of this filter. In the audio, the first sound corresponds to a zip code with low COVID presence and the lowest population living below poverty, hence the sound comes out clean. The second sound has the same pitch, because the COVID incidence is similar, but a higher percentage of the population is living in poverty. As you can hear, the sound is distorted."
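A rough sketch of this two-variable mapping (hypothetical code, not Pablo's actual Ableton workflow) normalizes each variable within the sample, then sends COVID-19 incidence to pitch and the poverty rate to the depth of the chorus effect:

```python
def normalize(values):
    """Scale a series to 0-100 within the sample (100 = sample maximum)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1
    return [(v - lo) / span * 100 for v in values]

def zip_code_sound(covid_pct, poverty_pct, low_note=40, high_note=76):
    """Map normalized COVID-19 incidence (0-100) to a MIDI pitch and the
    normalized poverty rate (0-100) to chorus depth, where 0.0 plays
    clean and 1.0 is maximally distorted."""
    pitch = round(low_note + covid_pct / 100 * (high_note - low_note))
    chorus_depth = poverty_pct / 100
    return pitch, chorus_depth

# Two zip codes with similar incidence but different poverty rates
# share a pitch while differing in distortion:
print(zip_code_sound(50, 10))  # (58, 0.1) -> same note, nearly clean
print(zip_code_sound(50, 80))  # (58, 0.8) -> same note, heavily chorused
```

The key design choice is that the two variables stay perceptually independent: pitch carries one dimension and timbre carries the other, so a listener can track both at once.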
Below is a screenshot of the ten zip codes, normalized data and corresponding pitch:
Pablo sent the 10 separate audio files to our team, one per zip code, along with zip code shapefiles containing a column showing the order for the 10 zip codes considered: the field 'order' gives the sequence in which the audio files play, and each audio track uses its zip code's order number as an ID. All zip codes except the ten defined ones were categorized as 0. Next, four audio files for the sonic legend were created to accompany the map, covering the four possible extreme cases (0,0; 100,100; 0,100; 100,0), where the first number is the normalized percentage of the population infected (100 being the highest value in our 10-zip-code sample) and the second the percentage of the population living below poverty. These four legend settings were defined as below. Once these audio files were generated, Claudia reviewed all the assets and began integrating them into the map-based visualization, starting with a base map and then adding animation with zooms in and out between zip codes plus additional visuals. We also discussed identifying a 'traveling sound' to indicate movement through the spatial visualization from one zip code to the next; for this we decided on white noise, to differentiate it from the customized audio files. Finally, I recorded audio narration for the project title, description, sonic legend and credits, recorded zip code data using TwoTone text-to-speech, and provided these assets to Claudia to edit into the final video.
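The four legend settings are simply the corners of the two normalized variables. A small sketch (illustrative only, with hypothetical field names) enumerates them and the sound each corner cue demonstrates:

```python
from itertools import product

# Each extreme (infection %, poverty %) pair becomes one legend cue:
# infection drives pitch, poverty drives distortion.
legend = [{"covid_pct": c, "poverty_pct": p,
           "pitch": "high" if c else "low",
           "distortion": "heavy" if p else "none"}
          for c, p in product([0, 100], repeat=2)]

for entry in legend:
    print(entry)
```

Playing these four cues before the map animation gives the listener anchor points, so that every zip code heard afterwards can be placed somewhere between the clean low note and the distorted high one.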
A graphic, visual representation of data is not the only available option. That fact in itself is still new and surprising, even to some seasoned practitioners in the field.
While incremental improvements to visualization are necessary (such as moving from 2D to 3D with virtual and augmented reality) and efforts to secure new sources of data are ongoing, there is an urgent need to build strategies that address the problem of impact, to ensure that data platforms can resonate with the communities they seek to inform.
Exclusively visual systems are not accessible for users who are blind or visually impaired and are not usable on the new generation of screenless devices with voice interfaces. An audio option has the opportunity to be highly disruptive and scalable for both commercial and societal impact. The goal of data-driven audio should be that the information can be fully understood by itself or in tandem with a visualization. Collaborating, building and testing ‘with’ instead of ‘for’ the blind and visually impaired community is an essential step towards reaching this goal.
Leading technology companies are making accessibility a priority. Microsoft’s ‘Seeing AI’ is a talking camera app to allow blind and low vision users to read mail and documents, Google’s ‘Live Caption’ helps users transcribe audio and video, and Apple’s ‘VoiceOver’ describes what is happening on a screen. Smaller developers are building skills and actions to extend accessibility for voice assistants like Alexa, Siri, Google Home, and Cortana. Sonification technologies exist in research labs, cybersecurity and analytics companies but are designed for use mostly on individual one-off projects.
Sonification is just one piece of the puzzle for a data-driven audio solution. The workflow at present, as outlined in the sonification examples I've shared in this post, is an unnecessarily time-consuming and complex manual process at a time when automation is becoming a common element in almost every vertical. Technology is changing rapidly, and artificial intelligence combined with sonification, audio and narration could offer exponential future potential for innovation.
I’d like to thank the Urban Systems Lab at The New School for supporting this research through the 2020 Faculty Fellowship and for the entire USL team for their feedback and collaboration. Special thanks to my colleagues Timon McPherson, Christopher Kennedy, Claudia Tomateo, Pablo Herreros-Cantis, and Ahmed Mustafa. I’d also like to thank my Hugh McGrory, my husband and co-founder of Sonify, Inc., for his support while I developed this research project and for his writing edits on this blog post.