OpenAQ’s Remarks at NIH’s Open Data Symposium on Dec 1

These remarks are from the Open Data Symposium for the launch of Phase II of the Open Science Prize. Support open data and open science communities through your vote in the Open Science Prize!

Good morning. I’m Christa Hasenkopf with OpenAQ, based here in DC, and this is Olaf Veerman from Development Seed, which is based in DC as well as Lisbon, Portugal. We are just two members of the global grassroots community that is building our open-source platform, OpenAQ, and we’re really excited for this opportunity to share our prototype with you today!

The purpose of OpenAQ is to fight air inequality through open data and community. If you’re not familiar with the term ‘air inequality’, it may be because we made it up... It’s simply a term we use to express the unequal access to clean air across the world. Air pollution results in 5–7 million deaths each year.

This inequality shows itself on a country-wide level when you look at GDP per capita by country versus annual average PM2.5 (smoke and dust) concentrations. You can see a clear trend: countries with lower GDP per capita, many of them in Asia and Africa, tend to have higher air pollution levels.

At a more visceral level, you can see and taste air inequality when you fly into a place like Delhi, where we were last week for a workshop, funded by the Open Science Prize.

And this graph shows air inequality from a research and ultimately a data perspective:

The blue dots are cities that have been relatively well-researched in the scientific literature regarding pollution, and a significant number of them have relatively low pollution levels.

Contrast that with these red dot places. These were, in 2011, the top 10 most polluted places on the planet — for which there were data.

And if you sum up all of the air pollution-related papers published at that point for these red dot places, you get this dashed red line: 41 papers. That’s five times fewer papers, for all of these cities combined, than exist for Houston, Texas, whose air is an order of magnitude cleaner.

This research gap isn’t just about ‘fairness’ in scientific research. It also means a fundamental lack of scientific progress.

From multiple large cohort epidemiological studies, many of which have relied on open air quality data from governments, we have a robust scientific understanding of the impact of air pollution on health in places like the US and EU. We lack this same scientific, policy-relevant understanding in places with severe air pollution, where billions live.

And as our Canadian teammate and member of the Core Analytic Team for the Global Burden of Disease, Prof. Mike Brauer, points out:

…health scientists are routinely faced with the lack of a comprehensive platform to aggregate air quality data — data which we desperately need to assess health impacts and to develop effective policies.

So, a little over a year ago, we decided to tackle this data gap, which the scientific community, as well as policy, media and other public communities, so clearly needed filled. We knew that millions of air quality data points were publicly shared each day in disparate and often temporary forms by various entities across the world.

We decided to capture these data before they are lost, put them in one universal format, and make them available to anyone through an API. We made the whole platform open-source so others around the world could help us. We had built a robust system to wrangle these data, but we didn’t have an effective way to make the data truly accessible to scientists and the public. That’s where the Phase I funding for the Open Science Prize came in.
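To make the "one universal format" idea concrete, here is a minimal Python sketch. It is not OpenAQ's actual code or schema: the two sources, their field names, the station names and the `Measurement` record are all hypothetical, chosen only to show how readings that arrive in different units and timestamp conventions might be mapped onto a single shape before being served through an API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Measurement:
    """One reading in a single shared format (hypothetical field names)."""
    location: str
    parameter: str   # pollutant name, e.g. "pm25"
    value: float     # concentration, always in micrograms per cubic metre
    unit: str        # always "ug/m3" after normalization
    utc: datetime    # timezone-aware timestamp in UTC


def from_source_a(row: dict) -> Measurement:
    """Hypothetical source A: PM2.5 in ug/m3, ISO 8601 timestamp with offset."""
    return Measurement(
        location=row["station"],
        parameter="pm25",
        value=float(row["pm25_ugm3"]),
        unit="ug/m3",
        utc=datetime.fromisoformat(row["time_utc"]).astimezone(timezone.utc),
    )


def from_source_b(row: dict) -> Measurement:
    """Hypothetical source B: PM2.5 in mg/m3, Unix epoch timestamp."""
    return Measurement(
        location=row["site_name"],
        parameter="pm25",
        value=float(row["PM2.5_mg"]) * 1000.0,  # convert mg/m3 to ug/m3
        unit="ug/m3",
        utc=datetime.fromtimestamp(row["epoch"], tz=timezone.utc),
    )


# Two made-up readings taken at the same moment, reported in different shapes.
readings = [
    from_source_a({"station": "Station A", "pm25_ugm3": "153",
                   "time_utc": "2016-12-01T09:00:00+00:00"}),
    from_source_b({"site_name": "Station B", "PM2.5_mg": 0.25,
                   "epoch": 1480582800}),
]
```

Once every source is reduced to the same record, anything downstream (storage, an API, a map) only has to understand one format, which is what makes adding a new source a small, isolated piece of work that volunteers can contribute.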

Olaf Veerman of Development Seed walks the audience through the platform.

Since we launched this platform in September, we’ve seen our traffic triple compared to the same period last year. In addition to the stats here on the platform itself, we have also found 15 open-source repositories that call our API. We have also encountered scientists who have submitted proposals that use the OpenAQ platform to US science agencies, an EU agency and UNICEF.

And more than these stats, we have realized the data and the platform are just fuel for something much larger: our community. It’s because of our community that Hawa Badlo, an air quality awareness campaign in India, began using data accessed from our platform to reach out to their network of more than 14 million people.

It’s also because of our community that we were asked to submit a commentary to a South African journal on the powerful role governments and researchers can play in sharing their air quality data. Twelve scientists from 10 countries, from the US to Egypt to Spain, are working together on this piece.

So, this is all to say, one key lesson we have learned so far is that open data and open science are powerful, but they’re nothing, they have no force, without a community around them.

For the future, one of our main priorities is to continue working toward complete global coverage of the publicly available air quality data from ground monitors.

Our community is eager for us to find ways to connect low-cost sensor and satellite data into our system. This presents a lot of technical ‘big data’ and scientific challenges, but there is a clear need for a system that integrates these different types of data.

We are working toward expanding our in-person workshops in places like Santiago, Hanoi and Warsaw, and developing toolkits for others to host them.

Lastly, I want to convey on behalf of our community that we feel a true urgency for this work. Not just because much of the data we don’t yet capture would otherwise be lost to the public, but because the problem itself is so urgent. I mentioned that 5–7 million deaths per year are due to air pollution. But big numbers like that are hard to conceptualize. Put another way, that’s 120 lives lost since Olaf and I began talking. To empower people to change this, we firmly believe in the role of open data and community to spur policy-relevant science and public engagement.

