5 Ways to Use Free Data on the Internet for Science

Proto Bioengineering
7 min read · Mar 26, 2023

Science hits one major roadblock when we don’t have real data: we don’t learn as well.

A polar bear standing on its hind legs and waving.
Photo by Hans-Jurgen Mager on Unsplash

In scientific research, a lack of real data stifles learning, both for us as individual scientists and for entire fields of research. Whether we’re disengaged by the simplicity and irrelevance of fake data or our research is slowed to a crawl by the replication crisis, the availability of real data empowers scientists in all fields, at all levels.

Fortunately, government agencies and non-profit institutions around the world provide data sources and APIs on the internet, where we can get scientific data for free.

Here are a few examples with tutorials on how to use Python, R, or Bash to get data for science.

After you’re done with this, also check out the Awesome Public Datasets repo on GitHub for hundreds of free data sources.

Table of Contents

  1. Ask NASA How the Arctic Ice is Doing
  2. See How Fast Rivers are Flowing in Real-Time with USGS Data
  3. Check the Air Quality in Cities Around the Globe
  4. Get Details on Every FDA-approved Drug
  5. See Where the International Space Station is Right Now

1. Ask NASA How the Arctic Ice is Doing

Ice height data captured by ICESat-2 and visualized by NASA.

NASA has launched two ice satellites that use lasers to track ice levels at the North and South Poles. Yes, space lasers.

NASA’s ICESat-2, launched in 2018, fires six laser beams at the Earth as part of a LIDAR system called ATLAS. ATLAS measures how long the laser light takes to bounce back and uses that to calculate the changing height of sea ice, land, and vegetation at Earth’s poles over time.
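The idea behind that measurement can be sketched in a few lines of Python. This is a simplified time-of-flight calculation, not NASA’s actual processing pipeline. The function names, the 500 km altitude, and the 3.3 millisecond round trip are all made up for illustration.

```python
# Simplified time-of-flight ranging. Real altimetry also corrects for the
# atmosphere, the satellite's orientation, and more; none of that is modeled here.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the meter


def range_from_round_trip(round_trip_seconds: float) -> float:
    """Distance to the surface in meters. The light covers it twice (down and back)."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2


def surface_height(satellite_altitude_m: float, round_trip_seconds: float) -> float:
    """Surface height, measured from the same reference as the satellite altitude."""
    return satellite_altitude_m - range_from_round_trip(round_trip_seconds)


# Hypothetical numbers: a satellite 500 km up and a pulse that returns after 3.3 ms.
ASSUMED_ALTITUDE_M = 500_000.0
height_m = surface_height(ASSUMED_ALTITUDE_M, 0.0033)
print(f"Estimated surface height: {height_m:.0f} m")
```

Repeating this over the same spot on later orbits is what lets the height be tracked as it changes over time.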

What’s more, you can get all of this ice data with code from the Open Altimetry API.

The orbital path of NASA’s ICESat-2 ice satellite over a Mercator projection map of Earth.
The orbit of the ICESat-2 satellite

The Open Altimetry API provides data on ice, vegetation, and land height as captured by both ICESat and ICESat-2. It stores data going back to 2003, thanks to the original ICESat that was launched that year, as well as higher resolution data (10,000 measurements per second) from ICESat-2, starting in 2018.

We have a tutorial here on how to get the ICESat-2 data from the Open Altimetry API. This tutorial helps you get the data with R, Python or Bash (the command line language).

2. See How Fast Rivers are Flowing in Real-Time with USGS Data

Photo by Shane Smithrand on Unsplash

The US Geological Survey (USGS) is a team of thousands of scientists who track the changing American landscape every day. Their work involves using both digital and old-school tools to measure the Earth beneath us with precision.

One of these digital tools is a streamgage, which measures how quickly and intensely rivers around the country are flowing (AKA “streamflow”).

A streamgage along Owyhee River in Crutcher Crossing, Idaho. Image from USGS.gov.

There are over 10,000 of these streamgages scattered along waterways across the United States. You can see all of the measuring stations online at the National Water Dashboard.

USGS APIs are some of the trickiest APIs to understand, probably because geologists would rather be outside hiking than sitting at a computer writing the perfect API. Even navigating the USGS’s site to find the APIs is itself a bit of a maze.

However, we have written an easy how-to for getting data from the Water Services API, which will tell you how much water is flowing at any of the 10,000 points above. Then you can do things like write code to check the streamflow throughout the day, compare streamflow across seasons and years, or correlate streamflow with the health of local flora and fauna.

Check out our tutorial article here.
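As a rough preview, a request to the Water Services API might look like the Python sketch below. It assumes the Instantaneous Values endpoint, the parameter code 00060 (streamflow in cubic feet per second), and a JSON layout of value, timeSeries, values, value. The function names are our own, so check the USGS documentation before relying on any of it.

```python
import json
import urllib.parse
import urllib.request

# Instantaneous Values endpoint of the USGS Water Services API (assumed here).
BASE_URL = "https://waterservices.usgs.gov/nwis/iv/"
DISCHARGE_CODE = "00060"  # USGS parameter code for streamflow, in cubic feet per second


def build_streamflow_url(site_id: str) -> str:
    """Build the request URL for one streamgage site number."""
    query = urllib.parse.urlencode(
        {"format": "json", "sites": site_id, "parameterCd": DISCHARGE_CODE}
    )
    return f"{BASE_URL}?{query}"


def latest_streamflow(payload: dict) -> tuple[str, float]:
    """Return (timestamp, cubic feet per second) for the newest reading.

    Assumes the JSON layout value -> timeSeries -> values -> value.
    """
    readings = payload["value"]["timeSeries"][0]["values"][0]["value"]
    latest = readings[-1]
    return latest["dateTime"], float(latest["value"])


def fetch_streamflow(site_id: str) -> tuple[str, float]:
    """Download and parse the latest reading for a site (needs internet access)."""
    with urllib.request.urlopen(build_streamflow_url(site_id), timeout=30) as response:
        return latest_streamflow(json.load(response))
```

Calling fetch_streamflow("01646500") should return a timestamp and a flow value for that site. The site number is only an example, and any site number from the National Water Dashboard can be swapped in.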

3. Check the Air Quality in Cities Around the Globe

Air quality is not the same around the globe. Developing nations like China and India have particularly bad air quality, with pollution scores that can be 10 times higher than those in Europe and the Americas.

Data on air quality can be correlated with health conditions like asthma and COPD and used to study pollution’s effects on the atmosphere.

Photo by Photoholgic on Unsplash

Two big resources for checking the Air Quality Index around the globe are the OpenAQ API and the World Air Quality Project’s World Air Quality Index map.

The Air Quality Index is a scale of 0 to 500, with 500 meaning the air will cause health issues in most people. However, any score above 150 is already rated as “unhealthy” to breathe.
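To make the scale concrete, here is a small Python sketch that turns a score into a category name. It assumes the category breakpoints published by the US EPA, where “Unhealthy” starts at 151. Other countries use different scales, and the function name is our own.

```python
# Upper bound and label for each category of the US EPA Air Quality Index.
AQI_CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def aqi_category(aqi: int) -> str:
    """Return the category label for an AQI score between 0 and 500."""
    if not 0 <= aqi <= 500:
        raise ValueError(f"AQI must be between 0 and 500, got {aqi}")
    for upper_bound, label in AQI_CATEGORIES:
        if aqi <= upper_bound:
            return label
    raise AssertionError("unreachable: every valid score matches a category")


for score in (42, 150, 151, 500):
    print(score, aqi_category(score))
```

A lookup like this is handy for labeling rows once you have raw scores from one of the data sources below.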

The OpenAQ API

OpenAQ provides data mostly for North America and Europe, with a smattering of spots on every other continent.

The OpenAQ interactive air quality map.

A Proto tutorial is not yet available for this. However, OpenAQ has example code ready for you to use on their Recipes page.

A Python example from OpenAQ’s Recipes page.

And each API function has example code for every popular programming language on their documentation page.

Start playing with OpenAQ air quality data here.

The World Air Quality Index map

An alternative with more detail on African, South American and Asian countries is the World Air Quality Project.

They provide specific data for each city that they monitor, down to the levels of each gas in the air.

Air quality data for Beijing, China.

The World Air Quality Project does not provide a single API for its data, but it does provide documentation on the 50,000 different stations around the world that it draws its data from. And each page for each city has detailed graphics for each facet of air quality.

Air quality data per month for Beijing since 2016.

Uses for This Data

Air quality is both an indicator and a predictor of many facets of life on Earth, such as the health of humans and animals, socioeconomic status, atmospheric science, and more. By using air quality data, you can do things like correlate pollution levels with rates of health conditions such as asthma and COPD.

4. Get Details on Every FDA-approved Drug

Photo by Roberto Sorin on Unsplash

The US Food and Drug Administration (FDA) approves all of the medical devices and drugs that are manufactured and sold in the United States. As of 2023, its database contains over 26,000 different drugs and medications, and we can get data on them using the openFDA API.

The openFDA API is actually a collection of APIs, which each provide data on drugs, medical devices, adverse drug event reports, recalls, tobacco, and more. They also provide interactive charts to explore the data for each API.

The Device Recalls interactive chart from the openFDA Device API.

You can use data from any of these APIs for your own research or to help others’ research along. Wizmed, for example, is an app that uses the data to simplify pharmaceutical research.

A preview of Wizmed’s apps from their website.

To get started on your own FDA data project, check out our tutorial on how to get data on the 26,000+ approved drugs from openFDA with only one line of code.

(Update: we have also uploaded the data to Kaggle in ready-to-use CSVs. Open the data in a Kaggle notebook here and start exploring.)

Example apps and ideas for uses of openFDA are also available here.
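For a feel of what a request looks like, here is a hedged Python sketch that searches openFDA for a drug by brand name. It assumes the Drugs@FDA endpoint and the search and limit query parameters. The field names (application_number, sponsor_name, openfda.brand_name) and the function names are assumptions for illustration, so check the openFDA documentation for the current schema.

```python
import json
import urllib.parse
import urllib.request

# Drugs@FDA endpoint of openFDA (assumed here).
DRUGS_URL = "https://api.fda.gov/drug/drugsfda.json"


def build_drug_query(brand_name: str, limit: int = 1) -> str:
    """Build a search URL for drugs matching a brand name."""
    params = {"search": f'openfda.brand_name:"{brand_name}"', "limit": limit}
    return f"{DRUGS_URL}?{urllib.parse.urlencode(params)}"


def summarize(payload: dict) -> list[dict]:
    """Pull a few assumed fields out of each result, tolerating missing ones."""
    summaries = []
    for result in payload.get("results", []):
        openfda = result.get("openfda", {})
        summaries.append({
            "application_number": result.get("application_number"),
            "sponsor_name": result.get("sponsor_name"),
            "brand_names": openfda.get("brand_name", []),
        })
    return summaries


def fetch_drugs(brand_name: str, limit: int = 1) -> list[dict]:
    """Download and summarize matching drugs (needs internet access)."""
    with urllib.request.urlopen(build_drug_query(brand_name, limit), timeout=30) as response:
        return summarize(json.load(response))
```

Calling fetch_drugs("aspirin", limit=5) should return up to five summaries. Splitting the URL building and the parsing into separate functions keeps both easy to check without making a network request.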

5. See Where the International Space Station is Right Now

Photo by NASA on Unsplash

The International Space Station (ISS) is orbiting the Earth at about 17,400 mph right now (roughly 28,000 kilometers/hour). Its location can be found every second of every day with the Open Notify API.

If you want to get its latitude and longitude as of this second, you can get the API data in your browser by clicking here.

We also have a tutorial for how to get the ISS’s location from the API with code. This tutorial covers how to get ISS data with Python, R, and Bash.

Or check out how to track the ISS’s location with a live map using Python and Plotly.

Additional documentation on the Open Notify API is also available here.
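As a quick taste, the Python sketch below reads the ISS position from Open Notify. It assumes the iss-now.json endpoint and a response with a message field plus an iss_position object holding latitude and longitude as strings. The function names are our own.

```python
import json
import urllib.request

# Current-location endpoint of the Open Notify API (assumed here).
ISS_NOW_URL = "http://api.open-notify.org/iss-now.json"


def parse_iss_position(payload: dict) -> tuple[float, float]:
    """Return (latitude, longitude) in degrees.

    The coordinates are assumed to arrive as strings, so convert them to floats.
    """
    if payload.get("message") != "success":
        raise ValueError(f"Unexpected response: {payload}")
    position = payload["iss_position"]
    return float(position["latitude"]), float(position["longitude"])


def fetch_iss_position() -> tuple[float, float]:
    """Download and parse the current ISS position (needs internet access)."""
    with urllib.request.urlopen(ISS_NOW_URL, timeout=30) as response:
        return parse_iss_position(json.load(response))
```

Calling fetch_iss_position() a few seconds apart should return slightly different coordinates each time, since the station never stops moving.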

Questions and Feedback

If you have questions or feedback, email us at protobioengineering@gmail.com or message us on Instagram (@protobioengineering).

If you liked this article, consider supporting us by donating a coffee.

Proto Bioengineering

Learn to code for science. “Everything simple is false. Everything complex is unusable.” — Paul Valery