5 Ways to Use Free Data on the Internet for Science
Science hits a major roadblock when we don’t have real data: we don’t learn very well.
When it comes to scientific research, a lack of real data stifles learning, both for us as individual scientists and for the entire field. Whether we’re disengaged by the simplicity and irrelevance of fake data or our research is slowed to a crawl by the replication crisis, access to real data empowers scientists in every field, at every level.
Fortunately, government agencies and non-profit institutions around the world provide data sources and APIs on the internet, where we can get scientific data for free.
Here are a few examples with tutorials on how to use Python, R, or Bash to get data for science.
After you’re done with this, also check out the Awesome Public Datasets repo on GitHub for hundreds of free data sources.
Table of Contents
- Ask NASA How the Arctic Ice is Doing
- See How Fast Rivers are Flowing in Real-Time with USGS Data
- Check the Air Quality in Cities Around the Globe
- Get Details on Every FDA-approved Drug
- See Where the International Space Station is Right Now
1. Ask NASA How the Arctic Ice is Doing
NASA has launched two ice satellites, ICESat and ICESat-2, that use lasers to track the height of the ice at the North and South Poles. Yes, space lasers.
NASA’s ICESat-2, launched in 2018, fires a laser split into 6 beams at the Earth as part of a lidar system called ATLAS. ATLAS then measures how long the laser light takes to bounce back and uses that to calculate the changing height of sea ice, land, and vegetation over time, especially at Earth’s poles.
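The “how long it takes to bounce back” idea is plain time-of-flight ranging: light travels at a known speed, so the round-trip time gives the distance to the surface. Here’s a minimal Python sketch of that arithmetic. The roughly 500 km orbit is an approximation, and the code ignores the atmospheric and instrument corrections that NASA’s real processing applies.

```python
# Speed of light in a vacuum, in meters per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def range_from_round_trip(seconds: float) -> float:
    """Distance from the satellite to the reflecting surface, in meters.

    The pulse travels down and back up, so the one-way distance is half of
    (speed of light x round-trip time). Atmospheric delays are ignored here.
    """
    return SPEED_OF_LIGHT_M_PER_S * seconds / 2


def height_change(t_before: float, t_after: float) -> float:
    """Change in surface height between two passes, in meters.

    A shorter round trip means the surface is closer to the satellite
    (higher), hence before minus after. Simplifying assumption: the
    satellite was at exactly the same altitude on both passes.
    """
    return range_from_round_trip(t_before) - range_from_round_trip(t_after)


if __name__ == "__main__":
    # A ~500 km orbit gives a round trip of roughly 3.3 milliseconds.
    print(f"{range_from_round_trip(3.3e-3) / 1000:.1f} km")
    # A round trip that is 1 nanosecond shorter on the second pass.
    print(f"{height_change(3.3e-3, 3.3e-3 - 1e-9) * 100:.1f} cm")
```

The second print shows why the timing has to be so precise: a single nanosecond of difference already corresponds to about 15 cm of height.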
What’s more, you can get all of this ice data with code from the Open Altimetry API.
The Open Altimetry API provides ice, vegetation, and land height data captured by both ICESat and ICESat-2. Its records go back to 2003, when the original ICESat launched, and include higher-resolution data (10,000 measurements per second) from ICESat-2, starting in 2018.
We have a tutorial here on how to get the ICESat-2 data from the Open Altimetry API. This tutorial helps you get the data with R, Python or Bash (the command line language).
2. See How Fast Rivers are Flowing in Real-Time with USGS Data
The US Geological Survey (USGS) is a team of thousands of scientists who track the changing American landscape every day. Their work involves using both digital and old-school tools to measure the Earth beneath us with precision.
One of these digital tools is a streamgage, which measures the water level of a river and uses it to estimate how much water is flowing past that point, usually in cubic feet per second (AKA “streamflow” or “discharge”).
There are over 10,000 of these streamgages scattered along waterways across the United States. You can see all of the measuring stations online at the National Water Dashboard.
USGS APIs are some of the trickiest APIs to understand, probably because geologists would rather be outside hiking than sitting at a computer writing the perfect API. Even finding the APIs on the USGS site is a bit of a maze.
However, we have written an easy how-to for getting data from the Water Services API, which tells you how much water is flowing past any of those 10,000+ streamgages. Then you can write code to check the streamflow throughout the day, compare streamflow across seasons and years, or correlate streamflow with the health of local flora and fauna.
Check out our tutorial article here.
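For a rough idea of what a request looks like, here is a minimal Python sketch using only the standard library. It assumes the legacy Instantaneous Values service at `waterservices.usgs.gov/nwis/iv/` and its JSON layout (`value` → `timeSeries` → `values`); USGS has been rolling out newer water data APIs, so check their documentation before relying on it. Parameter code `00060` is discharge in cubic feet per second, and site `01646500` is the Potomac River near Washington, D.C.

```python
import json
from collections import defaultdict
from statistics import mean
from urllib.parse import urlencode
from urllib.request import urlopen

# Legacy USGS Instantaneous Values endpoint (assumption: still being served).
IV_ENDPOINT = "https://waterservices.usgs.gov/nwis/iv/"


def build_iv_url(site: str, parameter_cd: str = "00060", period: str = "P1D") -> str:
    """URL for recent readings from one streamgage.

    00060 is the USGS parameter code for discharge (cubic feet per second).
    `period` is an ISO 8601 duration, e.g. "P1D" for the last day.
    """
    query = urlencode({
        "format": "json",
        "sites": site,
        "parameterCd": parameter_cd,
        "period": period,
    })
    return f"{IV_ENDPOINT}?{query}"


def parse_readings(payload: dict) -> list[tuple[str, float]]:
    """(timestamp, value) pairs from the first time series in a JSON response."""
    series = payload["value"]["timeSeries"]
    if not series:
        return []  # no data for this site/parameter combination
    # Missing readings are flagged with a sentinel (assumed -999999 if absent).
    no_data = series[0].get("variable", {}).get("noDataValue", -999999.0)
    readings = []
    for point in series[0]["values"][0]["value"]:
        value = float(point["value"])
        if value != no_data:
            readings.append((point["dateTime"], value))
    return readings


def daily_means(readings: list[tuple[str, float]]) -> dict[str, float]:
    """Average value per calendar day (the first 10 characters of dateTime)."""
    by_day: dict[str, list[float]] = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp[:10]].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


def fetch_readings(site: str) -> list[tuple[str, float]]:
    with urlopen(build_iv_url(site), timeout=30) as response:
        return parse_readings(json.load(response))


if __name__ == "__main__":
    # 01646500: Potomac River near Washington, D.C. (Little Falls).
    readings = fetch_readings("01646500")
    for timestamp, cfs in readings[-3:]:
        print(timestamp, cfs, "cfs")
    print(daily_means(readings))
```

Keeping the parsing separate from the network call makes it easy to test against a saved response, and `daily_means` is a starting point for the day-to-day and season-to-season comparisons mentioned above.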
3. Check the Air Quality in Cities Around the Globe
The quality of the air outside people’s doors is not the same around the globe. Some countries, like India and China, have particularly poor air quality, with pollution levels in some cities many times higher than in most of Europe and the Americas.
Air quality data can be correlated with health conditions like asthma and COPD, and used to study pollution’s effects on the atmosphere.
Two big resources available to check the Air Quality Index around the globe are:
- the OpenAQ API (Open Air Quality)
- the World Air Quality Index map
The Air Quality Index is a scale of 0 to 500, with 500 meaning the air is likely to cause health issues in most people. However, air is already rated “unhealthy for sensitive groups” above 100 and “unhealthy” for everyone above 150.
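The index itself comes from linear interpolation between published breakpoints: for a pollutant concentration C that falls between breakpoints C_low and C_high, which map to index values I_low and I_high, AQI = (I_high - I_low) / (C_high - C_low) × (C - C_low) + I_low, rounded to a whole number. The Python sketch below applies that to PM2.5. The breakpoint table is an assumption: it is the US EPA table used before EPA revised the PM2.5 breakpoints in 2024, so swap in the current values for real work.

```python
import math

# (C_low, C_high, I_low, I_high) for 24-hour PM2.5, in micrograms per cubic meter.
# ASSUMPTION: pre-2024 US EPA breakpoints; EPA revised PM2.5 breakpoints in 2024.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]

# Upper AQI bound of each category, in order.
CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def pm25_aqi(concentration: float) -> int:
    """AQI for a PM2.5 concentration via linear interpolation between breakpoints."""
    # Truncate to one decimal place so values like 12.05 land in a table row.
    # The tiny epsilon guards against float error (e.g. 35.4 * 10 -> 353.999...).
    c = math.floor(concentration * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            aqi = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
            return math.floor(aqi + 0.5)  # round half up to a whole number
    raise ValueError(f"{concentration} is outside the breakpoint table")


def aqi_category(aqi: int) -> str:
    for upper, label in CATEGORIES:
        if aqi <= upper:
            return label
    return "Above 500 (off the scale)"
```

The same interpolation works for other pollutants (ozone, NO2, and so on); only the breakpoint table changes, and the overall AQI reported for a place is the highest of the individual pollutant values.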
The OpenAQ API
OpenAQ’s data comes mostly from North America and Europe, with a smattering of locations on every other continent.
A Proto tutorial is not yet available for this. However, OpenAQ has example code ready for you to use on their Recipes page.
And each API function has example code for every popular programming language on their documentation page.
Start playing with OpenAQ air quality data here.
The World Air Quality Index map
An alternative with more detail on African, South American and Asian countries is the World Air Quality Project.
They provide specific data for each city that they monitor, down to the levels of each gas in the air.
The World Air Quality Project does not provide a single API for its data, but it does provide documentation on the 50,000 different stations around the world that it draws its data from. Each city’s page also has detailed graphics for every facet of air quality.
Uses for This Data
Air quality is both an indicator and a predictor of many facets of life on Earth, such as the health of humans and animals, socioeconomic status, atmospheric science, and more. By using air quality data, you can do things like:
- see how air quality affects asthma (for those in the United States, you can use the CDC’s asthma database)
- see pollution’s effects on meteorology around the world, or
- correlate it with this live, animated map of the wind in the United States.
4. Get Details on Every FDA-approved Drug
The US Food and Drug Administration (FDA) regulates the medical devices and drugs that are manufactured and sold in the United States. As of 2023, its database contained over 26,000 different drugs and medications, and we can get data on them using the openFDA API.
The openFDA API is actually a collection of APIs that together provide data on drugs, medical devices, adverse drug event reports, recalls, tobacco, and more. They also provide interactive charts to explore the data for each API.
You can use data from any of these APIs for your own research or to help others’ research along, as Wizmed, an app that simplifies pharmaceutical research, does.
To get started on your own FDA data project, check out our tutorial on how to get data on the 26,000+ approved drugs from openFDA with only one line of code.
(Update: we have also uploaded the data to Kaggle in ready-to-use CSVs. Open the data in a Kaggle notebook here and start exploring.)
Example apps and ideas for using openFDA are also available here.
5. See Where the International Space Station is Right Now
The International Space Station (ISS) is orbiting the Earth at about 17,400 mph right now (roughly 28,000 kilometers per hour). Its location can be found every second of every day with the Open Notify API.
If you want to get its latitude and longitude as of this second, you can get the API data in your browser by clicking here.
We also have a tutorial for how to get the ISS’s location from the API with code. This tutorial covers how to get ISS data with Python, R, and Bash.
Or check out how to track the ISS’s location with a live map using Python and Plotly.
Additional documentation on the Open Notify API is also available here.
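Here is a minimal Python sketch that reads the Open Notify `iss-now.json` response, which reports latitude and longitude as strings, and estimates the station’s speed from two positions taken a few seconds apart using the haversine formula. Because it measures distance along the Earth’s surface rather than at the station’s altitude (roughly 400 km up), and the Earth rotates underneath the orbit, the result will come out lower than the ~17,400 mph orbital speed.

```python
import json
import math
import time
from urllib.request import urlopen

ISS_NOW_URL = "http://api.open-notify.org/iss-now.json"
EARTH_RADIUS_KM = 6371.0  # mean radius of the Earth


def parse_position(payload: dict) -> tuple[int, float, float]:
    """(unix timestamp, latitude, longitude) from an iss-now.json response.

    The API returns latitude and longitude as strings, so convert them.
    """
    pos = payload["iss_position"]
    return payload["timestamp"], float(pos["latitude"]), float(pos["longitude"])


def ground_distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance along the Earth's surface (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # min() guards against float error pushing a slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def fetch_position() -> tuple[int, float, float]:
    with urlopen(ISS_NOW_URL, timeout=10) as response:
        return parse_position(json.load(response))


if __name__ == "__main__":
    t1, lat1, lon1 = fetch_position()
    time.sleep(10)
    t2, lat2, lon2 = fetch_position()
    if t2 > t1:
        km = ground_distance_km(lat1, lon1, lat2, lon2)
        print(f"Ground-track speed: about {km / (t2 - t1) * 3600:,.0f} km/h")
```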
Questions and Feedback
If you have questions or feedback, email us at protobioengineering@gmail.com or message us on Instagram (@protobioengineering).
If you liked this article, consider supporting us by donating a coffee.
More Free Data Sources
- “Awesome Public Datasets” on GitHub
- The Best, Free, Open Data Resources List from freeCodeCamp
- List of Free, Open Data APIs on GitHub
- The AWS Open Data Registry