Universities with the best air quality in the US

Michael Campbell
INST414: Data Science Techniques
Mar 15, 2022
Photo by Ryan Jacobson on Unsplash

As I was getting started, I had a lot of trouble deciding on a topic. I had a rough idea of which API I wanted to use, but not a question I wanted to answer. My API of choice was OpenWeather, which returns weather information for a given latitude and longitude. From there, I decided to use this API to look for the universities in the United States with the best air quality. My thought was that someone with a respiratory disease could use this to help decide which school to attend.

To do this, I needed a list of universities in the United States. Finding a large list was harder than expected, because most lists only give you the top 50 schools. The best website I found was 4ICU, which gives you a good amount of information. I used requests and BeautifulSoup to scrape each college's name, location, setting, ranking, staff count, and student count, since all of these seemed relevant to a prospective student choosing a college.

Next, I used the geocoder Python library along with the MapQuest API to get each school's latitude and longitude. This was relatively easy, although I first had to convert the schools' states into their abbreviated forms, which I did with a dictionary of states and abbreviations that I found online. I then plugged each location into the OpenWeather API to get its air quality index. All of the information I collected went into separate lists, which were converted into Series and then combined into a dictionary so that I could create a DataFrame with Pandas.
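
The air quality lookup can be sketched roughly as follows. This is a minimal sketch, not the author's code (that lives in the linked repository): it assumes OpenWeather's Air Pollution endpoint (`/data/2.5/air_pollution`) with `lat`, `lon`, and `appid` query parameters, and a JSON response whose `list[0].main.aqi` field holds the index on OpenWeather's 1 (Good) to 5 (Very Poor) scale. The function names are hypothetical, and the standard library's `urllib` stands in for requests so the sketch has no extra dependencies.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed OpenWeather Air Pollution endpoint (current conditions for one coordinate pair).
AIR_POLLUTION_URL = "https://api.openweathermap.org/data/2.5/air_pollution"


def build_air_quality_url(lat: float, lon: float, api_key: str) -> str:
    """Return the request URL for the air quality at (lat, lon)."""
    query = urlencode({"lat": lat, "lon": lon, "appid": api_key})
    return f"{AIR_POLLUTION_URL}?{query}"


def parse_aqi(payload: dict) -> int:
    """Pull the air quality index (1 = Good ... 5 = Very Poor) out of a decoded response."""
    return payload["list"][0]["main"]["aqi"]


def fetch_aqi(lat: float, lon: float, api_key: str) -> int:
    """Call the API and return the AQI for one location (needs network access and a key)."""
    with urlopen(build_air_quality_url(lat, lon, api_key), timeout=10) as resp:
        return parse_aqi(json.load(resp))
```

Calling `fetch_aqi` once per geocoded school and appending each result to a list would produce the air quality column of the data frame.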

Data Exploration

This is the full dataset:

From here, you can see which of the highest-ranked universities have the best air quality.
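
A minimal pandas sketch of that view, assuming hypothetical column names (`name`, `rank`, `aqi`) and made-up rows rather than the real dataset: keep the schools at the best (lowest) AQI, then sort them by rank. Passing a dict of equal-length lists straight to `pd.DataFrame`, as done here, also works, so the intermediate Series step is optional.

```python
import pandas as pd

# Made-up rows standing in for the scraped dataset; column names are hypothetical.
df = pd.DataFrame({
    "name": ["School A", "School B", "School C", "School D"],
    "rank": [3, 1, 4, 2],
    "aqi": [1, 2, 1, 1],
})


def best_air_top_ranked(frame: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Keep schools at the best (lowest) observed AQI, then sort them by rank."""
    best = frame[frame["aqi"] == frame["aqi"].min()]
    return best.sort_values("rank").head(n)


print(best_air_top_ranked(df))
```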

We can also group the schools by region to see which regions have the best air quality. This table shows the mean air quality index for each region, which can help someone identify the region with the best air quality.
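
Grouping like this is short in pandas. The sketch below assumes hypothetical `region`, `state`, and `aqi` columns filled with made-up values; because a lower AQI is better on OpenWeather's scale, the groups are sorted ascending. The same helper covers a by-state view by passing `"state"` instead of `"region"`. Note that averaging a 1 to 5 category index is only a rough summary.

```python
import pandas as pd

# Made-up rows; the column names are hypothetical stand-ins for the scraped data.
df = pd.DataFrame({
    "name": ["School A", "School B", "School C", "School D", "School E"],
    "state": ["NY", "NY", "VT", "CA", "OR"],
    "region": ["Northeast", "Northeast", "Northeast", "West", "West"],
    "aqi": [2, 3, 1, 1, 2],
})


def mean_aqi_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Average AQI per group, best (lowest) first."""
    return frame.groupby(column)["aqi"].mean().sort_values()


print(mean_aqi_by(df, "region"))  # West 1.5, Northeast 2.0
print(mean_aqi_by(df, "state"))
```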

If you want to look at air quality by state, we can do that too:

One of the most surprising findings was that urban areas had the most universities with high air quality, even though rural areas have the lowest (best) average air quality index. When I thought about it more, this makes sense, since there are simply more schools in urban areas in general.
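
The gap between "most schools with good air" and "best average air" shows up clearly when both are computed per setting. In this sketch the rows are invented for illustration (not the article's data), the column names are hypothetical, and "good air" is assumed to mean an AQI of 1:

```python
import pandas as pd

# Invented rows: a campus setting (as 4ICU labels it) and an OpenWeather-style AQI (1 = best).
df = pd.DataFrame({
    "setting": ["Urban"] * 6 + ["Suburban"] * 3 + ["Rural"] * 2,
    "aqi": [1, 1, 1, 2, 3, 3, 1, 2, 2, 1, 1],
})

# Per setting: how many schools, how many with AQI 1, and the mean AQI.
summary = df.groupby("setting")["aqi"].agg(
    schools="count",
    good_air=lambda s: (s == 1).sum(),
    mean_aqi="mean",
)

print(summary)
```

With these toy numbers, urban settings have the most AQI-1 schools (3) while rural settings have the lowest mean AQI (1.0), because urban settings simply contain more schools overall.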

Conclusion:

If someone with a respiratory disease wants to find a university with high air quality, they can still find high-ranking universities. States with more rural areas tend to have better air quality, but urban universities can still have good air quality.

Problems Encountered:

At the start of the assignment, I began with a different weather API called IQAir, but it wasn't returning the correct information. I was also using GeoNames to look up each school's location, but I later found that the link had a lot more information. The step that took the longest was collecting the information from the 4ICU website, since the fields I wanted didn't have any special ID or tag attached to them.

Link to code: https://github.com/mac5617/414/tree/main/Assingment%201
