Visualize Open Data using MongoDB
Using Python to connect to Taiwan Government PM2.5 open data API and upload batch data to MongoDB — Part 1
Goal
MongoDB is one of the most widely used NoSQL databases and is quite simple to use.
I gave it a try and quickly documented the process: reading PM2.5 monitoring data (in JSON format) from the government’s open data API and uploading it to MongoDB for both storage and visualization.
The demo looks like this:
What I’m going to do:
- Connect to API
- Select data points I would like to show on my visualization
- Parse the ISO-8601 timestamps (which are in UTC) and also convert them to the local time zone
- Upload the data in batch with pymongo’s insert_many
- Create a dashboard on MongoDB
So, let’s get started.
Process
Import all required libraries:
Connect to the API described on this page, 環保署智慧城鄉空品微型感測器監測資料 (EPA Smart Urban and Rural Air Quality Micro-Sensor Monitoring Data), with “requests”:
Make sure the URL parameters follow the API instructions. Here I set them to return only PM2.5 data, only the latest value, a random selection of n=100 stations, and only observation values > 0.
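The article applies these filters through URL parameters, which are not reproduced here. As a rough illustration of the same selection logic, the sketch below does it client-side on a list of already-downloaded readings. The function name select_readings is hypothetical, and the field names "name" and "result" are an assumption taken from the sample record printed later in this article; the raw API response may be shaped differently.

```python
import random


def select_readings(readings, n=100, seed=None):
    """Keep PM2.5 readings with a positive value, then sample up to n of them.

    Assumes each reading is a dict with "name" and "result" keys
    (hypothetical shape, based on the sample record in this article).
    """
    valid = [
        r for r in readings
        if r.get("name") == "PM2.5"
        and isinstance(r.get("result"), (int, float))
        and r["result"] > 0
    ]
    # A seeded generator makes the random choice reproducible when needed.
    rng = random.Random(seed)
    return rng.sample(valid, min(n, len(valid)))
```

Filtering on the server, as the article does, saves bandwidth; a client-side pass like this is mainly useful as a sanity check on what came back.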
Note how I convert the time format below (shown with an example value). The sensors report timestamps in ISO-8601 format, in Greenwich time (UTC+0):
iso8601_utc0 = "2020-08-18T00:41:38.000Z"
UTC_0 = dateutil.parser.parse(iso8601_utc0)
UTC_8 = UTC_0.astimezone(pytz.timezone("Asia/Taipei"))
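For readers who prefer to avoid the dateutil and pytz dependencies, here is a minimal standard-library sketch of the same conversion. It is an alternative, not the code used in this article. It assumes the timestamps always have the form shown above (fractional seconds and a trailing "Z"), and it uses a fixed UTC+8 offset, which is adequate because Taiwan does not currently observe daylight saving time. The function name parse_sensor_time is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Taiwan local time (UTC+8, no daylight saving time).
TAIPEI = timezone(timedelta(hours=8))


def parse_sensor_time(iso8601_utc0):
    """Parse a 'YYYY-MM-DDTHH:MM:SS.fffZ' string; return (UTC, UTC+8) datetimes."""
    naive = datetime.strptime(iso8601_utc0, "%Y-%m-%dT%H:%M:%S.%fZ")
    # The trailing "Z" means UTC, so attach that time zone explicitly.
    utc_0 = naive.replace(tzinfo=timezone.utc)
    utc_8 = utc_0.astimezone(TAIPEI)
    return utc_0, utc_8


utc_0, utc_8 = parse_sensor_time("2020-08-19T02:33:59.000Z")
print(utc_0)  # 2020-08-19 02:33:59+00:00
print(utc_8)  # 2020-08-19 10:33:59+08:00
```

Both values describe the same instant; only the displayed offset differs.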
I print out the very first value to double-check. All looks good:
{'name': 'PM2.5', 'stationID': '10062399613', 'observedArea': {'type': 'Point', 'coordinates': [120.2777716, 23.047355]}, 'iso8601_UTC_0': '2020-08-19T02:33:59.000Z', 'UTC_0': '2020-08-19 02:33:59+00:00', 'UTC_8': '2020-08-19 10:33:59+08:00', 'result': 5.0, 'unitOfMeasurement': 'μg/m3'}
Next, before doing anything else on MongoDB Atlas (the cloud platform), I launched a free cluster for this demo. Here’s how:
Then, create a database “test” and a collection “test”, and configure access so that my application can connect to this database:
Now we’re ready to connect to MongoDB. Here I uploaded the data with .insert_many, a method of the pymongo Python driver that uploads multiple data entries to MongoDB in one call:
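A minimal sketch of the batch upload step, under some assumptions: the helper names connect_collection and upload_batch are hypothetical, the connection string is whatever Atlas shows for your own cluster, and the database and collection names default to the “test” / “test” pair used in this article. It is not necessarily how the original script was organized.

```python
def connect_collection(uri, db_name="test", collection_name="test"):
    """Return a pymongo collection handle for the given connection string."""
    # Third-party dependency (pip install pymongo), imported here so that
    # upload_batch below can be used and tested without it.
    from pymongo import MongoClient

    client = MongoClient(uri)
    return client[db_name][collection_name]


def upload_batch(collection, records):
    """Insert all records in one call and return how many were written."""
    if not records:
        # insert_many rejects an empty list, so skip the call entirely.
        return 0
    result = collection.insert_many(records)
    return len(result.inserted_ids)
```

With a list of record dicts in hand, the call would look like upload_batch(connect_collection(uri), records), where uri is your own Atlas connection string. Sending the whole list in one insert_many call avoids one network round trip per record.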
All are uploaded:
Note that MongoDB gives each data entry a unique ID, “_id”, by default.
Finally, we can start making charts!
Once we save all of these charts, they appear on an interactive dashboard:
Conclusion
So around 10:30 AM on Aug 19, 2020, I connected to the Taiwan government’s PM2.5 monitoring API to see the latest air quality. The 100 randomly chosen data points were processed, uploaded, and presented on MongoDB as an interactive dashboard. At that moment, the intensity of PM2.5 looked higher in the northern part of Taiwan. I can zoom in to see where each data point came from. Lastly, I can see how the average PM2.5 intensity changes over time across the entire island.
That’s all. Hope you find this helpful :)
Have a good day!