3 Learnings from Using GovHK Journey Time Data API
The Good Data Team (TGD) is a local Hong Kong team of technologists and data scientists, so one of our plans is to explore the various Hong Kong government open data APIs and actually try to use them!
To give some background, the government announced its Open Data plan in 2018 with the target of launching 700 datasets from more than 80 government divisions. Critics have complained about the data quality, usability and update frequency. Before commenting on this, we decided to test the data ourselves and share our findings with you. The first Open Data API we tested is the “Journey Time Indicator” API. Before we dive into our findings, here is a quick overview of what the “Journey Time Indicator” API is:
There are currently 10 sets of Journey Time Indicators (JTI) installed on major routes on Hong Kong Island and in Kowloon to provide the latest cross harbour journey time and 5 sets of JTI installed on major routes at the New Territories to provide the latest journey time to Kowloon for motorists to make an informed route choice. The displayed journey time refers to the average journey time of vehicles from the JTI to the respective destinations. The journey time XML file is updated every 2 minutes. — Data Specification for Journey Time Indicators
We built a data processing pipeline on Google Cloud (to be explained in another article) for this API. The pipeline grabs the latest data every minute, transforms it in a way we consider best practice and stores it for later use. Throughout the process, we found 1 good thing and 3 areas of potential improvement that we would like to share.
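To illustrate the transform step, here is a minimal sketch that turns one XML snapshot into plain dictionaries. The `jtis_journey_time` and `JOURNEY_DATA` element names, the sample payload and the `parse_journey_times` helper are assumptions made for illustration; check the data specification for the real element names. Fetching and storage are left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical sample payload. The real feed may use different element
# names and may declare an XML namespace on the root element.
SAMPLE_XML = """
<jtis_journey_list>
  <jtis_journey_time>
    <LOCATION_ID>H01</LOCATION_ID>
    <JOURNEY_TYPE>1</JOURNEY_TYPE>
    <JOURNEY_DATA>12</JOURNEY_DATA>
    <COLOR_ID>2</COLOR_ID>
  </jtis_journey_time>
  <jtis_journey_time>
    <LOCATION_ID>H02</LOCATION_ID>
    <JOURNEY_TYPE>1</JOURNEY_TYPE>
    <JOURNEY_DATA>25</JOURNEY_DATA>
    <COLOR_ID>1</COLOR_ID>
  </jtis_journey_time>
</jtis_journey_list>
"""


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so lookups work with or without one."""
    return tag.rsplit("}", 1)[-1]


def parse_journey_times(xml_text: str) -> List[Dict[str, str]]:
    """Return one dict per journey time record, keyed by child element name."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for element in root.iter():
        if local_name(element.tag) != "jtis_journey_time":
            continue
        records.append(
            {local_name(child.tag): (child.text or "").strip() for child in element}
        )
    return records


if __name__ == "__main__":
    for record in parse_journey_times(SAMPLE_XML):
        print(record)
```

Keeping every value as a string at this stage is a deliberate choice in this sketch: type conversion and code decoding can then happen in a later step where failures are easier to report.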
1 Good Thing
The API is very stable and we encountered zero downtime during our test (and our data pipeline is still running). We are not sure whether this is due to great scalability of the service or a low usage rate.
3 Areas of Potential Improvement
1. Data Encoding
According to the data specification, multiple data fields are encoded as numbers (e.g. COLOR_ID = 1, 2) or opaque codes (e.g. LOCATION_ID = H01, H02). It is understandable that these IDs are convenient internally. However, sharing data in this format may not be best practice, for a simple reason: maintainability for data users. Imagine that at some point in the future the publisher updates the schema by changing the definition of 1, 2, 3 or adding an extra code. Data users who are unaware of the change might misinterpret the data: congested traffic might be read as smooth traffic and vice versa. This is especially important for Open APIs that require no registration, as the publisher has no means of notifying users about API changes.
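One way a data user can limit this risk is to decode the codes through an explicit lookup that fails loudly on anything unexpected. The sketch below is illustrative only: the mapping of COLOR_ID values to traffic conditions is an assumption, and `TrafficCondition`, `UnknownCodeError` and `decode_color_id` are hypothetical names.

```python
from enum import Enum


class TrafficCondition(Enum):
    CONGESTED = "congested"
    SLOW = "slow"
    SMOOTH = "smooth"


# Assumed mapping for illustration only. Verify it against the current
# data specification before relying on it.
COLOR_ID_TO_CONDITION = {
    "1": TrafficCondition.CONGESTED,
    "2": TrafficCondition.SLOW,
    "3": TrafficCondition.SMOOTH,
}


class UnknownCodeError(ValueError):
    """Raised when the feed contains a code that the lookup does not know."""


def decode_color_id(raw: str) -> TrafficCondition:
    """Translate a raw COLOR_ID into a named condition, or fail loudly."""
    try:
        return COLOR_ID_TO_CONDITION[raw.strip()]
    except KeyError:
        raise UnknownCodeError(
            f"Unrecognised COLOR_ID {raw!r}; the schema may have changed"
        ) from None


if __name__ == "__main__":
    print(decode_color_id("1").value)
```

A strict lookup like this catches a newly added code, but it cannot detect a silent redefinition of an existing one, which is why descriptive values in the feed itself would be safer.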
2. Unused Data Specification
During the data exploration phase, the first thing to do is of course to understand the schema. One description caught our attention:
When “JOURNEY_TYPE = 2” — “1” means traffic congested. Traffic congestion bitmap will be displayed.
We were very interested in how a bitmap would be included in the API, so we looked at peak hours, protest days and so on, just to see the bitmap. However, we were not able to find a single day with “JOURNEY_TYPE=2”! We are not sure why we could not find an hour with congested traffic, and the API specification does not mention when “JOURNEY_TYPE” would be 2 either. If this is something on the future roadmap, it would be great to include a potential launch date so data engineers can decide how and when to support it.
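A check like this can be sketched as a simple tally over stored records. The records below are made-up sample data, not our actual results, and `count_journey_types` is a hypothetical helper that assumes each record is a dict with a `JOURNEY_TYPE` key.

```python
from collections import Counter
from typing import Dict, Iterable


def count_journey_types(records: Iterable[Dict[str, str]]) -> Counter:
    """Tally how often each JOURNEY_TYPE value appears in the records."""
    return Counter(record["JOURNEY_TYPE"] for record in records)


# Made-up sample records for illustration only.
sample_records = [
    {"LOCATION_ID": "H01", "JOURNEY_TYPE": "1"},
    {"LOCATION_ID": "H02", "JOURNEY_TYPE": "1"},
    {"LOCATION_ID": "H01", "JOURNEY_TYPE": "1"},
]

if __name__ == "__main__":
    counts = count_journey_types(sample_records)
    print("Records with JOURNEY_TYPE=2:", counts.get("2", 0))
```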
3. Geo-coordinate System
The specification also includes the coordinates of each traffic tracker, but in HK1980 format (e.g. 835776.133E, 815604.834N). This is not a universally supported format. In Google Maps and other mainstream location and geo-related applications, WGS84 is currently the more popular and better-known format. Data engineers who plan to make further use of these HK1980 coordinates will need an extra step to convert them (we tried this; details in another article).
These are all good learnings for designing Open Data APIs, or even internal APIs within an organisation. Going forward, we are going to test more of the HK Government APIs with a similar approach. Stay tuned!
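As a rough sketch of that extra step, the code below parses the coordinate string and converts it with the third-party `pyproj` library. It assumes that `pyproj` is installed and that the coordinates are in the Hong Kong 1980 Grid System (EPSG:2326); `parse_hk1980` and `hk1980_to_wgs84` are hypothetical helper names, and this is not necessarily the approach we used ourselves.

```python
import re
from typing import Tuple

# Matches strings such as "835776.133E, 815604.834N".
_HK1980_PATTERN = re.compile(r"^\s*([0-9.]+)\s*E\s*,\s*([0-9.]+)\s*N\s*$")


def parse_hk1980(text: str) -> Tuple[float, float]:
    """Split an 'eastingE, northingN' string into (easting, northing) in metres."""
    match = _HK1980_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Not an HK1980 coordinate string: {text!r}")
    return float(match.group(1)), float(match.group(2))


def hk1980_to_wgs84(easting: float, northing: float) -> Tuple[float, float]:
    """Convert HK1980 grid coordinates to WGS84 (latitude, longitude).

    Assumes the input is in EPSG:2326 and that pyproj is installed.
    """
    from pyproj import Transformer  # third-party, imported only when needed

    transformer = Transformer.from_crs("EPSG:2326", "EPSG:4326", always_xy=True)
    # With always_xy=True the input order is (easting, northing) and the
    # output order is (longitude, latitude).
    longitude, latitude = transformer.transform(easting, northing)
    return latitude, longitude


if __name__ == "__main__":
    easting, northing = parse_hk1980("835776.133E, 815604.834N")
    print(hk1980_to_wgs84(easting, northing))
```

The import sits inside the conversion function so that the parsing helper can be used on its own without `pyproj` installed.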