Recently, the Eliiza team decided we wanted to wrangle some large, local, geospatial data to build something insightful, useful & pretty.
You can play around with what we ended up building here: https://demos.eliiza.ai/melb-parking/
We saw this as a good opportunity to leverage:
- Mapbox — an open source mapping platform for custom designed maps
- BigQuery — GCP’s serverless data warehouse
- City of Melbourne’s Open Data Portal
We were keen to build something we could dogfood.
Oh, and did I mention… we wanted to do all of this in a week.
We decided to loosely follow Andrew Ng’s iterative machine learning life cycle of Idea-Code-Experiment.
We explored the open data available on the City of Melbourne’s portal and settled on the parking sensor dataset since it was comprehensive and met our requirements.
After briefing our loved ones, who confirmed there was an appetite for a parking aid, it was full steam ahead! This was a pleasant surprise, since most of us had expected a glazed look or a yawn.
The value we chose to predict and visualise was the probability of a parking bay being occupied given the day-of-week and hour-of-day.
Hang on a second, isn’t that just a convoluted way of saying the average?
Yep, that’s right. In keeping with the machine learning life cycle, we wanted to build and train a basic system quickly, then rapidly experiment and iterate from there.
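In other words, the first model is a lookup table: the mean occupancy for each bay, day of week and hour of day. Here is a minimal Python sketch of that idea, assuming the raw sensor events have already been converted into hourly `(bay_id, timestamp, occupied)` observations. Those field names and that shape are hypothetical, and at 34GB the equivalent `GROUP BY` would more naturally run in a data warehouse than in Python.

```python
from collections import defaultdict
from datetime import datetime


def occupancy_rates(readings):
    """Estimate P(occupied | bay, day-of-week, hour-of-day) as a mean.

    `readings` is an iterable of (bay_id, timestamp, occupied) tuples where
    `timestamp` is a datetime and `occupied` is a bool. This input shape is a
    hypothetical simplification of the real sensor dataset.
    """
    totals = defaultdict(int)    # number of observations per key
    occupied = defaultdict(int)  # number of occupied observations per key
    for bay_id, ts, is_occupied in readings:
        key = (bay_id, ts.weekday(), ts.hour)  # weekday(): Monday == 0
        totals[key] += 1
        occupied[key] += int(is_occupied)
    # The "probability" is simply the fraction of observations that were occupied.
    return {key: occupied[key] / totals[key] for key in totals}


readings = [
    ("bay-1", datetime(2016, 3, 7, 9, 15), True),   # Monday, 9am hour
    ("bay-1", datetime(2016, 3, 14, 9, 40), False),  # Monday, 9am hour
    ("bay-1", datetime(2016, 3, 21, 9, 5), True),   # Monday, 9am hour
]
print(occupancy_rates(readings))  # {('bay-1', 0, 9): 0.666...}
```

Crude as it is, this gives a baseline that every later, more sophisticated model has to beat.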
34GB of data was recorded from almost 20,000 sensors between 2011 and 2016. We streamed this data directly from the data portal to a Google Cloud Storage bucket.
From there, we wrangled the data in BigQuery, running some hefty SQL queries quickly and cheaply. This is not something we would have been able to do on our laptops.
(Watch this space for a more in-depth blog on the work we did in BigQuery)
After spending some time understanding and verifying the data, it was time to build and train our first basic model, ensuring we had accounted for any anomalies that had surfaced during the discovery process. The anomalies consisted of the usual culprits — date & time formatting and missing data.
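As an illustration of that clean-up step, the sketch below normalises timestamps that arrive in more than one format and drops rows with missing or unparseable fields. The column names (`bay_id`, `timestamp`, `occupied`) and the candidate formats are hypothetical stand-ins, not the dataset's actual schema.

```python
from datetime import datetime

# Hypothetical timestamp formats; the real dataset's formats may differ.
CANDIDATE_FORMATS = (
    "%d/%m/%Y %I:%M:%S %p",  # e.g. 07/03/2016 09:15:00 AM
    "%Y-%m-%dT%H:%M:%S",     # e.g. 2016-03-07T21:30:00
    "%Y-%m-%d %H:%M:%S",     # e.g. 2016-03-07 10:00:00
)


def parse_timestamp(raw):
    """Try each candidate format in turn; return None if none match."""
    if raw is None:
        return None
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def clean(rows):
    """Yield (bay_id, datetime, bool) tuples, skipping incomplete rows."""
    for row in rows:
        bay_id = row.get("bay_id")
        ts = parse_timestamp(row.get("timestamp"))
        status = row.get("occupied")
        # Drop rows with a missing bay, an unparseable time or an unknown status.
        if not bay_id or ts is None or status not in ("0", "1"):
            continue
        yield bay_id, ts, status == "1"
```

Skipping bad rows outright is the simplest policy; whether dropping or imputing is appropriate depends on how much data is affected.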
The results of our basic model were then measured against randomly chosen unaggregated data using the Brier score and…
The model performed better than chance!
Our model’s Brier score was 0.184, beating the 0.25 scored by a chance baseline that always predicts 50% occupancy (lower Brier scores are better). A great result, but clearly there’s more work to do.
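The Brier score is the mean squared difference between a predicted probability and the observed 0/1 outcome. A predictor that always says 0.5 scores exactly 0.25, whatever the outcomes, because every error is ±0.5. The snippet below shows the calculation on made-up numbers; they are illustrative only and are not our evaluation data.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect forecast, and always predicting 0.5
    scores 0.25 regardless of the outcomes.
    """
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


outcomes = [1, 0, 1, 1]  # 1 = bay occupied, 0 = bay free (illustrative)
chance = brier_score([0.5] * len(outcomes), outcomes)
model = brier_score([0.9, 0.2, 0.7, 0.6], outcomes)
print(chance, round(model, 3))  # 0.25 0.075
```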
Using Mapbox’s data-driven styling we assigned a colour to each parking bay throughout Melbourne based on the occupancy rate from the model. The colours range from red — “probably occupied” through to green — “probably available”. Select a different day/time and the map updates the occupancy colours accordingly.
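Data-driven styling works by putting an expression in a layer’s paint property instead of a fixed colour. The Python sketch below builds one such expression as plain JSON, using Mapbox’s `interpolate` and `get` expression operators to blend from green to red as occupancy rises. The per-hour property naming scheme (`occ_<day>_<hour>`), the colour stops and the `fill-color` layer type are all assumptions for illustration.

```python
import json


def occupancy_property(day_of_week, hour_of_day):
    """Hypothetical feature property holding a bay's occupancy probability."""
    if not (0 <= day_of_week <= 6 and 0 <= hour_of_day <= 23):
        raise ValueError("day_of_week must be 0-6 and hour_of_day 0-23")
    return f"occ_{day_of_week}_{hour_of_day}"


def occupancy_colour(day_of_week, hour_of_day):
    """Mapbox style expression mapping occupancy probability to a colour."""
    return [
        "interpolate", ["linear"],
        ["get", occupancy_property(day_of_week, hour_of_day)],
        0.0, "#1a9850",  # green: probably available
        0.5, "#fee08b",  # yellow: a toss-up
        1.0, "#d73027",  # red: probably occupied
    ]


# Paint property for a hypothetical fill layer of parking-bay polygons (Monday, 9am).
paint = {"fill-color": occupancy_colour(0, 9)}
print(json.dumps(paint))
```

When the user picks a different day and time, the new expression can be applied in the browser with Mapbox GL JS’s `map.setPaintProperty(layerId, "fill-color", expression)`. That restyles the existing layer without reloading any data.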
Check it out at https://demos.eliiza.ai/melb-parking/.
Not surprisingly, the map suggests staying away from the CBD during business hours and the Queen Victoria Markets on the weekend, assuming your goal is finding a park. Although not super insightful, this gave us confidence that our model was behaving correctly.
(Watch this space for a more in-depth blog on the work we did in Mapbox)
Where to from here? We’d like to:
- add more features, such as holidays, events and weather
- use the live data stream from the sensors to improve the model
- feed the data into more sophisticated models, e.g. linear regression and XGBoost
- get feedback from the users
- roll out to different cities
What we learned:
- there’s lots of valuable open data waiting to be explored
- the tools required are available at a reasonable cost, if not for free
- there’s an appetite for a parking aid
- it’s possible to ship an idea within a week