Simulating Jakarta’s BRT Real-Time Bus Tracking Using Kafka and Docker

Favian Hazman · Published in Analytics Vidhya · Dec 30, 2019 · 4 min read

Better utilization of technology has enabled cities to interact with and provide better services for their citizens. One of the main problems of cities around the world, including Jakarta, is mobility. By acquiring public transportation data at the lowest possible latency, such as streamed locations and density, cities could improve citizens’ commuting experience and even enable better interoperability between public transport modes. We built a real-time visualization platform for Jakarta’s very own bus rapid transit, Transjakarta (thank you Trafi for the inspiration!).

Real-time tracking for Transjakarta Buses

In my final year as an IS student, I took a Big Data Management course. For the final project, we were challenged to write a paper and build a proof of concept (PoC) of a big data use case, in either the data engineering or the analytics-and-prediction scope, with a technology of our choice. Our team of four (kudos to Wikan, Usama and Edwin!) chose Apache Kafka as the main technology for our project.

As an introduction, Apache Kafka is a distributed streaming platform built on top of a publish-subscribe message queue mechanism. According to Apache Kafka’s official documentation, it is scalable, distributed and fault tolerant, which makes it suitable for high-volume yet high-availability use cases such as real-time tracking.
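To make the publish-subscribe idea concrete, here is a minimal sketch using the kafka-python client (the broker address and topic name are illustrative assumptions, not our project’s actual values): one side publishes a message to a topic, the other subscribes and reads it.

```python
# Minimal publish/subscribe sketch with kafka-python (pip install kafka-python).
# The broker address and topic name are illustrative assumptions.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("bus-locations", b'{"bus_code": "TJ-001", "lat": -6.2, "lon": 106.8}')
producer.flush()  # make sure the message actually leaves the client buffer

consumer = KafkaConsumer(
    "bus-locations",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
    consumer_timeout_ms=5000,      # stop iterating when no new messages arrive
)
for message in consumer:
    print(message.value)  # b'{"bus_code": "TJ-001", ...}'
```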

The whole architecture for the project looks like this:

For the streaming system, we run two brokers (instances) of Apache Kafka to demonstrate the high-availability and fault-tolerance aspects of the system.
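Two brokers pay off when topics are created with a replication factor of 2: each partition then has a copy on both brokers, so the stream survives one broker going down. A small sketch with kafka-python’s admin client (addresses, topic name and partition count are our assumptions):

```python
# Create a topic replicated across both brokers (kafka-python admin API).
# Addresses, topic name and partition count are illustrative assumptions.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers=["localhost:9092", "localhost:9093"])
admin.create_topics([
    NewTopic(name="bus-locations", num_partitions=3, replication_factor=2),
])
# With replication_factor=2, either broker can fail and the topic stays readable.
```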

Since we got the data as REST API fetches, we had to simulate the “asynchronous” part on the producer side. For this, we used Apache Airflow, a scheduler platform, to fetch the endpoint and publish messages every five seconds. The simulator script splits the messages into separate topic streams (a rough sketch of such a producer job follows the list below):

  • The latitude and longitude of the bus, produced directly by the script.
  • Single-trip and bus information (trip ID, bus code, bus color, trip corridor), stored in PostgreSQL, whose write-ahead logs (WALs) are listened to and produced as messages via Debezium. Debezium is a Kafka connector that captures data changes in a persistent database (change data capture, CDC) and emits them as stream messages. This was done to show the ability to stream from a persistent database such as PostgreSQL.
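As a rough illustration (this is a sketch, not our project’s actual code; the endpoint URL, topic name and payload fields are assumptions), the scheduled producer job for the first stream might look like this:

```python
# Sketch of the fetch-and-publish task that an Airflow DAG could run every
# five seconds. The URL, topic and field names are illustrative assumptions.
import json

import requests
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092", "localhost:9093"],  # both brokers
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_bus_locations():
    response = requests.get("https://example.com/api/bus-positions")  # hypothetical endpoint
    response.raise_for_status()
    for bus in response.json():
        producer.send("bus-locations", {
            "bus_code": bus["bus_code"],
            "lat": bus["latitude"],
            "lon": bus["longitude"],
        })
    producer.flush()  # block until all messages reach the brokers

if __name__ == "__main__":
    publish_bus_locations()
```

In the real pipeline this function would be wrapped in an Airflow task (e.g. a PythonOperator), while the trip/bus topic is fed by Debezium rather than by the script.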
The user interface of Apache Airflow. The green dots mark successful producer jobs, while the yellow ones represent running jobs.
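Unlike the location stream, the Debezium stream is not produced by a scheduled script; the connector is registered once with Kafka Connect and then emits a message for every row change it sees in the WAL. A hedged sketch of that one-time registration call (host names, credentials, table and connector names are all assumptions):

```python
# Sketch of registering a Debezium PostgreSQL connector via the Kafka
# Connect REST API. All names and credentials here are illustrative
# assumptions for a docker-compose setup.
import requests

connector = {
    "name": "trip-info-connector",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",   # service name in docker-compose
        "database.port": "5432",
        "database.user": "postgres",
        "database.password": "postgres",
        "database.dbname": "transjakarta",
        "database.server.name": "tj",      # topic prefix, e.g. tj.public.trips
        "table.whitelist": "public.trips", # only stream the trip/bus table
    },
}

response = requests.post("http://localhost:8083/connectors", json=connector)
response.raise_for_status()
```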

On the consumer side, we have two use cases. The first one is visualization and real-time bus following. The second one is data storing, to simulate stream-data ingestion into a data warehouse.

Following a single bus location on the visualization frontend. Thanks to websocket technology👏

For the data storing use case, we only use SQLite3 as the database due to time constraints (we admit, we were deadliners at the time😅). However, since the trigger is eventual, we haven’t handled the possible cases of data loss in the system. For future improvement, this can be addressed by committing the message offset between the consumer and the broker.
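That improvement would look roughly like the following: disable auto-commit and acknowledge the offset only after the row is durably written, so a crash before the commit means the message is re-consumed rather than lost. A minimal sketch (topic, group and table names are assumptions):

```python
# Sketch of at-least-once ingestion into SQLite: the offset is committed
# only after the write succeeds, so a crash before the commit means the
# message is re-consumed instead of lost. Names are illustrative assumptions.
import json
import sqlite3

from kafka import KafkaConsumer

db = sqlite3.connect("bus_data.db")
db.execute("CREATE TABLE IF NOT EXISTS locations (bus_code TEXT, lat REAL, lon REAL)")

consumer = KafkaConsumer(
    "bus-locations",
    bootstrap_servers=["localhost:9092", "localhost:9093"],
    group_id="warehouse-ingest",
    enable_auto_commit=False,  # we commit manually after a durable write
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    bus = message.value
    db.execute(
        "INSERT INTO locations VALUES (?, ?, ?)",
        (bus["bus_code"], bus["lat"], bus["lon"]),
    )
    db.commit()        # durable in SQLite first...
    consumer.commit()  # ...then acknowledge the offset to the broker
```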

Our attempt at accessing the bus data in the SQLite database

We had so much fun building the consumer platform with Django, since we got to apply so many new things, such as threading, queuing, Kafka’s Python API and WebSockets. Also, Mapbox GL’s JavaScript library is the MVP of this project🔥🔥🔥.
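For the curious, the consumer-side pattern can be sketched roughly as follows, assuming Django Channels with a Redis channel layer (we ran Redis; the group, topic and class names are illustrative, not our actual code): a background thread consumes Kafka and broadcasts each message to a WebSocket group.

```python
# Sketch of forwarding Kafka messages to WebSocket clients with Django
# Channels. Assumes a configured Redis channel layer; group, topic and
# class names are illustrative assumptions.
import threading

from asgiref.sync import async_to_sync
from channels.generic.websocket import WebsocketConsumer
from channels.layers import get_channel_layer
from kafka import KafkaConsumer

def relay_kafka_to_websocket():
    """Run in a background thread: push every bus location to the group."""
    channel_layer = get_channel_layer()
    consumer = KafkaConsumer("bus-locations", bootstrap_servers=["localhost:9092"])
    for message in consumer:
        async_to_sync(channel_layer.group_send)(
            "bus-tracking",
            {"type": "bus.location", "payload": message.value.decode("utf-8")},
        )

class BusTrackingConsumer(WebsocketConsumer):
    def connect(self):
        async_to_sync(self.channel_layer.group_add)("bus-tracking", self.channel_name)
        self.accept()

    def bus_location(self, event):  # handles {"type": "bus.location", ...}
        self.send(text_data=event["payload"])

# Started at import time for brevity; in a real Django app this would be
# launched once, e.g. from AppConfig.ready() or a management command.
threading.Thread(target=relay_kafka_to_websocket, daemon=True).start()
```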

We made this diagram to guide ourselves while building the consumer platform

With limited computing resources and a tight development timeline, Docker helped us build, provision and orchestrate the instances. We managed to run Airflow, PostgreSQL, ZooKeeper, multiple Kafka brokers, Django, Redis and Debezium (plus Kafka Connect) without any hassle (except for the hours spent figuring out the docker-compose script and the network configuration).

We recommend anyone who hasn’t tried Docker yet to learn it and use it in an upcoming project, especially one that needs orchestration of multiple instances!

Moment of relief // Project presentation (Thank you Adil for documenting!)

We had so much fun on this project, from brainstorming all the way to presenting it. Please reach out to us via the comment section if you have any feedback or anything to share about this project or data engineering. The source code for the producer can be found here, and the consumer part here. Let’s collaborate and learn😀

Recommended readings and references:

[PDF E-Book] Kafka: The Definitive Guide

[Medium] Routing Kafka messages to browser over web socket
