Using Deep Learning to tackle Traffic Safety in Jakarta — a collaboration with University of Chicago and Jakarta Smart City


CCTV footage from Jakarta

Pulse Lab Jakarta together with Jakarta Smart City recently had the rare and wonderful opportunity to participate in the University of Chicago’s Center for Data Science and Public Policy annual Data Science for Social Good (DSSG) fellowship, a summer programme training aspiring data scientists to work with government and non-profit partners on innovative projects with social impact. Our project, which proposed analysing CCTV data in Jakarta for the purpose of improving traffic safety, was selected as one of the global challenges that the fellows took on for their three-month programme.

We were privileged to work with enthusiastic data scientists who were honing their expertise on real-world challenges, collaborating in teams and learning from mentors from industry and academia. The data scientists were current or recent graduate and undergraduate students from quantitative and computational fields, ranging from computer science and machine learning to statistics, mathematics, physical sciences and engineering, as well as social sciences, public health and public policy.

The Problem

Jakarta Smart City, which was set up by the Jakarta provincial government in 2015, works on technology-based services for residents. One of the main problems in Jakarta, which will come as no surprise to anyone who has visited, is the city’s notoriously congested roads, with the number of cars and motorcycles rising annually. This growth worsens congestion and adds to the burden on city infrastructure that was not designed to accommodate such numbers. Jakarta Smart City, in collaboration with Pulse Lab Jakarta, sought to improve traffic safety by harnessing data gleaned from raw closed-circuit television (CCTV) footage from cameras positioned at various intersections throughout the city. While the Jakarta city government maintains these cameras, the footage is too voluminous for manual monitoring.

Enter the DSSG Fellows

A collaboration with DSSG fellows was a fantastic opportunity to bring some degree of automation to the city’s CCTV network in order to encourage effective and efficient resource allocation. The objective was to tap into the smarts of the data scientists contributing their valuable time and experience, using the data made available by Jakarta Smart City to develop a model-based system that could inform traffic safety measures and the allocation of resources to them. In particular, we wanted to build a video-processing pipeline to extract structured information from raw traffic video footage.

The Process

The DSSG programme is well structured, with letters of exchange and project charters in place before the programme kicks off. All partners to the project understand their roles and what they are expected to provide and support. The project deliverable was defined as a video-processing pipeline that could automatically receive and process CCTV footage and then create structured output suitable for downstream applications (potentially integrating non-video data sources such as traffic data or weather data). Clear project milestones and timelines were worked out and agreed, and regular communication channels were put in place, including Slack and weekly check-in calls with all partners.

How it worked

Jakarta Smart City’s Head, Setiaji, provided domain expertise and in-depth knowledge of how planning applications and decisions are facilitated so that the DSSG team could understand the context in which decisions are being made. The project relied on deep learning to identify objects in images — a task that humans can do well but one that is labour-intensive and hard to scale, making computer vision a more efficient approach. This initiative was essentially about converting unstructured video data into structured traffic data that could then be used for identification purposes — object detection, classification and descriptions.
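To make "structured traffic data" a little more concrete, here is a minimal sketch of what one row of output for a single video frame might look like. The schema is purely an assumption for illustration: the names FrameRecord, camera_id, timestamp and counts, and the sample values, are hypothetical and are not taken from the project's actual deliverable.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class FrameRecord:
    """One row of structured output for a single video frame (hypothetical schema)."""
    camera_id: str                                # assumed camera identifier
    timestamp: str                                # ISO 8601 time of the frame
    counts: dict = field(default_factory=dict)    # class label -> number of objects


def to_json_line(record):
    """Serialise a record as one JSON line, a common format for downstream tools."""
    return json.dumps(asdict(record), sort_keys=True)


# Made-up sample values, for illustration only.
record = FrameRecord(
    camera_id="cam-001",
    timestamp="2018-07-01T08:00:00+07:00",
    counts={"car": 3, "motorbike": 7},
)
line = to_json_line(record)
print(line)
```

Writing one self-describing line per frame is only one possible design choice; it would make it straightforward to join the video-derived counts with other sources, such as weather data, on camera and time.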

The project involved four main tasks: object detection, object classification, motion detection and semantic segmentation. These tasks are a bit technical, but here is our go at explaining each. Object detection is the spotting of various objects in a given video frame, while object classification is aimed at accurately categorising the objects identified. For these two tasks, the YOLOv3 model was used, whereby a rectangular box is placed around each object in the video frame and a list of possible categories for the object is given, for instance car, motorbike or truck.

Objects are detected and classified from raw CCTV footage
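As a rough illustration of how boxes and category lists could be turned into counts, the sketch below takes a handful of detections for one frame, keeps the most likely category for each box, and tallies the results. It does not run a detector and is not the team's code: the Detection class, the helper names, the 0.5 confidence threshold and the sample numbers are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """A detector's output for one object (hypothetical structure)."""
    box: tuple      # (x_min, y_min, x_max, y_max) in pixels
    scores: dict    # candidate class label -> confidence between 0 and 1


def top_label(detection, min_confidence=0.5):
    """Return the most likely label, or None if the detector is too unsure."""
    label, score = max(detection.scores.items(), key=lambda item: item[1])
    return label if score >= min_confidence else None


def count_by_class(detections, min_confidence=0.5):
    """Tally the confident detections in one frame by class label."""
    labels = (top_label(d, min_confidence) for d in detections)
    return Counter(label for label in labels if label is not None)


# Made-up detections for a single frame.
frame_detections = [
    Detection((10, 20, 110, 90), {"car": 0.91, "truck": 0.06}),
    Detection((200, 40, 240, 100), {"motorbike": 0.78, "car": 0.10}),
    Detection((300, 60, 330, 110), {"motorbike": 0.64}),
    Detection((50, 50, 70, 80), {"car": 0.30, "truck": 0.28}),  # too uncertain to count
]
counts = count_by_class(frame_detections)
print(dict(counts))
```

The threshold is a trade-off: a higher value drops more doubtful boxes but risks undercounting, so in practice it would need to be tuned against annotated footage.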

The motion detection task relied on the Lucas-Kanade sparse optical flow method to calculate optical flow (which in simple terms is the apparent motion of objects between two consecutive video frames, caused by the movement of the objects themselves or of the camera). Lastly, semantic segmentation was carried out with a combination of the WideResNet38 and DeepLabv3 methods, which helped to separate surfaces such as roads and sidewalks within the video frame. Together, these four tasks made possible a pipeline that converts raw, unstructured video frames into data that is ready for analysis.
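For readers curious about the idea behind Lucas-Kanade, here is a minimal sketch in plain Python of its core step: estimating a single motion vector for one small window of pixels by least squares. It is a teaching example under simplifying assumptions, not the pipeline's implementation. The frames are synthetic, the function names are hypothetical, and a real sparse tracker would also select feature points and typically work across several image scales.

```python
def lucas_kanade_window(prev, curr, cx, cy, half):
    """Estimate one (u, v) flow vector for the window centred on (cx, cy).

    prev and curr are greyscale frames stored as lists of rows (frame[y][x]).
    Solves the least-squares system built from Ix*u + Iy*v = -It over the window.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0   # horizontal gradient
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0   # vertical gradient
            it = curr[y][x] - prev[y][x]                   # change between frames
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None   # not enough texture in the window to pin down the motion
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def make_frame(size, dx=0.0, dy=0.0):
    """Synthetic smooth brightness pattern, shifted by (dx, dy) pixels."""
    return [[(x - dx) ** 2 + (y - dy) ** 2 for x in range(size)]
            for y in range(size)]


prev_frame = make_frame(12)
curr_frame = make_frame(12, dx=0.5, dy=0.25)   # pattern moved right and down
u, v = lucas_kanade_window(prev_frame, curr_frame, cx=5, cy=5, half=4)
print(round(u, 2), round(v, 2))   # close to the true shift of (0.5, 0.25)
```

The estimate is only approximate because the method assumes small movements between frames, which is one reason it is usually applied to consecutive frames rather than frames far apart in time.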


Due to time constraints, the team had not yet trained or tested other object detection and classification models. This resulted in certain limitations, such as the inability of the current model to correctly classify bajaj motorcycles (a common mode of transportation in the city), which were therefore omitted from the final count. To offset such shortcomings, the team endeavoured to include as many tools as possible to aid future data collection, validation and model training. The deliverable included detailed instructions on how to use the Computer Vision Annotation Tool (CVAT), along with much of the code required to fine-tune and run a model once it has been trained.

Where do we go from here?

We are grateful to the fantastic team at DSSG for their support and enthusiasm for this project. Working across continents, many time zones and in different languages complicated the task a little, but with a bit of perseverance and patience we generated the first iteration of the video analysis pipeline.

The 2018 DSSG programme consisted of 24 aspiring data scientists in Chicago and 15 further data scientists from across the world convened in Lisbon. We hope to have further discussions on how we can bring the DSSG programme to Jakarta in the years to come!

Our sincere gratitude goes to our partners in Jakarta Smart City, Setiaji and his awesome team, as well as Rayid Ghani, Katy Dupre and Joe Walsh at the University of Chicago for their technical guidance and oversight, and for keeping us headed in the right direction. Above all, thank you to the data science fellows who worked on this traffic safety project, including Alex Fout, João Caldeira, Aniket Kesari and Raesetje Sefala. Looking forward to the next collaboration!

We’ve also made the full technical report available below in case you’d like to read more about the project:

Pulse Lab Jakarta is grateful for the generous support from the Government of Australia.



UN Global Pulse Asia Pacific

UN Global Pulse Asia Pacific is a regional hub that aims to drive data innovation and sustainable development to ensure that no one is left behind.