Developing A Data Platform for Cities: An Introduction


In November 2016, Open Data Durban (ODD) embarked on its largest project to date: the first phase of development of a data exploration and management platform targeted at city officials. This project, the South African Cities Open Data Almanac (SCODA), continued a proof-of-principle web app built for the South African Cities Network (SACN) and involved collaboration with the Japan International Cooperation Agency (JICA) and an independent demographer. As with all ODD projects, the fundamental goal was to have a positive impact on how cities use data to drive decision-making and to gain insight into which strategies will best aid development, inclusivity and service delivery at a municipal level. Four major challenges currently face data-centric activities in cities:

  1. Datasets are often disparate and/or siloed, which reduces a user’s ability to combine multiple sources in an analysis
  2. While city officials are often highly capable problem solvers, many lack coding expertise and struggle to manipulate unwieldy datasets (e.g. the 2011 Census) efficiently into the formats needed for informative visualisation and downstream analysis
  3. As a consequence of (2), the modelling approaches currently used to analyse city data, e.g. internal population migration and dynamics, are limited to linear methods and Excel-compatible file sizes, i.e. below a gigabyte
  4. Training in responsible data-use strategies and statistical best practices has seldom been provided to officials, or to most people for that matter, at secondary and tertiary schooling levels

The platform we envisaged would attempt to alleviate these challenges by means of the following built-in functionalities:

Multiple datasets would be made accessible within a single interface. These would include:

A generalised visualisation framework that generates an interactive dashboard for any dataset passed to it and that facilitates the exporting of graphs and data tables in a variety of popular file formats

A modelling framework by which researchers can contribute and collaborate on novel, open-source approaches for city-level analysis. As a prototype for this infrastructure, we sought to develop a novel “open-system” demographics model that leverages Big Data information sources to produce a continuously improving analysis framework for studying migration within cities

A user-centered, easily accessible data processing experience with built-in data fidelity checks, so that all information combined for an analysis has passed a sanity check
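To make the idea of a built-in sanity check a little more concrete, here is a minimal Python sketch. It is not taken from the SCODA codebase: the function name `sanity_check`, the field names and the rules themselves are all hypothetical, chosen only to illustrate how records might be validated before they are combined for an analysis.

```python
from typing import Dict, Iterable, List, Set


def sanity_check(rows: Iterable[Dict], required: Set[str], numeric: Set[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the data passed.

    Hypothetical rules, for illustration only:
      * every row must contain all `required` fields
      * every `numeric` field must hold a non-negative number
    """
    problems = []
    for i, row in enumerate(rows):
        # Fields that should be present but are not.
        missing = required.difference(row)
        if missing:
            problems.append(f"row {i}: missing fields {sorted(missing)}")
            continue
        for field in sorted(numeric):
            value = row.get(field)
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"row {i}: {field} is not numeric")
            elif value < 0:
                problems.append(f"row {i}: {field} is negative")
    return problems


# Hypothetical ward-level records; the second and third should be flagged.
rows = [
    {"ward": "A", "population": 100},
    {"ward": "B", "population": -5},
    {"ward": "C"},
]
for problem in sanity_check(rows, required={"ward", "population"}, numeric={"population"}):
    print(problem)
```

Returning a list of problems rather than raising on the first failure is one possible design choice: it lets an interface show a user everything that needs fixing in a single pass.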

Over the coming weeks I will discuss each of these key functionalities in a series of technically focused posts covering how we built them into the platform, the cool things we discovered, and the lessons learned from the challenges we faced. These articles will often delve into the code used, but I will attempt to ensure that the narrative allows anyone to understand the significance of each component. If you’d like to take a gander at the code and follow along, it can be found on our GitHub.

-Matt

Matt Adendorff is the lead technologist for Open Data Durban and is a big fan of free t-shirts.


Originally published at Open Data Durban.