Signal 2: Importance of Open Data to facilitate Machine Learning in Public Organizations

Karan Saini
Civic Analytics 2018
Sep 18, 2018
http://maximumgovernance.com/perspectives/strengthening-democracy-through-open-government-data-platform/

One of the most important components of machine learning is the availability of good-quality data. Machine learning engineers often gain more accuracy from feature engineering (extracting new information from the existing data) than from steps like parameter tuning. Advancements in Deep Learning have also made it possible to build multi-layer deep neural networks, which work very well with large data-sets. Together, these factors have created a demand for large volumes of high-quality data.

The example I explored in my previous article led me to the Open Data page for the Charlotte-Mecklenburg Police Department (CMPD). This page offers open data on CMPD-related shooting incidents, with detailed descriptions of each data-set and of the attributes it contains. Another helpful feature is the description of primary identifiers, which helps readers understand which attributes are needed to merge two or more data-sets. I believe this is a good example of an Open Data project done well: it makes data accessible in a usable and comprehensible format.

The key data-sets I identified for my machine learning model, which aims to help prevent adverse police events, are “CMPD Officer-Involved Shootings — Individuals”, “CMPD Officer-Involved Shootings — Incidents” and “CMPD Officer-Involved Shootings — Officers”. The field “INCIDENT_ID” serves as the primary identifier for merging all three data-sets.
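A minimal sketch of how such a merge could look in pandas. Only the field name INCIDENT_ID comes from the data-sets described above; every other column name and every value below is made up for illustration, and the real files would be loaded from the portal rather than typed in by hand.

```python
import pandas as pd

# Hypothetical stand-ins for the three data-sets.
# Only INCIDENT_ID is a real field name; the rest are assumptions.
incidents = pd.DataFrame(
    {"INCIDENT_ID": [1, 2], "YEAR": [2016, 2017]}
)
officers = pd.DataFrame(
    {"INCIDENT_ID": [1, 1, 2], "OFFICER_EXPERIENCE_YEARS": [3, 10, 5]}
)
individuals = pd.DataFrame(
    {"INCIDENT_ID": [1, 2], "INDIVIDUAL_AGE": [30, 41]}
)


def merge_on_incident(incidents, officers, individuals):
    """Join officers and individuals onto incidents via INCIDENT_ID."""
    merged = incidents.merge(officers, on="INCIDENT_ID", how="left")
    merged = merged.merge(individuals, on="INCIDENT_ID", how="left")
    return merged


merged = merge_on_incident(incidents, officers, individuals)
print(merged)
```

Because one incident can involve several officers or individuals, the merged table has one row per officer-individual pair within an incident, not one row per incident. That is worth keeping in mind before counting rows or training a model on the result.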

Next steps include merging the data-sets and feature engineering for the machine learning model.
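To give a rough idea of what the feature engineering step could involve, here is a small sketch that derives per-incident features from officer-level rows. The columns OFFICER_ID and EXPERIENCE_YEARS, the derived feature names, and all values are hypothetical; only INCIDENT_ID is taken from the data-sets described above.

```python
import pandas as pd

# Hypothetical officer-level rows; only INCIDENT_ID is a real field name.
officers = pd.DataFrame(
    {
        "INCIDENT_ID": [1, 1, 2],
        "OFFICER_ID": ["A", "B", "C"],
        "EXPERIENCE_YEARS": [3, 10, 5],
    }
)

# Derive new information from existing columns: how many officers were
# involved in each incident, and their average years of experience.
features = (
    officers.groupby("INCIDENT_ID")
    .agg(
        officer_count=("OFFICER_ID", "size"),
        mean_experience=("EXPERIENCE_YEARS", "mean"),
    )
    .reset_index()
)
print(features)
```

Aggregating to one row per incident like this is one possible choice; which features are actually useful would depend on the real attributes in the data-sets.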

Source: https://dssg.uchicago.edu/wp-content/uploads/2016/04/identifying-police-officers-3.pdf

http://clt-charlotte.opendata.arcgis.com/datasets?q=safety
