[WEEK 1–2] Data Gathering

muhammet özgür
Published in bbm406f17
Dec 4, 2017

Abstract

In the past decade, machine learning has helped solve countless everyday problems in society. Crime is one of those problems, and we can try to tackle it with machine learning. Every city has measurable properties, such as its population distribution by gender, race and income, and a police force that we can describe by its budget, the number of police officers per 100k citizens, and so on. Using data that is already being collected, we can predict a city's law-enforcement needs or how those needs will change. Let's start with an example. Imagine a city council trying to find the best way to spend its budget to reduce crime rates. Every council member argues that their way of spending is the best. One of them supports his or her argument with our machine learning tool, which predicts that with the proposed budget, crime rates will be lower than in the current year. A year later, the prediction might turn out to be totally wrong or only slightly off, but that does not really matter: we now have a new year's data to train our tool with. Over time, its estimates should only get better, and it might give the city council a tie-breaker in their arguments! Of course, this is only one example usage of our machine learning tool. We can also use it to estimate next year's crimes from properties we have little control over, and so estimate a city's future!
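To make the yearly predict-then-retrain loop from the example concrete, here is a minimal sketch in Python. It assumes a single hypothetical feature (police budget per capita) and a straight-line least-squares fit; the numbers are invented for illustration, this is not necessarily the model we use, and a real model would combine many city features at once.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x for one feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def predict(model, x):
    """Evaluate the fitted line at feature value x."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical history: (police budget per capita, crimes per 100k citizens), one pair per year.
history = [(100.0, 520.0), (110.0, 500.0), (120.0, 480.0)]

# Fit on the years we have, then score a council member's (hypothetical) proposed budget.
model = fit_line([b for b, _ in history], [c for _, c in history])
forecast = predict(model, 130.0)
print(f"forecast for next year: {forecast:.1f} crimes per 100k")

# A year later the real figure arrives: append it and refit, whether the forecast was right or not.
history.append((130.0, 470.0))
model = fit_line([b for b, _ in history], [c for _, c in history])
print(f"updated slope: {model[1]:.2f}")
```

A fitted trend like this is correlational, so it can support an argument in the council but cannot by itself prove that the budget causes the change in crime.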

Introduction

The abstract above is written idealistically, but it might shed some light on our thinking process. Let's start by defining our problem. Basically, any classification problem relates our features, let's call them x, to our labels, let's call them y. Given a city's features ‘x’, we want to find its crimes ‘y’ and their estimated numbers.

Finding y was the easy step: we obtained it from the FBI's yearly crime rates, city by city. Obtaining data for x, however, was a nightmare. We underestimated it big time, which we should have guessed, because related work on this subject was scarce and not specific to what we wanted. As a result, we had to piece things together once our data gathering began. Initially (and to be honest, this did not make much sense) we tried to collect x from news articles; there were a couple of related papers on that approach, but they did not really fit what we envisioned. Subsequently, we started to look for city data on census.gov or on each city's own website. This was exhausting to gather. We found out immediately that there is no shortcut in any ‘x’ data gathering mission. Almost every feature we needed was on a different server, and even organizing it on a yearly basis was tedious. We also started to feel that this was not an ideal project to continue, because of other problems. To name a few: the classification challenge itself, with very few features to guess from correctly, and having few related works to seek help from. Of course, we did find features for our classification; the links are given below.
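Since almost every feature lived on a different server, the first practical step is joining them into one table keyed by city and year, next to the FBI crime labels. The sketch below assumes each source has already been parsed into a Python dict keyed by (city, year); the city names, feature names and values are all hypothetical. Rows missing any feature are dropped rather than guessed.

```python
# Hypothetical per-source tables, each already parsed into {(city, year): value}.
features = {
    "population": {
        ("Springfield", 2015): 120_000,
        ("Springfield", 2016): 121_500,
        ("Shelbyville", 2015): 80_000,
    },
    "officers_per_100k": {
        ("Springfield", 2015): 210.0,
        ("Springfield", 2016): 205.0,
        ("Shelbyville", 2015): 180.0,
    },
    # This source has no Shelbyville row, so Shelbyville 2015 will be dropped.
    "median_income": {
        ("Springfield", 2015): 52_000,
        ("Springfield", 2016): 53_000,
    },
}

# Hypothetical labels y: yearly crime counts per city (the role the FBI data plays).
crimes = {
    ("Springfield", 2015): 4_300,
    ("Springfield", 2016): 4_150,
    ("Shelbyville", 2015): 2_100,
}


def build_dataset(feature_sources, labels):
    """Join features on (city, year), keeping only keys present in every source and in the labels."""
    names = sorted(feature_sources)
    keys = set(labels)
    for name in names:
        keys &= set(feature_sources[name])
    rows = []
    for key in sorted(keys):
        x = [feature_sources[name][key] for name in names]
        rows.append((key, x, labels[key]))
    return names, rows


names, rows = build_dataset(features, crimes)
for (city, year), x, y in rows:
    print(city, year, dict(zip(names, x)), "crimes:", y)
```

Dropping incomplete rows keeps the sketch simple; with data this scarce, filling the gaps from another source may be worth the extra effort.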

Data Resource

These are the most useful resources we found. There are tons of other resources that were of very little help; we did not include them so as not to bog down the blog.
