Data_is_Power
3 min read · Jun 21, 2019

Investigation of Crimes in NYC — gganimate version

Introduction:
The Altair version of the analysis can be found here & here. The objective of this GitHub repo was to explore and learn the functionality of gganimate in R. The datasets used in this analysis are available in the ‘Data’ folder of the repository. The file titled “adult” contains data about the total number of crimes in NYC between 1970–2017.
The file titled “nyjob1” contains data about the employment history of various counties in NYC between 1976–2018.

Analysis
From the Altair version of the analysis, it was noticed that the 4 boroughs with the highest crime were Bronx, Kings, Queens, and New York County.
Thus, it was decided to use only those 4 boroughs to explore and learn the gganimate functionality.
First, a base plot was drawn with Total Crime on the Y-axis and the 4 boroughs on the X-axis. The size of each dot represents the total crime. The animation below was obtained.

Total crimes in 4 boroughs of NY
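
A base plot of this kind could be sketched roughly as follows. This is only an illustration, not the code from the repository: the toy data frame and the column names `County`, `Year`, and `Total` are assumptions, and the random numbers stand in for the real “adult” data.

```r
library(ggplot2)
library(gganimate)

# Toy stand-in for the crime data; values and column names are hypothetical.
set.seed(1)
crime <- expand.grid(
  County = c("Bronx", "Kings", "Queens", "New York"),
  Year   = 1970:2017,
  stringsAsFactors = FALSE
)
crime$Total <- round(runif(nrow(crime), 20000, 90000))

# One dot per borough; dot size follows total crime, one frame per year.
p <- ggplot(crime, aes(x = County, y = Total, size = Total)) +
  geom_point(alpha = 0.7) +
  labs(title = "Year: {frame_time}", x = "Borough", y = "Total Crime") +
  transition_time(Year) +
  ease_aes("linear")

# Rendering is left commented out because it needs a renderer such as gifski:
# animate(p, nframes = 100, fps = 10)
```

`transition_time()` treats `Year` as a continuous time variable, so the dots glide between yearly values rather than jumping.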

Next, it was decided to draw a line chart with Year on the X-axis and Total Crime on the Y-axis, showing all 4 boroughs on the same plot with one line per borough. The animation below was obtained.

From the above animation, it is very hard to see which line represents which borough. So each borough’s line and dots were given a distinct color using the `color` aesthetic available in ggplot2.
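
A colored line chart that draws itself year by year might look like the sketch below. The data and column names are again hypothetical, and `transition_reveal()` is one plausible choice for this effect, not necessarily what the original code used.

```r
library(ggplot2)
library(gganimate)

# Toy stand-in for the crime data; values and column names are hypothetical.
set.seed(2)
crime <- expand.grid(
  County = c("Bronx", "Kings", "Queens", "New York"),
  Year   = 1970:2017,
  stringsAsFactors = FALSE
)
crime$Total <- round(runif(nrow(crime), 20000, 90000))

# Mapping County to color (and group) gives each borough its own line.
p_line <- ggplot(crime, aes(x = Year, y = Total, color = County, group = County)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Total Crime", color = "Borough") +
  transition_reveal(Year)

# animate(p_line)  # rendering needs a renderer such as gifski
```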

Next, the `facet_wrap` function was used to plot the total crimes of each of the 4 boroughs in a separate panel. The animation below displays the output.
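
In ggplot2 terms, this step amounts to adding one `facet_wrap()` layer to the line chart. A minimal sketch, with hypothetical toy data and column names:

```r
library(ggplot2)
library(gganimate)

# Toy stand-in for the crime data; values and column names are hypothetical.
set.seed(3)
crime <- expand.grid(
  County = c("Bronx", "Kings", "Queens", "New York"),
  Year   = 1970:2017,
  stringsAsFactors = FALSE
)
crime$Total <- round(runif(nrow(crime), 20000, 90000))

# facet_wrap(~ County) gives each borough its own panel.
p_facet <- ggplot(crime, aes(x = Year, y = Total, color = County, group = County)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ County) +
  labs(x = "Year", y = "Total Crime") +
  transition_reveal(Year)

# animate(p_facet)  # rendering needs a renderer such as gifski
```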

Since the dataset contained different types of crimes, such as Felony, Drug, DWI, and other Misdemeanor, it was decided to plot all crime types in faceted plots for all 4 boroughs and see if any visible trend could be found.

From the above animation, it is very hard to detect any trend. Thus, it was decided to draw a faceted plot of the 4 boroughs against all crime types.
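
The two faceting variants described above could be sketched like this. The long-format toy data, the `Type` and `Count` column names, and the choice of `facet_grid()` for the second variant are all assumptions made for illustration.

```r
library(ggplot2)
library(gganimate)

# Hypothetical long-format data: one row per borough, crime type, and year.
set.seed(4)
crime_long <- expand.grid(
  County = c("Bronx", "Kings", "Queens", "New York"),
  Type   = c("Felony", "Drug", "DWI", "Misdemeanor"),
  Year   = 1970:2017,
  stringsAsFactors = FALSE
)
crime_long$Count <- round(runif(nrow(crime_long), 500, 30000))

# Variant 1: one panel per borough, one colored line per crime type.
p_types <- ggplot(crime_long, aes(x = Year, y = Count, color = Type, group = Type)) +
  geom_line() +
  facet_wrap(~ County) +
  transition_reveal(Year)

# Variant 2: a grid with one panel per crime type and borough combination.
p_grid <- ggplot(crime_long, aes(x = Year, y = Count, group = 1)) +
  geom_line() +
  facet_grid(Type ~ County, scales = "free_y") +
  transition_reveal(Year)
```

Giving each crime type its own row with a free Y scale keeps rare categories from being flattened by the common ones.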

Finally, the employment data was merged with the total crimes dataset. For the animation below, the years 1990–2017 were subsetted from both datasets because the second dataset did not contain sufficient data for those 4 boroughs. Additionally, a log transformation was applied to both variables to bring them onto a comparable scale.
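
The data preparation for this step (merge, subset, log transform) might look roughly like the following in base R. Both toy data frames and the column names `Total` and `Unemployment` are hypothetical stand-ins for the “adult” and “nyjob1” files.

```r
# Hypothetical stand-ins for the two datasets.
set.seed(5)
counties <- c("Bronx", "Kings", "Queens", "New York")

crime <- expand.grid(County = counties, Year = 1970:2017, stringsAsFactors = FALSE)
crime$Total <- round(runif(nrow(crime), 20000, 90000))

jobs <- expand.grid(County = counties, Year = 1976:2018, stringsAsFactors = FALSE)
jobs$Unemployment <- runif(nrow(jobs), 3, 12)

# Inner join on borough and year, then keep 1990-2017 only.
merged <- merge(crime, jobs, by = c("County", "Year"))
merged <- subset(merged, Year >= 1990 & Year <= 2017)

# Natural-log transform of both variables (all values are positive here).
merged$log_total        <- log(merged$Total)
merged$log_unemployment <- log(merged$Unemployment)
```

The resulting `merged` data frame can then be passed to `ggplot()` and animated in the same way as the earlier plots.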

Results
Even though the unemployment rate was high in the 4 boroughs during the 1990s, the crime rate was much lower (a similar conclusion was reached in the Altair version). Thus, other factors such as poverty, education, and population demographics might have caused the increase in crimes in NYC during the 1990s.

What’s next?

Find a dataset that includes population demographics and try to correlate it with the crime dataset.

Code is available here

Feel free to provide comments/feedback.

Love. Share. Care. Peace
