Crime Analysis in Rio de Janeiro

João Gustavo
Published in Analytics Vidhya
Mar 19, 2021

Rio de Janeiro is known as the “Cidade Maravilhosa” (Marvelous City) and is one of the biggest tourist centers in Brazil.

Despite its wonders, the state of Rio de Janeiro has high rates of violence and drug trafficking, problems the police have struggled to overcome. According to the summary “Violência no Rio de Janeiro: Desafio do Estado é vencer o tráfico” (Violence in Rio de Janeiro: the State’s challenge is to defeat trafficking), trafficking alone does not explain the high crime rate.

Practically every major metropolis in the world has an illegal drug trade. The cocaine consumer market in New York, for example, is estimated to be twice as big as Rio’s.

However, Rio is one of the few cities in the world where scenes of war play out on the streets almost daily. The reason is that armed factions fight for control of territory, aided by a network of corruption and by the public authorities’ historical neglect of the favelas in Rio’s hills.

Rio de Janeiro

Rio de Janeiro is a large Brazilian city by the sea, famous for the beaches of Copacabana and Ipanema, the 38 m-high statue of “Christ the Redeemer” on top of Corcovado, and ‘Pão de Açúcar’ (Sugarloaf Mountain), a granite peak with cable cars to its summit.

The city is also known for its exciting Carnaval, considered the largest in the world, with floats, extravagant costumes, and samba dancers. As striking as the Christ or ‘Pão de Açúcar’ in the carioca landscape, favelas are spread throughout Rio’s hills. According to data from the 2010 Census (Censo 2010), there are 160 urbanized districts and 763 favelas, home to more than 1.3 million people, almost a quarter of the city’s population.

Getting the data

The data used were obtained from the ISP Dados.RJ (Public Security Institute of Rio de Janeiro) website. The dataset analyzed is the CSV file DOMensalEstadoSince1991.csv (Security statistics: monthly historical series for the state since 01/1991).
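A minimal sketch of loading the file with pandas, assuming it is comma-separated. The helper name `load_isp_monthly` is hypothetical, and the separator and encoding of the real download are assumptions; the example reads a tiny in-memory sample with placeholder values instead of the actual file.

```python
import io

import pandas as pd


def load_isp_monthly(source, sep=",", encoding=None):
    """Read the ISP monthly series into a DataFrame (hypothetical helper).

    `source` may be a file path or a file-like object. The separator and
    encoding of the real DOMensalEstadoSince1991.csv are assumptions here;
    pass sep=";" or encoding="latin-1" if the file needs them.
    """
    return pd.read_csv(source, sep=sep, encoding=encoding)


# Tiny in-memory stand-in for the real file (placeholder values, not ISP data)
sample = io.StringIO("ano,mes,hom_doloso\n1991,1,100\n1991,2,110\n")
df = load_isp_monthly(sample)
print(df)
```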

Initial Data Analysis

Knowing the Data

This section describes how the data is structured, so that the reader can follow the analysis with confidence and understand what each figure means.

A great start is to check the shape of our DataFrame, so we know how many entries (rows) and variables (columns) we will have to work with.

The DataFrame has 361 entries and 56 variables. Since we are working with numerical data, the variables are of type float or int.
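The check above can be sketched with `df.shape` and `df.dtypes`. The small DataFrame below is a synthetic stand-in for the real file; it also shows why some columns end up as float: pandas stores an integer column that contains missing values (`NaN`) as `float64`.

```python
import pandas as pd

# Synthetic stand-in; in the article, df is read from DOMensalEstadoSince1991.csv
df = pd.DataFrame({
    "ano": [1991, 1991, 1991],
    "mes": [1, 2, 3],
    "hom_doloso": [100, 110, 105],
    "apf": [None, 20.0, 25.0],  # a column with a gap is stored as float64
})

n_entries, n_variables = df.shape  # (rows, columns)
print(f"Entries: {n_entries}, Variables: {n_variables}")
print(df.dtypes)  # integer dtype for complete columns, float64 where NaN appears
```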

Variables Dictionary

Let’s take a look at the variable names that will appear throughout the article and put together a dictionary in order to clear up any possible doubts.

  • ano: Year of the occurrence report
  • mes: Month of the occurrence report
  • hom_doloso: Intentional homicide records
  • lesao_corp_morte: Records of bodily injury followed by death
  • latrocinio: Records of robbery followed by death
  • cvli: Intentional Lethal Violent Crimes (intentional homicide, bodily injury followed by death, and robbery followed by death)
  • hom_por_interv_policial: Records of homicide by police intervention
  • letalidade_violenta: includes cases of intentional homicide, robbery followed by death, bodily injury resulting in death, and death due to intervention by a state agent
  • tentat_hom: Homicide attempts
  • lesao_corp_dolosa: Intentional bodily injury
  • estupro: Rape cases
  • roubo_veiculo: Vehicle robberies
  • roubo_em_coletivo: Robberies on buses (public transport)
  • furto_veiculos: Vehicle thefts
  • recuperacao_veiculos: Recovered vehicles (previously robbed or stolen)
  • apf: Records of arrests made in the act (auto de prisão em flagrante)
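The same dictionary can also live in code, for example to put English labels on tables and plot titles. A sketch, assuming a hypothetical `VARIABLE_LABELS` mapping that covers only a few of the columns listed above; columns without a label keep their original name.

```python
import pandas as pd

# Hypothetical English labels for some ISP column names
VARIABLE_LABELS = {
    "ano": "Year",
    "mes": "Month",
    "hom_doloso": "Intentional homicide",
    "roubo_veiculo": "Vehicle robbery",
    "furto_veiculos": "Vehicle theft",
    "recuperacao_veiculos": "Recovered vehicles",
    "roubo_em_coletivo": "Robbery on buses",
}


def with_english_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a renamed copy; columns missing from the mapping are left as-is."""
    return df.rename(columns=VARIABLE_LABELS)


# Placeholder frame with zero values, used only to show the renaming
df = pd.DataFrame({"ano": [2011], "mes": [1], "hom_doloso": [0], "apf": [0]})
print(with_english_labels(df).columns.tolist())
```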

The dictionary of all variables can be found in the analysis notebook linked at the end of the article.

First Entries

Next, we will finally look at the first 5 entries of our DataFrame.

First Entries

After checking the first entries, we can see that some values are missing. Missing values can distort the visualization and interpretation of the data, since the results would not reflect reality.

Missing Values

Looking for the missing values? Yeah, me too!

We can see that roubo_bicicleta and apf have the highest percentages of missing values, about 76% and 50%, respectively.
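One common way to compute these percentages is to count the nulls in each column and divide by the number of rows. The helper name `missing_percentage` is hypothetical, and the DataFrame is a synthetic example rather than the ISP data.

```python
import numpy as np
import pandas as pd


def missing_percentage(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, highest first."""
    return (df.isnull().sum() / df.shape[0] * 100).sort_values(ascending=False)


# Synthetic example: 4 months of data with gaps in two columns
df = pd.DataFrame({
    "hom_doloso": [1, 2, 3, 4],
    "apf": [np.nan, np.nan, 5, 6],
    "roubo_bicicleta": [np.nan, np.nan, np.nan, 7],
})
print(missing_percentage(df))
```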

We don’t know why this data is missing. It could be due to recording errors, data lost over the years, crimes that were not counted at the time, or other reasons.

Copying the DataFrame (2011–2021)

Because of the missing values, and to keep the analysis as close to reality as possible, we will analyze only the last 10 years, making some comparisons between this period and previous years.

To do this, we will create a copy of the DataFrame containing only the data for 2011–2021. For 2021, only January is available, since the year had not ended at the time of publication.
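A minimal sketch of that filtering step, assuming the year is stored in the `ano` column as in the dictionary above. The helper `recent_period` and the name `df_recent` are hypothetical. Calling `.copy()` makes the result an independent DataFrame, so later changes to it do not trigger pandas’ `SettingWithCopyWarning` or touch the original.

```python
import pandas as pd


def recent_period(df: pd.DataFrame, start_year: int = 2011) -> pd.DataFrame:
    """Independent copy of the rows from `start_year` onward."""
    return df.loc[df["ano"] >= start_year].copy()


# Synthetic rows spanning the cut-off year
df = pd.DataFrame({"ano": [2009, 2010, 2011, 2020, 2021], "mes": [12, 12, 1, 12, 1]})
df_recent = recent_period(df)
print(df_recent)
```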

Checking for Missing Remaining Data

We will then check again what percentage of values are missing in each of the columns.

It can be seen that the amount of missing data has decreased dramatically, with only two variables having missing values.

  • roubo_bicicleta and furto_bicicleta: still have about 30% of their values missing.

Let’s check how our DataFrame looks after selecting only the more recent data.

First Entries 2011–2021

With these values, we can make a relevant analysis that relates to what happens today, and we can also build clearer, more didactic visualizations.

Statistical Information on Violence in Rio de Janeiro

Let’s check the Stats!!!

To get a general summary of the statistical information in our dataset, we will use df.describe(), which returns, for each variable, the number of non-missing entries, the mean, the standard deviation, the minimum and maximum values, and the 25th, 50th, and 75th percentiles.

Disregard the values of the ano and mes columns, since statistics such as the mean of a year or month number have no meaning.
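A short sketch of that summary, dropping `ano` and `mes` before calling `describe()` so they do not clutter the output. The values are synthetic and `df_recent` is a hypothetical name for the 2011–2021 copy.

```python
import pandas as pd

# Synthetic stand-in for the 2011-2021 copy of the data
df_recent = pd.DataFrame({
    "ano": [2020, 2020, 2020],
    "mes": [1, 2, 3],
    "hom_doloso": [300, 320, 340],
})

# count, mean, std, min, 25%, 50%, 75%, max for each remaining numeric column
stats = df_recent.drop(columns=["ano", "mes"]).describe()
print(stats)
```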

Averages

It is possible to check the average of some variables, so we can better understand the situation that Rio de Janeiro finds itself in. Let’s take a look:

Vehicle Robbery

The statistics show that about 2,880 vehicles are robbed every month! That is an average of 96 vehicles per day (assuming 30-day months), or about 35,000 per year.
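The conversion from a monthly average to daily and yearly figures is simple arithmetic, sketched below under the assumption of 30-day months. The function name is hypothetical; 2,880 × 12 = 34,560, which the text rounds to about 35,000.

```python
def monthly_to_daily_and_yearly(monthly_mean: float, days_per_month: int = 30):
    """Approximate daily and yearly figures from a monthly average.

    Assumes every month has `days_per_month` days (30 by default).
    """
    per_day = monthly_mean / days_per_month
    per_year = monthly_mean * 12
    return per_day, per_year


per_day, per_year = monthly_to_daily_and_yearly(2880)
print(per_day, per_year)  # 96.0 34560
```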

Vehicle Theft

Another figure we obtained was the number of vehicle thefts, about 1,340 vehicles per month. Although many people are not aware of it, theft and robbery are different crimes: theft involves no violence, while robbery does.

Vehicle Recovery

However, on average only 2,072 vehicles are recovered per month. This is about 50% of the 4,220 vehicles robbed or stolen each month (2,880 + 1,340).
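The recovery ratio can be checked with a one-line calculation; the function name is hypothetical and the inputs are the monthly averages quoted in the text. One caveat on the design: a vehicle recovered in a given month may have been taken in an earlier month, so dividing monthly averages gives an approximate rate, not the share of each month’s vehicles that are eventually found.

```python
def recovery_rate(recovered: float, robbed: float, stolen: float) -> float:
    """Monthly recoveries as a fraction of vehicles robbed plus stolen."""
    return recovered / (robbed + stolen)


rate = recovery_rate(recovered=2072, robbed=2880, stolen=1340)
print(f"{rate:.1%}")  # 49.1%
```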

Maximum and Minimum Values

We can also find the month and year with the most and the fewest cases of a given crime. Here we will look at intentional homicides (hom_doloso).
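One way to find those months is with pandas’ `idxmax()` and `idxmin()`, which return the row label of the largest and smallest value (the first one in case of ties, skipping `NaN`). The helper `extreme_months` is hypothetical and the data below is synthetic.

```python
import pandas as pd


def extreme_months(df: pd.DataFrame, column: str):
    """Return (year, month, value) for the largest and smallest entries of `column`."""
    i_max, i_min = df[column].idxmax(), df[column].idxmin()
    top = (df.at[i_max, "ano"], df.at[i_max, "mes"], df.at[i_max, column])
    low = (df.at[i_min, "ano"], df.at[i_min, "mes"], df.at[i_min, column])
    return top, low


# Synthetic data, not ISP figures
df = pd.DataFrame({"ano": [2014, 2014, 2020], "mes": [3, 4, 9], "hom_doloso": [10, 5, 1]})
top, low = extreme_months(df, "hom_doloso")
print("max:", top, "min:", low)
```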

Maximum Value 2011–2021

In March 2014, 510 cases were recorded, the highest number in the 2011–2021 period. This is 135 more cases than the monthly average of 375.

Maximum Value 1991–2021

For comparison, the largest number of cases in our entire dataset was recorded in January 1995, with 831 cases.

Minimum Value

In September 2020, 239 cases were recorded, the lowest number of intentional homicides in the data. This low number is likely related to the high number of SARS-CoV-2 cases during this period.

Data Visualization

After processing and analyzing the data, we have a better understanding of the situation in Rio de Janeiro. We can now plot graphs that present the data in a different way.

Histogram of “hom_doloso”

The histogram shows that there are extremes in the number of crimes: some months have more than 500 cases and others fewer than 250, but most months have between 300 and 400.
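A histogram like the one above can be sketched with matplotlib. The function name, bin count, and sample values are assumptions, not the author’s notebook code; the `Agg` backend renders off-screen and can be dropped in a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_histogram(series: pd.Series, bins: int = 20):
    """Histogram of monthly case counts for one crime variable."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.hist(series.dropna(), bins=bins, edgecolor="black")
    ax.set_xlabel("Cases per month")
    ax.set_ylabel("Number of months")
    ax.set_title(f"Histogram of {series.name}")
    return fig, ax


# Synthetic values; in the article this would be the hom_doloso column for 2011-2021
s = pd.Series([250, 300, 320, 350, 380, 400, 510], name="hom_doloso")
fig, ax = plot_histogram(s, bins=5)
```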

Line graph for the variable “roubo_em_coletivo”

The line graph shows that robberies on buses have been increasing since 2012, when there were close to 345 cases per month. The rise became much steeper after 2015, and in August 2017 the number exceeded 1,600 cases, an increase of about 372%.
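To draw a line graph over time, it helps to turn the `ano` and `mes` columns into real dates. The sketch below assumes those two columns and uses `pd.to_datetime`, which can assemble dates from a DataFrame with `year`, `month`, and `day` columns; the helper name and the values are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; remove this line in a notebook
import pandas as pd


def monthly_series(df: pd.DataFrame, column: str) -> pd.Series:
    """Index `column` by the first day of each month, built from 'ano' and 'mes'."""
    dates = pd.to_datetime(
        df[["ano", "mes"]].rename(columns={"ano": "year", "mes": "month"}).assign(day=1)
    )
    return pd.Series(df[column].to_numpy(), index=dates, name=column)


# Synthetic values, not ISP data
df = pd.DataFrame({"ano": [2012, 2012, 2013], "mes": [11, 12, 1],
                   "roubo_em_coletivo": [1, 2, 3]})
s = monthly_series(df, "roubo_em_coletivo")
ax = s.plot(figsize=(10, 4), title="roubo_em_coletivo per month")
ax.set_ylabel("Cases")
```

With a `DatetimeIndex`, pandas labels the x-axis with dates, which makes trends such as the 2015 to 2017 rise easier to read than a plain row index.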

Impressed by this increase in occurrences?

Conclusions

The analysis shows that some indicators changed over these 10 years, but unfortunately others are still alarming, with numbers that remain very high. In other cases, crime fell but rose again after a short period and is still worrying.

It is up to the Government and the Police Force of Rio de Janeiro to make the city safer and the home of the Christ the Redeemer statue increasingly attractive, even to people who know the city as a violent place. Increased patrols, more police officers on the streets, and task forces against armed factions may be part of the solution. After all, the population needs to feel safe and the country needs the “Cidade Maravilhosa” to be truly wonderful.

The state as a whole, which has faced difficulties in recent years, would benefit. Solving the security problems could set off a domino effect, improving not only public safety but other sectors as well.

If you want access to the project, click here! Follow me on LinkedIn and keep an eye on my GitHub, where you will find more projects in the future.
