Analyzing Pelotas citizens' requests to the city hall

Vítor Resing Plentz
3 min read · Feb 19, 2018

--

The contributors to this article are Lorenzo F. Antunes, Nelson Dutra, and me.

In this project we chose to investigate citizens' requests to the city hall.

The dataset is available at http://www.pelotas.com.br/portal-de-dados/ and the git repository is here.

We chose two approaches: one is a text analysis of the requests, and the other is an attempt to find insights from all of the data.

The first approach (Portuguese text analysis):

In the dataset there is a column that describes people's requests as free text; this text follows no specific structure, as shown below.

Sample of people's requests.

After reading some of the requests, we wanted to get the whole context from the words and try to find patterns using only the text description. To achieve this we followed these steps:

Step 1: Make text easier to analyze.

  • Make all words lowercase;
  • Transform the texts into a list of words, so it’s possible to do a frequency analysis;
  • Remove Portuguese stop-words (stop-words are usually connective words such as “e”, “ou”, “mas”, …).
This is how we did the steps described above.
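The preprocessing above can be sketched roughly as follows. This is a minimal illustration, not the project's exact code: the project likely used NLTK's Portuguese stop-word list, which is replaced here by a tiny hand-made set so the snippet is self-contained.

```python
import re

# Hypothetical, tiny stand-in for a real Portuguese stop-word list
# (e.g. NLTK's stopwords.words("portuguese")).
PT_STOPWORDS = {"e", "ou", "mas", "de", "a", "o", "na", "no", "uma", "um"}

def preprocess(texts):
    """Lowercase the texts, split them into words, and drop stop-words."""
    words = []
    for text in texts:
        for word in re.findall(r"\w+", text.lower(), re.UNICODE):
            if word not in PT_STOPWORDS:
                words.append(word)
    return words

# Made-up example requests, just to show the effect of each step.
sample = ["Solicito a poda de uma árvore", "Problema de iluminação no bairro Fragata"]
print(preprocess(sample))
# ['solicito', 'poda', 'árvore', 'problema', 'iluminação', 'bairro', 'fragata']
```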

Step 2: Apply a frequency distribution to our list of words and analyze the 100 most used words.

WordCloud of the most used words

The list of the 100 most used words can be found in the notebook; we decided not to post it here to keep this article readable.
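A frequency distribution like the one in Step 2 can be sketched with the standard library. The project may have used `nltk.FreqDist`, which behaves like `collections.Counter` for this purpose; the word list below is made up.

```python
from collections import Counter

# Hypothetical word list, standing in for the output of Step 1.
words = ["poda", "árvore", "poda", "iluminação", "poda", "iluminação"]

freq = Counter(words)
top_100 = freq.most_common(100)  # (word, count) pairs, most frequent first
print(top_100[:3])
# [('poda', 3), ('iluminação', 2), ('árvore', 1)]
```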

Analyzing the words, we found that even after removing the stop-words, many of the words shown didn’t give us much of the context we wanted. To achieve that, we filtered the words down to those with a minimum length of 4 characters and a frequency above 200.
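The filter can be sketched as a simple comprehension over the frequency counts (the counts below are made up for illustration):

```python
from collections import Counter

# Hypothetical counts standing in for the real frequency distribution.
freq = Counter({"rua": 900, "poda": 450, "luz": 300, "iluminação": 250, "bairro": 150})

# Keep only words with at least 4 characters and more than 200 occurrences.
filtered = {word: count for word, count in freq.items()
            if len(word) >= 4 and count > 200}
print(filtered)
# {'poda': 450, 'iluminação': 250}
```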

Step 3: Re-analyzing the filtered content:

To get a better view of the word frequency we plotted a bar chart (it can also be found in the notebook, for the same reason). Looking at the most used words in the texts, we found two things:

  • Some of the city’s neighborhoods were in the list (Centro, Fragata, Laranjal, Areal and Três Vendas);
  • Some of the words pointed to the same context, so we decided to group them into 4 groups: “saneamento”, “limpeza”, “poda” and “iluminação” (sanitation, cleaning, pruning and lighting).
Grouped frequency
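The grouping step can be sketched as below. The word-to-group mapping is an assumption for illustration; the article only names the four groups.

```python
# Hypothetical mapping from words to the four groups named in the article.
GROUPS = {
    "saneamento": {"esgoto", "saneamento", "água"},
    "limpeza": {"limpeza", "lixo", "entulho"},
    "poda": {"poda", "árvore", "galhos"},
    "iluminação": {"iluminação", "lâmpada", "poste"},
}

def grouped_frequency(freq):
    """Sum the counts of all words that belong to each group."""
    totals = {group: 0 for group in GROUPS}
    for word, count in freq.items():
        for group, members in GROUPS.items():
            if word in members:
                totals[group] += count
    return totals

# Made-up counts, standing in for the filtered frequency distribution.
print(grouped_frequency({"poda": 450, "árvore": 300, "lixo": 220}))
# {'saneamento': 0, 'limpeza': 220, 'poda': 750, 'iluminação': 0}
```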

We found that the most common problems are related to the groups above, and the neighborhoods that complain the most were also present in the text:

Step 4: Verifying what we found:

To verify what we found, we matched it against the content of the other columns.

Verify step
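The verification can be sketched by counting the values of the dataset's own classification and location columns. The column names "tipo" and "descricao" come from the conclusion below; the rows here are made-up examples, not real data.

```python
import pandas as pd

# Hypothetical rows, standing in for the real dataset.
df = pd.DataFrame({
    "tipo": ["poda", "iluminação", "poda", "limpeza"],
    "descricao": ["Fragata", "Centro", "Fragata", "Laranjal"],
})

# Compare the official request classification against our grouped frequencies...
print(df["tipo"].value_counts())
# ...and the official neighborhoods against the ones found in the text.
print(df["descricao"].value_counts())
```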

Conclusion of first approach:

Using the request descriptions we were able to find some of the city’s problems and some of the neighborhoods that are requesting improvements from the town hall. Happily, our analysis matches the content of the other columns in the dataset: comparing the request classification (the “tipo” column) with our problem grouping, it’s noticeable that they are very close to each other, and the 5 most cited neighborhoods (the “descricao” column) match perfectly with the 5 most cited in our frequency analysis.

Our attempt to find insights using all the available data is here (in Portuguese).

Thanks for reading!


Vítor Resing Plentz

Data Engineer interested in Tech, Entrepreneurship and People. Previously: Data Startup Founder at Elixir AI. linkedin.com/in/vplentz/