First day trying Python as a soon-to-be data scientist
Ok, ok, I can’t call myself a data scientist yet (I’ve just written one poor article and got 68% accuracy on a diabetes dataset), but I’m trying to become one, okay? I’m on vacation from university, so I’m taking some time to teach myself a bit of Python and see what I can code with it.
First, I had to look for resources, and one book caught my attention: “Python for Data Analysis”, which seemed pretty good.
First Things First
So my first day begins. However, I don’t know much Python myself, and I’m still struggling to understand how to structure a small project (let alone a bigger one). Here are the packages I’m using right now: pandas and its dependencies (NumPy, for example), and matplotlib to plot the graphs I want to build.
The dataset!
I’ve chosen a dataset about consumer complaints in Brazil (BR). You can download the data from Brazil’s official open data site: http://dados.gov.br/dataset/reclamacoes-do-consumidor-gov-br and another, more up-to-date source is the official website: https://www.consumidor.gov.br/pages/dadosabertos/externo/ .
One challenge: Python doesn’t “handle” Latin characters that well out of the box. pandas assumes files are UTF-8 by default, so if you call read_table without setting the encoding to “latin-1”, you’ll get an error message: “UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 4: invalid continuation byte”. Bothersome.
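To see where that error comes from, here is a minimal sketch using only the standard library. The sample header, the row and the ";" delimiter are made up for illustration (the real file’s layout may differ). The point is that bytes written as Latin-1 stop being valid UTF-8 as soon as an accented letter shows up. With pandas, the corresponding fix is to pass encoding="latin-1" to read_table.

```python
import csv
import io

# Hypothetical two-line sample, encoded as Latin-1 like the real files seem to be.
# The column names and the ";" delimiter are assumptions for this sketch.
raw = "Região;UF\nSul;PR\n".encode("latin-1")

# Decoding as UTF-8 fails on the first accented letter ("ã" is the single byte 0xe3).
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding as Latin-1 works, and the csv module can then parse the text.
# With pandas, the equivalent is read_table(path, encoding="latin-1").
text = raw.decode("latin-1")
rows = list(csv.reader(io.StringIO(text), delimiter=";"))

print(utf8_error)
print(rows)
```

Latin-1 maps every byte to a character, so decoding with it never raises. That also means it will not complain if the file is actually in some other encoding, so it is worth eyeballing a few accented words after loading.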
Analysis
As you can see in this article, our eyes are good at retrieving information when there are only a few elements. So, to get started, I used the pyplot module from the matplotlib package to plot some graphs of the data. I used 2015-06.csv (the dataset from June 2015) to test a few functions; the output and analysis follow:

As you can see, people from Brazil seem to be pretty binary: of the 9215 complaints that received a grade, more than half (about 64%) were either 5’s or 1’s. There were more positive grades than negative ones, though, which suggests this tool is doing a decent job.
But I’d like to add more value to the discussion: which company receives the most 5’s, and which one receives the most 1’s?
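The “more than half” claim is easy to double-check by hand. A small sketch, with the per-grade counts copied from the groupby output further down in this post:

```python
# Number of complaints per grade, copied from the groupby output later in this post.
grade_counts = {1: 2840, 2: 650, 3: 1205, 4: 1469, 5: 3051}

# Total number of graded complaints.
total_graded = sum(grade_counts.values())

# Complaints graded at one of the two extremes (1 or 5).
extremes = grade_counts[1] + grade_counts[5]
share_extremes = extremes / total_graded

print(total_graded)
print(f"{share_extremes:.1%} of graded complaints got a 1 or a 5")
```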
I had a bit of difficulty with this part because of my inexperience with Python. One problem I ran into is that all the values in the “UF” Series are strings, so I had to find a library that could count all the values (so I could then plot a histogram or pie chart to understand them). I found a simple solution: “from collections import Counter”, and this tutorial helped me out a lot. With the following lines:
from collections import Counter
# Count how many complaints came from each state (UF)
count = Counter(dataset["UF"])
print(count)
{'PR': 4438, 'SP': 2928, 'RJ': 1475, 'MG': 1205, 'BA': 1175, 'RS': 781, 'GO': 589, 'SC': 554, 'MT': 531, 'CE': 526, 'DF': 456, 'ES': 377, 'PE': 292, 'MA': 187, 'PA': 127, 'MS': 120, 'AM': 108, 'RN': 92, 'PB': 87, 'AC': 66, 'SE': 44, 'PI': 38, 'AL': 33, 'RO': 31, 'TO': 22, 'RR': 14, 'AP': 13}
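As a side note, a Counter also knows how to rank its own entries, which is handy preparation for a bar or pie chart. A short sketch, using only a few of the counts printed above rather than the full dataset:

```python
from collections import Counter

# A subset of the per-state counts printed above (not the full dataset).
count = Counter({"PR": 4438, "SP": 2928, "RJ": 1475, "MG": 1205, "BA": 1175})

# most_common(n) returns the n largest entries as (state, count) pairs.
top3 = count.most_common(3)
print(top3)

# Split the pairs into labels and values, the shape plotting functions usually expect.
labels, values = zip(*top3)
print(labels, values)
```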
Good enough, huh? But I couldn’t find a way to plot a graph showing who the biggest scorers of 5’s and 1’s are. However, I did find a straightforward solution:
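# Group the complaints by the grade the consumer gave (1 to 5)
result = dataset.groupby('Nota do Consumidor')
# For each grade, summarize the company names: count, unique, top (most frequent) and freq
print(result["Nome Fantasia"].describe())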
Nota do Consumidor
1    count     2840
     unique    134
     top       Oi Fixo
     freq      306
2    count     650
     unique    72
     top       Oi Fixo
     freq      83
3    count     1205
     unique    100
     top       Vivo - Telefônica
     freq      192
4    count     1469
     unique    104
     top       Vivo - Telefônica
     freq      248
5    count     3051
     unique    127
     top       Vivo - Telefônica
     freq      513
As you can see, the champion of positive votes is “Vivo - Telefônica”, and the title of worst goes to “Oi Fixo” (as I noted in my previous post: https://medium.com/@felipebormann/quem-os-brasileiros-n%C3%A3o-gostam-520a8e589b26).
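To make the top and freq rows in that output less mysterious, here is a rough sketch of the same idea without pandas: group company names by grade and take the most frequent name in each group. The rows below are invented for illustration and are not taken from the dataset.

```python
from collections import Counter, defaultdict

# Invented (grade, company) pairs standing in for the real columns
# "Nota do Consumidor" and "Nome Fantasia".
rows = [
    (1, "Oi Fixo"), (1, "Oi Fixo"), (1, "Vivo - Telefônica"),
    (5, "Vivo - Telefônica"), (5, "Vivo - Telefônica"), (5, "Oi Fixo"),
]

# Build one Counter of company names per grade.
by_grade = defaultdict(Counter)
for grade, company in rows:
    by_grade[grade][company] += 1

# For each grade, compute the equivalents of describe()'s count, unique, top and freq.
summary = {}
for grade, companies in sorted(by_grade.items()):
    top, freq = companies.most_common(1)[0]
    summary[grade] = {
        "count": sum(companies.values()),
        "unique": len(companies),
        "top": top,
        "freq": freq,
    }
    print(grade, summary[grade])
```

Note that top only names the single most frequent company per grade. Keeping the whole Counter for a grade and calling most_common(10) on it would give a ranking, which is probably closer to what a chart of the biggest scorers needs.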
So I guess now is a good time to explore more of the data and try new things. See you guys in the next post! I hope you’ve enjoyed my brief analysis and my poor Python skills, haha. I hope to get better at it as time goes on. The more attentive readers may have noticed that I didn’t use IPython Notebook this time; I’m still getting used to it. But just wait, guys: the time will come when I’ll be recognized as one of the top 5 data scientists in Brazil.