#DS 02. Pandas, DataFrames and plot

Saulo Toledo Pereira
Published in Geek Culture
5 min read · Jun 16, 2021

In my last Medium story, I told how I discovered the Data Science world. In the next posts, I’ll describe what I have learned about it.

=>GitHub repository<=
https://github.com/saulotp/ds02

I’m focusing on studying Python because it is a simple but powerful language. In fact, I believe that Python is the ‘entrance door’ to the programming world. While looking for study material about Python in Data Science, I found Spyder and Jupyter, which are amazing tools for data analysis. These tools will also help me show my project's evolution.

First I’ll describe the project. I got an Excel file (it is in the GitHub repository) from somewhere on the internet. This file contains sales data from some malls scattered across Brazil. With this data, I want to answer some questions like:
- Which mall had the most sales
- What is the total value of all sales from all malls
- What is the mean of all sales from the malls
- Verify some data throughout a date interval
- Plot some data

First I have to import the pandas library and make Python read the .xlsx file:
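A minimal sketch of this loading step. The helper name `load_sales` and the file name in the usage comment are hypothetical, and pandas needs an Excel engine such as openpyxl installed to read .xlsx files.

```python
import pandas as pd


def load_sales(path):
    """Read the first sheet of an Excel workbook into a DataFrame."""
    # pandas delegates .xlsx parsing to an engine such as openpyxl
    return pd.read_excel(path)


# Usage, with a hypothetical file name:
# dfmain = load_sales("sales.xlsx")
# print(dfmain.head())  # first five rows
```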

After reading the Excel file and creating the DataFrame ‘dfmain’, we can print it to see the result:

We can also use the ‘display’ command to style the output (it looks cooler, so I’ll use display instead of print):

In the code below, I convert the date column from string to a format that pandas reads as a date. After this, I create a new DataFrame that shows only the mall ID and the total value of all sales for each mall, sorted in descending order.
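A minimal sketch of this step, under stated assumptions: the rows are made-up stand-ins rather than the real file, the date column is assumed to be called 'Data', and `dfsales` is a hypothetical name. Only 'ID Loja' and 'Valor Final' are column names taken from the text.

```python
import pandas as pd

# Made-up stand-in rows, not the real spreadsheet
dfmain = pd.DataFrame({
    "Data": ["2019-01-05", "2019-01-20", "2019-02-10", "2019-02-18"],
    "ID Loja": ["Shopping Vila Velha", "Shopping Morumbi",
                "Shopping Vila Velha", "Shopping Morumbi"],
    "Valor Final": [700.0, 50.0, 600.0, 150.0],
})

# Convert the text dates into real datetime values
dfmain["Data"] = pd.to_datetime(dfmain["Data"])

# Total sales value per mall, largest first
dfsales = (
    dfmain.groupby("ID Loja")[["Valor Final"]]
    .sum()
    .sort_values("Valor Final", ascending=False)
)
print(dfsales)
```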

Result below:

The table shows us that the mall with the highest sales value was ‘Shopping Vila Velha’ and the one with the lowest was ‘Shopping Morumbi’.

Next, I wanted to know the sum of all sales values from all malls, so I used a method that can sum one or more columns and summed the ‘Valor Final’ column. After this, I created another column in the DataFrame to show each mall's value as a percentage of the total.
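As a rough illustration, the total and the percentage column could be computed like this. The per-mall totals are made-up numbers, and the names `total_sales` and 'Percent' are assumptions.

```python
import pandas as pd

# Made-up per-mall totals standing in for the real table
dfsales = pd.DataFrame(
    {"Valor Final": [1300.0, 200.0]},
    index=pd.Index(["Shopping Vila Velha", "Shopping Morumbi"], name="ID Loja"),
)

# Sum of the 'Valor Final' column across all malls
total_sales = dfsales["Valor Final"].sum()

# Each mall's share of the total, formatted as text such as '86.67%'
dfsales["Percent"] = (dfsales["Valor Final"] / total_sales).map("{:.2%}".format)
print(dfsales)
```

Formatting the share as text is convenient for display, but the column can no longer be used in arithmetic, so it may be worth keeping the numeric ratio as well.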

To answer the question about the mean of sales, I used the same code but with the ‘mean()’ method:
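A short sketch of the same idea with `mean()`, assuming the mean is taken over the per-mall totals (the figures are made up and `mean_per_mall` is a hypothetical name):

```python
import pandas as pd

# Made-up per-mall totals standing in for the real table
dfsales = pd.DataFrame(
    {"Valor Final": [1300.0, 200.0]},
    index=pd.Index(["Shopping Vila Velha", "Shopping Morumbi"], name="ID Loja"),
)

# Average of the per-mall totals
mean_per_mall = dfsales["Valor Final"].mean()
print(mean_per_mall)
```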

In the next section I decided to write a script that can extract the data for any mall just by changing the mall name in the comparison `['ID Loja'] == 'Selected Mall'`.
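A minimal sketch of such a filter, using made-up rows. The variable `selected_mall` is an assumption introduced so that only one line has to change.

```python
import pandas as pd

# Made-up stand-in rows, not the real spreadsheet
dfmain = pd.DataFrame({
    "ID Loja": ["Shopping Vila Velha", "Shopping Morumbi",
                "Shopping Vila Velha", "Shopping Morumbi"],
    "Valor Final": [700.0, 50.0, 600.0, 150.0],
})

# Change this name to look at a different mall
selected_mall = "Shopping Morumbi"

# Boolean mask: True for the rows that belong to the selected mall
dfmall = dfmain[dfmain["ID Loja"] == selected_mall]
print(dfmall)
```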

To know how much of each product was sold, I created another DataFrame from ‘dfmall’.
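One way this per-product table might be built is sketched below. The product column name 'Produto' and the names `dfproducts` and 'Share' are assumptions, and the rows are made up.

```python
import pandas as pd

# Made-up sales rows for a single mall
dfmall = pd.DataFrame({
    "Produto": ["Terno Linho", "Terno", "Terno Linho", "Chinelo Liso"],
    "Valor Final": [400.0, 300.0, 200.0, 100.0],
})

# Total value sold per product, largest first
dfproducts = (
    dfmall.groupby("Produto")[["Valor Final"]]
    .sum()
    .sort_values("Valor Final", ascending=False)
)

# Each product's fraction of this mall's total sales
dfproducts["Share"] = dfproducts["Valor Final"] / dfproducts["Valor Final"].sum()
print(dfproducts)
```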

This table shows that the product with the highest sales value was ‘Terno Linho’, with a total of ‘R$ 60.000,00’, representing 4,08% of all product sales from this mall.

If we want to analyze a specific date period, we can create a filter for that period and then extract the data more precisely. With the code below I found which product sold the most in January, but the dates can be changed to any period, for example from `2019-02-18` until `2019-07-23`.
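A sketch of a date filter along these lines, on made-up rows. The column names 'Data' and 'Produto' and the variable names are assumptions.

```python
import pandas as pd

# Made-up sales rows for a single mall
dfmall = pd.DataFrame({
    "Data": pd.to_datetime(["2019-01-05", "2019-01-20", "2019-02-18", "2019-07-23"]),
    "Produto": ["Terno", "Chinelo Liso", "Terno", "Terno Linho"],
    "Valor Final": [700.0, 50.0, 150.0, 600.0],
})

# Change these two dates to analyze another period
start = pd.Timestamp("2019-01-01")
end = pd.Timestamp("2019-01-31")

# between() keeps both end points by default
dfperiod = dfmall[dfmall["Data"].between(start, end)]

# Total value per product inside the period, largest first
period_totals = (
    dfperiod.groupby("Produto")["Valor Final"].sum().sort_values(ascending=False)
)
print(period_totals)
```

If the dates carried a time of day, an end point of `2019-01-31` would mean midnight and would drop later sales on that day, so the end bound would need adjusting.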

As a result we have:

The table above shows us that in January the best-selling item was “Terno”, with 6,4% of all sales, and the item with the fewest sales was “Chinelo Liso”, with 0,04%.

To view the total value of sales per month we can use the “resample(parameters)” command and select M (month) as the parameter on the “data” (date) column. This creates another DataFrame with the selected data:
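A minimal sketch of monthly resampling on made-up rows, assuming the date column is called 'Data'. It uses the month-start alias 'MS' instead of 'M', because recent pandas releases spell the month-end alias 'ME' and 'MS' behaves the same across versions. The only visible difference is that each month is labeled with its first day.

```python
import pandas as pd

# Made-up sales rows for a single mall
dfmall = pd.DataFrame({
    "Data": pd.to_datetime(["2019-01-05", "2019-01-20", "2019-02-10", "2019-03-03"]),
    "Valor Final": [100.0, 200.0, 50.0, 25.0],
})

# resample() needs the dates as the index
by_date = dfmall.set_index("Data")[["Valor Final"]]

# One row per month: total and mean sale value
monthly_sum = by_date.resample("MS").sum()
monthly_mean = by_date.resample("MS").mean()
print(monthly_sum)
```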

In this case, I removed the index “Valor Total” because I ran into some errors when trying to plot =/
This table shows the total value of all sales for each month.

Finally, it's plot time. To plot we have to import another Python library:

Full line of code: `pt = dfmalldatessumlsales.plot(style='.-', ms=15, color='red', figsize=(20,10), lw=5, legend=False, xlabel='', title='Total Sales')`
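A self-contained sketch of this plot, assuming the library in question is matplotlib and using made-up monthly totals. The `matplotlib.use("Agg")` line only makes the figure render off-screen and can be dropped inside Jupyter or Spyder.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in Jupyter or Spyder
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly totals standing in for the resampled table
monthly_sum = pd.DataFrame(
    {"Valor Final": [300.0, 50.0, 25.0]},
    index=pd.to_datetime(["2019-01-01", "2019-02-01", "2019-03-01"]),
)

# Red line with dot markers, mirroring the options quoted above
ax = monthly_sum.plot(
    style=".-", ms=15, color="red", figsize=(20, 10), lw=5,
    legend=False, title="Total Sales",
)
ax.set_xlabel("")  # hide the x-axis label

# plt.show() displays the figure in an interactive session
```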

As a result, we have an image representing the sum of all sales for each month:

With this image, we can observe that the month with the highest sales value was May and the month with the lowest was August.

We can do the same to plot the mean of sales:

Mean of sales per month

Well, “This is it”.

Studying alone can be so hard sometimes, but it is possible.
Now, answering the previous questions:
Q: Which mall had the most sales?
A: Shopping Vila Velha R$ 1.615.271,00

Q: What is the total value of all sales from all malls?
A: R$ 38.959.752,00

Q: What is the mean of all sales from the malls?
A: R$ 1.558.390,08

Q: Verify some data throughout a date interval
A: In January (from 2019-01-01 until 2019-01-31), Shopping Morumbi sold a total value of R$ 9.100,00 of “Terno”.

Q: Plot some data
A:

=>GitHub repository<=
https://github.com/saulotp/ds02

Contact me:
- saulodetp@gmail.com
- Ig: saulodetp
- Linkedin: saulodetp

Saulo Toledo Pereira
PhD student trying to learn some code and practice my English. Can we talk five minutes?