# #DS 02. Pandas, DataFrames and plot

Jun 16 · 5 min read

In my last Medium story, I told how I discovered the Data Science world. In the next posts, I’ll describe what I have learned about it.

=>GitHub repository<=
https://github.com/saulotp/ds02

I’m focusing on studying Python because it is a simple but powerful language; actually, I believe that Python is the ‘entrance door’ to the programming world. Looking for study material about Python in Data Science, I found out about Spyder and Jupyter, which are amazing tools for data analysis. Furthermore, these tools will help me show my project's evolution.

First, I’ll tell you about the project. I got an Excel file somewhere on the internet (the file is in the GitHub repository); it contains sales data from some malls scattered across Brazil. With this data, I want to answer some questions like:
- Which mall had the most sales?
- What is the total value of all sales from all malls?
- What is the mean of sales across malls?
- Verify some data over a date interval
- Plot some data

First, I have to import the pandas library and make Python read the .xlsx file:
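A minimal sketch of this step. The file name `Vendas.xlsx` and the `load_sales` helper are hypothetical (use the name of the Excel file in the repository), and the fallback rows are made up so the snippet also runs without the real file. Only the ‘ID Loja’ and ‘Valor Final’ column names come from this post; ‘Data’, ‘Produto’ and ‘Quantidade’ are assumed.

```python
from pathlib import Path

import pandas as pd

# Hypothetical file name: replace it with the Excel file from the repository.
EXCEL_PATH = Path("Vendas.xlsx")


def load_sales(path):
    """Read the spreadsheet into a DataFrame, or fall back to a tiny sample."""
    if path.exists():
        # read_excel needs an Excel engine (for example openpyxl) for .xlsx files
        return pd.read_excel(path)
    # Made-up rows, only so the sketch runs without the real file
    return pd.DataFrame({
        "Data": ["2019-01-05", "2019-01-20"],
        "ID Loja": ["Shopping Morumbi", "Shopping Vila Velha"],
        "Produto": ["Terno", "Camisa"],
        "Quantidade": [2, 1],
        "Valor Final": [1500.0, 300.0],
    })


dfmain = load_sales(EXCEL_PATH)
print(dfmain.head())  # first rows of the DataFrame
```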

After reading the Excel file and creating the DataFrame ‘dfmain’, we can print it to see the result:

We can also use the command ‘display’ to style the output (it looks cooler, so I’ll use display instead of print):
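A small sketch of the difference, assuming the code runs in Jupyter or in Spyder's IPython console, where `display` comes from IPython. The two rows are made up, and the fallback to `print` is only there so the snippet also runs in plain Python.

```python
import pandas as pd

try:
    # Available in Jupyter and in Spyder's IPython console
    from IPython.display import display
except ImportError:
    # Plain Python fallback so the snippet still runs
    display = print

# Made-up rows standing in for the real sales data
dfmain = pd.DataFrame({
    "ID Loja": ["Shopping Morumbi", "Shopping Vila Velha"],
    "Valor Final": [1500.0, 300.0],
})

print(dfmain)    # plain text table
display(dfmain)  # rendered as a styled table inside a notebook
```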

In the code below, I convert the date column from string to a format that pandas reads as a date. After this, I create a new DataFrame that shows only the mall ID and the total value of all sales for each mall, sorted in descending order.
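A sketch of how this could look, not necessarily the exact code from the repository. The rows are made up, the ‘Data’ column name and the `dfsales` name are assumptions, and `groupby` is one possible way to get the total per mall.

```python
import pandas as pd

# Made-up rows standing in for the real sales data
dfmain = pd.DataFrame({
    "Data": ["2019-01-05", "2019-01-20", "2019-02-10", "2019-02-25"],
    "ID Loja": ["Shopping Morumbi", "Shopping Vila Velha",
                "Shopping Morumbi", "Shopping Vila Velha"],
    "Valor Final": [1500.0, 300.0, 500.0, 700.0],
})

# Convert the text dates into pandas datetime values
dfmain["Data"] = pd.to_datetime(dfmain["Data"])

# Total sales value per mall, largest first
dfsales = (
    dfmain.groupby("ID Loja")[["Valor Final"]]
    .sum()
    .sort_values("Valor Final", ascending=False)
)
print(dfsales)
```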

Result below:

Next, I would like to know the sum of all sales values from all malls, so I use a method that can sum one or more columns and sum the ‘Valor Final’ column. After this, I create another column in the DataFrame to show the values in percent format.
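A minimal sketch, assuming the percent column holds each mall's share of the grand total. The rows, the `dfsales` name and the ‘Percent’ column name are made up for illustration.

```python
import pandas as pd

# Made-up rows standing in for the real sales data
dfmain = pd.DataFrame({
    "ID Loja": ["Shopping Morumbi", "Shopping Vila Velha",
                "Shopping Morumbi", "Shopping Vila Velha"],
    "Valor Final": [1500.0, 300.0, 500.0, 700.0],
})

# Total per mall (one row per mall)
dfsales = dfmain.groupby("ID Loja")[["Valor Final"]].sum()

# Grand total of the 'Valor Final' column over every sale
total = dfmain["Valor Final"].sum()

# Share of each mall in the grand total, formatted as percent text
dfsales["Percent"] = (dfsales["Valor Final"] / total).map("{:.2%}".format)

print(total)
print(dfsales)
```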

To answer the question about the mean of sales, I used the same code but with the ‘mean()’ method:
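A sketch under the assumption that the mean is taken over the per-mall totals. If the mean per individual sale is what you want, `dfmain["Valor Final"].mean()` gives that instead. The rows and variable names are made up.

```python
import pandas as pd

# Made-up rows standing in for the real sales data
dfmain = pd.DataFrame({
    "ID Loja": ["Shopping Morumbi", "Shopping Vila Velha",
                "Shopping Morumbi", "Shopping Vila Velha"],
    "Valor Final": [1500.0, 300.0, 500.0, 700.0],
})

# Total per mall, then the mean of those totals
per_mall = dfmain.groupby("ID Loja")["Valor Final"].sum()
mean_per_mall = per_mall.mean()

# Mean of the individual sales, for comparison
mean_per_sale = dfmain["Valor Final"].mean()

print(mean_per_mall, mean_per_sale)
```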

In the next section, I decided to write a script that can extract the data for each mall by changing only the `['ID Loja'] == 'Selected Mall'` condition:
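A minimal sketch of that filter using a boolean mask. The rows and the `selected_mall` variable are made up; `dfmall` is the name used later in this post.

```python
import pandas as pd

# Made-up rows standing in for the real sales data
dfmain = pd.DataFrame({
    "ID Loja": ["Shopping Morumbi", "Shopping Vila Velha",
                "Shopping Morumbi", "Shopping Vila Velha"],
    "Produto": ["Terno", "Camisa", "Camisa", "Terno"],
    "Valor Final": [1500.0, 300.0, 500.0, 700.0],
})

# Change only this value to look at another mall
selected_mall = "Shopping Morumbi"

# Keep the rows where the mall ID matches the selected mall
dfmall = dfmain[dfmain["ID Loja"] == selected_mall]
print(dfmall)
```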

To know how much of each product was sold, I created another DataFrame from ‘dfmall’.
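One way this could be done, as a sketch: group the mall's rows by product and sum both quantity and value. The ‘Produto’ and ‘Quantidade’ column names, the `dfproducts` name and the rows are assumptions.

```python
import pandas as pd

# Made-up rows for a single mall (what 'dfmall' would hold)
dfmall = pd.DataFrame({
    "ID Loja": ["Shopping Morumbi"] * 3,
    "Produto": ["Terno", "Camisa", "Camisa"],
    "Quantidade": [2, 1, 2],
    "Valor Final": [1500.0, 200.0, 300.0],
})

# Units and value sold per product, best seller (by value) first
dfproducts = (
    dfmall.groupby("Produto")[["Quantidade", "Valor Final"]]
    .sum()
    .sort_values("Valor Final", ascending=False)
)
print(dfproducts)
```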

If we want to analyze a specific date period, we can create a filter for that period and then extract data more precisely. With the code below, I found which product sold the most in January, but the dates can be changed to any period, for example from `2019-02-18` until `2019-07-23`.
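A sketch of such a date filter, assuming the ‘Data’ column already holds pandas datetime values without a time of day. The rows and variable names are made up, and `Series.between` (inclusive on both ends) is just one way to write the condition.

```python
import pandas as pd

# Made-up rows for a single mall, with real datetime values
dfmall = pd.DataFrame({
    "Data": pd.to_datetime(["2019-01-05", "2019-01-20", "2019-02-10"]),
    "Produto": ["Terno", "Camisa", "Terno"],
    "Valor Final": [1500.0, 300.0, 500.0],
})

# Change these two dates to analyze another period
start = pd.Timestamp("2019-01-01")
end = pd.Timestamp("2019-01-31")

# Keep only the sales inside the period (both ends included)
dfperiod = dfmall[dfmall["Data"].between(start, end)]

# Value sold per product in the period, best seller first
top_products = (
    dfperiod.groupby("Produto")["Valor Final"]
    .sum()
    .sort_values(ascending=False)
)
print(top_products)
```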

As a result we have:

To view the total value of sales per month, we can use the command “resample(parameters)” and select M (month) as the parameter on the “data” (date) column. Thus, we create another DataFrame with the selected data:
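A minimal sketch of the monthly resample. Note one swap: recent pandas versions renamed the `M` alias to `ME`, so this sketch uses `MS` (month start), which works across versions and labels each month by its first day. The rows, the ‘Data’ column name and the `dfmonth` name are assumptions.

```python
import pandas as pd

# Made-up rows standing in for the real sales data
dfmain = pd.DataFrame({
    "Data": pd.to_datetime(
        ["2019-01-05", "2019-01-20", "2019-02-10", "2019-02-25"]
    ),
    "Valor Final": [1500.0, 300.0, 500.0, 700.0],
})

# Group the rows by calendar month of the 'Data' column and sum the values
dfmonth = dfmain.resample("MS", on="Data")[["Valor Final"]].sum()
print(dfmonth)
```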

Finally, plot time. To plot, we have to import another Python library:
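A sketch assuming the library is matplotlib, which pandas uses for its own `.plot()` method. The rows, the title and the line-chart style are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the real sales data
dfmain = pd.DataFrame({
    "Data": pd.to_datetime(
        ["2019-01-05", "2019-01-20", "2019-02-10", "2019-02-25"]
    ),
    "Valor Final": [1500.0, 300.0, 500.0, 700.0],
})

# Sum of sales per month ('MS' labels each month by its first day)
monthly_sum = dfmain.resample("MS", on="Data")["Valor Final"].sum()

# Line chart with one point per month
ax = monthly_sum.plot(marker="o", title="Total sales per month")
ax.set_xlabel("Month")
ax.set_ylabel("Valor Final")

# In Jupyter the figure is rendered automatically;
# when running as a script, call plt.show() here.
```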

As a result, we have an image representing the sum of all sales for each month:

We can do the same to plot the mean of sales:
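The same sketch with `mean()` in place of `sum()`, again with made-up rows and labels, and assuming matplotlib is the plotting library.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the real sales data
dfmain = pd.DataFrame({
    "Data": pd.to_datetime(
        ["2019-01-05", "2019-01-20", "2019-02-10", "2019-02-25"]
    ),
    "Valor Final": [1500.0, 300.0, 500.0, 700.0],
})

# Mean value of a sale in each month
monthly_mean = dfmain.resample("MS", on="Data")["Valor Final"].mean()

ax = monthly_mean.plot(marker="o", title="Mean sale value per month")
ax.set_xlabel("Month")
ax.set_ylabel("Valor Final")

# Call plt.show() here when running outside a notebook.
```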

Well, “This is it”.

Studying alone can be so hard sometimes, but it is possible.

Q: Which mall had the most sales?
A: Shopping Vila Velha R\$ 1.615.271,00

Q: What is the total value of all sales from all malls?
A: R\$ 38.959.752,00

Q: What is the mean of sales across malls?
A: R\$ 1.558.390,08

Q: Verify some data throughout a date interval
A: In January (from 2019-01-01 until 2019-01-31), Shopping Morumbi sold a total value of R\$ 9.100,00 in “Terno”.

Q: Plot some data
A:

=>GitHub repository<=
https://github.com/saulotp/ds02

Contact me:
- saulodetp@gmail.com
- Ig: saulodetp

## Geek Culture

A new tech publication by Start it up (https://medium.com/swlh).

Written by

## Saulo Toledo Pereira

PhD student trying to learn some code and practice my English. Can we talk five minutes?
