Brazilian E-Commerce Public (EDA)

Yamen Shabankabakibou
Published in Analytics Vidhya
5 min read · Jul 30, 2020

It has been three awesome weeks since the Istanbul Data Science Bootcamp started, and finally, the time for our first projects has arrived.
We were asked to find a dataset that suits our goal and apply Exploratory Data Analysis (EDA) to it, extracting insights that help describe the business the dataset focuses on.

About The Dataset:

This is a Brazilian e-commerce public dataset of orders made at the Olist Store. The dataset has information about 100k orders from 2016 to 2018 made at multiple marketplaces in Brazil. Its features allow viewing an order from multiple dimensions: from order status, price, payment and freight performance to customer location, product attributes and, finally, reviews written by customers. Olist also released a geolocation dataset that relates Brazilian zip codes to lat/lng coordinates.

This is real commercial data. It has been anonymized, and references to the companies and partners in the review text have been replaced with the names of Game of Thrones great houses.

Exploring Before Starting:

Before I get my hands dirty with the analysis 😋, I should look into the data: examine it, identify feature types, find missing values, and do some cleaning.

(Figure: output of pandas info())

As the info() function from the pandas library shows, the amount of missing data in the dataset is very low (lucky us 😁). This is something we don't encounter every day.
Some features contain date-time data, but pandas read_csv() has interpreted them as the object type. This needs to be fixed if we are going to use these features for further analysis.
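A minimal pandas sketch of these checks follows. It assumes the orders table is loaded from CSV. The inline sample rows are hypothetical and stand in for the real file. The file name olist_orders_dataset.csv and the column list are assumptions about the Kaggle download. Passing parse_dates to read_csv() converts the timestamp columns up front, so they do not stay as object.

```python
import io

import pandas as pd

# Hypothetical sample rows shaped like the Olist orders table; in practice you
# would pass the path of "olist_orders_dataset.csv" (assumed name) instead.
SAMPLE_ORDERS = io.StringIO(
    "order_id,customer_id,order_status,order_purchase_timestamp,order_delivered_customer_date\n"
    "o1,c1,delivered,2017-10-02 10:56:33,2017-10-10 21:25:13\n"
    "o2,c2,delivered,2018-07-24 20:41:37,2018-08-07 15:27:45\n"
    "o3,c3,canceled,2018-08-08 08:38:49,\n"
)

DATE_COLUMNS = ["order_purchase_timestamp", "order_delivered_customer_date"]

# parse_dates converts the listed columns to datetime64 instead of object.
orders = pd.read_csv(SAMPLE_ORDERS, parse_dates=DATE_COLUMNS)

orders.info()  # dtypes and non-null counts per column

# Share of missing values per column, largest first.
missing_share = orders.isna().mean().sort_values(ascending=False)
print(missing_share)
```

If the data is already loaded, pd.to_datetime(orders[col]) achieves the same conversion column by column.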

Starting With EDA:

1- Converting Features Into DateTime:

In the orders data frame, several features represent dates and times. For this analysis, I use only the 'order_purchase_timestamp' feature, which records the date and time the customer placed the order. I split the values in that column into multiple columns (Year, Month, Day, Hour, Year/Month, TimeOfDay), as shown in the picture.
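The split can be sketched with the pandas .dt accessor. The timestamps below are hypothetical, and the column names are illustrative. The article does not state where it draws the part-of-day boundaries, so the pd.cut bins used here for TimeOfDay are an assumption.

```python
import pandas as pd

# Hypothetical purchase timestamps standing in for orders["order_purchase_timestamp"].
orders = pd.DataFrame({
    "order_purchase_timestamp": pd.to_datetime([
        "2017-10-02 10:56:33",
        "2018-07-24 20:41:37",
        "2018-08-08 08:38:49",
        "2017-11-24 03:12:00",
    ])
})

ts = orders["order_purchase_timestamp"]
orders["Year"] = ts.dt.year
orders["Month"] = ts.dt.month
orders["Day"] = ts.dt.day
orders["Hour"] = ts.dt.hour
# Year/Month as a string such as "2017-10", handy for monthly counts.
orders["YearMonth"] = ts.dt.to_period("M").astype(str)

# Assumed part-of-day boundaries; intervals are right-inclusive, e.g. (5, 11] is Morning.
orders["TimeOfDay"] = pd.cut(
    orders["Hour"],
    bins=[-1, 5, 11, 17, 23],
    labels=["Dawn", "Morning", "Afternoon", "Night"],
)
print(orders)
```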

2- Convert Product Categories From Portuguese Into English:

This step is easy thanks to the 'product_category_name_translation' table provided by the Olist Store. It lists every product category in the dataset in Portuguese alongside its English name.
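The translation amounts to a left join on the Portuguese category name. The sketch below assumes the columns product_category_name and product_category_name_english. The rows are hypothetical, including the untranslated "categoria_sem_traducao". The fillna fallback is an illustrative choice that keeps categories without an English name visible.

```python
import pandas as pd

# Hypothetical rows shaped like the products and translation tables.
products = pd.DataFrame({
    "product_id": ["p1", "p2", "p3", "p4"],
    "product_category_name": [
        "cama_mesa_banho", "esporte_lazer", "beleza_saude", "categoria_sem_traducao",
    ],
})
translation = pd.DataFrame({
    "product_category_name": ["cama_mesa_banho", "esporte_lazer", "beleza_saude"],
    "product_category_name_english": ["bed_bath_table", "sports_leisure", "health_beauty"],
})

# Left join keeps every product even if its category has no translation.
products = products.merge(translation, on="product_category_name", how="left")

# Fall back to the Portuguese name where no translation exists.
products["category"] = products["product_category_name_english"].fillna(
    products["product_category_name"]
)
print(products[["product_id", "category"]])
```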

3- Where Our Customers Are Located:

  • To start, I looked at the customers and sellers tables to see how our customers and sellers are distributed across Brazilian states.
  • Most of the customers are located in São Paulo (SP), Rio de Janeiro (RJ) and Minas Gerais (MG).
  • On the other hand, about 60% of our sellers are located in São Paulo (SP), and the rest are mainly distributed among Paraná (PR), Minas Gerais (MG) and Rio de Janeiro (RJ).
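State shares like these can be computed with value_counts(normalize=True). The sketch assumes the customers and sellers tables carry customer_state and seller_state columns. The rows are hypothetical, so the shares they produce are illustrative only.

```python
import pandas as pd

# Hypothetical rows; only the state columns matter here.
customers = pd.DataFrame({"customer_state": ["SP", "SP", "RJ", "MG", "SP", "RJ"]})
sellers = pd.DataFrame({"seller_state": ["SP", "SP", "SP", "PR", "MG"]})

# normalize=True returns shares instead of raw counts, sorted largest first.
customer_share = customers["customer_state"].value_counts(normalize=True)
seller_share = sellers["seller_state"].value_counts(normalize=True)

print((customer_share * 100).round(1))
print((seller_share * 100).round(1))
```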

4- What Type of Products Are We Selling:

  • Next, I looked for the product categories that are most popular with customers.
  • bed-bath-table was the most popular category with 9.4%, followed by sports-leisure (8.9%), furniture-decor (8.2%), health-beauty (7.6%), housewares (7.2%), auto (5.9%) and computer-accessories (5.1%).
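The article does not say whether these percentages are taken over orders or over sold items. The sketch below assumes order items. It attaches a translated category to each item and then ranks categories by their share of items. Table shapes and rows are hypothetical.

```python
import pandas as pd

# Hypothetical order items and already-translated products.
order_items = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3", "o4"],
    "product_id": ["p1", "p2", "p1", "p3", "p1"],
})
products = pd.DataFrame({
    "product_id": ["p1", "p2", "p3"],
    "product_category_name_english": ["bed_bath_table", "sports_leisure", "health_beauty"],
})

# Attach a category to each sold item, then rank categories by share of items.
items = order_items.merge(products, on="product_id", how="left")
category_share = items["product_category_name_english"].value_counts(normalize=True)
print((category_share * 100).round(1).head(7))
```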

5- About Customers and Their Orders:

  • The orders data frame clearly shows that most orders have been delivered. This means the dataset mostly describes delivered orders, or 'that our company is doing its job well 🤓'.
  • Most of our customers paid for their orders by credit card, which accounts for 73.92% of the orders.
  • About half of our customers pay for their orders in a single installment, while the other half pay in 2 to 10 installments.
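All three observations are share computations. The sketch assumes the orders table has an order_status column and the payments table has payment_type and payment_installments. The rows are hypothetical. The installment buckets (1, 2-10, 11+) are an illustrative choice based on the split described above.

```python
import pandas as pd

# Hypothetical rows shaped like the orders and payments tables.
orders = pd.DataFrame({"order_status": ["delivered", "delivered", "delivered", "shipped", "canceled"]})
payments = pd.DataFrame({
    "payment_type": ["credit_card", "credit_card", "boleto", "credit_card", "voucher"],
    "payment_installments": [1, 3, 1, 12, 1],
})

status_share = orders["order_status"].value_counts(normalize=True)
payment_share = payments["payment_type"].value_counts(normalize=True)

# Bucket installments as 1, 2-10 or 11+; include_lowest also puts 0 into the first bucket.
installment_group = pd.cut(
    payments["payment_installments"],
    bins=[0, 1, 10, float("inf")],
    labels=["1", "2-10", "11+"],
    include_lowest=True,
)
installment_share = installment_group.value_counts(normalize=True)
print(status_share, payment_share, installment_share, sep="\n\n")
```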

6- Orders Distribution Based On Weekdays:

  • To study customer behavior, plotting the count of orders across weekdays and parts of the day was the perfect place to start.
  • This type of analysis can help produce more effective advertising plans.
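The table behind such a plot can be built with pd.crosstab: weekdays as rows, parts of the day as columns, and order counts in the cells. The timestamps are hypothetical, and the part-of-day bins are an assumption because the article does not give its cut points. Calling reindex keeps the weekdays in calendar order and fills empty cells with 0.

```python
import pandas as pd

# Hypothetical purchase timestamps (2018-08-06 was a Monday).
ts = pd.Series(pd.to_datetime([
    "2018-08-06 14:10:00",  # Monday afternoon
    "2018-08-06 15:30:00",  # Monday afternoon
    "2018-08-07 09:00:00",  # Tuesday morning
    "2018-08-11 21:45:00",  # Saturday night
]))

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
PARTS_OF_DAY = ["Dawn", "Morning", "Afternoon", "Night"]

weekday = ts.dt.day_name()
# Assumed part-of-day bins (right-inclusive hour intervals).
part_of_day = pd.cut(ts.dt.hour, bins=[-1, 5, 11, 17, 23], labels=PARTS_OF_DAY).astype(str)

# Rows: weekday, columns: part of day, cells: number of orders.
counts = pd.crosstab(weekday, part_of_day).reindex(
    index=WEEKDAYS, columns=PARTS_OF_DAY, fill_value=0
)
print(counts)
```

A heatmap or grouped bar chart of counts is one way to turn this table into the plot.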

7- Orders Comparison (Jan to Aug):

  • To see how the number of orders grew, I plotted the order counts for January to August of 2017 and of 2018.
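A sketch of the comparison, using hypothetical purchase timestamps. It keeps only January to August, so that both years cover the same months, then counts orders per year and derives the year-over-year growth. The growth formula (2018 / 2017 - 1) is a standard choice, not necessarily the author's.

```python
import pandas as pd

# Hypothetical purchase timestamps spanning both years.
ts = pd.Series(pd.to_datetime([
    "2017-01-15", "2017-03-02", "2017-09-10",
    "2018-02-20", "2018-05-05", "2018-07-30", "2018-08-31",
]))

# Keep only January-August (inclusive) so both years cover the same months.
jan_aug = ts[ts.dt.month.between(1, 8)]
orders_per_year = jan_aug.dt.year.value_counts().sort_index()

growth_pct = (orders_per_year.loc[2018] / orders_per_year.loc[2017] - 1) * 100
print(orders_per_year)
print(f"Jan-Aug growth: {growth_pct:.1f}%")
```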

8- Business Growth:
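The growth plot itself is an image, so here is one way such a monthly view could be computed. The sketch groups orders by purchase month and reports the number of orders and the total payment value per month, then finds the peak month by order count. It assumes each order has already been joined with its total payment_value from the payments table. The rows are hypothetical, and payment value measures revenue, not profit.

```python
import pandas as pd

# Hypothetical orders with a purchase timestamp and the total paid per order.
orders = pd.DataFrame({
    "order_purchase_timestamp": pd.to_datetime([
        "2017-10-03", "2017-11-20", "2017-11-24", "2017-11-25", "2018-01-10", "2018-01-11",
    ]),
    "payment_value": [120.0, 80.5, 99.9, 45.0, 150.0, 60.0],
})

# Group by calendar month (Period such as 2017-11).
month = orders["order_purchase_timestamp"].dt.to_period("M")
monthly = orders.groupby(month).agg(
    n_orders=("payment_value", "count"),
    revenue=("payment_value", "sum"),
)
peak_month = monthly["n_orders"].idxmax()
print(monthly)
print("Peak month by order count:", peak_month)
```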

Conclusions:

1- Most of our customers are located in São Paulo (SP), so we can advise the stores to invest in advertising in other communities and cities.

2- Most of our sellers are located in São Paulo (SP) too. To maximize profit, sellers could invest in more store branches.

3- We can point new sellers to the most popular product categories (bed-bath-table, sports-leisure, furniture-decor, health-beauty, housewares, computer-accessories).

4- Most of our customers pay by credit card, often in few installments. Sellers could therefore partner with banks to provide offers and discounts that encourage customers to buy more.

5- Most orders are placed early in the week, during the afternoons.

6- There is strong growth in orders between 2017 and 2018, and orders reached their peak in November 2017 (11/2017).
