What Can You Do with an E-Commerce Dataset (Part 1)

Muhammad Sifa’ul Rizky
Published in Curious with Data · 4 min read · Jun 25, 2020
Photo by Luke Chesser on Unsplash

Hi, and welcome back. Today I want to analyze a dataset that some people may find big and complicated: e-commerce data covering sales, sellers, ratings, and more. I call it complicated because it has so many columns and factors. I have split the analysis into several parts to make it easier to read and to show which parts matter most for drawing insights. I hope you will join me, because this data is really good and there are many insights hidden in it.
This first part is just an introduction: what the data is and a short description of it. Let’s start!

When starting a data science project, many people find that the hard part is finding the data they want to use. Sites like Kaggle and data.world help, but the data there is often almost clean, which does not reflect the real world, where data is usually messy. This dataset is different. Yes, I found it on Kaggle, but it is based on real data (with private details such as customer names removed, of course). So I think analyzing it makes for a good project.
The data comes from Olist, a company from Brazil, and contains 100,000 orders made from 2016 to 2018 at multiple marketplaces in Brazil. Because it comes from Brazil, some terms in the data are in Portuguese. One thing I love about this data is how rich it is: you get the order status, the payment method, the product itself, the seller and the buyer, and even geolocation as latitude and longitude coordinates. It will be a long analytics journey, and I suspect the real data is much bigger than this sample, which is around 120 MB.


I mentioned that the data is rich, covering order status, sellers, buyers, and so on. It is split across multiple datasets, as shown in this picture.

Olist Data Schema (Source: Kaggle)

From this schema, we can see there is a lot of joining work to do, and we need to figure out which tables to combine for a given question, such as customer reviews of the products they bought. For example, combining orders, customers, and sellers could show whether a seller is recommended because the products they sell are good. You may think of other insights too.
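To make the joining idea concrete, here is a minimal sketch in Python with pandas. The two tiny tables are made-up stand-ins, and the column names (order_id, seller_id, review_score) are my assumptions modeled on the schema picture, so check them against the real files before reusing this.

```python
import pandas as pd

# Made-up stand-in tables; column names are assumptions, not verified
# against the actual Kaggle files.
order_items = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o4"],
    "product_id": ["p1", "p2", "p1", "p3"],
    "seller_id": ["s1", "s1", "s2", "s2"],
})
reviews = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o4"],
    "review_score": [5, 4, 2, 3],
})


def seller_review_summary(order_items, reviews):
    """Join items to reviews on order_id, then average the score per seller."""
    joined = order_items.merge(reviews, on="order_id", how="inner")
    summary = (
        joined.groupby("seller_id")["review_score"]
        .agg(mean_score="mean", n_reviews="count")
        .reset_index()
        .sort_values("mean_score", ascending=False)
    )
    return summary


summary = seller_review_summary(order_items, reviews)
print(summary)
```

On real data, an order can hold several items from different sellers, so one review may count for more than one seller. That is a modeling choice to decide on before trusting the averages.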

So there are many possible analyses: CLV (Customer Lifetime Value), RFM (Recency, Frequency, Monetary), customer segmentation, and many more. I will cover some of them in the next parts, looking at how the analysis works and, of course, what insights we get for the business.
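As a small preview of what RFM means in practice, here is a rough sketch in Python with pandas. The orders table is made up, and the column names (customer_id, order_date, payment_value) and the snapshot date are assumptions for illustration only, not the layout of the real files.

```python
import pandas as pd

# Made-up orders; column names and values are illustrative assumptions.
orders = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c3"],
    "order_date": pd.to_datetime(
        ["2018-01-05", "2018-03-01", "2017-11-20", "2018-02-10"]
    ),
    "payment_value": [100.0, 50.0, 200.0, 30.0],
})


def rfm_table(orders, snapshot_date):
    """Build one row per customer with recency, frequency, and monetary."""
    grouped = orders.groupby("customer_id").agg(
        last_order=("order_date", "max"),      # most recent purchase
        frequency=("order_date", "count"),     # number of orders
        monetary=("payment_value", "sum"),     # total amount spent
    )
    # Recency: days between the snapshot date and the last purchase.
    grouped["recency_days"] = (snapshot_date - grouped["last_order"]).dt.days
    return grouped[["recency_days", "frequency", "monetary"]]


rfm = rfm_table(orders, pd.Timestamp("2018-03-31"))
print(rfm)
```

Before running something like this on the real data, check which column actually identifies a returning customer, because an ID that changes with every order would make every frequency equal to 1.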


Because this data comes from Brazil, you will find many unfamiliar product names, since the default language is Portuguese. Thankfully, the dataset provides translations for them, which is really helpful.
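Applying the translation could look something like the sketch below, written in Python with pandas. The tables are tiny made-up samples, and the column names (product_category_name, product_category_name_english) are assumptions on my side, so adjust them to whatever the real files use.

```python
import pandas as pd

# Made-up samples; column names and category values are assumptions.
products = pd.DataFrame({
    "product_id": ["p1", "p2", "p3"],
    "product_category_name": [
        "beleza_saude", "informatica_acessorios", "categoria_nova",
    ],
})
translation = pd.DataFrame({
    "product_category_name": ["beleza_saude", "informatica_acessorios"],
    "product_category_name_english": ["health_beauty", "computers_accessories"],
})


def add_english_category(products, translation):
    """Left-join the translation so products without a match are kept."""
    merged = products.merge(translation, on="product_category_name", how="left")
    # Fall back to the Portuguese name when no translation exists.
    merged["product_category_name_english"] = (
        merged["product_category_name_english"]
        .fillna(merged["product_category_name"])
    )
    return merged


translated = add_english_category(products, translation)
print(translated)
```

I used a left join here so that a category missing from the translation table is not silently dropped from the analysis.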

Sneak peek of the product translation from Kaggle

Additional Data

If you think this dataset is not enough, don’t worry: there is an additional dataset (Marketing Funnel by Olist). It is very nice because you can combine it with your earlier insights, analyze how the funnel performed, and track where people bought products from Olist. It is a lot of data, and with some creativity we can uncover many hidden insights.

Sneak peek of Marketing Funnel from Kaggle

I think this is enough for an introduction to the data. If you want to know more, check it out on Kaggle. You can take the analysis as far as you like because the data is diverse: you get information about sellers and the products they sell, about buyers and what kind of buyers they are, and about payment types and geolocation.

In the next part, I will analyze the data. From what perspective? Stay tuned, because it will be a long journey with this data. Follow me on Medium, follow my Curious with Data publication, connect with me on LinkedIn, and share this article if you find it helpful. Feel free to ask me questions on LinkedIn, and see you soon in another article. Keep learning!

