Nerd For Tech
Published in

Nerd For Tech

Order Profile Analysis

Shopping has consequences

Image: sabinevanerp on pixabay

Olist is a Brazilian department store platform which operates in the e-commerce segment (Software as Service). The service consists of management of the sales process between shopkeepers and clients, and also includes a customer satisfaction report. The advantages for the shopkeepers is a better market presence and transparent reputation metrics. The driver for the business is to attract more clients and raise the quality of the process. The motivation in this project is to support this effort.

Order- and SKU profile

A shopping bill represents a small example of an order profile. It tells about what items are bought in which quantities. This seems ordinary and not very meaningful. But things change if you would buy for your two or three neighbors in the same trip. Strategies to bundle and combine items and how to plan the path through the shop will form. That saves time and money. This is the beginning of what is happening in distribution centers and in supply chain in general. The difference is that on a bigger level warehouse processes, order fulfillment strategies and also technology choices are impacted.


The goal of this analysis is to extract information that supports Olist’s business objectives. These objectives are: Attracting more shopkeepers by enhancing the service and attracting more end-customers through a broader product spectrum and higher customer satisfaction. The analysis is also part of a business case analysis that is looking into expanding the service to also include logistics and warehousing.

In the previous article (An e-commerce Data Story) Olist’s general business trend was investigated. The Order Profile Analysis now gives a deeper understanding about order composition which is an essential part of developing a successful warehouse- and logistic process. E-commerce proves to be a very challenging business field in that aspect due to the dynamics and often unpredictable daily order volumes. There is a range of “traditional” analysis outputs which provide a logistics specialist with insights into the operation: Pareto diagram, ABC classification, XYZ classification, Units-per-order distribution, Lines-per-order distribution, category breakdown are the most common ones. These are important metrics because they directly provide insights about warehouse concept and system sizing. Beside this, it is in many cases useful to understand the data in a bigger context, often including the characteristics of users and their shopping choices.

The questions for this part are in detail:

  1. What conclusions can we draw from order- and SKU-profile?
  2. Conclusions for the fulfillment process and inventory
  3. Product category association


The article is summarizing the major findings of an analysis which can be found in GitHub. Following steps and methods are applied:

  • Data preparation and wrangling
  • Exploratory Data Analysis
  • Order Profile Analysis
  • SKU-Velocity Analysis (Pareto)
  • ABC-XYZ Classification
  • Conclusion


The provided data consists of historical order data from 2016 to 2018 and contains 100,000 orders. There are 8 files available. The below data model displays high level the references between these data- and lookup tables. For the analysis in this post not all data tables will be used. The data was generously provided by Olist under the license CC BY-NC-SA 4.0 and can be found here in Kaggle.

Image by Kaggle/ Olist

Data Preparation

One already order data table that was prepared in the recent analysis will be imported. As always, the complete code can be reviewed and downloaded here. Because working with clean data is a key requirement, I always take care of some basics:

  • Understand the content of the data
  • Check missing data
  • Check if there are multiple records
  • Check data formats
  • Question if the results you see make sense
  • Comment your steps
  • Meaningful naming of variables
Photo by Aleks Dorohovich on Unsplash

Exploratory Data Analysis (EDA)

The conventional order profile analysis for warehousing consists of a set of outputs which describe the order composition, item properties and a order frequency of products. These metrics can be directly translated into warehouse processes. Let’s review some of the aspects around the products in a brief exploratory data analysis)

Top product categories

There’s a narrow group of the top moving product categories. As a refresher, we looked into the top moving product categories before:

Conventional retailers are often focused on only one specific market segment. This can be fashion, food or general merchandize. This is the logic consequence of maintaining a streamlined and effective process chain which is tailored to segment-specific characteristics. Olist on the other hand, serves a very wide market with apparently no targeted segment. When strategizing about incorporating warehousing services it needs to be clarified if specific categories need to be included.

How many orders come from the top customers?

There are over 96,000 customers. The big majority of these only placed one order, so that the average order frequency is just above 1. But there is a small group of customers which placed more than one order with a maximum of 16 orders.

A small experiment for a potential recommendation algorithm or Market Basket Analysis: Finding product category similarities

Matrix factorization is used to calculate distance (similarity) between categories. The method is based on comparing all ordered product categories in a purchase order to all other orders. The greater the overlap between the orders the greater the similarity. This method is also used in recommender engines. If a customer buys a product, the algorithm recognizes similarities to another product which might be interesting to buy as well. The recommendation function will be a part of a web app which will be developed for this project at a later stage.

Step 1: Create a matrix that contains the categories on the index and unique orders as the columns. The process is generating a so-called “Sparse Matrix”.

Step 2: A function takes in one category row and calculates the dot-product with all other category rows. The result is a list of dot-products which can be sorted according value. We find out which products are “closer” to other products in terms of the order history.

Step 3: Enter a few categories and review the results.

The examples show that “sports_leisure” is similar to “health_beauty”, “housewares” and “watches_gifts”. There is a whole field of analytics which tries to make use of the Affinity Analysis theory, because knowing that specific products are “going along nicely” or even trigger each other’s purchase attractiveness can impact marketing strategies and even inventory management in the warehouse.

Conclusion EDA

  • There is a broad mix of 73 product categories
  • Within these categories there is a small group which mainly generates Olist’s revenue
  • With matrix factorization product similarities can be calculated
Photo by Sikai Gu on Unsplash

Order Profile Analysis

What conclusions can we draw from order- and SKU-profile?

The order profile describes the composition of orders. How many products are contained in orders? What is the order quantity? How are these metrics distributed across the total dataset? Extracting this information for all orders in the data leads to three distribution charts:

  • Units per Order
  • Units per Order Line
  • Order Lines per Order

The above charts show what is very typical for e-commerce: Small orders with mostly one or two items. At the same time there are almost 33,000 SKUs (products). This has some implications:

  • A Market Basket Analysis will likely not produce a very representative result.
  • The order fulfillment concept will deal with a large proportion of SIOs (Single-Item-Orders), where every piece is an order by itself. This opens picking strategies which group those orders into batches which then can be processed together.
  • The order-consolidation process area, where all items of an order are collected and bundled to be shipped together, will be very small or even not existing.
  • 79% SIO (single-item-orders), 93% SLO (single-line-orders).

SKU-Profile Analysis

Pareto Chart

The pareto chart visualizes how individual SKUs (or products) contribute to either the order volume or revenue. The values are displayed as percentages and cumulate to 100% in each axis. When sorted by rank a curve appears that is know as Pareto chart. Depending how flat or steep the curve is an expert can combine this with other data and translate this into business- and warehouse processes.

Pareto for revenue:

Pareto for order volume:

The pareto chart for Olist shows a 70/20 profile for order volume (75/20 for revenue). That means 20% of all SKUs generate 70% of the total order volume (and 75% of the total revenue). 20% is in this case round about 6,600 SKUs.

A 20/80 velocity profile is a typical observation in many businesses. Under other circumstances this would be a good base for a product allocation with potentially a variation of picking methods. Olist’s order volume is not very high which puts this into a different perspective.

ABC-XYZ Classification

Typically, in the process of generating the Pareto chart SKUs are classified based on their velocity (relative selling volume) into ABC classes. Velocity of a SKU appears also as the slope of the Pareto curve. The fast moving SKUs (“A” in the chart below) have a steeper slope than the other SKUs.

There are following ABC classes are defined:

  • AA: This class is not always used
  • A: Very Fast Movers
  • B: Fast Movers
  • C: Medium Movers
  • D: Slow Movers

There is the temptation to already conclude a warehouse concept from this chart. In the conventional warehouse planning the pareto curve was in the past often done with the moved cube metric (how much cubic meter is leaving the warehouse) and then sorted to specific pick location types based on individual cube: Pick from pallet for the fastest movers, pick from flow rack for medium movers, pick from shelf for slow movers etc. For modern warehouse planning this is, by far, not a sufficient method anymore. New technologies, such as fast shuttle systems, robotics and AI change the options and the expectation level on the resulting KPIs are different.

Photo by Sigmund on Unsplash

The ABC classification is only a snapshot in time. Products in a warehouse are coming and going. A product that is today a fast movers can be a slow mover next week. We see the effect with seasonal products, such as lawn- and garden or winter clothes. Extreme can this effect become with fashionable and fast changing product categories. Velocity changes can also be self-inflicted through sales events.

In summary, the Pareto curve can be volatile or, in other words misleading. This can have severe consequences for warehouse concepts that are depending on a stable SKU profile. To address this challenge the logistics expert needs to be familiar with the business he is analyzing and potentially also conduct a so-called XYZ classification.

In the XYZ classification process SKUs are analyzed in respect to how frequently a product is picked, which is not necessarily the same as how much of the product is picked. If a product is picked frequently, let’s say every day, it is considered an X-mover, and in most of the cases therefore also a fast moving product (A-mover). Sugar and milk are such products. But in rare instances a product can be classified as fast mover but turns out to also be a Z-mover which means it is extremely active for only a short period of time. Firework products or items that are on final sale can land in such a category. Typically, in shops we find those products on a pallet in the aisle. The ABC-XYZ classification is therefore a very powerful tool for warehousing and logistics and required to spot such rare outliers.

The ABC-XYZ profile for Olist resulted in the following chart:

The steep distribution as seen in the Pareto chart is highly volatile. Products are sold very infrequently. This outcome should not be surprising considering the business model.

The current order profile is dealing with high uncertainty when it comes to inventory management. A majority of the SKUs fall into the DZ- or CZ category.

Olist’s warehouse concept is considered an entry level solution. But for these cases there is nowadays more than just planning with fixed installed equipment. There are new emerging technologies that can rapidly adapt to a fast growing e-commerce warehouse operation. AGVs and AMRs is one example. AI-powered object tracking is another.

Image: LinkedIn

The image above shows an emerging field in warehousing: Object tracking paired with Artificial Intelligence. The implementation effort is minimal and very fast. The system will learn from what it sees and grows into an advisor by predicting resulting performance.


What conclusions can we draw from order- and SKU-profile?

With almost 80% single-item-orders and 93% of the orders with only one product Olist is a typical e-commerce business. Orders are small and the product spectrum is wide. Olist serves no specific vertical market. The Pareto curve is relatively steep but extremely volatile.

Conclusions for the fulfillment process and inventory

Inventory levels have a high uncertainty and will have risks of overstocking or stock underestimation. The hypothetical warehouse would need have changes in the product mix and a limited selection of product categories to function. The order process would initially be manually with the potential of adding cutting-edge entry-level technologies, like AI-supported object tracking.

Product category association

A small test in the EDA section experimented with product category similarity calculation. Based on the order profile it can be assumed that the results will have not a high grade of precision. But it is a starting point for further investigation.

Next steps

The order structure provides a glance into the order composition and SKU-velocity profile. The results suggest to take a closer look at the customer side to also understand priorities, and geodemographic conditions and segmentation the could further add detail to the business case analysis.

Photo by Kats Weil on Unsplash

Thank you

I hope you found this article interesting. My background is engineering and warehousing and relatively new to Data Science. I’m therefore open and thankful for any kind of advice or suggestion.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store