Tracking Inefficient Processes in E-commerce with Data Science

Emerson Santos
Published in LatinXinAI
Mar 3, 2024

Digital commerce has markedly disrupted traditional brick-and-mortar retail by shifting the sales channel from physical to digital. This shift boosted product consumption, made purchasing easier, and gave consumers more information about what they are buying.

Companies with digital sales channels now learn about the user's experience with the company and with the purchased product much faster and more directly, through the review tool. In physical retail, the main gauge of customer satisfaction is the repeat purchase. In e-commerce, consumers can give detailed feedback on the positive and negative aspects of their experience, and the company, in turn, can analyze this data with data science to improve both its products and the processes behind its online sales.

Case study

The dataset contains 100 thousand orders from a Brazilian e-commerce company, collected between 2016 and 2018. It covers the main entities of a digital commerce ecosystem: products, orders, customers, sellers, reviews, payment methods, and geolocation.

Analysis

As noted above, consumer feedback collected through the review tool provides valuable information about the user experience in the online purchasing process. The tool captures two key pieces of information about each order:

1. Rating Score: User satisfaction rating using a scale of 1 to 5, ranging from “very bad” to “very good”;

2. Review Comment: User comment on aspects such as product quality, delivery and company support.

As a first step, the histogram in Figure 1 shows the proportion of the 100 thousand orders that fall at each level of the rating scale.

59% of orders received the maximum score (5), indicating full satisfaction. This means that in the remaining 41% of orders, around 41 thousand orders between 2016 and 2018, consumers reported some degree of dissatisfaction with some aspect of the purchase: the purchasing process, the product, delivery, support, or something similar.
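A minimal pandas sketch of how the proportions behind Figure 1 and the "maximum score vs. everything else" split could be computed. The reviews table, the `review_score` column name, and the ten scores are made-up assumptions, so the printed shares will not match the article's 59%/41%.

```python
import pandas as pd

# Hypothetical reviews table: one row per order, with the 1-5 rating
# stored in a column assumed to be named "review_score".
reviews = pd.DataFrame({"review_score": [5, 5, 5, 4, 3, 5, 1, 2, 5, 4]})

# Proportion of orders at each rating level (the data behind a histogram
# like Figure 1), ordered from score 1 to score 5.
score_share = reviews["review_score"].value_counts(normalize=True).sort_index()
print(score_share)

# Share of orders that did NOT receive the maximum score, i.e. orders
# with some degree of dissatisfaction.
dissatisfied_share = (reviews["review_score"] < 5).mean()
print(f"Orders below 5: {dissatisfied_share:.0%}")
```

On the real data, the same two lines would yield the full distribution and the 41% figure discussed above.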

Because the Rating Score only shows the level of consumer satisfaction, the company must also analyze the Review Comments of users who reported some dissatisfaction; this is how it can understand the main factors behind that dissatisfaction and correct them. To this end, the word cloud in Figure 2 was generated from reviews with a rating score below 4: the more frequent a word, the larger the font in which it is plotted.
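The counting step behind a word cloud can be sketched with the standard library alone. This is not the author's code: the reviews, the tiny stopword list, and the linear font-size mapping are illustrative assumptions (word cloud libraries apply their own scaling and layout on top of these frequencies).

```python
import re
from collections import Counter

# Hypothetical reviews: (rating score, comment). Real comments in the dataset
# are in Portuguese; these examples are made up for illustration.
reviews = [
    (1, "Não recebi o produto"),
    (2, "Ainda não recebi, prazo vencido"),
    (5, "Produto ótimo, chegou rápido"),
    (3, "Produto veio com defeito"),
    (1, "Não recebi até agora"),
]

# Tiny illustrative stopword list (an assumption; a real analysis would use
# a complete Portuguese stopword list).
STOPWORDS = {"o", "a", "e", "de", "com", "até"}


def word_frequencies(rows, max_score=4):
    """Count words in comments whose rating score is below max_score."""
    counts = Counter()
    for score, text in rows:
        if score < max_score:
            # \w+ also matches accented letters, since str patterns are Unicode-aware.
            tokens = re.findall(r"\w+", text.lower())
            counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


def font_sizes(counts, min_size=10, max_size=60):
    """Map each word to a font size that grows linearly with its frequency."""
    top = max(counts.values())
    return {w: min_size + (max_size - min_size) * c / top for w, c in counts.items()}


freq = word_frequencies(reviews)
print(freq.most_common(3))  # [('não', 3), ('recebi', 3), ('produto', 2)]
sizes = font_sizes(freq)
```

Even on this toy input, "não" and "recebi" ("I didn't receive") dominate once the 5-star review is filtered out; counting word pairs instead of single words would surface such phrases directly.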

Translated from Portuguese, the most frequent phrase in these dissatisfied review comments is “I didn’t receive it”, which strongly suggests that the main driver of customer dissatisfaction is product delivery, due either to a missed deadline or to a lost product.
Since a product lost by the carrier is replaced by sending an identical one, both cases can be treated as a single question: was the delivery deadline met or not?

To gain more confidence that missed delivery deadlines cause customer dissatisfaction, the data must be examined from a different angle. Figure 3 plots the number of days each order arrived early or late, relative to the delivery date estimated by the company, against the average customer rating score. Rating scores fall steeply, roughly exponentially, as delays increase, which reinforces the previous hypothesis. The causes of late deliveries therefore need a closer look, so that the company knows where to intervene.
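The relationship in Figure 3 boils down to a date difference followed by a group-by. A minimal sketch, assuming hypothetical columns `estimated_delivery`, `delivered`, and `review_score` and five made-up orders:

```python
import pandas as pd

# Hypothetical order-level table (column names are assumptions):
#   estimated_delivery: delivery date promised by the company
#   delivered:          actual delivery date to the customer
#   review_score:       customer rating, 1 to 5
orders = pd.DataFrame({
    "estimated_delivery": pd.to_datetime(
        ["2018-01-10", "2018-01-10", "2018-01-12", "2018-01-15", "2018-01-15"]),
    "delivered": pd.to_datetime(
        ["2018-01-08", "2018-01-10", "2018-01-14", "2018-01-20", "2018-01-16"]),
    "review_score": [5, 5, 3, 1, 4],
})

# Positive values = days late, negative values = days early.
orders["delay_days"] = (orders["delivered"] - orders["estimated_delivery"]).dt.days

# Average rating for each delay value: the relationship plotted in Figure 3.
score_by_delay = orders.groupby("delay_days")["review_score"].mean()
print(score_by_delay)
```

With real data, plotting `score_by_delay` (index on the x-axis) reproduces the shape of the curve; bucketing long delays together helps when few orders are very late.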

The product delivery process consists of six steps:

1. Receiving the order;
2. Picking the product;
3. Issuing documents;
4. Packaging the product;
5. Handing the order over to the carrier;
6. Transporting the order to the customer's address.

Steps 1 to 5, which are shorter, are carried out by the company itself, while step 6, which takes longer, is carried out by a third-party carrier. The sum of the times of all six steps is called the OFCT (Order Fulfillment Cycle Time): the total time between purchase and delivery of the order. For late orders, it is therefore important to determine whether the failure came from the e-commerce company itself during order processing (steps 1 to 5) or from the carrier during delivery (step 6).
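Splitting OFCT into its in-house part (steps 1 to 5) and its carrier part (step 6) only needs three timestamps per order. In this sketch the column names `purchased`, `handed_to_carrier`, and `delivered` are hypothetical, as are the two orders.

```python
import pandas as pd

# Hypothetical timestamps per order (column names are assumptions):
#   purchased:         step 1, order received
#   handed_to_carrier: end of step 5, order handed to the carrier
#   delivered:         end of step 6, order at the customer's address
orders = pd.DataFrame({
    "purchased": pd.to_datetime(["2018-03-01 10:00", "2018-03-01 12:00"]),
    "handed_to_carrier": pd.to_datetime(["2018-03-01 20:00", "2018-03-04 12:00"]),
    "delivered": pd.to_datetime(["2018-03-05 10:00", "2018-03-09 12:00"]),
})

HOURS = 3600  # seconds per hour

# Steps 1-5 (in-house processing) and step 6 (third-party transport), in hours.
orders["processing_h"] = (
    orders["handed_to_carrier"] - orders["purchased"]).dt.total_seconds() / HOURS
orders["transport_h"] = (
    orders["delivered"] - orders["handed_to_carrier"]).dt.total_seconds() / HOURS

# OFCT: total time from purchase to delivery, the sum of both parts.
orders["ofct_h"] = orders["processing_h"] + orders["transport_h"]
print(orders[["processing_h", "transport_h", "ofct_h"]])
```

Here the second order spends 72 hours in processing before the carrier even receives it, which is exactly the kind of in-house delay the next paragraph looks for.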

Let us assume, as a reasonable benchmark, that a well-structured e-commerce operation processes orders within 24 hours. The distribution and pie chart of processing times for late orders in Figure 4 show that 83% of late orders exceeded this 24-hour target, which indicates that the order processing steps are being executed inefficiently and require intervention from the company.
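The 83% figure is the share of late orders whose processing time exceeds the 24-hour benchmark. A sketch of that filter-then-average step, with hypothetical `delay_days` and `processing_h` columns (as in the previous sketches) and made-up values:

```python
import pandas as pd

PROCESSING_TARGET_H = 24  # benchmark assumed in the text

# Hypothetical orders: days late (positive) or early (non-positive),
# and in-house processing time in hours.
orders = pd.DataFrame({
    "delay_days": [3, 0, 5, -1, 2, 1],
    "processing_h": [30, 50, 10, 40, 48, 26],
})

# Keep only late orders, then measure how many of them missed the target.
late = orders[orders["delay_days"] > 0]
share_slow = (late["processing_h"] > PROCESSING_TARGET_H).mean()
print(f"Late orders processed in more than 24 h: {share_slow:.0%}")
```

Note that the on-time orders with slow processing (rows 2 and 4) are excluded on purpose: the question is what went wrong in orders that actually arrived late.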

On the transport side, mathematical modeling was used to infer the delivery time the carrier had been given for each order, which made it possible to check whether the carrier met it. The distribution and pie chart in Figure 5 show that the carrier missed the stipulated delivery time in 65% of late orders.
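The article does not describe the model used to infer the carrier's deadline, so the following is only one possible stand-in, not the author's method. It assumes the carrier's allowed transit time equals the total time the company promised (purchase to estimated delivery) minus a 24-hour processing allowance; all columns and dates are hypothetical.

```python
import pandas as pd

# ASSUMPTION (not from the article): the carrier's allowed transit time is
# the promised time (purchase -> estimated delivery) minus a 24 h allowance
# reserved for in-house processing.
PROCESSING_ALLOWANCE = pd.Timedelta(hours=24)

orders = pd.DataFrame({
    "purchased": pd.to_datetime(["2018-05-01", "2018-05-01", "2018-05-02"]),
    "handed_to_carrier": pd.to_datetime(["2018-05-02", "2018-05-04", "2018-05-03"]),
    "delivered": pd.to_datetime(["2018-05-12", "2018-05-09", "2018-05-08"]),
    "estimated_delivery": pd.to_datetime(["2018-05-10", "2018-05-08", "2018-05-10"]),
})

allowed_transit = orders["estimated_delivery"] - orders["purchased"] - PROCESSING_ALLOWANCE
actual_transit = orders["delivered"] - orders["handed_to_carrier"]

# The carrier failed if it took longer than its (inferred) allowed window.
orders["carrier_late"] = actual_transit > allowed_transit

# Share of late orders in which the carrier also missed its own deadline.
late = orders[orders["delivered"] > orders["estimated_delivery"]]
share_carrier_late = late["carrier_late"].mean()
print(f"Carrier missed its deadline in {share_carrier_late:.0%} of late orders")
```

The second order shows why the split matters: it arrived late even though the carrier delivered within its window, because the company took three days to hand it over.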

It can therefore be concluded that late orders stem, to different degrees, both from the company’s inefficiency in the order processing steps and from the carrier’s failure to meet the stipulated deadlines.

However, since a company’s human and financial resources for solving problems are limited, it is important to build a Pareto diagram to identify which of the two factors, once resolved, would reduce the number of late orders the most, and should therefore receive the highest priority.

As the Pareto diagram in Figure 6 shows, resolving the processing time factor alone would reduce the number of late orders by 29%, and resolving the transport time factor alone would reduce it by 19%. Resolving both factors together, however, would eliminate late orders entirely (a 100% reduction).
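The article does not show how the per-factor reductions were computed. One plausible counterfactual rule, assumed here, is that fixing a single factor only rescues late orders where that factor was the sole cause, while orders where both failed stay late until both are fixed. Under that rule, individual reductions add up to less than 100% while the joint fix reaches 100%, which matches the pattern in Figure 6. The flags below are made up.

```python
import pandas as pd

# Hypothetical late orders, flagged with the two causes discussed above
# (slow in-house processing, carrier missing its own deadline).
late = pd.DataFrame({
    "slow_processing": [True, True, True, True, True, True, True, False],
    "carrier_late":    [True, True, True, True, False, False, False, True],
})


def cause(row):
    """Label each late order by which factor(s) failed."""
    if row["slow_processing"] and row["carrier_late"]:
        return "both"
    if row["slow_processing"]:
        return "processing only"
    if row["carrier_late"]:
        return "transport only"
    return "neither"


late["cause"] = late.apply(cause, axis=1)

# Pareto table: causes sorted by share of late orders, with a running total.
share = late["cause"].value_counts(normalize=True)
pareto = pd.DataFrame({"share": share, "cumulative": share.cumsum()})
print(pareto)

# ASSUMPTION: fixing one factor only rescues orders where it was the sole
# cause; orders where both failed stay late unless both are fixed.
fix_processing = (late["cause"] == "processing only").mean()
fix_transport = (late["cause"] == "transport only").mean()
fix_both = (late["cause"] != "neither").mean()
print(fix_processing, fix_transport, fix_both)
```

A more detailed alternative would cap each order's processing time at 24 hours (or its transit time at the carrier's window), recompute the delivery date, and count how many orders become on time.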

Conclusion

Therefore, to reduce the main source of dissatisfaction among e-commerce customers, the company should first improve its order processing steps so that orders are processed within 24 hours. Next, to address transport delays, the company could hire another third-party carrier that meets the stipulated transport deadlines for most orders. The result would be greater customer satisfaction, a higher average order rating and, consequently, a boost in e-commerce sales.

I hope this article has provided you with some value. Feel free to leave suggestions or comments below or find me on LinkedIn. The Python code used to build and populate the database, as well as to generate the analyses discussed here, is available on my GitHub.

Emerson Santos


As a qualified Data Scientist and Engineer with experience in solving challenging problems, I propose innovative and creative solutions using Data Science.