Measure Logistics Performance in Ecommerce with R example

Published in

Data Room

4 min readJul 12, 2020

( From https://www.pexels.com/, Creator: Tom Fisk)

Logistics in Ecommerce

Logistics plays the most important part of the Ecommerce world. If customers don’t receive the packages on time, the platforms will lose customers’ confidence and trust.

In Taiwan, The most attractive selling point of PChome, an eCommerce platform, is delivering the goods in 24 hours. How good its logistic service is!

How to measure logistics performances? First, find which order is late on delivery time.

Below using the dataset from Shopee code league 2020 to demonstrate.

Define the Problems

How to define the delivery days?

The delivery days vary from the routes between buyers’ (Destination) and sellers’ (Origin) regions.

Below we use SLA (Service Level Agreements) to represent.

(The reference is from Shopee code league 2020)

2. Working days are from Monday to Saturday except for public holidays.

In this example, we have four holidays during this period.

a. 2020–03–08 (Sunday)

b. 2020–03–25 (Wednesday)

c. 2020–03–30 (Monday)

d. 2020–03–31 (Tuesday)

3. The pickup packages dates are defined as Day 0.

4. If the customers’ don’t receive the packages on 1st attempt delivery, the 2nd attempt delivery should be finished in 3 working days.

About Dataset

This dataset is from Shopee code league 2020.

A glimpse of the dataset:

Volume: 3,176,313 observevations

Columns: orderid, pick (the date of receiving goods), X1st_deliver_attempt (the date of the first attempt delivery), X2nd_deliver_attempt(the date of the second attempt delivery), buyeraddress, selleraddress)

Timezone: Asia, Manila, GMT+8

The target result:

Columns: orderid, is_late ( 1 : the delivery is late, 0 : the delivery in on time)

R Example

# Create a empty dataframelate_delivery <- data.frame()# Define the holidaysholidays <- c('2020-03-08', '2020-03-25','2020-03-30','2020-03-31')# Calculate how many delivery days of each order
 
day_diff <- function(x, y) {
  # If there is no second attempt delivery, just return NA.  if(!is.na(y)){    # List out each date between pickup date and 1st delivery date. 
    # Removed the first day.
    # Time columns are only left with date info.(Regardless of more than or within 24 hours, the date is different count as a day)
    # Defining the timezone to prevent the confusions    day <- seq(as.Date(x, tz='Asia/Manila'),as.Date(y, tz='Asia/Manila'),by='days')[-1]    # Convert the format of the date to the same format of holidays
    date <- format(day,"%Y-%m-%d")    # If the days aren't Sunday and not holidays, count the day as a working day
    return(sum(!format(day, "%u") %in% "7" & !date %in% holidays))   }else return(NA)
}# The addresses are left only with city informationcity <- function(x){
  return (tail(strsplit(x, " ")[[1]],1))
}# The volume of the dataset is big, divide the dataset into multiple baches to manipulate easily.for (i in seq(0,3176313,500000)){
  
  # Read the data  data <- read.csv('/Users/chenlyn/Desktop/Logistics/delivery_orders_march.csv', skip = i, nrows=500000, stringsAsFactors = FALSE)  # Set the names of columns  colnames(data) <-c("orderid", "pick", "X1st_deliver_attempt", "X2nd_deliver_attempt", "buyeraddress", "selleraddress")
  
  # Convert the time format to readable time format   data[,'pick'] <- as.POSIXct(as.numeric(as.character(data[, 'pick'])),origin="1970-01-01",tz='Asia/Manila')  data[,'X1st_deliver_attempt'] <- as.POSIXct(as.numeric(as.character(data[, 'X1st_deliver_attempt'])),origin="1970-01-01",tz='Asia/Manila')  data[,'X2nd_deliver_attempt'] <- as.POSIXct(as.numeric(as.character(data[, 'X2nd_deliver_attempt'])),origin="1970-01-01",tz='Asia/Manila')
  
  # The addresses are left only with city information  data$buyeraddress <- sapply(data$buyeraddress,city)
  data$selleraddress <- sapply(data$selleraddress,city)# Add a SLA column depending on the reference
  data[which(grepl('Visayas|Mindanao',data$buyeraddress,ignore.case=TRUE)),'SLA'] <- 7
  data[which(grepl('Visayas|Mindanao',data$selleraddress,ignore.case=TRUE)),'SLA'] <- 7data[which(grepl('Luzon',data$buyeraddress,ignore.case=TRUE) & (grepl('Manila|Luzon',data$selleraddress,ignore.case=TRUE))),'SLA'] <- 5data[which(grepl('Manila',data$buyeraddress,ignore.case=TRUE) & (grepl('Luzon',data$selleraddress,ignore.case=TRUE))),'SLA'] <- 5data[which(grepl('Manila',data$buyeraddress,ignore.case=TRUE) & (grepl('Manila',data$selleraddress,ignore.case=TRUE))),'SLA'] <- 3
  
  # Remove address information to manipulate the date more efficiently  data <- data[,c("orderid", "pick", "X1st_deliver_attempt", "X2nd_deliver_attempt", 'SLA')]
  
  # Calculate how many delivery days of each order  data[,'Timediff'] <- mapply(day_diff,x= data$pick, y = data$X1st_deliver_attempt)
  data[,'2nd-Timediff'] <- mapply(day_diff,x= data$X1st_deliver_attempt, y = data$X2nd_deliver_attempt)
  
  # Decide if the order is late depending on SLA and less than 3 working days between 1st and 2nd attempt delivery dates  data[,'is_late'] <- ifelse(data$Timediff<= data$SLA & (data$`2nd-Timediff`<= 3 | is.na(data$`2nd-Timediff`)),0,1)
  
  # Remain the desired column  data <- data[,c('orderid','is_late')]
  
  # Combine the batches of data together  late_delivery <- rbind(late_delivery, data)
}# Export the data to csv format and prevent data from being shown with exponential numberwrite.csv(format(late_delivery,digits=20), '/Users/chenlyn/Desktop/Logistics/submission.csv', row.names = FALSE)

The final result is score 1.0, but how to optimize efficiency is the next problem.

Thanks for reading this article.

If there is any suggestion, it would be appreciated.

Measure Logistics Performance in Ecommerce with R example

Logistics in Ecommerce

Define the Problems

About Dataset

R Example

Written by Lyn Chen