Measure Logistics Performance in Ecommerce with R example

Lyn Chen
Data Room
Published in
4 min readJul 12, 2020
( From https://www.pexels.com/, Creator: Tom Fisk)

Logistics in Ecommerce

Logistics plays the most important part of the Ecommerce world. If customers don’t receive the packages on time, the platforms will lose customers’ confidence and trust.

In Taiwan, The most attractive selling point of PChome, an eCommerce platform, is delivering the goods in 24 hours. How good its logistic service is!

How to measure logistics performances? First, find which order is late on delivery time.

Below using the dataset from Shopee code league 2020 to demonstrate.

Define the Problems

How to define the delivery days?

  1. The delivery days vary from the routes between buyers’ (Destination) and sellers’ (Origin) regions.

Below we use SLA (Service Level Agreements) to represent.

(The reference is from Shopee code league 2020)

2. Working days are from Monday to Saturday except for public holidays.

In this example, we have four holidays during this period.

a. 2020–03–08 (Sunday)

b. 2020–03–25 (Wednesday)

c. 2020–03–30 (Monday)

d. 2020–03–31 (Tuesday)

3. The pickup packages dates are defined as Day 0.

4. If the customers’ don’t receive the packages on 1st attempt delivery, the 2nd attempt delivery should be finished in 3 working days.

About Dataset

This dataset is from Shopee code league 2020.

A glimpse of the dataset:

Volume: 3,176,313 observevations

Columns: orderid, pick (the date of receiving goods), X1st_deliver_attempt (the date of the first attempt delivery), X2nd_deliver_attempt(the date of the second attempt delivery), buyeraddress, selleraddress)

Timezone: Asia, Manila, GMT+8

The target result:

Columns: orderid, is_late ( 1 : the delivery is late, 0 : the delivery in on time)

R Example

# Create a empty dataframelate_delivery <- data.frame()# Define the holidaysholidays <- c('2020-03-08', '2020-03-25','2020-03-30','2020-03-31')# Calculate how many delivery days of each order

day_diff <- function(x, y) {
# If there is no second attempt delivery, just return NA.
if(!is.na(y)){ # List out each date between pickup date and 1st delivery date.
# Removed the first day.
# Time columns are only left with date info.(Regardless of more than or within 24 hours, the date is different count as a day)
# Defining the timezone to prevent the confusions
day <- seq(as.Date(x, tz='Asia/Manila'),as.Date(y, tz='Asia/Manila'),by='days')[-1] # Convert the format of the date to the same format of holidays
date <- format(day,"%Y-%m-%d")
# If the days aren't Sunday and not holidays, count the day as a working day
return(sum(!format(day, "%u") %in% "7" & !date %in% holidays))
}else return(NA)
}
# The addresses are left only with city informationcity <- function(x){
return (tail(strsplit(x, " ")[[1]],1))
}
# The volume of the dataset is big, divide the dataset into multiple baches to manipulate easily.for (i in seq(0,3176313,500000)){

# Read the data
data <- read.csv('/Users/chenlyn/Desktop/Logistics/delivery_orders_march.csv', skip = i, nrows=500000, stringsAsFactors = FALSE) # Set the names of columns colnames(data) <-c("orderid", "pick", "X1st_deliver_attempt", "X2nd_deliver_attempt", "buyeraddress", "selleraddress")

# Convert the time format to readable time format
data[,'pick'] <- as.POSIXct(as.numeric(as.character(data[, 'pick'])),origin="1970-01-01",tz='Asia/Manila') data[,'X1st_deliver_attempt'] <- as.POSIXct(as.numeric(as.character(data[, 'X1st_deliver_attempt'])),origin="1970-01-01",tz='Asia/Manila') data[,'X2nd_deliver_attempt'] <- as.POSIXct(as.numeric(as.character(data[, 'X2nd_deliver_attempt'])),origin="1970-01-01",tz='Asia/Manila')

# The addresses are left only with city information
data$buyeraddress <- sapply(data$buyeraddress,city)
data$selleraddress <- sapply(data$selleraddress,city)
# Add a SLA column depending on the reference
data[which(grepl('Visayas|Mindanao',data$buyeraddress,ignore.case=TRUE)),'SLA'] <- 7
data[which(grepl('Visayas|Mindanao',data$selleraddress,ignore.case=TRUE)),'SLA'] <- 7
data[which(grepl('Luzon',data$buyeraddress,ignore.case=TRUE) & (grepl('Manila|Luzon',data$selleraddress,ignore.case=TRUE))),'SLA'] <- 5data[which(grepl('Manila',data$buyeraddress,ignore.case=TRUE) & (grepl('Luzon',data$selleraddress,ignore.case=TRUE))),'SLA'] <- 5data[which(grepl('Manila',data$buyeraddress,ignore.case=TRUE) & (grepl('Manila',data$selleraddress,ignore.case=TRUE))),'SLA'] <- 3

# Remove address information to manipulate the date more efficiently
data <- data[,c("orderid", "pick", "X1st_deliver_attempt", "X2nd_deliver_attempt", 'SLA')]

# Calculate how many delivery days of each order
data[,'Timediff'] <- mapply(day_diff,x= data$pick, y = data$X1st_deliver_attempt)
data[,'2nd-Timediff'] <- mapply(day_diff,x= data$X1st_deliver_attempt, y = data$X2nd_deliver_attempt)

# Decide if the order is late depending on SLA and less than 3 working days between 1st and 2nd attempt delivery dates
data[,'is_late'] <- ifelse(data$Timediff<= data$SLA & (data$`2nd-Timediff`<= 3 | is.na(data$`2nd-Timediff`)),0,1)

# Remain the desired column
data <- data[,c('orderid','is_late')]

# Combine the batches of data together
late_delivery <- rbind(late_delivery, data)
}
# Export the data to csv format and prevent data from being shown with exponential numberwrite.csv(format(late_delivery,digits=20), '/Users/chenlyn/Desktop/Logistics/submission.csv', row.names = FALSE)

The final result is score 1.0, but how to optimize efficiency is the next problem.

Thanks for reading this article.

If there is any suggestion, it would be appreciated.

--

--