Analyzing BigCommerce Data with R

Alex Levashov
BigCommerce Developer Blog
Nov 19, 2020

In this blog post, we will explore how to extract transaction data from a BigCommerce store through its REST API and use it for MBA (Market Basket Analysis) in R.

About R

R is a programming language used in data science and statistics. It is open-source, free to use, and has an extensive ecosystem of libraries for a wide range of business and scientific use cases.

Market Basket Analysis

MBA (Market Basket Analysis) helps BigCommerce store owners find out which products sell well together. The results of the analysis can be used in initiatives to increase AOV (average order value).

Examples of such initiatives:

  • Upselling and cross-selling on product and shopping cart pages
  • Automated upselling emails
  • Making product bundles (can be done via pick-lists in BigCommerce)

We will use the R package arules, created by Michael Hahsler, for the analysis.

Getting Started

  1. Install R and RStudio. RStudio is not required, but it is a good IDE for working with R and is highly recommended. It also has an open-source edition that is free to use.
  2. Obtain BigCommerce REST API keys. The process is described in the BigCommerce developer documentation. Then save the credentials to a separate file with a structure like this:
client_id <- "YOURID"
client_secret <- "YOURSECRET"
access_token <- "YOURTOKEN"
store_hash <- "YOURSTOREHASH"

We saved these values in the file "api-keys.R". To work with the API we'll use two R packages, httr and jsonlite. We'll also use tidyverse and data.table for data wrangling.
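You may prefer not to keep secrets in a sourced R file. In that case, the same four values can come from environment variables. Below is a minimal sketch built around a hypothetical helper, load_bc_credentials(). It assumes the variable names BC_CLIENT_ID, BC_CLIENT_SECRET, BC_ACCESS_TOKEN and BC_STORE_HASH, which are our own choice and not a BigCommerce convention:

```r
# Hypothetical helper: read BigCommerce credentials from environment variables.
# The BC_* variable names are an assumption, not a BigCommerce convention.
load_bc_credentials <- function() {
  vars <- c(client_id     = "BC_CLIENT_ID",
            client_secret = "BC_CLIENT_SECRET",
            access_token  = "BC_ACCESS_TOKEN",
            store_hash    = "BC_STORE_HASH")
  values <- Sys.getenv(vars)          # "" for any variable that is not set
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  names(values) <- names(vars)        # client_id, client_secret, ...
  as.list(values)
}
```

Calling creds <- load_bc_credentials() returns a list. Its elements, such as creds$store_hash and creds$access_token, can stand in for the variables defined in api-keys.R. If a variable is missing, the helper fails early and names it, which is easier to debug than an authentication error from the API.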

Connect to BigCommerce API from R

Load the required libraries and define the parameters for the API connection.

library(httr)
library(tidyverse)
library(jsonlite)
library(arules)
library(data.table)

Now load the BigCommerce API connection details. Read-only API access is enough.

source("api-keys.R")
base_url <- paste0("https://api.bigcommerce.com/stores/", store_hash)
# auth header creation
auth_header <- c("X-Auth-Client"=client_id, "X-Auth-Token"=access_token)

Let's now make a simple request to test the API connection: get the store details from the "store" endpoint. The response is JSON, which httr's content() converts into an R list.

url <- paste0(base_url, "/v2/store")
store_info <- content(GET(url, add_headers(auth_header)), encoding = "UTF-8")
# check the results, get store name
store_info$name
## [1] "Magenable Sandbox Store"

Getting Orders Data

To get order data we'll use the orders endpoint of the BigCommerce REST v2 API.

# note: this returns only the first page of orders;
# for a real store you need paging to look through all orders
url <- paste0(base_url, "/v2/orders")
resp <- GET(url, add_headers(auth_header))
cont <- content(resp, as = "text", encoding = "UTF-8")
c1 <- fromJSON(cont)
# disable scientific number formatting to see IDs correctly
options(scipen = 999)

The result is a data frame with 68 columns containing extensive information about each order: its time, subtotal, discounts, total, a reference to the customer, and so on. That is more than we need for MBA. More importantly, it contains no details of the products ordered, so we need an extra call. The products in a specific order are available through an individual request to that order's products resource, whose URL is stored in c1$products$url.

## getting products for a specific order
## get the products URL of the first order
url <- c1$products$url[1]
order <- content(GET(url, add_headers(auth_header)), encoding = "UTF-8")
# check the results, product SKU
order[[1]][["sku"]]
## [1] "SKU-113-1.25L"

Now we can collect the products of every order by looping through all orders.

### get all products sold, keeping only the required fields
prod_colnames <- c("order_id", "product_id", "sku", "quantity")
all_orders <- data.frame(matrix(nrow = 0, ncol = 4))
colnames(all_orders) <- prod_colnames
for (i in seq_len(nrow(c1))) {
  # each order links to its own products resource
  url <- c1$products$url[i]
  products <- content(GET(url, add_headers(auth_header)), encoding = "UTF-8")
  # seq_along() skips orders without products
  # (1:n_prod would iterate over 1 and 0 and fail)
  for (j in seq_along(products)) {
    product_row <- data.frame(order_id   = products[[j]][["order_id"]],
                              product_id = products[[j]][["product_id"]],
                              sku        = products[[j]][["sku"]],
                              quantity   = products[[j]][["quantity"]])
    all_orders <- rbind(all_orders, product_row)
  }
}
## drop rows without a SKU or with product_id 0
all_orders <- all_orders %>% filter(sku != "", product_id != 0)
## review
head(all_orders)
##     order_id product_id           sku quantity
## 1        100        113 SKU-113-1.25L        1
## 2 1234000004        112     nmn-hamhs        1
## 3 1234000005        115     115-GR-XL        1
## 4 1234000006        115     115-GR-XL        1
## 5 1234000007        115     115-GR-XL        1
## 6 1234000007        113 SKU-113-1.25L        1
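The loop above appends to all_orders with rbind() one product at a time. As an alternative, a hypothetical helper, products_to_df(), can flatten the list that content() returns for one order into a data frame in a single step. This is a sketch. Like the loop, it assumes every product record has the order_id, product_id, sku and quantity fields:

```r
# Hypothetical helper: flatten one order's parsed products list (as returned
# by content()) into a data frame with the four columns used in the loop.
products_to_df <- function(products) {
  if (length(products) == 0) {
    return(data.frame(order_id = numeric(0), product_id = numeric(0),
                      sku = character(0), quantity = numeric(0)))
  }
  # Extract one field from every product record as a typed vector
  field <- function(name, type) vapply(products, function(p) p[[name]], type)
  data.frame(order_id   = field("order_id", numeric(1)),
             product_id = field("product_id", numeric(1)),
             sku        = field("sku", character(1)),
             quantity   = field("quantity", numeric(1)),
             stringsAsFactors = FALSE)
}
```

You can then combine the per-order data frames once, for example with all_orders <- do.call(rbind, lapply(c1$products$url, function(u) products_to_df(content(GET(u, add_headers(auth_header)), encoding = "UTF-8")))). If a field is missing (NULL), vapply() stops with an error. That surfaces unexpected records instead of silently dropping them.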

The resulting data frame contains one product per row, so an order with several products spans multiple rows with the same order_id. To run MBA with arules, we need to transform the data into baskets: one set of SKUs per order.

# converting to transaction format
order_baskets <- all_orders %>% group_by(order_id) %>%
summarise(basket = as.vector(list(sku)))
## `summarise()` ungrouping output (override with `.groups` argument)
transactions <- as(order_baskets$basket, "transactions")
inspect(transactions[7])
## items
## [1] {240-LV08,MS12-S-Blue}
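The dplyr pipeline above can also be written in base R with split(). Below is a minimal sketch using a hypothetical make_baskets() helper. The unique() call is our addition. In a transaction an item is either present or not, so unique() guards against the same SKU appearing on two lines of one order:

```r
# Hypothetical helper: one basket (character vector of SKUs) per order,
# built with base R only; duplicates within an order are dropped.
make_baskets <- function(order_id, sku) {
  baskets <- split(sku, order_id)   # named list, one element per order_id
  lapply(baskets, unique)
}
```

The resulting named list, for example make_baskets(all_orders$order_id, all_orders$sku), can be passed to as(..., "transactions") the same way as order_baskets$basket.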

Let's review how often each product was bought overall:

# Make Item Frequency Plot
library("RColorBrewer")
arules::itemFrequencyPlot(transactions,
topN=20,
col=brewer.pal(8,'Pastel2'),
main='Relative Item Frequency Plot',
type="relative",
ylab="Item Frequency (Relative)")
[Figure: Relative Item Frequency Plot]
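With type = "relative", the plot shows, for each item, the share of transactions that contain it. Below is a small base-R sketch of that quantity. It works on a list of SKU vectors such as order_baskets$basket, and the helper name relative_item_frequency() is our own:

```r
# Hypothetical helper: share of baskets that contain each item,
# sorted from most to least frequent.
relative_item_frequency <- function(baskets) {
  # Count each item at most once per basket
  counts <- table(unlist(lapply(baskets, unique)))
  freq <- as.numeric(counts) / length(baskets)
  names(freq) <- names(counts)
  sort(freq, decreasing = TRUE)
}
```

head(relative_item_frequency(order_baskets$basket), 20) should list the same top items as the 20 bars in the plot, up to ties.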

Running Analysis With Arules

Now the data from the BigCommerce store is in a format that arules can analyze.

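# Implementing the Apriori algorithm: keep rules whose items appear together
# in at least 0.5% of orders (support) and that hold in at least 25% of the
# orders containing the lhs (confidence)
rules <- apriori(transactions, parameter = list(support = 0.005, confidence = 0.25))
# Remove redundant rules
rules <- rules[!is.redundant(rules)]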
rules_dt <- data.table( lhs = labels( lhs(rules) ),
rhs = labels( rhs(rules) ),
quality(rules) )[ order(-lift), ]
head(rules_dt,5)

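The resulting table lists product combinations that are good candidates for selling together. lhs (left-hand side) is the product or set of products already in the basket. rhs (right-hand side) is the product the rule associates with it. support, confidence, coverage, and lift are the rule quality measures:

  • support is the share of orders that contain both sides
  • confidence is the share of orders with the lhs that also contain the rhs
  • coverage is the share of orders that contain the lhs
  • lift is the confidence divided by the share of orders that contain the rhs

For example, the first rule means that products with SKUs 240-LV08 and MS12-S-Blue are good for pairing.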
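To make these measures concrete, the sketch below computes them by hand for a single rule lhs => rhs over a list of baskets. rule_quality() is a hypothetical helper, not part of arules. It follows the standard definitions: support = P(lhs and rhs), coverage = P(lhs), confidence = support / coverage, and lift = confidence / P(rhs):

```r
# Hypothetical helper: quality measures of the rule lhs => rhs,
# computed directly from a list of baskets (character vectors of items).
rule_quality <- function(baskets, lhs, rhs) {
  n <- length(baskets)
  # TRUE for each basket that contains all of the given items
  has_all <- function(items) vapply(baskets, function(b) all(items %in% b), logical(1))
  in_lhs  <- has_all(lhs)
  in_rhs  <- has_all(rhs)
  in_both <- in_lhs & in_rhs
  support    <- sum(in_both) / n
  coverage   <- sum(in_lhs) / n                  # support of the lhs alone
  confidence <- support / coverage               # P(rhs | lhs)
  lift       <- confidence / (sum(in_rhs) / n)   # compared with P(rhs)
  c(support = support, confidence = confidence, coverage = coverage, lift = lift)
}
```

Take the four toy baskets {A, B}, {A}, {B, C}, {A, C}. For the rule A => B, the helper gives support 0.25, coverage 0.75, confidence 1/3 and lift 2/3. A lift below 1 means the items appear together less often than they would if purchases were independent. The rules worth acting on have lift above 1.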

Further Improvements And References

MBA is an extensive topic, and this blog post covers only a very basic analysis. You can learn more about MBA, along with more examples and visualizations, in this and that article, which inspired this post.

Also note that the code above isn't optimized for large data sets. At a minimum, don't forget to add logic that goes through more than one page of API results, so that you retrieve all orders and not just the first page.
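Below is a minimal sketch of such paging logic. fetch_all_pages() is a hypothetical helper that keeps requesting pages until one comes back empty or shorter than limit. The BigCommerce part, fetch_orders_page(), rests on two assumptions: that the v2 orders endpoint accepts page and limit query parameters, and that it returns 204 No Content past the last page. Check these details against the BigCommerce API reference before relying on them:

```r
# Hypothetical helper: request pages until the API runs out of results.
# fetch_page(page, limit) must return a data frame for that page, or NULL /
# a zero-row data frame once there are no more results.
fetch_all_pages <- function(fetch_page, limit = 250, max_pages = 1000,
                            combine = function(pages) do.call(rbind, pages)) {
  pages <- list()
  for (page in seq_len(max_pages)) {
    batch <- fetch_page(page, limit)
    if (is.null(batch) || NROW(batch) == 0) break
    pages[[length(pages) + 1]] <- batch
    if (NROW(batch) < limit) break   # a short page is assumed to be the last
  }
  if (length(pages) == 0) return(NULL)
  combine(pages)
}

# Assumed BigCommerce fetcher, reusing base_url and auth_header from above.
# Assumes the v2 orders endpoint takes page/limit query parameters (250 is
# assumed to be the maximum limit) and answers 204 No Content past the end.
fetch_orders_page <- function(page, limit) {
  resp <- httr::GET(paste0(base_url, "/v2/orders"),
                    httr::add_headers(auth_header),
                    query = list(page = page, limit = limit))
  httr::stop_for_status(resp)
  if (httr::status_code(resp) == 204) return(NULL)
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# c1 <- fetch_all_pages(fetch_orders_page, combine = jsonlite::rbind_pages)
```

The commented call passes combine = jsonlite::rbind_pages, which lets jsonlite fill in columns that are missing from some pages. The default do.call(rbind, ...) expects every page to have the same columns.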
