Analyzing BigCommerce Data with R

Published in BigCommerce Developer Blog, Nov 19, 2020

In this blog post, we will explore how to extract transaction data for MBA (Market Basket Analysis) from BigCommerce using the REST API and analyze it in R.

About R

R is a programming language used in data science and statistics. It is open-source, free to use, and has an extensive ecosystem of libraries for business and scientific use cases.

Market Basket Analysis

MBA (Market Basket Analysis) helps BigCommerce store owners find out which products sell well together. The results of the analysis can be used in initiatives to increase AOV (average order value).

Examples of such initiatives:

Upselling and cross-selling on product and shopping cart pages
Automated upselling emails
Making product bundles (can be done via pick-lists in BigCommerce)

We will use the R package arules, created by Michael Hahsler, for our analysis.

Getting Started

Install R and RStudio. RStudio is not required, but it is a good IDE for working with R and is highly recommended. It also has an open-source version that is free to use.
Obtain BigCommerce REST API keys. The process is described in the BigCommerce developer documentation. Then save the keys to a separate file with a structure like this:

client_id <- "YOURID"
client_secret <- "YOURSECRET"
access_token <- "YOURTOKEN"
store_hash <- "YOURSTOREHASH"

We saved the data above in the file "api-keys.R". To work with the API we’ll use two R packages, httr and jsonlite, plus tidyverse and data.table for data wrangling.

Connect to BigCommerce API from R

Load the required libraries and define the parameters for the API connection.

library(httr)
library(tidyverse)
library(jsonlite)
library(arules)
library(data.table)

Now load the BigCommerce API connection details. Read-only API access is enough.

source("api-keys.R")
base_url <- paste0("https://api.bigcommerce.com/stores/", store_hash)
# auth header creation
auth_header <- c("X-Auth-Client"=client_id, "X-Auth-Token"=access_token)

Let’s now make a very simple request to test the API connection: obtain the store details using the "store" endpoint. The response is JSON, which is converted into a list in R.

url <- paste0(base_url, "/v2/store")
store_info <- content(GET(url, add_headers(auth_header)), encoding = "UTF-8")
# check the results, get store name
store_info$name
## [1] "Magenable Sandbox Store"

Getting Orders Data

To get order data we’ll use the orders endpoint of the BigCommerce REST v2 API.

# note that for a real store you need to use paging to look through all orders
url <- paste0(base_url, "/v2/orders")
resp <- GET(url, add_headers(auth_header))
cont <- content(resp, as = "text", encoding = "UTF-8")
c1 <- fromJSON(cont)
# disable scientific number formatting to see IDs correctly
options(scipen=999)

The result is a dataframe with 68 columns that contains extensive information about each order: its time, sub-total, discounts, total, a reference to the customer, and so on. That’s more than we need for our MBA analysis, and it still lacks the details of the products ordered, so we need to make an extra call. The products included in a specific order are available through an individual call to the products URL of that order.

## getting products for specific order
## get URL from specific order
url <- c1$products$url[1]
order <- content(GET(url, add_headers(auth_header)), encoding = "UTF-8")
# check the results, product SKU
order[[1]][["sku"]]
## [1] "SKU-113-1.25L"

Now we can loop through all orders and collect the products ordered in each of them.

### get all products sold with only required data
 all_orders <- data.frame(matrix(nrow=0,ncol=4))
 prod_colnames <- c("order_id","product_id", "sku", "quantity")
 colnames(all_orders)<- c(prod_colnames)
 n <- nrow(c1)
 # seq_len() avoids iterating over c(1, 0) when there are no orders
 for (i in seq_len(n)){
   t <-data.frame(matrix(nrow=1,ncol=4))
   colnames(t) <- prod_colnames
   url <- c1$products$url[i]
   products <- content(GET(url,  add_headers(auth_header)))
   n_prod <- length(products)
   # seq_len() skips the inner loop when an order has no products
   for (j in seq_len(n_prod)){
     
     t$order_id <- products[[j]][["order_id"]]
     t$product_id <- products[[j]][["product_id"]]
     t$sku <- products[[j]][["sku"]]
     t$quantity <- products[[j]][["quantity"]]
     all_orders <- rbind(all_orders, t)  
   }
   
 }
 ## clean up empty sku
 all_orders <- all_orders %>% filter (sku!="", product_id!=0)
 ## review
 head(all_orders)
 ##     order_id product_id           sku quantity
 ## 1        100        113 SKU-113-1.25L        1
 ## 2 1234000004        112     nmn-hamhs        1
 ## 3 1234000005        115     115-GR-XL        1
 ## 4 1234000006        115     115-GR-XL        1
 ## 5 1234000007        115     115-GR-XL        1
 ## 6 1234000007        113 SKU-113-1.25L        1
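The loop above grows all_orders with rbind() one row at a time, which becomes slow as the number of orders increases. A minimal sketch of an alternative follows. It assumes each element of the parsed response is a list with the order_id, product_id, sku and quantity fields used above. The helper name products_to_df is hypothetical, and the sample list is made-up data standing in for the result of content(GET(...)).

```r
# Hypothetical helper: convert one parsed "order products" response
# (a list of lists) into a data frame with only the required columns.
products_to_df <- function(products) {
  fields <- c("order_id", "product_id", "sku", "quantity")
  if (length(products) == 0) {
    # Empty data frame with the expected columns
    return(data.frame(order_id = numeric(0), product_id = numeric(0),
                      sku = character(0), quantity = numeric(0),
                      stringsAsFactors = FALSE))
  }
  rows <- lapply(products, function(p) {
    # Missing or null fields become NA so every row has all four columns
    vals <- lapply(fields, function(f) if (is.null(p[[f]])) NA else p[[f]])
    names(vals) <- fields
    as.data.frame(vals, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up sample standing in for the parsed API response of one order
sample_products <- list(
  list(order_id = 100, product_id = 113, sku = "SKU-113-1.25L",
       quantity = 1, name = "Bottle"),
  list(order_id = 100, product_id = 115, sku = "115-GR-XL",
       quantity = 2, name = "Shirt")
)
order_df <- products_to_df(sample_products)
print(order_df)
```

With a helper like this, the per-order data frames could be collected in a list with lapply() and combined with a single do.call(rbind, ...) call at the end, instead of calling rbind() inside the loop.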

The resulting dataframe contains one product per row, so an order with more than one product has multiple rows with the same order id. To run MBA with arules we need to transform the data so that each order becomes a single basket of SKUs.

# converting to transaction format
 order_baskets <- all_orders %>% group_by(order_id) %>% 
   summarise(basket = as.vector(list(sku)))
 ## `summarise()` ungrouping output (override with `.groups` argument)
 transactions <- as(order_baskets$basket, "transactions")
 inspect(transactions[7])
 ##     items                 
 ## [1] {240-LV08,MS12-S-Blue}
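To see what the grouping step produces without calling the API, here is a small sketch on made-up rows. It uses base R's split() rather than the dplyr pipeline above, which yields the same kind of structure: one vector of SKUs per order. The toy_orders data and the SKU labels are invented for illustration.

```r
# Made-up long-format data: one row per product per order
toy_orders <- data.frame(
  order_id = c(1, 1, 2, 3, 3, 3),
  sku      = c("A", "B", "A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# One character vector of SKUs per order, named by order id
toy_baskets <- split(toy_orders$sku, toy_orders$order_id)
str(toy_baskets)

# With arules loaded, a list like this can be coerced in the same way
# as in the code above:
# toy_transactions <- as(toy_baskets, "transactions")
```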

Let’s review how often each product was bought overall.

# Make Item Frequency Plot
 library("RColorBrewer")
 arules::itemFrequencyPlot(transactions,
                           topN=20,
                           col=brewer.pal(8,'Pastel2'),
                           main='Relative Item Frequency Plot',
                           type="relative",
                           ylab="Item Frequency (Relative)")

Running Analysis With Arules

Now we have the data from the BigCommerce store in a format that can be used for analysis with arules.

# Implementing Apriori Algorithm
 rules <- apriori(transactions, parameter = list(support = 0.005, confidence = 0.25))
 # Remove redundant rules
 rules <- rules[!is.redundant(rules)]
 rules_dt <- data.table( lhs = labels( lhs(rules) ), 
                         rhs = labels( rhs(rules) ), 
                         quality(rules) )[ order(-lift), ]
 head(rules_dt,5)

The resulting table shows which products are good candidates for selling together. For each rule, lhs (left-hand side) is the product or set of products already in the basket, and rhs (right-hand side) is the product likely to be bought with it; support, confidence, coverage, and lift are the rule’s quality measures. For example, the first rule means that products with SKUs 240-LV08 and MS12-S-Blue are good for pairing.
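To make the rule measures more concrete, the sketch below computes them by hand for a single rule {A} => {B} on made-up baskets, using base R only. It follows the usual definitions: support is the share of baskets containing all items of the rule, confidence is that support divided by the support of the lhs, lift is confidence divided by the support of the rhs, and coverage is the support of the lhs. The baskets and the support_of helper are invented for illustration and are not part of arules.

```r
# Made-up baskets: each element is the set of SKUs in one order
baskets <- list(
  c("A", "B"),
  c("A"),
  c("A", "B", "C"),
  c("B", "C"),
  c("C")
)

# Share of baskets that contain every item in `items`
support_of <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

supp_a  <- support_of("A")           # 3 of 5 baskets
supp_b  <- support_of("B")           # 3 of 5 baskets
supp_ab <- support_of(c("A", "B"))   # 2 of 5 baskets

confidence <- supp_ab / supp_a       # share of A-baskets that also hold B
lift       <- confidence / supp_b    # above 1 means A and B co-occur more than by chance
coverage   <- supp_a                 # support of the left-hand side

print(c(support = supp_ab, confidence = confidence,
        coverage = coverage, lift = lift))
```

A lift above 1, as in this toy case, suggests the two products appear together more often than their individual frequencies would predict.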

Further Improvements And References

MBA is an extensive topic, and this blog post covers only a very basic analysis. You can learn more about MBA, with more interesting examples and visualizations, in this and that article, which were used as inspiration for this post.

Also note that the code above isn’t optimized for large data sets. At a minimum, don’t forget to implement the logic that goes through more than one page of API results, so that all orders are retrieved.
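A minimal sketch of such paging logic is shown below. The loop is generic: fetch_all_pages and its fetch_page argument are hypothetical names, and the function is exercised here with a mock fetcher and made-up data so it runs without API access. The commented-out fetcher assumes that the v2 orders endpoint accepts page and limit query parameters and returns status 204 once no orders remain; check this against the BigCommerce documentation before relying on it.

```r
# Generic paging loop. `fetch_page` is a hypothetical function that takes a
# page number and returns a data frame, or NULL when the page is empty.
fetch_all_pages <- function(fetch_page, max_pages = 1000) {
  pages <- list()
  for (page in seq_len(max_pages)) {
    chunk <- fetch_page(page)
    if (is.null(chunk) || nrow(chunk) == 0) break
    pages[[length(pages) + 1]] <- chunk
  }
  if (length(pages) == 0) return(data.frame())
  do.call(rbind, pages)
}

# Sketch of a real fetcher (not run here; the paging parameters and the
# 204 status on an empty page are assumptions):
# fetch_orders_page <- function(page) {
#   resp <- GET(paste0(base_url, "/v2/orders"), add_headers(auth_header),
#               query = list(page = page, limit = 50))
#   if (status_code(resp) == 204) return(NULL)
#   fromJSON(content(resp, as = "text", encoding = "UTF-8"))
# }

# Mock fetcher with made-up data: two non-empty pages, then nothing
mock_fetch <- function(page) {
  if (page > 2) return(NULL)
  data.frame(id = (page - 1) * 2 + 1:2, stringsAsFactors = FALSE)
}
all_mock_orders <- fetch_all_pages(mock_fetch)
print(all_mock_orders)
```

Real order pages contain nested columns (such as the products column used earlier), so combining them with plain rbind() may need extra care; jsonlite's rbind_pages() is intended for combining paged data frames and may be a better fit there.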