How to Extract Data from Walmart Open API?

Getting Started: Walmart Data Tutorial Lab #1 Using RStudio

Introduction

Walmart is the world’s largest retailer, and the Walmart Open API provides access to its extensive product catalog, enabling digital distribution partners to earn substantial affiliate revenue from customer referrals.

Walmart is an American multinational retail corporation that operates a chain of hypermarkets, discount department stores, and grocery stores. Headquartered in Bentonville, Arkansas, the company was founded by Sam Walton in 1962 and incorporated on October 31, 1969.

Walmart’s picture from the article “Birchbox in Talks With Retailers Including Walmart for Sale” by Reuters

Walmart’s Open API is currently in beta. Developers can work with a variety of APIs:

  • Lookup API: provides the price, availability, etc. of an item. Takes two arguments: item ID (required) and format (json or xml).
  • Reviews API: provides reviews of an item. Takes two arguments: item ID (required) and format (json or xml).
  • Search API: allows text search on the Walmart.com catalogue and returns matching items available for sale online. It accepts six arguments: search text query, format, categoryId, facet, facet.filter, and facet.range.
  • Value of the Day: provides the Value of the Day item on Walmart.com
  • Taxonomy API: exposes the category taxonomy used by Walmart.com to categorize items
  • Store Locator API: locates Walmart stores
  • Trending API: returns trending items on Walmart.com
  • Paginated API: new paginated API to fetch items
Snapshot of “Developer Console” to try out the API services

In this article, I show how to extract data from the Walmart Open API using RStudio and discuss research ideas that anyone can start analyzing.


Instructions for Registering for the Walmart API

  1. In the top right corner of the page, click “Register” (it’s next to “Sign In”).
  2. Register for an account. There are instructions for each step. It is important to give your new application a name (you can change it later if you like). Tick “Issue a new key for Product APIs” as well.

3. Here’s a snapshot of my application after successfully registering it.

Below your application is your API key. You must have an API key to run the code.
The Walmart API enforces a usage limit per application.

4. To make sure everything is working, you can use your API key in the “Developer Console” shown in the introduction section.

I tried querying “Ipad” in the Search API. The response status is 200, meaning that the request succeeded. The response body shows which variables the API returns.
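The same console query can be written as a request URL in R. The sketch below is only an illustration: `build_walmart_url` is a hypothetical helper (not part of any Walmart package), `YOUR_API_KEY` is a placeholder, and the base URL is the one used in the R code later in this tutorial.

```r
# Hypothetical helper: assembles a request URL for an endpoint such as
# "search" or "taxonomy" from named query parameters.
build_walmart_url <- function(endpoint, api_key, ...,
                              base = "http://api.walmartlabs.com/v1") {
  params <- c(list(...), list(apiKey = api_key))
  # Percent-encode each value so spaces and symbols survive in the query string
  encoded <- vapply(params,
                    function(v) utils::URLencode(as.character(v), reserved = TRUE),
                    character(1))
  query <- paste(names(params), encoded, sep = "=", collapse = "&")
  paste0(base, "/", endpoint, "?", query)
}

# "YOUR_API_KEY" is a placeholder; replace it with the key from your account
search_url <- build_walmart_url("search", api_key = "YOUR_API_KEY",
                                query = "Ipad", format = "json")
print(search_url)
```

Passing the resulting string to `jsonlite::read_json()` should return the same response body that the console displays, assuming the key is valid.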

R in Action: Barging into the Walmart Open API

Please make sure you have R and RStudio ready in order to replicate this data tutorial. I will walk through several APIs for you.

Package Dependencies

library(jsonlite) # Convert R objects to/from JSON
library(plyr) # Tools for Splitting, Applying and Combining Data

Taxonomy API

# For the Taxonomy API, the only parameter you need to specify is "format" (either json or xml)
taxonomy_url <- "http://api.walmartlabs.com/v1/taxonomy?apiKey=[Please Add Your API Key Here]&format=json"
taxonaomy_list <- jsonlite::read_json(taxonomy_url)
# Returns 31 categories as of 4/21/2018
length(taxonaomy_list$categories)
# Create an empty data frame
df = data.frame(Parent_Category = character(0))
# Create a taxonomy data frame containing parent category id, parent category title, and total number of related categories
for (j in seq_along(taxonaomy_list$categories)){
  k1 = taxonaomy_list$categories[[j]]$id
  k2 = taxonaomy_list$categories[[j]]$name
  k3 = length(taxonaomy_list$categories[[j]]$children)
  df2 <- data.frame(Parent_Category_ID = k1,
                    Parent_Category_Title = k2,
                    Total_Related_Category = k3)
  df <- rbind(df, df2)
}
rm(df2)
head(df, 10)
The taxonomy data frame contains the parent category ID, the parent category title, and the total number of related categories. For example, Arts, Crafts & Sewing has a total of 10 related categories (called “children” in the API documentation).

You may wonder what those 10 categories related to Arts, Crafts & Sewing are. Here’s a script to get each of them. They include “Art & Drawing Supplies”, “Arts, Crafts & Sewing”, “Beading & Jewelry Making”, and “Yarn”. You can run a for-loop to drill down to the bottom of the taxonomy.
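One way to list the children and drill down the taxonomy is sketched below. It runs on a small mock list so it works without an API key. The ids, the “Example” entries, and the helper names `child_names` and `flatten_category` are made up for illustration. Only the `id`, `name`, and `children` fields mirror what the code above reads from the real response.

```r
# Mock of the parsed taxonomy (ids and "Example" entries are invented)
mock_taxonomy <- list(categories = list(
  list(id = "P1", name = "Arts, Crafts & Sewing", children = list(
    list(id = "P1_1", name = "Art & Drawing Supplies"),
    list(id = "P1_2", name = "Yarn", children = list(
      list(id = "P1_2_1", name = "Example Subcategory")
    ))
  )),
  list(id = "P2", name = "Example Parent")
))

# Names of the direct children of one parent category
child_names <- function(taxonomy, parent_name) {
  for (category in taxonomy$categories) {
    if (identical(category$name, parent_name)) {
      return(vapply(category$children, function(ch) ch$name, character(1)))
    }
  }
  character(0)
}

# Recursively flatten a category and everything below it into one data frame
flatten_category <- function(node, depth = 0) {
  own <- data.frame(id = node$id, name = node$name, depth = depth,
                    stringsAsFactors = FALSE)
  below <- lapply(node$children, flatten_category, depth = depth + 1)
  do.call(rbind, c(list(own), below))
}

arts_children <- child_names(mock_taxonomy, "Arts, Crafts & Sewing")
all_nodes <- do.call(rbind, lapply(mock_taxonomy$categories, flatten_category))
print(arts_children)
print(all_nodes)
```

With real data, the parsed taxonomy list from the API call would take the place of `mock_taxonomy`, assuming the response keeps the same nesting.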


Reviews API

# 33093101 = Apple iPad
# Replace {apiKey} and {Your LinkShare Publisher Id} with your own values
ipad_query <- "http://api.walmartlabs.com/v1/reviews/33093101?apiKey={apiKey}&lsPublisherId={Your LinkShare Publisher Id}&format=json"
kyle_query <- jsonlite::read_json(ipad_query)
summary(kyle_query)
Summary of Apple Ipad Queries using “Reviews API”.

We extracted “ItemID”, “Name”, “SalePrice”, “Unique Product Code (UPC)”, “Category Path”, “Brand Name”, “Product Tracking URL”, “Product URL”, “Category Node”, “List of Reviews”, “List of Reviews Statistics”, “Next Page”, and “Available Online”. As an example, we will take a peek at the review data for “Apple iPad” on Walmart’s e-commerce platform.

# Empty data frame that will collect one row per review
df3 = data.frame(
  name = character(0),
  reviewer = character(0),
  reviewText = character(0),
  title = character(0),
  upVotes = character(0),
  downVotes = character(0)
)
# seq_along() skips the loop entirely when there are no reviews
for (i in seq_along(kyle_query$reviews)){
  df4 <- data.frame(name = as.character(kyle_query$reviews[[i]]$name),
                    reviewer = as.character(kyle_query$reviews[[i]]$reviewer),
                    reviewText = as.character(kyle_query$reviews[[i]]$reviewText),
                    title = as.character(kyle_query$reviews[[i]]$title),
                    upVotes = as.character(kyle_query$reviews[[i]]$upVotes),
                    downVotes = as.character(kyle_query$reviews[[i]]$downVotes))
  df3 <- rbind(df3, df4)
}
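As an alternative to growing the data frame inside a loop, the reviews can be converted row by row and bound once. The sketch below uses a mock response with invented review content. The `review_to_row` helper is hypothetical, and it assumes a review may occasionally lack a field, in which case it fills in `NA` rather than failing.

```r
# Mock of the parsed response; the review content is invented for illustration
mock_reviews <- list(reviews = list(
  list(name = "Example item", reviewer = "user_a", reviewText = "Works well.",
       title = "Good", upVotes = "3", downVotes = "0"),
  list(name = "Example item", reviewer = "user_b", reviewText = "Too small.",
       title = "Meh", upVotes = "1", downVotes = "2")
))

fields <- c("name", "reviewer", "reviewText", "title", "upVotes", "downVotes")

# Convert one review (a list) into a one-row data frame
review_to_row <- function(review) {
  values <- lapply(fields, function(f) {
    v <- review[[f]]
    # Missing fields become NA so every row has the same columns
    if (is.null(v)) NA_character_ else as.character(v)
  })
  names(values) <- fields
  as.data.frame(values, stringsAsFactors = FALSE)
}

# Bind all rows in a single call instead of one rbind per iteration
reviews_df <- do.call(rbind, lapply(mock_reviews$reviews, review_to_row))

# Vote counts are more useful as integers for later analysis
reviews_df$upVotes <- as.integer(reviews_df$upVotes)
reviews_df$downVotes <- as.integer(reviews_df$downVotes)
print(reviews_df)
```

Binding once avoids copying the accumulated data frame on every iteration, which starts to matter when a product has many reviews.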

Walmart Search API

Let’s experiment by searching for each letter “a”, “b”, “c”, …, “z” in turn. We will call this vector of query strings the accumulator (it has 26 elements). The Search API is very powerful and provides a wide range of details, from product sale price, short description, and long description to product images and customer ratings.

accumulator = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", 
"n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z")
length(accumulator) # 26
accumulator[1] # "a"
accumulator[2] # "b"
accumulator[3] # "c"
# ...
accumulator[26] # "z"

full_search = data.frame(itemId = character(0),
name= character(0),
msrp = character(0),
salePrice = character(0),
upc = character(0),
categoryPath = character(0),
shortDescription = character(0),
longDescription= character(0),
brandName= character(0),
thumbnailImage = character(0),
mediumImage = character(0),
largeImage = character(0),
productTrackingUrl = character(0),
ninetySevenCentShipping = character(0),
standardShipRate= character(0),
size= character(0),
color = character(0),
marketplace = character(0),
shipToStore = character(0),
freeShipToStore = character(0),
modelNumber = character(0),
productUrl= character(0),
customerRating= character(0),
numReviews = character(0),
variants = character(0),
customerRatingImage = character(0),
categoryNode = character(0),
bundle = character(0),
clearance= character(0),
preOrder= character(0),
stock = character(0),
attributes = character(0),
addToCartUrl = character(0),
affiliateAddToCartUrl = character(0),
freeShippingOver50Dollars = character(0),
maxItemsInOrder = character(0),
giftOptions = character(0),
imageEntities = character(0),
offerType = character(0),
isTwoDayShippingEligible = character(0),
availableOnline= character(0),
parentItemId = character(0),
sellerInfo = character(0),
seeDetailsInCart = character(0)
)
# We will start writing a for-loop. We need to reuse what we did
# in the Taxonomy API for the Search API as well.
for (j in (1:length(accumulator))){
  for (k in (1:length(taxonaomy_list$categories))){
    Sys.sleep(0.5)
    url <- paste('http://api.walmartlabs.com/v1/search?query=', accumulator[j],
                 '&format=json&categoryId=', df$Parent_Category_ID[k],
                 '&apiKey=[Please Insert Your API Key]&numItems=25', sep = "")
    for (m in url){
      Sys.sleep(0.5)
      query_list = jsonlite::read_json(m)
      iterator = plyr::rbind.fill(lapply(query_list$items, function(y){as.data.frame(t(y),stringsAsFactors=FALSE)}))
      full_search = plyr::rbind.fill(full_search, iterator)
    }
  }
}
full_search<- as.data.frame(full_search)
full_search$parentItemId <- as.character(full_search$parentItemId)
full_search$sellerInfo <-as.character(full_search$sellerInfo)
full_search$seeDetailsInCart <-as.character(full_search$seeDetailsInCart)
full_search <- unique(full_search)
A snapshot of the resulting data frame from the Search API
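The loop above makes several hundred requests, so a single failed call would stop the whole run. A minimal sketch of a retry wrapper follows. `fetch_with_retry` is a hypothetical helper, and the retry count and wait time are arbitrary assumptions rather than values from the Walmart documentation. The demonstration uses a fake fetch function so it runs without network access.

```r
# Hypothetical helper: call `fetch(url)` up to `max_tries` times and
# return NULL if every attempt fails.
fetch_with_retry <- function(url, fetch = jsonlite::read_json,
                             max_tries = 3, wait_seconds = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fetch(url), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", attempt, " failed for ", url, ": ",
            conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait_seconds)
  }
  NULL
}

# Fake fetch that fails twice and then succeeds (stands in for the real API)
calls <- 0
flaky_fetch <- function(url) {
  calls <<- calls + 1
  if (calls < 3) stop("simulated network error")
  list(items = list(list(itemId = 1)))
}

result <- fetch_with_retry("http://example.invalid/search",
                           fetch = flaky_fetch, wait_seconds = 0)
print(length(result$items))
```

In the search loop, `jsonlite::read_json(m)` could be swapped for `fetch_with_retry(m)`, skipping the iteration when the result is `NULL`.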

Good job there. For supplementary datasets, I recommend looking at:

  • Walmart Recruiting - Store Sales Forecasting: use historical markdown data to predict store sales
  • Walmart Dataset: a collection of Walmart import records, distribution centers, vendors, etc.
  • Walmart challenges participants to accurately predict the sales of 111 potentially weather-sensitive products (like umbrellas, bread, and milk) around the time of major weather events at 45 of their retail locations.

How about BestBuy APIs for competitor analysis? :)

“APIs are powering business in ways we couldn’t have imagined not so long ago. Best Buy’s easy-to-use catalog of APIs gives users access to a wide range of data across the history of BestBuy.com, including product, store, category and more. Come on in and build a query, dig into our data and join the Best Buy API community.” — Meet the BestBuy APIs

A future direction for the project is predicting customer ratings of products on Walmart’s e-commerce platform. This revolves around the question: “What attributes do customers take into account when rating a product?” Product attributes such as product type, short description, image, sale price, color, marketplace, shipToStore, and stock availability can help us understand customers’ decision-making. This sort of task involves text analysis and image feature extraction as inputs to a regression model.
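As a rough starting point for that regression, the sketch below fits a linear model on purely synthetic data. The column names match fields returned by the Search API, but the values are randomly generated, so the coefficients mean nothing. The choice of predictors and the log transforms are assumptions for illustration, and the text and image features are left out.

```r
set.seed(1)
n <- 200

# Synthetic stand-in for the search results; stored as character to mimic
# the data frame built from the Search API above
products <- data.frame(
  salePrice      = as.character(round(runif(n, 5, 500), 2)),
  numReviews     = as.character(sample(0:1000, n, replace = TRUE)),
  marketplace    = sample(c("TRUE", "FALSE"), n, replace = TRUE),
  customerRating = as.character(round(runif(n, 1, 5), 1)),
  stringsAsFactors = FALSE
)

# Convert the character columns to types a model can use
products$salePrice      <- as.numeric(products$salePrice)
products$numReviews     <- as.numeric(products$numReviews)
products$marketplace    <- as.logical(products$marketplace)
products$customerRating <- as.numeric(products$customerRating)

# Linear regression of rating on a few structured attributes
fit <- lm(customerRating ~ log(salePrice) + log1p(numReviews) + marketplace,
          data = products)
print(summary(fit)$coefficients)
```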

Lastly, we can also look at Walmart store sales prediction. In particular, “one challenge of modeling retail data is the need to make decisions based on limited history. If Christmas comes but once a year, so does the chance to see how strategic decisions impacted the bottom line.” We have historical sales data for 45 Walmart stores located in different regions. Selected holiday markdown events are included in the dataset. These markdowns are known to affect sales, but it is challenging to predict which departments are affected and the extent of the impact.

Written by Korkrid Kyle Akepanidtaworn

Cloud Solution Architect (Data & AI) at Microsoft, Former Data Scientist at Accenture Applied Intelligence
