Market Share Prediction with R

Using the mlogit function to build a choice model and predict product share

Aashiq Reza
Data For Tomorrow
Jul 1, 2020


Photo by Austin Distel on Unsplash

Introduction:

Choice modelling is a scientific methodology used by academics, economists, and policy-makers to measure consumer preferences. It is regarded as the most scientifically robust method for investigating and understanding how choices are made. (surveyengine)

Multinomial logistic regression (the multinomial logit model) predicts which option is chosen from a set of alternatives, based on the features of each alternative.
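To make the idea concrete: in a multinomial logit model each alternative j in a choice question has a utility V_j (a weighted sum of its features), and the probability of choosing it is exp(V_j) divided by the sum of exp(V_k) over all alternatives in that question. The base-R sketch below computes those probabilities for one question with three cars; the utility values are made up for illustration.

```r
# Multinomial logit: P_j = exp(V_j) / sum over k of exp(V_k)
mnl_prob <- function(utility) {
  # Subtracting the largest utility avoids overflow; it cancels in the ratio
  e <- exp(utility - max(utility))
  e / sum(e)
}

# Hypothetical utilities for the three cars in one choice question
v <- c(car1 = 0.5, car2 = 1.2, car3 = -0.3)
p <- mnl_prob(v)
round(p, 3)  # car2, with the highest utility, gets the largest probability

# Shifting every utility by the same amount leaves the probabilities unchanged
all.equal(mnl_prob(v + 10), p)
```

Only differences in utility matter, which is why the model can estimate the effect of each attribute level only relative to a baseline.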

Why choice modelling:

  • To understand public demand for products with different features
  • To determine a reasonable market price
  • To improve product-line planning
  • To learn which products customers favor

What is market share?

Market share is the percentage of a market (defined in terms of either units or revenue) accounted for by a specific entity. (Wikipedia)
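As a small worked example of the definition, the sketch below computes both unit-based and revenue-based shares for three hypothetical products; the sales figures are invented for illustration.

```r
# Hypothetical sales for three products in one market
sales <- data.frame(
  product = c("A", "B", "C"),
  units   = c(120, 60, 20),
  price   = c(30, 35, 40)   # price per unit, in thousands of dollars
)

# Unit share: each product's units as a percentage of total units
sales$unit_share <- 100 * sales$units / sum(sales$units)

# Revenue share: each product's revenue as a percentage of total revenue
revenue <- sales$units * sales$price
sales$revenue_share <- 100 * revenue / sum(revenue)

sales
```

The two measures can rank products differently: here product C sells 10% of the units but earns about 12.3% of the revenue because of its higher price.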

Why Market Share?

  • To judge how effective a product is in the market
  • To judge market growth
  • To analyze market trends

R Code For Choice Modelling and Market Share Prediction

Getting and understanding the data

The dataset was collected from Kaggle. First, load the required libraries and the data, then take a look at the data.

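library(tidyverse)     # data manipulation, including glimpse() and select()
library(mlogit)        # multinomial logit choice models
library(ggpubr)
library(CGPfunctions)  # PlotXTabs2() for plotting cross-tabulations
data <- read.csv("sportscar.csv")
glimpse(data)          # column names, types and first few values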

Output:

Taking a look at the data

About the data:

  • resp_id: unique identifier of each respondent
  • qus: unique identifier of the question asked
  • alt: identifies each car (alternative) the respondent can choose in a question
  • segment: car category, one of basic, fun, or racing
  • seat: number of seats available (2, 4, or 5)
  • trans: transmission type, manual or automatic
  • convert: 1 for a convertible roof, 0 for a standard roof
  • price: price in thousands of dollars, with three possible levels (30, 35, 40)
  • choice: 1 if the car was chosen, 0 if not

Checking for NAs

summary(is.na(data))

Output:

Checking NA’s

The output confirms that there are no NAs in the data.
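summary(is.na(data)) works, but it prints a block of TRUE/FALSE counts for every column. A more compact check, sketched here on a small hypothetical data frame, counts the missing values per column with colSums(is.na(...)).

```r
# Hypothetical data frame with one missing price
df <- data.frame(
  seat  = c(2, 4, 5),
  price = c(30, NA, 40)
)

# Number of missing values in each column
na_counts <- colSums(is.na(df))
na_counts

# TRUE only when no cell in the data frame is missing
no_missing <- !any(is.na(df))
no_missing
```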

Summarizing choices by each attribute

xtabs(choice ~ price, data = data)
xtabs(choice ~ trans, data = data)
xtabs(choice ~ seat, data = data)
xtabs(choice ~ convert, data = data)
xtabs(choice ~ segment, data = data)

Output:

Choice summary
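Because xtabs() is given a left-hand side, it sums choice within each level of the right-hand variable, so each table above counts how many times a level was chosen. prop.table() turns those counts into shares of all choices. A tiny sketch on hypothetical long-format data (two questions with three cars each):

```r
# Hypothetical long-format data: 2 questions x 3 alternatives
toy <- data.frame(
  question = rep(1:2, each = 3),
  alt      = rep(1:3, times = 2),
  price    = factor(c(30, 35, 40, 40, 30, 35)),
  choice   = c(1, 0, 0, 0, 0, 1)
)

# Summing `choice` within each price level counts how often that level was chosen
chosen <- xtabs(choice ~ price, data = toy)
chosen

# Proportion of all choices that fell on each price level
prop.table(chosen)
```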

Data Visualization

PlotXTabs2(data, y = choice, x = price, plottype = "side",
           title = "Choice made by customers according to price")
PlotXTabs2(data, y = choice, x = segment, plottype = "side",
           title = "Choice made by customers according to segment")
PlotXTabs2(data, y = choice, x = seat, plottype = "side",
           title = "Choice made by customers according to seats")
PlotXTabs2(data, y = choice, x = trans, plottype = "side",
           title = "Choice made by customers according to transmission")
PlotXTabs2(data, y = choice, x = convert, plottype = "side",
           title = "Choice made by customers according to convertible roof")
Visualizing choice summary

Comments:

  • Buyers tend to choose lower-priced cars
  • Among the segments, basic cars are the most popular
  • Buyers prefer 5-seat cars
  • Automatic transmission is preferred
  • Convertible roofs are more popular in the market

Converting the data to mlogit data

Convert “choice” to logical, and “price”, “seat”, “segment”, “convert”, and “trans” to factors.

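# Treat every attribute as categorical so each level gets its own coefficient
data$seat <- as.factor(data$seat)
data$segment <- as.factor(data$segment)
data$trans <- as.factor(data$trans)
data$convert <- as.factor(data$convert)
data$price <- as.factor(data$price)
# Store the 0/1 choice indicator as TRUE/FALSE
data$choice <- as.logical(data$choice)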
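Storing the attributes as factors means each level gets its own coefficient instead of a single linear effect. With R's default treatment coding, the first level is the baseline and every other level becomes a 0/1 indicator. The sketch below shows that coding with base R's model.matrix() on hypothetical seat values; mlogit builds its design matrix internally, so this is only meant to show how factor levels are typically turned into coefficients.

```r
# Hypothetical seat values stored as a factor with levels "2", "4", "5"
cars <- data.frame(seat = factor(c(2, 4, 5, 2, 5)))

# Treatment coding: level "2" is the baseline, seat4 and seat5 are 0/1 indicators
X <- model.matrix(~ seat, data = cars)
X
```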

Change the data frame to mlogit data

Use mlogit.data() to convert the data frame into mlogit data, because the mlogit() function works only on data of that type. Here, shape = "long" means there is one row per alternative in each choice question, choice is set to “choice”, and alt.var is set to “alt”.

# Long format: one row per alternative in each choice question
mlog <- mlogit.data(data, shape = "long", choice = "choice",
                    alt.var = "alt")
str(mlog)
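In the long shape, each choice situation (one question answered by one respondent) takes one row per alternative, and exactly one of those rows should be marked as chosen. A quick base-R sanity check along those lines is sketched below on hypothetical toy data; the column name question is an assumption made for the example.

```r
# Hypothetical long-format data: 2 respondents x 1 question x 3 alternatives
toy <- data.frame(
  resp_id  = rep(1:2, each = 3),
  question = rep(1, 6),
  alt      = rep(1:3, times = 2),
  choice   = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# A choice situation is one question answered by one respondent
situation <- interaction(toy$resp_id, toy$question, drop = TRUE)

# Each situation should list every alternative once and mark exactly one as chosen
alts_per_situation   <- tapply(toy$alt, situation, function(a) length(unique(a)))
chosen_per_situation <- tapply(toy$choice, situation, sum)

ok <- all(alts_per_situation == 3) && all(chosen_per_situation == 1)
ok
```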

Fitting mlogit model to the mlogit data

# `0 +` drops the alternative-specific constants, so only the car attributes enter the utility
m <- mlogit(choice ~ 0 + seat + trans + convert + price + segment,
            data = mlog)
summary(m)
Summary from mlogit fitted model

Comments:

  • The negative coefficients on the higher price levels (relative to the lowest price) indicate that people are less inclined to buy expensive cars.
  • 5-seat cars are the most popular in the market.
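Because this is a multinomial logit, the ratio of the choice probabilities of two cars depends only on the difference in their utilities: P_i / P_j = exp(V_i - V_j). For two cars that differ only in price level, exp(coefficient) is therefore the factor by which the pricier car's choice probability compares with the cheaper one's. The sketch uses a hypothetical coefficient, not an estimate from the model above.

```r
mnl_prob <- function(utility) {
  e <- exp(utility - max(utility))  # logit probabilities, numerically stable
  e / sum(e)
}

# Hypothetical coefficient for a higher price level (baseline = lowest price)
b_price <- -0.8

# Two cars identical except for price, plus an unrelated third car
v <- c(cheap = 0.4, expensive = 0.4 + b_price, other = 0.1)
p <- mnl_prob(v)

# The probability ratio equals exp(b_price), about 0.45, whatever the third car is
p[["expensive"]] / p[["cheap"]]
exp(b_price)
```

This is the independence of irrelevant alternatives (IIA) property of the logit model: changing the third car does not alter the ratio between the other two.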

Share Prediction

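# Drop the respondent id and the observed choice, keeping the product attributes
products <- select(data, -c(resp_id, choice))
x <- predict(m, products)  # 2000 x 3 matrix: one row per question, one column per alternative
# Flatten row by row so each share lines up with its product row in long format
# (assumes `data` is sorted by question, then alternative)
shares <- cbind(share = as.vector(t(x)), products)
head(shares)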
Market Shares

Comments:

From the result, in the first question the third of the three alternatives has the highest predicted market share.
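For a standard multinomial logit like this one, each predicted share is the logit probability of an alternative within its question: the utility is the attribute dummies multiplied by the estimated coefficients, and the shares are exp(utility) normalized to sum to 1. The sketch below reproduces that calculation by hand for one hypothetical three-car question; the coefficient names and values are made up and are not the estimates from this model.

```r
# Hypothetical coefficients (baseline levels: seat 2, price 30)
beta <- c(seat4 = 0.3, seat5 = 0.6, price35 = -0.5, price40 = -1.1)

# One hypothetical question: three cars described by seat and price
cars <- data.frame(
  seat  = factor(c(2, 5, 4), levels = c(2, 4, 5)),
  price = factor(c(30, 40, 35), levels = c(30, 35, 40))
)

# Dummy-code the attributes and keep the columns that match the coefficients
X <- model.matrix(~ seat + price, data = cars)[, names(beta)]

utility <- drop(X %*% beta)                  # one utility per car
share   <- exp(utility) / sum(exp(utility))  # logit shares, summing to 1
cbind(share, cars)
```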

Visualizing market shares

# Sum the predicted shares within each combination of attributes
share_tab <- xtabs(share ~ price + seat + trans + segment + convert, data = shares)
plot(share_tab, main = "Predicted market shares by type of car")
Market Shares

This mosaic plot shows how the predicted shares, summed within each combination of attributes, are spread across the different types of car.
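To get the average predicted share for each level of a single attribute rather than cell sums, tapply() with mean is one option. The sketch below uses a hypothetical shares table with one attribute and compares the mean with the sum that xtabs() returns.

```r
# Hypothetical predicted shares for two questions with three cars each
shares <- data.frame(
  share = c(0.5, 0.3, 0.2, 0.1, 0.6, 0.3),
  seat  = factor(c(2, 4, 5, 5, 2, 5))
)

# Mean predicted share for each seat level
avg_share <- tapply(shares$share, shares$seat, mean)
round(avg_share, 3)

# Summing instead (as xtabs does) also reflects how often each level appears
sum_share <- xtabs(share ~ seat, data = shares)
sum_share
```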

Conclusion:

  • We built a choice model with the mlogit function that can be used to predict shares.
  • Market shares were predicted with the predict function.
  • Knowing the likely market share of each product is valuable for production planning.
  • A clear picture of market trends and customer choices, which this analysis provides, is important for maximizing profit.


Data Science, ML, Image processing. Good hands in R, MATLAB, Python, SPSS, C/Cpp. Always free to connect : https://www.linkedin.com/in/aashiq-reza-2030b516a/