Market Share Prediction with R

Using the mlogit function to build a choice model and predict product share

Aashiq Reza

Published in

Data For Tomorrow

5 min read · Jul 1, 2020

Introduction:

Choice Modelling is a scientific methodology used by academics, economists, and policy-makers to measure consumer preferences. It is regarded as the most scientifically robust method to investigate and understand how choices are made. (surveyengine)

Multinomial Logistic Regression is used to predict a choice from a set of alternatives based on the features of each alternative.
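The idea behind the model can be sketched in a few lines of base R: each alternative gets a utility score, and the choice probability is the exponentiated utility divided by the sum over all alternatives in the choice set. The utility values below are made up for illustration and are not estimated from any data.

```r
# Hypothetical utilities for three alternatives in one choice question
# (illustrative numbers only, not estimated from the sportscar data).
utility <- c(alt1 = 0.2, alt2 = -0.5, alt3 = 1.0)

# Multinomial logit: exp(utility), normalised over the choice set.
choice_prob <- exp(utility) / sum(exp(utility))

# The probabilities sum to 1; the highest utility gets the highest probability.
print(round(choice_prob, 3))
```

The alternative with the largest utility always receives the largest probability, but every alternative keeps a non-zero chance of being chosen.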

Why choice modeling:

To understand the public demand for products with different features
To determine a reasonable market price
To improve product-line planning
To find out more about customers' favorite products

Market Share?

Market share is the percentage of a market (defined in terms of either units or revenue) accounted for by a specific entity. (wikipedia)
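As a quick illustration of the definition, here is a minimal sketch that computes unit-based market share. The brand names and sales figures are hypothetical.

```r
# Hypothetical unit sales for three brands (illustrative numbers only).
units_sold <- c(brand_a = 500, brand_b = 300, brand_c = 200)

# Market share = each brand's units as a percentage of the whole market.
market_share <- 100 * units_sold / sum(units_sold)

print(market_share)
```

The same calculation works for revenue-based share by replacing units with revenue.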

Why Market Share?

To judge how effective a product is in the market
To judge market growth
To analyze market trends

R Code For Choice Modelling and Market Share Prediction

Getting and understanding the data

This dataset was collected from Kaggle. First, load the required libraries and the data, then take a look at the data.

library(tidyverse)
library(mlogit)
library(ggpubr)
library(CGPfunctions)
data <- read.csv("sportscar.csv")
glimpse(data)

Output:

About the data:

resp_id: unique identifier of the respondents
qus: unique identifier of the questions asked
alt: identifies each alternative car that the respondent can choose
segment: one of three categories: basic, fun, and racing
seat: the number of seats available (2, 4, 5)
trans: transmission type, manual or automatic
convert: 1 for a convertible roof, 0 for a standard roof
price: in thousands ($) with three possible values (30, 35, 40)
choice: 0 — not chosen, 1 — chosen

Checking NA’s

summary(is.na(data))

Output (figure): checking NA’s

The output confirms there are 0 NA’s in the data.
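For a more compact check, the number of missing values per column can be counted directly. The sketch below uses a small made-up data frame with deliberately missing values, so the column contents are assumptions rather than the real dataset.

```r
# Toy data frame with deliberately missing values (hypothetical data).
toy <- data.frame(
  price  = c(30, 35, NA, 40),
  seat   = c(2, 4, 5, NA),
  choice = c(1, 0, 0, 1)
)

# Count NA's per column: is.na() gives a logical matrix, colSums() adds up the TRUEs.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Single TRUE/FALSE answer for the whole data frame.
has_missing <- anyNA(toy)
print(has_missing)
```

On the real data, colSums(is.na(data)) should return zero for every column if there are no NA's.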

Summarizing choice with different variables

xtabs(choice ~ price, data = data)
xtabs(choice ~ trans, data = data)
xtabs(choice ~ seat, data = data)
xtabs(choice ~ convert, data = data)
xtabs(choice ~ segment, data = data)

Output
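Because choice is coded 0/1, xtabs(choice ~ price) adds up the 1's and so returns the number of times each price level was chosen. A minimal sketch on a made-up six-row data frame shows this, together with the choice rate per level:

```r
# Toy long-format data: two questions, three alternatives each (hypothetical data).
toy <- data.frame(
  price  = c(30, 35, 40, 30, 35, 40),
  choice = c(1, 0, 0, 0, 1, 0)
)

# xtabs sums the left-hand side within each level of the right-hand side.
chosen_by_price <- xtabs(choice ~ price, data = toy)
print(chosen_by_price)

# The same counts with tapply, plus the share of times each level was chosen.
same_counts <- tapply(toy$choice, toy$price, sum)
choice_rate <- tapply(toy$choice, toy$price, mean)
print(choice_rate)
```

The choice rate is often easier to compare than raw counts when attribute levels are not shown equally often.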

Data Visualization

PlotXTabs2(data, y = choice, x = price,
           title = "Choice made by customers according to price",
           plottype = "side")
PlotXTabs2(data, y = choice, x = segment,
           title = "Choice made by customers according to segment",
           plottype = "side")
PlotXTabs2(data, y = choice, x = seat,
           title = "Choice made by customers according to seat",
           plottype = "side")
PlotXTabs2(data, y = choice, x = trans,
           title = "Choice made by customers according to trans",
           plottype = "side")
PlotXTabs2(data, y = choice, x = convert,
           title = "Choice made by customers according to convertible roof",
           plottype = "side")

Comments:

Buyers are more likely to choose lower-priced cars
Basic cars are the most popular segment
Buyers prefer 5-seat cars
Automatic transmission is preferred over manual
Convertible roofs are more popular than standard roofs

Converting The data to mlogit data

Convert “choice” into logical and “price”, “seat”, “segment”, “convert”, “trans” into factors.

data$seat <- as.factor(data$seat)
data$segment <- as.factor(data$segment)
data$trans <- as.factor(data$trans)
data$convert <- as.factor(data$convert)
data$price <- as.factor(data$price)
data$choice <- as.logical(data$choice)

Change the data frame to mlogit data

Use mlogit.data to convert the data frame into mlogit data, because the mlogit function works only on that data type. Here, choice is set to “choice” and alt.var is set to “alt”.

mlog <- mlogit.data(data, shape = "long", choice = "choice", 
        alt.var = "alt") 
str(mlog)

Output (figure): structure of the mlogit data
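Before converting long-format data, it can help to confirm that every question has the same number of alternatives and exactly one chosen alternative. The sketch below does this in base R on a made-up data frame; the column names follow the data description above, and the values are hypothetical.

```r
# Toy long-format data: one respondent, two questions, three alternatives each
# (hypothetical values; column names follow the data description).
toy <- data.frame(
  resp_id = c(1, 1, 1, 1, 1, 1),
  qus     = c(1, 1, 1, 2, 2, 2),
  alt     = c(1, 2, 3, 1, 2, 3),
  choice  = c(0, 1, 0, 0, 0, 1)
)

# Number of chosen alternatives per question: should be exactly 1 everywhere.
chosen_per_set <- aggregate(choice ~ resp_id + qus, data = toy, FUN = sum)
one_choice_each <- all(chosen_per_set$choice == 1)

# Number of alternatives per question: should be the same everywhere.
alts_per_set <- aggregate(alt ~ resp_id + qus, data = toy, FUN = length)
same_set_size <- length(unique(alts_per_set$alt)) == 1

print(c(one_choice_each = one_choice_each, same_set_size = same_set_size))
```

If either check fails, the offending questions can be found by filtering chosen_per_set or alts_per_set.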

Fitting mlogit model to the mlogit data

m <- mlogit(choice ~  0 + seat + trans + convert + price + segment, 
             data = mlog)
summary(m)

Comments:

The negative price coefficients indicate that people are less likely to choose more expensive cars.
5-seat cars are the most popular in the market.
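To read a coefficient on a more intuitive scale, it can be exponentiated: in a multinomial logit model, exp(coefficient) is the factor by which the odds of choosing a car change when that attribute level is present and everything else is equal. The coefficient value below is a hypothetical number chosen for illustration, not a result from the fitted model.

```r
# Hypothetical coefficient for a higher price level (NOT taken from the model output).
coef_price35 <- -0.5

# Odds multiplier relative to the base price level, all else equal.
odds_ratio <- exp(coef_price35)

# Two otherwise identical cars: probability that the pricier one is chosen.
p_pricier <- odds_ratio / (1 + odds_ratio)

print(round(c(odds_ratio = odds_ratio, p_pricier = p_pricier), 3))
```

A negative coefficient always gives an odds multiplier below 1, so the pricier of two otherwise identical cars is chosen less than half the time.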

Share Prediction

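products <- select(data, -c(resp_id, choice))
x <- predict(m, products) # one row per question, one column per alternative (2000 x 3)
share <- as.vector(t(x)) # flatten row by row so the shares line up with the long-format rows
shares <- cbind(share, products)
head(shares)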

Comments:

From the result, we can say that among the three alternatives in the first question, the third has the highest predicted market share.
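The predicted probabilities come back as a matrix with one row per question and one column per alternative, while the product data has one row per alternative. A minimal sketch with a made-up 2 x 3 matrix shows how to flatten the matrix row by row so that each share lines up with its product row:

```r
# Hypothetical predictions: 2 questions (rows) x 3 alternatives (columns).
pred <- matrix(c(0.2, 0.3, 0.5,
                 0.6, 0.1, 0.3),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("1", "2", "3")))

# as.vector() reads a matrix column by column, so transpose first
# to get question 1 (alt 1, 2, 3), then question 2 (alt 1, 2, 3).
share_long <- as.vector(t(pred))
print(share_long)

# Shares within each question sum to 1, so the total equals the number of questions.
print(sum(share_long))
```

Calling as.vector(pred) without the transpose would interleave the questions and misalign the shares with the products.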

Visualizing market shares

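share_tab <- xtabs(share ~ price + seat + trans + segment + convert, data = shares)
plot(share_tab, main = "Predicted Market Shares of each type of car")

This figure shows the predicted market share for each combination of attributes. Note that xtabs sums the shares within each cell rather than averaging them.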
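If an average per attribute level is wanted rather than a sum, aggregate with mean is one option. The sketch below uses a made-up data frame with a share column and a price factor; on the real data the same call would be applied to the shares data frame.

```r
# Hypothetical predicted shares for two questions with three price levels each.
toy_shares <- data.frame(
  share = c(0.2, 0.3, 0.5, 0.6, 0.1, 0.3),
  price = factor(c(30, 35, 40, 30, 35, 40))
)

# Mean predicted share at each price level.
avg_share_by_price <- aggregate(share ~ price, data = toy_shares, FUN = mean)
print(avg_share_by_price)
```

The formula can list several attributes (for example share ~ price + seat) to average over combinations of levels.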

Conclusion:

We built a choice model using the mlogit function so that it can be used to predict share.
Market share was predicted with the predict function.
Getting an idea about market shares is very important for production planning.
A clear idea of market trends and customer choices, which this analysis provides, is very important for profit maximization.
