ContextBase Predictive Analytics

John Akwei, ECMp ERMp Data Scientist

This document contains examples of the Predictive Analytics capabilities of ContextBase, http://contextbase.github.io.

Predictive Analytics Example 1: Linear Regression

Linear Regression allows for the prediction of future values of a response variable, based on a single explanatory variable.

# The revenue data frame is assumed to contain the columns price_index and market_potential.
revenue <- data.frame(revenue)
# Fit a simple linear regression of Market Potential on Price Index.
model <- lm(market_potential ~ price_index, data = revenue)

cat("The Intercept =", model$coefficients[1])
## The Intercept = 15.21788
# Predict Market Potential for a new Price Index value.
test <- data.frame(price_index=4.57592)
result <- predict(model, test)

Example 1 — Linear Regression Conclusion:

cat("For a Price Index of ", as.character(test), ", the predicted Market Potential = ", round(result, 2), ".", sep="")
## For a Price Index of 4.57592, the predicted Market Potential = 13.03.

In conclusion to ContextBase Predictive Analytics Example 1, a clear linear relationship between Price Index and Market Potential was found. As a test of the Predictive Algorithm, a Price Index of 4.57592 was processed, and a Market Potential of 13.03 was predicted. The source R dataset shows this prediction to be accurate.
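
As a quick check of the fit behind this prediction, summary() reports the estimated slope, its p-value, and the R-squared of the model. The following is a minimal sketch, assuming the model and revenue objects created above (the revenue data itself is not reproduced in this document):

# Sketch: inspect the fitted simple linear regression.
summary(model)            # coefficients, standard errors, p-values, R-squared
summary(model)$r.squared  # proportion of variance in market_potential explained
residuals(model)          # observed minus fitted Market Potential values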

Predictive Analytics Example 2: Logistic Regression

Logistic Regression allows for the prediction of a binary (Yes or No) outcome, based on the effects of an explanatory variable on a response variable. For example, the probability of winning a congressional election can be modeled as a function of campaign expenditures.

How does the amount of money spent on a campaign affect the probability that the candidate will win the election?

Source of Data Ranges: https://www.washingtonpost.com/news/the-fix/wp/2014/04/04/think-money-doesnt-matter-in-elections-this-chart-says-youre-wrong/

# Campaign expenditure levels, in dollars, for twenty candidates.
Expenditures <- c(1000000, 1100000, 1200000, 1300000, 1400000,
                  1500000, 1600000, 1700000, 1800000, 1900000,
                  2000000, 2100000, 2200000, 2300000, 2400000,
                  2000000, 2100000, 2200000, 2300000, 2400000)
# Election outcome for each candidate: 1 = win, 0 = loss.
ElectionResult <- c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1)
CampaignCosts <- data.frame(Expenditures, ElectionResult)

The logistic regression analysis gives the following output:

# Fit a logistic regression (binomial GLM) of ElectionResult on Expenditures.
model <- glm(ElectionResult ~ ., family=binomial(link='logit'), data=CampaignCosts)
model$coefficients
##   (Intercept)  Expenditures 
## -7.615054e+00 4.098080e-06
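
The coefficient table from summary() adds standard errors, z-statistics, and p-values to these estimates, which is how the statistical significance of Expenditures can be checked. A minimal sketch, using the model object fitted above:

# Sketch: standard errors, z-values, and p-values for the logistic regression coefficients.
summary(model)$coefficients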

The output indicates that campaign expenditures significantly affect the probability of winning the election. The fitted coefficients are Intercept = -7.615054e+00 and Expenditures = 4.098080e-06. These coefficients are entered into the logistic regression equation to estimate the probability of winning the election:

Probability of winning election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*CampaignExpenses)))

For a Candidate that has $1,600,000 in expenditures:

CampaignExpenses <- 1600000
ProbabilityOfWinningElection <- 1/(1+exp(-(-7.615054e+00+4.098080e-06*CampaignExpenses)))
cat("Probability of winning Election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*",
CampaignExpenses, "))) = ", round(ProbabilityOfWinningElection, 2), ".", sep="")
## Probability of winning Election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*1600000))) = 0.26.

For a Candidate that has $2,100,000 in expenditures:

CampaignExpenses <- 2100000
ProbabilityOfWinningElection <- 1/(1+exp(-(-7.615054e+00+4.098080e-06*CampaignExpenses)))
cat("Probability of winning Election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*",
CampaignExpenses, "))) = ", round(ProbabilityOfWinningElection, 2), ".", sep="")
## Probability of winning Election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*2100000))) = 0.73.
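
The same probabilities can also be obtained directly from the fitted model with predict() and type = "response", which applies the inverse-logit transformation internally. A minimal sketch, using the model object above (newSpending is just an illustrative name):

# Sketch: compute win probabilities from the fitted model instead of hand-coding the equation.
newSpending <- data.frame(Expenditures = c(1600000, 2100000))
round(predict(model, newdata = newSpending, type = "response"), 2)
# Should agree, to rounding, with the 0.26 and 0.73 calculated above.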

Example 2 — Logistic Regression Conclusion:

# Build a table of campaign expense levels and their estimated probabilities of winning.
ElectionWinTable <- data.frame(
  column1 = c(1100000, 1400000, 1700000, 1900000, 2300000),
  column2 = c(round(1/(1+exp(-(-7.615054e+00+4.098080e-06*1100000))), 2),
              round(1/(1+exp(-(-7.615054e+00+4.098080e-06*1400000))), 2),
              round(1/(1+exp(-(-7.615054e+00+4.098080e-06*1700000))), 2),
              round(1/(1+exp(-(-7.615054e+00+4.098080e-06*1900000))), 2),
              round(1/(1+exp(-(-7.615054e+00+4.098080e-06*2300000))), 2)))
names(ElectionWinTable) <- c("Campaign Expenses", "Probability of Winning Election")
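
Printing the data frame, or rendering it with knitr::kable(), displays the finished table. Because each row uses the same logistic equation as the examples above, the probabilities work out to roughly 0.04, 0.13, 0.34, 0.54, and 0.86 for these expenditure levels:

# Sketch: display the table of campaign expenses and estimated win probabilities.
print(ElectionWinTable)
# knitr::kable(ElectionWinTable)  # formatted table output in an R Markdown report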

In conclusion to ContextBase Predictive Analytics Example 2, a direct correlation of Campaign Expenditures to Election Performance was verified. The above table displays the probability of winning an election at each corresponding level of campaign expenses.

Predictive Analytics Example 3: Multiple Regression

Multiple Regression allows for the prediction of the future values of a response variable, based on values of multiple explanatory variables.

# Use the first four columns of R's built-in state.x77 dataset.
input <- data.frame(state.x77[,1:4])
colnames(input) <- c("Population", "Income", "Illiteracy", "Life_Exp")

# Create the relationship model.
model <- lm(Life_Exp~Population+Income+Illiteracy, data=input)

# Show the model.
print(model)
## 
## Call:
## lm(formula = Life_Exp ~ Population + Income + Illiteracy, data = input)
##
## Coefficients:
## (Intercept)   Population       Income   Illiteracy
##   7.120e+01   -1.024e-05    2.477e-04   -1.179e+00
a <- coef(model)[1]
cat("The Multiple Regression Intercept = ", a, ".", sep="")
## The Multiple Regression Intercept = 71.2023.
# Extract the coefficient of each explanatory variable.
XPopulation <- coef(model)[2]
XIncome <- coef(model)[3]
XIlliteracy <- coef(model)[4]

modelCoef <- data.frame(XPopulation, XIncome, XIlliteracy)
colnames(modelCoef) <- c("Population", "Income", "Illiteracy")
row.names(modelCoef) <- c("Coefficients")

Example 3 — Multiple Regression Conclusion:

# Predict Life Expectancy for new explanatory values, using the fitted coefficients.
popl <- 3100
Incm <- 5348
Illt <- 1.1
Y <- a + popl * XPopulation + Incm * XIncome + Illt * XIlliteracy
cat("For a City where Population = ", popl, ", Income = ", Incm, ", and Illiteracy = ", Illt, ",
the predicted Life Expectancy is: ", round(Y, 2), ".", sep="")
## For a City where Population = 3100, Income = 5348, and Illiteracy = 1.1,
## the predicted Life Expectancy is: 71.2.

In conclusion to ContextBase Predictive Analytics Example 3, the explanatory variables of “Population”, “Income”, and “Illiteracy” were used to predict the “Life Expectancy” of an area corresponding to a US state. For an area with a Population of 3100, a per capita Income of 5348, and an Illiteracy Rate of 1.1, a Life Expectancy of 71.2 years was predicted.
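
The same prediction can be obtained with predict() on the fitted model, which avoids copying coefficients by hand. A minimal sketch, assuming the model and input objects created above (newArea is just an illustrative name):

# Sketch: predict Life Expectancy via predict() rather than manual coefficient arithmetic.
newArea <- data.frame(Population = 3100, Income = 5348, Illiteracy = 1.1)
round(predict(model, newdata = newArea), 2)
# Should agree with the 71.2 computed above from the individual coefficients.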
