Fundamentals of Business Analytics With R

A brief look at how to analyze data by applying machine learning techniques, and how to check their effectiveness across various business domains.

Akanaksha L

Published in MindMajix, Dec 5, 2019

Gearing Up for Predictive Modeling

Much of predictive modeling rests on key concepts from statistics and machine learning, and this chapter provides a brief tour of the core distinctions in these fields that are essential knowledge for a predictive modeler. In particular, we’ll emphasize the importance of knowing how to evaluate a model in a way that is appropriate to the type of problem we are trying to solve.

Models

Models are at the heart of predictive analytics and for this reason, we’ll begin our journey by talking about models and what they look like. In simple terms, a model is a representation of a state, process, or system that we want to understand and reason about.

Models come in a multitude of different formats and flavors, and we will explore some of this diversity in this book. Models can be equations linking quantities that we can observe or measure; they can also be a set of rules.

Learning from and Analysing Data

The models we will study have two important and defining characteristics.

The first of these is that we will not use mathematical reasoning or logical induction to produce a model from known facts, nor will we build models from technical specifications or business rules; instead, the field of predictive analytics builds models from data.

For example, if we want to build a model to predict annual rainfall in various parts of a country, we might have collected (or have the means to collect) data on rainfall at different locations, while measuring potential quantities of interest, such as the height above sea level, latitude, and longitude.

The second important characteristic of the problems for which we will build models is that during the process of building a model from some data to describe a particular phenomenon, we are bound to encounter some source of randomness.

We will refer to this as the stochastic or nondeterministic component of the model. It may be the case that the system itself that we are trying to model doesn’t have any inherent randomness in it, but it is the data that contains a random component.
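
To make the stochastic component concrete, here is a minimal Python sketch. The article itself contains no code, so the function names, the linear rule, and the noise level are all illustrative assumptions. The system is fully deterministic, yet every recorded measurement carries a random error, so the data still contains a random component.

```python
import random

def true_system(x):
    # Hypothetical deterministic rule: the system itself has no randomness.
    return 2.0 * x + 1.0

def measure(x, noise_sd, rng):
    # Observed value = true value + Gaussian measurement error (an assumption).
    return true_system(x) + rng.gauss(0.0, noise_sd)

rng = random.Random(42)  # fixed seed so the run is repeatable
xs = [0.0, 1.0, 2.0, 3.0]
observed = [measure(x, noise_sd=0.5, rng=rng) for x in xs]

# The random component is whatever the deterministic rule cannot explain.
residuals = [y - true_system(x) for x, y in zip(xs, observed)]

for x, y, r in zip(xs, observed, residuals):
    print(f"x={x:.1f}  observed={y:.3f}  random component={r:+.3f}")
```

Setting noise_sd to 0 removes the random component entirely, which is one way to check that all the randomness here comes from the measurement step.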

The core components of a Model

So far we’ve established some central notions behind models and a common language to talk about data. In this section, we’ll look at the core components of a statistical model. The primary components are typically:

A set of equations with parameters that need to be tuned
Some data that are representative of a system or process that we are trying to model
A concept that describes the model’s goodness of fit
A method to update the parameters to improve the model’s goodness of fit
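
The four components listed above can be sketched in a few lines of Python. This is only one possible instance, not a prescribed method: the one-feature linear equation, the mean squared error as the measure of fit, gradient descent as the update rule, the learning rate, and the data values are all assumptions chosen for illustration.

```python
# 1. A parameterized equation: y_hat = b0 + b1 * x
def predict(b0, b1, x):
    return b0 + b1 * x

# 2. Data representative of the process (made-up points lying on y = 1 + 2x)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# 3. Goodness of fit: mean squared error (lower is better)
def mse(b0, b1):
    return sum((predict(b0, b1, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# 4. A method to update the parameters: one gradient-descent step on the MSE
def step(b0, b1, lr=0.05):
    n = len(xs)
    g0 = sum(2 * (predict(b0, b1, x) - y) for x, y in zip(xs, ys)) / n
    g1 = sum(2 * (predict(b0, b1, x) - y) * x for x, y in zip(xs, ys)) / n
    return b0 - lr * g0, b1 - lr * g1

b0, b1 = 0.0, 0.0            # arbitrary starting parameters
start_error = mse(b0, b1)
for _ in range(2000):
    b0, b1 = step(b0, b1)

print(round(b0, 3), round(b1, 3), mse(b0, b1) < start_error)
```

Each update nudges the two parameters in the direction that lowers the error, so the fitted values approach the intercept and slope used to generate the data.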

Most models, such as neural networks, linear regression, and support vector machines, have certain parameterized equations that describe them. Let’s look at a linear model attempting to predict the output, Y, from three input features, which we will call X1, X2, and X3:

Y = β0 + β1·X1 + β2·X2 + β3·X3

This model has exactly one equation describing it and this equation provides the linear structure of the model. The equation is parameterized by four parameters, known as coefficients in this case, and they are the four β parameters. In the next chapter, we will see exactly what roles these play, but for this discussion, it is important to note that a linear model is an example of a parameterized model. The set of parameters is typically much smaller than the amount of data available.
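
As a rough sketch, a linear model with three input features and four coefficients can be written as a single Python function. It assumes the usual form with one intercept plus one coefficient per feature; the function name and the coefficient values are hypothetical and chosen only for illustration.

```python
def linear_model(betas, x1, x2, x3):
    # betas = (b0, b1, b2, b3): an intercept followed by one coefficient per feature
    b0, b1, b2, b3 = betas
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

# Hypothetical coefficient values, not estimated from any data
betas = (1.0, 0.5, -2.0, 3.0)

y_hat = linear_model(betas, x1=2.0, x2=1.0, x3=0.5)
print(y_hat)
```

However many observations are used to estimate them, the model itself is fully described by these four numbers.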

Types of Models

With a broad idea of the basic components of a model, we are ready to explore some of the common distinctions that modelers use to categorize different models.

Supervised, unsupervised, semi-supervised, and reinforcement learning models

Consider the well-known iris data set, which consists of four features and one output variable, namely the species variable. Having the output variable available for all the observations in the training data is the defining characteristic of the supervised learning setting, which represents the most frequent scenario encountered.

In a nutshell, the advantage of training a model under the supervised learning setting is that we have the correct answer that we should be predicting for the data points in our training data.

Using the availability of the value of the output variable as a way to discriminate between different models, we can also envisage a second scenario in which the output variable is not specified. This is known as the unsupervised learning setting.
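
The difference between the two settings can be illustrated with a small Python sketch. The measurements are made-up values in the style of iris petal lengths, and the two methods (a nearest-centroid classifier and a two-cluster k-means) are stand-ins chosen for brevity, not techniques prescribed by the text.

```python
# Made-up petal lengths (cm); the labels exist only in the supervised setting
lengths = [1.4, 1.5, 1.3, 4.7, 4.5, 4.9]
labels = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

def mean(values):
    return sum(values) / len(values)

# Supervised: the known labels tell us which points belong together
def fit_centroids(xs, ys):
    return {label: mean([x for x, y in zip(xs, ys) if y == label])
            for label in set(ys)}

def classify(centroids, x):
    # Predict the label whose centroid is closest to x
    return min(centroids, key=lambda label: abs(centroids[label] - x))

# Unsupervised: no labels, so two groups are discovered from the data alone
def two_means(xs, iterations=10):
    c_low, c_high = min(xs), max(xs)   # initial cluster centres
    for _ in range(iterations):
        low = [x for x in xs if abs(x - c_low) <= abs(x - c_high)]
        high = [x for x in xs if abs(x - c_low) > abs(x - c_high)]
        c_low, c_high = mean(low), mean(high)
    return c_low, c_high

centroids = fit_centroids(lengths, labels)
print(classify(centroids, 1.6))   # a species name, learned from the labels
print(two_means(lengths))         # two cluster centres with no names attached
```

The supervised model can name the species because it was shown the correct answers; the unsupervised one can only report that the data falls into two groups.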

Parametric and nonparametric models

In a previous section, we noted how most of the models we will encounter are parametric models, and we saw an example of a simple linear model. Parametric models have the characteristic that they tend to define a functional form.

This means that they reduce the problem of selecting the target function from among all possible functions to selecting one from a particular family of functions defined by a set of parameters. Selecting the specific function that will define the model essentially involves selecting precise values for the parameters.
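
A short Python sketch of this idea: once the functional form is fixed to a straight line, f(x) = a + b * x, choosing the model amounts to choosing two numbers. The least-squares criterion and the data values are assumptions made for illustration.

```python
def fit_line(xs, ys):
    # Ordinary least squares for the family f(x) = a + b * x.
    # Returns the intercept a and slope b that minimise the squared error.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Made-up observations
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

a, b = fit_line(xs, ys)

def f(x):
    # The selected member of the family: fully defined by a and b
    return a + b * x

print(round(a, 3), round(b, 3), round(f(5.0), 3))
```

After fitting, the training data could be discarded, since the two parameters are the whole model. A nonparametric method, by contrast, does not commit to a fixed functional form in advance.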

Regression and classification models

The distinction between regression and classification models has to do with the type of output we are trying to predict, and is generally relevant to supervised learning. Regression models try to predict a numerical or quantitative value, such as the stock market index, the amount of rainfall, or the cost of a project.

Classification models try to predict a value from a finite (though still possibly large) set of classes or categories. Examples of this include predicting the topic of a website, the next word that will be typed by a user, a person’s gender, or whether a patient has a particular disease given a series of symptoms.
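
The contrast in output type can be seen by running the same simple method in both modes. The sketch below uses k-nearest neighbours purely as a convenient stand-in; the function names and the altitude and rainfall figures are hypothetical.

```python
from collections import Counter

def nearest_targets(train, x_new, k):
    # train is a list of (feature, target) pairs; return the k closest targets
    ranked = sorted(train, key=lambda pair: abs(pair[0] - x_new))
    return [target for _, target in ranked[:k]]

def knn_regress(train, x_new, k=3):
    # Regression: the prediction is a number (the mean of the neighbours)
    targets = nearest_targets(train, x_new, k)
    return sum(targets) / k

def knn_classify(train, x_new, k=3):
    # Classification: the prediction is one class from a finite set (majority vote)
    targets = nearest_targets(train, x_new, k)
    return Counter(targets).most_common(1)[0][0]

# Hypothetical data: altitude (km) paired with annual rainfall (mm) or a label
rain_mm = [(0.1, 400.0), (0.5, 650.0), (1.0, 900.0), (1.5, 1200.0)]
rain_class = [(0.1, "dry"), (0.5, "dry"), (1.0, "wet"), (1.5, "wet")]

print(knn_regress(rain_mm, 1.2))      # a quantity in millimetres
print(knn_classify(rain_class, 1.2))  # one of the labels "dry" or "wet"
```

The inputs and the method are the same in both calls; only the type of the target, and therefore how the neighbours are combined, differs.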

Costs and Benefits of Using R

The only cost of using R is the time spent learning it. However, R has no package or application marketplace in which developers can be rewarded for creating new packages. This limits the interest of professional mainstream programmers, because platforms such as iOS, Android, and Salesforce offer coding professionals better commercial opportunities.

R is available for free download.

R is one of the few analytical platforms that work on Mac OS.
Its results have been established in journals like the Journal of Statistical Software, in places such as LinkedIn and Google, and by Facebook’s analytical teams.
It has open source code for customization as per GPL and adequate intellectual protection for developers wanting to create commercial packages.
It also has a flexible option for enterprise users from commercial vendors like Revolution.

That concludes this article. We hope you found the information on Business Analytics with R useful.

This article was originally published on MindMajix.