Machine Learning 101 for Startups — Part 1

Published in Kontiki AI, Jan 17, 2018

This is the first post in a series on ML techniques that help startups and businesses make decisions. The target audience is business folks who want to understand how they can leverage Machine Learning in their business or product.

It is confusing when the output of an analysis is even more complex than the problem statement. To take you out of this loop and help with your decisions, we introduce the concept of logistic regression. For instance:
Question: Do body weight, fat consumption, and age have an impact on the probability of having a heart attack?
For a given person, the answer predicted by the algorithm is a simple yes or no, which makes it quick to draw conclusions.

The problem statement — How to classify new data based on existing data.

Logistic regression is generally used when the dependent variable is binary (dichotomous).

That is, the dependent variable can take only two possible values, such as “Yes/No”, “Default/No Default”, “Living/Dead”, “Responder/Non Responder” or “True/False”, while the independent variables can be either categorical or numerical.

Logistic regression combines a linear regression with the logistic function to predict the probability that an event occurs. The linear part produces a weighted sum of the independent variables, and the logistic function squeezes that sum into a value between 0 and 1. If the probability is less than 0.5, the predicted class is 0, and if it is more than 0.5, the predicted class is 1.
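To make the two steps concrete, here is a minimal sketch in plain Python. The coefficients and the example person are hypothetical, hand-picked values for illustration only; they are not fitted to any real data. Treating a probability of exactly 0.5 as class 1 is also an arbitrary choice made for this sketch.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Weighted sum of the inputs (the regression part), passed through the logistic function."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 answer using the threshold."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0


# Hypothetical coefficients for: body weight (kg), daily fat intake (g), age (years)
weights = [0.04, 0.03, 0.05]
bias = -8.0

person = [85.0, 70.0, 60.0]
probability = predict_probability(person, weights, bias)
print(f"probability = {probability:.3f}, predicted class = {predict_class(person, weights, bias)}")
```

In a real project the weights and bias are learned from historical data rather than chosen by hand.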

Even though logistic regression is most often used for binary variables (2 classes), it can also handle categorical dependent variables with more than two classes. This variant is called Multinomial Logistic Regression.

A simple example of Multinomial Logistic Regression.
Here we consider the famous Iris flower data set. The data set, introduced by the British statistician and biologist Ronald Fisher, consists of iris flowers of three species and their properties: the length and width of the sepals and petals, in centimetres. The data set comes preloaded in the sklearn library.
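A minimal sketch of such an experiment with sklearn might look like the following. The 70/30 split, the `random_state` value and `max_iter=1000` are assumptions made for this sketch, not settings taken from the original post, so the accuracy it prints need not match any figure quoted in this article. In recent scikit-learn releases the default solver fits a multinomial model when there are more than two classes; older releases defaulted to a one-vs-rest scheme.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the bundled Iris data: 4 measurements per flower, 3 species
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% of the flowers to measure accuracy on unseen data (assumed split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# max_iter is raised so the solver has room to converge on unscaled features
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# One probability per species for the first held-out flower
print(clf.predict_proba(X_test[:1]))
```

The `predict_proba` call shows the probabilistic side of the model: instead of a single label it returns one probability per class, and the predicted class is the one with the highest probability.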

In the author's run, the logistic classifier reached an accuracy of 86%, which is pretty good for such a small dataset.

Logistic regression is a good choice when you want a probabilistic approach in your product model. It is also useful when you expect to append more training data in the future and want to incorporate it into the model right away. The algorithm is quick to train and quick to classify unknown records.
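The idea of folding new training data into an existing model can be sketched with a tiny online learner that updates its weights one record at a time. The class `OnlineLogisticRegression`, the learning rate and the toy data below are all hypothetical and exist only for illustration. A production system would more likely rely on a library implementation.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Hypothetical minimal binary logistic regression trained one record at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def update(self, x, label):
        """One stochastic gradient step on the log loss for a single record."""
        error = self.predict_proba(x) - label
        self.weights = [
            w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


model = OnlineLogisticRegression(n_features=1)

# Initial toy data: a single feature, labelled 1 when the feature is large
initial_data = [([0.0], 0), ([1.0], 0), ([3.0], 1), ([4.0], 1)]
for _ in range(200):
    for x, label in initial_data:
        model.update(x, label)

before = model.predict_proba([2.4])

# New labelled records arrive later and are folded in without retraining from scratch
new_data = [([2.2], 1), ([2.5], 1)]
for x, label in new_data:
    model.update(x, label)

after = model.predict_proba([2.4])
print(f"P(class 1 | x=2.4) before: {before:.3f}, after: {after:.3f}")
```

Each call to `update` nudges the model toward the new record, so fresh data affects predictions immediately. Updating only on new records can slowly drift away from what was learned earlier, so in practice new data is often mixed with a sample of older data.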

It is important to note that when the training data is sparse and high dimensional, the logistic model may overfit the training data.
Therefore, for stable and meaningful outcomes, you need to give logistic regression a large dataset, ideally with many more records than features.

In upcoming posts we will cover more Machine Learning tools like the algorithm above, to help you understand different scenarios and produce robust outputs.

At Kontikilabs, we help our customers build such custom Machine Learning and deep learning tools, with a focus on the data available, the problems to be solved and the end user objectives. You can connect with us on Twitter or by email: hi[at]kontikilabs[dot]com

Written by Tanya Thakur