Stories by alwaysLearning on Medium

Linear Regression

alwaysLearning — Fri, 23 Feb 2018 05:28:47 GMT

ELI5

In statistical terms, regression analysis is an experiment to see if the occurrence of one thing could be related to another (ELI5 reference)

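So what is linear regression (ordinary least squares) in terms of what we use for ML? Linear regression is a method that fits a straight line through a set of points by making the squared vertical distances between the points and the line as small as possible. Comparing those distances with how spread out the points are around their mean tells you how closely the data fits a straight line (ELI5 reference)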
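To make the "distance from the mean" idea concrete, here is a minimal sketch of the closed-form least-squares fit for a single input, written with NumPy. The five data points are made up for illustration. The slope is built from how far x and y sit from their means, and the resulting line minimizes the sum of squared residuals.

```python
import numpy as np

def fit_line_ols(x, y):
    """Return (m, b) for the line y = m*x + b that minimizes the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y around their means, divided by the spread of x around its mean
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through the point (x_mean, y_mean)
    b = y_mean - m * x_mean
    return m, b

# Made-up example points
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

m, b = fit_line_ols(x, y)
residuals = y - (m * x + b)          # vertical distances from each point to the fitted line
print(f"m = {m:.2f}, b = {b:.2f}")   # m = 1.97, b = 1.09
print("sum of squared residuals:", np.sum(residuals ** 2))
```

Because the line has an intercept, the residuals of the least-squares fit always sum to zero: the points are balanced above and below the line.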

Detailed Learning Reference

Regression is such an important topic for building an intuition for ML that one should really spend time getting depth in it. I personally learned it from 2 sources. The first is the UW Applied ML course on Coursera: Prof. Emily Fox is simply excellent at explaining the intuition behind regression, and she dives deeper into individual aspects of linear regression, including forming the cost function, minimizing it using the closed-form solution, why this is not always feasible, and how the minimum can instead be approximated by gradient descent. Here is the link.
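The two routes mentioned above, the closed-form solution and gradient descent, can be compared on the same cost function. This is a sketch, not the course's code: it assumes synthetic data (true slope 2.5, intercept 1.0) and an arbitrarily chosen learning rate and iteration count.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.5 * x + 1.0 + rng.normal(0.0, 1.0, size=n)  # synthetic data: true m = 2.5, b = 1.0

# Design matrix with a column of ones, so the intercept b is learned like any other weight
X = np.column_stack([x, np.ones(n)])

# 1) Closed form: solve the normal equations (X^T X) w = X^T y
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# 2) Gradient descent on the cost J(w) = mean((y - X w)^2)
w_gd = np.zeros(2)
learning_rate = 0.01
for _ in range(10_000):
    gradient = -2.0 / n * (X.T @ (y - X @ w_gd))  # dJ/dw
    w_gd -= learning_rate * gradient

print("closed form      m, b =", w_closed)
print("gradient descent m, b =", w_gd)
```

The closed form requires solving a linear system in X^T X, whose size grows with the number of features and which can be expensive or ill-conditioned; gradient descent only needs repeated matrix-vector products, which is one reason it is used when the closed form is not practical.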

My two cents

Like we discussed in a previous post, an ML model is an assumption we make about how the world works, or rather about how the underlying system in the problem we are trying to solve works. Given an input value x and an output value y that we are trying to predict, we assume y = f(x) + ϵ, where f(x) is the model and ϵ is the irreducible error. In the case of linear regression, f(x) = mx + b, so y = mx + b + ϵ, where m is the slope of the line and b is the bias (intercept) term. Irreducible error is exactly what it sounds like: it cannot be eliminated, no matter how much better a model we build. As a matter of fact, if you fit a model that has 0 error on the training data, it means you have overfit to the training data in hand, and therefore the model will not be able to generalize well to new, unseen data. ϵ is assumed to have an important property: E[ϵ] = 0, i.e. the error has zero mean, so on average the observed y lies neither above nor below f(x). The values of m and b, which are also called coefficients, are learned by minimizing the cost function. Here there is just one input feature, x; if we use multiple input features, as in f(x) = m1x1 + m2x2 + … + mnxn + b, it is called multiple linear regression. We can also apply functions to the input features (for example, adding x² as an extra feature) to make the fitted relationship non-linear in x, while the model stays linear in its coefficients.
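Here is a sketch of multiple linear regression with one transformed feature, on synthetic data. The "true" coefficients and the noise level are assumptions made up only to generate the data; ϵ is drawn with zero mean, matching the property described above. `np.linalg.lstsq` solves the least-squares problem directly.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x1 = rng.uniform(-3, 3, size=n)
x2 = rng.uniform(-3, 3, size=n)
eps = rng.normal(0.0, 0.5, size=n)   # irreducible error: zero mean, not predictable from x1, x2

# Hypothetical "true" system, used only to generate the synthetic data
y = 1.5 * x1 - 2.0 * x2 + 0.8 * x1**2 + 3.0 + eps

# One column per feature plus a column of ones for b.
# The x1**2 column makes the fit curved in x1, but the model is still linear in m1, m2, m3, b.
X = np.column_stack([x1, x2, x1**2, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
m1, m2, m3, b = coef

residuals = y - X @ coef
print("m1, m2, m3, b =", coef)
print("mean residual:", residuals.mean())   # ~0, the sample counterpart of E[eps] = 0
```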

So why not always use linear regression?

Well, because it is wrong a lot of the time, although it is still useful. As you can imagine, not all relationships can be explained by fitting a straight line through the data. In that case we can apply a transformation to the input (for example, adding x², x³, … as features) so that the fitted curve becomes a polynomial. However, there is an issue with that too: the higher the degree of the polynomial, the higher the chance of overfitting. The same applies to a higher number of features, especially when the number of samples we have to train the model, 'n', is not much larger than the number of features or dimensions, 'd'. When this happens, the model can essentially memorize every combination of input and corresponding output, producing a highly overfit model. This is a big problem, especially in the data age we live in, where it is very possible to have extremely high-dimensional data. Fortunately, there is a way to handle this, called regularization. It is one of the most powerful tools in machine learning, and we will go over it in the next post.
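The overfitting effect can be seen with a small experiment, sketched here under assumptions: the data comes from a made-up sine curve plus noise, and there are only 10 training points. `np.polyfit` fits each polynomial by least squares. With degree 9 there are 10 coefficients for 10 points, so the curve can pass through every training point. Exact numbers depend on the random draw, but the training error of the degree-9 fit drops to essentially zero, while its error on fresh test data is typically much worse than its training error.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    x = np.sort(rng.uniform(0, 1, size=n))
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=n)   # hypothetical non-linear system
    return x, y

x_train, y_train = make_data(10)    # deliberately small n
x_test, y_test = make_data(200)     # fresh, unseen data

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

results = {}
for degree in (1, 3, 9):
    coefs = np.polyfit(x_train, y_train, deg=degree)   # least squares on x, x^2, ..., x^degree
    results[degree] = (mse(y_train, np.polyval(coefs, x_train)),
                       mse(y_test, np.polyval(coefs, x_test)))
    print(f"degree={degree}: train MSE={results[degree][0]:.4f}, test MSE={results[degree][1]:.4f}")
```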

Code Snippet

SKlearn is an absolutely essential resource for ML problem solving, so be sure to learn and leverage it well. Below is a snippet that implements simple linear regression using SKlearn.

https://medium.com/media/1e145e0c97976d7ca43506960103b8f7/href
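This is not the code from the gist linked above, just a comparable minimal sketch of simple linear regression with scikit-learn's `LinearRegression`. The synthetic data (true slope 3.0, intercept 4.0) and the 80/20 train/test split are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for this sketch): y = 3x + 4 plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))     # sklearn expects a 2-D feature matrix (n_samples, n_features)
y = 3.0 * X[:, 0] + 4.0 + rng.normal(0.0, 1.0, size=200)

# Hold out 20% of the rows to check how well the fitted line generalizes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()       # ordinary least squares with an intercept
model.fit(X_train, y_train)      # learns the coefficient m and the intercept b
y_hat = model.predict(X_test)

print("slope m:", model.coef_[0])
print("intercept b:", model.intercept_)
print("test MSE:", mean_squared_error(y_test, y_hat))
```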

Linear Regression was originally published in Machine Learning - hands On on Medium, where people are continuing the conversation by highlighting and responding to this story.

Format of Upcoming posts

alwaysLearning — Thu, 01 Feb 2018 18:57:47 GMT

This is a quick post to get you acquainted with the problem-solving flow I am going to follow in upcoming posts.

Algorithm study format

For this part, a big shout-out to Siraj Raval and his video on learning advanced concepts, which helped me frame this story format.

ELI5

With every post I will start with the best ELI5 comment to explain the concept discussed. What is ELI5? It's short for "Explain Like I'm Five", a subreddit where you can find simple explanations of various concepts.

A well explained tutorial

Links to tutorials or videos I found helpful to understand this concept.

Scientific Paper

I will include this section for the more algorithm-related posts. I would recommend watching this video about how to read scientific papers if you are new to this. Key takeaway: you'll need a lot of coffee.

Code Gist

I'll include a code snippet showing how to use ML libraries like SKlearn to apply the algorithm and produce the desired output. What's a gist? Check this out.

Link to solved problem

This will be a link to Kaggle or Github kernel/repo where I have solved a relevant ML task using a publicly available dataset.

Problem solving Format

I have drawn on a number of sources for this, taking the best practices from each to come up with my own custom flow. Each problem is different in its own way, so the steps may vary, but the steps below are a good general guideline. I would highly recommend you check out the links below; they are very useful for coming up with your own flow:

The ML Mastery post for working through data science problems: https://machinelearningmastery.com/process-for-working-through-machine-learning-problems/ OR https://machinelearningmastery.com/how-to-work-through-a-problem-like-a-data-scientist/
EliteDataScience's 7-day crash course on solving DS problems: https://elitedatascience.com/
KDnuggets' article: https://www.kdnuggets.com/2016/03/data-science-process-rediscovered.html
An excellent source with an actual solved example; I highly recommend going through this code once you have read the theoretical material above: https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects/blob/master/example-data-science-notebook/Example%20Machine%20Learning%20Notebook.ipynb

Here is my own concise version of what I generally do:

EDA => Data Cleaning => Feature Engineering => Model Selection => Model Training => Model Validation (& hyper-parameter tuning) => Model Testing => REPEAT!
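The modelling half of this flow (model selection, training, validation with hyper-parameter tuning, testing) maps fairly directly onto scikit-learn tools. A minimal sketch, assuming a synthetic dataset that stands in for the output of the EDA, cleaning and feature-engineering steps (which are data-specific and omitted here), and using the polynomial degree as the hyper-parameter being tuned:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical, already-cleaned dataset standing in for the first three steps
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 1))
y = X[:, 0] ** 3 - X[:, 0] + rng.normal(0, 0.3, size=300)

# Hold out a test set first; it is used only once, in the final "Model Testing" step
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model selection + training: feature transformation and estimator chained together
pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("reg", LinearRegression()),
])

# Model validation & hyper-parameter tuning: 5-fold cross-validation over the polynomial degree
search = GridSearchCV(pipe, param_grid={"poly__degree": [1, 2, 3, 5, 8]},
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

# Model testing: a single evaluation on data never used for training or tuning
test_mse = -search.score(X_test, y_test)
print("best degree:", search.best_params_["poly__degree"])
print("test MSE:", test_mse)
```

The "REPEAT!" step would then mean going back to EDA or feature engineering if the test error is not good enough.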

As I mentioned before, I am using this publication as a way to showcase what I have learnt so far and to get feedback. I would highly recommend you start doing this as well: build your own portfolio, and start doing rather than just reading. Your portfolio doesn't need to be public, although I personally found that making posts public keeps me inspired to keep at it week over week and to set deadlines.

Format of Upcoming posts was originally published in Machine Learning - hands On on Medium, where people are continuing the conversation by highlighting and responding to this story.

What a lot of ML models look like

alwaysLearning — Mon, 29 Jan 2018 01:14:16 GMT

The basic idea behind an ML algorithm is something that looks like this:

answers = f(inputs) + noise

Where f() is the function that takes the input values and computes the output. Of course, a number of things need to happen for f() to function like an ML algorithm. For instance, we need to make sure that the inputs are in a form that f() can interpret correctly: if you have a dataset to predict the mpg of a car, and one of the available inputs is the make of the car, a string like "ford", the model cannot understand it by itself. We also need to identify the behavior f() requires to transform the inputs into answers, and how to evaluate how well f() is doing at predicting the answers. Answering all of these questions is what building an ML model entails.
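One common way to make a string input like "ford" usable by f() is one-hot encoding: one 0/1 column per distinct value. A minimal sketch in plain NumPy with made-up rows; in practice, pandas' `get_dummies` or scikit-learn's `OneHotEncoder` do the same job.

```python
import numpy as np

# Hypothetical rows from an mpg dataset: the make of each car and its weight in lbs
makes = ["ford", "toyota", "ford", "bmw"]
weights = [3200.0, 2500.0, 3600.0, 3400.0]

# One-hot encoding: one 0/1 column per distinct make, so f() only ever sees numbers
categories = sorted(set(makes))                  # ['bmw', 'ford', 'toyota']
one_hot = np.array([[1.0 if make == c else 0.0 for c in categories] for make in makes])

# Final numeric input matrix X: the numeric feature followed by the one-hot columns
X = np.column_stack([weights, one_hot])
print(categories)
print(X)
```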

Let’s start with a brief intro into what a lot of “supervised” learning models look like (more on what supervised means in a bit).

Notation & Terminology

A few notations to keep in mind before you get to the figure below. X denotes the inputs we have from the data we are working with; they are also called the independent variables or features. For instance, if we want to predict house prices, X can be the square footage of the house, the number of bedrooms it has, etc. The answer in this example is the house price we are trying to predict; it is denoted by y and is also called the target variable or the dependent variable. So why is X capitalized and y is not? Good question: X is a matrix in this example (one row per house, one column per feature), whereas y is a single-column vector. More on this soon, promise. y-hat denotes the answers produced by running our ML model in the dotted block below. Now that we have some basic notation out of the way, let's get into what the diagram below does.

As we can see above, the ML model takes processed data and gives it to some function f(), which is basically the ML model block in the figure above, and that block produces outputs for this iteration. The really cool thing here is that, in the case of supervised learning, the behavior of this model is learned by evaluating the y-hat it produces against the observed y. The evaluation produces the error, i.e. the difference between the observed values y and the calculated values y-hat.
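In code, the notation above looks roughly like this. The house data and the weights of the "current" f() are made-up numbers, chosen only to show the shapes of X, y and y-hat and how the error is computed. Mean squared error is one common way (an assumption here, not the only choice) to turn the per-row errors into a single score.

```python
import numpy as np

# X: one row per house, one column per feature (square footage, bedrooms), so a matrix
X = np.array([[1400.0, 3],
              [2000.0, 4],
              [ 850.0, 2]])
# y: one observed price per house, so a single-column vector
y = np.array([240000.0, 330000.0, 150000.0])

# A hypothetical current version of f(): one weight per feature plus a bias
w = np.array([150.0, 10000.0])
b = 5000.0
y_hat = X @ w + b                 # predicted answers for this iteration

# Evaluation: the error is the difference between the observed y and the predicted y-hat
errors = y - y_hat
mse = np.mean(errors ** 2)
print(X.shape, y.shape)           # (3, 2) (3,)
print(errors)                     # [ -5000. -15000.  -2500.]
print(mse)
```

A learning algorithm would use `errors` (or `mse`) to adjust `w` and `b` before the next iteration.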

Kinds of ML Models

I have used the term supervised learning a bunch of times above, so let me give you a quick tour of the kinds of ML models, or kinds of learning tasks. Most ML models can be classified into the following types of problems:

1. Supervised learning

This is a kind of model, or learning task, where you have input data and the corresponding observed answers for that data. The model takes the inputs and their outputs and learns to make generalizations, which can then produce answers for new data. For instance, you can have a bunch of emails, each with a label indicating whether it is spam or not. A supervised learning model can be fit to this data and, after training, predict whether any new email it comes across is spam. Regression and classification fall into this kind of learning task.
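A toy version of the spam example, sketched with scikit-learn: word counts as features (`CountVectorizer`) and a naive Bayes classifier (`MultinomialNB`). The six emails and their labels are invented, and naive Bayes is just one reasonable choice of classifier for this illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training emails with observed answers: 1 = spam, 0 = not spam
emails = [
    "win free money now",
    "free prize claim now",
    "cheap pills win big",
    "meeting agenda for monday",
    "lunch with the team tomorrow",
    "project report attached for review",
]
labels = [1, 1, 1, 0, 0, 0]

# Turn each email into a vector of word counts (the input matrix X)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Fit the supervised model to the inputs and their observed labels
model = MultinomialNB()
model.fit(X, labels)

# Generalize to emails the model has never seen
new_emails = ["claim your free money", "team meeting tomorrow"]
predictions = model.predict(vectorizer.transform(new_emails))
print(predictions)   # [1 0]
```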

2. Unsupervised learning

This is the kind of modeling task where the input data is not labeled: there are no observed answers. In a lot of cases you may not even know what answer you want; the goal may just be to uncover the data's underlying structure, i.e. perform automatic analysis. Clustering is this kind of task. For example, you have a bunch of research papers and you want to find which ones are similar, but the "types" that define similarity between documents are not predefined. In this kind of model, the line for y that goes into the evaluation box in the figure above would not exist; the evaluation is done using just the input data and the current version of the output.
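The clustering idea can be sketched with scikit-learn's `KMeans` on unlabeled, made-up 2-D points. No y is passed to the model, and `inertia_` (the sum of squared distances from each point to its nearest cluster center) is an example of an evaluation that uses only the inputs and the model's current output.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Unlabeled data: two hypothetical groups of 2-D points; the model is never told which is which
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)   # cluster ids discovered from X alone

print("cluster centers:\n", kmeans.cluster_centers_)
print("inertia (evaluation without y):", kmeans.inertia_)
```

The cluster ids themselves (0 or 1) are arbitrary; what matters is which points end up together.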

3. Semi-supervised learning

This task is quite similar to supervised learning, except that not all of the answers (observed values of the target variable) are available: some are, and some aren't. The missing target values are basically inferred from the observed data.
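scikit-learn has a `semi_supervised` module in which unlabeled targets are marked with -1. A sketch using `LabelSpreading` on made-up data where only two of the 100 labels are known; the RBF kernel and `gamma=1.0` are assumptions picked to suit this toy data, and label spreading is only one of several semi-supervised approaches.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
# Two hypothetical groups of 2-D points
X = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(50, 2)),
               rng.normal([5.0, 5.0], 0.5, size=(50, 2))])

# Only two of the 100 targets are observed; the missing ones are marked with -1
y = np.full(100, -1)
y[0] = 0     # one labeled point from the first group
y[50] = 1    # one labeled point from the second group

model = LabelSpreading(kernel="rbf", gamma=1.0, max_iter=100)
model.fit(X, y)
inferred = model.transduction_   # labels inferred for every point, including the unlabeled ones
print(inferred[:5], inferred[-5:])
```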

Hope this has been helpful so far. Starting next week I will be going into supervised learning, specifically regression. I will also be publishing one extra post in the middle of this week going over the ML problem-solving methodology I use and the format of the posts to follow.

What a lot of ML models look like was originally published in Machine Learning - hands On on Medium, where people are continuing the conversation by highlighting and responding to this story.

Welcome

alwaysLearning — Mon, 22 Jan 2018 00:36:39 GMT

Hello and welcome to Machine Learning - hands On. This is a place where I will share my journey of learning and developing skills in the field, explaining the inner workings and logic of ML algorithms in a less opaque and more applicable manner, as well as applying them to actual datasets. I am by no means an expert, so feedback is always welcome and appreciated. This is part portfolio, part tutorial, and I hope to get feedback from folks already in the field as well as help other learners on their path to ML.

I will start by explaining the ML algorithms for a particular problem type, e.g. classification, regression, clustering, etc., perhaps 3–4 algorithms per post. This will be followed by an application post where I apply the algorithms to a dataset. I will also provide the references from which I learned these techniques. The code shared in subsequent posts will be available on GitHub.

To start with, let me list the key learning resources I have leveraged so far. I have taken the 4-part University of Washington ML MOOC[1], which starts with ML applications treated as a black box, followed by courses on regression, classification and clustering. I worked through every programming assignment, which actually implements the various ML algorithms. I highly recommend it! I also follow the ML Mastery blog[2], again highly recommended; it nudged me in this direction by providing a very well-defined path to self-learning and constant practice. I also leverage Kaggle[3], which is a really good resource for datasets and provides access to a huge community of data scientists, ML engineers and fellow learners.

With this, let’s get started. I expect to publish a post every other Sunday, so look out for new posts then!

Warm-up

What is machine learning? There are a number of definitions out there, the most famous one being:

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”

— Tom E. Mitchell

To put it in a context that is less textbook-definition-y and more related to modern applications, say in an app: machine learning is a method by which you don't need to hard-code the program, i.e. code out each and every instance, to return certain answers in certain situations. This matters because there can be a practically infinite number of possible actions in response to a practically infinite number of trigger situations, far too many to program by hand.

Say, for instance, you wanted to make an Amazon Alexa clone: a smart assistant that can answer questions based on your verbal inquiries. Everyone has a slightly different way of phrasing what they want, for instance when asking for the news. I can say "Alexa, what's in the news?" whereas my brother can say "Alexa, what's the latest news for today?" These are completely different strings, but they are asking the same question. Taking patterns of strings like these and generalizing that the user is asking for the news is what machine learning does for the program underlying Amazon Alexa. It takes some data and generalizes for the future. This way of "learning" is also called induction learning[4]. Induction learning takes some instances of the data (getting all the possible data for a particular problem is not possible, but we are getting ahead of ourselves here) and generalizes from them along the lines of: given this kind of data x, I will make decision y. It's not unlike a child learning something. Say you have a teapot: the child touches it once and is quickly struck by how hot it is. The next time, the child will not touch the teapot, simply because from its last experience it knows the teapot could be hot. This may not be the case; the teapot could be at room temperature, but the child has made the generalization that the teapot is hot. This error in judgement caused by prior experience is the child's "bias". The same phenomenon occurs for an ML algorithm: the generalizations it makes from the data it learned from contain errors, and they have to, because they are generalizations. If it answers every trigger situation x in its training data with exactly the observed answer y, it is "over-fitting", and it will not answer correctly when a slightly different x shows up in new data for which the right answer is not y. There is another aspect to an ML algorithm, and that is its "variance": a measure of how much its answers change depending on which particular data it happened to learn from. In other words, how much the behaviour of the child differs based on the different teapots it has seen. One of the key challenges in "fitting" an ML model is this "bias-variance tradeoff". It is really well explained by Scott Fortmann-Roe in his post on the subject[5], and the bullseye illustration from that post is very useful for getting a feel for the concept.
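Bias and variance can also be measured with a small simulation, sketched here under assumptions: a made-up sine-shaped system, noise level 0.3, 15 training points per dataset, and polynomial models of degree 1, 3 and 9. Each repetition draws a new training set, like a child seeing a different teapot, and we look at the predictions at one fixed point x0. Bias² is how far the average prediction is from the truth; variance is how much the predictions spread across training sets.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = 0.25                              # fixed input at which predictions are compared

def true_f(x):
    return np.sin(2 * np.pi * x)       # hypothetical underlying system

def predictions_at_x0(degree, n_datasets=500, n=15):
    preds = []
    for _ in range(n_datasets):
        x = rng.uniform(0, 1, size=n)                  # a fresh training set each time
        y = true_f(x) + rng.normal(0, 0.3, size=n)
        coefs = np.polyfit(x, y, deg=degree)
        preds.append(np.polyval(coefs, x0))
    return np.array(preds)

results = {}
for degree in (1, 3, 9):
    p = predictions_at_x0(degree)
    results[degree] = {"bias_sq": float((p.mean() - true_f(x0)) ** 2),
                       "variance": float(p.var())}
    print(degree, results[degree])
```

Typically the straight line has high bias and low variance, the degree-9 polynomial the reverse, and degree 3 sits in between; that is the tradeoff.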

Hope this is good enough intuition to get started on understanding what ML is, why we want to use it, and some of the things to keep in mind. See you next week, when I will be going into the topic of the "blueprint of ML algorithms".

References:

[1] Machine Learning Specialization, University of Washington (Coursera): https://www.coursera.org/specializations/machine-learning
[2] Machine Learning Mastery blog: https://machinelearningmastery.com/
[3] Kaggle profile: https://www.kaggle.com/rnmehta5
[4] Data, Learning and Modeling, Jason Brownlee: https://machinelearningmastery.com/data-learning-and-modeling/
[5] Understanding the Bias-Variance Tradeoff, Scott Fortmann-Roe: http://scott.fortmann-roe.com/docs/BiasVariance.html

Welcome was originally published in Machine Learning - hands On on Medium, where people are continuing the conversation by highlighting and responding to this story.