A Simple ML-Framework of Cookie Jars

Shoaibkhanz
Published in convergeML
5 min read · Feb 11, 2019

In our everyday lives we make important decisions for our well-being, and it’s necessary that we make the right ones more often than the wrong ones. Over time, we learn from our mistakes and avoid repeating them.

We are constantly approaching an expert state as we learn to solve particular kinds of problems.

In this post, we will read through and understand a framework that will help you grasp the idea of Machine Learning. It will also help you recognize the concepts as you experience them in the real world.

If you didn’t know, machines can also become experts if they are trained well. We will explore how they can become efficient at certain tasks and sometimes exceed human capabilities.

In the past, we would never have imagined driverless cars. Most of us always thought of flying cars but never driverless ones; our imagination failed us. Developments in AI have enabled us to create such products, and so the possibilities are limitless.

Figure: driverless car (left), flying car (right)

So, if you want to create something useful for the benefit of society, you can use Machine Learning to optimise it and make it even better.

Let’s dive into the Cookie Jar Framework and take a glimpse at 6 Jars.

1| Data

Data is essential to any decision, and machines rely on it too. But is more data always better? The answer is counter-intuitive. In the simplest sense, the more data we have the better, but we need to be careful and must try to understand how that data was produced. Knowing this lets us control for the factors that shaped the data and so avoid building biases into our solution.
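To make the point about how data is produced a little more concrete, here is a minimal sketch in Python. The records and the two collection sources ("app_survey" and "random_sample") are made up for illustration; the idea is only that the same question can give very different answers depending on where the rows came from.

```python
from collections import defaultdict

# Hypothetical records: (how the row was collected, did the viewer like the movie?)
records = [
    ("app_survey", True), ("app_survey", True),
    ("app_survey", True), ("app_survey", False),
    ("random_sample", True), ("random_sample", False),
    ("random_sample", False), ("random_sample", False),
]

def like_rate_by_source(rows):
    """Return the share of positive answers for each collection source."""
    counts = defaultdict(lambda: [0, 0])  # source -> [likes, total]
    for source, liked in rows:
        counts[source][0] += int(liked)
        counts[source][1] += 1
    return {source: likes / total for source, (likes, total) in counts.items()}

rates = like_rate_by_source(records)
print(rates)
```

If the two sources disagree this much, simply pooling all the rows would hide the fact that one group of viewers is over-represented.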

2| Tasks

Imagine we are data scientists at Netflix and our job is to build models that classify movies into action, romance, adventure, comedy, etc. Perhaps we will use the scenes in the movies and take cues from their titles to classify them into their respective genres. In other words, we will perform the task of classification using some statistical technique.

So, to summarise: even if we are from GAFA (Google, Amazon, Facebook, Apple), we need to think strategically and prioritise the tasks or problems that we will invest our time in solving. It could be classifying movies or fake videos, or perhaps predicting sales of iPhones or toothbrushes.
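The movie-genre task can be written down as a function that takes a title and returns a label. The sketch below is a toy version, not a statistical technique: the cue words in GENRE_CUES are hypothetical and hand-picked, whereas a real system would learn them from data.

```python
# Hypothetical cue words per genre; a real system would learn these from data.
GENRE_CUES = {
    "action": {"mission", "war", "fast"},
    "romance": {"love", "wedding", "heart"},
    "comedy": {"funny", "hangover", "party"},
}

def classify_title(title):
    """Return the genre whose cue words overlap most with the title, or 'unknown'."""
    words = set(title.lower().split())
    scores = {genre: len(words & cues) for genre, cues in GENRE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(classify_title("Love and the Wedding Planner"))
print(classify_title("Quiet Evening"))
```

The useful part is the shape of the task: one input (the title), one output chosen from a fixed set of classes.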

3| Model

Let’s continue our job as a data scientist at Netflix. We definitely have a lot of data, and we are sure that the business’s priority is to predict viewers’ favourite movies and then recommend similar ones to them.

How are we going to approach the task?

We may build a simple model or a complex one that looks at the distribution of movies that viewers have watched in the past. But most movies are not purely action or comedy; they are a mix. Perhaps some viewers like Tom Cruise more than any other actor, or perhaps they prefer movies made in the 1990s and 2000s, so this relationship can get complex very quickly.

Anyway, since we build these models subconsciously, i.e. our likes and dislikes are rooted in our subconscious, machines can have a hard time predicting them. Still, we will see how machines can approximate them really well.
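Here is a rough sketch of what a "simple" and a slightly "richer" model of a viewer’s taste could look like. The viewing history, the placeholder actor names and both scoring rules are assumptions made for illustration; they are not how Netflix or any real recommender works.

```python
from collections import Counter

# Hypothetical viewing history: each movie is (genre, lead_actor, decade).
history = [
    ("action", "Tom Cruise", 1990),
    ("action", "Tom Cruise", 2000),
    ("comedy", "Actor B", 1990),
    ("action", "Actor C", 1990),
]

def simple_score(movie, past):
    """Simple model: share of past movies with the same genre."""
    genres = Counter(genre for genre, _, _ in past)
    return genres[movie[0]] / len(past)

def richer_score(movie, past):
    """Richer model: average the shares for genre, actor, and decade."""
    shares = []
    for i in range(3):
        counts = Counter(m[i] for m in past)
        shares.append(counts[movie[i]] / len(past))
    return sum(shares) / 3

candidate = ("comedy", "Tom Cruise", 1990)
print(simple_score(candidate, history), richer_score(candidate, history))
```

The simple model only sees that this viewer rarely watches comedies, while the richer one also notices the familiar actor and decade. Which of the two is actually better is the question the next jar deals with.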

4| Loss

We will use our previous example of predicting viewers’ favourite movies. In the 3rd jar we built a model, but we couldn’t decide whether the simple model is better than the complex one or vice versa. To solve this conundrum we look at the errors that each model makes. If a model makes fewer errors than the others, we consider it a good one.

_________________ERRORS == LOSS_________________

Errors are our loss and loss is our errors; we mean the same thing whichever word we use. We are now 4 jars deep, so we are close to understanding the ML trickery!
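One simple way to turn "errors" into a number is to count the share of wrong predictions (often called the 0-1 loss). The labels and the two sets of predictions below are made up; this is just a sketch of how two models could be compared on the same viewers.

```python
def error_rate(predicted, actual):
    """Fraction of predictions that disagree with the true labels (0-1 loss)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    mistakes = sum(p != a for p, a in zip(predicted, actual))
    return mistakes / len(actual)

# Hypothetical labels: 1 = favourite, 0 = not a favourite.
actual        = [1, 0, 1, 1, 0, 0]
simple_model  = [1, 1, 1, 0, 0, 0]  # predictions of a hypothetical simple model
complex_model = [1, 0, 1, 1, 0, 1]  # predictions of a hypothetical complex model

simple_loss = error_rate(simple_model, actual)
complex_loss = error_rate(complex_model, actual)
print(simple_loss, complex_loss)
```

In this made-up case the complex model has the lower loss, but with other data the simple one could just as well win.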

5| Learning Algorithm

As humans, we learn as we make mistakes, so when a model makes a mistake we want it to update itself and avoid repeating the error. Learning algorithms do the same: they learn from mistakes and optimise towards a solution. Thus a learning algorithm comes with a loss function (error function) which it optimises, correcting its predictions as it learns more from data.

We could try various algorithms to predict favourite vs. not-so-favourite movies, for example logistic regression, classification trees, neural networks, etc. So this jar is filled with algorithms and techniques to solve a business problem. It is an important jar: without them, finding a solution would be difficult for simple problems and perhaps impossible for complex ones.
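As an illustration of "learning from mistakes", here is a minimal sketch of logistic regression with a single feature, trained by plain gradient descent on the log loss. The feature (share of action scenes), the data, the learning rate and the number of steps are all assumptions chosen for the example; in practice you would use a library rather than write this by hand.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Average cross-entropy between predicted probabilities and labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

def train(xs, ys, lr=0.5, steps=200):
    """Gradient descent on the log loss of a one-feature logistic regression."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Each term (prediction - label) is the mistake on one example.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical feature: share of a movie's scenes that are action scenes.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]  # 1 = favourite, 0 = not a favourite

loss_before = log_loss(0.0, 0.0, xs, ys)
w, b = train(xs, ys)
loss_after = log_loss(w, b, xs, ys)
print(loss_before, loss_after)
```

Every step nudges the weight and bias in the direction that reduces the loss, which is the "update itself and avoid the errors" idea from this jar.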

However, how do we know which algorithm is better for our task? We will discuss this intriguing question in the next jar.

6| Evaluation

This is the final jar in the Cookie Jar Framework. Once we have built the model and found a solution, we want to test it in the real world, i.e. evaluate it, examine it, and repeat the model-building process if it performs poorly. There are many evaluation metrics that you might have heard of, such as accuracy, precision, recall, R-squared, etc.

I would suggest you read my dedicated blog post on the confusion matrix; it will familiarise you with the terminology.
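For a quick feel of three of these metrics, here is a small sketch that computes accuracy, precision and recall from the four confusion-matrix counts. The labels and predictions are invented, and the function name is just a choice for this example.

```python
def evaluate(predicted, actual):
    """Return accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == 1 and a == 1 for p, a in pairs)  # true positives
    fp = sum(p == 1 and a == 0 for p, a in pairs)  # false positives
    fn = sum(p == 0 and a == 1 for p, a in pairs)  # false negatives
    tn = sum(p == 0 and a == 0 for p, a in pairs)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Hypothetical labels: 1 = favourite, 0 = not a favourite.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 1, 1, 0]

metrics = evaluate(predicted, actual)
print(metrics)
```

Notice that the three numbers differ for the same predictions, which is why it matters to pick the metric that fits the business problem.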

Summary

We just went through the 6 Jars of Machine Learning. The framework comes from the professors of OneFourthLabs, and I think it’s a great summary and a framework to help someone understand the process of Machine Learning. It’s pure engineering and science, not magical trickery. I hope you guys liked it, and if you did, please clap a little and enjoy!
