Machine Learning for Everyone

Why this blog?

Machine Learning is becoming an essential technology in our day-to-day lives. Everyone, whether in a technical role or a business role, should understand what it is and how it works. Having an intuition for how Machine Learning can help in your area of work is becoming an important skill.

There is no dearth of online courses for machine learning. However, these courses quickly throw you into the deep end of Machine Learning with concepts such as matrix manipulations, gradient descent optimization, embeddings, etc. People whose daily job is not to implement Machine Learning algorithms quickly become disillusioned and lose interest. Our goal with this blog is to offer a friendly overview of machine learning, providing our readers with the necessary background in simple, easy-to-understand language.

What gives us the authority to write about the business of Machine Learning? We are product managers, marketers and technical implementers with experience implementing Machine Learning solutions at organizations such as Amazon, Expedia, Walmart, Boeing, Visa, Pfizer, etc. We have seen how effective teams with diverse backgrounds are put together to tackle some of the huge problems with Machine Learning and other innovative technologies.

With this blog, we will cover a variety of topics, such as:

  1. What constitutes a Machine Learning project?
  2. When should you consider Machine Learning for your projects and, more importantly, when should you not use Machine Learning?
  3. Demystifying various Machine Learning terms such as supervised learning, unsupervised learning, distant supervised learning, deep learning and many more.
  4. How do you build a successful Machine Learning team, and what roles do you need to hire for?
  5. As a stakeholder in the project, what questions do you need to ask before starting a Machine Learning project?
  6. Various Machine Learning algorithms, and their suitability for a given problem.
  7. The intersection of Machine Learning with other emerging technologies such as Virtual Reality, Blockchain, etc.
  8. and many more…

This is just a sample list of topics. As we go along and hear your feedback and suggestions, we will include many more.

For this post, let’s start with what Machine Learning is and some sample problem areas where machine learning can help. Most of you may have already used many products and solutions that are built using Machine Learning without realizing it.

What is Machine Learning?

What is machine learning anyway? What does the machine “learn”, and how does it learn?

Machine learning is a set of generic algorithms that teach computers what to do instead of telling them what to do. These algorithms learn from the data they are given and can tell you something about that data without programmers having to write any custom code for the specific problem.

Ok, enough with the definition. If you have never written code or dealt with algorithms, you may already be feeling lost, so let’s take an example. Let’s say you are a real estate broker and, over years of experience, you have built a good intuitive sense of how to price a home just by looking at it. Now your business is growing and you need to price more homes than you have time for. So, you turn to Machine Learning.

You start by collecting a bunch of historical data about the houses sold in your area. You get the area of the house, the number of bedrooms, the year the home was built, the neighborhood, etc., and also the price for which the house was sold. You collect something like this:

We call this the training data. You feed this data to a Machine Learning algorithm to train that algorithm. Once it has seen enough data, the algorithm becomes good enough at predicting the price of a new home. The Machine Learning algorithm works backward from the given training data to automatically figure out the pricing logic. Now, with this trained algorithm, all you need to do is feed it the information about a new home (the size, number of bedrooms, year built and the neighborhood) and it can predict the price.
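For readers who are curious what this looks like in code, here is a minimal Python sketch of the idea. It is not a real pricing model: the house data is invented for illustration, it uses only the area of the house, and the function names `train` and `predict` are our own. It fits a straight line through the examples (a technique called simple linear regression) and then uses that line to price a new home.

```python
# Hypothetical training data: (area in square feet, sale price in dollars).
# These numbers are invented for illustration only.
training_data = [
    (1000, 210_000),
    (1500, 290_000),
    (2000, 410_000),
    (2500, 490_000),
]


def train(examples):
    """Fit price = slope * area + intercept using ordinary least squares."""
    n = len(examples)
    mean_area = sum(area for area, _ in examples) / n
    mean_price = sum(price for _, price in examples) / n
    covariance = sum(
        (area - mean_area) * (price - mean_price) for area, price in examples
    )
    variance = sum((area - mean_area) ** 2 for area, _ in examples)
    slope = covariance / variance
    intercept = mean_price - slope * mean_area
    return slope, intercept


def predict(model, area):
    """Use the learned line to estimate the price of a new home."""
    slope, intercept = model
    return slope * area + intercept


# "Training" learns the two numbers (slope and intercept) from the examples.
model = train(training_data)

# Price a home the algorithm has never seen before.
predicted_price = predict(model, 1800)
print(f"Predicted price for 1800 sq ft: ${predicted_price:,.0f}")
```

Notice that nobody typed a price per square foot into the program. The algorithm worked that number out from the examples, and it would work out a different number if you gave it data from a different market.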

To solve this same problem in the traditional programming way, without using Machine Learning algorithms, someone would have to derive explicit rules that govern the price of a home and then program those rules into the source code. The task of deriving these rules could quickly become unwieldy, as the diversity of homes may be very large and the sale price may not have a discernible pattern. Machine Learning algorithms, on the other hand, find implicit rules that have statistical validity when predicting the sale price. They are called Machine Learning algorithms because they “teach” a computer to solve a problem by examples.
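To make the traditional approach concrete, here is a sketch of what hand-written pricing rules might look like in Python. Every rule, number and neighborhood name below is an assumption we made up for illustration; the point is only that a person has to think of each rule and type it in.

```python
# Hypothetical hand-written pricing rules. Every number below is an
# assumption made up for illustration, not a real pricing model.
def price_by_rules(area_sqft, bedrooms, year_built, neighborhood):
    price = area_sqft * 200           # assumed base rate per square foot
    price += bedrooms * 10_000        # assumed premium per bedroom
    if year_built < 1980:
        price *= 0.9                  # assumed discount for older homes
    if neighborhood == "downtown":    # hypothetical neighborhood names
        price *= 1.2
    elif neighborhood == "suburb":
        price *= 1.05
    # Each new neighborhood, home style or market shift needs another rule.
    return price


rule_based_price = price_by_rules(1800, 3, 1975, "downtown")
print(f"Rule-based price: ${rule_based_price:,.0f}")
```

Every constant here is a guess that someone has to maintain by hand. A Machine Learning algorithm estimates such numbers from historical sales instead.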

Types of problems solved by Machine Learning

Machine learning problems can be grouped into common types. The following six groups cover most of the problems we refer to when we are using Machine Learning:

  1. Classification: Classification problems involve figuring out what kind of thing something is. In these problems, data is labeled, meaning it is assigned a class or type. Identifying which emails are spam and which are not, or labeling transactions as fraudulent or authorized, are examples of classification problems.
  2. Regression: In regression problems, you are trying to predict a numerical value of a thing. Data is labeled with a real value. In our earlier example from real estate, trying to predict the sale price of a home is a regression problem.
  3. Recommendation: With recommendation algorithms, you suggest to users the things they will be most interested in. You apply recommender systems in scenarios where many users interact with many items, so that the system can use what some users liked to predict what other users will like. You see these recommender systems in practice on retail sites such as Amazon or media streaming services such as Netflix or Spotify.
  4. Ranking: With ranking problems, you help users find the most relevant thing from a large set of possibilities. You see this applied in many areas, such as ranking results in search engines like Google and Bing, or ranking the most suitable advertisements to show to users.
  5. Clustering: With clustering problems, you divide the given data into groups based on similarity and other measures of natural structure in the data. An example is the grouping of pictures by type or by person that you see in applications like Apple Photos or Google Photos.
  6. Anomaly: With anomaly detection, you are trying to identify unusual patterns and uncommon things, called outliers, that do not conform to expected behavior. It has many applications, such as identifying patterns in network traffic to detect hackers or detecting fraud in credit card transactions.
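To give a feel for how two of these problem types differ, here is a small Python sketch under our own assumptions. The email features, transaction amounts and function names are all hypothetical. The first half shows classification: it learns from labeled examples using a very simple method called nearest neighbor. The second half shows anomaly detection: it has no labels and simply flags values that sit far from the rest.

```python
import statistics

# --- Classification: labeled examples, predict a category ---
# Hypothetical features per email: (number of links, number of "!" marks).
labeled_emails = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((8, 5), "spam"),
    ((6, 9), "spam"),
]


def classify(features):
    """1-nearest-neighbor: copy the label of the closest labeled example."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    _, label = min(
        labeled_emails, key=lambda ex: squared_distance(ex[0], features)
    )
    return label


# --- Anomaly detection: no labels, flag values far from the rest ---
# Hypothetical card transaction amounts in dollars.
amounts = [20, 25, 22, 19, 24, 21, 23, 950]


def find_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * stdev]


email_label = classify((7, 6))
anomalies = find_anomalies(amounts)
print(email_label)
print(anomalies)
```

Real systems use far more data and more robust methods, but the distinction is the same: classification needs examples that someone has already labeled, while anomaly detection looks for whatever does not fit the usual pattern.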

We hope this overview was helpful. We would love to hear your feedback and your ideas about topics that you want us to write about. In the next blog post, we will cover the different parts of a Machine Learning project, from data preparation to model training to evaluation and deployment of the model.