Machine Learning: How to get started

I attended Microsoft Build last month, where great content was presented about Machine Learning. It gave me better insight into how to get started with it and what we need to plan for when we design our IoT solutions.

First you need data

They presented great demos of the suite of Microsoft products that make it easy to get started… as long as you have the right data. With good data, and lots of it, a mediocre algorithm will do OK. So before thinking of hiring a data scientist, there are a few things that can be done at the system design level to get things going.

What is ML?

A presenter had a good summary of what Machine Learning is.

Machine Learning is about learning from examples. Given a data set, you develop a model that you then apply in real time to new data points to predict their outcome.

The falling wine glass

Concretely, Machine Learning can be used to replace logic-based algorithms. Instead of using domain expertise to develop complex algorithms, you can let the data provide the insight.

One example that helped me understand this is a wine glass falling, and predicting whether it will break. Anyone can predict that in an instant, so it should be easy to develop an algorithm for it, right?

If falling from greater than 6 feet, then it will break
mmmm, or is that 3 feet? Well, it will not break from 3 feet if it lands on grass
If falling from greater than 3 feet, and on concrete, then it will break
mmmm, what if it is a plastic glass?
If falling from greater than 3 feet, and on concrete, and made of glass, then it will break
mmmm… if it is full, will it break more easily?

You can already see that by the time we are done, we would have complex conditional logic.
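
To make that concrete, here is a rough sketch of where those rules end up once they are written as code. The 3-foot rule on concrete comes from the list above; the grass threshold, the "full glass" adjustment, and the function name `will_break` are my own guesses for illustration, not something shown at Build.

```python
def will_break(height_ft, surface, material, is_full=False):
    """Hand-written rules for the falling glass. Most thresholds are guesses."""
    if material == "plastic":
        return False  # assumption: a plastic glass never breaks
    if surface == "grass":
        return height_ft > 6  # guessed threshold for a softer landing
    if surface == "concrete":
        # Guess: a full glass breaks from a slightly lower height.
        return height_ft > (2.5 if is_full else 3)
    # Any surface we did not think of falls back to a default guess.
    return height_ft > 6


print(will_break(4, "concrete", "glass"))    # True
print(will_break(3, "grass", "glass"))       # False
print(will_break(4, "concrete", "plastic"))  # False
```

Every new factor (carpet, a thicker glass, the angle of the fall) means another branch and another guessed threshold, which is exactly the complexity that learning from data is meant to avoid.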

How it works

Machine Learning (ML) isn’t magic. What it does is find relationships between sensor readings (the columns, or features) and past outcomes (the labels). For ML to work, you need relevant, complete, connected, and accurate data.

  • If you don’t know how high a glass is falling from, you can’t know if it will break
  • If you don’t know whether it’s made of glass or plastic, you will not predict the outcome accurately all the time either
  • Knowing whether it is full or empty can be useful; knowing whether it holds white or red wine is not relevant, it will just add noise and decrease accuracy
  • You need enough data, covering all cases; if all you have is drops from 6’ onto concrete, what you can predict is limited
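
As an illustration of "learning from examples", here is a minimal sketch that predicts the outcome of a new drop by looking up the most similar past drop (a 1-nearest-neighbor lookup). This is a technique I picked for its simplicity, not what Azure ML does under the hood. The example rows are made up, and the `distance` weighting is an arbitrary assumption.

```python
# Made-up toy examples: (height_ft, surface, material) -> did it break?
EXAMPLES = [
    ((6.0, "concrete", "glass"), True),
    ((4.0, "concrete", "glass"), True),
    ((1.0, "concrete", "glass"), False),
    ((3.0, "grass", "glass"), False),
    ((6.0, "grass", "glass"), True),
    ((6.0, "concrete", "plastic"), False),
    ((2.0, "grass", "plastic"), False),
]


def distance(a, b):
    """Height difference in feet, plus a fixed penalty per mismatched category."""
    penalty = 3.0  # arbitrary weight chosen for this sketch
    d = abs(a[0] - b[0])
    d += penalty * (a[1] != b[1])
    d += penalty * (a[2] != b[2])
    return d


def predict(drop, examples=EXAMPLES):
    """Return the outcome of the most similar past example."""
    _, outcome = min(examples, key=lambda ex: distance(drop, ex[0]))
    return outcome


print(predict((5.0, "concrete", "glass")))    # True
print(predict((5.0, "concrete", "plastic")))  # False

# With only "6 feet on concrete" examples, everything looks like a break.
only_concrete = [((6.0, "concrete", "glass"), True)]
print(predict((1.0, "grass", "plastic"), only_concrete))  # True
```

The last call shows the coverage problem from the list above: a model can only repeat what its examples contain.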

Once you get started, you can improve accuracy by increasing the data set, adding more columns, or changing the algorithm. Changing the algorithm is where a data scientist becomes useful. From the demos, albeit canned, it appears that a developer using Azure ML with a good dataset can reach 80–85% accuracy without much difficulty.
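
For reference, accuracy here simply means the share of held-out examples that the model gets right. A minimal sketch follows, assuming a stand-in model (`threshold_model`) and made-up held-out rows. Neither comes from the demos.

```python
def accuracy(model, labeled_rows):
    """Fraction of rows where model(features) equals the recorded outcome."""
    correct = sum(1 for features, outcome in labeled_rows if model(features) == outcome)
    return correct / len(labeled_rows)


def threshold_model(features):
    """Hypothetical stand-in model: breaks if dropped from more than 3 feet."""
    return features["height_ft"] > 3


# Made-up held-out examples that were not used to build the model.
held_out = [
    ({"height_ft": 6, "surface": "concrete"}, True),
    ({"height_ft": 4, "surface": "concrete"}, True),
    ({"height_ft": 4, "surface": "grass"}, False),
    ({"height_ft": 1, "surface": "concrete"}, False),
    ({"height_ft": 5, "surface": "concrete"}, True),
]

print(accuracy(threshold_model, held_out))  # 0.8
```

The one miss is the drop on grass: the stand-in model ignores the surface column, which is the kind of gap that adding a column or changing the algorithm would close.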

What can you do with it?

For IoT the main applications of ML are:

  1. Predictive maintenance
  2. Anomaly detection
  3. Future reading prediction

Predictive maintenance

ML fits well in IoT for predictive maintenance. If you know the past sensor readings and how something broke, you can predict when another instance of that machine could break.

The trick is that you need a complete and connected data set. Up-front planning of which data you want to collect can go a long way. You need to capture all the data that could affect the outcome; think about vibration, heat, frequency of use, motor speed, … A lot of that data cannot be obtained after the fact, and you can’t just do a software update to start gathering it either. This is where some up-front planning can ensure you include the right sensors. Sensors are unlikely to be the biggest line item in your Bill of Materials, and adding a couple more can unlock a lot of value from your IoT solution.

The second part is to collect the outcomes. This data is unlikely to come from the same source, so you need to plan how you will gather it from user input and cross-reference it with the sensor readings. You might also want to collect other events, such as the maintenance and inspections that are performed.
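
The cross-referencing step could look something like the sketch below: each sensor reading gets labeled with whether the same device failed shortly afterwards. The field names (`device_id`, `time`, `vibration`), the 7-day horizon, and the sample rows are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def label_readings(readings, failures, horizon=timedelta(days=7)):
    """Attach a 'fails_soon' label to each reading.

    readings: list of dicts with 'device_id', 'time', and sensor values.
    failures: list of dicts with 'device_id' and 'time' (e.g. from user reports).
    A reading is labeled True when the same device failed within `horizon`
    after the reading was taken.
    """
    labeled = []
    for r in readings:
        fails_soon = any(
            f["device_id"] == r["device_id"]
            and timedelta(0) <= f["time"] - r["time"] <= horizon
            for f in failures
        )
        labeled.append({**r, "fails_soon": fails_soon})
    return labeled


# Hypothetical sample data.
readings = [
    {"device_id": "pump-1", "time": datetime(2024, 1, 1), "vibration": 0.2},
    {"device_id": "pump-1", "time": datetime(2024, 1, 10), "vibration": 0.9},
    {"device_id": "pump-2", "time": datetime(2024, 1, 10), "vibration": 0.3},
]
failures = [{"device_id": "pump-1", "time": datetime(2024, 1, 12)}]

for row in label_readings(readings, failures):
    print(row["device_id"], row["time"].date(), row["fails_soon"])
```

Note that this only works if both sources share a device identifier and a trustworthy timestamp, which is one more thing to plan for up front.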

Anomaly detection

If you have past readings, even without knowing the outcome or having a complete column set, you can still do anomaly detection. You will not be able to tell what the cause of the issue is, but you can get a trigger that something unusual might need to be looked at.
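
One of the simplest ways to get such a trigger is to flag any reading that sits far from the historical average. The sketch below uses a plain z-score check, which is my own choice of technique for illustration. The threshold of 3 standard deviations and the sample readings are assumptions.

```python
from statistics import mean, stdev


def is_anomaly(history, new_reading, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past readings were identical, so any change is unusual.
        return new_reading != mu
    return abs(new_reading - mu) / sigma > threshold


# Made-up temperature history for one device.
history = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]

print(is_anomaly(history, 20.4))  # False
print(is_anomaly(history, 25.0))  # True
```

It says nothing about why the reading is unusual, only that someone should take a look.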

Future reading prediction

This part is a bit different from predictive maintenance, in the sense that it does not rely on human input to gather data about the value that you want to predict.

In most cases, the readings can be affected by external factors, some of which can be retrieved after the fact, and others that you might need to capture yourself. For example, weather can affect sensor readings if your device is tracking something like the humidity of a farm field. That information can easily be obtained from external sources when you need it, so there is no need to gather it yourself, but you will need to know where the devices are. In other cases, you might not be able to obtain the data after the fact, so think about what can affect the reading and gather it yourself if that is easier than seeking it out later.
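
As a rough sketch of the farm field example, the code below fits a straight line between an external factor (rainfall) and the reading we care about (soil humidity), then uses it to predict a future reading. The numbers are made up and deliberately tidy, and a simple least-squares line is only my assumption of a reasonable first model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up history: daily rainfall (mm) from an external weather source,
# and the soil humidity (%) the device reported the next morning.
rainfall_mm = [0, 2, 5, 10, 20]
humidity_pct = [30, 32, 35, 40, 50]

slope, intercept = fit_line(rainfall_mm, humidity_pct)


def predict_humidity(forecast_rain_mm):
    """Predict tomorrow's humidity reading from the rain forecast."""
    return slope * forecast_rain_mm + intercept


print(round(predict_humidity(8), 1))
```

Joining the rainfall to the right device is only possible because we know where the device is, which is why the location has to be captured even if the weather does not.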

Getting started

Without data that pairs sensor readings with their outcomes, you can’t get started with Machine Learning.

One IoT concept that we are tinkering with is meant to increase awareness of water consumption and reduce it. The primary use case is to reduce the length of showers, as they are a significant and addressable water usage.

The concept is to have a flow meter at the main entry line of a house, and from the flow detect what the water event is: toilet flush, shower, hand washing, … An initial idea we had was to leverage ML for it.

To get started, we would have to use a logic-based algorithm while we build the data set. To make the data more accurate, we would then need human input to confirm what the water events were before ML can be more reliable than the logic-based algorithm. We would also need complementary data about the setup of the house, such as whether the pipes are made of plastic or copper. We would also have to think about other factors that could affect the flow reading, for example:

  • Does the type of water heater impact water flow? It seems a tank-based heater could reduce water flow intake
  • Does having multiple showers impact flow? Maybe the shower heads could be different enough to impact the water flow reading

While discussing this data gathering problem with our UX team, our assumption was that users don’t expect predictions from machines to be perfect from the start; they understand that the machine has to learn. If there is an easy and frictionless way for the user to correct the outcome, then they should take the time to do it. That lets us gather a better dataset, which as a first step allows us to replace the logic-based algorithm with ML, and then allows the ML algorithm to keep learning and improving over time.
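
Here is a sketch of how that bootstrap loop might look: a logic-based first guess, stored alongside whatever the user confirms or corrects. The thresholds, field names, and function names are all hypothetical placeholders. We have not measured real flow data yet.

```python
def classify_event(avg_flow_lpm, duration_s):
    """Logic-based first guess. All thresholds are hypothetical placeholders."""
    if duration_s >= 180 and avg_flow_lpm >= 6:
        return "shower"
    if duration_s <= 90 and avg_flow_lpm >= 6:
        return "toilet flush"
    if duration_s <= 60:
        return "hand washing"
    return "unknown"


def record_event(dataset, avg_flow_lpm, duration_s, user_label=None):
    """Store the event with the rule-based guess and the user's label, if any."""
    guess = classify_event(avg_flow_lpm, duration_s)
    dataset.append({
        "avg_flow_lpm": avg_flow_lpm,
        "duration_s": duration_s,
        "guess": guess,
        "label": user_label,  # None until the user confirms or corrects
    })
    return guess


def training_rows(dataset):
    """Only user-labeled rows are trustworthy enough to train on."""
    return [row for row in dataset if row["label"] is not None]


events = []
print(record_event(events, 8, 400))                        # shower
print(record_event(events, 7, 120, user_label="shower"))   # unknown
print(len(training_rows(events)))                          # 1
```

The second event is the interesting one: the rules give up, the user supplies the answer, and that row becomes an example the future ML model can learn from.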

What does it mean for our IoT solutions?

My takeaway, which will impact the IoT solutions I design, is to think about the data from the start. Think about extra data and sensors that could be useful later, and plan how the data will be gathered and stored permanently in a cost-efficient way.

Later, once we have enough data, a developer will be able to use Azure Machine Learning to start experimenting and get predictions and insight from the data. At some point the solution will need a data scientist to make the most of the data, but gathering data from the start can put you a full product release cycle ahead.