Machine Learning for the Masses!

Mairead O'Cuinn
Published in LibertyIT · 7 min read · Feb 26, 2019

What do microwaves and machine learning have in common?

This is the question I found myself thinking about at a Web Summit talk last November.

Cassie Kozyrkov, Chief Decision Scientist from Google, was walking us through the analogy of machine learning and microwaves…

Microwaves have been around for some time, and whilst most people wouldn’t have a clue how to build one, that doesn’t stop them successfully using them to heat up food! Likewise, with machine learning, years of research have already gone into developing superb algorithms, so we don’t need to start from scratch if we need to use an ML model…

Kozyrkov explained that there are actually two types of machine learning (ML):

  1. ML Research and
  2. Applied ML

Companies struggle to get started in ML because they believe they first need a team of data scientists. That is really only true for #1; for #2, many tools and algorithms are already publicly available for creating ML models on a given dataset. As Kozyrkov says:

“The world is teaching us to build more and more microwaves — who is teaching us how to use them?”

This talk really got me wondering more about the options for companies and software developers who are new to machine learning and data science in general…

One thing is for sure: demand for machine learning is growing fast!

Savvy organisations know that using AI will be vital to stay ahead of competitors, whilst consumers now increasingly expect their devices and software to exhibit “smart” features.

So why aren’t all companies already using machine learning?

Well, apart from a lack of knowledge about how to go about it, there have also been historical roadblocks such as:

Inadequate computing power to process the huge volumes of data needed to train ML models.

e.g. you may have months of proxy data stored in your logging tool, but only enough capacity to process 5 minutes of it! Also, getting the first model going may have been fine, but scaling it was a problem because the infrastructure wasn’t there.

Not enough good examples in datasets to accurately train a model for a certain use case.

e.g. having thousands of terabytes of Cisco ESA logs won’t help find email data exfiltration patterns if there’s only one example to learn from…

Nowadays, however, cloud computing has largely resolved big data issues, and if you don’t have enough examples in your own datasets, there will hopefully be a public dataset which you can purchase, or maybe even use for free!

Machine Learning Enablers

So what’s available to companies today if they’ve decided that it’s Applied ML which they want to use?

Well, the resources range from Python ML libraries like scikit-learn to complete open-source platforms like TensorFlow.
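To give a feel for how little code Applied ML can need, here is a minimal sketch using scikit-learn. The choices in it are my own assumptions for illustration, not something from Kozyrkov's talk: the sample iris dataset that ships with the library, a random forest as the pre-built algorithm, and a hypothetical helper called `train_and_score`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_and_score(random_state=0):
    """Train an off-the-shelf classifier and return (model, test accuracy).

    Hypothetical helper for illustration only.
    """
    # A small sample dataset bundled with scikit-learn (150 iris flowers).
    X, y = load_iris(return_X_y=True)

    # Hold back a quarter of the rows so the model is scored on unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )

    # The algorithm itself is pre-built: we only pick it and call fit().
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)

    # score() reports the fraction of held-out rows classified correctly.
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    model, accuracy = train_and_score()
    print(f"Held-out accuracy: {accuracy:.2f}")
```

Nothing here involves building the microwave: the research behind the algorithm is already done, and the work left to us is choosing a model, feeding it data and checking how well it performs on examples it hasn't seen.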

Google ML Kit is Google’s ML SDK which allows mobile developers to use ML to build features on Android and iOS, whatever their level of expertise.

Common Voice is a Mozilla project to help make voice recognition open to everyone. Now you can donate your voice to help build an open-source voice database. They are hoping to build this out in all of the world’s natural languages from Japanese to Cornish!

Also, for all you AWS developers out there, there is Amazon SageMaker, a fully managed service that enables data scientists and developers to build, train, and deploy ML models at any scale.

AWS are continually enhancing their ML offering and have recently released the following new services and SageMaker features to make ML even more accessible:

Amazon Personalize (In Preview)

  • An ML service that makes it easy for developers to create individualized recommendations for customers using their applications.
  • Real-time personalization and recommendation.
  • Based on the same technology used at Amazon.com

Amazon Forecast (In Preview)

  • A fully managed service that uses ML to deliver highly accurate forecasts based on time-series data.
  • Accurate time-series forecasting service.
  • No machine learning experience required.

Ground Truth

  • Uses ML to automatically label data.
  • Very useful for labelling things like in-house photos of vehicle damage from claims.

Neo

  • Enables ML models to train once and run anywhere in the cloud and at the edge.

One company that’s really stripping down the barriers to ML is DataRobot!

DataRobot’s CEO, Jeremy Achin, says that:

“If everyone on the planet became data scientists there still wouldn’t be enough.”

So DataRobot have created “an automated ML platform which captures the knowledge, experience and best practices of the world’s leading data scientists to deliver unmatched levels of automation and ease-of-use for machine learning initiatives.”

DataRobot “automates automation” for you and enables users of all skill levels to harness the power of “10x Data Scientist capabilities” to build and deploy highly accurate ML models in a fraction of the time.

DataRobot say that in 2019 we’ll see the democratisation of the data scientist role, with “AI workers” coming from both business and software backgrounds.

A recent report from Gartner concluded that by 2020,

“due largely to the automation of data science tasks, citizen data scientists will surpass data scientists in terms of the amount of advanced analysis they produce and the value derived from it.”

Another recent development in the spread of ML has been the emergence of Machine Learning Marketplaces…

So now, if you just need that one killer algorithm to get your model going, or if you’re in search of a complete end-to-end AI pipeline, there’s a market out there for you!

Google’s AI Hub is a “one stop shop for everything AI” where you can buy everything from an out-of-the-box Kubernetes pipeline for your retail recommender system, to any number of plug and play components.

AI Hub provides enterprise-sharing capabilities for organizations to privately host their AI content as well as to foster reuse and collaboration.

Users can also access unique Google Cloud AI and Google AI technologies that can be easily deployed for experimentation but also all the way to production on Google Cloud and on hybrid infrastructures.

Algorithmia is another company which not only hosts models for you but also operates a marketplace of 5000+ algorithms!

With Algorithmia, you can discover models and deploy them with a simple API call. Its algorithms come in 14 different programming languages and can be piped together, e.g. you can take the results from one algorithm written in Scala and pipe them into another algorithm written in Python…

With this wealth of possibilities, Algorithmia provides a great way for developers and data scientists to work together by mixing and matching each other’s algos, along with the built-in ones, to produce the best-fit models.

Finally, not to be left behind in the AI marketplace race, AWS Marketplace for ML was launched at AWS re:Invent last year. It allows developers and data scientists to find and buy ML algorithms and models, and deploy them in Amazon SageMaker.

Haptik, the leading conversational AI company, was one of the first sellers in this marketplace. It also features algorithms from the likes of Mitra, an innovation solutions company selling models such as their Bitcoin Price Indicator and their Abusive Content Detector, which developers can subscribe to and start using straight away in SageMaker.

Customers can find and subscribe to hundreds of algorithms and models here, and the marketplace facilitates instant feedback between buyers and sellers.

So, as you can see, there are multiple solutions which you could use to leverage machine learning in your software, with new resources becoming available every week.

The data scientists have already done most of the hard work creating the algorithms for us - now we just need to start picking them up and applying them!

Mairead O'Cuinn
LibertyIT

Software Engineer on AI team @ Workgrid Software