Over the last few months I have been working with the Google Cloud Machine Learning (ML) Engine to deploy machine learning models into production. Google’s vision is to make machine learning and deep learning accessible to everyone, from engineers to analysts to data scientists. Over the last few years they have released a variety of cloud-based tools designed to lower the barriers to entry for using Artificial Intelligence (AI).
My experience so far has been positive: ML Engine, for example, does seem to abstract away much of the complexity of deploying models. In the following post I want to review the various tools that Google has released, or is releasing, in an effort to democratise AI.
The primary purpose of BigQuery ML is to make machine learning accessible to data analysts by enabling them to develop machine learning models with standard SQL. It also allows for fast prototyping: because models are run directly where the data already sits, the viability of a machine learning project can be assessed with limited time investment. BigQuery ML is currently in beta and supports only three types of model: linear regression, binary logistic regression and multiclass logistic regression.
It can be used through a number of tools that data analysts will already know well. These include spreadsheets connected to BigQuery, the BigQuery web UI, statistical software, and Google's Cloud Datalab notebooks and Data Studio visualisation tool.
I have recently tried it both in Datalab and in the BigQuery web UI and found it very straightforward to get working. In essence it is as simple as writing a SQL CREATE MODEL query. You will need to create the dataset in your project first by following the instructions in the documentation. Running the query extracts the data and trains the model type selected in OPTIONS on it. You can view training statistics and evaluate the performance of the model directly in the web UI, which is impressive. Predictions are accessed via a separate SQL query using the ML.PREDICT function.
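The workflow just described (train with CREATE MODEL, evaluate, then predict with a separate query) can be sketched as below. This is a minimal sketch under stated assumptions: the dataset, table, model and column names are hypothetical, and the helper functions only build the SQL as strings so it can be pasted into the BigQuery web UI or a Datalab notebook. CREATE MODEL with OPTIONS(model_type=..., input_label_cols=[...]), ML.EVALUATE and ML.PREDICT are BigQuery ML syntax; the Python wrappers around them are my own illustration.

```python
from typing import List

# Model types for the beta described above; 'logistic_reg' covers both
# binary and multiclass logistic regression (decided by the label column).
SUPPORTED_MODEL_TYPES = {"linear_reg", "logistic_reg"}


def create_model_sql(model: str, table: str, label: str,
                     features: List[str], model_type: str = "logistic_reg") -> str:
    """Return a CREATE MODEL statement that trains on the selected columns."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported model_type: {model_type}")
    columns = ", ".join(features + [label])
    return (
        f"CREATE MODEL `{model}`\n"
        f"OPTIONS(model_type='{model_type}', input_label_cols=['{label}']) AS\n"
        f"SELECT {columns} FROM `{table}`"
    )


def evaluate_sql(model: str) -> str:
    """Return a query for the evaluation metrics of a trained model."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{model}`)"


def predict_sql(model: str, table: str, features: List[str]) -> str:
    """Return a query that scores new rows with a trained model."""
    columns = ", ".join(features)
    return (f"SELECT * FROM ML.PREDICT(MODEL `{model}`, "
            f"(SELECT {columns} FROM `{table}`))")


if __name__ == "__main__":
    # Hypothetical dataset, table and column names.
    features = ["tenure_months", "monthly_spend"]
    print(create_model_sql("my_dataset.churn_model",
                           "my_project.my_dataset.customers",
                           "churned", features))
    print(evaluate_sql("my_dataset.churn_model"))
    print(predict_sql("my_dataset.churn_model",
                      "my_project.my_dataset.new_customers", features))
```

Keeping the statements as plain strings means the same SQL works unchanged in the web UI, in Datalab, or through a BigQuery client library.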
My hope is that in the future they will build in support for other open source libraries such as scikit-learn. At present, being limited to regression-based models makes it harder to develop the most accurate models on real-world data.
Cloud ML Engine is aimed at both developers and data scientists and offers a managed service to train and deploy machine learning models. A number of machine learning libraries are supported, including scikit-learn, XGBoost, Keras and TensorFlow (more on this later in the post).
I have recently been trialling ML Engine to train a classification model with scikit-learn. I have been able to serve the model in a production environment without having to involve engineers, so on the surface this appears to be a way to democratise model deployment for data scientists.
The process we followed is relatively straightforward. I packaged the scikit-learn code for training the model and predicting on new data, and placed the package on Cloud Storage. A command-line (gcloud) command then needs to be run for each step of the process: creating, training and serving the model. Once the model is served, you can send online prediction requests to it.
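The create, train and serve steps map onto gcloud commands. The sketch below builds them as argument lists rather than running them. The bucket, package URI, module name (trainer.task), job and model names, region and runtime version are all hypothetical placeholders, and it assumes the training code saves its model as model.joblib under <job-dir>/model/, since a scikit-learn version's --origin must point at the directory containing that file. Newer Cloud SDK releases expose the same commands under gcloud ai-platform.

```python
import json
import shlex
from typing import List, Sequence


def ml_engine_commands(model: str, version: str, bucket: str, job: str,
                       package_uri: str, runtime_version: str,
                       region: str = "us-central1") -> List[List[str]]:
    """Return the gcloud commands to create, train and serve a model, in order."""
    job_dir = f"gs://{bucket}/{job}"
    # Assumption: the (hypothetical) trainer writes model.joblib to <job_dir>/model/.
    model_dir = f"{job_dir}/model/"
    return [
        # Create: the model resource that deployed versions live under.
        ["gcloud", "ml-engine", "models", "create", model, "--regions", region],
        # Train: run the packaged code already uploaded to Cloud Storage.
        ["gcloud", "ml-engine", "jobs", "submit", "training", job,
         "--packages", package_uri, "--module-name", "trainer.task",
         "--job-dir", job_dir, "--region", region,
         "--runtime-version", runtime_version],
        # Serve: create a version from the exported scikit-learn model.
        ["gcloud", "ml-engine", "versions", "create", version,
         "--model", model, "--origin", model_dir,
         "--runtime-version", runtime_version, "--framework", "scikit-learn"],
    ]


def predict_command(model: str, version: str, instances_path: str) -> List[str]:
    """Return the gcloud command for an online prediction request."""
    return ["gcloud", "ml-engine", "predict", "--model", model,
            "--version", version, "--json-instances", instances_path]


def instances_file_content(rows: Sequence[Sequence[float]]) -> str:
    """--json-instances reads newline-delimited JSON: one instance per line."""
    return "".join(json.dumps(list(row)) + "\n" for row in rows)


if __name__ == "__main__":
    # Placeholder values; check the docs for currently supported runtime versions.
    commands = ml_engine_commands("churn", "v1", "my-bucket", "churn_job_1",
                                  "gs://my-bucket/packages/trainer-0.1.tar.gz",
                                  "1.10")
    commands.append(predict_command("churn", "v1", "instances.json"))
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
    print(instances_file_content([[12.0, 49.99], [3.0, 15.5]]), end="")
```

Building argument lists instead of shell strings keeps each flag and value separate, so the commands can be printed for review or passed to subprocess.run without quoting surprises.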
Cloud AutoML is aimed primarily at developers, enabling those with limited knowledge of machine learning to train and deploy pre-built model types on their own data for specific use cases. Current offerings include AutoML Vision for image classification, AutoML Natural Language for text classification (I am currently trialling this for sentiment analysis) and AutoML Translation.
With these products you import your prepared data from Google Cloud Storage through the UI of the relevant AutoML product, and train the model using options in that UI. The performance is evaluated via the command line, and predictions are accessed through the UI.
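For the import step, AutoML Natural Language reads a CSV file from Cloud Storage. The helper below assumes the layout described in the AutoML Natural Language documentation at the time of writing: an optional TRAIN/VALIDATION/TEST column, then the text, then the label. The review texts and the positive/negative labels are hypothetical examples for a sentiment-style classifier, so check the current documentation before relying on the exact format.

```python
import csv
import io
from typing import Iterable, Optional, Tuple

# Assumed values for the optional first column that fixes a row's data split.
VALID_SPLITS = {"TRAIN", "VALIDATION", "TEST"}


def automl_nl_csv(rows: Iterable[Tuple[str, str]], split: Optional[str] = None) -> str:
    """Return CSV text with one document per line: [split,]text,label."""
    if split is not None and split not in VALID_SPLITS:
        raise ValueError(f"unknown split: {split}")
    buf = io.StringIO()
    # csv.writer quotes any text containing commas, quotes or newlines.
    writer = csv.writer(buf, lineterminator="\n")
    for text, label in rows:
        prefix = [split] if split is not None else []
        writer.writerow(prefix + [text, label])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical labelled reviews.
    reviews = [("Great service, fast delivery", "positive"),
               ("Never ordering again", "negative")]
    print(automl_nl_csv(reviews), end="")
```

Letting the csv module handle quoting matters here because free text routinely contains commas; the resulting file can then be copied to a bucket (for example with gsutil cp) and selected in the AutoML import screen.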
TensorFlow, created by the Google Brain team, is an open source library designed to simplify large-scale deep learning. It enables both experts and non-experts to leverage the learning power of artificial neural networks. In recent years these networks have powered much of the growth in advanced AI, including the development of self-driving cars and facial recognition.
The TensorFlow library includes a set of pre-made estimators that act as building blocks for using neural networks for classification and regression, as well as text and image identification. As with other open source machine learning libraries, the code is written in Python and follows a similar set of train, evaluate and predict functions. TensorFlow models can also be served in a production environment with ML Engine.
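TensorFlow's pre-made estimators (for example tf.estimator.DNNClassifier) are driven by an input_fn and expose train, evaluate and predict methods. To show that shape without needing TensorFlow installed, here is a hypothetical toy in plain Python that mirrors those method names. Its "model" is only a majority-class baseline, not a neural network, so treat it as a sketch of the interface pattern rather than of TensorFlow itself.

```python
from collections import Counter
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Example = Tuple[Dict[str, float], str]  # (features, label)
InputFn = Callable[[], List[Example]]


class MajorityClassEstimator:
    """Hypothetical toy mirroring tf.estimator method names; not TensorFlow."""

    def __init__(self) -> None:
        self._label: Optional[str] = None

    def _require_trained(self) -> str:
        if self._label is None:
            raise RuntimeError("call train() first")
        return self._label

    def train(self, input_fn: InputFn) -> "MajorityClassEstimator":
        # "Training" here just records the most frequent label.
        counts = Counter(label for _, label in input_fn())
        self._label = counts.most_common(1)[0][0]
        return self  # like Estimator.train, return self for chaining

    def evaluate(self, input_fn: InputFn) -> Dict[str, float]:
        predicted = self._require_trained()
        examples = input_fn()
        correct = sum(1 for _, label in examples if label == predicted)
        return {"accuracy": correct / len(examples)}

    def predict(self, input_fn: InputFn) -> Iterator[Dict[str, str]]:
        predicted = self._require_trained()
        for _features, _label in input_fn():
            yield {"class": predicted}


if __name__ == "__main__":
    def input_fn() -> List[Example]:
        # Hypothetical labelled data.
        return [({"x": 1.0}, "spam"), ({"x": 2.0}, "ham"), ({"x": 3.0}, "spam")]

    estimator = MajorityClassEstimator().train(input_fn)
    print(estimator.evaluate(input_fn))
    print(list(estimator.predict(input_fn)))
```

Because data arrives through an input function rather than being passed in directly, swapping the toy for a real pre-made estimator would leave the calling code largely unchanged, which is much of the appeal of this interface.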
I have been fortunate to have access to these Google tools, with data already in BigQuery, as part of my job. However, for anyone who wants to try them, Google offers free tiers for most of these products, and BigQuery has a variety of public datasets that can easily be used for practice. Although my use of these tools is limited so far, they do seem to take some of the pain out of developing and deploying both machine learning and deep learning models. I am hopeful that, with better support for a wider variety of machine learning libraries and versions, these tools will open up the use of AI to more job roles and enterprises.