Harnessing Machine Learning: A Primer
This post is for you if you don’t have time for lengthy courses: you are a developer, a manager, or an executive who simply wants to know what machine learning can do for your business and how your team can get started. By the end of this post, you will know what it takes to apply machine learning to your business, specifically:
- When to use generative AI vs custom machine learning models.
- The components of a machine learning process.
- How to start and manage a machine learning project for success.
This post will walk through the following topics:
- What can machine learning do?
- Where shall we start?
- Business Outcomes and Machine Learning Tasks
- Business and Model Metrics
- Experimentation
- Evaluation and Baseline
- Getting Started on Google Cloud
What can machine learning do?
A machine learning model is trained on data to make predictions on new inputs. Models can be trained to predict a number or a class membership, identify groups, or extract information from images and text. More recently, machine learning models can also generate text, code, images, audio, and video. We can think of these generative models as predicting what the output should look like given the input. Generative models are trained on large amounts of language and media data, and can often perform multiple tasks. Figure 1 outlines many tasks that machine learning models are currently being used for:
Where shall we start?
Machine learning can be a powerful tool for businesses that want to improve their efficiency, growth, and profitability. A machine learning model is produced by a learning algorithm following a training process that consumes the data we have collected about the system we want to model. For the shortest time to value, there are often pre-trained models that may well help us reach our goals without the need to train our own.
If a pre-trained model does not fully reach our goals, we may tune it with some of our data, provided the model licensing and/or the host platform supports it. We can also build a model custom to our task goals from scratch using only our data. At the time of writing, pre-trained models are more common for text, image, video, and generative tasks, while custom models are commonly built for predictive tasks with tabular and time series data.
To determine whether a pre-trained model is enough, or which types of models to consider for a custom build, it is crucial to first investigate the pairings between our desired business outcome and the range of possible machine learning tasks outlined above. Considering our business goals and the tasks that different models support early on will help us make informed decisions and avoid wasting resources.
Business Outcomes and Machine Learning Tasks
Business outcome and task pairings help us ground our thinking about how machine learning can help our business. The exercise uncovers the role machine learning can play in concert with the rest of the software architecture and business processes to achieve the outcome. The earlier we can eliminate bad ideas, the less investment we waste. Figure 2 places this step in a flywheel:
For example, it is now well known that language and image models can aid humans in generating creative content, which can improve employee productivity or be customized and packaged as a product that assists others in creating. The semantic understanding capability of language models can be used for sentiment analysis and information extraction, yielding efficiency and cost savings in processing unstructured data and reducing the latency of business actions taken on that information.
A classification model can predict customer churn, allowing the business to identify customers who are likely to cancel their service and take action to retain them. Clustering can be used to discover how many distinct groups of customers there are, allowing the business to target advertising or recommend products that each group is likely to be interested in.
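To make these pairings concrete, here is a minimal scikit-learn sketch of both tasks; the customer features and labels are hypothetical stand-ins for your own data.

```python
# A minimal sketch: churn classification and customer segmentation.
# Features and labels are synthetic placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
# Hypothetical features: tenure (months), monthly spend, support tickets.
X = rng.normal(size=(500, 3))
churned = (rng.random(500) < 0.2).astype(int)  # hypothetical churn labels

# Classification: score each customer's likelihood to churn.
clf = LogisticRegression().fit(X, churned)
churn_probability = clf.predict_proba(X)[:, 1]

# Clustering: discover customer segments without any labels.
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
```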
Figures 3–6 list a number of example pairings of machine learning prediction tasks and business use cases. The list is non-exhaustive, and there is a lot of room for creativity:
Business and Model Metrics
The objective and task pairings allow us to consider early on how the output of our machine learning model could be used in production. The task will have narrowed the choice of model types and also the model evaluation metrics. The next consideration is how we will measure the model’s impact on our business. This step is equally important before significant investment is made in the project. It is important to distinguish between the metrics that determine how well a model performs our task, or captures the patterns in our data, and the metrics that measure the impact on our business. Business metrics include revenue, net promoter score, and customer satisfaction or engagement, whereas model metrics include precision, recall, root-mean-square error (RMSE), and normalized discounted cumulative gain (NDCG).
A good model does not always translate to business outcomes. Separating business metrics from model metrics allows us to compare the business performance of different models independently from their model performance. Figure 7 illustrates both feedback loops. The inner loop is the iterative development of a machine learning model, from choosing the model type, through collecting and feeding the data to the learning algorithm, to model evaluation. For generative or pre-trained models, the model development step is mainly about prompting and/or tuning. The outer loop measures the business outcome, feeding back how well the model is performing in terms of business impact.
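As a minimal sketch of this separation, the snippet below computes model metrics with scikit-learn and a hypothetical business metric alongside them; the labels, revenue, and cost figures are illustrative assumptions.

```python
# Model metrics vs. a business metric for a churn model.
# Labels, revenue, and cost figures are hypothetical.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual churn
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

# Model metrics: how well does the model perform the task?
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))

# Business metric: net impact of acting on the predictions, assuming a
# $50 retention offer is sent to every predicted churner and every
# correctly flagged churner is retained at $600/year of value.
offers_sent = sum(y_pred)
retained = sum(t and p for t, p in zip(y_true, y_pred))
print("net revenue impact:", retained * 600 - offers_sent * 50)
```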
Experimentation
There is no magic formula for applying machine learning. It is an iterative journey of incremental improvements. There are several inherent uncertainties. For example, how do we know when our model has learned? Do we have enough data? How much data do we need?
To build a custom model from scratch, model experimentation is important to create a model that generalizes well to previously unseen data. The standard practice is to split the data set into training, validation, and test sets. Typically, a model is iteratively trained on the training set and evaluated on the validation set to find the model configurations (called hyper-parameters) that produce the best results. If our validation or test set is small or lacks variability, and we want to maximize the amount of data available for training, we can employ k-fold cross-validation. To evaluate the model’s generalization capability, a final model is built with the best hyper-parameters found, trained on the combined training and validation sets, and evaluated on the held-out test set representing unseen data.
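Here is a minimal sketch of these splits with scikit-learn, using placeholder data:

```python
# Standard train/validation/test splits, plus k-fold cross-validation.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

X = np.random.rand(1000, 10)       # placeholder features
y = np.random.randint(0, 2, 1000)  # placeholder labels

# Hold out a test set first; it stays untouched until the final evaluation.
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Split the remainder into training and validation for hyper-parameter search.
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.25, random_state=42)

# Alternative when data is scarce: k-fold cross-validation over the
# training/validation portion.
for train_idx, val_idx in KFold(n_splits=5, shuffle=True,
                                random_state=42).split(X_train_val):
    X_tr, X_va = X_train_val[train_idx], X_train_val[val_idx]
    # ...train a candidate configuration on X_tr, evaluate on X_va...
```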
With generative models, model development becomes prompt development. We split the dataset into two sets: one for prompt development and one for testing. The prompt development set should cover a wide range of scenarios, while the test set ensures that our prompt generalizes to data not seen during prompt development. If tuning the pre-trained model is an option, the first split is used for tuning. If the tuning process also exposes configurable hyper-parameters, we return to the familiar three splits, where the validation set helps us find the best hyper-parameter configuration.
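A minimal sketch of this two-way split follows, where `call_model` is a hypothetical stand-in for whichever generative model API you use, and the sentiment examples are toy data:

```python
# Prompt development set vs. held-out test set for a generative task.
from sklearn.model_selection import train_test_split

examples = [
    ("The checkout flow was effortless.", "positive"),
    ("My order arrived two weeks late.", "negative"),
    ("Support resolved my issue in minutes.", "positive"),
    ("The app crashes every time I log in.", "negative"),
]  # toy labeled data; use a realistic, varied set in practice
dev_set, test_set = train_test_split(examples, test_size=0.5, random_state=0)

def call_model(prompt: str) -> str:
    """Hypothetical stand-in: replace with your model's API call."""
    raise NotImplementedError

def prompt_accuracy(template: str, dataset) -> float:
    hits = sum(call_model(template.format(text=text)).strip().lower() == label
               for text, label in dataset)
    return hits / len(dataset)

# Iterate on the template against dev_set only; score test_set once at the end.
template = "Classify the sentiment as positive or negative: {text}\nSentiment:"
```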
Evaluation and Baseline
Evaluation metrics and data splits help determine whether you have a good model and enough data. During training, the learning algorithm measures loss: the difference between the model’s current predictions and the supplied data. As training progresses, this loss decreases, and the model’s performance against our evaluation metric should improve.
The same evaluation can be performed on the validation set with the same model snapshot. The trajectories of the loss and the evaluation metric, known as learning curves, together with the performance gap between the training set and the validation set, provide a lot of information. They can help us determine whether we need more data, whether our data requires a more complex model to capture its essence, or whether we instead need to simplify our model due to over-fitting, which decreases model performance. Plan such projects to allow multiple iterations of experimentation.
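As a minimal sketch, scikit-learn’s `learning_curve` utility computes training and validation scores as the training set grows; the synthetic data here stands in for your own.

```python
# Learning curves: training vs. validation scores as training data grows.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

# A large, persistent gap between the curves suggests over-fitting (simplify
# the model or add data); two low, converged curves suggest under-fitting
# (try a richer model or better features).
print("train:", train_scores.mean(axis=1))
print("validation:", val_scores.mean(axis=1))
```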
To complete the loop, we should establish a business performance baseline using the simplest model possible, or even methods that do not involve model training. Generative models may even give us a quick start nowadays. The goal is to establish the infrastructure and process for measuring the business metric, allowing us to compare the business merit of our future models. Conduct A/B tests between different alternatives, and use canary releases to see how a new version performs from a business perspective relative to the last.
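One minimal way to set such a floor is scikit-learn’s `DummyClassifier`, a baseline that predicts without learning any pattern from the features:

```python
# A no-skill baseline: any trained model should beat this score.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("Baseline accuracy to beat:", baseline.score(X_test, y_test))
```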
Getting Started on Google Cloud
As mentioned above, pre-trained models are more common for text, image, video, and generative tasks, while custom models are more commonly built for predictive tasks with tabular and time series data.
Google Cloud offers a variety of pre-trained models, both first-party and third-party, including non-generative models for images, documents, text, speech-to-text, text-to-speech, videos, and foundational generative models for language and image generation. Many of these models can be further tuned with your own data. You can browse all of these models in the Vertex AI Model Garden, and begin experimenting with the generative models in the Generative AI Studio. There are sample prompts and tips for both text and image generation.
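For instance, here is a minimal sketch of calling a pre-trained text model through the Vertex AI Python SDK; the project ID is a placeholder, and the exact SDK surface and model names may differ by version.

```python
# Calling a pre-trained generative text model on Vertex AI.
# Project ID and model name are placeholders; verify against the
# current SDK documentation.
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="your-project-id", location="us-central1")
model = TextGenerationModel.from_pretrained("text-bison")
response = model.predict(
    "Summarize this customer review in one sentence: ...",
    temperature=0.2,
    max_output_tokens=128,
)
print(response.text)
```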
Google Cloud also offers three different approaches to building custom machine learning models. For the least amount of code, there is AutoML, which automatically finds the best model architecture and feature engineering for your data. There is also BigQuery ML, which lets you build models and run predictions entirely in SQL. For full MLOps support, Vertex AI is the end-to-end platform covering the full spectrum of notebooks, custom modeling, evaluation, and deployment. You can orchestrate your own workflows or leverage pre-built templates. Finally, if you’re already running on-premises or on another cloud and you need consistency across platforms, your containerized training applications and components can be set up as Vertex AI Pipelines, or run directly on our Compute Engine and Google Kubernetes Engine (GKE) services.
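As a minimal sketch, BigQuery ML models can also be created from Python via the BigQuery client; the dataset, table, and column names below are hypothetical.

```python
# Training a BigQuery ML churn model from Python.
# Dataset, table, and column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
client.query(
    """
    CREATE OR REPLACE MODEL `my_dataset.churn_model`
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
    SELECT tenure_months, monthly_spend, support_tickets, churned
    FROM `my_dataset.customers`
    """
).result()  # waits for the training job to finish
```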
There are many choices, and the best one for your project will depend on a variety of factors, such as the complexity of your task, the data you have available, and your team’s expertise. Google Cloud is a unified platform that caters for any data, any user, and any workload. The key is to start with a business outcome: what business improvement do you want to achieve by using machine learning? Once you have a clear aim, you can begin to narrow down the options most likely to help you achieve it. There may be multiple paths to the same goal. By starting from business criteria, you can make data-informed decisions about which approach is best for achieving the results you want. If you are new to Google Cloud, follow the Google Cloud setup checklist for a simple step-by-step setup procedure.