Comparing Google Cloud Platform, AWS and Azure

A starting point to help you choose the right platform for your ML project.

By Jing Zhang

Introduction

If you’re looking for an end-to-end machine learning (ML) platform, you’re spoiled for choice. There are three main cloud providers to choose from: Google Cloud Platform (GCP), Amazon Web Services (AWS) and Microsoft Azure (Azure). The question is: how do you choose between the three? What functionality do they provide to build ML pipelines? We set out to answer these questions in a recent hackathon.

The R&D team at Georgian, where I work as an ML engineer, decided to organize a hackathon to explore how each provider can help with the ML workflow. Our team builds machine learning software components ourselves, but we typically work with the growth-stage software companies in the Georgian family to deploy them, so we wanted to familiarize ourselves with the different platforms and be able to adapt to their workflows faster.

In this blog post, we’ll give a high-level overview of the ML platform solutions provided by GCP, AWS and Azure based on our experience during the hackathon. This list is not exhaustive but shows the results of our hackathon research.

Specifically we’ll cover:

  1. What services are provided on each platform to run an ML project from end to end, and
  2. Whether there is a preference for choosing one platform over another for any particular project.

We hope this serves as a starting point to help you choose the right platform for your ML project.

Preparation

Project Definition

To help compare the different platforms, we chose a project that we could run on all three so we could compare apples to apples. Our project was to build a binary classification model with a given dataset and an end-to-end pipeline on each cloud provider. We used a dataset of companies that we use for our own machine learning platform, Spring, which helps us identify companies that fit our investment profile. The dataset contained general information about companies, including their management team, financial information and office locations. The goal was to predict whether a company fit well into our investment profile (labeled 1) or not (labeled 0).

Dataset

Since we used our own data for this project, we can’t share the specifics here, but to give you a sense of what we were working with, here’s an overview of the dataset:

  • Dataset format: tabular
  • Number of features: 353
  • Training examples: 1,956
  • Validation examples: 653 (labels provided)
  • Test examples: 652 (held-out test set, labels hidden)
  • Data types: numerical, categorical, text and datetime

We also intentionally injected noise and errors into the dataset so that we could test each platform’s data cleaning functionality. Specifically, we introduced:

  • Negative numbers formatted with parentheses, as in accounting, e.g., (300) instead of -300
  • Column names that included characters like “%” and “#” (see the cleaning sketch below)
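
To give a concrete sense of the cleanup involved, here’s a minimal pandas sketch of how both kinds of injected noise could be handled before training. This isn’t the exact code we ran during the hackathon, and the column names are made up for illustration:

    import pandas as pd

    def clean_dataset(df: pd.DataFrame) -> pd.DataFrame:
        """Sanitize column names and parse accounting-style negatives."""
        # Replace problematic characters such as "%" and "#" in column names.
        df = df.rename(columns=lambda c: c.replace("%", "pct").replace("#", "num").strip())

        def parse_accounting(value):
            # Turn strings like "(300)" into -300.0; leave everything else alone.
            if isinstance(value, str):
                s = value.strip().replace(",", "")
                if s.startswith("(") and s.endswith(")"):
                    s = "-" + s[1:-1]
                try:
                    return float(s)
                except ValueError:
                    return value
            return value

        return df.apply(lambda col: col.map(parse_accounting))

    # Example with the kinds of noise described above (made-up column names).
    raw = pd.DataFrame({"revenue growth %": ["(300)", "150"], "office #": [3, 5]})
    print(clean_dataset(raw))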

Evaluation Criteria

Functionality — Can you do X?

One of the goals of this hackathon was to explore and assess each aspect of the end-to-end ML pipeline shown in the figure below. We decided to approach the challenge this way so that we would be able to assess the needs of a given project against the providers. For example, if explainability is important to a project, which is the best choice? Or, which provider performs the best if you want to run heavy testing?

(Image credit: AI Platform)

Specifically, we wanted to assess each platform’s functionality in these areas:

Preprocess

  • Clean up the dataset

Discover

  • Data exploration and visualization
  • Feature correlation
  • Text analysis

Develop

  • Run Jupyter notebooks in the cloud

Train

  • Explore AutoML
  • Explore distributed training from hosted notebook
  • Explore distributed training from script
  • Experiment tracking (without third-party tools like Comet.ml)

Test & Analyze

  • Error analysis
  • Fairness analysis

Deploy

  • Batch prediction of the deployed model

Model Performance

Since we were building a model, we needed to think about model performance. The metric we tracked was the Area Under the Receiver Operating Characteristic Curve (ROC AUC). We weren’t, however, using model performance to judge which platform was better, as long as the results were roughly comparable.
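
For reference, this is how the metric can be computed offline with scikit-learn. The labels and probabilities below are placeholders rather than our data; in practice they come from whichever model each platform produced:

    from sklearn.metrics import roc_auc_score

    # Placeholder validation labels and predicted probabilities of class 1.
    y_valid = [0, 1, 1, 0, 1, 0]
    proba_class_1 = [0.2, 0.8, 0.6, 0.4, 0.9, 0.3]

    print("validation ROC AUC:", roc_auc_score(y_valid, proba_class_1))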

Model Fairness

Fairness in ML has emerged as a major issue for policy makers, industry and the public. It’s important to have tools at your disposal to verify that your model is not treating any group or individual unfairly. To address fairness, explainability tools are important too: they allow you to query why a model reached a certain decision. We were looking for the availability of these tools rather than assessing their relative performance.

Cost

Unlimited budgets are nice to dream about, but the reality is that cost is a constraint we should always keep in mind, so we tracked the costs for each provider.

Overall Comparison

Functionality side by side

As you would expect with these three platforms, there is a product or service for each step in the ML pipeline. The differences between them are in how well the services integrate with each other to build an end-to-end ML pipeline experience.

The table below summarizes the available services in the areas we were interested in. Going into detail on each is beyond the scope of this blog post, but you can use the table as a starting point to see what’s available at each step for each provider.

Cloud ML Platforms Functionality Comparison

For our team, the most important areas are model development and model deployment and, to a lesser extent, data visualization and QA. We were glad to see that all three providers have hosted notebook services, experiment tracking and version control, and easy deployment methods.

As for AutoML, all three providers have developed their own offerings. We used AutoML to build our initial models and check whether there were valuable signals in the dataset. It would be great if the explainability tooling were integrated with AutoML so that we could understand how the models are built and use that understanding for further analysis and development.

Model Explainability

Speaking of model explainability, while all three platforms provide some tools, the functionality varies. If you have specific requirements, make sure you check whether the current functionality satisfies your needs.

GCP offers a package called the What-If Tool. You can integrate it with your notebook and play with the model by changing the threshold or a feature value for a given example. This lets you check how those changes affect the prediction.
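
To give a sense of what that looks like, here’s a rough sketch of wiring a scikit-learn model into the What-If Tool from a notebook. The witwidget calls below reflect our understanding of the package and are worth double-checking against the What-If Tool documentation; the toy data and model are made up for illustration:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

    # Toy stand-ins for our company dataset and model.
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X, y)

    feature_names = [f"feature_{i}" for i in range(X.shape[1])] + ["label"]
    examples = np.column_stack([X, y]).tolist()  # each row: feature values + label

    def predict_fn(examples_to_score):
        # Drop the label column and return per-class probabilities
        # (the contract we assume WIT expects from a custom predict function).
        features_only = np.array(examples_to_score)[:, :-1]
        return model.predict_proba(features_only).tolist()

    config = (
        WitConfigBuilder(examples, feature_names)
        .set_custom_predict_fn(predict_fn)
        .set_label_vocab(["no_fit", "fit"])
    )
    WitWidget(config, height=600)  # renders the interactive widget in the notebook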

On AWS, SageMaker Debugger lets you analyze how feature engineering and model tuning are done.
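
As a hedged sketch of what that setup looks like, the snippet below attaches a couple of built-in Debugger rules to a SageMaker training job. The image URI, IAM role and S3 paths are placeholders you would replace with your own:

    from sagemaker.estimator import Estimator
    from sagemaker.debugger import DebuggerHookConfig, Rule, rule_configs

    estimator = Estimator(
        image_uri="<your-training-image>",        # placeholder
        role="<your-sagemaker-execution-role>",   # placeholder
        instance_count=1,
        instance_type="ml.m5.xlarge",
        # Where Debugger stores the tensors it collects during training.
        debugger_hook_config=DebuggerHookConfig(s3_output_path="s3://<bucket>/debug"),
        # Built-in rules that flag common training problems automatically.
        rules=[
            Rule.sagemaker(rule_configs.overfit()),
            Rule.sagemaker(rule_configs.loss_not_decreasing()),
        ],
    )
    estimator.fit({"train": "s3://<bucket>/train", "validation": "s3://<bucket>/validation"})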

Azure provides a built-in interpretability module in its SDK, which seemed to have the best integration of the three.
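
For example, here’s a minimal sketch using the interpretability module that ships with the Azure ML SDK (azureml-interpret / interpret-community). The class and method names reflect our understanding of the SDK and are worth verifying against the current docs; the data and model are toy stand-ins:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from interpret.ext.blackbox import TabularExplainer  # ships with azureml-interpret

    # Toy stand-in for our company-fit classifier.
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X, y)

    explainer = TabularExplainer(model, X, classes=["no_fit", "fit"])

    # Global feature importances over an evaluation set.
    global_explanation = explainer.explain_global(X)
    print(global_explanation.get_feature_importance_dict())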

Cost and Model Performance

So how did the models perform? As measured by ROC AUC, performance was comparable across all three platforms, with Azure and GCP scoring slightly higher than AWS. This doesn’t necessarily mean one platform is better than another; it matched our expectation that the scores would vary but stay close.

Cost, on the other hand, was more interesting. The cost on AWS was considerably lower than on GCP and Azure. Note that this isn’t strictly an apples-to-apples comparison: the Azure team in our hackathon explored more services, since Azure was a completely new platform to our team and we wanted to use the opportunity to learn more about it, so that may account for some of the cost difference.

One question we asked was: “Is it worth spending four to six times as much to get a 5% performance improvement?” The answer depends on the problem you’re addressing. For example, we’re looking for companies that could potentially generate large returns for our fund, so it may be worth spending the extra few hundred dollars. If you have budget concerns and the spending doesn’t justify the return, then it’s a different story.

Summary

Based on our observations, all three cloud providers cover the aspects of the ML workflow we care about.

Two hot topics in the industry right now are the rise of AutoML and the need for an end-to-end machine learning workflow in one place to provide a frictionless experience. As we mentioned earlier, AutoML products are already available on all the platforms but the end-to-end pipeline experience doesn’t seem to be mature yet.

On GCP and AWS, you’ll need to assemble multiple products to get to the desired outcome. Azure, on the other hand, provides a machine learning designer service with a drag-and-drop UI. This might imply that they are targeting different customer bases.

With its drag-and-drop interface, Azure’s machine learning designer may be more friendly to those new to data science, with little coding or technical background, who want to try out machine learning projects and evaluate whether they bring value to the business. Many corporations already use Microsoft products, so it might be an easy choice for teams at larger companies looking to try machine learning for the first time.

AWS and GCP seem to be more developer-focused. Though it takes a little more work to assemble a pipeline, they’re more customizable thanks to the different components available. These components, and the pipeline that connects them, are typically developed through code and configuration rather than a user interface. Companies that are more familiar with the options and know what they want to achieve may prefer this approach.

We certainly learned a lot from this Georgian hackathon, and it puts us in a better position to undertake projects on any of these three platforms in the future. We hope this is useful to you and helps you pick the right platform for your next project.
