Anthos: Portable Google Cloud Platform Hybrid Management
Bringing the cloud down to earth
Hybrid is an overloaded term. In this article, hybrid management means managing applications and infrastructure located on-premise (where on-premise may include a managed service provider) and in the cloud in the same way, using the same tools; it does not mean extending on-premise network boundaries into the cloud, or distributing components of a single application between on-premise and the cloud.
There are a few reasons enterprises may wish to run Google infrastructure elsewhere; they may be…
- Based in countries which have stringent data sovereignty requirements, and in which Google Cloud Platform doesn’t have a data center; as the Google Cloud Platform worldwide footprint continues to expand (18 regions and counting), this is becoming increasingly rare.
- Running some of their software in areas with poor internet connectivity that isn’t suitable for high-bandwidth data transmission, and that may not always be available.
- Building SaaS solutions on top of Google Cloud Platform, with major customers that fall into the buckets above.
Last year, Google Cloud announced the Cloud Services Platform, now rebranded as Anthos: an integrated family of cloud services that lets you increase speed and reliability, improve security and governance, and build once to run anywhere, across GCP and on-premise environments. General availability was announced in March and includes:
- Hybrid computing: GKE On-Prem with multi-cluster management
- Policy enforcement: Anthos Config Management, to take control of Kubernetes workloads
- Service mesh: availability of Istio 1.0 in open source, managed Istio, and Apigee API Management for Istio
- Ops tooling: Stackdriver Service Monitoring
- Serverless computing: GKE Serverless add-on and Knative, an open source serverless framework
- Developer tools: Cloud Build, a fully managed CI/CD platform
This makes it much simpler to implement what Chris Ciborowski referred to as GIFEE, or “Google Infrastructure For Everyone Else”, and realize the best of many worlds: a local data plane running on an on-premise open-source infrastructure stack, managed by a cloud-based control plane, with portability to the cloud, and backed by Google, which has been running containerized workloads at massive scale in production for more than a decade. The beta runs on VMware because it provides a common on-premise platform.
However, as Miles Ward pointed out, there’s more to hybrid management than infrastructure: applications need messaging and data services. Miles extended GIFEE to address application component architecture, describing the primarily open-source ecosystem which delivers the application stack on-premise (eg. HBase, Druid/Drill), as well as proprietary components such as load balancers and file systems.
Managed Services On-premise
One of the significant benefits of the cloud is that you can use it like a utility: it’s a fully managed service which you just plug into.
Replicating this on-premise is a challenge. There are a couple of approaches to delivering a similar experience; which one is right for you will likely depend on your existing vendor relationships, and on how many (and which) services your applications need.
There are enterprise subscriptions for many of the open-source GCP equivalents:
- Document store: MongoDB Enterprise Advanced
- Wide column store: DataStax Cassandra
- Messaging: Confluent Enterprise Subscription (Kafka)
- OLAP: Imply Druid and MapR Managed Services
Managed service platforms
Because GKE On-Prem is very new, these stacks are today managed separately from your GKE On-Prem application stack using each vendor’s management tools; conceptually (and in some cases literally) you can think of them as appliances.
Global Systems Integrators and Managed Service Providers
Global Systems Integrators working with regional managed service providers may be the best option for SaaS providers to deliver a managed service experience to their customers.
While not the focus of this article, it’s worth touching on: some customers have this requirement in countries with national firewalls, or for workloads which can only be managed by nationals of that country (eg. workloads requiring “FedRAMP high” compliance).
Offerings like Cisco Container Platform can address the infrastructure component of this.
If you want to be able to run the same application both in the cloud and on-premise, you need portability. To realize the promise of application portability, you want to build your applications once and run them anywhere using the same tools.
Anthos helps address infrastructure and policy portability; there are a couple of other key aspects: API and tools portability.
Application API portability is what developers think of first when they hear write once, run anywhere. There are two ways of solving for this:
- Use common services across the cloud provider and on-premise. In one sense this is the degenerate case: the solution is the same as if you were running purely on-premise. Cloud providers such as Google provide one-click deploys of datastores/databases such as MongoDB/Cassandra and messaging services such as RabbitMQ/Kafka, with various levels of management, through their marketplaces (eg. Google Cloud Marketplace), and there’s a well-established on-premise ecosystem. This approach provides your application with full access to all of the service’s features; the downside is that even in the cloud, you either need to take advantage of the service vendor’s managed offering, or manage scaling, high availability, and disaster recovery yourself.
- Use managed cloud services in the cloud, and open-source services on-premise. This gives you the full power of the cloud provider’s services; in the case of Google Cloud Platform’s fully managed services, this includes scaling, high availability, and disaster recovery as part of the fabric, which is difficult to replicate. The downside is that your application needs to speak different APIs in the cloud and on-premise. Fortunately this is solvable: conceptually it is an extension of Java’s write once, run anywhere promise, which Enterprise JavaBeans and the Spring Framework addressed decades ago, and which the Go Cloud Development Kit solves for today.
You’ll need to determine the right approach based on your preferred development language, your timelines, the complexity of your applications’ use of the services’ APIs, and the relative value of fully managed services in the cloud.
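To make the second approach concrete, here is a minimal, stdlib-only sketch of the pattern: the application codes against a narrow interface and the backing implementation is chosen at deploy time. All type and function names below are invented for illustration; real implementations would wrap a vendor SDK rather than print to stdout.

```go
package main

import "fmt"

// Publisher is a hypothetical, deliberately narrow messaging interface:
// the application depends only on this, never on a vendor SDK directly.
type Publisher interface {
	Publish(topic string, msg []byte) error
}

// cloudPublisher would wrap a managed service client (eg. Cloud Pub/Sub).
type cloudPublisher struct{}

func (cloudPublisher) Publish(topic string, msg []byte) error {
	fmt.Printf("cloud: published %d bytes to %s\n", len(msg), topic)
	return nil
}

// onPremPublisher would wrap an open-source broker client (eg. Kafka).
type onPremPublisher struct{}

func (onPremPublisher) Publish(topic string, msg []byte) error {
	fmt.Printf("on-prem: published %d bytes to %s\n", len(msg), topic)
	return nil
}

// newPublisher picks an implementation from configuration, so the same
// application binary runs unchanged in either environment.
func newPublisher(env string) Publisher {
	if env == "cloud" {
		return cloudPublisher{}
	}
	return onPremPublisher{}
}

func main() {
	p := newPublisher("on-prem")
	p.Publish("orders", []byte(`{"id": 42}`))
}
```

The cost of this pattern is the interface itself: it has to be narrow enough to implement everywhere, which is exactly the least-common-denominator tension the Go CDK addresses with escape hatches.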
Go Cloud Development Kit
The Go Cloud Development Kit (Go CDK) allows Go application developers to seamlessly deploy cloud applications on cloud providers, as well as on-premise. It does this by providing stable, idiomatic interfaces for common use cases like storage and databases; it supports provider-specific escape hatches for edge cases to avoid the least common denominator trap.
Today it provides the following interfaces and implementations:
- Blob storage: Google Cloud Storage, Azure Blob Storage (block blobs), and AWS S3
- SQL: Google Cloud SQL and AWS RDS, for both PostgreSQL and MySQL
- Pub/sub messaging: Google Cloud Pub/Sub, Azure Service Bus topics and subscriptions, and AWS SNS (Simple Notification Service) and SQS (Simple Queue Service)
Some interfaces and implementations are still missing, but the project has been adding new interfaces and implementations regularly since its launch last summer: Firestore, MongoDB, and DynamoDB are on the roadmap, as is Kafka.
In addition, the Pub/Sub Emulator for Kafka implements a gRPC server that satisfies the Cloud Pub/Sub API as an emulation layer on top of an existing Kafka cluster; it’s distributed as a Java-based Docker or Kubernetes deployment.
The other portability aspect is how you provision and manage the services on which your application relies. Historically you’d use one set of tools (eg. gcloud) to manage Google Cloud Platform managed services, and another (often several) to manage on-premise open-source data and messaging services.
Kubernetes’s Custom Resource Definitions (CRDs) provide a consistent, declarative resource management facade for provisioning and managing external datastores/databases (eg. a Cloud SQL instance) and messaging services. Developers create a custom resource definition that defines the config schema for a new resource, and a controller that implements the resource’s behavior. The Kubernetes API server machinery serves new resources as endpoints without modification, so they can be managed with kubectl apply like any built-in resource.
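As an illustration, a CRD for a hypothetical CloudSQLInstance resource might look like the manifest below; the group, kind, and names are invented for this example, not a shipping API:

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # CRD names are <plural>.<group> by convention
  name: cloudsqlinstances.example.com
spec:
  group: example.com
  version: v1
  scope: Namespaced
  names:
    kind: CloudSQLInstance
    plural: cloudsqlinstances
    singular: cloudsqlinstance
```

Once this is applied, developers can kubectl apply a CloudSQLInstance object like any native resource, and a custom controller watching those objects provisions the actual database.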
The Operator pattern combines custom resources with custom controllers to provide a true declarative API. This allows you to declare the desired state of your resource, while the controller tries to keep the current state of the Kubernetes objects in sync with that desired state. You can use custom controllers to encode domain knowledge for specific applications into an extension of the Kubernetes API.
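The heart of any Operator is a reconcile loop: observe actual state, compare it to the declared desired state, and act to converge the two. Below is a stdlib-only caricature of that loop; real controllers use client-go informers and work queues, and every name here is illustrative:

```go
package main

import "fmt"

// DatabaseSpec is the desired state a user declares in a custom resource.
type DatabaseSpec struct {
	Name     string
	Replicas int
}

// Cluster stands in for the external system the Operator manages.
type Cluster struct {
	replicas map[string]int
}

func (c *Cluster) ScaleTo(name string, n int) {
	c.replicas[name] = n
}

// Reconcile compares desired vs. actual state and converges them; a real
// Operator runs this on every change event and on a periodic resync.
func Reconcile(c *Cluster, desired DatabaseSpec) {
	actual := c.replicas[desired.Name]
	if actual != desired.Replicas {
		fmt.Printf("%s: scaling %d -> %d\n", desired.Name, actual, desired.Replicas)
		c.ScaleTo(desired.Name, desired.Replicas)
	}
}

func main() {
	c := &Cluster{replicas: map[string]int{}}
	spec := DatabaseSpec{Name: "orders-db", Replicas: 3}
	Reconcile(c, spec) // converges: 0 -> 3
	Reconcile(c, spec) // no-op: already in desired state
	fmt.Println(c.replicas["orders-db"]) // prints 3
}
```

Because Reconcile is idempotent, it is safe to run repeatedly, which is what makes the declarative model robust against missed events and restarts.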
Note that Operators apply to more services than you might think. Intuitively, one tends to think of a service like Google Cloud Storage as fully managed, and so presumably not in need of provisioning or management. However, the Go CDK does not provide a way to create a new bucket, categorizing that as an operations task, so you’d need an Operator for Google Cloud Storage to manage buckets using the Kubernetes model.
Over time, cloud providers and open-source vendors will likely provide Operators for their resources, enabling consistent declarative provisioning of resources across clouds and on-premise. For instance, …
Read the following to learn more about the concepts and solution components described in this article:
- GIFEE — Google Infrastructure for Everyone Else
- What if you could run the same, everywhere?
- Urs Hoelzle’s vision for Cloud Services Platform
- Application modernization and the decoupling of infrastructure services and teams
- Google Cloud Services Platform announcement
- Anthos overview
- Anthos Config Management
- Kubernetes CustomResourceDefinitions
- Go Cloud Development Kit; design decisions
- Pub/Sub Emulator for Kafka
Read the following to learn more about the 3rd party providers mentioned in this article:
- Cisco Container Platform
- Cisco data center technology integration
- Nutanix Acropolis object storage service
- Stratoscale Symphony open source service
- MongoDB Enterprise Advanced; Kubernetes Operator
- DataStax Cassandra; Kubernetes Operator
- Confluent Enterprise Subscription; Kubernetes Operator (Kafka)
- Imply Druid
- MapR Managed Services (Apache Drill)
Many thanks to Zach Casper for review and feedback.