Securing Cross Datacentre Communication

Yong Wen Chua
BasisAI Engineering Blog
8 min read · Sep 2, 2020

One of the questions frequently asked by our customers is how Bedrock protects the privacy of users’ training data in their workload cluster. In this article, our engineers Yong Wen Chua and Han Qiao explain the technologies behind our secure cross-datacentre communication.

Introduction

Bedrock machine learning workloads run in their own private VPC that does not allow external access except through secure access points. This ensures that neither Bedrock nor anyone else who is not authorised can access your internal model endpoints and data through vulnerabilities or misconfiguration.

In order for Bedrock to provide you with tools to monitor and observe your ML workloads, Bedrock needs a mechanism to retrieve metadata from your workloads in a secure manner. This includes resource usage (CPU and memory) for training jobs, model endpoint throughput and latency, and service uptime alerts.

Design Principles and Goals

The overall guiding principles in designing a solution are separation of concerns and defence in depth. Each component in the Bedrock platform should be concerned only with performing its own function and nothing else. Defence in depth entails multiple layers of security controls so that if any one fails, the damage can be limited. The solution should also scale with the number of workload clusters that Bedrock has to manage.

With these principles in mind, we set out several goals that our design should meet.

  1. The communication channel must be encrypted and must not be susceptible to man-in-the-middle tampering or snooping.
  2. The communication channel must be decoupled in a manner that allows it to be disabled from either Bedrock’s side or your workload cluster’s side.
  3. The communication channel must be authorised for use only by Bedrock.
  4. Bedrock must not be able to access anything in your workload cluster beyond metadata; in particular, it must not have access to training data, models, or endpoints.
  5. Any credentials used for authentication and authorisation must not be permanent and must be easy to rotate.
  6. There must be a one-to-one communication channel between Bedrock and each workload cluster, with no crosstalk possible between clusters.

Possible Solutions

We considered a couple of solutions but ultimately decided that they fell short of our design principles and goals.

VPC Peering

The simplest method to allow Bedrock to retrieve metadata from your workload cluster would be to use VPC Peering between Bedrock’s VPC and your workload cluster’s VPC. Although this solution does not expose network traffic to the public internet and is generally secure, it does not fulfil our design principles.

Peering essentially joins the networks of Bedrock and the workload clusters together. Bedrock and its components can now freely communicate with your workloads and vice versa. This raises additional security concerns as the services managed by Bedrock are no longer isolated from services running in the workload cluster.

Security in an open network is difficult to manage. Bedrock schedules services and workloads using Kubernetes with auto-scaling node pools. In such a dynamic environment, it is not possible to know in advance which node or port a service will be scheduled to run on, so it is simply infeasible to define fixed firewall rules between services. Any attempt to do so may also lead to misconfiguration and hard-to-diagnose issues.

Finally, VPC peering is not scalable: there are limits on the number of peerings each VPC can have, so we would not be able to scale with the number of workload clusters we manage.

Thus, we are unable to achieve our design goals with VPC peering.

Mutual TLS Communication over the Internet

We have established in the previous section that peering the VPCs together, while simple and convenient, has many undesirable characteristics and consequences. As an alternative, we considered using mutual TLS over HTTP over the internet as the communication channel.

This would require deploying a proxy component inside the workload cluster that acts as both an HTTP client (to contact Bedrock) and a server (for Bedrock to contact). This proxy would have to be accessible via the public internet and would essentially forward traffic between Bedrock and the workload cluster. Client certificates present the biggest obstacle to adopting this solution.

First, there is no secure way of rotating the client keys automatically unless we resort to additional third-party services like HashiCorp Vault.

Second, trusting the client certificate proves even more difficult. While the public server endpoints can use certificates signed by public Certificate Authorities (CAs), the client certificate presented during the mutual TLS handshake cannot come from any of these public CAs; otherwise, anyone who got hold of such a certificate could masquerade as the client. In addition, it is unlikely that any public CA would issue certificates for this usage.

Third, to prevent different clusters from masquerading as one another, each cluster would have to have its own CA for its client certificate. This increases the operational and architectural complexity of Bedrock’s infrastructure. In addition, since TLS operates at a lower level of the network stack, it is hard for Bedrock to identify the source of any message.

We would then have to run our own CA to issue client certificates for Bedrock and a separate CA for each cluster. We could use HashiCorp Vault to manage the pair of CAs for Bedrock and each cluster; Vault can issue short-lived certificates that are renewed automatically and securely. This, however, increases the complexity of the infrastructure by a degree that we are not comfortable with.

In addition, there is no good way to check whether a certificate has been revoked: Vault does not run a dedicated server for revocation checks. We would have to live without revocation checking, deploy it ourselves, or issue short-lived certificates.

Finally, without a service mesh, the proxy server’s endpoint would also be accessible both by workloads inside the cluster and from the public internet. This additional attack surface is susceptible to denial of service and zero-day vulnerabilities.

Broker via Cloud Provider Message Queues

Building on the concept of a proxy server detailed in the previous section, we came up with the solution of utilising the underlying cloud infrastructure message queue services to create a secure communication channel between Bedrock and the workload cluster.

Broker

Broker is a client that runs in the workload cluster to proxy user requests for workload metadata from Bedrock to the workload cluster. It allows users to fetch resource usage (CPU and memory) for training jobs, model endpoint throughput and latency, and service uptime alerts.

As shown in the figure above, when users request metadata, Bedrock queries the Prometheus service inside the workload cluster via the broker service to generate visualisations and alerts. Currently, Prometheus is the only metadata server accessible to Broker. Support for additional services will be evaluated carefully against our design principles before implementation.
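To make this concrete, here is a minimal sketch of the kind of predefined query Broker might run against Prometheus’ HTTP API (`/api/v1/query`). The query names, metric names, and Prometheus address are illustrative assumptions, not Bedrock’s actual configuration:

```python
from urllib.parse import urlencode

# Hypothetical allowlist of query templates Broker is willing to run.
# The metric names below are common Prometheus/cAdvisor metrics used
# purely for illustration.
PREDEFINED_QUERIES = {
    # CPU usage for a training job's pods over the last 5 minutes
    "cpu_usage": 'rate(container_cpu_usage_seconds_total{pod=~"%(job_id)s.*"}[5m])',
    # p95 request latency for a model endpoint
    "endpoint_latency_p95": (
        'histogram_quantile(0.95, '
        'rate(http_request_duration_seconds_bucket{service="%(job_id)s"}[5m]))'
    ),
}

def build_prometheus_request(query_name: str, job_id: str,
                             base_url: str = "http://prometheus:9090") -> str:
    """Build the URL for an instant query against Prometheus' HTTP API.
    Unknown query names are rejected rather than interpolated."""
    template = PREDEFINED_QUERIES.get(query_name)
    if template is None:
        raise ValueError(f"unknown query: {query_name}")
    promql = template % {"job_id": job_id}
    return f"{base_url}/api/v1/query?" + urlencode({"query": promql})
```

Because only named templates can be expanded, Bedrock can request a metric by name but can never send arbitrary PromQL into the cluster.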

Beyond processing it in transit, Bedrock does not persist any of this metadata, so customers can be assured that their data lives only inside their workload cluster. Together, this fulfils the fourth design goal: Bedrock is unable to access arbitrary data and services in the workload cluster.

Communication via Message Queues

Bedrock and Broker communicate securely via the message queue services provided by the underlying cloud infrastructure: on Google Cloud Platform, Pub/Sub is used, while on Amazon Web Services (AWS), a combination of SNS and SQS is used. Access to the cloud providers’ APIs is done securely over HTTPS, and messages are encrypted in transit and at rest with the cloud providers’ managed keys or, optionally, with a customer-managed key. The message queues thus meet our first design goal of being secure.
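As an illustration of how requests might be framed on a cluster’s queue, here is a hedged sketch of a request message. The JSON field names (`request_id`, `cluster_id`, and so on) are assumptions for the sake of the example, not Bedrock’s actual message schema:

```python
import json
import uuid

def make_request_message(cluster_id: str, query_name: str, params: dict) -> str:
    """Serialise a metadata request into the JSON body published to a
    cluster's dedicated queue (one queue per workload cluster)."""
    return json.dumps({
        "request_id": str(uuid.uuid4()),  # lets the caller match the reply
        "cluster_id": cluster_id,         # must match the receiving broker
        "query": query_name,
        "params": params,
    })

def parse_request_message(body: str, expected_cluster_id: str) -> dict:
    """Broker-side: decode a message and refuse it if it is not addressed
    to this cluster, preserving the one-to-one channel guarantee."""
    msg = json.loads(body)
    if msg.get("cluster_id") != expected_cluster_id:
        raise ValueError("message not addressed to this cluster; dropping")
    return msg
```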

The message queue also decouples Bedrock and Broker. Since access to the message queue goes through the cloud provider’s Identity and Access Management (IAM) system for authentication and authorisation, either Bedrock’s or Broker’s access can be removed by simply revoking the relevant IAM permission. This fulfils the second and third design goals.

Ephemeral Credentials

To meet the fifth design goal, Bedrock and Broker both use short-lived credentials issued by a secure third party to access the message queue APIs.

To retrieve the credentials, both services first have to authenticate with the third party by establishing their identities. Bedrock and Broker are each deployed in their own Kubernetes cluster, and they establish their identities with the third party through Kubernetes service accounts. Usage of these service accounts is controlled through a combination of the Kubernetes RBAC system and the underlying cloud IAM.

Broker uses its Kubernetes service account identity to retrieve ephemeral cloud credentials using methods specific to each cloud provider. The credentials are refreshed automatically at intervals and have a limited lifespan.

Similarly, Bedrock uses its Kubernetes service account to authenticate with HashiCorp Vault. In return, it gets a time-limited token from Vault, which it continuously renews throughout its lifespan. It then uses the relevant secrets engine to retrieve one set of credentials per workload cluster for accessing the message queues. The credentials’ lifetime is tied to that of the token, and they are revoked when the Bedrock replica terminates.
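The continuous renewal described above can be sketched as a simple timing loop. This is a minimal sketch under our own assumptions: renewing once half the lease has elapsed is a common pattern (not a Vault requirement), and the `renew` callable stands in for the real Vault token-renew API call, which is not shown:

```python
import time

def seconds_until_renewal(lease_duration_s: int, safety_factor: float = 0.5) -> int:
    """Renew once half the lease has elapsed, leaving headroom for retries;
    never wait less than one second."""
    return max(1, int(lease_duration_s * safety_factor))

def renewal_loop(renew, stop, sleep=time.sleep):
    """Continuously renew a token until `stop()` returns True.
    `renew()` performs the renewal and returns the new lease duration in
    seconds; `sleep` is injectable for testing."""
    while not stop():
        lease = renew()
        sleep(seconds_until_renewal(lease))
```

Injecting `renew`, `stop`, and `sleep` keeps the timing policy separate from any particular Vault client library.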

Defence in Depth

By using a message queue to communicate with Bedrock, we can establish a secure one-to-one communication channel between Bedrock and Broker for defence in depth. Since Broker uses long polling to pull messages from a static message queue, it does not need to expose any ports in its own container. This provides additional security guarantees compared to VPC peering or mutual TLS, which both require making a service discoverable in either the peered cluster or from the internet. Broker, on the other hand, is not discoverable or accessible by anyone but Bedrock.
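The pull model can be sketched as follows. The `client` here mimics only the small SQS-style subset we need (`receive_message`/`delete_message`); the real Broker would use the cloud SDK with its ephemeral credentials, and the 20-second wait is just a typical long-polling setting:

```python
def poll_once(client, queue_url, handler):
    """One long-poll iteration: block for messages, handle each one, and
    delete it only after successful handling so failures are retried.
    Returns the number of messages handled."""
    resp = client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling: block up to 20s for messages
    )
    messages = resp.get("Messages", [])
    for msg in messages:
        handler(msg["Body"])
        client.delete_message(QueueUrl=queue_url,
                              ReceiptHandle=msg["ReceiptHandle"])
    return len(messages)
```

Because Broker only ever initiates outbound connections to the queue API, there is no listening socket for an attacker to probe.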

In addition, Broker does not execute arbitrary user submitted code. It only performs a set of predefined metadata read operations from the Prometheus server. Any unrecognised commands in the message queues are dropped. In the unfortunate event of a compromise, the attacker will not be able to use Broker to gain access to other healthy workload clusters. This fulfils our sixth design goal that there cannot be crosstalk between workload clusters and effectively reduces the blast radius during an attack.
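The “predefined operations only” rule amounts to a strict allowlist check before anything is executed. A minimal sketch, with operation names that are our own illustrative assumptions rather than Bedrock’s actual command set:

```python
# Fixed allowlist of read-only metadata operations Broker will perform.
ALLOWED_OPERATIONS = frozenset({
    "query_cpu_usage",
    "query_memory_usage",
    "query_endpoint_latency",
    "query_service_uptime",
})

def dispatch(operation: str):
    """Return the operation name if it is on the allowlist, or None to
    signal that the message should be dropped without executing anything."""
    return operation if operation in ALLOWED_OPERATIONS else None
```

Dropping unrecognised input (rather than trying to interpret it) is what keeps a compromised sender from turning Broker into a general-purpose execution channel.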

Conclusion

Communication between Bedrock and customers’ workload clusters goes through a secure message bus, with access controlled through ephemeral cloud-provider IAM credentials. We have achieved all of our design goals while adhering to our design principles.

If you have any further enquiries, feel free to leave a comment or reach out to us via this link: https://basis-ai.com/get-in-touch.
