S3 Gateway Endpoints: Good and Good for Your Wallet

Gordo
Databricks Platform SME
4 min read · Jun 24, 2024

Applications running on Amazon EC2 instances often incur additional networking costs when they access S3 for data storage. The same principle applies to Databricks on AWS, and my colleagues JD and Al covered the topic nicely in this blog post. The most cost-effective way to eliminate these expenses is to use an S3 gateway endpoint. By keeping all S3 traffic within the AWS network, you avoid sending it through an internet gateway or a NAT gateway, the latter of which carries both an hourly fee and a per-gigabyte data processing charge.

What is an S3 Gateway Endpoint?

An S3 gateway endpoint is a regional gateway you can provision to serve as an entry point for traffic destined to Amazon S3 from within your VPC. It allows you to keep all S3 traffic within the AWS network, eliminating the need to traverse the internet and reducing your attack surface.

Benefits of Using an S3 Gateway Endpoint

Improved Security

By keeping all S3 traffic within the AWS network, you reduce the risk of unauthorized access or data leakage. You can also restrict access to S3 buckets to only resources within your VPC by updating the bucket policy.
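As a rough sketch of such a bucket policy, the snippet below builds a statement that denies all S3 actions unless the request arrives through a specific gateway endpoint, using the `aws:SourceVpce` condition key. The bucket name, endpoint ID, and function name are placeholders chosen for illustration. Note that a blanket deny like this also blocks console and administrative access from outside the VPC, so it is worth testing on a non-critical bucket first.

```python
import json


def build_vpce_only_bucket_policy(bucket_name: str, vpce_id: str) -> dict:
    """Return a bucket policy that denies requests not arriving via vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and the objects inside it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Placeholder values for illustration only.
bucket_policy = build_vpce_only_bucket_policy(
    "my-analytics-bucket", "vpce-0123456789abcdef0"
)
print(json.dumps(bucket_policy, indent=2))
```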

Simplified Network Architecture

EC2 instances with only private IP addresses can directly access S3 without needing a NAT gateway, internet gateway, public IP addresses, or bastion hosts. This simplifies your network design and eliminates the need to manage security group rules to allow internet access for S3. Compute nodes in the classic compute plane use private IP addresses exclusively.

Reduced Costs

A gateway endpoint is free of charge: there is no hourly fee and no data processing fee. A NAT gateway, by contrast, bills for every hour it is provisioned and for every gigabyte it processes. Routing S3 traffic through the gateway endpoint removes that traffic from the NAT gateway bill.
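To get a feel for the numbers, here is a back-of-the-envelope estimate of what a NAT gateway would charge for S3 traffic that a gateway endpoint carries for free. The rates are illustrative assumptions, not quoted prices; check the current AWS pricing page for your region before relying on them.

```python
HOURS_PER_MONTH = 730  # common billing approximation


def nat_gateway_monthly_cost(
    gb_processed: float,
    hourly_rate: float = 0.045,  # assumed USD per hour, illustrative only
    per_gb_rate: float = 0.045,  # assumed USD per GB processed, illustrative only
) -> float:
    """Estimate the monthly cost of one NAT gateway: hourly fee plus processing fee."""
    return HOURS_PER_MONTH * hourly_rate + gb_processed * per_gb_rate


# Hypothetical workload: 10,000 GB of S3 traffic per month through one NAT gateway.
estimate = nat_gateway_monthly_cost(10_000)
print(f"Estimated NAT gateway cost: ${estimate:.2f} per month")
print("Gateway endpoint cost for the same S3 traffic: $0.00")
```

Only the per-gigabyte portion disappears if the NAT gateway is still needed for other outbound traffic; the hourly portion goes away only when the NAT gateway itself can be removed.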

Increased Performance

Traffic between your VPC and S3 may see lower latency and higher throughput because it no longer passes through a NAT gateway or another intermediate hop.

Validated Solution

Databricks shoulders the burden of securing network communications when you use fully managed compute in your Databricks account (i.e. the serverless compute plane). Although this post does not focus on serverless, the S3 gateway pattern plays a key role in achieving that goal for serverless. It is a tried-and-true solution that is on by default.

Seamless Integration

Deploying an S3 gateway is transparent to Databricks. Data access patterns such as Delta Sharing also benefit with no code changes needed. For example, in a Databricks-to-Databricks share situation, the recipient reads intra-region data over the gateway at no cost. The provider may explicitly allowlist the recipient’s VPC ID in their bucket policy for an additional layer of protection.

Centralized Access Control

The gateway supports endpoint policies to centrally manage and audit S3 access from your VPC. This powerful feature enables you to define exactly which S3 buckets and operations are allowed and thus provides a strong safeguard against data exfiltration. As an example, a granular policy references only the buckets that are needed: those containing business data for analytics and the other buckets the platform requires.
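A minimal sketch of a granular endpoint policy might look like the following. The bucket names, the chosen set of actions, and the function name are all assumptions for illustration; the actual list of platform buckets depends on your region and workspace and is not reproduced here.

```python
import json


def build_endpoint_policy(bucket_names: list[str]) -> dict:
    """Return an endpoint policy that allows access only to the listed buckets."""
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # bucket-level operations
        resources.append(f"arn:aws:s3:::{name}/*")  # object-level operations
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListedBucketsOnly",
                "Effect": "Allow",
                "Principal": "*",
                # Assumed minimal action set; widen or narrow to fit your workloads.
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                "Resource": resources,
            }
        ],
    }


# Hypothetical bucket names: one for business data, one required by the platform.
endpoint_policy = build_endpoint_policy(
    ["my-analytics-bucket", "my-platform-root-bucket"]
)
print(json.dumps(endpoint_policy, indent=2))
```

Because an endpoint policy has no explicit deny here, any bucket left off the list is simply not reachable through the endpoint, which is what makes the list an exfiltration safeguard.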

Setting Up an S3 Gateway Endpoint

Setting up an S3 gateway endpoint is a straightforward process that can be done through the Amazon VPC console. Here are the high-level steps:

  • Open the Amazon VPC console and navigate to the Endpoints section
  • Create a new endpoint, selecting the AWS services category, the appropriate S3 service name for your region (e.g. com.amazonaws.us-east-1.s3), and the Gateway endpoint type
  • Select the VPC and route tables to associate with the endpoint. These are the same values used for compute in the classic compute plane
  • Configure the endpoint policy to allow full or custom access to S3
  • If the security groups for your EC2 instances restrict outbound traffic, update their rules to allow traffic to the Amazon S3 prefix list
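The console steps above can also be scripted. The sketch below assumes the boto3 SDK and its `create_vpc_endpoint` call on the EC2 client; the VPC ID, route table IDs, and helper function names are placeholders. The request builder is kept separate so it can be inspected without AWS credentials.

```python
import json
from typing import Optional


def gateway_endpoint_request(
    region: str,
    vpc_id: str,
    route_table_ids: list[str],
    policy: Optional[dict] = None,
) -> dict:
    """Build the keyword arguments for EC2 create_vpc_endpoint (Gateway type)."""
    request = {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }
    if policy is not None:
        # The API expects the policy as a JSON string; omitting it grants full access.
        request["PolicyDocument"] = json.dumps(policy)
    return request


def create_gateway_endpoint(
    region: str,
    vpc_id: str,
    route_table_ids: list[str],
    policy: Optional[dict] = None,
) -> str:
    """Create the endpoint and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # third-party; imported lazily so the builder works without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(
        **gateway_endpoint_request(region, vpc_id, route_table_ids, policy)
    )
    return response["VpcEndpoint"]["VpcEndpointId"]


# Placeholder IDs for illustration only; nothing is created by building the request.
endpoint_request = gateway_endpoint_request(
    "us-east-1", "vpc-0123456789abcdef0", ["rtb-0aaa1111bbbb2222c"]
)
print(json.dumps(endpoint_request, indent=2))
```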

Once set up, your EC2 instances in the subnets associated with the selected route tables will be able to access S3 buckets through the gateway endpoint over the AWS network. Neither an internet gateway nor a NAT device is needed. The AWS resource map for the VPC should show a connection from the private subnets of the VPC to the gateway endpoint as shown here:

S3 Gateway connected to private subnets
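The same connection can be checked outside the console by looking for the route that a gateway endpoint adds to each associated route table: a destination that is a prefix list and a target that is the endpoint. The sketch below works on a dictionary shaped like one entry of the EC2 `describe_route_tables` response; the sample IDs and the function name are made up for illustration.

```python
def has_s3_gateway_route(route_table: dict) -> bool:
    """Return True if any active route targets a VPC endpoint via a prefix list."""
    for route in route_table.get("Routes", []):
        destination = route.get("DestinationPrefixListId", "")
        target = route.get("GatewayId", "")
        if (
            destination.startswith("pl-")
            and target.startswith("vpce-")
            and route.get("State") == "active"
        ):
            return True
    return False


# Sample data in the shape of a describe_route_tables entry (IDs are placeholders).
sample_route_table = {
    "RouteTableId": "rtb-0aaa1111bbbb2222c",
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {
            "DestinationPrefixListId": "pl-0123abcd",
            "GatewayId": "vpce-0123456789abcdef0",
            "State": "active",
        },
    ],
}
print(has_s3_gateway_route(sample_route_table))
```

This check only confirms that a gateway endpoint route exists; it does not confirm that the prefix list belongs to S3 rather than another gateway-type service.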

Using an S3 gateway endpoint is a recommended best practice for securely and efficiently accessing Amazon S3 from your EC2 instances within the same region. It provides improved security, simplified networking, reduced costs, and potentially better performance compared to routing S3 traffic through a NAT gateway or over the public internet.
