How Fast is Snowflake with AWS S3 Express One Zone?
In this article, I share the results of combining Snowflake with AWS S3 Express One Zone to evaluate performance and compatibility.
High-Speed Storage: AWS S3 Express One Zone
In December last year, AWS announced S3 Express One Zone.
Amazon S3 Express One Zone is a high-performance, single-Availability Zone storage class purpose-built to deliver consistent single-digit millisecond data access for your most frequently accessed data and latency-sensitive applications. S3 Express One Zone can improve data access speeds by 10x and reduce request costs by 50% compared to S3 Standard.
If this is true, using it as an external stage for Snowflake could potentially create extremely fast external tables.
Conclusion
For read-heavy workloads, there was an improvement of up to 16% in query performance. However, at this point, there’s no pressing need to migrate from standard S3 buckets to Express One Zone. The disadvantages currently outweigh the benefits.
Firstly, write operations could not be executed and resulted in errors. Also, queries like `LIST`, which scan the entire bucket, were up to 20 times slower.
Overall, it seems that Snowflake’s internal code isn’t designed to accommodate Express One Zone, leading to frequent errors. As it’s unsupported, regular usage poses challenges.
For those looking to leverage Express One Zone for a high-speed data lake in Snowflake, it appears necessary to wait for an official support announcement.
Below are the detailed findings from my testing.
Executable and Non-Executable Operations on an Express One Zone External Stage
I explored which operations could be performed on an Express One Zone external stage. Write operations to the stage resulted in errors.
Here’s a summary of operations that were executable and those that were not. △ indicates possible but significantly slower.
Tested Operations
- Querying external stage ✔
- Creating external tables ✔
- Loading from external stage ✔
- REST API Ingest from Snowpipe ✔
- Unloading to external stage ✕
- Auto Ingest from Snowpipe ✕
- LIST command △
- REMOVE command ✕
Successful Operations
Querying an External Stage
Queries can be executed almost normally. Note that the prefix must end with `/`.
SELECT $1 FROM @express_onezone/myfolder/(
FILE_FORMAT => 'my_parquet_format',
PATTERN => '.*.parquet'
)
With Express One Zone, the `ListObjectsV2` prefix must end with `/`. For example, use `LIST @express_onezone/myfolder/` instead of `LIST @express_onezone/myfolder` to avoid errors.
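Concretely, with the same stage and folder names as above:
-- fails on an Express One Zone stage: the prefix does not end with /
LIST @express_onezone/myfolder;
-- works: the prefix ends with /
LIST @express_onezone/myfolder/;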
Creating External Tables
Creating external tables was possible. Metadata columns like `metadata$filename` worked as they do with standard buckets.
CREATE EXTERNAL TABLE EXPRESS_EXTERNAL_TABLE (
...
_LOAD_AT TIMESTAMP_NTZ AS (metadata$start_scan_time::timestamp_ntz),
)
location = @express_onezone/myfolder/
file_format='my_parquet'
;
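As a quick usage check (this query is illustrative and not from the original test), the external table and its metadata pseudo-column can be queried like a normal table:
-- illustrative: metadata$filename is available as a pseudo-column on external tables
SELECT metadata$filename, _LOAD_AT
FROM EXPRESS_EXTERNAL_TABLE
LIMIT 10;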
LIST Command
LIST @express_onezone/my_directory/;
COPY INTO TABLE FROM [STAGE]
Though the SQL is omitted, loading data was feasible as usual.
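The statement used isn't reproduced here, but a minimal sketch of such a load might look like the following; the target table name is hypothetical, while the stage, folder, and file format names follow the earlier examples.
-- sketch only: my_table is a hypothetical target table
COPY INTO my_table
FROM @express_onezone/myfolder/
PATTERN = '.*.parquet'
FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;  -- map Parquet columns to table columns by name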
Failed Operations
Unloading to Express One Zone
Standard unloading didn't seem possible, failing with: This bucket does not support prefixes ending without a delimiter.
COPY INTO @express_onezone/mydata.csv
FROM mytable
FILE_FORMAT = (TYPE = 'CSV');
Even specifying `SINGLE` didn't work, failing with: Internal error accessing stage area: Internal error: java.lang.IllegalArgumentException: Cannot create enum from EXPRESS_ONEZONE value!
COPY INTO @express_onezone/mydata.csv
FROM my_table
FILE_FORMAT = (TYPE = 'CSV')
SINGLE=TRUE;
Auto Ingest from Snowpipe
This was not possible. I couldn’t find a mechanism for managed SQS notifications in Express One Zone. Ingest via the Snowpipe REST API
was feasible, however.
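For reference, a pipe driven through the Snowpipe REST API (the insertFiles endpoint) is defined like a normal pipe, just without auto-ingest. This is a minimal sketch; the pipe and target table names are assumptions, and the table is assumed to have a single VARIANT column.
-- sketch: AUTO_INGEST is left at its default (FALSE), so loads are triggered
-- only by submitting file names through the Snowpipe REST API
CREATE PIPE express_rest_pipe AS
  COPY INTO my_raw_table  -- hypothetical table with a single VARIANT column
  FROM @express_onezone/myfolder/
  FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format');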
Reading Speed Tests
Since writing tests couldn’t be conducted, I evaluated the execution speed of read-heavy queries. Reading performance was approximately 16% faster.
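The benchmark queries themselves aren't reproduced here; as a rough illustration only, a read-heavy query of this kind forces every Parquet file under the prefix to be scanned (the aggregated column is hypothetical):
-- illustrative only: a full scan that reads every matching Parquet file
SELECT COUNT(*) AS row_count,
       MAX($1:event_time::timestamp_ntz) AS latest_event  -- event_time is a hypothetical column
FROM @express_onezone/myfolder/
(FILE_FORMAT => 'my_parquet_format', PATTERN => '.*.parquet');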
However, the `LIST` command was notably slow. Even during `COPY INTO`, significant time was spent on `Listing external files...`.
Test Conditions
- Compared buckets with identical folder structures and files.
- About 150,000 Parquet files were involved.
- The total data volume was approximately 250GB.
- The number of rows was around 3 billion.
- Each file ranged from 15 to 100MB in size.
- The warehouse size used was 2x-Large.
- Data was stored in a hierarchical structure like `bucket/yyyy/dd/mm/data.parquet`.
Creating an External Stage with Express One Zone
Here’s a guide for future reference, for when Express One Zone becomes regularly usable.
Creating an Express One Zone Stage
Pay attention to AWS permissions. Permissions need to be granted for `s3express:*` instead of just `s3:*`.
CREATE OR REPLACE TEMPORARY STAGE express_onezone
URL = 's3compat://xxxxxx--apne1-az4--x-s3/'
ENDPOINT='s3express-apne1-az4.ap-northeast-1.amazonaws.com'
CREDENTIALS = (AWS_KEY_ID = 'xxxxxxxxxxx' AWS_SECRET_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxxx');
Previously, specifying an endpoint like this required a support request, but it now seems possible without one.
Copying Files from S3 to Express One Zone
Use the `aws s3 sync` command to create a copy of an existing bucket.
# raise CLI parallelism to speed up the copy
aws configure set default.s3.max_concurrent_requests 100
aws s3 sync s3://source_bucket s3://xxxx--apne1-az4--x-s3
Using the Stage
Finally, query the stage to confirm the files are readable.
SELECT $1 FROM @express_onezone/myfolder/(
FILE_FORMAT => 'my_parquet_format',
PATTERN => '.*.parquet'
);
Postscript
After testing, I realized from an AWS bill of $134 just for ListBucket operations that the number of `ListBucket` calls was incredibly high. At a cost of $0.0025 per 1,000 requests, that works out to roughly 53.6 million `ListBucket` calls. The slowness of the `LIST` command might have been due to recursive `LIST` operations on each directory. Until there's official support, it might be wise to moderate playtime with Express One Zone :)
Summary
While regular use still seems challenging, there’s hope for future improvements.