Announcing Cross-region Inference on Snowflake Cortex AI

Generative AI is evolving rapidly, with more performant large language models (LLMs) becoming available at a fast clip. Until now, however, some customers had to wait until a model became available in the region where their application stack is located. Today, we are excited to announce the general availability of cross-region inference, which helps you use your preferred LLM sooner by making it possible to access models in a different region. With a simple setting, you can process LLM inference requests on Cortex AI in another region (the cross-region) when the original region (the source region) does not yet provide the LLM required to fulfill the request. This capability further enhances the usability of Cortex LLM Functions, as well as features such as Snowflake Copilot and Cortex Analyst (public preview soon) that use LLMs. You can now easily and quickly integrate new LLMs as soon as they become available on Snowflake Cortex AI!

Data traversal

Once cross-region is enabled, data traverses regions. If both regions are on AWS, the data travels privately over the AWS global network and remains entirely within it; all traffic flowing across the AWS global network that interconnects the data centers and regions is automatically encrypted at the physical layer. If the regions involved are on different cloud providers, the traffic traverses the public internet, encrypted using mutual TLS (mTLS). User inputs, service-generated prompts, and outputs are not stored or cached during cross-region processing; only inference processing occurs in the cross-region.

Enable cross-region inference

Snowflake Cortex AI is a fully managed service that enables access to industry-leading LLMs. You can use LLM Functions to execute inference and generate LLM responses directly from within the secure Snowflake perimeter. For cross-region inference, you set the account-level parameter CORTEX_ENABLED_CROSS_REGION to configure where inference may be processed. The parameter can be set by the ACCOUNTADMIN role, to one of the following values: AWS_US, AWS_EU, AWS_APJ.

ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'AWS_US';

Once the parameter is configured, Cortex AI automatically selects a region for inference processing when the source region does not provide the requested LLM. For example, if you set the parameter to AWS_US, the inference would be processed in AWS us-east-1 or us-west-2. Similarly, for the value AWS_EU, Cortex AI directs the inference request to eu-central-1, and for AWS_APJ, to ap-northeast-1. You do not need to have any account set up in the target (cross) region. By default, cross-region is disabled: the parameter is set to 'DISABLED', ensuring that your Customer Data will not leave your source region unless you explicitly enable the feature.
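You can inspect the current setting, or revert to the default at any time, using standard Snowflake parameter commands. A minimal sketch (run with the ACCOUNTADMIN role):

SHOW PARAMETERS LIKE 'CORTEX_ENABLED_CROSS_REGION' IN ACCOUNT;

ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'DISABLED';

The SHOW command returns the parameter's current value alongside its default, so you can confirm whether cross-region inference is active before running workloads.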

Currently, the target region can only be in the AWS Cloud (as indicated by AWS_US, AWS_EU, AWS_APJ). It is therefore important to note that if you enable cross-region in an Azure or GCP environment, the request will be processed in a different cloud (AWS). We plan to add support in the near future so that the target region for cross-region inference can also be in other clouds.

Use cross-region inference for LLM Functions

Let us explore a cross-region call for the Complete LLM Function, which provides a response to a prompt. For this example, we will use the Snowflake Arctic model to summarize a paragraph. Our source region is AWS us-east-1, and per the model availability matrix in Cortex AI, the Arctic model is not available in this region. To access this model, we enable cross-region inference by setting the cross-region parameter to AWS_US, and then call the function to summarize the paragraph.

ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'AWS_US';

SELECT CURRENT_REGION() AS region,
       SNOWFLAKE.CORTEX.COMPLETE('snowflake-arctic',
           'Summarize the following paragraph in less than 100 words: An interest rate …'
       ) AS response;

With cross-region inference configured and the specified model (Arctic) available in AWS us-west-2, Cortex AI routes the request cross-region (to us-west-2) for processing, and the generated response is routed back to the source region (us-east-1). All of this takes a single line of code.

Figure 1: Cross-region inference processing

The round-trip latency between regions depends on the cloud provider infrastructure and network status. We expect that the network latency will be negligible, compared to the LLM inference latency. We recommend testing your specific use case with cross-region inference enabled.

Pricing

You are charged credits for the use of the LLM. The credits are considered consumed in the source region as listed in the consumption table. For example, if you issue an LLM Function call in the us-east-1 region and the call is processed in the us-west-2 region, the credits are considered consumed in the us-east-1 region. Importantly, there are no additional data egress charges for the cross-region inference.

Availability

You can enable cross-region in all regions and clouds where Cortex LLM Functions are available. Inference will be processed in one of the AWS regions based on the configured parameter (AWS_US, AWS_EU, AWS_APJ). You can access all of the LLMs available through the Cortex LLM Functions, including the task-specific functions (Translate, Sentiment, Summarize, Extract Answer, Embed). Additionally, other Snowflake features that rely on LLMs, such as Snowflake Copilot and Cortex Analyst (public preview soon), use cross-region inference once the parameter has been enabled. At this time, inference for fine-tuned models and Document AI are not supported by cross-region inference.
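The task-specific functions work the same way whether they are served locally or cross-region: no change to the query is needed beyond enabling the parameter. As an illustrative sketch, a Translate call (the sample text and language codes here are our own, not from the announcement) looks like:

SELECT SNOWFLAKE.CORTEX.TRANSLATE('Wie geht es dir?', 'de', 'en') AS translated_text;

If the underlying model is not available in the source region and cross-region is enabled, Cortex AI transparently routes this call to a permitted target region and returns the result to the source region.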

Conclusion

Cross-region inference on Cortex AI allows you to seamlessly integrate with the LLM of your choice, regardless of regional availability. For further information, please refer to the documentation. We are excited to see how you use Snowflake Cortex AI to deliver generative AI innovations to your customers!
