“Deployment” in Amazon API Gateway

Jaewoo Ahn
Mar 4, 2022


For newcomers to Amazon API Gateway, one of the most frustrating experiences is related to deployment. Here is an illustrative dialogue. Sound familiar?

“Hi, my API works in the API Gateway console, but it doesn’t work as expected when I invoke it. What’s the problem?”

“Have you deployed your API?”

“Yes, I deployed my CloudFormation template.”

“No, I’m asking whether you’ve created a new Deployment.”

“Yes, I used the ‘deploy’ command in CloudFormation. It was successful.”

“No, no, no, not that one. Did you create a new Deployment resource in API Gateway?”

“What does that mean?”

“Okay then… can you go to the console and click ‘Deploy API’? Then choose a stage to deploy to, and click the Deploy button.”

“Hang on, yeah, I just did.”

“Now, wait for a minute, then try to invoke it.”

“Okay.. Oh yeah, it’s working now!”

“Great, then it’s definitely a deployment-related issue. You need to deploy your API to apply your API changes. In CloudFormation, you must create a Deployment resource for API Gateway.”

“Got it, I will do that.”

(later)

“Hey, it worked when I first deployed my CloudFormation, but it doesn’t work any more!”

It’s not your fault. Deployment in API Gateway is a confusing concept, and CloudFormation’s behavior amplifies the confusion. Whatever the intent behind a concept, if it confuses this many people, it is fair to say it is not well designed. In this post, I’ll try to explain it in more depth.

What is “Deployment” in API Gateway?

Deploying an API means applying your API changes to a specific stage. When you change anything in the API other than stage settings, you must deploy it to the stage(s) for the change to take effect.

In the console, you can find the “Deploy API” action.

Programmatically, clicking Deploy API is represented by a Deployment resource:

Did you notice that they have slightly different descriptions? In fact, there is a difference between them. In REST API, you can create a deployment without deploying it to a stage (stageName is optional in CreateDeployment), but you cannot create a stage without a deployment (deploymentId is required in CreateStage). In WebSocket/HTTP API, you can create a stage without a deployment. The console experience doesn’t align well with these facts. For example, with REST API/WebSocket API, the console’s Deploy API does not let you deploy without specifying a stage, and the console does not let you create a stage in WebSocket API without a deployment.

If you haven’t noticed yet, ApiGatewayV2 made it clearer by removing “stage” from the description: You can create a deployment without deploying it to a stage. Wait, what?

To understand what a “Deployment” in API Gateway truly is, let’s borrow the definition from CDK:

API Gateway deployments are an immutable snapshot of the API.

This shows why “Deployment” is badly named. It should have been named ApiSnapshot or ApiChangeSet. To make it easier to understand, let’s compare it with git. Every change you make to the API (except stage settings) is a commit to a master branch that cannot be invoked externally. To make your API callable, you must create a stage (branch) and deploy an ApiSnapshot (labeled commit) to it. Your change doesn’t take effect in a stage until you deploy it to that stage.

So what exactly does CreateDeployment do? It creates a snapshot of the API in its current state. If you specify the optional stageName, it also updates the stage to point to the new snapshot.

To help you understand, let’s walk through an illustration. Imagine you created a new API, created a new resource /products, created a new method GET, and then configured its integration to call the Lambda function GetProductV1. Like most AWS services, API Gateway consists of a ControlPlane (where you create/update/delete APIs, Routes, Stages, etc.) and a DataPlane (where your API is invoked). At this point, since you haven’t created any stage, the state looks like this:

Picture1: API without any stage

If you invoke the API endpoint (e.g. https://abcdefgh7.execute-api.us-west-2.amazonaws.com/products) at this moment, you will receive 403 Forbidden, since no stage is configured and your API change has not been deployed.

Now, if you call CreateDeployment with stageName dev, it will create a new Deployment (2eh6t9), create a new stage, and point the stage at that deployment. Once the change has propagated to the API Gateway DataPlane, if you invoke https://abcdefgh7.execute-api.us-west-2.amazonaws.com/dev/products (note that the stage name has been added to the path), API Gateway will invoke the Lambda function GetProductV1.

Picture2: dev stage is created, pointing to deployment 2eh6t9

Later, you decide to create another stage called prod. In REST API, you must create the stage with a deploymentId (remember that you can’t create a stage without a deployment in REST API?). In WebSocket/HTTP API, you can create the stage first and then use UpdateStage to assign a deploymentId. Once the change has propagated to the API Gateway DataPlane, you can invoke the prod stage: https://abcdefgh7.execute-api.us-west-2.amazonaws.com/prod/products

Picture3: prod stage is created, pointing to deployment 2eh6t9

Later, you update the method to invoke a Lambda function called GetProductV2, but what if you forget to create a new deployment? Although you can see that GET /products is configured to invoke GetProductV2, that change has never been deployed anywhere. If you invoke your API, it will continue to invoke GetProductV1.

Picture4: API is updated, but no deployment created yet

Now you create a new deployment (0iv5l9), but what if you don’t specify a stage name? Although a Deployment is created, it isn’t associated with any stage. Your stages still point to the previous deployment 2eh6t9.

Picture5: A deployment is created, but the stages still point to the previous deployment

Finally, you update the dev stage to point to the new deployment 0iv5l9, and the change is propagated.
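The walkthrough above can be replayed with a small toy model. This is only a sketch of the concept, not how API Gateway is implemented: the class `ToyApi`, its method names, and the generated ids (`d1`, `d2`) are all made up for illustration, and propagation to the DataPlane is ignored.

```python
import copy
import itertools


class ToyApi:
    """Toy model: an editable API, immutable deployments, and stage pointers."""

    def __init__(self):
        self.working_copy = {}  # what you edit in the ControlPlane
        self.deployments = {}   # deployment id -> frozen snapshot
        self.stages = {}        # stage name -> deployment id
        self._ids = itertools.count(1)

    def put_integration(self, route, target):
        self.working_copy[route] = target

    def create_deployment(self, stage_name=None):
        """Snapshot the current state; optionally point a stage at it."""
        deployment_id = f"d{next(self._ids)}"
        self.deployments[deployment_id] = copy.deepcopy(self.working_copy)
        if stage_name is not None:
            self.stages[stage_name] = deployment_id
        return deployment_id

    def point_stage(self, stage_name, deployment_id):
        self.stages[stage_name] = deployment_id

    def invoke(self, stage_name, route):
        """Callers only ever see the snapshot their stage points to."""
        if stage_name not in self.stages:
            return "403 Forbidden"
        # Unknown routes return None here; the toy makes no claim about
        # the real response in that case.
        return self.deployments[self.stages[stage_name]].get(route)


api = ToyApi()
api.put_integration("GET /products", "GetProductV1")
no_stage = api.invoke("dev", "GET /products")         # Picture1

first = api.create_deployment(stage_name="dev")       # Picture2
api.point_stage("prod", first)                        # Picture3

api.put_integration("GET /products", "GetProductV2")  # Picture4
not_deployed = api.invoke("dev", "GET /products")

second = api.create_deployment()                      # Picture5: no stage name
not_pointed = api.invoke("dev", "GET /products")

api.point_stage("dev", second)                        # final step
dev_view = api.invoke("dev", "GET /products")
prod_view = api.invoke("prod", "GET /products")
```

In this model, dev only starts returning GetProductV2 after the last step, and prod keeps returning GetProductV1 because nobody moved its pointer.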

If you’re configuring everything manually, this should be straightforward. However, when you use CloudFormation/SAM/CDK, you can run into several issues if you don’t understand what happens underneath. Let’s look at a few examples.

Issue: Nothing changed though I deployed my CloudFormation stack

What is the problem with the CloudFormation template below?

ApiGatewayDeployment:
  Type: AWS::ApiGateway::Deployment
  ...
ProdStage:
  Type: AWS::ApiGateway::Stage
  Properties:
    DeploymentId: !Ref ApiGatewayDeployment

CloudFormation relies on the logical ID to decide whether to create or update a resource. In your template, the Deployment resource therefore needs a logical ID that is unique for each CloudFormation deployment. If you don’t change it, CloudFormation will neither create another Deployment nor update the existing one (again, a Deployment is an immutable snapshot of the API). Because of this, your first CloudFormation deployment will be fine, but subsequent deployments will not create a new Deployment in API Gateway. This leaves you in exactly the situation of Picture4 above.

A quick way to diagnose and remediate this issue is to deploy your API manually. If that solves the problem, your CloudFormation/SAM/CDK setup is failing to create a new Deployment and update the stage to point to it.

You can fix this issue by making the logical ID unique (e.g. by appending a timestamp or a hash value) for each CloudFormation deployment.

ApiGatewayDeployment20220301120000:
  Type: AWS::ApiGateway::Deployment
  ...
ProdStage:
  Type: AWS::ApiGateway::Stage
  Properties:
    DeploymentId: !Ref ApiGatewayDeployment20220301120000
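To see why the logical ID matters, here is a rough sketch of a change set that matches resources purely by logical ID. It is a heavy simplification of what CloudFormation does, and the names `plan_changes`, `template`, `GetProductsMethod`, and the `Target` property are invented for this illustration.

```python
def plan_changes(previous, current):
    """Toy change set: resources are matched purely by logical ID."""
    create = sorted(set(current) - set(previous))
    delete = sorted(set(previous) - set(current))
    update = sorted(
        name
        for name in set(previous) & set(current)
        if previous[name] != current[name]
    )
    return {"create": create, "update": update, "delete": delete}


def template(lambda_name, deployment_id):
    """Build a simplified template; `Target` stands in for the real integration."""
    return {
        "GetProductsMethod": {
            "Type": "AWS::ApiGateway::Method",
            "Target": lambda_name,
        },
        deployment_id: {"Type": "AWS::ApiGateway::Deployment"},
        "ProdStage": {
            "Type": "AWS::ApiGateway::Stage",
            "DeploymentId": deployment_id,
        },
    }


first = template("GetProductV1", "ApiGatewayDeployment")

# Same logical ID: only the method changes, so no new snapshot is taken.
same_id = plan_changes(first, template("GetProductV2", "ApiGatewayDeployment"))

# New logical ID: a new Deployment is created, the stage is re-pointed,
# and the old Deployment is deleted because it left the template.
new_id = plan_changes(
    first, template("GetProductV2", "ApiGatewayDeployment20220301120000")
)
```

With the unchanged logical ID, the plan touches only the method. With the timestamped ID, the plan also contains a delete of the old Deployment.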

However, there is one thing to keep in mind. Once CloudFormation creates the new Deployment ApiGatewayDeployment20220301120000, it will delete the previous Deployment ApiGatewayDeployment, since you removed it from the stack. During a rollback, a new ApiGatewayDeployment will be created from a snapshot of the API as it is at that moment, so its deploymentId will not be the same as before. If the stack has drifted in the meantime, you won’t be rolled back to the exact previous state.

The safest approach is to retain the previous deployment in the stack.

ApiGatewayDeployment:
  Type: AWS::ApiGateway::Deployment
ApiGatewayDeployment20220301120000:
  Type: AWS::ApiGateway::Deployment
  ...
ProdStage:
  Type: AWS::ApiGateway::Stage
  Properties:
    DeploymentId: !Ref ApiGatewayDeployment20220301120000

With this, ApiGatewayDeployment won’t be deleted. When a rollback happens, the stage will be pointed back to ApiGatewayDeployment.

SAM and CDK can generate a new logical ID for the Deployment on your behalf. They append a hash value based on the API definition (swagger) to ensure that a new Deployment is created and that the stage is updated to use its deploymentId. Be careful, though: if a change was made outside of the API definition, the hash value could remain the same. If a new Deployment is not created, update the API definition (e.g. simply change a description) to generate a different hash value.
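The idea of a hash-derived logical ID can be sketched as follows. This is not the actual algorithm SAM or CDK uses. The function `deployment_logical_id`, the 10-character suffix, and the `x-integration` key in the sample definition are assumptions made for the example.

```python
import copy
import hashlib
import json


def deployment_logical_id(api_definition, prefix="ApiGatewayDeployment"):
    """Derive a logical ID from a hash of the API definition (illustrative only)."""
    canonical = json.dumps(api_definition, sort_keys=True).encode("utf-8")
    return prefix + hashlib.sha256(canonical).hexdigest()[:10]


definition = {
    "paths": {
        "/products": {
            "get": {
                "description": "List products",
                "x-integration": "GetProductV1",  # hypothetical key
            }
        }
    }
}

id_before = deployment_logical_id(definition)

# A change made outside the definition (say, to a resource the API refers to)
# leaves the input untouched, so the derived ID is identical.
id_after_outside_change = deployment_logical_id(definition)

# Touching the definition, even just a description, yields a new ID.
touched = copy.deepcopy(definition)
touched["paths"]["/products"]["get"]["description"] = "List all products"
id_touched = deployment_logical_id(touched)
```

Only the last call produces a different logical ID, which is why editing a description is enough to force a new Deployment.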

Issue: Your API throws 5XX during the deployment

Let’s go back to the previous example.

You’ve used GetProductV1:

GetProductV1:
  Type: AWS::Lambda::Function
  ...
# and API Gateway Integration is pointing to GetProductV1

but then changed it to use GetProductV2:

GetProductV2:
  Type: AWS::Lambda::Function
  ...
# and API Gateway Integration is pointing to GetProductV2

However, while you’re deploying the change, your API throws 5XX errors, which subside afterwards. Everything in CloudFormation looks fine: a new Deployment was generated correctly, and the stage was updated.

Changes do not propagate to all API Gateway DataPlane hosts at the same time. Since propagation does not create or update any of your AWS resources, it is not managed within the CloudFormation deployment lifecycle. For a certain period, host1 has already received the change to use GetProductV2 while host2 still uses GetProductV1.

Since you removed the GetProductV1 Lambda function from the stack, CloudFormation deletes it during its cleanup phase while API Gateway is still propagating the change to its DataPlane. Since the function is gone, the Lambda service returns 4XX, and API Gateway in turn returns 500. Once the propagation is done, the 5XX errors go away.

So what should you do? You should retain the previous Lambda function in the stack.

GetProductV1:
  Type: AWS::Lambda::Function
  ...
GetProductV2:
  Type: AWS::Lambda::Function
  ...
# and API Gateway Integration is pointing to GetProductV2

Later, once you have confirmed that GetProductV1 is no longer being called, delete it in a subsequent stack deployment. I used Lambda functions as the example, but the same applies to their related resources (e.g. a role for API Gateway to invoke the Lambda function, Lambda Alias, Lambda Version, etc.) and to any other AWS resources referenced from API Gateway.
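The race described in this section can be illustrated with a tiny simulation. It is a deliberately crude model: two hypothetical hosts, each holding the name of the function its current snapshot targets, and a status of 500 whenever that function no longer exists. The helper names are made up.

```python
def invoke(host_target, existing_functions):
    """Status a caller would see from one DataPlane host (toy model)."""
    return 200 if host_target in existing_functions else 500


def statuses(hosts, existing_functions):
    return [invoke(target, existing_functions) for target in hosts]


# Mid-propagation: host1 already has the new snapshot, host2 does not.
hosts_mid_propagation = ["GetProductV2", "GetProductV1"]

# GetProductV1 was removed from the stack and deleted during cleanup.
deleted_early = statuses(hosts_mid_propagation, {"GetProductV2"})

# GetProductV1 was retained in the stack alongside GetProductV2.
retained = statuses(hosts_mid_propagation, {"GetProductV1", "GetProductV2"})

# After propagation finishes, every host targets GetProductV2.
after_propagation = statuses(["GetProductV2", "GetProductV2"], {"GetProductV2"})
```

Only the first scenario produces a 500, and only from the host that has not yet received the new snapshot, which matches the transient errors seen during the deployment.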

AutoDeploy in HTTP API

AutoDeploy in HTTP API is another approach to solve the problem. When it is enabled on a stage, any updates to the API automatically trigger a new deployment to the stage.

Although it is convenient, there is a problem. When you create a Deployment resource yourself or use SAM/CDK, the Deployment is created after all other changes have been made to the API, so all changes take effect at once. In contrast, AutoDeploy may deliver a change partially. For this reason, I’m unlikely to use it in a production environment, but YMMV.
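A minimal sketch of the difference, assuming AutoDeploy publishes after every individual update. The function `visible_states` and the example routes are hypothetical, and real propagation timing is not modeled.

```python
def visible_states(initial, changes, auto_deploy):
    """List every API state callers could observe while `changes` are applied."""
    working = dict(initial)
    seen = [dict(initial)]
    for route, target in changes:
        working[route] = target
        if auto_deploy:
            seen.append(dict(working))  # each update is published right away
    if not auto_deploy:
        seen.append(dict(working))      # one deployment after all changes
    return seen


initial = {"GET /products": "GetProductV1"}
changes = [
    ("GET /products", "GetProductV2"),
    ("GET /products/{id}", "GetProductByIdV2"),
]

with_auto_deploy = visible_states(initial, changes, auto_deploy=True)
with_explicit_deployment = visible_states(initial, changes, auto_deploy=False)
```

With AutoDeploy, callers can briefly observe an intermediate state in which the first change is live but the second is not. With a single explicit deployment, they see only the old state and the final one.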

How do I know what has been deployed?

Your boss is asking: “Hey, was the API change A deployed to prod (prod stage in prod stack)?”

Yeah, you know the current deploymentId of the prod stage, but how do you know what’s in that deployment? Unfortunately, there is no way to figure it out.

REST API’s GetDeployment only returns id, description, and createdDate:

aws apigateway get-deployment --rest-api-id abcdefghi7 --deployment-id 3gqycx
{
    "id": "3gqycx",
    "description": "RestApi deployment id: d8d1530bae148c7a4b7f48739131da8db0334c9f",
    "createdDate": "2022-02-01T14:14:11-08:00"
}

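even though the API reference lists an apiSummary field as well. As far as I can tell, it only appears when you explicitly request it with the embed parameter (--embed apisummary in the CLI), and even then it is a per-method summary rather than the full snapshot.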

APIGatewayv2’s GetDeployment returns a little more, though it still tells you nothing about the API snapshot.

aws apigatewayv2 get-deployment --api-id abcdedfgh7 --deployment-id 38x8hn
{
    "AutoDeployed": true,
    "CreatedDate": "2021-11-23T18:07:55+00:00",
    "DeploymentId": "38x8hn",
    "DeploymentStatus": "DEPLOYED",
    "Description": "Automatic deployment triggered by changes to the Api configuration"
}

What if a deployment is created without deploying it to any stage? GetDeployment will still report the deployment status as “DEPLOYED” even though it has never been deployed anywhere.

aws apigatewayv2 create-deployment --api-id abcdedfgh7
{
    "AutoDeployed": false,
    "CreatedDate": "2022-03-04T01:36:43+00:00",
    "DeploymentId": "zj1q4g",
    "DeploymentStatus": "DEPLOYED"
}

Because of this, API changes should be managed with infrastructure as code (CloudFormation/CDK/etc.), where you can track changes and deployments in a pipeline. Even then, if you don’t create the Deployment resource properly, a change might not have been deployed yet.
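One partial check you can still automate is whether a stage points at the most recently created deployment. The sketch below works on plain dictionaries shaped like the REST API GetStage and GetDeployments responses (`deploymentId`, and `items` with `id` and `createdDate`). Fetching them is left out, the function name is made up, and `createdDate` is assumed to already be a `datetime`. It cannot tell you what is inside a deployment, nor whether the API was changed after the latest one was created.

```python
from datetime import datetime, timezone


def stage_is_on_latest_deployment(stage, deployments):
    """True if the stage points at the most recently created deployment."""
    latest = max(deployments["items"], key=lambda d: d["createdDate"])
    return stage["deploymentId"] == latest["id"]


# Sample data reusing the deployment ids from the walkthrough above.
deployments = {
    "items": [
        {"id": "2eh6t9", "createdDate": datetime(2022, 2, 1, tzinfo=timezone.utc)},
        {"id": "0iv5l9", "createdDate": datetime(2022, 3, 1, tzinfo=timezone.utc)},
    ]
}
prod_stage = {"stageName": "prod", "deploymentId": "2eh6t9"}
dev_stage = {"stageName": "dev", "deploymentId": "0iv5l9"}

prod_is_current = stage_is_on_latest_deployment(prod_stage, deployments)
dev_is_current = stage_is_on_latest_deployment(dev_stage, deployments)
```

Here prod is flagged as stale while dev is current, which at least narrows down where to look when your boss asks.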

Why does it work in API Gateway console?

What you’re referring to is the console’s Test feature (aka Test invoke). It does not invoke against any stage, so it always uses the current, undeployed state of your API. Also, the console’s test invoke doesn’t perform pre-processing for your method, such as authorization and throttling.

How could it be better?

As we discussed earlier, it is easy to make a mistake if you don’t understand what a Deployment is and how it behaves with CloudFormation. For this reason, frameworks like SAM/CDK and features like AutoDeploy in HTTP API try to encapsulate “Deployment” in API Gateway by handling it automatically. When a concept is ambiguous, encapsulating it is fair enough. But how could it be better? Let me share some personal ideas.

Creating a snapshot and deploying it to a stage are clearly different things. They should have been split into 2 separate resources:

  • ApiSnapshot: An immutable snapshot of API
  • Deployment: Deploying (or publishing) a snapshot to a stage

ApiSnapshot is similar to creating a Deployment today without deploying it. Naming it ApiSnapshot aligns the name with what it actually does underneath. However, unlike today’s Deployment, an ApiSnapshot would show what’s inside. You could easily compare 2 snapshots (or the stages where those snapshots are deployed) to figure out what has changed.
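To make the comparison idea concrete, here is what diffing two such snapshots might look like if their contents were readable. This is purely hypothetical: no such API exists today, and the snapshot shape (route mapped to integration target) and the function `diff_snapshots` are invented for the sketch.

```python
def diff_snapshots(old, new):
    """Compare two hypothetical snapshots given as route -> target mappings."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


snapshot_2eh6t9 = {"GET /products": "GetProductV1"}
snapshot_0iv5l9 = {
    "GET /products": "GetProductV2",
    "GET /products/{id}": "GetProductByIdV2",
}

difference = diff_snapshots(snapshot_2eh6t9, snapshot_0iv5l9)
```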

Deployment (I mean, the new one) would replace the use of UpdateStage to point a stage at a new deployment id, as well as CreateDeployment with a stage name. Deploying an API snapshot is an asynchronous workflow that propagates it to the API Gateway DataPlane hosts. Once a deployment is initiated, it should be possible to track its state and progress. The state would then change to Deployed only after the snapshot is fully propagated.

Again, these are just my 2 cents.
