Set Up API Gateway Robots.txt With AWS CDK

Milan Gatyás
Life at Apollo Division
2 min read · Aug 26, 2021

You have probably found yourself in a situation where you need to control the bot crawling activity on your API, or possibly disable it entirely. In such a case you need to set up a robots.txt resource on your API Gateway. The setup of the resource is fairly straightforward; however, there are a few quirks to remember. This guide will help you set up the resource quickly and easily.

Note: The robots.txt resource needs to be defined on the root path of your API. If, for example, your API has the URL https://api.example.com, the robots.txt resource needs to be defined at https://api.example.com/robots.txt.

Note 2: If your API Gateway is defined with a base path mapping of a custom domain, e.g. https://api.example.com/petstore (petstore being the base path mapping), you need to create the robots.txt resource on the API whose base path mapping points to the root of the custom domain, i.e. https://api.example.com. A sketch of the two mappings follows below.
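To illustrate, here is a minimal TypeScript sketch of the two base path mappings, assuming CDK v2; domain, petStoreApi, and rootApi are hypothetical identifiers standing in for your own constructs:

```typescript
import * as apigateway from 'aws-cdk-lib/aws-apigateway';

declare const domain: apigateway.DomainName;   // custom domain api.example.com
declare const petStoreApi: apigateway.RestApi; // mapped under /petstore
declare const rootApi: apigateway.RestApi;     // mapped to the domain root

// Served at https://api.example.com/petstore/...
domain.addBasePathMapping(petStoreApi, { basePath: 'petstore' });

// Served at https://api.example.com/... ; this is the API that
// must expose the robots.txt resource.
domain.addBasePathMapping(rootApi);
```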

Integration

An easy way to return the robots.txt content from API Gateway is to use the API Gateway mock integration. The definition of the integration can look like this:
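Below is a minimal sketch in TypeScript using CDK v2 (aws-cdk-lib); the robotsIntegration name is illustrative, and the same construct exists in the other CDK languages:

```typescript
import { MockIntegration, PassthroughBehavior } from 'aws-cdk-lib/aws-apigateway';

// Mock integration that returns a static robots.txt body.
const robotsIntegration = new MockIntegration({
  // The request mapping template propagates the 200 status code
  // to the mock endpoint.
  requestTemplates: {
    'application/json': '{ "statusCode": 200 }',
  },
  passthroughBehavior: PassthroughBehavior.NEVER,
  integrationResponses: [
    {
      statusCode: '200',
      // The response template for the 200 status code is defined for
      // the text/plain content type and carries the robots.txt payload.
      responseTemplates: {
        'text/plain': [
          'User-agent: bingbot',
          'Disallow: /',
          '',
          'User-agent: *',
          'Disallow: /',
        ].join('\n'),
      },
    },
  ],
});
```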

The request mapping template is required to propagate the 200 status code to the mock endpoint. The response template for the 200 status code needs to be defined for the text/plain content type. The response payload itself needs to be a valid robots.txt payload; the example denies crawling of the API for all user agents (and explicitly for bingbot, see the note below).

Note: In the example above, there is a specific rule for bingbot to disable crawling of the API. After discussions with the Bing support team, I learned that bingbot ignores the * user-agent rule and requires a specific rule for the bingbot user-agent. Maybe this information will come in handy!

Method Options

The method options definition can look like this:
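Again a hedged TypeScript sketch; the robotsMethodOptions name is illustrative:

```typescript
import { MethodOptions, Model } from 'aws-cdk-lib/aws-apigateway';

// Method options declaring a 200 response with a text/plain body.
const robotsMethodOptions: MethodOptions = {
  methodResponses: [
    {
      statusCode: '200',
      responseModels: {
        // No response model is needed, so the empty model is sufficient.
        'text/plain': Model.EMPTY_MODEL,
      },
    },
  ],
};
```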

Notice that the content type of the 200 response model is also text/plain. We can use the Model.EMPTY_MODEL constant, as we do not need to define any response model.

Putting It Together

Having the mock integration and method options, we can create the actual robots.txt resource and its methods. It can look like this:
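A sketch of the wiring in TypeScript, assuming api is an existing RestApi instance:

```typescript
// Create the robots.txt resource on the root path of the API.
const robots = api.root.addResource('robots.txt');

// Both GET and HEAD reuse the same mock integration and method options.
robots.addMethod('GET', robotsIntegration, robotsMethodOptions);
robots.addMethod('HEAD', robotsIntegration, robotsMethodOptions);
```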

You need to define both the GET and HEAD methods on the robots.txt resource, as some crawlers might first execute a HEAD request to check the content length of the response payload.

Further Reading

https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-mock-integration.html
https://docs.aws.amazon.com/cdk/api/latest/docs/aws-apigateway-readme.html
https://developers.google.com/search/docs/advanced/robots/intro
https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/HEAD

We are ACTUM Digital and this piece was written by Milan Gatyás, .NET Tech Lead of Apollo Division. Feel free to get in touch.

Originally published at https://milangatyas.com on August 25, 2021.
