Custom geo restriction of AWS CloudFront Distribution using Lambda@Edge

A while ago one of our clients asked whether it was possible to block access to their web application for users from a certain US state. The typical infrastructure of the applications we create at Rumble Fish involves a CloudFront distribution, which serves compiled frontend files, usually built in React.

CloudFront offers geo restriction out of the box. However, it only allows filtering traffic by country, which was far too broad in the case at hand.

In this article, we present our solution for that request. Our approach uses Lambda@Edge to filter traffic. To determine whether a user connects from the US state that we want to filter out, we used a publicly available list of IP ranges used by ISPs from that state. We stored these ranges in DynamoDB and check on each request whether the client’s IP matches the filter. If it does, the user gets redirected to a page explaining the geo restriction policy of our site. But let’s start with the basics.

What is Lambda@Edge?

Lambda@Edge is an AWS feature that lets you run short functions on the CloudFront edge locations that serve content directly to clients.

Lambda functions can be used to modify CloudFront requests and responses at the following points:

  • After CloudFront receives a request from a viewer (viewer request)
  • Before CloudFront forwards the request to the origin (origin request)
  • After CloudFront receives the response from the origin (origin response)
  • Before CloudFront forwards the response to the viewer (viewer response)

In our geo restriction scenario we want to check the IP address of the client and compare it with the list of restricted addresses, so the “viewer request” event is the best choice.
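To make the event shape concrete, here is a minimal sketch of a viewer-request handler that only reads the client IP and rewrites the URI. The isBlocked predicate and the sample address are placeholders made up for this illustration; they are not part of the solution described in this article.

```javascript
'use strict'

// Hypothetical predicate used only for this illustration. The real check
// described later in the article looks the address up in DynamoDB.
const isBlocked = (ip) => ip === '203.0.113.7'

// For viewer-request events CloudFront passes the request under
// event.Records[0].cf.request, including the viewer's address in clientIp.
const handler = (event, context, callback) => {
  const request = event.Records[0].cf.request
  if (isBlocked(request.clientIp)) {
    // Rewrite the URI so that a different page is served instead.
    request.uri = '/restricted.html'
  }
  // Returning the request object lets CloudFront continue processing it.
  callback(null, request)
}

module.exports = { handler }
```

Because the function runs before CloudFront consults its cache, the decision is made for every matching request, not only for cache misses.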

Implementing Lambda@Edge function in Serverless framework

The code of our solution can be found on our company’s public GitHub profile. It’s built with the Serverless framework, which we love dearly. If you want to deploy it to your AWS account, follow these simple steps:

git clone https://github.com/rumblefishdev/restrict-ips-lambda
cd restrict-ip-lambda
npm install
npx serverless deploy

The stack deploys a Lambda@Edge function in the us-east-1 region.

For the CloudFront distribution to use it, add a LambdaFunctionAssociations entry to the cache behavior in your CloudFormation template:

CloudFront:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      ...
      DefaultCacheBehavior:
        ...
        LambdaFunctionAssociations:
          - EventType: viewer-request
            LambdaFunctionARN:
              Fn::ImportValue: RestrictIpLambdaFunctionQualifiedArn

Please note that the import will only work if the CloudFormation stack containing your CloudFront distribution is also deployed in us-east-1, because exported values can only be imported within the same region. If you usually use a different AWS region, you will have to paste the ARN of the Lambda version directly into your template. Remember that Lambda@Edge functions must be deployed in the us-east-1 region, because CloudFront replicates them to all other regions from there.

Loading list of restricted IPs

In our case we have quite a few IP address ranges we would like to restrict access to. We downloaded the ranges for the US state that we needed to block from public lists. Since we want our restriction mechanism to add minimal delay, we found that the best choice is to expand all the ranges into a list of individual IP addresses.

This way the database query that runs at request time only needs a single key lookup to determine whether an address is blocked, as opposed to checking two indexes if we stored each blocked range as a start and an end address.
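The expansion step can be sketched as follows. This assumes the source list gives ranges in CIDR notation (for example 192.0.2.0/30), which may not match the format of the list you download; the function names are made up for this illustration.

```javascript
'use strict'

// Convert a dotted IPv4 string to an unsigned 32-bit integer.
function toInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0)
}

// Convert an unsigned 32-bit integer back to dotted notation.
function toDotted(n) {
  return [24, 16, 8, 0].map((shift) => Math.floor(n / 2 ** shift) % 256).join('.')
}

// Expand a CIDR block into every address it contains.
function expandCidr(cidr) {
  const [base, prefixText] = cidr.split('/')
  const size = 2 ** (32 - Number(prefixText))
  // Align the base address to the start of the block.
  const start = Math.floor(toInt(base) / size) * size
  const addresses = []
  for (let offset = 0; offset < size; offset++) {
    addresses.push(toDotted(start + offset))
  }
  return addresses
}

console.log(expandCidr('192.0.2.0/30'))
// prints [ '192.0.2.0', '192.0.2.1', '192.0.2.2', '192.0.2.3' ]
```

The trade-off is storage for speed: a /16 block alone expands to 65,536 rows, but each request then costs exactly one key lookup.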

The IP ranges for the US state we needed to block translated to about 8 million IP addresses, which is not a huge dataset for DynamoDB. We chose DynamoDB because it’s quite cheap (about $1.50 a month for our 8M entries) and we can scale it arbitrarily to accommodate the traffic.

The IP table needs only one key: the IP address, which we store as a 32-bit integer.
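As an illustration of that representation, the sketch below converts a dotted IPv4 address to an unsigned 32-bit integer and builds the matching DynamoDB key. The ip2int shown here is our assumption of what such a helper might look like, not necessarily the one in the repository, and it handles IPv4 only.

```javascript
'use strict'

// Convert a dotted IPv4 address into an unsigned 32-bit integer.
// The ">>> 0" keeps the result unsigned when the first octet is 128 or higher.
function ip2int(ip) {
  return ip
    .split('.')
    .reduce((acc, octet) => ((acc << 8) | parseInt(octet, 10)) >>> 0, 0)
}

// Build the key in the attribute-value format used by the low-level
// DynamoDB client, where numbers are sent as strings under "N".
function ipKey(ip) {
  return { ip: { N: ip2int(ip).toString() } }
}

console.log(ipKey('192.0.2.1'))
// prints { ip: { N: '3221225985' } }
```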

Loading blocked IPs to DynamoDB

The repository comes with a Python script that does the job. Assuming you have a text file named ips.csv where each row is an IP address to block, you can load it into DynamoDB using:

python3 import_ips_from_csv.py ips.csv
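We don’t reproduce the Python script here. As a rough idea of what a bulk load involves, the following JavaScript sketch groups already converted integer addresses into BatchWriteItem payloads, which DynamoDB limits to 25 write requests per call. The buildBatches name and the overall structure are assumptions for illustration and may differ from what import_ips_from_csv.py actually does.

```javascript
'use strict'

const TABLE_NAME = 'restrict-ip-table'
const BATCH_LIMIT = 25 // BatchWriteItem accepts at most 25 requests per call

// Turn a list of integer IP addresses into BatchWriteItem parameter objects.
function buildBatches(ipNumbers) {
  const batches = []
  for (let i = 0; i < ipNumbers.length; i += BATCH_LIMIT) {
    const putRequests = ipNumbers.slice(i, i + BATCH_LIMIT).map((n) => ({
      PutRequest: { Item: { ip: { N: n.toString() } } }
    }))
    batches.push({ RequestItems: { [TABLE_NAME]: putRequests } })
  }
  return batches
}

module.exports = { buildBatches }
```

Each element could then be passed to dynamodb.batchWriteItem(params).promise() from the AWS SDK. A real loader would also need to retry any UnprocessedItems returned by the call and respect the table’s provisioned write capacity.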

Anatomy of the Serverless app

serverless.yml

This example is a good occasion to highlight some nice features of the Serverless framework. In this section we will go through the serverless.yml file and explain the responsibility of each part.

Basic settings

Starting from the top of the file we find:

service: restrict-ip-lambda
package:
  individually: true
plugins:
  - "@silvermine/serverless-plugin-cloudfront-lambda-edge"
provider:
  name: aws
  region: us-east-1
  runtime: nodejs8.10
  stage: dev

As we’ve mentioned above, Lambda@Edge functions need to be deployed in us-east-1 region, so we hardcode this value in the file.

It’s also worth explaining what @silvermine/serverless-plugin-cloudfront-lambda-edge is for. It’s responsible for configuring the function’s IAM permissions to work with Lambda@Edge. It adds "edgelambda.amazonaws.com" as a Principal of the AWS::IAM::Role created by the framework and allows this role to replicate the function to edge locations.

Custom privileges of the Lambda@Edge function

Following further down we find:

  iamRoleStatements:
    -
      Effect: Allow
      Action:
        - ssm:GetParameter
      Resource:
        Fn::Join:
          - ""
          -
            - "arn:aws:ssm"
            - ":"
            -
              Ref: AWS::Region
            - ":"
            -
              Ref: AWS::AccountId
            - ":parameter"
            -
              Ref: RestricFlagParam
    -
      Effect: Allow
      Action:
        - dynamodb:GetItem
      Resource:
        - Fn::GetAtt: [RestrictIpTable, Arn]

The block above grants the two IAM permissions that our function needs.

The first one allows it to read the secret phrase that, when passed in the query string, bypasses the restriction mechanism. The initial value of this parameter is set in the config.dev.yml file.

The second one grants our Lambda function read access to the DynamoDB table where we store the restricted IP addresses.

Custom settings

Next we find this block:

custom:
  stage: ${opt:stage, self:provider.stage}

We always have these lines in the Serverless applications we create. ${self:custom.stage} takes its value from the command-line option --stage STAGE and falls back to provider.stage, which we set to dev.

Following down we have:

  config: ${file(./config.${self:custom.stage}.yml)}

This line loads the config file whose name includes the name of the stage. This allows us to deploy the application to multiple environments with different settings. The app repository comes with a config file named config.dev.yml, which looks as follows:

RestrictFlagPath: /develop/shared/restrict_flag
RestrictFlagPhrase: canIGetInPlease

Together, these settings let us refer to environment-specific values later in the file, for example: ${self:custom.config.RestrictFlagPath}.

Function definition

Below we can find this block:

functions:
  restrictIp:
    handler: restrictIp.handler
    memorySize: 128
    timeout: 3
    package:
      exclude:
        - ./**
      include:
        - restrictIp.js

This part defines the only Lambda function in the app. It specifies that the implementation is the handler function exported from the restrictIp.js file. The memorySize and timeout settings must be set explicitly for Lambda@Edge functions: viewer-request functions are limited to 128 MB of memory and a 5-second timeout, both below the framework defaults.

The package section specifies which files should be included in Lambda source.

Custom resources definitions

Finally, we can move to the section that describes the two resources used to control the behavior of our function.

resources:
  Resources:
    RestricFlagParam:
      Type: AWS::SSM::Parameter
      Properties:
        Name: ${self:custom.config.RestrictFlagPath}
        Description: 'Restrict flag to bypass filter by restricted customers'
        Type: String
        Value: ${self:custom.config.RestrictFlagPhrase}

This part defines a secret query parameter value that allows the user to bypass the check (see the implementation in restrictIp.js for details).

    RestrictIpTable:
      Type: AWS::DynamoDB::Table
      Properties:
        AttributeDefinitions:
          -
            AttributeName: ip
            AttributeType: N
        KeySchema:
          -
            AttributeName: ip
            KeyType: HASH
        ProvisionedThroughput:
          ReadCapacityUnits: 15
          WriteCapacityUnits: 1
        TableName: restrict-ip-table

This is the definition of the DynamoDB table. We define one attribute, ip, of type N (number) and use it as the HASH (partition) key of the table.

Lambda@Edge function implementation — restrictIp.js

Now let’s break down the implementation of our filtering mechanism into pieces.

Starting from the top we have these:

'use strict'

const QS = require('querystring')
const AWS = require('aws-sdk')
const ssm = new AWS.SSM({ region: 'us-east-1' })
const dynamodb = new AWS.DynamoDB({ region: 'us-east-1' })

Our function uses the SSM and DynamoDB services. A careful reader may wonder why we need to specify { region: 'us-east-1' } in the client configuration. This is because a Lambda@Edge function gets replicated to edge locations. When it runs in another location, it needs to know which region hosts the resources it should call.

Implementation

exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request
  const uri = request.uri
  if (uri.match(/\./) && !uri.match(/index\.html$/)) {
    // Only filter requests for html content.
    // Pass through all the queries for media files (css, images, js, etc)
    callback(null, request)
    return
  }
  checkIpTable(request, callback)
}

const TableName = 'restrict-ip-table'

// ip2int (not shown in this excerpt) converts the dotted client IP to an integer.
async function checkIpTable(request, callback) {
  const ipnumber = ip2int(request.clientIp).toString()
  let data
  try {
    data = await dynamodb.getItem({
      Key: { ip: { N: ipnumber } },
      TableName
    }).promise()
  } catch (err) {
    // Fail closed: if the lookup fails, treat the client as restricted.
    console.error(err)
    restrictedResponse(request, callback)
    return
  }
  if (!data.hasOwnProperty('Item')) {
    // The address is not on the list, so let the request through.
    callback(null, request)
    return
  }
  await checkRestrictFlag(request, callback)
}

The handler function is our entry point defined in serverless.yml. It passes requests for static assets straight through; for everything else, checkIpTable converts the IP address of the client to a 32-bit integer and looks it up in the DynamoDB table.
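To see which requests actually reach the IP lookup, the URI condition from the handler can be isolated into a small predicate. The needsIpCheck name is ours, introduced only for this illustration; the two regular expressions are the ones used in the handler.

```javascript
'use strict'

// Mirrors the condition in the handler: a URI containing a dot is treated as
// a static asset and skipped, unless it ends with index.html.
function needsIpCheck(uri) {
  const looksLikeFile = /\./.test(uri)
  const isIndex = /index\.html$/.test(uri)
  return !looksLikeFile || isIndex
}

for (const uri of ['/', '/dashboard', '/index.html', '/static/app.js', '/restricted.html']) {
  console.log(uri, needsIpCheck(uri))
}
```

Note that /restricted.html itself is not checked, so the restriction page stays reachable for blocked users. The same is true of any other path containing a dot, which is acceptable for a single-page application served from index.html.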

If an entry is found, it calls checkRestrictFlag, which looks as follows:

async function checkRestrictFlag(request, callback) {
  const params = QS.parse(request.querystring)
  if (!params.ipr) {
    return restrictedResponse(request, callback)
  }
  const ssmParams = { Name: '/develop/shared/restrict_flag', WithDecryption: true }
  let data
  try {
    data = await ssm.getParameter(ssmParams).promise()
  } catch (err) {
    console.error(err)
    // Return here so we don't read from undefined data below.
    return restrictedResponse(request, callback)
  }
  if (data.Parameter.Value !== params.ipr) {
    restrictedResponse(request, callback)
  } else {
    callback(null, request)
  }
}

This function checks the query string and compares the value of the ipr parameter to the value stored in Parameter Store. If the values match, the function lets the user through; otherwise it blocks them.
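The comparison itself can be sketched as a pure function, with the Parameter Store lookup replaced by a plain argument. Here hasValidBypass and storedPhrase are hypothetical names for this illustration; the query string is passed without a leading question mark, as CloudFront provides it in request.querystring.

```javascript
'use strict'

const QS = require('querystring')

// Return true when the "ipr" query parameter equals the stored phrase.
// storedPhrase stands in for the value read from Parameter Store.
function hasValidBypass(querystring, storedPhrase) {
  const params = QS.parse(querystring)
  return Boolean(params.ipr) && params.ipr === storedPhrase
}

console.log(hasValidBypass('ipr=canIGetInPlease', 'canIGetInPlease')) // prints true
console.log(hasValidBypass('ipr=letMeIn', 'canIGetInPlease')) // prints false
console.log(hasValidBypass('', 'canIGetInPlease')) // prints false
```

If the parameter appears more than once, querystring.parse returns an array, so the strict comparison fails and the request stays restricted.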

If access for the user should be restricted, the request to the origin is modified as follows:

function restrictedResponse(request, callback) {
  request.uri = '/restricted.html'
  callback(null, request)
}

This has an effect of returning to the client the content of restricted.html file instead of the path that was requested. We've made this page to present the information about restrictions to the user.

Debugging Lambda@Edge function

Now that the function is deployed and we’ve explained how it’s built, let’s see how to debug it, assuming you’ve followed all the steps and the function is attached to a CloudFront distribution.

Every time the function is invoked, it writes log output to CloudWatch. The function runs at the edge location closest to where the request is made, and its logs land in the AWS region nearest to that edge location. For example, requests made from central Europe would typically be logged in the eu-central-1 region.

CloudFront automatically replicates our function between regions and runs it at the edge location. You should look for a Log Group named /aws/lambda/us-east-1.restrict-ip-lambda-dev-restrictIp in the region geographically closest to you.
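If you deploy to another stage or rename the service, the log group name changes accordingly. The small sketch below derives it, assuming the Serverless naming pattern of service, stage and function name joined by dashes, which matches the log group name for this app; edgeLogGroupName is a helper name made up for this illustration.

```javascript
'use strict'

// Serverless names functions "<service>-<stage>-<function>", and the replica's
// log group is prefixed with the function's home region, us-east-1.
function edgeLogGroupName(service, stage, functionName) {
  return `/aws/lambda/us-east-1.${service}-${stage}-${functionName}`
}

console.log(edgeLogGroupName('restrict-ip-lambda', 'dev', 'restrictIp'))
// prints /aws/lambda/us-east-1.restrict-ip-lambda-dev-restrictIp
```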

Check that restriction mechanism works

Now you can add your own IP to the list of blocked IPs and check that the mechanism works as expected. You can do this from command line using:

curl https://ipecho.net/plain | python3 import_ips_from_csv.py -

Now make a request to your distribution. It should render the content of the restricted.html file (provided you’ve created one) or return a 503 error if you haven’t.

Now add ?ipr=canIGetInPlease at the end of your URL. You should see the original content again.


Summary

Takeaways from this article:

  • Using geo restriction as an example, we’ve demonstrated how Lambda@Edge can modify the behavior of a CloudFront distribution.
  • Lambda@Edge functions can access resources in us-east-1, such as a DynamoDB table or Parameter Store.
  • The Serverless framework is a convenient way to create Lambda functions and AWS resources and to tie them together.
  • We’ve demonstrated a few tricks we typically use in our development work at Rumble Fish.