Implementing secure web sockets with AWS API Gateway, Cognito, DynamoDB and Lambda

Published in

The Startup

14 min read · Aug 27, 2019

First steps

In this article, we’ll implement a web socket using AWS products. Web sockets allow near real-time, bi-directional, full-duplex communication between your services and clients. As usual, you can find all the source code on this Github repository for future reference.

We’ll use NodeJS as our programming language, but keep in mind AWS offers a set of SDKs for multiple languages, so you can adapt this solution to your own stack.

In order to manage, deploy and remove our stack from the AWS cloud, we’ll be using the serverless framework.

For those of you who have not worked with serverless yet, its aim is to ease the process of building and deploying applications on cloud providers. For our example, all you need is an AWS account and NodeJS installed on your computer; these links point to serverless’ general getting started guide and its AWS-specific configuration guide.

Important note: serverless recommends creating a service user with specific policies to avoid security issues, so that your AWS admin account is not the one exposed if the credentials happen to leak.
However, the policies provided in the official example do not grant access to AWS Cognito. For a working policy set please check the file here.

Last but not least, we’ll implement a custom NodeJS client for our web socket. The idea is to be able to bootstrap multiple web socket clients and see how messages are successfully delivered to all of them.

AWS products used

While implementing the web socket solution, we’ll be using the following AWS resources:

AWS Lambda:

Lambdas allow us to implement services without worrying about provisioning or infrastructure, which means all we have to care about is our code. Another important feature is that we’re only billed while our Lambdas are executing; while they’re idle, their cost is zero.

However, Lambdas have a maximum execution time so it’s important not to implement long-lasting processes. Another important caveat is that, for certain languages, the “wake up” time is significantly higher than others, so warming your Lambdas up might be a thing depending on your use case.

Lambdas are billed by three parameters:

Memory: usually 128–256 MB should be more than enough for your Lambda processes.
Amount of requests: the first million requests each month are free; beyond that, they cost $0.20 per million requests.
Process time: this depends on the amount of chosen memory, but for a 128 MB lambda it’s free up to 3,200,000 seconds a month.
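
The free process time above follows from simple arithmetic. As a quick sketch, assuming the free tier grants 400,000 GB-seconds of compute a month (the figure AWS publishes on its Lambda pricing page, so double check it before relying on it), the free seconds for a given memory size could be worked out like this. The function name is only illustrative.

```javascript
// Assumption: monthly free compute allowance in GB-seconds, taken from the AWS Lambda pricing page.
const FREE_GB_SECONDS_PER_MONTH = 400000;

// Returns how many seconds of execution the free allowance covers for a given memory size.
function freeTierSeconds(memoryMb) {
  const memoryGb = memoryMb / 1024;
  return FREE_GB_SECONDS_PER_MONTH / memoryGb;
}

console.log(freeTierSeconds(128)); // 3200000
console.log(freeTierSeconds(256)); // 1600000
```

Doubling the memory halves the free execution time, which is worth keeping in mind when sizing your Lambdas.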

You can see further detail on the official AWS Lambda pricing site here.

Lambda free tier is still available for users who have already consumed their 12 months of AWS’ free tier.

AWS API Gateway:

API Gateway will allow us to expose our Lambdas to the world. We’ll only be billed by the received requests, with no base cost.

We’ll also be implementing our web socket using API WebSocket, which is billed by the time our sockets stay connected and by the messages exchanged.

API Gateway / WebSocket are billed by four parameters:

Amount of API requests: the pricing is $3.50 per million requests for the first 333 million requests a month, getting slightly cheaper for higher amounts.
Cache storage: while optional, we’ll also be billed for the cache size we provision, starting at $0.02 an hour for a cache of up to 500 MB.
Web socket connection time: we’ll be billed $0.25 per million minutes of web socket connection time.
Amount of web socket messages: the first billion messages are charged at $1.00 per million, becoming slightly cheaper after said threshold.
It’s important to note that the maximum message size is 128 KB, and that messages are metered in 32 KB increments; for example, a 33 KB message is billed as two messages: 32 KB + 1 KB.
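
To make the message metering rule concrete, here is a minimal sketch of how the billable message count for a single payload could be estimated. It only encodes the two limits mentioned above; the function and constant names are made up for illustration.

```javascript
const INCREMENT_BYTES = 32 * 1024;    // metering increment
const MAX_PAYLOAD_BYTES = 128 * 1024; // maximum message size

// Returns how many billable messages a single payload of the given size counts as.
function billedMessages(payloadBytes) {
  if (payloadBytes > MAX_PAYLOAD_BYTES) {
    throw new RangeError('Payload exceeds the 128 KB message limit');
  }
  return Math.max(1, Math.ceil(payloadBytes / INCREMENT_BYTES));
}

console.log(billedMessages(33 * 1024)); // 2
console.log(billedMessages(10 * 1024)); // 1
```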

For further detail on AWS API Gateway / Websocket pricing you can check their pricing site here.

If we’re new users, we’ll be able to use most of these features for free for 12 months using the AWS’ free tier.

AWS DynamoDB:

Built on the principles of Amazon’s Dynamo white paper (the same paper that inspired Cassandra), this NoSQL database serves us as a key-value or document store with cheap and quick access.

We’ll be using DynamoDB to store relevant information about our web socket connections. We might argue this should be a feature included within the API WebSocket layer instead of pushing us into building this on our own, but it might be a future feature and as of today we have to handle it ourselves.

There are two provisioning / billing modes, so to speak:

On-demand provisioning: we’re billed by the amount of write / read requests, allowing DynamoDB to scale write and read capacity as required.
Static provisioning: we’re billed by the amount of write and read capacity we provision up front.

Again, for further detail you can check AWS DynamoDB’s pricing site here.

AWS Cognito:

Cognito provides us with tooling to handle user pools. We can integrate Cognito with social network login providers (Google, Facebook and the like) and anything that supports SAML.

In our case, we’ll be creating a simple user pool on which we can create users which will be able to auth themselves in order to access our web sockets.

Several parameters are taken into account for Cognito billing, the most relevant being:

Monthly Active Users (MAU): users who use any authentication, password change or token access feature. Free up to 50,000 MAU.
SAML/OIDC logins: users who authenticate through SAML or OIDC. Free up to 40 MAU.

For further detail on AWS Cognito pricing, check its pricing site here.

Project structure

Before diving into the solution itself, we’ll take a quick look at the project structure.

Keep in mind that these are personal preferences; I’d be keen on hearing your opinions, or how you structure your own projects, in the comments!

My serverless projects are usually structured as follows:

**Figure 1**: serverless project structure.

serverless.yml: serverless stack definition file.
handler.js: entry point used to aggregate all Lambda controller functions. Used to simplify handler paths in serverless.yml.
src: folder which contains all the source code.
constants: JavaScript file with project constants, usually loaded from environment.
controllers: JavaScript Lambda controller functions. May also contain private helper functions which are used to process request or response data.
connectors: JavaScript AWS SDK wrappers which abstract away the SDK, providing functional features.
For example, the AWS SDK allows you to query DynamoDB, while the wrapper function allows you to query DynamoDB for active socket connections.

I always use yarn on my NodeJS projects because I’ve found its cache to be much better than npm’s.

It is also noteworthy to point out that I always use the name attribute on serverless’ lambda definitions in order to keep my AWS account organized. Feel free to remove the custom lambda naming in case you’re not comfortable with it.

How do we implement a web socket using AWS technologies?

Now that we’ve taken a first glance at the products we’ll be using, it’s time to take a look at how we’ll be combining them to implement our web socket.

While AWS Lambda provides a cheap, scalable way to implement our REST API services, Lambdas are not designed for near real-time scenarios in which a client remains connected for an undefined amount of time. To work around this, web socket wss connections are handled by AWS API WebSocket at the API Gateway layer, which then forwards requests to Lambdas depending on the route selected for each received message.

There are four types of routes we can define at AWS API WebSocket:

$connect: this route is triggered every time a new socket client connects.
Whenever a new client connects to our socket, API Gateway will generate a unique client id we can use to track its status and to send messages to said client. We should store these client ids in order to be able to communicate with the clients.
$disconnect: this route is triggered when a socket client disconnects. However, AWS doesn’t guarantee this route will be invoked on every disconnection: some stale connections do not trigger this event, so we’ll have to check for stale connections in our code. Do not worry though, it’s really simple!
$default: this route is triggered when the received message doesn’t match any of the custom routes.
Custom: custom routes we’ll implement with our own logic.

For each of these routes we’ll be implementing AWS Lambdas with our connection management and data management logic.

These routes cover the client-to-server side of things, but the server-to-client is still to be handled. In order to handle this kind of communication, API Gateway exposes an endpoint we’ll be querying using AWS SDK which allows us to submit data to clients by their client id. Now you can see why it’s really important to track our active clients.
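
As a rough sketch of the server-to-client side, the function below pushes a JSON payload to a single client. It assumes an AWS SDK v2 `ApiGatewayManagementApi` client, whose `postToConnection` call takes a `ConnectionId` and a `Data` field; the client is passed in as a parameter so the function can be exercised without AWS. The function names are illustrative and not taken from the repository.

```javascript
// Builds the management endpoint from an API Gateway WebSocket event.
function managementEndpoint(event) {
  const { domainName, stage } = event.requestContext;
  return `${domainName}/${stage}`;
}

// Sends a JSON payload to one connected client.
// `client` is expected to behave like an AWS SDK v2 ApiGatewayManagementApi instance.
async function sendToConnection(client, connectionId, payload) {
  await client
    .postToConnection({ ConnectionId: connectionId, Data: JSON.stringify(payload) })
    .promise();
}
```

Inside a Lambda, the client could be created with `new AWS.ApiGatewayManagementApi({ endpoint: managementEndpoint(event) })`, assuming the default API Gateway domain is in use.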

So, summarizing, we’ll be implementing the following:

Lambda to register new clients.
Lambda to remove disconnecting clients.
Default web socket path lambda — we’ll make it a ping / pong simple example.
A greeting lambda which greets all the connected clients on request.

We’ll also implement a series of connectors to keep our code as tidy and organized as possible.

The overall solution architecture can be seen on this diagram:

**Figure 2**: AWS overall architecture diagram

Connectors

As mentioned previously, a set of connectors is provided within the example and, while they’re out of the scope of this article, we’ll spend a few minutes listing them and their purpose.

The purpose of these connectors is to abstract away the AWS SDK from our APIs. This can be handy if we ever want to support multiple cloud providers: we could create a connector interface and have an implementation for each desired cloud provider.

You can find the following connectors on this example:

API Gateway connector: exposes functions to access the API Gateway API. In our example, we need to access the endpoint exposed to forward responses from our lambda services to the web socket connections.
Cognito connector: exposes functions to authenticate users given their credentials, allowing JWT tokens to be generated or refreshed.
DynamoDB connector: exposes functions with prebuilt queries to populate or query the DynamoDB tables.

You can find all the source code for these connectors under the src/connector folder.
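
To give an idea of what such a wrapper might look like, here is a minimal sketch of a DynamoDB connector. It is not the repository’s code: the function names, the `connectionId` key and the `type` attribute are assumptions, and the DocumentClient-like object is injected so the connector can be tried with a stub.

```javascript
// Wraps a DocumentClient-like object so controllers never touch the SDK directly.
function createConnectionsConnector(documentClient, tableName) {
  return {
    // Stores a new socket connection together with the event type it listens to.
    registerConnection(connectionId, type) {
      return documentClient
        .put({ TableName: tableName, Item: { connectionId, type } })
        .promise();
    },
    // Deletes a socket connection by its id.
    removeConnection(connectionId) {
      return documentClient
        .delete({ TableName: tableName, Key: { connectionId } })
        .promise();
    },
  };
}
```

With the AWS SDK v2, the injected object would typically be a `new AWS.DynamoDB.DocumentClient()`.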

Tracking web socket connections and default response

We’ll begin tracking our connected clients by implementing lambda listeners hooked to the $connect and $disconnect API WebSocket events.

New web socket connection controller

Web socket disconnection controller

The connection handler simply uses the AWS NodeJS SDK to create a new entry on our DynamoDB table. On the other hand, our disconnection handler removes the given connection id from said table. As we previously noted, some connections may go stale without triggering the disconnection event; we’ll see how to deal with those later on.

It is very likely your application delivers different events to its clients and, in order to keep it simple, you don’t want every client receiving every possible event but only those they want or need to be aware of.
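
The two handlers described above could be sketched roughly as follows. The `connector` object and its `registerConnection` / `removeConnection` methods are hypothetical stand-ins for the DynamoDB connector; only `event.requestContext.connectionId` comes from API Gateway itself. How the subscribed event type reaches the handler is left out of this sketch.

```javascript
// Builds the $connect and $disconnect controllers around a (hypothetical) connector.
function createConnectionControllers(connector) {
  return {
    // Stores the id API Gateway assigned to the new connection.
    async onConnect(event) {
      const { connectionId } = event.requestContext;
      await connector.registerConnection(connectionId);
      return { statusCode: 200, body: 'Connected' };
    },
    // Removes the id once the client disconnects.
    async onDisconnect(event) {
      const { connectionId } = event.requestContext;
      await connector.removeConnection(connectionId);
      return { statusCode: 200, body: 'Disconnected' };
    },
  };
}
```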

In order to achieve this, we’ll be using our web socket connections table at DynamoDB to add which events each socket is registered to.

We’ll also need to implement a global secondary index on our table in order to allow filtering by the type column. In case you want to read more about DynamoDB’s GSIs, here’s a link to the official documentation.

Once we’ve done this, we can retrieve connections which are registered for a certain event type easily, allowing us to only notify clients that have actually registered for that type of event.
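
As an illustration of the lookup described above, the following sketch builds the parameters for a DocumentClient `query` against the secondary index. The table and index names in the example call are placeholders, not the ones used in the repository.

```javascript
// Builds DocumentClient query parameters that select connections by event type.
function connectionsByTypeQuery(tableName, indexName, type) {
  return {
    TableName: tableName,
    IndexName: indexName,
    // `type` is aliased in case it collides with a DynamoDB reserved word.
    KeyConditionExpression: '#type = :type',
    ExpressionAttributeNames: { '#type': 'type' },
    ExpressionAttributeValues: { ':type': type },
  };
}

// Placeholder names, for illustration only.
console.log(connectionsByTypeQuery('connections', 'type-index', 'greeting'));
```

The resulting object would then be handed to `documentClient.query(params).promise()`.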

You can think of “events” as something like “channels”.

Our default response lambda will simply answer PONG to incoming messages, generating an error response for any other type of unknown event.

Default route controller. Used to ping/pong or return an unsupported message error.
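
A minimal sketch of such a default controller could look like this. The message shape (`{ "action": "PING" }`) and the `send(connectionId, payload)` function are assumptions made for the example; the actual controller in the repository may differ.

```javascript
// Builds the $default controller. `send(connectionId, payload)` is a hypothetical
// function that delivers a payload to one client.
function createDefaultController(send) {
  return async function onDefault(event) {
    const { connectionId } = event.requestContext;
    let message = {};
    try {
      message = JSON.parse(event.body || '{}') || {};
    } catch (err) {
      // Malformed JSON is treated like any other unsupported message.
    }
    if (message.action === 'PING') {
      await send(connectionId, { action: 'PONG' });
    } else {
      await send(connectionId, { action: 'ERROR', error: 'Unsupported message' });
    }
    return { statusCode: 200 };
  };
}
```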

Our serverless definition file defines our lambda event handlers and our DynamoDB connection data table, including a Global Secondary Index to be able to filter by socket connection event.

It’s also noteworthy that we define the Websockets API name and the route selection expression in this file. We’ll be using the latter to route our messages to specific lambdas.

serverless definition file fragment which defines our API GW handlers and DynamoDB session control table

Securing web socket connections

Up to this point we’ve implemented a very basic system to register and unregister connections while also providing a default PONG response.

Anyhow, we’re far from over as basically anyone with our web socket address can connect to it and emit messages.

We’ll spend some time using a Cognito user pool to secure our web socket so only authenticated users can connect.

serverless definition file fragment which shows how to configure an AWS Cognito user pool and creates functions which use it to authenticate our users

Once we’ve added our Cognito user pool configuration to serverless’ definition, we can add a user of choice we’ll be using to test.

**Figure 3**: Cognito’s user management screen

Now we will implement a lambda that users can authenticate against to obtain a JWT token they can use as identity.

User authentication controller. Generates a JWT as identity if provided credentials are valid.
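
The shape of such a controller could be sketched as below. The `authenticate(username, password)` function stands in for the Cognito connector and is hypothetical, as is the request body format; the sketch only shows the success and 401 branches.

```javascript
// Builds the authentication controller. `authenticate(username, password)` is a
// hypothetical connector function that resolves with tokens or rejects on bad credentials.
function createAuthController(authenticate) {
  return async function onAuth(event) {
    try {
      const { username, password } = JSON.parse(event.body || '{}');
      const tokens = await authenticate(username, password);
      return { statusCode: 200, body: JSON.stringify(tokens) };
    } catch (err) {
      // Any failure (bad JSON, wrong credentials) is reported as unauthorized.
      return { statusCode: 401, body: JSON.stringify({ error: 'Invalid credentials' }) };
    }
  };
}
```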

This generated JWT token has to be delivered as a parameter of the web socket connection URL so our authenticator can validate it and let the user connect or reject the connection request.
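
On the client side, attaching the token to the connection URL might look like the sketch below. The query parameter name `token` and the example host are assumptions; use whatever name your validation function actually reads.

```javascript
// Appends the identity token to the web socket URL as a query parameter.
// The parameter name `token` is an assumption made for this example.
function buildSocketUrl(wsHost, token) {
  const url = new URL(wsHost);
  url.searchParams.set('token', token);
  return url.toString();
}

// Example host and token, for illustration only.
console.log(buildSocketUrl('wss://example.execute-api.eu-west-1.amazonaws.com/dev', 'abc.def.ghi'));
```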

By default, Cognito users have to validate their password. For the scope of this demo, I implemented a helper function on the Cognito connector that will automatically validate the first set password as the user password.

This is not production ready and should be managed by the user; Cognito offers a pre-built website on which users can authenticate themselves and manage their credentials, but this feature is out of the scope of this article.

Last but not least, we’ll add a validation function to our web socket so that the provided JWT token is proven as valid or connections are rejected.

User JWT validation on new web socket connection request.

Custom functions are the only supported mechanism to authenticate web sockets as of the date of writing this article.

This validation function can only be attached to the $connect event; once a client has connected, it can access any route unless we manually code exclusions ourselves.
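
A validation function of this kind answers API Gateway with an IAM policy that allows or denies the connection. The sketch below shows that response shape; `verifyToken` is a hypothetical function (for instance, one that checks the JWT signature against the user pool keys), and the `token` query parameter name is an assumption.

```javascript
// Builds the IAM policy document API Gateway expects from a Lambda authorizer.
function buildPolicy(principalId, effect, resource) {
  return {
    principalId,
    policyDocument: {
      Version: '2012-10-17',
      Statement: [{ Action: 'execute-api:Invoke', Effect: effect, Resource: resource }],
    },
  };
}

// `verifyToken(token)` is a hypothetical function that resolves with the token
// claims or rejects when the token is invalid or expired.
function createAuthorizer(verifyToken) {
  return async function authorize(event) {
    const token = (event.queryStringParameters || {}).token;
    try {
      const claims = await verifyToken(token);
      return buildPolicy(claims.sub, 'Allow', event.methodArn);
    } catch (err) {
      return buildPolicy('anonymous', 'Deny', event.methodArn);
    }
  };
}
```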

The authentication flow is as follows:

User requests a JWT token using their AWS Cognito credentials.
API GW forwards the request to the user authentication lambda, which validates user credentials using Cognito’s user pool.
If the user credentials are valid, both a JWT identity token and a JWT refresh token are generated and sent back to the client. If credentials are invalid, an error 401 is sent back to the client.
User requests a new web socket connection creation, providing the JWT identity token generated in the previous step.
API GW forwards the request to the token validation lambda, which validates the JWT identity token.
If the token is valid, the $connect handler is invoked, and the socket connection identifier is stored on the DynamoDB table. If the token is invalid, an error 401/403 is sent back to the client.

**Figure 4**: authentication flows. Obtaining the JWT token and using it to connect to the web socket server.

Messaging all active clients

We’ve now secured our web socket connections, while also managing existing connections and removing possible stale ones. Our last milestone is to be able to forward a message to all registered connections.

In order to achieve this, we’ll be adding a new API WebSocket route to our serverless.yml definition file:

Greeting path definition on serverless.yml

We’ll also need to implement a new lambda handler for the new route; this lambda will use the AWS NodeJS SDK to retrieve existing connections and send them the received message.

Greeting controller. Receives a message via socket request and forwards it to all subscribed sockets.
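
In outline, a controller like this looks up the subscribed connections and fans the message out to them. The sketch below assumes two hypothetical collaborators, `findConnectionIds(type)` and `send(connectionId, payload)`, plus a `{ "message": ... }` request body; none of these names come from the repository.

```javascript
// Builds the greeting controller. Both collaborators are hypothetical:
// `findConnectionIds(type)` resolves with the ids subscribed to an event type,
// `send(connectionId, payload)` delivers a payload to one client.
function createGreetingController(findConnectionIds, send) {
  return async function onGreeting(event) {
    const { message } = JSON.parse(event.body || '{}');
    const connectionIds = await findConnectionIds('greeting');
    // Deliver to every subscriber in parallel.
    await Promise.all(connectionIds.map((id) => send(id, { message })));
    return { statusCode: 200 };
  };
}
```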

There’s one important aspect we should take care of at this point: remember the stale connections which did not trigger the disconnection event? Since we’re messaging connections stored at our database, we might run into scenarios where we deliver a message to a disconnected consumer id. If this happens, the API WebSocket response endpoint will cause an exception we can handle and remove the stale socket connection as displayed below:

Fragment of code showing how to remove stale connections from active connections table.
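
The idea can be sketched as follows. When the target connection no longer exists, the AWS SDK error carries an HTTP 410 (Gone) status code, which we can use as the signal to prune it. The `send` and `remove` functions are hypothetical connector functions passed in for the example.

```javascript
// Sends a payload and prunes the connection when API Gateway reports it as gone.
// `send` and `remove` are hypothetical connector functions.
async function sendOrPrune(send, remove, connectionId, payload) {
  try {
    await send(connectionId, payload);
    return true;
  } catch (err) {
    // HTTP 410 (Gone) means the connection no longer exists.
    if (err.statusCode === 410) {
      await remove(connectionId);
      return false;
    }
    // Anything else is a real failure and should surface.
    throw err;
  }
}
```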

Now we are ready to connect from multiple terminals and see how greeting messages are forwarded to all registered clients, but we need to deploy our services to AWS first!

Deploying the stack on AWS

Once we’re done defining and implementing our Lambda handlers, we can deploy them to AWS using serverless CLI by executing the following command:

$ serverless deploy -v

This command will deploy the stack defined at our serverless.yml file using AWS CloudFormation. It is important to note down the API WebSocket path so we can connect to it.

In case we’ve only changed certain handlers, we can deploy exclusively using the following command:

$ serverless deploy function -f <handlerName> -v

Finally, if you want to remove your entire stack from the cloud, it’s as simple as running this command:

$ serverless remove -v

Important note: The very first time you deploy your stack, you’ll be shown the API Gateway paths created, as well as the ARNs of the created resources. Make sure to write down the web socket server host address and the auth endpoint, because our client will need to know their location in order to connect to them.

Implementing a web socket example client

Last but not least, we’ll implement our web socket NodeJS client. As stated in the introduction, we’ll be using this one to validate access and communication with the web socket server.

Our client will support the following features:

Allow us to identify ourselves — set a client custom/logical identifier.
Internally, API Gateway WebSocket will generate a hash for our connectionId, but we want to be able to name our connections.
Allow us to define which event we are subscribing to.
Allow us to define whether we’ll be publishing messages or just listening.
As of time of implementation, and since this is just a demo, a publishing client sends a message to every client subscribed to the event every 2 seconds.
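
These options map naturally onto command line arguments. Here is a small sketch of how they could be parsed, assuming the argument order is client name, event type and an optional publish flag; the function name is illustrative.

```javascript
// Parses `node index.js <clientName> <eventType> [publish]`.
function parseClientArgs(argv) {
  const [clientName, eventType, publish] = argv.slice(2);
  if (!clientName || !eventType) {
    throw new Error('Usage: node index.js <clientName> <eventType> [true]');
  }
  // The client only publishes when the third argument is the literal string "true".
  return { clientName, eventType, publish: publish === 'true' };
}

console.log(parseClientArgs(['node', 'index.js', 'second-greeter', 'greeting', 'true']));
```

In the real client the function would receive `process.argv`.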

On the other hand, we must provide our client with the following information:

AUTH_ENDPOINT: lambda service endpoint used to create the JWT token to be used as identity.
WS_HOST: API Gateway WebSocket endpoint which exposes the web socket server.
USERNAME: username created at AWS Cognito. It’ll be used in order to generate the JWT token used to validate the user identity.
PASSWORD: user’s password as set at AWS Cognito.

In order to safely store and use these variables, our client will use dotenv. This module populates the process.env object with data retrieved from a local .env file; this way we can easily store our credentials on our file system and use them in our process without exposing them publicly.
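
As a small sketch of how the client could consume these settings, the function below reads the four variables from an env-like object and fails fast when one is missing. The function name and the returned property names are made up for the example.

```javascript
const REQUIRED_KEYS = ['AUTH_ENDPOINT', 'WS_HOST', 'USERNAME', 'PASSWORD'];

// Reads the client settings from an env-like object and fails fast on missing ones.
// In the real client, `require('dotenv').config()` would populate process.env first.
function readClientConfig(env) {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing configuration: ${missing.join(', ')}`);
  }
  return {
    authEndpoint: env.AUTH_ENDPOINT,
    wsHost: env.WS_HOST,
    username: env.USERNAME,
    password: env.PASSWORD,
  };
}
```

Calling `readClientConfig(process.env)` right after loading dotenv gives a clear error early on, instead of a failed connection later.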

Web socket client snippet. The full working client can be found at the repository.

You can run our custom client as shown below:

# Listener client
$ node index.js first-greeter greeting
Generated token for user <username>
Connected!
Keeping alive

# Producer client
$ node index.js second-greeter greeting true
Generated token for user <username>
Connected!
Keeping alive

Here we can see how multiple listeners can receive information from a producer via our web socket.

**Figure 5**: multiple clients publishing and receiving messages through the web socket

Last but not least, if we close any listener process, our web socket implementation will handle the stale connection as shown on the logs below:

**Figure 6**: stale connection removal on message attempt

You can find the implemented client on the client folder within the Github repository.

Conclusions

Implementing web sockets to achieve near real-time, bi-directional communication on AWS can be a tricky task if you’re attempting it for the first time. Once you grow accustomed to techniques like handling connections and pairing consumer ids with event types to filter responses to clients, it becomes much simpler.

I hope this article helps you understand the magic behind web sockets on AWS and how we can secure them. We could even take Cognito out of the equation and use a custom users table on DynamoDB or something along those lines. Feel free to submit any questions or improvements 🙂

While web sockets are standard and any client should work perfectly, I’ve had problems myself when trying to use socket.io to connect to AWS API WebSocket. I’m quite sure said issues had something to do with how the socket.io client and server sides interact with each other by exchanging certain messages which may not be generated on the AWS side of things. I’ll be happy to hear about your experiences in the comments!

In a future article, we’ll be implementing an Angular frontend application which allows users to authenticate and connect to this web socket.