Make your website SEO friendly with CloudFront and Lambda@Edge

Maria Zacharia
Published in Remote Serverless
6 min read · Jul 24, 2021

Enabling SEO in single-page applications, the so-called SPAs, has always been a hurdle. This is because in a SPA it’s JavaScript that adds content to the browser container, and most search engines still struggle to crawl asynchronously loaded content.

This article is all about taking the first step toward making your single-page application SEO friendly, and about getting those pretty previews when a link is shared on apps like Facebook, WhatsApp, Twitter, Medium, etc. in the easiest, most trouble-free way. At least, that's how we found the solution to be.

Problem Statement

It is easy enough if you only want some static information to be shown in the preview, just like the example I have shared above.

But what if you want to show different information based on a dynamic route parameter? For instance, if I share my profile here on Medium, it shows the information relevant to me: my name as the title, my bio as the description, and my profile picture as the image. How can we achieve this?

Our FAILED Solution

Yeah, you guessed it right. Server-Side Rendering can do the job. That's what I pursued initially, too. Since our frontend was built with Angular, I went ahead with Angular Universal. As many of you who have gone through the same exercise can relate, it is not a simple one-step process. The initial setup alone raised a ton of issues, and to make sure that the user-specific data is injected into the header on the server side, we had to make use of something called a route resolver.

The next step was getting a server: an EC2 instance, since we use AWS as our cloud provider. But that defied our initial plan. We didn't want the trouble of managing and scaling servers ourselves, which is the main reason why we are completely Serverless.

Instead of getting an EC2 instance, we used Lambda and CloudFront to do the trick. But as you might have guessed already, that was a bad idea. The cold start issue of Lambda almost killed our performance.

I know I summarized a lot of concepts in a few sentences, but let me know if you want more details on that approach too.

We did a lot of research looking for solutions to the above problem along with alternative approaches to SSR. At the end of it, we had two options — CloudFront Functions and Lambda@Edge. Unfortunately, we couldn’t go ahead with CloudFront Functions due to its maximum execution time limit of 1ms.

CloudFront Functions is a powerful feature that AWS introduced recently. Refer to the official docs if you wish to know more about it.

The Game Changer: Lambda@Edge with CloudFront Triggers

If you are wondering what makes Lambda@Edge different from the ordinary Lambda functions, this is it. Lambda@Edge is a feature of Amazon CloudFront that lets you run code closer to users of your application, which improves performance and reduces latency. With Lambda@Edge, you don’t have to provision or manage infrastructure in multiple locations around the world, making your web applications globally distributed with improved performance. Again, all with zero server administration.

Shown below is the basic architectural diagram of our solution.

Architectural Diagram — Lambda@Edge and CloudFront for SEO

Looks complicated? Let me simplify things for you.

  • When a user opens www.hiretheauthor.com/maria in their web browser, the request ultimately reaches Amazon CloudFront. If the CloudFront cache already holds an entry for the requested URL, it is returned as-is. Otherwise, CloudFront sends a request to its origin, which in this case is the S3 bucket, to fetch maria.html. This leads to an error since maria.html is not present in S3, so the error document, index.html, is returned instead. This is the default flow.
  • AWS allows us to run Lambda@Edge functions in response to the following four CloudFront events:
    Viewer Request: After CloudFront receives a request from a viewer
    Origin Request: Before CloudFront forwards the request to the origin
    Origin Response: After CloudFront receives the response from the origin
    Viewer Response: Before CloudFront forwards the response to the viewer
  • We hacked into the default flow by adding a Lambda@Edge function on the Origin Request event. Why the Origin Request event? Because running a Lambda function is a costly operation (performance-wise) and we don't want it to execute every time a user hits our website. Viewer events fire on every request, whereas origin events fire only on a cache miss, so this choice also lets us take maximum advantage of the CloudFront cache.
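To make the Origin Request hook a little more concrete, here is a minimal sketch of such a handler in Node.js. The event shape (event.Records[0].cf.request) and the response format are the standard Lambda@Edge ones; the extractUsername helper, its route pattern, and the placeholder HTML body are assumptions made for illustration, not our production code.

```javascript
// Hypothetical helper: treat a single path segment such as /maria as a profile route.
// The pattern is an assumption for illustration; adapt it to your own routes.
function extractUsername(uri) {
  const match = /^\/([A-Za-z0-9_-]+)\/?$/.exec(uri);
  return match ? match[1] : null;
}

// Origin Request handler: CloudFront invokes it only on a cache miss.
async function handler(event) {
  const request = event.Records[0].cf.request;
  const username = extractUsername(request.uri);

  if (!username) {
    // Returning the request object lets CloudFront continue to the origin (S3).
    return request;
  }

  // Returning a response object instead skips the origin call entirely.
  return {
    status: '200',
    statusDescription: 'OK',
    headers: {
      'content-type': [{ key: 'Content-Type', value: 'text/html; charset=utf-8' }],
    },
    // Placeholder body; the real function returns the updated index.html.
    body: `<html><head><title>${username}</title></head><body></body></html>`,
  };
}

exports.handler = handler;
```

The key design point is the return value: handing back the request keeps the default flow, while handing back a response replaces it.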

What does the Lambda do?

The job of the function can be abstracted into four main tasks:

  • Query the S3 bucket to fetch the index.html.
  • Query the DB to fetch details corresponding to the user with username ‘maria’.
  • Update the og tags within the index.html with the information fetched from the DB.
  • Return the updated index.html.
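The third task, updating the og tags, can be sketched roughly as follows. This is a simplified illustration rather than our exact code: the profile field names (name, bio, imageUrl) are hypothetical, and the regular expression assumes each meta tag is written with the property attribute first.

```javascript
// Escape characters that would break out of an HTML attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Replace an existing <meta property="og:..."> tag, or add one just before </head>.
function setOgTag(html, property, content) {
  const tag = `<meta property="og:${property}" content="${escapeAttr(content)}">`;
  const existing = new RegExp(`<meta\\s+property="og:${property}"[^>]*>`, 'i');
  // Replacer functions are used so that "$" in the content is inserted literally.
  return existing.test(html)
    ? html.replace(existing, () => tag)
    : html.replace('</head>', () => `${tag}</head>`);
}

// profile is a hypothetical record shape for the user fetched from the DB.
function injectOgTags(html, profile) {
  let out = html;
  out = setOgTag(out, 'title', profile.name);
  out = setOgTag(out, 'description', profile.bio);
  out = setOgTag(out, 'image', profile.imageUrl);
  return out;
}
```

Escaping matters here because the name and bio are user-supplied and end up inside an HTML attribute.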

Well, it's not just the above steps that we did. This Lambda is the entry point to our application, so we had to take extra steps to make sure it is robust and performant. Let me put down a few of them here.

  • DynamoDB Global Tables: In order to make the DB call as fast as possible, we replicated our Users Table in five regions, namely us-east-2, ap-south-1, ap-southeast-2, eu-central-1, and sa-east-1. A local hashmap was manually created to map the regions where no DB was deployed to their closest region where the DB was available. This made sure that the Lambda always fetches the data from the closest replica of the Users Table.
  • Reusing existing connections to S3 and DynamoDB: In order to eliminate the time taken for creating new connections, we made use of the keepAlive property of the HTTPS agent as shown below.
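    const https = require('https');
    const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2

    // Keep connections open so warm invocations can reuse them.
    const agent = new https.Agent({
      keepAlive: true
    });
    // Pass the shared agent to the DynamoDB client.
    const dynamodb = new AWS.DynamoDB({
      httpOptions: {
        agent
      }
    });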
  • Using global variables: We store the value of index.html fetched from S3 in a global variable within the Lambda. The data from DynamoDB is also saved in a global hashmap. This makes sure that the data stays cached as long as the container in which the Lambda runs is active. We only make a call to S3/DynamoDB, if the global variables do not hold the respective data.
  • Running the code only for Bots: When you actually think about it, we only need this piece of code to execute when the request is made by a bot or a crawler. So we added a check-if-bot logic. Thus, if the requester is not a bot but we have the index.html cached within the Lambda, we return it right away. Otherwise, the request is forwarded to S3.
  • Parallel Asynchronous Calls: The calls to S3 and DynamoDB are started in parallel in order to save time, and their results are then awaited together using Promise.all().
  • Try-Catch block: As I mentioned earlier, this function is the entry point to our app and it should never fail. So we added a try-catch block on the top level to handle all the scenarios which could cause it to fail. As a result, whenever a failure happens on the Lambda, it would basically ask S3 to handle the request instead.
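Several of the points above (the region map, the global caches, the bot check, the parallel calls, and the top-level try-catch) fit together roughly as in the sketch below. Treat it as an illustration under stated assumptions, not our production code: the entries in NEAREST_REPLICA, the bot pattern, the file-extension guard, and the injected fetchIndexHtml, fetchUser, and render functions are all hypothetical. It also assumes the distribution forwards the User-Agent header to the origin, since otherwise the function cannot see the real user agent.

```javascript
// Regions that hold a replica of the Users Table (from the list above).
const REPLICA_REGIONS = ['us-east-2', 'ap-south-1', 'ap-southeast-2', 'eu-central-1', 'sa-east-1'];

// Illustrative entries only; the real map was maintained by hand.
const NEAREST_REPLICA = {
  'us-east-1': 'us-east-2',
  'us-west-2': 'us-east-2',
  'eu-west-1': 'eu-central-1',
  'ap-southeast-1': 'ap-south-1',
};

function resolveDbRegion(lambdaRegion) {
  if (REPLICA_REGIONS.includes(lambdaRegion)) return lambdaRegion;
  return NEAREST_REPLICA[lambdaRegion] || 'us-east-2'; // assumed default
}

// Assumed pattern; extend it with the crawlers you care about.
const BOT_PATTERN = /bot|crawler|spider|facebookexternalhit|whatsapp/i;

function isBot(request) {
  const header = request.headers['user-agent'];
  return BOT_PATTERN.test(header && header[0] ? header[0].value : '');
}

function htmlResponse(body) {
  return {
    status: '200',
    statusDescription: 'OK',
    headers: { 'content-type': [{ key: 'Content-Type', value: 'text/html; charset=utf-8' }] },
    body,
  };
}

// Module-level caches survive between invocations while the container stays warm.
let cachedIndexHtml = null;
const cachedUsers = new Map();

// fetchIndexHtml, fetchUser and render are injected so the S3/DynamoDB calls stay swappable.
function createHandler({ fetchIndexHtml, fetchUser, render }) {
  return async function handler(event) {
    const request = event.Records[0].cf.request;
    try {
      // Assumed guard: static assets (paths with a file extension) always go to S3.
      if (/\.[a-z0-9]+$/i.test(request.uri)) return request;
      if (!isBot(request)) {
        // Humans get the plain SPA shell: from memory if cached, otherwise from S3.
        return cachedIndexHtml ? htmlResponse(cachedIndexHtml) : request;
      }
      const username = request.uri.replace(/^\/+|\/+$/g, '');
      const region = resolveDbRegion(process.env.AWS_REGION);
      // Start both lookups together; cached values resolve immediately.
      const [indexHtml, user] = await Promise.all([
        cachedIndexHtml || fetchIndexHtml(),
        cachedUsers.get(username) || fetchUser(username, region),
      ]);
      cachedIndexHtml = indexHtml;
      cachedUsers.set(username, user);
      return htmlResponse(render(indexHtml, user));
    } catch (err) {
      // Never fail the page: fall back to the default S3 flow.
      return request;
    }
  };
}
```

In a deployed function, the result of createHandler would be assigned to exports.handler, with fetchIndexHtml and fetchUser wrapping the S3 and DynamoDB clients.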

Phew! That was a lot to comprehend, just as it was hard for us to figure out in the first place. So let's get to the million-dollar question: how long does the page load take?

Page Load Performance

Total time taken by the Lambda in the worst-case scenario can be represented as,

Lambda Cold-Start Time + max(DynamoDB Query Response Time, S3 Query Response Time)

Lambda cold-start time is usually on the order of 200–300ms. Since the S3 and DynamoDB calls run in parallel, we only have to consider the slower of the two. In practice, that maximum amounts to around 100–200ms.
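As a quick worked example of the formula, the snippet below plugs in numbers from the ranges quoted above. The individual DynamoDB and S3 timings are assumed values chosen only to illustrate the arithmetic; they are not measurements.

```javascript
// Worst case: cold start plus the slower of the two parallel calls.
function worstCaseMs({ coldStartMs, dynamoMs, s3Ms }) {
  return coldStartMs + Math.max(dynamoMs, s3Ms);
}

// Lower end of the quoted ranges (assumed split between DynamoDB and S3).
const low = worstCaseMs({ coldStartMs: 200, dynamoMs: 80, s3Ms: 100 }); // 300

// Upper end of the quoted ranges (assumed split between DynamoDB and S3).
const high = worstCaseMs({ coldStartMs: 300, dynamoMs: 150, s3Ms: 200 }); // 500

console.log(`worst case: ${low}-${high} ms`);
```

Because the two calls overlap, speeding up the faster one changes nothing; only the cold start and the slower call matter.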

That’s not so impressive, is it? Let me get straight to the numbers.

This is what we observed with the above solution. In the worst case, with a Lambda cold start and no caching layers, it takes around 500ms on average. But 90% of the time, the load time drops to under 90ms! That is thanks to all the tweaks we made in the form of local caches, DynamoDB global tables, and of course the super-powerful caching layer in CloudFront.

That’s pretty much everything I wanted to cover. If you found my content to be useful, do show your support by smashing the clap button. I would be happy to attend to any of your queries via Hire the Author, where you would find the easiest way to book a 1:1 call/chat session with me.

Maria Zacharia

Co-Founder & CTO at Hire the Author. Want to have a 1–1 with me? Reach out to me at https://hiretheauthor.com/maria