WhatIsTheTime.com: AWS Infrastructure Project Implementation

Nikita Rawool
Dec 18, 2023


WhatIsTheTime.com allows people to know what time it is. I know it sounds stupid, but at least it’s so easy that everyone understands it, and we’ll be able to talk about it at length. It’s so simple that we don’t need a database: each instance, each server knows what time it is. We want to start small, and we’re willing to accept downtime, but maybe our app will get more and more popular. People really want to know the time around the world, so we’ll need to scale vertically and horizontally, and maybe remove downtime.

Let’s go through the Solutions Architect journey for this app. We’ll see a lot of options for how we can proceed, so let’s start really simple, from the very beginning. You are a solutions architect and you say, you know what would be great? You have a T2 micro instance and you have a user. The user asks, “What time is it?”, and the instance answers, “It’s 5:30 PM.” Done, this is my app. So we have a public EC2 instance, and because we want the EC2 instance to keep a static IP address in case something happens and we restart it, we attach an Elastic IP address to it. So this is my first PoC, and it’s working really great.
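As a rough illustration of how little logic the app itself needs, here is a minimal Python sketch. The function name `what_time_is_it` and its parameters are assumptions made for illustration; the actual application code is not shown in this post.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def what_time_is_it(utc_offset_hours: float = 0.0,
                    now_utc: Optional[datetime] = None) -> str:
    """Return the time at a fixed UTC offset, formatted like '05:30 PM'.

    now_utc can be injected to make the function deterministic;
    when omitted, the server's own clock is used (no database needed).
    """
    if now_utc is None:
        now_utc = datetime.now(timezone.utc)
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.strftime("%I:%M %p")


# Example with a fixed instant so the result does not depend on the clock.
fixed = datetime(2023, 12, 18, 17, 30, tzinfo=timezone.utc)
answer = what_time_is_it(0, fixed)
```

Every instance can answer this on its own, which is why the rest of the journey is only about infrastructure.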

This architectural evolution demonstrates a gradual transition from a basic setup to a robust, highly available, and cost-efficient infrastructure. User experience, traffic growth, and cost optimization are the key factors shaping the architecture.

Our users are able to access our application, and we’re getting great feedback. Our users are really having a good time using it, so they say to their friends, “Hey, you should also use this application.” So another friend comes in and asks, “What time is it?”, and gets 7:30 PM. Another friend comes in, asks “What time is it?”, and gets 6:30 PM. We realize that our application is getting more and more traffic, and the T2 micro instance certainly isn’t enough. So as a solutions architect we say, wait a minute, maybe we should replace that T2 micro instance with something a little bit bigger to handle the load. That’s called vertical scaling. Maybe we’ll make it an M5 large instance. So we stop the instance, change the instance type, and start the instance again, and here we go, this is now an M5 instance. It has the same public IP because it has an Elastic IP address, so people are still able to access our application, but we experienced downtime while upgrading to the M5. Our users were not really happy during that moment, because they were not able to access our application. So this works, but this isn’t great, right?
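The stop, change type, start sequence could look something like this with boto3. This is only a sketch: `resize_instance` and `RecordingClient` are hypothetical names, and `RecordingClient` is a fake that just records calls so the sequence can be tried without an AWS account. With a real account you would pass `boto3.client("ec2")` instead.

```python
def resize_instance(ec2, instance_id: str, new_type: str) -> None:
    """Vertically scale one instance. The app is down between stop and start."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    # The instance type can only be changed while the instance is stopped.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


class RecordingClient:
    """Hypothetical stand-in for an EC2 client: records method names only."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def method(*args, **kwargs):
            self.calls.append(name)
            return self  # lets get_waiter(...).wait(...) chain

        return method


client = RecordingClient()
resize_instance(client, "i-0123456789abcdef0", "m5.large")  # made-up instance ID
```

The Elastic IP stays associated across the stop and start, which is why the public address does not change, but nothing in this sequence avoids the downtime.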

Next, we’re getting really popular, and it’s time to scale horizontally. Remember, our M5 instance has one public IP, an Elastic IP, attached to it. Now we’re getting tons of users, all asking what time it is, so we want to scale horizontally. We start adding EC2 instances. They’re all M5 large, and each has an Elastic IP attached to it. So on top of having three EC2 instances, we now have three Elastic IPs, and our users need to be aware of the exact values of these three Elastic IPs to talk to our instances. That’s called horizontal scaling.

We’re not doing badly, but we’re starting to reach some limits.

Now the users need to be aware of more and more IPs, and we have to manage more infrastructure, and it’s pretty tricky, right? So okay, let’s change the approach.

Now we have three M5 EC2 instances, and let’s remove the Elastic IPs because they’re something we can’t really manage at scale. There are only five Elastic IPs per Region per account by default, so it’s not a lot. Instead, our users are going to leverage Route 53.

So we’ve set up Route 53, and the website URL is api.whatisthetime.com. We’ve decided it’s going to be an A record with a TTL of one hour. An A record means that for a DNS name like this, Route 53 returns a list of IP addresses. So remember, a Route 53 A record maps a name to IPs. The users query Route 53 and get the IP addresses of our EC2 instances. These can change over time, and it doesn’t really matter, because we’ll update Route 53 and keep it in sync. Our users are now able to access our EC2 instances, and we don’t have any Elastic IPs to manage anymore. So using Route 53, we’ve made some good improvements.
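To make the A record concrete, here is a sketch of the change batch you might send to Route 53 to keep the record in sync with the current instances. The helper name `a_record_change_batch` is hypothetical, and the IP addresses are placeholders from a documentation-only range, not real instance addresses.

```python
from typing import Dict, List


def a_record_change_batch(name: str, ips: List[str], ttl: int = 3600) -> Dict:
    """Build a Route 53 ChangeBatch that points `name` at a list of IPs."""
    return {
        "Comment": "Keep the A record in sync with the running instances",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or overwrite it
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    "TTL": ttl,  # one hour, as chosen in the text
                    "ResourceRecords": [{"Value": ip} for ip in ips],
                },
            }
        ],
    }


# Placeholder addresses (203.0.113.0/24 is reserved for documentation).
batch = a_record_change_batch(
    "api.whatisthetime.com",
    ["203.0.113.10", "203.0.113.11", "203.0.113.12"],
)
```

With boto3, a dictionary like this would be passed as the `ChangeBatch` argument of `change_resource_record_sets` on a Route 53 client, together with the hosted zone ID.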

But now we want to be able to scale and to add and remove instances on the fly. When we do remove an instance, what happens? Some users were talking to an M5 large instance that is now gone, and because the TTL was one hour, their Route 53 query keeps returning the same cached response for one hour. So for up to one hour they’ll try to connect to an instance that is gone.

This is not really great. Even though other users are having a good time, and after the hour is up the affected users will be able to connect to the two remaining instances, they’re not having a good time right now because they think our application is down, and that’s really, really bad.
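The TTL problem can be reproduced with a small simulation. This is a toy model of a client-side DNS cache, not how any particular resolver is implemented; `CachingResolver` and the placeholder IPs are assumptions for illustration, and time is passed in as plain seconds.

```python
class CachingResolver:
    """Toy DNS cache: reuses an answer until its TTL has expired."""

    def __init__(self, lookup, ttl_seconds):
        self.lookup = lookup      # callable: name -> current list of IPs
        self.ttl = ttl_seconds
        self.cache = {}           # name -> (expires_at, ips)

    def resolve(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now < entry[0]:
            return entry[1]       # still within the TTL: cached answer
        ips = list(self.lookup(name))
        self.cache[name] = (now + self.ttl, ips)
        return ips


NAME = "api.whatisthetime.com"
records = {NAME: ["203.0.113.10", "203.0.113.11", "203.0.113.12"]}
resolver = CachingResolver(lambda n: records[n], ttl_seconds=3600)

first = resolver.resolve(NAME, now=0)
records[NAME].remove("203.0.113.10")        # instance terminated, DNS updated
stale = resolver.resolve(NAME, now=1800)    # 30 minutes later: old answer
fresh = resolver.resolve(NAME, now=3600)    # TTL expired: new answer
```

Half an hour after the instance is removed, the cached answer still contains its address. Only once the hour is up does the client learn about the change.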

So, okay, so this is an architecture, and we see the limit of it. So how can we push this a little bit further?

So let’s talk about adding a load balancer. Now we don’t have public instances anymore. We have private EC2 instances, and we’re going to launch them in the same availability zone because we don’t know any better. We’ve launched them manually, we have three M5 large instances, and we said, okay, let’s use a load balancer. On top of that, it’s going to have health checks, so that if one instance is down or not working, we won’t send traffic from our users to that instance.
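Here is a toy model of what the health checks buy us. It is a simplified round-robin sketch, not how ELB is implemented; the `LoadBalancer` class and the instance IDs are made up for illustration.

```python
class LoadBalancer:
    """Toy round-robin balancer that skips targets failing a health check."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)
        self._next = 0

    def health_check(self, is_healthy):
        """Re-evaluate every registered target with the given probe."""
        self.healthy = {t for t in self.targets if is_healthy(t)}

    def route(self):
        """Pick the next healthy target in registration order."""
        candidates = [t for t in self.targets if t in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy targets")
        target = candidates[self._next % len(candidates)]
        self._next += 1
        return target


lb = LoadBalancer(["i-a", "i-b", "i-c"])
lb.health_check(lambda target: target != "i-b")   # pretend i-b is down
routed = [lb.route() for _ in range(4)]
```

Requests keep flowing to the two healthy instances and never reach the one that failed its check.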

So okay, we’re linking the two together. My ELB is going to be public facing, whereas my private EC2 instances are in the back, and we restrict traffic between the two using, for example, a security group rule that references another security group.
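A security group rule that references another security group could be sketched like this. The helper name `ingress_from_security_group`, the group IDs, and the choice of port 80 are assumptions for illustration.

```python
from typing import Dict, List


def ingress_from_security_group(source_sg_id: str, port: int = 80) -> List[Dict]:
    """Build IpPermissions allowing TCP traffic only from another security group."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Reference the load balancer's group instead of an IP range.
            "UserIdGroupPairs": [{"GroupId": source_sg_id}],
        }
    ]


# Hypothetical ID for the load balancer's security group.
permissions = ingress_from_security_group("sg-0elbexample")
```

With boto3, this list would be passed as `IpPermissions` to `authorize_security_group_ingress`, with `GroupId` set to the instances' security group. The instances then accept traffic only from the load balancer, whatever IP addresses it currently uses.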

Okay, that sounds pretty good. Now our users are going to query for whatisthetime.com, but this time it cannot be a plain A record with fixed IPs, because a load balancer’s IP addresses change all the time. Instead, because it’s a load balancer, we can use an alias record. This alias record is perfect because it points from Route 53 to the ELB, and everything works really great. So we change the DNS, the users connect to our load balancer, and the load balancer forwards them to our EC2 instances and balances the traffic out. It’s really great because now we can add and remove instances and register them with the load balancer, and we won’t have any downtime for our users thanks to the health checks feature.

So really, really good. But adding and removing instances manually is pretty hard to do, so what if we leveraged something else? We’ll launch an auto-scaling group. On the left-hand side we have the same thing as before: our API, Route 53, and the ELB. But on the right-hand side, we’re going to have an availability zone where we launch private EC2 instances, and this time they’re going to be managed by an auto-scaling group. This allows us to scale on demand: maybe in the morning no one wants to know the time, yet at night, when people want to leave work, they want to know the time.
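To give a feel for scaling on demand, here is a simplified proportional rule for choosing how many instances to run. It is a sketch of the general idea only, not the algorithm AWS Auto Scaling actually uses; the function name, the CPU metric, and the numbers are all assumptions.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Scale the fleet so the average metric moves toward the target value.

    Example: 2 instances at 90% CPU with a 50% target suggests
    ceil(2 * 90 / 50) = 4 instances, clamped to [min_size, max_size].
    """
    proposed = math.ceil(current * metric / target)
    return max(min_size, min(max_size, proposed))


evening_rush = desired_capacity(current=2, metric=90, target=50, min_size=2, max_size=6)
quiet_morning = desired_capacity(current=4, metric=10, target=50, min_size=2, max_size=6)
```

The minimum size keeps a baseline running when nobody asks for the time, and the maximum size caps the bill during a spike.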

So we’re able to scale based on demand, scaling in and scaling out. And this is really, really great because now we have an application with no downtime that is auto-scaled and load balanced.
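One detail of this setup worth making concrete is the Route 53 alias record that points at the load balancer. In the Route 53 API, an alias to a load balancer is still a record of type A, but it carries an `AliasTarget` instead of a TTL and a list of IPs. The helper name below is hypothetical, and the load balancer DNS name and hosted zone ID are placeholders, not real values.

```python
from typing import Dict


def alias_change_batch(name: str, elb_dns_name: str, elb_hosted_zone_id: str) -> Dict:
    """Build a Route 53 ChangeBatch that aliases `name` to a load balancer."""
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    # No TTL and no ResourceRecords: Route 53 resolves the
                    # load balancer's current IPs on every query.
                    "AliasTarget": {
                        "HostedZoneId": elb_hosted_zone_id,
                        "DNSName": elb_dns_name,
                        "EvaluateTargetHealth": True,
                    },
                },
            }
        ]
    }


# Placeholder values for illustration only.
alias_batch = alias_change_batch(
    "api.whatisthetime.com",
    "my-elb-1234567890.us-east-1.elb.amazonaws.com",
    "ZEXAMPLEELBZONE",
)
```

Because the record has no fixed IPs and no TTL of our own to manage, the one-hour stale answer problem from the earlier design goes away.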

It seems like a really stable architecture, and it is, but then an earthquake happens and availability zone 1 goes down. And guess what? Our application is entirely down, and our users are not happy. So Amazon comes to us and says, “Yes, it’s because you haven’t implemented a multi-AZ application, and we recommend you implement multi-AZ to be highly available.”

So okay, we say, “All right, let’s change things a little bit.”

Now we’re going to have the ELB, and on top of health checks, it’s also going to be multi-AZ, launched across AZ 1 to AZ 3. So that’s three AZs for this ELB, and our auto-scaling group is going to span multiple AZs as well. This allows us to have, say, two instances in AZ 1, two instances in AZ 2, and one instance in AZ 3. The cool thing now is that if AZ 1 goes down, we’ll still have AZ 2 and AZ 3 to serve traffic to our users. We’ve effectively made our app multi-AZ, highly available, and resilient to failure.
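The two, two, one placement and what survives an AZ failure can be worked out with a few lines. This is a sketch of even spreading, with function and AZ names chosen for illustration; it does not claim to mirror the exact balancing logic of an auto-scaling group.

```python
from typing import Dict, List


def spread_across_azs(instance_count: int, azs: List[str]) -> Dict[str, int]:
    """Distribute instances as evenly as possible, earlier AZs get the extras."""
    base, extra = divmod(instance_count, len(azs))
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}


def surviving_capacity(placement: Dict[str, int], failed_az: str) -> int:
    """Count the instances still running if one AZ goes down."""
    return sum(n for az, n in placement.items() if az != failed_az)


placement = spread_across_azs(5, ["az-1", "az-2", "az-3"])
left_after_az1_fails = surviving_capacity(placement, "az-1")
```

With five instances over three AZs, losing AZ 1 still leaves three instances serving traffic, whereas the single-AZ design would have left none.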

Pretty awesome, right?

Okay, how far can we go with this? Let’s keep on going. So we have two AZs and we know that at least one instance will be running in each AZ, so why don’t we reserve capacity? Why don’t we start reducing the cost of our application, since we know for sure that two instances must be running at all times during the year? By reserving instances for the minimum capacity of our auto-scaling group, we’re going to save a lot of cost in the future. The new instances that get launched may be temporary, so On-Demand is fine for them. Or if we’re a bit crazy, we can even use Spot Instances for less cost, but we might have those instances terminated. And so it’s really interesting, right? We’ve seen an architecture go from a very small application all the way to a load-balanced, auto-scaling group, multi-AZ, health-checked, Reserved Instances type of application.
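A back-of-the-envelope comparison shows why reserving the baseline pays off. All prices below are made-up numbers chosen only to illustrate the arithmetic; they are NOT real AWS pricing, and the burst pattern (three extra instances for eight hours a day) is also an assumption.

```python
HOURS_PER_YEAR = 24 * 365

# Hypothetical hourly prices in USD, for illustration only.
ON_DEMAND_RATE = 0.10
RESERVED_RATE = 0.06
SPOT_RATE = 0.03


def yearly_cost(baseline_instances: int, baseline_rate: float,
                burst_instance_hours: float, burst_rate: float) -> float:
    """Cost of an always-on baseline plus temporary burst capacity."""
    baseline = baseline_instances * HOURS_PER_YEAR * baseline_rate
    burst = burst_instance_hours * burst_rate
    return baseline + burst


# Assumed burst: 3 extra instances for 8 hours every day of the year.
BURST_HOURS = 3 * 8 * 365

all_on_demand = yearly_cost(2, ON_DEMAND_RATE, BURST_HOURS, ON_DEMAND_RATE)
reserved_plus_on_demand = yearly_cost(2, RESERVED_RATE, BURST_HOURS, ON_DEMAND_RATE)
reserved_plus_spot = yearly_cost(2, RESERVED_RATE, BURST_HOURS, SPOT_RATE)
```

The two always-on instances dominate the bill, so that is where a reservation helps most. Spot lowers the burst cost further, at the price of possible interruption.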

It’s up to us to understand what the requirements are and what we should architect in response to them, and this is what the exam will test you on.

Let’s just review what we’ve discussed.

  1. Single EC2 Instance with Elastic IP:
  • Basic setup using a T2 micro instance.
  • Elastic IP assigned for a static IP address.
  • Users access the application through the public EC2 instance.
  2. Vertical Scaling (M5 Large Instance):
  • As traffic grows, the instance type is upgraded for better performance.
  • Downtime occurs during the instance type change.
  3. Horizontal Scaling with Elastic IPs:
  • To handle increased traffic, multiple EC2 instances are added.
  • Each instance has its own Elastic IP.
  • Users need to be aware of multiple IP addresses.
  4. Route 53 Integration:
  • Elastic IPs are replaced with Route 53 for DNS management.
  • Users query Route 53 for the application’s IP addresses.
  • A records are used, but TTL poses challenges with instance changes.
  5. Load Balancer and Auto-Scaling:
  • A load balancer is introduced to distribute traffic among EC2 instances.
  • Health checks ensure instances are functional before directing traffic.
  • Auto-scaling group manages instances based on demand.
  • Users interact with the load balancer’s alias record in Route 53.
  6. Multi-AZ Architecture:
  • To enhance availability, the architecture spans multiple availability zones (AZs).
  • Load balancer and auto-scaling group operate across multiple AZs.
  • Users are directed to available instances in different AZs, reducing downtime during AZ failures.
  7. Reserved Instances for Cost Optimization:
  • Reserved instances are introduced to reduce costs for guaranteed minimum capacity.
  • Spot instances or on-demand instances complement reserved capacity for cost savings.
  • Capacity is reserved for at least two instances to ensure continuous availability.
  8. Well-Architected Framework Considerations:
  • Security groups are utilized to control traffic between the load balancer and EC2 instances.
  • Elastic IPs are phased out in favor of Route 53 for DNS management.
  • Auto-scaling groups improve maintenance and scalability.
  • Multi-AZ architecture enhances disaster recovery capabilities.
  • Reserved instances contribute to cost optimization.

Thanks and credit for this blog go to Stephane Maarek.
