The tech behind the under45.in vaccination alerts project

Berty Thomas
Aug 20, 2022


At the very outset, let me put out a disclaimer: if you are looking for some breakthrough piece of engineering, you may be disappointed. If you want to understand the thought process behind a hacky piece of code that proved extremely useful for millions of people, read on.

I know it has been more than a year since I wrote the last article on the under45 project, where I mentioned that I would reveal the tech stack behind it. Apologies for the delay.

During the peak of the Covid second wave in India, we saw the start of vaccination for the 18–45 age group. With the supplies limited and the demand going through the roof, everyone was trying to grab a vaccination slot on the CoWIN platform. The under45 project intended to alert anyone who subscribed to the relevant Telegram channel as and when a vaccination slot was available.

Things got difficult once the Govt. introduced rate limits on the CoWIN API: 100 API calls per 5 minutes per IP. To make matters worse, the CalendarByDistrict API, which provides 7-day slot data, was responding with data that was 5 minutes behind, rendering it useless. So the only API we could use was the FindByDistrict API, which gives data only for a specific day. This meant we had to call this API 7 times to get the week's worth of slot data that the CalendarByDistrict API was providing, magnifying our problem 7 times! So this article will only talk about the architecture that was used after those rate limits were put in place (it was too easy prior to those limits).

Can’t we use proxies?

Of course, this is the first thought that comes to any programmer's mind when it comes to IP-based rate limits. However, there were a few issues:

  1. CoWIN restricts API calls to India-based IPs, so only India-based proxies can be used.
  2. Then there is a cost associated with the use of proxies. Remember, this was a hobby project, and a self-funded one at that!
  3. Latency! This was the biggest issue. Every API call via a proxy takes much longer to complete than a direct call. And when we are talking about searching slots in 700+ districts, this becomes a deal breaker!

Increase the server instances?

Technically, adding EC2 server instances or reassigning Elastic IPs were possible solutions, but when you consider the earlier mentioned rate limits and the volume of API calls to be made for an effective pan-India vaccination notifier, anyone with experience in AWS would understand that the costs would quickly get out of hand! So that was ruled out.

God bless the AWS Lambda

AWS Lambda is a serverless compute service, and each Lambda instance uses a particular IP for execution. So once we set the AWS region to Mumbai, we suddenly had a large, workable pool of India-based IPs with amazing performance (unlike the proxies)!

Each of the Lambdas ran on NodeJS (because other than PHP, that's what I know!). The input params were the district ID and the from and to dates. The Lambda would then call the FindByDistrict API for each of those dates concurrently and stitch the results together in the response.
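To make this concrete, here is a minimal sketch of what such a Lambda handler could look like. This is not the production code: the endpoint and query params follow the public CoWIN API, while the event shape, field names and error handling are simplifications of mine.

```js
// index.js: a rough sketch of the slot-fetcher Lambda (not the actual production code).
// Assumes a Node 18+ runtime with global fetch, and an event shaped like:
//   { "districtId": 294, "dates": ["10-05-2021", "11-05-2021", ...] }

const BASE =
  'https://cdn-api.co-vin.in/api/v2/appointment/sessions/public/findByDistrict';

exports.handler = async (event) => {
  const { districtId, dates } = event;

  // One FindByDistrict call per date, fired concurrently
  const results = await Promise.all(
    dates.map(async (date) => {
      const res = await fetch(`${BASE}?district_id=${districtId}&date=${date}`);

      // Surface the HTTP status so the controller can spot a rate-limited IP
      if (!res.ok) return { date, status: res.status, sessions: [] };

      const body = await res.json();
      return { date, status: res.status, sessions: body.sessions || [] };
    })
  );

  // Stitch the per-day results into a single response
  return { districtId, results };
};
```

Firing the 7 calls concurrently kept each invocation fast, which mattered when 700+ districts had to be covered every few seconds.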

Districts and their refresh frequency

Based on the experience from the previous 2 weeks, we understood that it wasn't prudent to treat all districts the same. Some city districts required very high refresh rates so that we could alert users as soon as a slot became available. These are generally metro areas like Bengaluru (BBMP), Delhi, Mumbai, Pune, etc. We had identified 25 such districts, and they were checked every 10 seconds. All other districts were refreshed once every 30 seconds.

The tech stack

Now, let’s get to the beautiful part and talk about how it was done.

There were 2 EC2 instances that acted as the main controllers. These 2 instances did the exact same job and fetched the slots for the same districts (via the Lambdas), albeit a few seconds apart. This was done so that in case one EC2 instance went down (ohh, that happens!), the slot refresh would not stop; only the frequency would be affected. In such a case, a district that was being refreshed every 10 seconds would be refreshed every 20 seconds, and one being refreshed every 30 seconds would be refreshed every minute (via the other EC2 controller).
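For illustration, here is a rough sketch of one controller's polling loop under those rules. The district entries, function names and date helper are placeholders of my own; only the 10-second/30-second split and the Mumbai region come from the actual setup.

```js
// controller.js: a rough sketch of one controller's polling loop (not the actual code).
// Assumes the aws-sdk v2 package; in reality the district list came from RDS.

const AWS = require('aws-sdk');
const lambda = new AWS.Lambda({ region: 'ap-south-1' }); // Mumbai region

const HIGH_FREQ_MS = 10 * 1000; // the ~25 metro districts
const LOW_FREQ_MS = 30 * 1000;  // everything else

const districts = [
  { id: 294, name: 'BBMP', highFrequency: true },           // IDs are illustrative
  { id: 512, name: 'Some district', highFrequency: false },
];

function next7Dates() {
  // CoWIN expects dates as DD-MM-YYYY
  return Array.from({ length: 7 }, (_, i) => {
    const d = new Date(Date.now() + i * 86400000);
    const dd = String(d.getDate()).padStart(2, '0');
    const mm = String(d.getMonth() + 1).padStart(2, '0');
    return `${dd}-${mm}-${d.getFullYear()}`;
  });
}

async function pollDistrict(district) {
  try {
    const res = await lambda
      .invoke({
        FunctionName: 'cowin-fetcher-1', // in practice, picked from the pool's MySQL table
        Payload: JSON.stringify({ districtId: district.id, dates: next7Dates() }),
      })
      .promise();

    const data = JSON.parse(res.Payload.toString());
    // ...filter data.results for open 18+ sessions and alert the district's Telegram channel
  } catch (err) {
    console.error(`District ${district.id} poll failed:`, err.message);
  } finally {
    // Re-schedule based on how "hot" the district is
    const interval = district.highFrequency ? HIGH_FREQ_MS : LOW_FREQ_MS;
    setTimeout(() => pollDistrict(district), interval);
  }
}

districts.forEach((d) => pollDistrict(d));
```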

How many Lambdas?

Well, a lot!

There were a thousand Lambdas in total, all created programmatically, of course (a rough sketch of that creation step follows the list below). These were allocated into 3 Lambda pools.

  1. A set of 400 Lambdas was allocated to those 25 districts that required a high refresh rate.
  2. A set of 300 Lambdas were allocated to Controller 1 (EC2 instance #1).
  3. A set of 300 Lambdas were allocated to Controller 2 (EC2 instance #2).
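Creating a thousand functions by hand would have been madness, so it was scripted. Here is a minimal sketch of that creation step, assuming the AWS SDK for Node and a zipped copy of the fetcher code; the function names, role ARN and runtime below are placeholders, not the real values.

```js
// create-pools.js: a rough sketch of scripting the Lambda fleet (not the real code).
// Assumes the aws-sdk v2 package, a packaged function.zip and an existing execution role.

const fs = require('fs');
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda({ region: 'ap-south-1' });

const ROLE_ARN = 'arn:aws:iam::123456789012:role/cowin-fetcher-role'; // placeholder
const zipFile = fs.readFileSync('./function.zip');

async function createPool(poolName, count) {
  for (let i = 1; i <= count; i++) {
    await lambda
      .createFunction({
        FunctionName: `${poolName}-fetcher-${i}`,
        Runtime: 'nodejs14.x',
        Role: ROLE_ARN,
        Handler: 'index.handler',
        Code: { ZipFile: zipFile },
        Timeout: 10,
      })
      .promise();
    // Each function name would also be recorded in that pool's MySQL table here
  }
}

(async () => {
  await createPool('pool1-metro', 400); // the 25 high-refresh districts
  await createPool('pool2-ctrl1', 300); // Controller 1 (EC2 instance #1)
  await createPool('pool3-ctrl2', 300); // Controller 2 (EC2 instance #2)
})();
```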

Each of these Lambda pools worked like a stack of cards. The API calls were routed via the Lambda on top of the stack. The moment we got a rate limit error from CoWIN, that Lambda was sent to the bottom of the stack, and the API calls were then routed via the next Lambda on top of the stack. And this was repeated.

The implementation was simple. The Lambda IDs for each of the 3 pools (Pool 1, 2 and 3 as mentioned earlier) were stored in a MySQL table per pool. The table had just 2 columns: LambdaID and last_used. As soon as the rate limit was hit, last_used was updated with the current timestamp. So when we fetched the table ordered by the last_used column, the first row was the Lambda to be used for API routing.
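Here is a sketch of that rotation, assuming the mysql2 package; the function names, database credentials and the "treat any non-200 day as throttled" check are my simplifications, not the exact production logic.

```js
// lambda-pool.js: a sketch of the "stack of cards" rotation (simplified, not the real code).
// Assumes the mysql2 package and a pool table with the two columns described above.

const mysql = require('mysql2/promise');

// Connection to the RDS instance holding the pool tables (credentials are placeholders)
const db = mysql.createPool({
  host: process.env.DB_HOST,
  user: process.env.DB_USER,
  password: process.env.DB_PASS,
  database: 'under45', // illustrative name
});

async function nextLambda(poolTable) {
  // The Lambda that was rate-limited longest ago sits on "top of the stack"
  const [rows] = await db.query(
    `SELECT LambdaID FROM ${poolTable} ORDER BY last_used ASC LIMIT 1`
  );
  return rows[0].LambdaID;
}

async function markRateLimited(poolTable, lambdaId) {
  // Send that Lambda to the bottom of the stack
  await db.query(
    `UPDATE ${poolTable} SET last_used = NOW() WHERE LambdaID = ?`,
    [lambdaId]
  );
}

async function fetchSlots(poolTable, payload, invoke) {
  // `invoke(lambdaId, payload)` is an illustrative callback that calls the named
  // Lambda and returns its parsed response (see the fetcher sketch earlier).
  for (;;) {
    const lambdaId = await nextLambda(poolTable);
    const result = await invoke(lambdaId, payload);

    // Simplification: treat any non-200 day as a throttled IP and rotate
    const throttled = result.results.some((r) => r.status !== 200);
    if (!throttled) return result;

    await markRateLimited(poolTable, lambdaId);
  }
}

module.exports = { nextLambda, markRateLimited, fetchSlots };
```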

The database

This was a simple AWS RDS instance that worked independently of the EC2 instances. It had details of the Lambda pools, the districts and the Telegram channel IDs associated with each of them. So if I received feedback from a user via Twitter saying, "hey, it's getting increasingly difficult to find a slot in District XYZ", all I needed to do was adjust the scraping frequency in the MySQL table. It was that easy!
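For completeness, here is a sketch of the district config lookup and the Telegram push. The table and column names, the message text and the Node 18+ global fetch are my assumptions; only the idea of a per-district Telegram channel comes from the actual setup.

```js
// alerts.js: a sketch of the district config lookup and the Telegram push (not the real code).
// `db` is a mysql2 pool like the one in the earlier sketch.

async function loadDistricts(db) {
  // Assumed schema: districts(district_id, name, telegram_channel_id, refresh_seconds)
  const [rows] = await db.query(
    'SELECT district_id, name, telegram_channel_id, refresh_seconds FROM districts'
  );
  return rows;
}

async function notifyTelegram(channelId, sessions) {
  // One Bot API sendMessage call per alert, posted to the district's channel
  const token = process.env.TELEGRAM_BOT_TOKEN;
  const text = sessions
    .map((s) => `${s.name}: ${s.available_capacity} slots on ${s.date}`)
    .join('\n');

  await fetch(`https://api.telegram.org/bot${token}/sendMessage`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ chat_id: channelId, text }),
  });
}

module.exports = { loadDistricts, notifyTelegram };
```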

Scheduled restart “Jugaad”

If I have to pick one Jugaad that was deployed in the under45 project, this would be the one! For reasons unknown to me, the EC2 instances would freeze after running for hours. The fix was to just restart the instance for it to start working again. But it could happen at any time, so the constant monitoring was irking me. I remember when the worst-case scenario happened. I was out buying groceries when I got a notification that EC2 controller instance #1 had gone down. I continued my shopping, thinking it would only slow down the slot refreshes and I could always come back and restart the instance to bring it back to the normal refresh frequency. Then, in another 15 minutes, EC2 controller instance #2 also went down. Now the slot refresh had completely stopped. I started getting tweets from users saying they weren't getting notified, and I had to rush back to my laptop to restart the servers!

This prompted me to write a Jugaad script which would restart the two controller instances (never at the same time) once either had been up for 2 hours or more. While a restart happens, a third EC2 instance automatically boots up and takes over the work of the controller being restarted. Once the restart is complete, the backup instance shuts down. After this, no manual intervention or monitoring was needed from me!
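Here is roughly what such a watchdog could look like with the AWS SDK. The instance IDs are placeholders, and the stop/start-based uptime check is my simplification of the actual Jugaad.

```js
// watchdog.js: a rough sketch of the scheduled-restart Jugaad (simplified, not the real code).
// Assumes the aws-sdk v2 package; stop/start is used instead of a plain reboot so that
// LaunchTime reflects the last restart.

const AWS = require('aws-sdk');
const ec2 = new AWS.EC2({ region: 'ap-south-1' });

const CONTROLLERS = ['i-0controller1xxxxxxx', 'i-0controller2xxxxxxx']; // placeholders
const BACKUP = 'i-0backupxxxxxxxxxxxx';                                 // placeholder
const MAX_UPTIME_MS = 2 * 60 * 60 * 1000; // 2 hours

async function uptimeMs(instanceId) {
  const res = await ec2.describeInstances({ InstanceIds: [instanceId] }).promise();
  const launched = res.Reservations[0].Instances[0].LaunchTime;
  return Date.now() - new Date(launched).getTime();
}

async function maybeRestart(instanceId) {
  if ((await uptimeMs(instanceId)) < MAX_UPTIME_MS) return;

  // Bring up the backup controller first so slot refreshes never fully stop
  await ec2.startInstances({ InstanceIds: [BACKUP] }).promise();
  await ec2.waitFor('instanceRunning', { InstanceIds: [BACKUP] }).promise();

  // Restart the controller (stop, then start)
  await ec2.stopInstances({ InstanceIds: [instanceId] }).promise();
  await ec2.waitFor('instanceStopped', { InstanceIds: [instanceId] }).promise();
  await ec2.startInstances({ InstanceIds: [instanceId] }).promise();
  await ec2.waitFor('instanceStatusOk', { InstanceIds: [instanceId] }).promise();

  // Controller is back up; the backup can shut down again
  await ec2.stopInstances({ InstanceIds: [BACKUP] }).promise();
}

(async () => {
  // Check the controllers one at a time, so they are never restarted together
  for (const id of CONTROLLERS) {
    await maybeRestart(id);
  }
})();
```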

The cost

Many subscribers reached out to me saying they would like to contribute towards the server costs. I turned them down easily. Given that my only AWS usage was 1,000 Lambdas (billed only per invocation, which was low; remember, only one Lambda at a time was actually in use, while the rest sat in the pool, unbilled), 3 EC2 instances and 1 RDS instance, it wasn't expensive at all. In fact, my entire bill for this period was just around Rs.4,000 ($50)! Thank you AWS & Telegram!
