I got a refund from AWS after I DDoSed myself with CloudFront.

Ruslan Gainutdinov
6 min read · Jul 8, 2022

You might have seen my story about my costly mistake with AWS CloudFront:

I DDoSed myself using AWS CloudFront and Lambda@Edge and got a $4.5k bill

https://medium.com/@ruslanfg/i-ddosed-myself-using-aws-cloudfront-and-lambda-edge-and-got-a-4-5k-bill-3d3f57b64cbd

Finally, after two weeks of uncertainty, I got a refund! It covered the 24 hours of CloudFront usage caused by this mistake, reducing my bill by 95% — the most favorable resolution I could have expected!

I am grateful for this one-time favor from AWS, and I certainly don’t expect them to do the same next time.

It took considerable time and effort to gather all the documents needed for this case and then wait for the actual refund.

How to work with AWS on this

Read prior art and the official docs. There are a lot of similar cases on the internet, and AWS has responsive support that will work with you to understand your situation.

A few tips

  • Acknowledge it was an honest mistake.
  • Do not wait for the bill to grow. Fix it ASAP.
  • Plan actions to avoid the same problem in the future.
  • Reply ASAP to AWS Support requests.

Lessons learned

AWS CloudFront has serious latency in how billing is aggregated from all regions into the global Billing dashboard. It also affects AWS Budgets alerts (which were set up in my case). Collecting all charges can take up to 24 hours, and in my case, costs were finalized only 48 hours after the initial mistake.

Monitoring AWS CloudFront

In my case, most of the requests were in the Frankfurt (eu-central-1) region, so I could have reacted faster if I had set up AWS CloudWatch alarms in that region on the request rate (which peaked at 9k requests/s).

You probably need tooling (CDK, Terraform) to set up such monitoring in all CloudFront regions (15+).
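As a sketch of what such monitoring could look like with CDK: the standard CloudFront request metrics are published in the us-east-1 region under the `AWS/CloudFront` namespace with a `Region: Global` dimension, so one alarm there already catches a global spike. The distribution ID and threshold below are hypothetical placeholders.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';

const app = new cdk.App();
// CloudFront distribution metrics are only published in us-east-1.
const stack = new cdk.Stack(app, 'CfAlarmStack', {
  env: { region: 'us-east-1' },
});

// Total requests across all edge locations for one distribution.
const requests = new cloudwatch.Metric({
  namespace: 'AWS/CloudFront',
  metricName: 'Requests',
  dimensionsMap: { DistributionId: 'EXXXXXXXXXXXX', Region: 'Global' },
  statistic: 'Sum',
  period: cdk.Duration.minutes(5),
});

// Fire when 5-minute request volume exceeds ~3.3k req/s sustained;
// tune the threshold to your normal traffic.
new cloudwatch.Alarm(stack, 'RequestSpikeAlarm', {
  metric: requests,
  threshold: 1_000_000,
  evaluationPeriods: 1,
  comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
});
```

Wire the alarm to an SNS topic with an email or pager subscription so a runaway loop is noticed long before the bill is.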

Looking for a different solution

There were more issues when using NextJS with Lambda@Edge (cold-start times, logs spread over multiple regions, limited support for NextJS 12), so I always treated serverless-nextjs as a temporary solution until I could invest more resources in infrastructure.

But now, it is evident that I should migrate to a different solution, which will also solve existing problems and decrease the chance of the same situation happening again.

Moving forward

In the past, I have used various approaches, ranging from VPS and EC2 instances to AWS Elastic Beanstalk and AWS ECS Fargate.

The big thing about serverless is that there is essentially no infrastructure to manage, and you will not have a “surprise technical debt” of having to set up a new virtual machine.

Also, a part of the serverless approach is that there are no local storage requirements (everything is in object storage or in a queue).

Using VMs

You can use Terraform, Ansible, etc., to automate this setup. However, you either need to do it periodically, validating your configuration still works, or you need to integrate VM recreation into the deployment pipeline, increasing deployment time.

I prefer not to use bare VMs for application deployment. They are difficult to size (how much memory and CPU do I need to handle all requests and usage spikes?); they need regular maintenance; and they increase the attack surface and the configuration area that must be managed.

Some specific cases are justified — third-party software, dedicated instances for ML, or heavy batch processing.

But when you deploy 3–4 times a day, you need a robust solution that deploys quickly and allows you to roll back if something goes wrong.

Using AWS Elastic Beanstalk

Elastic Beanstalk is an older AWS solution that manages VMs for you and sets up Nginx inside the VM, with your choice of Node, Python, or Docker application configuration. Unfortunately, the configuration is complicated, with many different options and ways to set things up (via the UI or config files). It works well for specific use cases, but I found it slow to deploy and difficult to configure correctly.

Using Kubernetes

I am the only developer at Valosan, so using Kubernetes (even a managed solution) seems like overkill. Besides, all the other services (DB, cache, etc.) are also managed, and running a cluster just for one app feels like over-engineering to me.

Using AWS ECS Fargate

I’ve used ECS in the past, but it was slow and complicated. I remember disliking AWS CodeCommit and AWS CodeDeploy, which felt more or less required for ECS, because of their awful UI and how slow they were.

I liked the article from the Bigpicture.io founder, How we cut our AWS bill by 70%, so I wanted to revisit ECS and see how it had improved over a couple of years.

But which tool should I use to deploy? There are a ton of options: Terraform, aws ecs, ecs-cli, Copilot, CDK.

Well, I tried them all.

  • Terraform — It has excellent documentation, works well, and deploys quite fast because it skips using AWS CloudFormation. But, I think you need a bigger team to adopt it.
  • aws ecs — Using AWS CLI to deploy. It boils down to writing a huge bash script, which would be difficult to maintain in the future.
  • ecs-cli — It works well, but some configuration features were difficult to set up. Also, I learned that it is being sunset while I was trying to figure out how to set up container configuration options and healthchecks.
  • Copilot — The documentation and the CLI are nice, but I didn’t like its limited configuration options and the lack of a way to customize things.
  • ECS plugin for Docker Compose — I briefly tried this one before discarding it. The configuration options are even more limited, and the Docker Compose documentation is hard to use.
  • CDK — I always wanted to try it out. So I decided to take it for a spin.

Using CDK

After trying all the other options, I stumbled upon an example of setting up an ECS Fargate cluster and liked how simple yet powerful it is. CDK has so-called ECS patterns that let you configure everything from DNS records and HTTPS certificates to multiple containers in one go.
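A minimal sketch of such a pattern: the `ApplicationLoadBalancedFargateService` construct from `aws-cdk-lib/aws-ecs-patterns` creates the Fargate service, load balancer, target group, and log group in one go. The stack name, sizes, and sample image below are placeholders, not the production configuration.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';

const app = new cdk.App();
// fromLookup needs a concrete account/region to resolve the VPC at synth time.
const stack = new cdk.Stack(app, 'WebStack', {
  env: { account: process.env.CDK_DEFAULT_ACCOUNT, region: process.env.CDK_DEFAULT_REGION },
});

// Reuse an existing VPC instead of creating a new one per environment.
const vpc = ec2.Vpc.fromLookup(stack, 'Vpc', { isDefault: true });
const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

// One construct wires up the ALB, listener, target group, task, and service.
new ecsPatterns.ApplicationLoadBalancedFargateService(stack, 'App', {
  cluster,
  cpu: 512,
  memoryLimitMiB: 1024,
  desiredCount: 1,
  taskImageOptions: {
    // Sample public image; in practice this points at your ECR repository.
    image: ecs.ContainerImage.fromRegistry('amazon/amazon-ecs-sample'),
    containerPort: 3000,
  },
});
```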

It is even one of the top recommended setups in the CDK Developer Guide.

See full script here: https://gist.github.com/huksley/746665004649c3ed3536fc0bd12650ec

What I liked about CDK

  • “Stacks” (for example, a single environment for the app) are configurable as you see fit. You can also override parts of the configuration by plugging into the instances it creates.
  • You can use existing resources easily. For example, creating a new VPC for every environment does not make sense.
  • All logs end up in a single AWS CloudWatch log group.
  • Simple IAM setup for deployments from GitHub Actions. You just need to grant the right to assume the CDK-generated roles. CDK also guards against so-called permissions broadening, so the initial environment deployment is done locally, and after that, it deploys automatically.
  • Support for multiple environments (so you can have an isolated environment for a branch preview).
  • Deployment time stayed the same.

What was challenging

  • Surprisingly, setting up a container healthcheck requires a hack — you can have a healthcheck for the load balancer, but not for the container. The issue is tracked upstream.
  • It uses CloudFormation under the hood, which can be slow.
  • To scale, you need to change the configuration and run a deployment again.
  • Every environment incurs fixed costs, because the application load balancer runs 24×7, plus the cost of the Fargate containers.
  • Additionally, Docker layer caching for a multi-stage build is a bit of a mess.
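One workaround for the container healthcheck limitation is a CloudFormation escape hatch: drop down to the generated L1 task definition and override the `HealthCheck` property directly. This is a sketch assuming a service created with the ECS pattern and a hypothetical `/api/health` endpoint in the app.

```typescript
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';

// Assumes `service` is an ApplicationLoadBalancedFargateService created elsewhere.
declare const service: ecsPatterns.ApplicationLoadBalancedFargateService;

// Escape hatch: reach the raw CloudFormation task definition behind the construct.
const cfnTaskDef = service.taskDefinition.node.defaultChild as ecs.CfnTaskDefinition;

// Override the first container's health check at the CloudFormation level.
cfnTaskDef.addPropertyOverride('ContainerDefinitions.0.HealthCheck', {
  Command: ['CMD-SHELL', 'curl -f http://localhost:3000/api/health || exit 1'],
  Interval: 30,
  Timeout: 5,
  Retries: 3,
  StartPeriod: 60,
});
```

Note that `curl` must exist inside the image for the `CMD-SHELL` check to work; slim base images often need it installed explicitly.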

What is done outside of CDK

  • The Dockerfile, in my case, was a lot of work to create because of how NextJS bakes the frontend, the backend, and all the environment variables it needs at runtime into the build output.
  • I have a two-stage setup: first, build a Docker image and push it to AWS ECR, then use CDK to deploy it to ECS. Combining the two requires a relatively simple bash script.
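The two-stage deployment could be sketched as a script like the following. The account ID, region, repository name, and stack name are hypothetical placeholders, and passing the image tag via `--context` assumes the CDK app reads it with `app.node.tryGetContext('imageTag')`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values; adjust account, region, repo, and stack to your setup.
ACCOUNT=123456789012
REGION=eu-central-1
REPO=valosan-web
TAG=$(git rev-parse --short HEAD)
IMAGE="$ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:$TAG"

# Stage 1: build the Docker image and push it to ECR.
aws ecr get-login-password --region "$REGION" |
  docker login --username AWS --password-stdin "$ACCOUNT.dkr.ecr.$REGION.amazonaws.com"
docker build -t "$IMAGE" .
docker push "$IMAGE"

# Stage 2: hand the image tag to CDK and deploy the stack.
npx cdk deploy WebStack --context imageTag="$TAG" --require-approval never
```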
  • Monitoring via CloudWatch alarms.

Overall, I found that ECS integrates nicely with my NextJS stack, and I look forward to switching the production environment to it soon.

About the author: I am Co-Founder and CTO of Valosan, the media relationship app that helps PR & Comms professionals get earned media visibility. Reach out to me on Twitter or sign up for Valosan at valosan.com/signup.

Ruslan Gainutdinov

CTO at startup, software architect, engineer. Building Valosan, PR CRM to manage your relationships with the media.