The Time We Sent Out 450K Reset Password Emails

Shivaraman Aiyer
Engineering @ Stan
May 10, 2024

On April 6, 2023, we woke up to an alert in the #infrastructure Slack channel. Not an ideal good-morning message, but we take what we can get.

On further investigation, we found that the Email Analytics SQS queue, which is subscribed to all the email events SES triggers, had grown to 450K messages, well over its usual size of around 2K. From that, we worked out that about 450K emails had been sent at a rate of nearly 80K per hour! Our usual rate is about 2K per hour.
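For readers who want to watch the same signal, here is a minimal sketch of a queue-depth check. It is not our actual alerting setup: the function names, the `baseline` of 2K, and the `factor` of 10 are assumptions chosen for illustration. The only AWS call used is the SQS `get_queue_attributes` API with the `ApproximateNumberOfMessages` attribute, and the client is passed in so the snippet can be exercised without AWS credentials.

```python
def queue_depth(sqs_client, queue_url: str) -> int:
    """Return the approximate number of visible messages in an SQS queue.

    `sqs_client` is expected to behave like boto3.client("sqs").
    """
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessages"])


def is_backlog_abnormal(depth: int, baseline: int = 2_000, factor: int = 10) -> bool:
    """Flag a backlog that is more than `factor` times the usual size.

    Both `baseline` and `factor` are illustrative assumptions.
    """
    return depth > baseline * factor
```

With a real client this would be called as `queue_depth(boto3.client("sqs"), queue_url)`, where `queue_url` is a placeholder for the queue's URL. A depth of 450K against a 2K baseline is flagged comfortably.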

Not ideal, but we had to find the source of the problem and solve it ASAP.

Graph showing our SES sends at a rate 40 times higher than our usual flow

Diagnosis

In the early days of Stan, to make sure our users' products were being delivered correctly, we BCCed an internal email account on all outgoing emails.

When we found out about the issue, our first instinct was to check this inbox to understand where the emails were going. To our surprise, every one of them was a Reset Password email sent to a single fan (our customer's customer) from their course access page.

We then checked Open Replay for the unique link the fan uses to access their course and found nearly 160K session records, each lasting 10 seconds at most. Every one of those page loads triggered an email, because we had an auto-reset-password trigger for fans who hadn't yet set a password to access their course.

Our theory at the time was that the fan had a browser glitch or an auto-refresh extension that kept reloading the page, and each reload triggered another email.

Immediate Actions

We added the fan's email address to AWS's Suppression List to prevent further emails from being sent to it.
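As a rough illustration, adding an address to the SES account-level suppression list can be done through the SES v2 API. The sketch below is an assumption about how that might look in code, not a record of what we ran. The helper name `suppress_address` is hypothetical, and the choice of `"COMPLAINT"` as the reason is arbitrary, since the API only accepts `"BOUNCE"` or `"COMPLAINT"`.

```python
def suppress_address(sesv2_client, email_address: str, reason: str = "COMPLAINT") -> None:
    """Add an address to the SES account-level suppression list.

    `sesv2_client` is expected to behave like boto3.client("sesv2").
    """
    if reason not in ("BOUNCE", "COMPLAINT"):
        # These are the only two reasons the SES v2 API accepts.
        raise ValueError("reason must be 'BOUNCE' or 'COMPLAINT'")
    sesv2_client.put_suppressed_destination(
        EmailAddress=email_address,
        Reason=reason,
    )
```

With a real client this would be called as `suppress_address(boto3.client("sesv2"), "fan@example.com")`, where the address is a placeholder. Once an address is on the list, SES accepts messages addressed to it but does not attempt delivery.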

Preventative Measures for the Future

We have implemented a rate limiter similar to those we’ve established for actions such as login and reset in the Admin app.
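To give a sense of what such a limiter can look like, here is a minimal in-memory sliding-window sketch. It is not our production implementation: the class name, the limit of one email per hour, and keying by email address are all assumptions for illustration. A real deployment would likely keep the counters in a shared store so that every app server sees the same state.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `max_events` per key within any `window_seconds` span."""

    def __init__(
        self,
        max_events: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record an event for `key` and return True if it is within the limit."""
        now = self._clock()
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_events:
            return False
        events.append(now)
        return True


# Hypothetical usage: at most one auto-reset email per fan per hour.
reset_email_limiter = SlidingWindowRateLimiter(max_events=1, window_seconds=3600)
```

The page-load handler would call `reset_email_limiter.allow(fan_email)` before sending, and skip the email when it returns `False`. Under this assumed limit, a page that refreshes every few seconds would produce one email per hour instead of one per load.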

So that is the story of the time we sent out 450K reset password emails. How would you have tackled this problem?

Until next time,

The Stangineering Team
