A vital AWS gotcha to be aware of when launching a product.
Here’s a tricky scenario that will test even the most seasoned AWS veteran’s skills. Even with all those 11 certs ;-)
Last week I wrote about the joys of Infrastructure as Code as I migrated the core of our AWS infrastructure from the Sydney region to the Frankfurt region in moments, without any issues.
Until two days later, that is! Our customers began reporting that they were no longer receiving 2FA verification codes on sign-up, meaning our entire sign-up funnel was broken.
Hmm, strange, I thought. Why would Cognito send verification codes one day and not the next?
Ah, I forgot to request an SMS spending limit when I switched regions. That had to be the reason.
So I raised a spending limit increase request in the eu-central-1 region and happily expected the issue to be resolved once it was fulfilled.
Nope. Still no verification codes. What the ****!?!
I reviewed the CloudFormation stack to see if there was some reference to the Sydney region somewhere. Nope. Nothing I could see.
AWS support asked for logs from the SNS console, but I couldn’t find any there. None of us could find a problem.
So I’ll give you a chance to pause and test your AWS skills here!
Why would a Cognito User Pool in the eu-central-1 region with plenty of SMS spending credit to spare suddenly stop sending messages?
.
.
.
.
The reason behind it gets into the real nuts and bolts of the Cognito SMS integration on AWS.
Cognito leverages the SNS service to send SMS text messages to customers. And while SNS is available in every region Cognito is supported, SMS is not actually supported in every one of these regions.
Cognito uses an internal mapping for certain regions which routes SMS requests to the closest region supported by Amazon SNS worldwide SMS delivery.
Which means a region-to-region mapping exists somewhere in Cognito’s code, and in my case my messages were actually being sent from the… eu-west-1 region!
What had happened was that after a couple of days our account hit the default $1 monthly SMS spending limit in eu-west-1, and our limit increase in eu-central-1 had no effect on it.
Once I sorted out an SMS limit increase in the eu-west-1 region, everything was resolved. Unfortunately, it took a few days with AWS support to get to the bottom of the issue, and the problem set us back with our early testers, in particular some high-profile beta users.
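To make the idea concrete, here is a minimal Python sketch of what such a lookup might look like. It is hypothetical: the names `SMS_REGION_OVERRIDES` and `sms_delivery_region` are illustrative, not Cognito's, the only entry comes from this story (eu-central-1 routed to eu-west-1 at the time of writing), and every other region is assumed to send SMS from itself.

```python
# Hypothetical illustration, not Cognito's real code or data.
# Only entry taken from this story: eu-central-1 -> eu-west-1 (at the time of writing).
SMS_REGION_OVERRIDES = {
    "eu-central-1": "eu-west-1",
}


def sms_delivery_region(user_pool_region: str) -> str:
    """Return the region whose SNS sends SMS for a user pool in user_pool_region.

    Assumption: regions without an override send SMS from their own region.
    """
    return SMS_REGION_OVERRIDES.get(user_pool_region, user_pool_region)


if __name__ == "__main__":
    print(sms_delivery_region("eu-central-1"))    # eu-west-1
    print(sms_delivery_region("ap-southeast-2"))  # ap-southeast-2 (assumed same-region)
```

The consequence is that the SNS settings that matter, including the SMS spending limit, live in the delivery region rather than in the user pool's own region.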
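A simplified model shows why the eu-central-1 increase did nothing. This is a hypothetical sketch, not an AWS API: it assumes spend is tracked per account per region, that SNS refuses to send once a message would take spend over the limit, and that an unset limit falls back to the SNS default of $1.00 per month. The $0.05 message cost and $100 limits below are arbitrary placeholders, not real SMS prices or quotas.

```python
# Hypothetical model of per-region SMS spend quotas; not an AWS API.
DEFAULT_MONTHLY_SMS_LIMIT_USD = 1.00  # SNS default monthly SMS spend quota

# Assumed routing from this story: eu-central-1 user pools send SMS via eu-west-1.
SMS_REGION_OVERRIDES = {"eu-central-1": "eu-west-1"}


def can_send_sms(user_pool_region: str, limits: dict, spent: dict,
                 message_cost_usd: float) -> bool:
    """Check the quota of the region SNS actually sends from, not the pool's region."""
    delivery_region = SMS_REGION_OVERRIDES.get(user_pool_region, user_pool_region)
    limit = limits.get(delivery_region, DEFAULT_MONTHLY_SMS_LIMIT_USD)
    return spent.get(delivery_region, 0.0) + message_cost_usd <= limit


if __name__ == "__main__":
    spent = {"eu-west-1": 1.00}       # default quota already used up
    limits = {"eu-central-1": 100.0}  # the first increase requested in this story
    print(can_send_sms("eu-central-1", limits, spent, 0.05))  # False

    limits["eu-west-1"] = 100.0       # the increase that actually fixed it
    print(can_send_sms("eu-central-1", limits, spent, 0.05))  # True
```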
So keep an eye out for this gotcha when launching a product.
Learnings for me:
1) Do some load testing in new regions to verify they are ready for customer onboarding.
2) When priority issues span two AWS products, raise parallel tickets with both product teams. I was certain this was an SNS issue, and it wasn’t until I got in touch with the Cognito team that they could see the problem.
3) Research ways to attach external dependencies, such as limits raised through AWS support, to my infrastructure as code.
4) Don’t feel bad when you have issues understanding the sprawl of AWS. Even their dedicated support team can’t be across all of it!
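For lesson 3, one option is a pre-deployment check that runs alongside the infrastructure code. The sketch below is an illustration under stated assumptions rather than a finished tool: the override table is the hypothetical one from this story, `check_sms_spend_limit` and `required_usd` are hypothetical names, and it expects a client factory such as `lambda region: boto3.client("sns", region_name=region)`. It reads the SNS `MonthlySpendLimit` SMS attribute via `get_sms_attributes`, and assumes the $1 default applies if that attribute isn't returned.

```python
from typing import Any, Callable

# Hypothetical routing table from this story; check AWS's current documentation.
SMS_REGION_OVERRIDES = {"eu-central-1": "eu-west-1"}
DEFAULT_MONTHLY_SMS_LIMIT_USD = 1.0


def check_sms_spend_limit(user_pool_region: str, required_usd: float,
                          make_sns_client: Callable[[str], Any]) -> str:
    """Fail fast if the SMS delivery region's spend limit is below required_usd.

    make_sns_client(region) should return an SNS client, e.g. one built with
    boto3.client("sns", region_name=region). Returns the region that was checked.
    """
    delivery_region = SMS_REGION_OVERRIDES.get(user_pool_region, user_pool_region)
    sns = make_sns_client(delivery_region)
    response = sns.get_sms_attributes(attributes=["MonthlySpendLimit"])
    # Assumption: an unset limit may be missing from the response, so use the default.
    limit = float(response.get("attributes", {}).get(
        "MonthlySpendLimit", DEFAULT_MONTHLY_SMS_LIMIT_USD))
    if limit < required_usd:
        raise RuntimeError(
            f"SMS spend limit in {delivery_region} is ${limit:g}, "
            f"need ${required_usd:g} for user pools in {user_pool_region}"
        )
    return delivery_region
```

A check like this can only verify the limit; raising it still goes through AWS support, as in this story. Its value is that it points at the delivery region (here eu-west-1) before customers notice, rather than at the region the stack was deployed to.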
The real kicker with this issue is that eu-central-1 was recently added to the Amazon SNS worldwide SMS delivery regions, but Cognito has not yet updated its mappings. And you can see why.
As simple as the change sounds, they would need to migrate SMS spending limits from eu-west-1 to eu-central-1 for every affected customer.
Yikes! You really do feel that how AWS handles the sprawl of its services, and the escalating complexity of the integrations between them, will dictate its success in fighting off competition from up-and-comers in the cloud sector over the coming decade.
About me
An AWS Certified Solutions Architect Professional with a passion for accelerating organisations through Cloud and DevOps best practices.
If you want to work together, contact me for a chat over on brianfoody.com, on LinkedIn or on Twitter.
And don’t forget to follow the Lambda Lego publication!