Moving SNS-SQS subscription

Using date condition operators to route messages

An interesting problem came up recently: we needed to move an existing SNS-SQS subscription to a new queue in another AWS account. At first glance, the solution was easy: delete the old subscription and create a new one. The problem was that the topic was in constant use, with 3–4 messages passing through every second, so messages would have been dropped or duplicated during the switch.


One possible solution was to keep both subscriptions and de-duplicate messages in the app. That would require a new DynamoDB table and felt like too much effort.

Eventually, I found a useful feature of SQS access policies: date condition operators. They let you write policy conditions that allow or deny access based on the current time.

This was exactly what I needed: the queue policies can define a switchover date and time. Before that time, no messages are allowed into the new queue. After that time, no messages go to the old queue.
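To make the handoff rule concrete, here is a minimal Python model of it. This is only a sketch of the intended behavior, not how AWS evaluates policies internally. The function names are hypothetical, and the strict/inclusive choice at the boundary is my assumption: the old queue accepts strictly before the switchover, the new queue from the switchover onward, so the two windows neither overlap nor leave a gap.

```python
from datetime import datetime, timezone

# Switchover instant used in this article (UTC).
SWITCHOVER = datetime(2019, 2, 10, 8, 0, 0, tzinfo=timezone.utc)


def old_queue_allows(now: datetime) -> bool:
    """Old queue accepts messages strictly before the switchover."""
    return now < SWITCHOVER


def new_queue_allows(now: datetime) -> bool:
    """New queue accepts messages at or after the switchover."""
    return now >= SWITCHOVER


def receiving_queues(now: datetime) -> list:
    """Return the queues that would accept a message published at `now`."""
    queues = []
    if old_queue_allows(now):
        queues.append("old")
    if new_queue_allows(now):
        queues.append("new")
    return queues


if __name__ == "__main__":
    before = datetime(2019, 2, 10, 7, 59, 59, tzinfo=timezone.utc)
    print(before.isoformat(), receiving_queues(before))
    print(SWITCHOVER.isoformat(), receiving_queues(SWITCHOVER))
```

With complementary comparisons on the same timestamp, every instant maps to exactly one queue, which is the property the migration relies on.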

I tried it out with the following CloudFormation template. For the sake of simplicity, both queues are in the same account.

This template creates the following resources:

  • Topic: the SNS topic that I send test messages to
  • Queue1: the “old” queue, which should stop receiving messages at the switchover time
  • Queue1Policy: allows Topic to send messages until 2019-02-10T07:59:59 UTC
  • Subscription1: the SNS-SQS subscription for the “old” queue
  • Queue2: the “new” queue, the target we are migrating to
  • Queue2Policy: allows Topic to send messages starting at 2019-02-10T08:00:00 UTC
  • Subscription2: the SNS-SQS subscription for the “new” queue
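For readers who want to see what the two queue policies might look like, here is a hedged sketch that builds them as plain Python dictionaries. It is not the original template. The account ID, region, and ARNs are placeholders, the helper name queue_policy is hypothetical, and the choice of DateLessThan and DateGreaterThanEquals on the aws:CurrentTime key is my assumption about how to express "until" and "starting at" with a single timestamp.

```python
import json
from datetime import datetime, timezone


def iso_utc(ts: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def queue_policy(queue_arn: str, topic_arn: str,
                 date_operator: str, switchover: datetime) -> dict:
    """Build an SQS access policy that lets one SNS topic send messages
    only while the date condition on aws:CurrentTime holds."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {
                # Only this topic may deliver to the queue.
                "ArnEquals": {"aws:SourceArn": topic_arn},
                # Time window in which delivery is allowed.
                date_operator: {"aws:CurrentTime": iso_utc(switchover)},
            },
        }],
    }


# Placeholder values, assumed for illustration only.
SWITCHOVER = datetime(2019, 2, 10, 8, 0, 0, tzinfo=timezone.utc)
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:Topic"
QUEUE1_ARN = "arn:aws:sqs:us-east-1:123456789012:Queue1"
QUEUE2_ARN = "arn:aws:sqs:us-east-1:123456789012:Queue2"

# Old queue: allowed strictly before the switchover.
queue1_policy = queue_policy(QUEUE1_ARN, TOPIC_ARN, "DateLessThan", SWITCHOVER)
# New queue: allowed from the switchover onward.
queue2_policy = queue_policy(QUEUE2_ARN, TOPIC_ARN,
                             "DateGreaterThanEquals", SWITCHOVER)

if __name__ == "__main__":
    print(json.dumps(queue1_policy, indent=2))
    print(json.dumps(queue2_policy, indent=2))
```

The printed JSON can be pasted into a queue's access policy or adapted into the PolicyDocument of an AWS::SQS::QueuePolicy resource. Using one timestamp with complementary operators avoids having to keep two nearly identical times (07:59:59 and 08:00:00) in sync.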

You can deploy the template with the following command: aws cloudformation deploy --template-file sns-resubscription --stack-name test-sqs-subscriptions. Adjust the dates in the template before deploying. To test it, go to the AWS console and publish some messages to the newly created topic. At the switchover time, messages should stop arriving in Queue1 and start arriving in Queue2 instead.

The tricky part is SNS redelivery. SNS has a quite aggressive retry policy for SQS: if the queue is unavailable, it retries more than 100,000 times over 23 days. But what happens when the SQS policy denies delivery? Will SNS keep trying? I didn’t want to see thousands of errors in CloudWatch for every single message.

It turns out that an access error is a final status: SNS does not attempt to deliver the message again. In this case, that was exactly what I needed.

In the end, it worked out well. At the given time, the messages started going to the new queue. We deleted the old stack with the old queue and the switchover was complete. I did not observe any issues.