What I’ve learned working with Amazon SQS

A brief summary


First published on my blog: litt.no

Keep your retention high (“Message Retention Period”)

 -----------     -----     ----------------     ----------
| web front | → | SQS | → | dispatcher app | → | database |
 -----------     -----     ----------------     ----------

Shit happens, and retention is your friend. If your dispatcher application stops working, the SQS queue keeps all your messages until you’re back online, up to the configured retention period (the maximum is 14 days). You don’t need to worry about losing data while you’re fixing the problem.

Oh, what about the web fronts? If they are down, nothing is working, so no messages are produced and you are not losing any data ;)
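Here is a minimal sketch of raising the retention period from code. It assumes you pass in a boto3 SQS client (boto3.client("sqs")); set_queue_attributes and MessageRetentionPeriod are the real SQS API names, while the helper names and the queue URL are placeholders of mine. SQS accepts values from 60 seconds up to 14 days.

```python
MIN_RETENTION_SECONDS = 60                   # SQS minimum: 1 minute
MAX_RETENTION_SECONDS = 14 * 24 * 60 * 60    # SQS maximum: 14 days


def retention_attributes(days):
    """Build the attribute dict for a retention period given in days."""
    seconds = int(days * 24 * 60 * 60)
    if not MIN_RETENTION_SECONDS <= seconds <= MAX_RETENTION_SECONDS:
        raise ValueError("retention must be between 60 seconds and 14 days")
    # SQS expects attribute values as strings.
    return {"MessageRetentionPeriod": str(seconds)}


def set_retention(client, queue_url, days=14):
    """Apply the retention period. `client` is assumed to be a boto3 SQS client."""
    client.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes=retention_attributes(days),
    )
```

Usage would look like set_retention(boto3.client("sqs"), "<your queue URL>", days=14).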

Use correct “Default Visibility Timeout”

If this setting is too low, duplicates can occur. The visibility timeout is how long a message stays hidden from other consumers after you fetch it. If you don’t delete the message within that time, it becomes visible in the queue again. In other words, if your application fetches messages and then crashes, all of those messages reappear in the queue once the timeout expires. Set the timeout longer than the time you need to process a message.
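The fetch, process, delete cycle can be sketched roughly like this. It assumes a boto3 SQS client is passed in; receive_message and delete_message are real SQS calls, while drain_once and handle are hypothetical names for illustration. The important part is that a message is deleted only after it was processed successfully, so a crash simply lets it reappear after the timeout.

```python
def drain_once(client, queue_url, handle, visibility_timeout=60):
    """Fetch up to 10 messages, process each, and delete only on success.

    `client` is assumed to be a boto3 SQS client and `handle` is your own
    processing function. Returns the number of deleted messages.
    """
    response = client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,                    # long polling
        VisibilityTimeout=visibility_timeout,  # overrides the queue default for this call
    )
    deleted = 0
    for message in response.get("Messages", []):
        try:
            handle(message["Body"])
        except Exception:
            # Not deleted: the message becomes visible again after the timeout.
            continue
        client.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        deleted += 1
    return deleted
```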

Use batch to insert and fetch

Amazon SQS is a service where you pay for each request and for the bandwidth you use. To save cost, send and fetch messages in batches of up to 10.
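A minimal sketch of batched sending, assuming a boto3 SQS client is passed in. send_message_batch is the real SQS call and takes at most 10 entries per request; chunks and send_in_batches are helper names I made up for this example.

```python
def chunks(items, size=10):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_in_batches(client, queue_url, bodies):
    """Send message bodies in batches of 10; returns the number of requests made."""
    requests = 0
    for batch in chunks(list(bodies), 10):
        # Ids only need to be unique within one batch request.
        entries = [
            {"Id": str(index), "MessageBody": body}
            for index, body in enumerate(batch)
        ]
        response = client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        requests += 1
        failed = response.get("Failed", [])
        if failed:
            raise RuntimeError(f"{len(failed)} messages failed in this batch")
    return requests
```

Raising on the first failed entry is a simplification; a real dispatcher would probably retry only the failed entries.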

Cost example: 100 million messages (2 KB each) sent and fetched:

Single messages
Request cost $100
Bandwidth cost $45.78
Total $145.78

Batch of 10
Request cost $10
Bandwidth cost $45.78
Total $55.78

A potential cost saving of ~60%.
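The arithmetic behind these numbers can be reproduced with a few lines. The request price of $0.50 per million requests is my assumption, inferred from the figures above ($100 for 100 million sends plus 100 million fetches); the bandwidth cost is taken as given. Like the figures above, the sketch ignores delete requests.

```python
PRICE_PER_MILLION_REQUESTS = 0.50  # assumption inferred from the figures above


def request_cost(messages, batch_size=1):
    """Cost of sending and fetching `messages`, `batch_size` messages per request."""
    # Every message is sent once and fetched once: two requests per batch.
    requests = 2 * messages / batch_size
    return requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS


def total_cost(messages, bandwidth_cost, batch_size=1):
    """Request cost plus a bandwidth cost that does not depend on batching."""
    return request_cost(messages, batch_size) + bandwidth_cost


single = total_cost(100_000_000, 45.78)
batched = total_cost(100_000_000, 45.78, batch_size=10)
saving = 1 - batched / single
print(f"single ${single:.2f}, batched ${batched:.2f}, saving {saving:.0%}")
```

Batching only reduces the request part of the bill; the bandwidth part stays the same, which is why the saving is about 62% and not 90%.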

Max number of messages in flight

Messages in flight are messages your application has fetched and is currently working on. SQS has a limit of about 120,000 messages in flight per queue at the same time. You have to keep this in mind if you’re working with large queues. I have experienced strange errors due to exceeding this limit.
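One way to keep an eye on this limit is to ask the queue how many messages are currently in flight. A small sketch, assuming a boto3 SQS client is passed in: get_queue_attributes and the ApproximateNumberOfMessagesNotVisible attribute are real SQS names, in_flight_headroom is a hypothetical helper, and the value is approximate by design.

```python
IN_FLIGHT_LIMIT = 120_000  # the limit quoted above


def in_flight_headroom(client, queue_url, limit=IN_FLIGHT_LIMIT):
    """Return roughly how many more messages can be in flight before the limit."""
    response = client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessagesNotVisible"],
    )
    # Attribute values come back as strings.
    in_flight = int(response["Attributes"]["ApproximateNumberOfMessagesNotVisible"])
    return limit - in_flight
```

A dispatcher could skip fetching for a while when the headroom gets small.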

Scales well horizontally, but remember duplicates

SQS queues seem to scale well as long as you give them time to warm up. They’re probably using auto scaling groups behind the scenes.

If you have multiple servers working against the same queue, you have to handle duplicates. They will occur due to race conditions. To handle duplicates we use Redis, a key-value cache.

 -----------     -----     ----------------     ----------
| web front | → | SQS | → | dispatcher app | → | database |
 -----------     -----     ----------------     ----------
                                  ^
                                  |
                                  v
                               -------
                              | Redis |
                               -------
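How the Redis check is done is not spelled out here, so the following is only one common approach, not necessarily what our dispatchers do. It assumes a redis-py client (redis.Redis) is passed in and uses SET with NX and EX, so only the first worker to claim a message id gets through. The key prefix, the TTL, and the function name are made up for the example.

```python
def is_first_delivery(redis_client, message_id, ttl_seconds=3600):
    """Return True only for the first caller that claims this message id."""
    # SET key value NX EX ttl: set only if the key does not exist yet,
    # and let it expire so the cache does not grow forever.
    claimed = redis_client.set(
        f"sqs:seen:{message_id}",
        "1",
        nx=True,
        ex=ttl_seconds,
    )
    # redis-py returns True when the key was set and None when it already existed.
    return bool(claimed)
```

A dispatcher would call this with the SQS MessageId before writing to the database and skip the message when it returns False. Note that claiming before processing means a worker that crashes after the claim will cause the redelivered message to be skipped until the key expires.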

At the moment we are inserting approx. 200 messages/second (17.3 million/day). To dispatch the data we run one or two data dispatchers per queue, depending on the growth rate.

Cost examples

Cost example: 518.4 million messages/month (17.3 million/day, 2 KB each) sent and fetched:

Single messages
Request cost $518.40
Bandwidth cost $237.30
Total $755.70

Batch of 10
Request cost $51.84
Bandwidth cost $237.30
Total $289.14

Pros

  • Easy to get started.
  • Easy to use.
  • No manual maintenance.
  • Good SDKs.
  • Very good monitoring via Amazon CloudWatch or external tools such as Datadog.

Cons

  • AWS can be overwhelming.
  • Duplicate messages can occur.
  • Expensive if traffic is huge.
