Understanding the Raison D’être of the Mighty Queue

Knowing when and how to deploy this simple, versatile tool gives you an extremely powerful option in data system design.

Paul Singman
Nov 9, 2020 · 6 min read

The following scenario truthfully occurred in my career, and was the moment the queue’s purpose in a system “clicked” and became clear in my mind. I hope that sharing this problem, the resulting struggle, and how I eventually solved it has a similar effect on you.

Friday Afternoon


It’s a Friday afternoon at the office (remember those?) and you are getting ready for the weekend. You have already closed your laptop and are mingling with your co-workers to see where the happy hour spot is this week.

Suddenly, Brett from the marketing department approaches you — a respected senior engineer at the company — and says, “Sorry, but I have a favor to ask. We need to send a $15 off in-app coupon to all our customers in the past year before Monday, can you do it? The leadership team is concerned about sales numbers this quarter and really wants this promotion to go out.”

Although you’ve already accomplished several impressive things during your time at the company, you remain eager to further prove yourself, especially on something with such senior visibility. Before rushing to agree to the task, you realize there are a few things that need to be clarified.

“How many people is that?” you inquire.

“It’s 50,000 in total. I’ll send you a list of the user_id’s in a single-column CSV file. Make sure each user gets one coupon, and no users get more than one… does that work?”

“Yup,” you naïvely confirm that in fact, it does. “Shouldn’t be a problem.” And out you head to happy hour, figuring you’ll handle it the next morning after an iced coffee.

Saturday Morning


You wake up early and get your typical eggs, hash browns, sausage, and extra-large iced coffee.

You open your laptop and look over the promo.csv file sent to you at 5:56PM yesterday. It takes several seconds to open.

As promised, there are 50,000 rows in the file. Sweet.

You pull up a Jupyter notebook and begin composing a simple script to loop through each row and send a POST request to the internal Rewards API.

Interlude for the Beloved Reader:

How would you do this? Think for a second how your script would look.

Thought a bit?

Okay good.

Let’s continue on.

After only 10 minutes of coding you produce the following:
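In rough strokes, it looks something like this (the endpoint URL and payload shape below are illustrative placeholders, not the real internal API):

```python
import pandas as pd
import requests

# Read the CSV of user_ids into a DataFrame
users = pd.read_csv("promo.csv")

# Loop through the rows and hit the internal Rewards API
# (endpoint and payload are placeholders for the real internal service)
for user_id in users["user_id"]:
    response = requests.post(
        "https://internal.example.com/rewards/promo",
        json={"user_id": int(user_id), "coupon_amount": 15},
    )

    # Raise an error if the request did not come back with a 200
    if response.status_code != 200:
        raise RuntimeError(f"Promo request failed for user {user_id}")
```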

You are performing 3 steps:

  1. Read the CSV file into a pandas DataFrame
  2. Loop through the DataFrame rows and make a POST request to your company’s internal promotions endpoint, passing in each user_id
  3. Raise an error if the status_code of the request does not equal 200

That’s it!

You put your fingers over the “shift” and “return” buttons on your keyboard to execute the notebook cell.

Only now — with your fingers hovering over the fateful keys — do a series of concerns pop into your head.

“Hold on a sec…how long will this run for? I better put some log statements every 100 rows so I know it’s running…”

Phew, good call.

Next, you remember that it is unwise to overload an API with requests, especially your company’s hastily-developed promotion service. So you decide to put a time.sleep(1) in between each request:
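Continuing the sketch above (same placeholder endpoint), the loop now looks roughly like this, with progress logging and the one-second pause added:

```python
import time

for i, user_id in enumerate(users["user_id"]):
    # Log progress every 100 rows so you know the script is still alive
    if i % 100 == 0:
        print(f"Processed {i} of {len(users)} users")

    response = requests.post(
        "https://internal.example.com/rewards/promo",
        json={"user_id": int(user_id), "coupon_amount": 15},
    )
    if response.status_code != 200:
        raise RuntimeError(f"Promo request failed for user {user_id}")

    # Don't hammer the hastily-developed promo service
    time.sleep(1)
```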

Looks great. Okay, each request will take 1 second… multiplied by 50,000 requests… equals nearly 14 hours!

Woah! You didn’t realize it would take that long.

Moments ago you were about to kick off this puppy; now you aren’t sure if you can run it at all.

You’ve done things like this before on 10, 20, even up to 100 users. But you didn’t realize the challenges that arise when scaling it to thousands…

Still, Brett and everyone else are expecting you to complete this. You said you could do it. You decide to make one final change before running:

Instead of raising an error on the first failed request, let’s keep track of all user_id’s from unsuccessful requests and save them to an error.csv file at the end. Fair?
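Sketched out, that final version of the loop might look like this, still against the placeholder endpoint from the earlier snippets:

```python
failed_ids = []

for i, user_id in enumerate(users["user_id"]):
    if i % 100 == 0:
        print(f"Processed {i} of {len(users)} users")

    response = requests.post(
        "https://internal.example.com/rewards/promo",
        json={"user_id": int(user_id), "coupon_amount": 15},
    )
    # Instead of raising, remember the failure and keep going
    if response.status_code != 200:
        failed_ids.append(user_id)

    time.sleep(1)

# Save the unsuccessful user_ids for a later retry
pd.DataFrame({"user_id": failed_ids}).to_csv("error.csv", index=False)
```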

Before doubt can creep in, you hit run!

4 hours later…

You check up on your laptop. The script stopped running an hour ago and only made it through the first couple thousand users before getting stuck.

Ugh!

With no other options, you bite the bullet and manually split the 50k row csv file into 50 parts of 1k users each.
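(One quick-and-dirty way to do that split, with placeholder output filenames:)

```python
import pandas as pd

users = pd.read_csv("promo.csv")

# Write 50 chunks of 1,000 users each
for i in range(50):
    chunk = users.iloc[i * 1000:(i + 1) * 1000]
    chunk.to_csv(f"promo_part_{i + 1}.csv", index=False)
```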

The rest of your weekend is spent queueing up your script 50 times. Surely there must be a better way?

Monday Morning


You arrive at your office Monday morning and tell your favorite co-worker the weekend’s harrowing tale of sending coupons to 50,000 users.

After hearing of the struggle, he matter-of-factly asks, “Why didn’t you just use a queue?”

“Huh?” you reply.

“It’s simple. Instead of looping through the users and making the promo service requests in the same script, it’s better to place each user_id as an individual message on a queue. Then run another script to read from the queue and make the service requests.

“If it succeeds, the message gets deleted from the queue. If it fails, the message is not deleted and will automatically be retried, which is the default behavior with something like AWS’ SQS queue service.”

Original fragile architecture

Your mind is abuzz. Why didn’t you realize that queues provide exactly the behavior needed to solve the problem you just faced?!

Resilient queue-based architecture!
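In code, the producer half of that picture might look something like this minimal sketch, assuming boto3 and a hypothetical SQS queue named promo-requests that already exists:

```python
import boto3
import pandas as pd

sqs = boto3.client("sqs")
# Hypothetical queue, created ahead of time in AWS
queue_url = sqs.get_queue_url(QueueName="promo-requests")["QueueUrl"]

users = pd.read_csv("promo.csv")

# Place each user_id on the queue as its own message
for user_id in users["user_id"]:
    sqs.send_message(QueueUrl=queue_url, MessageBody=str(user_id))
```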

You head to the coffee machine where you see Brett. “Got those promotions sent?” he asks.

Armed with newfound confidence from your understanding of queues you reply, “Yep, got any more?”

Final Thoughts

Queues work by sitting in the middle between producers and consumers of data. In our case, the CSV file of user_ids is the producer of the data, and the Promo API service is the consumer.

Instead of interacting directly with each other, they both interact with the queue itself. This allows for asynchronous processing of messages, where messages placed onto the queue by the producer are durably stored until picked up by the consumer.

Perhaps most critical is the way a queue provides fault tolerance. Messages become “invisible” while being processed by a consumer, but if they are never explicitly deleted, they eventually become visible in the queue again.

In this way, failed messages get retried (up to a configurable number of times), while successfully processed messages are deleted from the queue.
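A bare-bones consumer loop illustrating that behavior might look something like this, again assuming boto3, the hypothetical promo-requests queue, and the placeholder endpoint from earlier:

```python
import boto3
import requests

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="promo-requests")["QueueUrl"]

while True:
    # Each received message becomes "invisible" for the visibility timeout
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    messages = resp.get("Messages", [])
    if not messages:
        break  # queue is drained

    for msg in messages:
        user_id = msg["Body"]
        response = requests.post(
            "https://internal.example.com/rewards/promo",
            json={"user_id": int(user_id), "coupon_amount": 15},
        )
        # Delete only on success; anything else becomes visible again
        # after the visibility timeout and gets retried
        if response.status_code == 200:
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"]
            )
```

The visibility timeout itself is a queue setting in SQS, so how long a failed message stays hidden before being retried is entirely in your control.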

To see specific code examples of this process in action with the SQS and Lambda AWS services, stay tuned for Part II!


