Groupon Push Marketing — How we send almost half a billion notifications a day

Groupon Product and Engineering
Jan 31, 2021

Written by Mohammad Akhtar Ali

Marketing the products or services that local merchants sell (called “deals” in Groupon’s terminology) is what brings customers to our website daily. The entire pipeline delivers almost half a billion messages each day, which requires a system that stays stable under all kinds of circumstances. Building and maintaining such a system taught us a lot, and we would like to share those lessons in this short blog post.

Push Marketing focuses on taking the product to the customer and putting the product in front of the customer for purchase. This type of marketing strategy intends to minimize the amount of time between a customer discovering a product and buying that product. To this end, companies use aggressive and wide-reaching ads and notifications to make the biggest and most immediate impact they can have on customers.

What do we do as part of push marketing at Groupon?

At Groupon, we send millions of email and mobile notifications to end users with the purpose of bringing them relevant deals and increasing our purchase rate. However, as a reputable e-commerce company, we understand that bombarding users with notifications won’t serve that purpose, but delivering the right content to the right audience will. Every day, thousands of manual and automated marketing campaigns target the right audience at the right cadence.

The same system is also used across Groupon to send communications to end users based on various events such as purchase, password reset, shipping, and voucher expiry reminders, plus 100 other use cases. These notifications also offer a limited opportunity to include relevant deals the user might like.

Our push marketing covers both Groupon and LivingSocial across 15 countries (North America and EMEA), ~70 million users, multiple verticals (Health-Beauty-Wellness, Food & Drink, Goods, Things-To-Do, Travel, Coupons, and more), and four application platforms (desktop, mobile web, iOS, Android). Our daily outreach includes scheduled marketing campaigns as well as campaigns driven by user events (deal you viewed, deal price drop, searched deals). Targeting all customers across all verticals and platforms requires us to send almost half a billion notifications daily within a short timeframe.

Some people may ask why timing is important. The time at which our users receive notifications from us is a critical factor in the success of our marketing campaigns. We align the schedule of each notification with the target user’s timezone so that it arrives at the time when they are most likely to open and read it. At the same time, some notifications are time-critical (such as a very limited-time promo code or a late reminder) and have to be delivered as close as possible to the scheduled send time.
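
As a rough illustration of timezone-aligned scheduling, the sketch below converts a desired local send time into the UTC instant at which a scheduler would need to fire. The function name `utc_send_time` and the 9:00 local send time are assumptions made for this example, not details of Groupon's scheduler.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def utc_send_time(local_day: date, local_send: time, user_tz: str) -> datetime:
    """Return the UTC instant at which `local_send` occurs in the user's timezone.

    `user_tz` is an IANA timezone name such as "America/Chicago".
    """
    local_dt = datetime.combine(local_day, local_send, tzinfo=ZoneInfo(user_tz))
    return local_dt.astimezone(timezone.utc)


if __name__ == "__main__":
    day = date(2021, 1, 31)
    nine_am = time(9, 0)  # hypothetical "best" local send time
    for tz in ("America/Chicago", "Europe/Berlin"):
        print(tz, utc_send_time(day, nine_am, tz).isoformat())
```

Grouping users by timezone and scheduling one batch per group is one simple way to use such a conversion; a real system might also vary the local send time per user.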

Another important factor is the cadence of communication. To cap the number of notifications a user receives (usually at 4 to 5), we pick the emails that are most likely to engage that customer. At the same time, our business partners want certain campaigns to always be delivered (called Guaranteed Delivery). These don’t go through arbitration and are theoretically unlimited (though we monitor their actual cadence).
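
The arbitration step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Candidate` type, the `score` field (a hypothetical predicted-engagement score), and the `arbitrate` function are invented for the example and are not the actual arbitration logic.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    campaign_id: str
    score: float              # hypothetical predicted-engagement score
    guaranteed: bool = False  # Guaranteed Delivery campaigns skip arbitration


def arbitrate(candidates: Iterable[Candidate], cap: int = 5) -> List[Candidate]:
    """Pick the notifications one user should receive.

    Guaranteed campaigns always pass and do not count against the cap;
    the remaining candidates compete on score for `cap` slots.
    """
    candidates = list(candidates)
    guaranteed = [c for c in candidates if c.guaranteed]
    ranked = sorted(
        (c for c in candidates if not c.guaranteed),
        key=lambda c: c.score,
        reverse=True,
    )
    return guaranteed + ranked[:cap]


if __name__ == "__main__":
    pool = [Candidate(f"promo-{i}", score=i / 10) for i in range(7)]
    pool.append(Candidate("voucher-expiry", score=0.0, guaranteed=True))
    for chosen in arbitrate(pool, cap=4):
        print(chosen.campaign_id)
```

Because guaranteed campaigns bypass the cap in this sketch, their volume would have to be monitored separately, which matches the note above about monitoring actual cadence.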

As part of push marketing, we send emails, mobile notifications, and App Inbox notifications.

Metrics that matter to the business

As part of push marketing we primarily focus on the following metrics:

  • Open rate
  • Click-through rate
  • Purchase rate

Message personalization is an important factor that impacts business metrics. A study reveals that personalization can lift open rates by up to 26% and can impact the overall revenue by almost 800%.

Important requirements of the overall push pipeline include:

  • Deliver all email and mobile notifications within two hours of scheduled time because the campaigns are strategically targeted for a duration where the chances of opening, clicking, and purchasing are higher
  • Deliver the right content (relevant set of deals) to the right audience
  • Use the user’s subscription to determine the kind of marketing messages that can be delivered
  • Use the right look and feel for the right email type
  • Be able to detect moving users and target them with deals in the vicinity
  • Be able to handle a sudden spike in traffic due to many campaigns at the same time
  • Handle personalized sends for large-audience campaigns (for example, >20M users, as personalization is resource-intensive)
  • Rest non-engaged users to avoid spending resources on those deliveries, but unrest them as soon as they start engaging again
  • Block sends to users who do not want any marketing emails or mobile notifications, irrespective of their subscriptions
  • Ability to support automated data-driven marketing campaigns which may get scheduled in large numbers
  • Deals sent in an email or mobile notification should be active and not sold out (edge cases should be rare)
  • Ability to constantly update both rested users as well as user denylisting
  • Ability to support every kind of client who wants to use the notification platform for user communication.
  • A message which is delayed beyond a threshold should be dropped
  • Mobile notifications should almost never be delivered beyond a certain delay
  • Various platforms should not impact each other (i.e. email campaigns should not result in the delay of push campaigns)
  • Be able to manage email and push notification bounces (hard and soft), as they waste resources and hurt the company’s reputation with ISPs
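
Several of the requirements above (subscription checks, rested users, denylisting, and dropping messages delayed beyond a threshold) amount to simple gates applied before a send. The sketch below shows one way to express them; the `User` fields, the function names, and the two-hour threshold used in the example are illustrative assumptions, not the platform's real data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Set


@dataclass
class User:
    user_id: str
    subscriptions: Set[str] = field(default_factory=set)
    rested: bool = False      # non-engaged user, temporarily not targeted
    denylisted: bool = False  # opted out of all marketing sends


def is_eligible(user: User, campaign_type: str) -> bool:
    """Return True if a marketing campaign of this type may be sent to the user."""
    if user.denylisted:  # the denylist wins over any subscription
        return False
    if user.rested:
        return False
    return campaign_type in user.subscriptions


def should_drop(scheduled_at: datetime, now: datetime, max_delay: timedelta) -> bool:
    """Return True if the message is delayed beyond the allowed threshold."""
    return now - scheduled_at > max_delay


if __name__ == "__main__":
    user = User("u1", subscriptions={"goods"})
    now = datetime(2021, 1, 31, 12, 0, tzinfo=timezone.utc)
    scheduled = now - timedelta(hours=3)
    print(is_eligible(user, "goods"), should_drop(scheduled, now, timedelta(hours=2)))
```

Keeping these checks as small pure functions makes it easy to apply them both at scheduling time and again just before delivery, since rested and denylisted state can change in between.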

High-level flow of the push pipeline

Core notification scheduling platform architecture

Core notification delivery platform architecture

The issues we faced and how we solved them

  1. Audience size: Delivering large campaigns with an audience of ~60M within the target window of 2 hours has been a challenge. We scaled our system and its dependencies to raise our delivery rate so that this volume fits within the timing window.
  2. Personalization: Delivering large personalized campaigns within the 2-hour timing window was a problem because we could not cache the deals to be sent. This, too, was solved by scaling to increase our delivery rates.
  3. Reliable delivery: Message state tracking, together with acknowledgments from both producers and consumers, ensures reliable delivery of messages.
  4. Frequent queue backlog: Any time one of our dependencies had issues for a prolonged period, it created a queue backlog on our side. In a few cases, this happened because a large personalized campaign was running. We investigated and fixed the most frequent causes.
  5. Avoiding domain deny-listing: Ensuring that we are never deny-listed requires us to scale up carefully and gradually, because a sudden jump in email delivery rates can make ISPs deny-list us temporarily, causing our sends to be blocked.
  6. Avoiding duplicate deliveries under all circumstances: Duplicate marketing messages can irritate end users and our system should ensure no duplicate sends at all times. This has been tricky under manual retries.
  7. Faster delivery to SMTP server: We use a TCP (Layer 4) load balancer to distribute email send requests to the MTA (Mail Transfer Agent) servers. It is simple, fast, and efficient, which suits the scale of our push pipeline. The MTAs are responsible for delivering the messages; the load balancer ignores message content and simply forwards network packets to and from the upstream server.
  8. Running the system during issues at dependent systems: Our email delivery platform is dependent upon almost 10 different services. Latencies and issues at dependent systems often slowed down the pipeline. We used a mix of automatic and manual retries to handle these scenarios.
  9. Immediate sends: Often campaign managers would ask us to immediately send a big campaign impacting revenue. This involves pulling large audiences in the range of ~60M users, calculating whether the users should receive the intended campaign, and dispatching the request to the downstream system for sending the email.
  10. High CPU utilization: Historically we saw CPU utilization as high as 99.5%. Profiling pointed to two root causes: some template processing was expensive, and there was also a genuine capacity shortfall. We worked on both to solve the problem.
  11. Ensuring a non-engaged user is not being sent marketing campaigns: Groupon as a company has a strong reputation and we ensure that we do not send to those dormant users who are not engaging with us. We continuously integrate with systems that have this information.
  12. Deal recommendation latencies: Intermittent but prolonged issues at the deal recommendation service we depend on slowed down our pipeline, resulting in an occasional SLA miss. This happened because the deal recommendation service could not scale up quickly enough to absorb a sudden increase in traffic. We solved this with an estimator service that fetches information such as the current pending queue backlog and upcoming campaigns, so that the clusters required at the deal recommendation system can be planned ahead of the upcoming requests.
  13. Distinct rate limits: When you have a pipeline expected to run at a certain rate, introducing new dependencies with smaller rate limits can pose challenges. We handled these scenarios with effective caching so that the pipeline can work within the lower limits.
  14. Optimizing load balancer connection policy: A high number of backend connections to the load balancer resulted in increased latencies and caused timeouts at the upstream service. We remedied the degradation by changing the load balancer connection policy to suit the traffic requirements.
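
To make the duplicate-avoidance point (item 6) concrete, one common technique is to derive an idempotency key per send and claim it before dispatching, so that a manual or automatic retry of the same send becomes a no-op. The sketch below is only an illustration of that technique, not a description of the actual implementation: the `SendLedger` class and the key format are hypothetical, and the in-memory set stands in for what would need to be a shared store with an atomic "set if absent" operation.

```python
import hashlib
from typing import Set


class SendLedger:
    """Tracks which (user, campaign, date) sends have already been claimed.

    In-memory stand-in for illustration only; a real pipeline would need a
    shared store with an atomic claim so concurrent workers cannot both win.
    """

    def __init__(self) -> None:
        self._claimed: Set[str] = set()

    @staticmethod
    def key(user_id: str, campaign_id: str, send_date: str) -> str:
        """Build a deterministic idempotency key for one intended send."""
        raw = f"{user_id}|{campaign_id}|{send_date}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def claim(self, user_id: str, campaign_id: str, send_date: str) -> bool:
        """Return True the first time a send is claimed, False on any repeat."""
        k = self.key(user_id, campaign_id, send_date)
        if k in self._claimed:
            return False
        self._claimed.add(k)
        return True


if __name__ == "__main__":
    ledger = SendLedger()
    print(ledger.claim("u1", "black-friday", "2020-11-27"))  # first attempt
    print(ledger.claim("u1", "black-friday", "2020-11-27"))  # retry of the same send
```

A retry path that checks `claim` before dispatching can be re-run safely after a partial failure, because already-delivered sends are skipped.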

Holidays Peak Week

Holidays Peak Week (including Black Friday and Cyber Monday) is an important time for us. We generate a major portion of our yearly revenue during this time and our marketing campaigns play a major role in bringing customers to the website for purchase. We run special campaigns that are highly personalized in the hope of doing better targeting thereby hitting our business metrics. At the same time, the overall number of sends typically increases by more than 40% during peak week.

What do we do as part of holiday preparations?

  • Do an audit and dry run (if required) of all alerts before peak week to ensure they cover all scenarios
  • Review the hourly marketing campaign calendar to see the schedule of sends
  • Do a co-audit of the actual campaigns with the business to ensure every campaign is configured correctly
  • Load test our system, and thereby our dependencies, before peak week. In some cases, we might simulate the actual load pattern to ensure the deal recommendation systems are consistently returning success

Our learnings in the last 5 years developing and maintaining the push pipeline include:

  • A notification platform has to be extensible so as to serve a variety of use cases that can be controlled from the client-side
  • The platform can linearly scale with traffic only if dependencies can
  • Have separate server fleets for different kinds of SLA requirements
  • Continuously upgrade the tech stack if performance is important for you
  • Consider moving to third-party tools and libraries even if you have developed something in the past
  • Template rendering is a CPU-intensive operation and can often spike CPU utilization. Load testing the available libraries can help the team choose better
  • Circuit breakers are crucial when the system depends on N different kinds of systems with different latencies
  • Retries should always be with exponential backoff and jitter
  • Be careful when there is a pipeline where you have many dependent systems with different rate limits. There could be unexpected results
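
As an illustration of the retry advice above, here is a minimal sketch of exponential backoff with "full jitter" (each delay is drawn uniformly between zero and an exponentially growing ceiling). The function names and the default values for `base`, `cap`, and `max_retries` are assumptions chosen for the example, not values from our pipeline.

```python
import random
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(
    max_retries: int,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield one jittered delay (in seconds) per retry attempt."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))  # exponential growth, capped
        yield rng() * ceiling                      # jitter: random point below the ceiling


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call fn(); on any exception, retry up to max_retries times with backoff."""
    delays = backoff_delays(max_retries, base, cap, rng)
    while True:
        try:
            return fn()
        except Exception:
            delay = next(delays, None)
            if delay is None:  # retries exhausted: surface the last error
                raise
            sleep(delay)
```

The jitter matters at our kind of scale: without it, many workers that failed at the same moment would all retry at the same moment too, hitting a recovering dependency with synchronized bursts.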

Conclusion

We hope this article gives you insight into ways to optimize a push marketing environment. This article is Part 1 of a multi-part series. In subsequent articles we will zoom into some of the individual components that form the robust push delivery pipeline.

Glossary

Deal: In Groupon terminology, a product is called a deal. This is because we are a market leader in local commerce, and the many kinds of local services and goods we handle are collectively called deals.

Audience: A set of consumers to which a campaign is to be sent is called an audience.

Campaigns: A campaign is a set of template metadata along with the audience to which it has to be sent.

Templates: The look and feel of an email, which includes the body and subject.

Deal recommendation source: A system that is responsible for getting us the relevant deals to be sent to the end-user in emails and mobile notifications. The deals can be based on different verticals.
