Why did we build UC’s Central Communication Platform: Raven?

By — Siddharth Kumar (Engineer, New Categories & Connectivity)

UC Blogger
Urban Company – Engineering
8 min read · Dec 15, 2020

Effective communication is key to cementing relationships with customers in this era of e-commerce. The right content has to reach the customer at the right time. As an organization grows, it starts using different types of communication, including SMS, WhatsApp, email, push notifications and IVR, and as scale increases it onboards different vendors. By this stage, you generally abstract away the vendors and the implementation logic by providing channel-wise services. We did the same at UC, as shown in the diagram below:

Figure 1: Architecture of the First Version of the Communication Platform

The first (legacy) version of the communication platform at UC had a very simple responsibility: take the content to be sent (SMS/email/etc.) and the recipient’s details (phone number/email ID/etc.) and trigger the communication. It was simple and served us well, but as we scaled to more use cases and more services, we ran into some limitations:

Limitations in control and developer/PM friendliness:

  1. Content: The content/text of each communication was managed by the calling service. Whenever a PM or business owner needed to change it, a dev had to deploy the service. Even a slight change needed testing as well, which added a dependency on devs and used up their bandwidth on a task that was hardly engineering work.
  2. Type of communication: We often wanted to switch a communication from SMS to push notification or email to save cost or reduce noise, or to trigger multiple types at once for a transactional communication (for example, payment confirmation). This required a lot of change in the business logic of the calling service.
  3. Configurability and logic: For the same trigger (e.g. booking confirmation), we often needed to send different content and/or types of communication in different countries. In some cases we also needed fallbacks: if the user hasn’t opted in to WhatsApp, or a notification wasn’t delivered in time, send an SMS instead. All of this was again maintained in the calling service.
Figure 2: Sample code showing configuration & trigger logic on different conditions in one of the services.
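
To give a feel for this kind of logic in text form, here is a small hypothetical sketch in Python. It is not UC’s code: the `User` fields, the `send` stand-in and the country rules are all assumptions made up for illustration. The point is only that content, channel choice and fallbacks all sit inside the calling service.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    phone: str
    country: str
    whatsapp_opt_in: bool


# Stand-in for the channel-wise legacy services: records calls instead of sending.
sent = []


def send(channel: str, to: str, text: str) -> None:
    sent.append((channel, to, text))


def notify_booking_confirmed(user: User, slot: str) -> None:
    # Content is hardcoded here, so any wording change needs a deployment.
    text = f"Hi {user.name}, your booking is confirmed for {slot}."
    # Country-specific channel choice also lives in the calling service.
    if user.country == "IN":
        if user.whatsapp_opt_in:
            send("whatsapp", user.phone, text)
        else:
            send("sms", user.phone, text)  # fallback when WhatsApp isn't opted in
        send("push", user.phone, text)
    elif user.country == "AU":
        send("push", user.phone, text)
    else:
        send("sms", user.phone, text)
```

Every new country, channel or wording change means editing and redeploying a function like this in each service that triggers communication.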

Limitations with Scale & Resiliency:

  1. Delivery: It was an API-based service, so if the service was down or its APIs slowed down, we lost important communications unless the calling services implemented their own retries and fallbacks to ensure delivery. They also had to implement circuit breakers so that a slow API did not impact them in turn (a cascading effect).
  2. Priority: The system had no way to distinguish between high- and low-priority communications. If it was busy delivering low-priority ones, high-priority ones either failed or were delivered late.
  3. Cost: A burst of communication (from marketing or retention campaigns) made the service scale up accordingly, leading to higher infra costs, even though these messages could have been delivered in a staggered way without scaling the servers.
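
As an illustration of the extra burden on calling services, a circuit breaker around a synchronous communication API might look roughly like the sketch below. This is a generic, simplified version of the pattern, not the code our services used; the class name, thresholds and the injectable `clock` are assumptions chosen to keep the example self-contained.

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are short-circuited instead of hitting the service."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("communication service marked unhealthy")
            # Cool-down over: allow one trial call; a single failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

Each calling service needed something along these lines (plus retries) just to stay healthy when the communication API slowed down.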

Raven: The Central Communication Platform of UC

Rather than solving these limitations one by one in the existing service, we re-architected it. The new service handles these limitations, brings many other benefits and gives us room to add numerous new features. Presenting Raven (start clapping :D). Why the name, you ask? Ravens (black crows) were a highly reliable means of communication in ancient mythical times. Sounds cool, right? :)

Figure 3: Raven’s High Level Architecture

We abstracted a lot of common logic out of the calling services and moved it into Raven itself. We also built a user-friendly dashboard where PMs can create and edit campaigns with different types of communication (SMS/email/push/WhatsApp). Raven’s backend is now a Kafka-backed async service.

Handling Control issues & making it developer/PM friendly:

Figure 4: Dashboard from where PMs can change the parameterised content.
  1. Content: PMs no longer need to rely on devs to change the content or to enable/disable a communication, and the calling service no longer needs a deployment. They can do all of this from Raven’s dashboard (fig.4).
  2. Type of communication: As the panel above shows, a PM can configure any combination of communication types whenever they want, without involving devs. And since it is all on the dashboard, anyone who works (or is about to work) on a campaign can always see which types of communication it triggers.
  3. Configurability and logic: For the same trigger, the calling services no longer need to write different logic for different content. Variants can be created from the dashboard instead. For example, to send SMS and push notifications to our Indian users but only push notifications to our Australian users, we just set up a variant on Raven’s dashboard (fig.5).
Figure 5: Along with fig.4 this is a variant set for Australia, sending only push notifications for the same trigger.
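
Conceptually, the India/Australia example boils down to picking a variant for a trigger based on user attributes. The sketch below is one hypothetical way to model that in Python; the `Campaign` and `Variant` classes, the field names and the "empty country set means catch-all" rule are assumptions for illustration, not Raven’s actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    channels: list[str]
    countries: set[str] = field(default_factory=set)  # empty set = applies everywhere


@dataclass
class Campaign:
    trigger: str
    variants: list[Variant]


def channels_for(campaign: Campaign, country: str) -> list[str]:
    # Prefer a variant that targets the country explicitly, else a catch-all.
    for v in campaign.variants:
        if country in v.countries:
            return v.channels
    for v in campaign.variants:
        if not v.countries:
            return v.channels
    return []  # no matching variant: nothing is sent


booking_confirmed = Campaign(
    trigger="booking_confirmed",
    variants=[
        Variant(channels=["sms", "push"], countries={"IN"}),
        Variant(channels=["push"], countries={"AU"}),
    ],
)
```

With this shape, the calling service only names the trigger; which channels fire for which country is data that can be edited on a dashboard.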

Handling Scale & Resiliency issues:

  1. Delivery: Raven is an async service, so even if it goes down, the queue holds the incoming communication packets until it is up and running again. This may add some delay, but it ensures that all communications get delivered. Being async also means there is no cascading effect on the calling services even if Raven slows down.
  2. Priority: There are three priority queues (high/medium/low), so that low-priority communications (marketing campaigns) don’t impact the delivery of high-priority ones (transactional campaigns). The queues are also consumed at different rates, which keeps the delay for high-priority communications minimal.
  3. Cost: Even when we get a burst of marketing communication, we don’t need to scale servers in proportion to the load, because the queues hold the communication packets and a slight delay in these communications is fine.
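
The idea of consuming three queues at different rates can be sketched as a weighted drain. The example below is only an illustration: in-memory deques stand in for the Kafka-backed queues, and the per-cycle `WEIGHTS` are made-up numbers, since this post does not go into how the consumption rates are actually configured.

```python
from collections import deque

# Hypothetical per-cycle budgets: how many messages each queue may hand out per pass.
WEIGHTS = {"high": 5, "medium": 2, "low": 1}


def drain_cycle(queues: dict[str, deque], weights: dict[str, int] = WEIGHTS) -> list[str]:
    """One consumption pass: take up to `weight` messages per queue, high first."""
    batch = []
    for priority in ("high", "medium", "low"):
        q = queues[priority]
        for _ in range(min(weights[priority], len(q))):
            batch.append(q.popleft())
    return batch


queues = {
    "high": deque(["otp-1", "otp-2"]),
    "medium": deque(["reminder-1"]),
    "low": deque(f"promo-{i}" for i in range(1, 6)),
}
first_pass = drain_cycle(queues)   # both OTPs, the reminder, and one promo
second_pass = drain_cycle(queues)  # only one more promo; the rest keep waiting
```

A marketing burst therefore just makes the low-priority queue longer; it neither delays transactional messages nor forces the consumers to scale with the spike.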

Numerous other advantages of Raven

We built Raven not just to eliminate the previous limitations but also to add many more capabilities:

  • A/B Experimentation (Empowering PMs): We integrated Raven with our user-service to retrieve relevant information about a user, such as country, city and category. With this, PMs can configure A/B experiments for different use cases from the dashboard, with different content for different types of communication on the same trigger. For example, sending different SMSes for different categories on a successful booking, or adding a video call link to the email for an online category when a professional is assigned (fig.5).
  • Empowering Developers: A dev’s life has become much simpler. All they have to do is send the parameters (fig.4) in their payload, and Raven takes care of the rest. No more logic for A/B experiments, countries or users, and no more maintaining different content for different scenarios. This also removed a lot of boilerplate code from a lot of services. See the difference between fig.2 and fig.6.
Figure 6: Look at the code difference from fig.2 for the same scenario.
  • User Personalisation and Guardrails: Since all communication flows through Raven, we can personalise it for our users. We can also configure a per-user, per-campaign limit and an overall per-user limit, so that bugs in calling services don’t end up bombarding users. The user_type and validity fields (fig.7) make sure that communication meant for our partners is not sent to customers, and that communication meant for Halloween is not sent after 31st October.
Figure 7: Team tagging, priority of communication, validity fields on the dashboard.
  • Parameterised Campaigns and Support for Default Parameters: The content of a communication contains parameters that are sent by the calling service (fig.8). A few default parameters, such as customer_name and pro_name, are populated by Raven itself, so the calling service doesn’t need to depend on other services to populate them.
  • Functions: Raven has several built-in functions, e.g. URL shortening and standard date formatting, that can be used across different types of communication (fig.8). This has many benefits; for instance, a shorter URL reduces the size of an SMS and therefore its cost.
Figure 8: shortUrl & smsDateFormat are used to shorten URLs and format dates for our online services.
  • Localisation for Free: Raven is integrated with Groot (UC’s central translation platform), so translations come for free from the dashboard without devs or PMs having to worry about them.
  • Send a test campaign: While creating a campaign, PMs can simply use “Send a test Campaign”. This way the content is manually verified before it goes live in production, and it can be tested without involving the devs of the calling service.
Figure 9: A localised test notification sent using “Send a test” functionality.
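
To make parameterised content, default parameters and functions a little more concrete, here is a rough Python sketch of how such a template could be rendered. The `{{param}}` / `{{fn(param)}}` syntax, the placeholder `uc.example` shortener and the date format are assumptions for illustration; only the names shortUrl, smsDateFormat and customer_name come from the dashboard described above, and Raven’s real template language may look different.

```python
import re
from datetime import datetime


def short_url(url: str) -> str:
    # Placeholder shortener: a real one would call a URL-shortening service.
    return "https://uc.example/" + format(sum(url.encode()) % 100000, "05d")


def sms_date_format(iso: str) -> str:
    # Assumed compact format for SMS, e.g. day, month, 12-hour time.
    return datetime.fromisoformat(iso).strftime("%d %b, %I:%M %p")


FUNCTIONS = {"shortUrl": short_url, "smsDateFormat": sms_date_format}
# Matches either {{fn(param)}} or {{param}}.
TOKEN = re.compile(r"\{\{\s*(?:(\w+)\(\s*(\w+)\s*\)|(\w+))\s*\}\}")


def render(template: str, params: dict[str, str], defaults: dict[str, str]) -> str:
    values = {**defaults, **params}  # caller-supplied params override defaults

    def substitute(m: re.Match) -> str:
        fn_name, fn_arg, plain = m.groups()
        if plain is not None:
            return values[plain]
        return FUNCTIONS[fn_name](values[fn_arg])

    return TOKEN.sub(substitute, template)


template = (
    "Hi {{customer_name}}, your session starts on {{smsDateFormat(slot)}}. "
    "Join: {{shortUrl(join_link)}}"
)
# The calling service sends only what it knows about the event.
params = {
    "slot": "2020-12-15T17:30:00",
    "join_link": "https://www.urbancompany.com/some/long/path",
}
# Stand-in for defaults the platform would look up itself (e.g. from user-service).
defaults = {"customer_name": "Asha"}
message = render(template, params, defaults)
```

Here the calling service never touches the wording, the customer’s name or the formatting rules; it only supplies the event-specific parameters.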

Internal feedback on Raven

Raven has been a game changer. Because of how developer- and PM-friendly it is, it has been widely adopted over time. It has massively reduced code while giving autonomy to PMs. Some screenshots of the love:

Reaction after a whole lot of boilerplate code was removed for existing communications.
Reaction when you see something so amazing!!
Kanav, SVP @ UC, drove us to build Raven.

What next: I’ll soon share the tech side of Raven, discussing its design in depth and how we reduced payload size and API time while handling more traffic.

The team behind Raven

  1. Rushil Kapoor: Rushil is from the NSIT’15 batch. He has owned and implemented Raven from scratch, turning an idea into a reality that is creating a huge impact for the tech team at UC.
  2. Vishwesh Mishra: Vishwesh is from the IIT-BHU’16 batch. He recently joined UC and took ownership of Raven’s scalability and reliability. He has taken Raven to the next level and is building it to stay reliable at 10X the current scale.
  3. Siddharth Kumar: Siddharth is from the DCE’12 batch and has worked in various industries such as sports, hospitality and logistics. He joined UC recently (March 2020) and leads the New Category and Communication pods. He is expanding his tech team and would love to get in touch with folks like you.

About the author

Siddharth leads the connectivity and new categories pods as an Engineering Manager. Currently, he is scaling the home painting category at UC. He loves to roam around and add photos to Google Maps, and has 95 lakh+ views on his contributions. Guess what he gets in return from Google!

Sounds like fun?
If you enjoyed this blog post, please clap 👏 (as many times as you like) and follow us (@UC Blogger). Help us build a community by sharing on your favourite social networks (Twitter, LinkedIn, Facebook, etc.).

You can read more about us on our publications:
https://medium.com/uc-design
https://medium.com/uc-engineering
https://medium.com/uc-culture

https://www.urbancompany.com/blog/humans-of-urban-company/

If you are interested in finding out about opportunities, visit us at http://careers.urbancompany.com

UC Blogger is the author of stories from inside Urban Company (owner of the Engineering, Design & Culture blogs).