SwissBorg Notifications

Published in

SwissBorg Engineering

Nov 8, 2022

How we manage them.

Notifications are an additional tool we can leverage to communicate with our community outside of social media channels. Moreover, notifications are an important security, educational and marketing tool at our disposal.

Certain notifications can be used to communicate important information to users even when they aren’t actively using the app. With this feature, users don’t need to keep the app open or stay on constant alert while waiting for something to happen, like a deposit arriving. Instead, they receive a push notification on their device as soon as the deposit arrives.

From a security standpoint, notifications are crucial. When someone’s account info changes, like their email or phone number, we want to alert them to make sure they know about it in case the change was made by a malicious actor. Notifications also play a role in our verification process whereby a new user receives an SMS challenge code to verify their phone number. When a big news event comes along, like the UST depeg, we also push educational content and warnings to users in an effort to assist them in protecting their assets.

Finally, we also use notifications in marketing campaigns and for user outreach. These include product announcements, new coin listings, competitions and voting events, to name a few. When choosing what notifications to send, we are conscious of ‘alert fatigue’ and follow a HIVR (Helpful, Interesting, Valuable and Relevant) formula when deciding what content should be sent out.

We use two main tools to communicate with our users in this way: HubSpot, and an internal solution centred around our notification-service. This service was built to be the central point of our notifications ecosystem and is responsible for templating, translations, InApp notifications and managing our 3rd party integrations.

In this post we dive into our internal solution and show how notifications are created, managed and sent.

Notifications Terminology

Before we get to the crux of the matter, let’s clarify some terminology and cover how we classify different notification types. We use the term ‘notification’ in a general sense, though a notification can be broken down further into its constituent parts. Because notification terminology is often used inconsistently, for the sake of this article we will define the terms as follows:

InApp

InApp notifications encompass any messages shown to a user inside the app. They are commonly used to help with onboarding, guidance, driving user discovery, and increasing feature adoption.

For this notification type, the app itself determines how these notifications are styled and displayed. Users of the app will recognise them as the expanded text block displayed on the Portfolio screen and in the Notifications History view. This is our default InApp notification style and, until recently, the only InApp display style we supported; more on this later.

Push

A push notification is more of a trigger sent out to mobile devices to elicit an action from the app, even if the app is not in the foreground or is completely closed.

App developers can choose how the app reacts to these triggers. For example, they may preload some data while users are not using the app in order to improve the app’s performance when the user next opens it, or to show a message within the app. At SwissBorg, push notifications are most commonly used to trigger system notifications.

System

System notifications are controlled by the OS of the device that receives the notification, with the notification behaviour and styling dictated therein. In general, we may only decide on the type of system notification we want, for example, the App Badge, Lock Screen, or Notification Bar. Depending on the system notification, we can provide predetermined parameters for that system notification style, such as tap-action, additional action intents, icon, title and text.

In SwissBorg’s case, we use system notifications for sensitive account changes, update announcements, feature launches and transaction notifications, among others.

Example of a system notification triggered by a push notification

SMS

SMS notifications are generally reserved for communicating with users who cannot be reached via InApp notifications, such as someone who has not yet been onboarded. For example, if a Smart Send transaction is sent to one of these users, we send an SMS notification to their phone number, notifying them of the transaction. In this case, the transaction is tied to their phone number, waiting to be received after onboarding.

Email

Email notifications give us the most flexibility with regard to layout and styling, since the messages can be longer and contain richer content. As such, they are mostly used for Marketing Messages, but they are also often used to reinforce messages sent through other notification types.

For emails, we can also send messages through HubSpot rather than going directly through our internal service. Since email marketing is a core HubSpot product, it offers a more feature-rich experience for these messages.

Notification Topology / Management

Templating

Now that we know which notification types are in the toolkit, how do we set their content? We start by creating a template with its associated Template Content. We keep a template separate from its content so that we can track the content’s version history as it evolves. Later we’ll see that each sent notification is linked back to its originating template and template content version. Each time the template content is updated, the template is updated to point to the latest version.
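Here is a minimal in-memory sketch that illustrates the template/content split. It assumes a simplified model in which a template keeps an append-only history of content versions plus a pointer to the latest one. `VersionedTemplate`, `ContentVersion` and their methods are hypothetical names, and the content is reduced to a plain string.

```scala
object TemplateVersioning {
  // One immutable snapshot of a template's content.
  final case class ContentVersion(version: Int, content: String)

  // A template keeps every past content version and points at the latest one.
  final case class VersionedTemplate(id: String, history: Vector[ContentVersion], latestVersion: Int) {

    // Updating never overwrites: it appends version n + 1 and moves the pointer to it.
    def update(newContent: String): VersionedTemplate = {
      val next = latestVersion + 1
      copy(history = history :+ ContentVersion(next, newContent), latestVersion = next)
    }

    // Look up any version, e.g. the one a previously sent notification was rendered from.
    def at(version: Int): Option[ContentVersion] = history.find(_.version == version)

    // The pointer always refers to a stored version, so .get is safe here.
    def latest: ContentVersion = at(latestVersion).get
  }

  def create(id: String, initialContent: String): VersionedTemplate =
    VersionedTemplate(id, Vector(ContentVersion(1, initialContent)), latestVersion = 1)
}
```

Old versions are never overwritten. A notification that records the version it was rendered from can therefore always be traced back to the exact content the user saw.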

“Organisation is the root of all satisfaction, and satisfaction is the root of all happiness.”
– Our Technical Writer

As the name suggests, template content can contain templated variables, so the content is not static. If you’ve ever received an email from a company that began with “Hello {{ firstName }}”, you’ve encountered a templated notification before. Templating lets us reuse the same content for many users while keeping each message personal and specific.

Each Notification Type allows for templating of different fields, as shown by the ADT definition below:

sealed trait TemplateData

object TemplateData {
  final case class Email(subject: String, body: String, plainText: String) extends TemplateData
  final case class Push(popupTitle: String, popupBody: String, deepLink: Option[String]) extends TemplateData
  final case class Sms(value: String) extends TemplateData
  final case class InApp(
    title: String,
    body: String,
    buttonLabel: Option[String],
    buttonLink: Option[String], // Later rendered using handlebars, validated and converted to Option[URI]
    iconLink: Option[String], // Later rendered using handlebars, validated and converted to Option[URI]
    displayProperties: List[TemplateDisplayProperties]
  ) extends TemplateData
}

To support this feature, we rely on a popular templating engine called Handlebars. We use it to convert the templated content entered through our backoffice into the final, substituted version that our users receive.

Which exact variables are available for substitution depends on the context of the notification, but some examples of variables we can use here include User Profile information, Product information, and Transaction information.
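The sketch below shows the core idea of variable substitution. It is not Handlebars itself: it only handles plain `{{ name }}` placeholders and has no helpers, blocks or HTML escaping. `MiniTemplate` and `render` are hypothetical names. Following Handlebars’ default behaviour, an unknown variable renders as an empty string.

```scala
import scala.util.matching.Regex

object MiniTemplate {
  // Matches {{ name }} (whitespace inside the braces is optional); names may contain dots.
  private val Placeholder: Regex = """\{\{\s*([\w.]+)\s*\}\}""".r

  /** Replaces every {{ name }} with its value from `vars`.
    * Unknown variables become empty strings. quoteReplacement stops characters
    * such as '$' in a value from being read as regex group references.
    */
  def render(template: String, vars: Map[String, String]): String =
    Placeholder.replaceAllIn(
      template,
      (m: Regex.Match) => Regex.quoteReplacement(vars.getOrElse(m.group(1), ""))
    )
}
```

For example, `MiniTemplate.render("Hello {{ firstName }}", Map("firstName" -> "Alice"))` yields `"Hello Alice"`.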

Template Collections

The final piece of our notifications topology is Template Collections. A collection groups the templated content for one or more notification types into a single logical unit for sending. We rely on this heavily. For example, a system notification sent to a user is often accompanied by an InApp notification relaying the same message, perhaps with additional details.

We allow at most one template per notification type in each collection, so that each collection conveys a single piece of information. For example, a collection cannot contain two distinct Email notifications. This is enforced at the database level by the composite primary key of the template content: PK(collection_id, type, version).
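The effect of the composite primary key can be mimicked in memory. In this sketch, `ContentTable` is a hypothetical stand-in for the template content table, and `NotificationType` is cut down to Email and Push. Inserting a second row with the same (collection_id, type, version) fails. This is what stops a collection from holding two distinct Email templates at the same version, while different types can coexist in one collection.

```scala
object CollectionConstraint {
  sealed trait NotificationType extends Product with Serializable
  case object Email extends NotificationType
  case object Push extends NotificationType

  // Mirrors PK(collection_id, type, version).
  final case class ContentKey(collectionId: String, kind: NotificationType, version: Int)

  final class ContentTable {
    private val rows = scala.collection.mutable.Map.empty[ContentKey, String]

    // Behaves like an INSERT: fails if the primary key already exists.
    def insert(key: ContentKey, body: String): Either[String, Unit] =
      if (rows.contains(key)) Left(s"duplicate key $key")
      else { rows.update(key, body); Right(()) }

    // Notification types present in a collection.
    def typesIn(collectionId: String): Set[NotificationType] =
      rows.keySet.collect { case ContentKey(`collectionId`, kind, _) => kind }.toSet
  }
}
```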

Our marketing agents create and edit collections and template content via a BackOffice interface, which issues commands to our notification-service’s gRPC endpoints.

Notification categories

We divide notification collections into two categories. The distinction between them boils down to the following points.

Transactional Notifications (TNs)

A notification triggered as a result of some internal event.
For example, this might occur when a user’s account is verified, when funds arrive after a deposit, or when a user’s transaction has failed.
They serve to notify users of important events concerning their account when the application is closed, either via a direct notification (push, email, sms) or the next time they open the app (InApp).

Marketing Messages (MMs)

A notification triggered as a result of some external event.
A Marketing Message can be triggered for a variety of reasons originating outside the SwissBorg ecosystem. When and why these are triggered is defined by our Marketing Agents.
They can be used during a user journey to tell users about new features or compliance issues, or more generally for direct communication with specific user cohorts.

Translations

If you’ve read our previous blog post about Internationalisation at SwissBorg, you’ll remember that we try to centralise our translations in Lokalise. Two factors mean this centralisation does not fully hold inside the notification-service: our approach to notifications predates multi-language support in the app, and heavily templated strings add extra complexity. We currently have a hybrid approach. TNs, which have well-defined static content, are resolved with Lokalise, while Marketing Messages (MMs) are handled differently.

In the case of MMs, translations are managed by the notification service itself, storing the template content for each language individually. The data we store for Template Content contains a map of Languages to TemplateData.

final case class TemplateContent(version: Int, kind: NotificationType, langs: Map[Lang, Option[TemplateData]])
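A lookup against this map might look like the following sketch. `Lang` and `TemplateData` are simplified stand-ins. The fallback to a default language when the user’s language is missing or mapped to `None` is our assumption for illustration; the post does not describe a fallback rule.

```scala
object Translations {
  // Simplified stand-ins for the service's own types.
  final case class Lang(code: String)
  final case class TemplateData(title: String, body: String)

  // Hypothetical default language used when no translation exists.
  val DefaultLang: Lang = Lang("en")

  // Prefer the user's language; fall back to the default if that entry is absent or None.
  def resolve(langs: Map[Lang, Option[TemplateData]], userLang: Lang): Option[TemplateData] =
    langs.get(userLang).flatten.orElse(langs.get(DefaultLang).flatten)
}
```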

I won’t go any further into detail here as this is an exception to the rule, and we have already discussed translations at some length in another post.

Architecture

Now that we’ve seen how notifications are managed, let’s look at how they get from here to our users’ devices.

Queueing

As we handle both TNs and MMs in the same service, the load is uneven. TNs have a relatively predictable and steady flow, but a new marketing campaign or feature launch brings periodic spikes of MMs to process, as shown in the graph.

Notification processing is therefore a natural fit for a queue-based design. We use streaming heavily, consuming from Kafka through Alpakka Kafka, the Akka Streams connector. This gives us ‘at least once’ delivery, ensuring that every notification is processed by the service.

However, since neither ‘at most once’ nor ‘exactly once’ delivery is guaranteed, we also deduplicate incoming events. To do this, we store the originating correlation_id of each event with the resulting notification. When a duplicate event arrives, we can see that a notification has already been emitted for that correlation_id, and we skip it.

This protects us from duplicate notifications being issued, however, it does not guarantee ‘exactly once delivery’ as it’s possible the service itself fails between sending a notification and updating the internal status.

Kafka is responsible for managing the incoming events for the service, while an internal Postgres table manages the commands currently being processed. We chose this over a purely Kafka-based solution because its low-tech implementation is resilient and easy to debug. It also lets us manage notification priority (more on that later) and retry logic in one central place. It is quite performant and handles the load easily, as illustrated by the following graph.

During normal operation, with a full queue of notifications to send, we poll our notification_queue Postgres table for ‘reservable’ notifications to process, throttling only when downstream stages backpressure. This polling is implemented with Akka Streams and an SQL query issued through doobie. We chose this over Postgres’ LISTEN functionality for two reasons: LISTEN also involves polling (on libpq’s PQnotifies function), and support for it in JVM libraries is relatively poor.

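import akka.stream.scaladsl.{Flow, RestartFlow, Source}

// An endless stream of ticks; each tick triggers one reservation query.
Source
  .repeat(())
  .via(RestartFlow.withBackoff(restartSettings) { () =>
    // The inner flow is restarted with backoff if it fails (e.g. on a database error).
    // mapAsync(1): at most one reservation query in flight at a time.
    Flow[Unit].mapAsync(1) { _ =>
      // Runs the effectful doobie query and exposes its result as a Future.
      dispatcher.unsafeToFuture {
        repository.reserveNotifications(
          notificationType,
          config.redeliverInterval,
          throttleConfig.batch()
        )
      }
    }
  })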

If the notification_queue is empty, we start a backoff period using the custom cost calculation feature of Akka Streams’ throttle operator. During this period we stop polling the database, which reduces the number of calls made to it.

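  // Continues the polling stream: an empty batch costs one token, and only one token
  // is replenished per emptyQueueBackoff, so empty polls are spaced out by the backoff.
  // Non-empty batches cost nothing and pass straight through.
  .throttle(
    1,
    config.emptyQueueBackoff,
    list => if (list.isEmpty) 1 else 0
  )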

Since we added this optimisation with a 500 millisecond backoff, the number of queries hitting the database has dropped significantly, to our current rate of around 1080 requests per minute.

Priority

Not all notifications are created equal, and given the constraints around rate limits, sometimes we have to pick favourites. As mentioned previously, this is handled by the polling query against our notification_queue table.

When creating notifications, we allocate a numeric priority to each of them. This value is used in the ORDER BY clause of the polling query, so queued notifications are returned in ascending order of priority value and lower values are sent first. In effect, we send notifications about transactions and account changes before notifications about new features.

In the future, if performance degrades, we may need more granular priority levels. For example, account update notifications could take priority over transaction updates for security reasons.

Below, you can see the polling query behind the repository.reserveNotifications(…) call from the previous snippet to illustrate this.

UPDATE notification_queue
SET status = 'reserved', updated_at = CURRENT_TIMESTAMP
WHERE notification_id IN (
  SELECT notification_id FROM notification_queue
  WHERE (
      status = $available OR
      -- a reservation older than the redelivery interval is treated as abandoned
      updated_at + ($delayMillis || 'milliseconds')::interval < CURRENT_TIMESTAMP
    )
    AND process_after < CURRENT_TIMESTAMP
    AND notification_type = $notificationType
  ORDER BY priority, process_after -- lower priority values first
  FOR UPDATE SKIP LOCKED           -- skip rows locked by a concurrent poller
  LIMIT $limit
)
RETURNING *;
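The selection rules in this query can be mirrored in plain Scala, which makes them easier to reason about or unit-test. This is a sketch, not the service’s code. `QueuedNotification`, `Status` and `reservable` are hypothetical names, and times are plain epoch milliseconds. It models only which rows are picked and in what order. The concurrency behaviour comes from `FOR UPDATE SKIP LOCKED`, which lets several pods poll the same table without reserving the same row twice.

```scala
object Reservation {
  sealed trait Status extends Product with Serializable
  case object Available extends Status
  case object Reserved extends Status

  final case class QueuedNotification(
    id: Long,
    notificationType: String,
    status: Status,
    priority: Int,
    processAfter: Long, // epoch millis
    updatedAt: Long     // epoch millis
  )

  // Same predicate and ordering as the polling query; redeliverMillis plays the role of $delayMillis.
  def reservable(
    queue: Seq[QueuedNotification],
    notificationType: String,
    now: Long,
    redeliverMillis: Long,
    limit: Int
  ): Seq[QueuedNotification] =
    queue
      .filter { n =>
        (n.status == Available || n.updatedAt + redeliverMillis < now) &&
        n.processAfter < now &&
        n.notificationType == notificationType
      }
      .sortBy(n => (n.priority, n.processAfter)) // ORDER BY priority, process_after
      .take(limit)                               // LIMIT $limit
}
```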

Rate limits

As we rely heavily on 3rd party providers to deliver notifications, rate limiting is a concern. The number of notifications sent grows linearly with the size of our user base, and also with each new notification we add. We needed a way to share a single provider rate quota across a horizontally scaled service.

Our solution is a small utility class that uses the Akka Discovery Kubernetes API to determine how many service pods are currently running, and splits the allocated quota between them (quota / numPods per service). This information is refreshed every minute. The implementation assumes that all pods know the total rate limit. This may not hold, for example, while a configuration change that lowers the overall rate is being rolled out.

Using this information, we throttle the processing inside each pod to its allotment of the total quota.
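The per-pod allotment itself is simple arithmetic. Below is a minimal sketch; it assumes the pod count has already been obtained (the post says this comes from Akka Discovery and is refreshed every minute). `perPodRate` is a hypothetical name.

```scala
object RateSplit {
  /** Share of a provider's total rate quota that one pod may use (same unit as totalQuota). */
  def perPodRate(totalQuota: Int, runningPods: Int): Int = {
    val pods = math.max(1, runningPods) // guard against discovery briefly reporting zero pods
    totalQuota / pods                   // integer division rounds down, so the pods never exceed the quota
  }
}
```

Rounding down keeps the sum of all pods’ allotments at or below the provider’s quota, at the cost of leaving a little capacity unused. For example, `perPodRate(100, 3)` gives each of three pods 33.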

Retries

Using our notification_queue table also simplifies our retry logic. Had we gone down the route of dedicated retry topics, the number of Kafka topics needed by the service would have increased dramatically.

When we fail to send a notification and consider the error retryable, we reschedule it. Its status changes from ‘reserved’ (as set by our polling query) back to ‘available’. We also update the other fields used by the retry logic: the process_after timestamp and the notification’s retries counter. The polling query picks the notification up again once it is ready to be reprocessed.

UPDATE notification_queue
  SET status = 'available',
      updated_at = CURRENT_TIMESTAMP,
      process_after = CURRENT_TIMESTAMP + ($backoffMillis || 'milliseconds')::interval,
      retries = retries + 1
WHERE notification_id = $notificationId;

Some errors are deemed non-retryable. In these cases the message is logged and removed from the Postgres queue. InApp notifications are not retried either. Our retry mechanism relies on writing to Postgres, and, as we’ll see later, failing to send an InApp notification would itself mean failing to write to Postgres.
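The decision made after a failed send can be sketched as a pure function. The exponential backoff formula, the default `maxRetries`, and the `Transient`/`Permanent` error classes are assumptions for illustration. The post only says that retryable errors are rescheduled with a new `process_after` and an incremented retries counter, that non-retryable errors are logged and dropped, and that InApp notifications are never retried.

```scala
object RetryPolicy {
  sealed trait SendError extends Product with Serializable
  final case class Transient(reason: String) extends SendError // e.g. a provider timeout (hypothetical)
  final case class Permanent(reason: String) extends SendError // e.g. an invalid phone number (hypothetical)

  sealed trait Decision extends Product with Serializable
  final case class Reschedule(processAfterMillis: Long, retries: Int) extends Decision
  case object Drop extends Decision

  def onFailure(
    error: SendError,
    isInApp: Boolean,
    retries: Int,
    nowMillis: Long,
    baseBackoffMillis: Long = 1000L,
    maxRetries: Int = 5
  ): Decision = error match {
    case _ if isInApp                          => Drop // InApp sends are Postgres writes; never retried
    case Permanent(_)                          => Drop // logged and removed from the queue
    case Transient(_) if retries >= maxRetries => Drop
    case Transient(_) =>
      // Exponential backoff: base, 2 x base, 4 x base, ...
      val backoff = baseBackoffMillis * (1L << retries)
      Reschedule(nowMillis + backoff, retries + 1)
  }
}
```

A `Reschedule` result corresponds to the UPDATE statement above: the new `process_after` and the incremented `retries` are exactly the columns it sets.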

Time To Live

At some point it does not make sense to try to deliver a failing notification anymore, even if we have not reached the maximum number of retries allowed.

As a trivial example, imagine we send a user a TAN code challenge to verify their phone number, which they need to enter within 1 hour. If the notification has not been sent within that hour, does it make sense to keep retrying? The user may eventually receive the notification, but the code will no longer be valid.

For this reason, each notification command has a Time To Live (TTL), beyond which the notification is considered outdated and will not be sent. We configure the TTL based on the trigger that originated the notification, for example: 1 hour for SMS challenges, 1 day for deposit notifications, 1 day for marketing messages and 7 days for account changes. The TTL is checked both on the first attempt to send the notification and when executing our retry logic.
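Below is a minimal sketch of the TTL check, using the example values above. The trigger names and the `isExpired` helper are hypothetical. The sketch also assumes the TTL is measured from when the command was created.

```scala
import java.time.{Duration, Instant}

object NotificationTtl {
  sealed trait Trigger extends Product with Serializable
  case object ChallengeSms extends Trigger
  case object Deposit extends Trigger
  case object MarketingMessage extends Trigger
  case object AccountChange extends Trigger

  // Example TTLs taken from the text above.
  def ttl(trigger: Trigger): Duration = trigger match {
    case ChallengeSms     => Duration.ofHours(1)
    case Deposit          => Duration.ofDays(1)
    case MarketingMessage => Duration.ofDays(1)
    case AccountChange    => Duration.ofDays(7)
  }

  // Evaluated before the first attempt and again before every retry.
  def isExpired(trigger: Trigger, createdAt: Instant, now: Instant): Boolean =
    now.isAfter(createdAt.plus(ttl(trigger)))
}
```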

Sending and Overview

Finally, we get to the point where we send the notification. Let’s briefly recap. Notifications are triggered in three ways: via a gRPC endpoint, or through Kafka, either by reacting to a known transactional event or by a webhook call from an external service. Most notifications are triggered by a webhook call from our external analytics tool. Triggering via gRPC is usually reserved for testing or edge cases, but it follows the same flow once inside the service.

We first transform each input trigger into a notification-service domain model object called a NotificationCommand, one kind of which is the SendNotification command. Every input trigger provides at least the ID of the template collection and the ID of the user. From these IDs we retrieve and ‘explode’ the template collection. This means splitting it into a separate SendNotification command for each configured notification type, and resolving the required user information from the user ID (phone number, email, language, push tokens, etc.). These commands are then put on another internal Kafka topic for further processing.
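The ‘explode’ step can be sketched as a function from a collection and a resolved user to one command per configured notification type. Everything here is simplified and hypothetical: `UserContact`, `TemplateCollection` (reduced to the latest content version per type) and the fields of `SendNotification` are illustrative stand-ins, not the service’s actual models.

```scala
object Explode {
  sealed trait NotificationType extends Product with Serializable
  case object Email extends NotificationType
  case object Push extends NotificationType
  case object Sms extends NotificationType
  case object InApp extends NotificationType

  // Details resolved from the user ID (simplified).
  final case class UserContact(
    userId: String,
    email: String,
    phoneNumber: String,
    language: String,
    pushTokens: List[String]
  )

  // Latest content version for each notification type configured in a collection.
  final case class TemplateCollection(id: String, latestVersions: Map[NotificationType, Int])

  final case class SendNotification(
    collectionId: String,
    kind: NotificationType,
    templateVersion: Int,
    user: UserContact
  )

  // One collection in, one command per configured notification type out.
  def explode(collection: TemplateCollection, user: UserContact): List[SendNotification] =
    collection.latestVersions.toList.map { case (kind, version) =>
      SendNotification(collection.id, kind, version, user)
    }
}
```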

Reading from this topic, the CommandTransformer sends the notification to the user, linking it to a specific version of the template content for reference.

The flow inside the CommandTransformer goes as follows:

Checks if a notification was already issued with the same correlationID.
Checks if the command is still valid based on the TTL.
Creates the notification in a Pending state.
Enqueues the event to be sent.
Emits an event to notify of a status change, moving the notification to ‘Pending’.
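The five steps above can be sketched with in-memory stand-ins for the database, the queue and the event stream. This illustrates only the ordering of the steps. `Command`, `Outcome`, `InMemoryTransformer` and its mutable buffers are hypothetical, whereas the real service persists each step.

```scala
import java.time.Instant
import scala.collection.mutable

object TransformerSketch {
  final case class Command(correlationId: String, expiresAt: Instant)

  sealed trait Outcome extends Product with Serializable
  case object SkippedDuplicate extends Outcome
  case object SkippedExpired extends Outcome
  final case class Enqueued(notificationId: Long) extends Outcome

  final class InMemoryTransformer {
    private val issuedCorrelationIds = mutable.Set.empty[String]
    private var nextId = 0L
    val notifications = mutable.Map.empty[Long, String] // notification id -> status
    val queue = mutable.Queue.empty[Long]               // stands in for notification_queue
    val events = mutable.ListBuffer.empty[String]       // stands in for status-change events

    def handle(cmd: Command, now: Instant): Outcome =
      if (issuedCorrelationIds.contains(cmd.correlationId)) SkippedDuplicate // 1. already issued?
      else if (!now.isBefore(cmd.expiresAt)) SkippedExpired                  // 2. TTL still valid?
      else {
        nextId += 1
        val id = nextId
        notifications(id) = "Pending"                                         // 3. create as Pending
        issuedCorrelationIds += cmd.correlationId
        queue.enqueue(id)                                                     // 4. enqueue for sending
        events += s"notification $id -> Pending"                              // 5. emit status change
        Enqueued(id)
      }
  }
}
```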

‘Sending’ the event usually means inserting it into the notification_queue Postgres table. InApp notifications are different. They are provided to the apps by a REST API endpoint and pulled from the server, so ‘sending’ an InApp means rendering the template (resolving the templated values in the notification) and inserting the result directly into a rendered_notifications table, which the endpoints query.

In contrast to our InApp notifications, all other notifications are delivered by a 3rd Party integration. These delivery methods do not have the limitation of requiring a user to open the app. Multiple vendors are typically used per type, and the vendor is chosen non-sequentially from a list of possible vendors in order to spread the load.
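Picking a vendor ‘non-sequentially’ could be as simple as a uniform random choice. The post doesn’t say which strategy is used, so treat this as one possible approach. `Vendor` and `pickVendor` are hypothetical names, and the random source is injected so that the choice is reproducible in tests.

```scala
import scala.util.Random

object VendorSelection {
  final case class Vendor(name: String)

  // A uniform random choice spreads load across vendors without any shared counter between pods.
  def pickVendor(vendors: IndexedSeq[Vendor], rng: Random): Option[Vendor] =
    if (vendors.isEmpty) None
    else Some(vendors(rng.nextInt(vendors.size)))
}
```

A random pick needs no coordination, which fits the horizontally scaled setup described earlier. A strict global round robin, by contrast, would need state shared across pods.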

For testing purposes, we avoid the costs of our 3rd party providers by either discarding some notifications or redirecting them to a cheaper alternative such as a console log or Slack.

Building out FullScreen

As we touched on earlier, InApp notifications materialise as an expanded block of text displayed on the Portfolio screen and are viewable in the Notifications History. This is our default InApp notification style and, until recently, was the only InApp display style we supported.

We recently extended our InApp notifications to support different display stylings within the app. The feature was built out specifically to support Full Screen notifications, but with the possibility of enabling other display options in the future.

To achieve this, we refactored our InApp notification model to make the display options configurable. An InApp notification now contains a list of display properties that it can exhibit in the app. This lets us support new display styles in the future simply by defining a new DisplayProperties type for this list.

When creating the template content, we validate that this list contains only distinct display properties. This makes it impossible to create an InApp notification with a duplicate configuration for a display style.

The mobile apps then in turn interpret these properties on their side.
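A minimal sketch of the distinctness check follows, with simplified property types and a hypothetical `validateDisplayProperties` name. We read ‘distinct’ as one entry per display style, since two `Simple` entries with different text would still be a duplicate configuration for that style.

```scala
object DisplayValidation {
  sealed trait DisplayProperties extends Product with Serializable
  final case class Simple(title: String, body: String) extends DisplayProperties
  final case class FullScreen(title: String, body: String, assetUrl: String) extends DisplayProperties

  private def style(p: DisplayProperties): String = p match {
    case _: Simple     => "Simple"
    case _: FullScreen => "FullScreen"
  }

  // Rejects a list that configures the same display style more than once.
  def validateDisplayProperties(props: List[DisplayProperties]): Either[String, List[DisplayProperties]] = {
    val duplicated = props.groupBy(style).collect { case (s, ps) if ps.size > 1 => s }
    if (duplicated.isEmpty) Right(props)
    else Left(s"duplicate display styles: ${duplicated.toList.sorted.mkString(", ")}")
  }
}
```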

Before

final case class InAppNotification(
  notificationId: Option[NotificationId],
  title: String,
  body: String,
  buttonLabel: Option[String],
  buttonLink: Option[URI],
  iconLink: Option[URI],
  createdAt: Option[Instant],
  readAt: Option[Instant]
)

After

final case class InAppNotification(
  notificationId: Option[NotificationId],
  createdAt: Option[Instant],
  readAt: Option[Instant],
  displayProperties: List[DisplayProperties]
)

sealed trait DisplayProperties

final case class Simple(
  title: String,
  body: String,
  buttonLabel: Option[String],
  buttonLink: Option[URI],
  iconLink: Option[URI]
) extends DisplayProperties

final case class FullScreen(
  title: String,
  body: String,
  buttonLabel: Option[String],
  buttonLink: Option[URI],
  asset: Asset
) extends DisplayProperties

Of the two display properties we currently support, Simple is interpreted by the apps as the existing behaviour: the notification appears on the Portfolio screen and in the Notifications History. A FullScreen InApp notification is displayed covering the whole screen when the user opens the app. A notification configured with both Simple and FullScreen display properties exhibits the behaviours of both.

Et voila, Full Screen Notification for the launch of Thematics in the app!