Backend Architecture — Use Case: Live Streaming Platform

dflmnq
Jun 11, 2024


A year ago, I had the chance to work on one of the biggest challenges of my career. The idea was simple: starting from a hugely outdated version, design and build the backend for a live streaming platform handling tens of thousands of requests per second and millions of accounts created monthly.

About us

All the work described in this series was done by Bastien and me, so let us introduce ourselves quickly:

  • Vincent, backend developer for a bit less than 10 years now. My first experience was at a ride-hailing startup whose stack was built on Golang, Kafka, and gRPC. I fell in love with Golang instantly and then specialized in the challenges around real-time processing and large-scale applications.
  • Bastien "Teyz" Rigaud, fullstack developer who specialized in the backend a few years ago. He worked in the banking and real-estate domains but has always been passionate about live streaming. He joined the project at an early stage.

First step — what’s the need?

Let’s start with the main question — what are the requirements?

First, the platform should expose three APIs:

  • Private API: Used by our web, mobile, and TV applications, this one is the main entry point of the backend.
  • Public API: The platform should also be accessible to developers through a public API. These endpoints are protected by OAuth 2.0 authentication.
  • Admin API: A third API reserved for the admin dashboard, with role-based permissions and extra security.

Another important point is the data we receive from third parties:

  • Payment systems: The platform should be able to receive webhooks from multiple payment systems (Stripe, Apple, Google…) and process them.
  • Livestream system: The core component of a livestream platform is… the livestream system! They all work in a very similar way: each streamer has their own channel and URL, and all information is sent to us through webhooks (see the sketch just below).
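
To make this concrete, here is a minimal sketch of what such a webhook endpoint could look like in Go. The handler name, the X-Signature header, and the HMAC-SHA256 check are assumptions for illustration only; each provider (Stripe, Apple, Google, the livestream system…) defines its own signing scheme that the real handler has to follow.

```go
package main

import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
    "io"
    "log"
    "net/http"
)

// verifySignature is a simplified HMAC-SHA256 check; real providers
// each define their own signing scheme.
func verifySignature(payload []byte, signature, secret string) bool {
    mac := hmac.New(sha256.New, []byte(secret))
    mac.Write(payload)
    expected := hex.EncodeToString(mac.Sum(nil))
    return hmac.Equal([]byte(expected), []byte(signature))
}

// paymentWebhookHandler is a hypothetical entry point for payment webhooks.
func paymentWebhookHandler(secret string) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        payload, err := io.ReadAll(r.Body)
        if err != nil {
            http.Error(w, "cannot read body", http.StatusBadRequest)
            return
        }
        // "X-Signature" is a placeholder header name.
        if !verifySignature(payload, r.Header.Get("X-Signature"), secret) {
            http.Error(w, "invalid signature", http.StatusUnauthorized)
            return
        }
        // From here, the event would be handed over to the rest of the
        // backend (for example published to a topic) for processing.
        log.Printf("received payment webhook: %d bytes", len(payload))
        w.WriteHeader(http.StatusOK)
    }
}

func main() {
    http.HandleFunc("/webhooks/payments", paymentWebhookHandler("webhook-secret"))
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```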

Finally, there’s a whole notifications component to consider, with two main use cases:

  • Chat messages: The platform has a chat system, with one chatroom per channel. It should be able to handle thousands of messages per second.
  • Notifications: Whether it’s to refresh the viewers’ count of a live stream or to update information in real time, notifications should be reliable and able to handle huge peaks, for example when streamers start a new livestream.

Besides all these requirements, there are some technical ones to consider:

  • Versioning: We want to iterate fast, and smoothly. To ensure that we break nothing, the whole backend, from the endpoints to the data storage, should be versioned.
  • Scaling: The platform should be able to scale not only on the technical side but also on the team side. Indeed, the team is pretty small for now but is expected to grow, and fast. We want to make sure things are organized to keep that as simple as possible.

Ok, so what does it look like for now?

Big picture of the architecture based on the requirements.

Let’s now define the details of the architecture, component by component.

Second step — The HTTP API & Gateways

The first question was: REST API or GraphQL?

For the Public API, we had no other choice than a REST API. Considering that point and our willingness to keep all the APIs uniform, we decided to use REST for everything. As we’ll see later, this is also made simpler by the pkg folder and the way we define the responses.
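
The pkg folder and the exact response format are covered later in the series, but to give an idea, here is a hypothetical, minimal envelope that every REST endpoint could share. The package and field names below are assumptions for illustration, not the actual implementation.

```go
// Package response sketches a uniform envelope shared by the Private,
// Public, and Admin APIs. The real definitions live in the pkg folder.
package response

import (
    "encoding/json"
    "net/http"
)

// Envelope wraps every payload so clients always parse the same shape.
type Envelope struct {
    Data  any    `json:"data,omitempty"`
    Error *Error `json:"error,omitempty"`
}

// Error is the normalized error returned by any endpoint.
type Error struct {
    Code    string `json:"code"`
    Message string `json:"message"`
}

// WriteJSON serializes the envelope with the given HTTP status code.
func WriteJSON(w http.ResponseWriter, status int, env Envelope) {
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(status)
    _ = json.NewEncoder(w).Encode(env)
}
```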

What about the API Gateway?

About the API Gateway, as you can see in the screenshot above, we decided to use Kong API Gateway: it fills all our needs, from simplicity to reliability, plus the possibility to add plugins (authentication, authorization, rate limiting…).

Request processing after the Gateway.

Once a request has been handled by the API Gateway, it is forwarded to the backend. This part was challenging to define, and I used this article from Netflix to figure out exactly what I wanted to implement: Federated Gateways.

Federated Gateways

This architecture uses multiple gateways to handle the different requests, based on various criteria:

  • Is the request from the Public, Private, or Admin API?
  • Is the request for the live stream, chat, channel, or any other domain?

Each gateway is isolated from the other ones, connected to the API Gateway, and able to communicate with the services based on the business requirements.
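
As a rough illustration, here is what the user-facing side of a domain gateway could look like in Go: a single HTTP endpoint that composes data from several domain services before answering. The service interfaces, types, and route below are hypothetical; in our case the actual calls to the services go through gRPC clients, as described later in this article.

```go
// Package gateway sketches an endpoint of a hypothetical user-facing
// domain gateway that aggregates data from two domain services.
package gateway

import (
    "context"
    "encoding/json"
    "net/http"
)

// ChannelService and LivestreamService stand in for the gRPC clients
// generated from the service contracts.
type ChannelService interface {
    GetChannel(ctx context.Context, userID string) (Channel, error)
}

type LivestreamService interface {
    GetCurrentLive(ctx context.Context, channelID string) (*Livestream, error)
}

type Channel struct {
    ID   string `json:"id"`
    Name string `json:"name"`
}

type Livestream struct {
    ID      string `json:"id"`
    Title   string `json:"title"`
    Viewers int    `json:"viewers"`
}

// ChannelPage composes everything the "channel page" screen needs:
// the channel itself plus the live stream currently running, if any.
func ChannelPage(channels ChannelService, lives LivestreamService) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        userID := r.URL.Query().Get("user_id")

        channel, err := channels.GetChannel(r.Context(), userID)
        if err != nil {
            http.Error(w, "channel not found", http.StatusNotFound)
            return
        }
        live, err := lives.GetCurrentLive(r.Context(), channel.ID)
        if err != nil {
            http.Error(w, "internal error", http.StatusInternalServerError)
            return
        }

        w.Header().Set("Content-Type", "application/json")
        _ = json.NewEncoder(w).Encode(map[string]any{
            "channel":    channel,
            "livestream": live, // nil when the channel is offline
        })
    }
}
```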

Third step — From the gateways to the services

At this stage, we have defined how we want to handle the requests and have started to shape our Federated Gateways architecture. Those gateways hold the business logic of our platform but nothing in terms of data: that’s where the services come in.

Domain-based codebase

Just before going into the details of these services, we need to step back a bit and discuss the domains. To keep everything organized and simple, we created domains based on the product.

Here are the domains:

  • user (user) -> Our users, their settings, preferences…
  • channel (chnl) -> The channel is the streaming part associated with each user. This domain also contains some sub-entities such as social links, moderators, VIPs…
  • livestream (live) -> The livestream domain contains the live stream itself, but also the VODs, clips, thumbnails…
  • payment (paym) -> Payment manages everything from payment creation, renewal, and fraud to simply storing the subscription status of a given user for a channel…
  • chatroom (chat) -> Chatroom is used for the chat messages, history, identity, emotes…
  • platform (ptfm) -> This domain is almost entirely internal: it contains the components used globally across the platform, such as search, default settings, account analysis…

By creating these domains, we simplify the scaling of the teams and also improve the readability of the codebase, making it much easier to know where to look for any piece of data.

Standardized service naming

Applying the same logic as for the domains, the idea behind the service names was to apply a template that lets us navigate through the repositories very easily.

Services are all named using the following template:

{domain-shortcut}-{service usage}-{service type}

domain shortcut: Defined above; any of user, chat, chnl, live, paym, ptfm.

service usage: A simple name describing what the service does: store, history, modules, settings…

service type: This last part of the template defines which communication protocol the service uses. svc for synchronous communication, wkr for services working asynchronously, rnr for the crons, and gtw for the gateways.

Examples:

user-store-svc is the service storing all our users; it exposes a gRPC server.

paym-transfers-wkr is the service listening to asynchronous messages and generating transfers for the streamers.

user-gtw is the gateway processing all the requests for the user domain in the private API.

The idea is to apply simple rules from the beginning to ensure readability and facilitate future scaling with clear team domains.

Now that we have defined a lot of things, one last point needs to be resolved.

Inter-component communication

The synchronous communication

For the synchronous communication, we investigated two options: either a REST API or gRPC. Long story short, we decided to use gRPC.

The main reasons are pretty simple: gRPC gives us a single source of truth with multiple versions, and also a single place where we can write a client once and reuse it across the different services. Contracts are explicitly defined, versioning is easier, and schema evolution is safer.

Also, as we knew from the beginning that we would have to iterate fast, having everything defined and typed was super important for us.

Regarding the implementation itself and the library, we decided to use connectrpc.
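
To illustrate, here is a minimal sketch of what the service side could look like with connectrpc. The user/v1 contract and its generated code (userv1, userv1connect) are hypothetical placeholders for our real, versioned contracts, which are covered in the next article.

```go
// Sketch of the gRPC layer of a service such as user-store-svc.
package main

import (
    "context"
    "errors"
    "log"
    "net/http"

    "connectrpc.com/connect"
    "golang.org/x/net/http2"
    "golang.org/x/net/http2/h2c"

    // Hypothetical generated packages standing in for the real contracts.
    userv1 "example.com/contracts/gen/user/v1"
    "example.com/contracts/gen/user/v1/userv1connect"
)

// userStoreServer implements the hypothetical UserStoreService contract.
type userStoreServer struct{}

func (s *userStoreServer) GetUser(
    ctx context.Context,
    req *connect.Request[userv1.GetUserRequest],
) (*connect.Response[userv1.GetUserResponse], error) {
    // In the real service this would be a lookup in the data store.
    if req.Msg.GetId() == "" {
        return nil, connect.NewError(connect.CodeInvalidArgument, errors.New("missing user id"))
    }
    return connect.NewResponse(&userv1.GetUserResponse{
        User: &userv1.User{Id: req.Msg.GetId(), Username: "demo"},
    }), nil
}

func main() {
    mux := http.NewServeMux()
    // NewUserStoreServiceHandler would be generated by protoc-gen-connect-go.
    path, handler := userv1connect.NewUserStoreServiceHandler(&userStoreServer{})
    mux.Handle(path, handler)

    // h2c lets us serve HTTP/2 without TLS inside the cluster, so the same
    // handler can answer gRPC, gRPC-Web, and plain HTTP/JSON calls.
    log.Fatal(http.ListenAndServe(":8080", h2c.NewHandler(mux, &http2.Server{})))
}
```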

The asynchronous communication

Choosing an asynchronous communication system always seems a bit more complicated to me than choosing a synchronous one. From the complexity of the implementation to the maintenance cost, the features included, and the reliability… there were a lot of points to consider.

The main requirements were:

  • Multiple consumer groups for each topic
  • The possibility of high throughput (the chat messages use case)
  • The possibility to replay events in time
  • Low latency

For these reasons mainly, but also based on my previous experiences and our willingness to have this system be part of the core platform, we decided to use Kafka.
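
As an illustration, here is roughly what the chat messages flow could look like on top of Kafka in Go. The github.com/segmentio/kafka-go client, the chat.messages topic, the broker address, and the chat-history-wkr consumer group are assumptions for the example, not a description of the actual setup.

```go
package main

import (
    "context"
    "log"
    "time"

    "github.com/segmentio/kafka-go"
)

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    // Producer side, e.g. the chat service publishing a message.
    // Keying by channel ID keeps the messages of a chatroom ordered
    // within their partition.
    writer := &kafka.Writer{
        Addr:     kafka.TCP("kafka:9092"),
        Topic:    "chat.messages",
        Balancer: &kafka.Hash{},
    }
    defer writer.Close()

    if err := writer.WriteMessages(ctx, kafka.Message{
        Key:   []byte("channel-123"),
        Value: []byte(`{"user":"demo","text":"hello!"}`),
    }); err != nil {
        log.Fatalf("write: %v", err)
    }

    // Consumer side, e.g. a chat-history-wkr storing the messages.
    // Each worker has its own consumer group, so several domains can
    // consume the same topic independently and replay it if needed.
    reader := kafka.NewReader(kafka.ReaderConfig{
        Brokers: []string{"kafka:9092"},
        GroupID: "chat-history-wkr",
        Topic:   "chat.messages",
    })
    defer reader.Close()

    msg, err := reader.ReadMessage(ctx)
    if err != nil {
        log.Fatalf("read: %v", err)
    }
    log.Printf("channel=%s message=%s", msg.Key, msg.Value)
}
```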

The updated version of our architecture with the domains and internal communication layer

Here’s the global picture of what all these components look like, together.

Let’s recap — we have:

  • The API Gateway receives the HTTP requests and forwards them to our domain gateways.
  • The domain gateways hold the business logic, using the micro-services to store the data.
  • The domains, through their micro-services, store the data and hold the domain-specific logic.
  • The webhooks (payments & live stream) update the backend with data from our partners.
  • The notifications domain pushes real-time updates to the different clients.

All these components are talking to each other using gRPC and Kafka.

What’s next?

Now that we have defined the requirements and what our backend looks like, we need to go into the details:

  • How will we organize the contracts between the services?
  • Where will the shared code live?
  • What will a service look like in terms of code organization?

All these questions are exactly what we’ll answer in the next article!

This article is my very first one, so thanks for reading it! Any comments or feedback are more than welcome. Also, do not hesitate to ask any questions.
