BFF: How to scale and avoid pitfalls?

Part 3: A Design Pattern with Challenges and Best Practices

Raphaël Tahar
Decathlon Digital
6 min read · Oct 18, 2023


DALL-E interpretation of “Scaling according to Klimt” (scaling means becoming small again, so a representation of multiple small dots or systems makes sense I guess 🤷‍♂️)

📖 Series table of contents

This series explores the Backend For Frontend design pattern in 4 different dimensions captured in 4 posts.

  1. Part 1: A Design Pattern Helping Teams Gain Ownership
  2. Part 2: What technical benefits?
  3. 👉 Part 3: How to scale and avoid pitfalls?
  4. Part 4: Alternatives & decision tree

BFF Challenges 💪

The Backend For Frontend pattern also comes with challenges (especially if your product is traffic-intensive).

Let’s explore the first design challenge you might face when implementing a BFF: fault tolerance must be built in.

01 Fault Tolerance through Error Handling

Back to our reference use case.
Imagine that your BFF must aggregate data coming from 3 APIs: User, Products, and Recommendations. The ultimate goal is to display a list of products and recommendations for a given authenticated user.

The page expects a payload of the following type:

type ProductPageExpectedPayload = {
  user: User,
  products: Product[],
  recommendations: Recommendation[],
}
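
To make things concrete, here is a minimal sketch of what a naive aggregation route could look like, assuming an Express-style BFF written in TypeScript and hypothetical fetchUser, fetchProducts, and fetchRecommendations helpers wrapping the three upstream APIs (none of these names come from real services):

import express from "express";

// User, Product and Recommendation are the entity types used in the payload above.
// The fetch* helpers are hypothetical wrappers around the three upstream APIs.
declare function fetchUser(id: string): Promise<User>;
declare function fetchProducts(): Promise<Product[]>;
declare function fetchRecommendations(userId: string): Promise<Recommendation[]>;

const app = express();

// Naive aggregation: Promise.all rejects as soon as ANY upstream call fails.
app.get("/product-page/:userId", async (req, res) => {
  try {
    const [user, products, recommendations] = await Promise.all([
      fetchUser(req.params.userId),
      fetchProducts(),
      fetchRecommendations(req.params.userId),
    ]);
    res.json({ user, products, recommendations });
  } catch (error) {
    // A single failing service turns the whole page into an error.
    res.status(502).json({ error: "Upstream service failure" });
  }
});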

Now, what happens if the Recommendations microservice fails to respond? (Failures can have many origins, especially in infrastructures that treat servers as cattle rather than pets, where instances are disposable by design.)

If a basic error-handling implementation is used, the entire GET product-page call results in an error, leaving the client app no choice but to display a global error over the whole page, since no data is retrieved at all.

We can all agree that this is the definition of a poor user experience, and it will negatively impact your business.

To avoid this pitfall, you should implement your BFF routes to be fault-tolerant through graceful degradation techniques.

Fault Tolerance & Graceful Degradation

This means that when a call to a service fails, the BFF should respond to the client app with a partial payload embedding every piece of data it successfully retrieved from other backend services.

Following this guideline, the payload would look like this:

type ProductPagePayloadOnRecommendationFailure = {
  user: User,
  products: Product[],
  recommendations: [], // Graceful degradation: respond with an empty array
}

Note that a standard format should be agreed upon within your organization and applied consistently.

Here, the missing recommendations are serialized in JSON as an empty array [], but the field could also be omitted from the payload entirely or set to null. An empty array is a good default: it avoids many JS runtime errors and is a reasonable middle ground in terms of the number of characters sent over the network to describe an absent value.
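
Here is how the same hypothetical route could be made fault-tolerant, still assuming the Express-style sketch and helpers from above. Promise.allSettled lets each upstream call succeed or fail independently, so a failing Recommendations service degrades to an empty array instead of failing the whole page:

app.get("/product-page/:userId", async (req, res) => {
  const [user, products, recommendations] = await Promise.allSettled([
    fetchUser(req.params.userId),
    fetchProducts(),
    fetchRecommendations(req.params.userId),
  ]);

  // The user is mandatory for this page, so its failure still fails the route.
  if (user.status === "rejected") {
    return res.status(502).json({ error: "User service unavailable" });
  }

  res.json({
    user: user.value,
    // Graceful degradation: missing data becomes an empty array.
    products: products.status === "fulfilled" ? products.value : [],
    recommendations: recommendations.status === "fulfilled" ? recommendations.value : [],
  });
});

Which upstream calls are mandatory (here, User) and which can degrade is a product decision as much as a technical one.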

02 Resource capping

For the most traffic-intensive and critical applications, BFFs can also implement fault tolerance through Circuit Breaker, Caching, and Bulkhead mechanisms.

These patterns share a common objective: limiting the blast radius when a service fails.

  1. Circuit Breakers avoid sending traffic to a failing service, giving it room to recover. They also reduce the time the BFF spends waiting for unresponsive services (a minimal sketch follows this list).
  2. Caching lets BFFs respond to clients with stale data rather than with a gracefully degraded payload (provided the data is available and the functional use case allows it). As Phil Karlton famously noted, cache invalidation is one of the two hardest things in computer science (along with naming things), so take this path only if you have to.
  3. Bulkhead helps with thread pool exhaustion issues. Back to our example: if three microservices are aggregated by a BFF, a Bulkhead strategy splits the thread pool and assigns a dedicated subset to each service.
    This way, a faulty service can’t starve the resources needed to call healthy services (at the cost of a lower maximum throughput).
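
To illustrate the first of these mechanisms, here is a minimal, hand-rolled circuit breaker sketch (illustrative only; in production you would more likely reach for a battle-tested library). After a configurable number of consecutive failures, the circuit opens and calls fail fast to a fallback during a cool-down period, giving the upstream service room to recover:

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly maxFailures = 5,
    private readonly resetAfterMs = 30_000,
  ) {}

  async call<T>(upstream: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.openedAt !== null) {
      if (Date.now() - this.openedAt < this.resetAfterMs) {
        return fallback(); // Circuit is open: fail fast, skip the upstream call.
      }
      this.openedAt = null; // Cool-down elapsed: half-open, allow a trial call.
    }
    try {
      const result = await upstream();
      this.failures = 0; // Success closes the circuit again.
      return result;
    } catch (error) {
      this.failures += 1;
      if (this.failures >= this.maxFailures) {
        this.openedAt = Date.now(); // Too many failures: open the circuit.
      }
      return fallback();
    }
  }
}

// Created once at module scope so its state persists across requests.
const recommendationsBreaker = new CircuitBreaker();

// Usage inside the route handler from the previous sketches:
// const recommendations = await recommendationsBreaker.call(
//   () => fetchRecommendations(req.params.userId),
//   () => [] as Recommendation[],
// );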

03 Domain’s business logic ownership & API design

As we previously stated, creating a BFF as an intermediate layer between client applications and backend services will bring flexibility by limiting the need for teams to synchronize.

This is a good thing for a time, as it enhances teams’ productivity by avoiding roadmap collisions. But after a while, it can hide misplaced or duplicated business logic (especially with the “one BFF per experience” rule). If multiple BFFs reproduce the same data-fetching and processing features, the overall organization and global team efficiency will suffer, since each team will spend time coding, maintaining, and monitoring its own version of the same functionality. And no one would know about it, as the code bases would live in parallel.

Teams’ awareness and business logic ownership should be regularly discussed. This is the primary way to mitigate this pitfall.

There are several ways to monitor this:

  • Set monthly or quarterly rituals (sync or async) to review new projects’ scopes.
  • Create an architecture committee whose objectives are to help teams capture Architecture Decisions (ADs) in Architecture Decision Records (ADRs), check whether newly created ADRs conflict with previous ones, and review the Architecture Decisions proposed by teams.

An Architectural Decision (AD) is a justified software design choice that addresses a functional or non-functional requirement that is architecturally significant. An Architecturally Significant Requirement (ASR) is a requirement that has a measurable effect on a software system’s architecture and quality.

Monitoring duplication is a good start, but what should be done once this over-duplicated business logic is discovered within your stack?

There are several choices:

  1. Either several backend teams absorb this business logic into their existing services.
  2. Or a new centralized service is created to consume the existing services and implement the business logic that BFFs used to run (a kind of functional service mesh, if you will). Some call this new layer an APEX, which stands for API Experience.

The deciding factor is the nature of the business logic: it should be owned by the team whose domain is the closest match. If it falls right in the middle of several teams’ domains, pick option 2; otherwise, select option 1.

Ultimately, it’s about transferring that business logic ownership from the multiple frontend teams to, ideally, a single backend team (although, depending on your context, it could be preferable to split it across several backend teams).

Best Practices ✅

To finish this BFF walk-through, here is a short digest of best practices and tips:

  1. Create a BFF per experience.
    The first objective of BFFs is to provide tailor-made data for a specific frontend application, no more, no less. Browser, iOS & Android are all considered different experiences.
  2. BFF frontend ownership first.
    The second objective of BFFs is to enforce team autonomy. So, the best move is to attach its ownership to the frontend team consuming it.
  3. Built-in fault tolerance.
    Avoid poor user experiences by baking fault tolerance into your BFF’s error-handling layer. If your use case requires it, dig into the high-traffic solutions (circuit breakers, caching, bulkheads) to ease backend pressure and help self-healing infrastructure mechanisms.
  4. Hoist every system and sub-system error to the same semantic level.
    BFFs sit at the crossroads between several sub-systems, each potentially implementing its own error format, and a frontend application that delivers a unified experience to end users. For that experience to be uniform, BFFs must aggregate and align these disparate resource providers, which may expose errors at different semantic levels (BFFs are also a way to implement Anti-Corruption Layers). A minimal sketch of such error normalization follows this list.
  5. Prefer using the same language for your BFF and client applications.
    Frontend engineers will split their time between the client and BFF applications. Building these two apps in two different languages would create needless additional cognitive load for your teams, not to mention the difficulty of staffing polyglot engineers.
  6. Use a Monorepo to host both your BFF & frontend applications.
    New features will, most of the time, include changes in both applications. Colocating them within a single repository can greatly help (full-stack auto-completion, a single CI, atomic PRs, end-to-end type safety).
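
To illustrate point 4, here is a minimal sketch of what such error hoisting could look like. The upstream error shapes below are purely hypothetical; the point is simply that whatever format each provider uses, the BFF maps it to one normalized shape before it reaches the client:

type NormalizedError = {
  code: string;      // stable, frontend-friendly error code
  message: string;   // safe to surface to the client application
  source: "user" | "products" | "recommendations";
};

// Hypothetical upstream error shapes, for illustration only.
type UpstreamError =
  | { errorCode: number; detail: string }              // e.g. a legacy Products API
  | { error: { type: string; description: string } };  // e.g. the Recommendations API

function normalizeError(source: NormalizedError["source"], err: UpstreamError): NormalizedError {
  if ("errorCode" in err) {
    return { code: `${source}.${err.errorCode}`, message: err.detail, source };
  }
  return { code: `${source}.${err.error.type}`, message: err.error.description, source };
}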

Conclusion

To conclude, BFF is a design pattern that brings team autonomy through the decoupling of architecture systems.

On the technical side, BFFs also reduce network and CPU usage and lower frontend applications’ memory footprints. They can provide the flexibility required to scale, but must be used with their pitfalls, and the matching mitigation strategies, in mind.

BFF might not be a good fit for every organization or every stage of team scale. So, in this series’ last post, we’ll walk through BFF alternatives and highlight a solution for each level of team scale.

Thanks for reading! 🙏🏼
👏🏻👏🏻👏🏻 Give a few claps and “follow” if you enjoyed this series.

💌 Follow our latest posts on Twitter and LinkedIn and discover our latest stories on Medium 🚀

Acknowledgments
And a big thank you to Jérome Molière, Laurent Thiebault, Alexandre Faria, and Ramzi ACHOURI for their thorough reviews and feedback. Thanks guys!
