WebRTC Architectures: Advantages & Limitations

5 min read · Mar 14, 2024


WebRTC (Web Real-Time Communication) is an open-source project, originally released by Google and standardized by the W3C and IETF. It is supported by all major browsers, eliminating the need for third-party audio and video streaming plugins.

However, while WebRTC is easy to set up and works well for small-scale communication (1–4 participants), it gets complex at scale. WebRTC was designed to connect peers directly, without intermediary media servers, so it can’t handle large audiences on its own. Scaling an experience beyond a handful of devices requires backend infrastructure, and there are a few different approaches to implementing it.

Three main architectures are used in practice: Peer to Peer (Mesh), Selective Forwarding Unit (SFU), and Multipoint Control Unit (MCU). Most teams start with one of these but eventually adopt a hybrid approach, combining Mesh, SFU, and MCU depending on the use case and the number of participants on the stream.

In this post we’ll take a look at each of these WebRTC architectures, with a focus on their advantages and limitations.


Peer to Peer (Mesh)

Participants connect directly to each other, creating a web-like mesh network. Mesh infrastructures are ideal for: small group video calls (2–4 participants); P2P file-sharing applications; and low-latency gaming applications where a direct connection is crucial.


Advantages

  • Scalability: Works well for small groups, since media flows directly between peers and adds no media-server load.
  • Latency: Under the right network conditions there’s minimal latency due to direct communication between peers.
  • Privacy: Media does not pass through a central media server, potentially enhancing privacy.


Limitations

  • Scalability: Doesn’t scale well for large numbers of participants. Every participant holds a connection to every other participant, so n participants need n(n-1)/2 connections and each one uploads n-1 copies of its stream.
  • Security: Requires additional security measures to manage peer discovery and authentication.
  • Network Issues: Relies on good peer-to-peer connectivity, which can be affected by firewalls, NAT (Network Address Translation), or general network congestion.
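
To make the mesh scaling limit concrete, here is a minimal sketch that counts the peer connections a full mesh needs. The function name `mesh_connections` is made up for illustration, and the sketch assumes every participant connects to every other participant.

```python
def mesh_connections(participants: int) -> int:
    """Total peer connections in a full mesh: n * (n - 1) / 2."""
    if participants < 1:
        raise ValueError("participants must be at least 1")
    return participants * (participants - 1) // 2


# Show how quickly the connection count grows with group size.
for n in (2, 4, 8, 16):
    print(f"{n} participants -> {mesh_connections(n)} connections")
```

Going from 4 to 16 participants takes the mesh from 6 connections to 120, which is why Mesh is usually reserved for small groups.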

Selective Forwarding Unit (SFU)

Participants connect to a media server that decides which streams to forward between participants without modifying the content. SFU infrastructures are ideal for: Video conferencing for medium-sized groups (5–15 participants); Webinars where one-way communication (presenter to audience) is primary; Applications requiring efficient use of server resources.


Advantages

  • Scalability: The SFU acts as a central hub, so each participant uploads a single stream and the SFU forwards it to the others, making it scalable for larger groups.
  • Security: Centralized control allows for easier implementation of security measures.
  • Network Issues: Participants only need to reach the SFU, which avoids many of the firewall and NAT problems of direct peer-to-peer connections.


Limitations

  • Latency: Introduces latency due to the additional hop through the SFU.
  • Single Point of Failure: The SFU becomes a central point of failure, impacting overall system reliability.
  • Cost: Requires additional server resources for the SFU.
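
The forwarding decision at the heart of an SFU can be sketched as a routing table. This is a toy, in-memory model and not how any particular media server is implemented: `Packet`, `ToySfu`, `subscribe`, and `route` are hypothetical names, and real SFUs also handle transport, congestion control, and encryption, which are left out here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    sender: str     # participant who produced the media
    payload: bytes  # encoded media; the SFU never decodes or alters it


class ToySfu:
    """Hypothetical stand-in for an SFU's routing table."""

    def __init__(self) -> None:
        # receiver -> senders whose streams that receiver wants
        self.subscriptions: dict[str, set[str]] = defaultdict(set)

    def subscribe(self, receiver: str, sender: str) -> None:
        self.subscriptions[receiver].add(sender)

    def route(self, packet: Packet) -> dict[str, Packet]:
        """Map each subscribed receiver to the same, unmodified packet."""
        return {
            receiver: packet
            for receiver, senders in self.subscriptions.items()
            if packet.sender in senders and receiver != packet.sender
        }


sfu = ToySfu()
sfu.subscribe("bob", "alice")
sfu.subscribe("carol", "alice")
sfu.subscribe("alice", "bob")

delivered = sfu.route(Packet(sender="alice", payload=b"\x01\x02"))
print(sorted(delivered))  # receivers of alice's packet
```

Because the payload is passed through untouched, the server's work is mostly bookkeeping and network I/O, which is where the SFU's efficient use of server resources comes from.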

Multipoint Control Unit (MCU)

Participants connect to a media server that receives, processes (mixing, recording, layout), and distributes media streams among participants. MCU infrastructures are ideal for: Large conferences or webinars with many participants (15+); Applications requiring advanced features like stream recording, content mixing, or complex layouts; Scenarios with unreliable or varying network conditions on participant ends.


Advantages

  • Scalability: Scales to large conferences with many participants, since each participant sends one stream and receives a single mixed stream regardless of conference size.
  • Content Processing: MCUs can perform advanced processing like mixing audio/video streams, recording, and layout management.
  • Network Issues: MCUs can handle some network variations by adapting streams for different bandwidths.


Limitations

  • Latency: Higher latency compared to Mesh and SFU, because the MCU decodes, mixes, and re-encodes media centrally.
  • Cost: MCUs require significant server resources for processing, increasing cost.
  • Complexity: Setting up and managing MCUs can be complex.
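
As an illustration of the mixing step, here is a minimal sketch of audio mixing on already-decoded 16-bit PCM frames. It assumes the common "mix-minus" convention, where each participant hears everyone except themselves. The function name `mix_minus` and the sample values are made up, and a real MCU would also decode and re-encode the media, which this sketch skips.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_minus(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """Give each participant the sum of everyone else's samples, clipped to 16 bits."""
    if len({len(samples) for samples in frames.values()}) > 1:
        raise ValueError("all frames must have the same number of samples")

    # Sum every participant's samples once, position by position.
    total = [sum(column) for column in zip(*frames.values())]

    mixed = {}
    for name, samples in frames.items():
        # Subtract the listener's own audio, then clip to the int16 range.
        mixed[name] = [
            max(INT16_MIN, min(INT16_MAX, t - own))
            for t, own in zip(total, samples)
        ]
    return mixed


frames = {
    "alice": [1000, -2000],
    "bob": [3000, 4000],
    "carol": [30000, -500],
}
print(mix_minus(frames))
```

Each participant gets back exactly one frame no matter how many people are speaking, but the server touches every sample of every stream, which is the processing cost listed above.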


Hybrid

Combines elements of the other architectures to create a solution that adapts to each session, for example Mesh for small groups and MCU for large conferences. Hybrid architectures are generally used for sessions with varying numbers of participants, and for scenarios requiring both scalability and advanced features like recording, layout management, or stream mixing.


Advantages

  • Flexibility: Combines the strengths of different architectures (e.g., Mesh for small groups, MCU for large conferences).
  • Scalability: Adapts to varying user numbers.
  • Cost-Effectiveness: Can be more cost-effective by using SFU/MCU only when needed.


Limitations

  • Complexity: Requires more complex design and management compared to single-type architectures.
  • Latency: Latency can vary depending on which architecture is used for specific connections.
  • Troubleshooting: Identifying issues can be more challenging due to a mix of architectures.
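
A hybrid backend needs some rule for picking an architecture per session. The sketch below uses the participant ranges from this post as thresholds. The names `choose_architecture` and `needs_server_processing` are hypothetical, and treating exactly 15 participants as an SFU session is an assumption, since the ranges above overlap at that point.

```python
from enum import Enum


class Architecture(Enum):
    MESH = "mesh"
    SFU = "sfu"
    MCU = "mcu"


MESH_MAX = 4  # Mesh: 2-4 participants
SFU_MAX = 15  # SFU: 5-15 participants (assumption: 15 itself stays on the SFU)


def choose_architecture(
    participants: int, needs_server_processing: bool = False
) -> Architecture:
    """Pick an architecture from the session size and feature needs."""
    if participants < 2:
        raise ValueError("a session needs at least 2 participants")
    if needs_server_processing:
        # Recording, mixing, or composed layouts call for an MCU.
        return Architecture.MCU
    if participants <= MESH_MAX:
        return Architecture.MESH
    if participants <= SFU_MAX:
        return Architecture.SFU
    return Architecture.MCU


for size in (3, 10, 40):
    print(size, choose_architecture(size).value)
```

In practice the thresholds would be tuned per product, and a session may need to switch architectures as participants join or leave, which is part of the added complexity noted above.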

Making the Choice

Choosing the right WebRTC architecture depends on your application’s specific needs. Consider factors like the number of participants, latency requirements, security concerns, and cost before making a decision. With WebRTC there’s no one-size-fits-all solution.
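
One way to weigh these factors is to estimate what each architecture asks of a single participant's connection. The sketch below counts upload and download streams per client. The 1,500 kbps bitrate is an assumed figure for illustration only, the function name `streams_per_client` is made up, and the SFU download count is an upper bound, since an SFU may forward fewer streams.

```python
def streams_per_client(architecture: str, participants: int) -> tuple[int, int]:
    """Return (upload_streams, download_streams) for one participant."""
    if participants < 2:
        raise ValueError("a session needs at least 2 participants")
    others = participants - 1
    if architecture == "mesh":
        return others, others  # one copy out and one stream in per remote peer
    if architecture == "sfu":
        return 1, others       # one upload; at most one forwarded stream per peer
    if architecture == "mcu":
        return 1, 1            # one upload; one mixed stream back
    raise ValueError(f"unknown architecture: {architecture!r}")


STREAM_KBPS = 1_500  # assumed bitrate of one video stream, for illustration only

for name in ("mesh", "sfu", "mcu"):
    up, down = streams_per_client(name, participants=10)
    print(f"{name}: up {up * STREAM_KBPS} kbps, down {down * STREAM_KBPS} kbps")
```

The client-side savings of SFU and MCU are paid for with server resources, so this estimate is only one input alongside latency, security, and cost.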

Here’s where a Communication Platform as a Service (CPaaS) provider like Agora becomes invaluable. Agora offers scalable solutions that remove the complexities and allow teams to focus on their core business.

CPaaS providers deliver value beyond infrastructure: they enable developers and teams to roll out new features faster. Providers like Agora also offer security and privacy compliance, and can provide expert support backed by industry experience.

As you explore live streaming solutions, I invite you to explore Agora’s suite of real-time APIs and SDKs. Take advantage of real-time solutions that will set your business on a path to success, with reliability and excellence.




Director of DevRel @ Agora.io … former CTO @ webXR.tools & AR Engineer @ Blippar — If you can close your eyes & picture it, I can find a way to build it