Designing Facebook NewsFeed

Architect Within
Jan 16 · 4 min read

Introduction

News Feed is a feature of the social network Facebook. The web feed is the primary system through which users are exposed to content posted on the network. News Feed highlights information that includes profile changes, upcoming events, and birthdays, among other updates. (Source Wiki)

Functional Requirements

  • Generated from People Page and Groups a user follows
  • Users should be able to see live posts
  • Users should be able to scroll down
  • User should be able to like and comment
  • Need Feed should display text, video , links and images

Non Functional Requirement

  • High Available
  • Low Latency
  • Eventual Consistency
  • Real time news feed in 2sec and 5 sec to fan out new posts

Capacity Estimation

  • 200 Mil Daily users each fatches minimum 5 times a day
  • This leads to around 1 B request per day to fecth news feed
  • 11, 500 Req per second
  • 500 Post in each users Memory at a time
  • 1 Post are Videos
  • 19 Post Images
  • 250 Post Links
  • 230 Post Text
  • Each text post average 1KB
  • Each Image 100KB
  • Each Video 10 KB (Stored as Link and then downloaded when you reach the video so it wont be stored in memory)
  • Total Storage 1.4 MB Per User ~1MB
  • 200 Mill Users total data = 200TB Data
  • 1 Server Can store upto 100GB then we will need 2000 Machines

High Level Design

FBCDN

  • Goal is to maximize cache hit ratio
  • Facebook Cache stores Photos and Videos on NAND Flash based SSD’s
  • SSDs have much higher capacity then DRAM and much higher read rate then Hard Drives
  • SSD suffer from write implications — A single logical write results in multiple physical writes to the device. Higher write amplification is undesirable because it decreases the throughput and the lifespan of the SSD.
  • Relative priority queue interface of RIPQ to implement a caching algorithm:
  • insert(x,p) on cache miss,
  • increase(x,p’) on cache hit,
  • delete-min() is called implicitly for cache eviction.

API

We can have SOAP or REST APIs for getting the field.

Get User Feed

Database

3 Main Actors :

User

Page

Post

User Can Follow User or Page

Both User and Page can post feed link, image , video or text

Each FeedItem will have ID of user who created it

Each FeedItem can optionally have PageID

GraphDB are generally suited for managing friend relationship and recommendations.

Cache

  • Cache is extremely important, user can cache multiple things
  • News Feed ID
  • Content — Popular Content in hot cache
  • Social Graph — Store user relationship data
  • Action — What action taken by user
  • Counters for like , dislike etc

Design

Feed Generation Steps

  1. Retrieve IDs of all users and entities that user follows.
  2. Retrieve latest, most popular and relevant posts for those ID.
  3. Rank these posts based on relevance.
  4. Store this feed in the cache and return top posts
  5. When User scrolls down new page gets loaded from server and old gets pushed out of Cache

We need mechanism to load new posts when user is online, we can do periodically say 5 min new loads

Detailed Component Design

  1. Celebrity problem — Slow for people with lots of connection.
  2. Generating entire time on load can be painfully slow
  3. Live updates result in feed update for all followers making it further slow
  4. For Live posts, server notifying or pushing can lead to heavy loads, best it pre generate the timeline

Solution — Offline generation of news feed. Dedicated servers that generates userfeeds and store them in memory. This way user feed is not compiled on load.

Feed generated based on Last Generated Feed.

Feed Generation

Fanout — Pushing Feed to all users

Push — Fanout on Write

Pull — Fanout on Load

Pull Model

  • Keep all the recent feed data in memory and can pull it from the server whenever they need it
  • Manually or on Request basis
  • Stale data till user issues pull
  • Most pull request result in empty response as there are not many updates resulting in waste of resources

Push Model

  • Once user publishes post it can be published immediately to all followers-Quick
  • Less Reads and also no need of pulling for each follower
  • Faster due to feed precomputed
  • Long Poll for receiving updates
  • HotKey Problem if user has lot of followers has to push to many
  • For users rarely logs precompute is waste of time

Hybrid Model

  • Pull for users with less followers
  • Post for Celebrity
  • Or Limit fanout to online friends

Other things to consider

Hotkey problem can be resolved with consistent hashing.

Vertical Scaling vs Horizontal Scaling

SQL vs NoSql

Master Slave Replication

Read Replicas

Consistency Models

Database Sharding

Metrics

QPS

Latency

building-systems

PHD in building Systems