Design a Social Feed | System Design Case Study

Step 1: Clarify Requirements

Functional Requirements

A user can publish a post (text, image, video).
A user sees a feed of posts from accounts they follow, ordered by relevance or recency.
Users can like, comment on, and share posts.
The feed updates in near-real-time when new posts are published.
Support for both a "Following" feed (chronological from followed accounts) and an algorithmic "For You" feed.

Non-Functional Requirements

Feed load time under 500ms (p99).
Support 500 million monthly active users (MAU) and 200 million daily active users (DAU).
Highly available: feed must always load, even if slightly stale.
Eventually consistent: a post may take a few seconds to appear in all followers' feeds.

Step 2: Back-of-Envelope Estimates

Estimation
DAU: 200 million
Posts per user per day: 0.5 (many users only read)
New posts/day: 200M * 0.5 = 100 million
Posts/second: 100M / 86,400 = ~1,150 posts/sec

Feed reads per user per day: 10
Feed reads/second: 200M * 10 / 86,400 = ~23,000 reads/sec

Average followers per user: 200
A post from a user with 200 followers = 200 fan-out writes

Celebrity problem:
  A user with 10M followers = 10M fan-out writes for ONE post
  This is the core scaling challenge
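The arithmetic above can be reproduced in a few lines. This is a rough sketch using only the figures from the estimate; the last value (fan-out writes per second if every post were pushed to an average of 200 followers) is derived here and is not a number stated in the estimate itself.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the estimate above.
dau = 200_000_000
posts_per_user_per_day = 0.5
feed_reads_per_user_per_day = 10
avg_followers = 200

posts_per_day = dau * posts_per_user_per_day
posts_per_sec = posts_per_day / SECONDS_PER_DAY
feed_reads_per_sec = dau * feed_reads_per_user_per_day / SECONDS_PER_DAY

# Derived assumption: pure fan-out on write with the average follower count.
fanout_writes_per_sec = posts_per_sec * avg_followers

print(f"posts/sec:          {posts_per_sec:,.0f}")
print(f"feed reads/sec:     {feed_reads_per_sec:,.0f}")
print(f"fan-out writes/sec: {fanout_writes_per_sec:,.0f}")
```

Averages hide the skew: a single post from a 10M-follower account adds 10M writes on top of this baseline, which is why the celebrity case is treated separately.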

Step 3: High-Level Design

Step 4: Deep Dive

The Fan-Out Problem

When a user publishes a post, it must appear in the feeds of all their followers. This is called fan-out. There are two strategies:

Fan-Out on Write (Push Model)

When a post is created, immediately write it to the feed cache of every follower.
Pros: Feed reads are fast; just read the pre-built feed from cache.
Cons: Write amplification is enormous for accounts with millions of followers. A celebrity posting triggers millions of writes.
Best for: Users with a manageable number of followers (under ~10K).

Fan-Out on Read (Pull Model)

Feed is built at read time by querying posts from all accounts the user follows.
Pros: No write amplification. A post is stored once.
Cons: Slow reads; must query and merge posts from hundreds of followed accounts on every feed load.
Best for: Celebrity accounts with millions of followers.

Hybrid Approach (What Real Systems Use)

Real-world social media platforms use a hybrid model:

Regular users (under ~10K followers): Fan-out on write. Pre-build each follower's feed cache.
Celebrity / high-follower users (over ~10K followers): Fan-out on read. Their posts are NOT pushed to every follower's cache.
When a user reads their feed, the feed service merges the pre-built cache (from regular users they follow) with a real-time query for posts from celebrity accounts they follow.
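The hybrid flow can be sketched with in-memory dictionaries standing in for the feed cache, post store, and social graph. The class and method names (HybridFeed, publish, read_feed) are hypothetical, and a real system would do the fan-out asynchronously rather than inline.

```python
from collections import defaultdict


class HybridFeed:
    """Toy model of hybrid fan-out; all storage is in-memory."""

    def __init__(self, celebrity_threshold=10_000):
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)        # author_id -> follower ids
        self.following = defaultdict(set)        # user_id -> followed author ids
        self.posts_by_author = defaultdict(list) # author_id -> [(ts, post_id)]
        self.feed_cache = defaultdict(list)      # user_id -> [(ts, post_id)]

    def follow(self, user_id, author_id):
        self.followers[author_id].add(user_id)
        self.following[user_id].add(author_id)

    def is_celebrity(self, author_id):
        return len(self.followers[author_id]) > self.celebrity_threshold

    def publish(self, author_id, post_id, timestamp):
        # The post itself is always stored once.
        self.posts_by_author[author_id].append((timestamp, post_id))
        if self.is_celebrity(author_id):
            return  # pull model: followers fetch these posts at read time
        # Push model: write the post into every follower's pre-built feed.
        for follower_id in self.followers[author_id]:
            self.feed_cache[follower_id].append((timestamp, post_id))

    def read_feed(self, user_id, limit=20):
        # Start from the pre-built cache, then merge in celebrity posts.
        # A set removes duplicates if an author crossed the threshold
        # after some of their posts had already been pushed.
        entries = set(self.feed_cache[user_id])
        for author_id in self.following[user_id]:
            if self.is_celebrity(author_id):
                entries.update(self.posts_by_author[author_id])
        newest_first = sorted(entries, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]
```

The read path pays one extra lookup per followed celebrity, which stays small because most users follow only a handful of such accounts.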

Feed Cache Design

Redis Structure
Key: feed:{user_id}
Value: Sorted Set of post_ids scored by timestamp

ZADD feed:user123 1707400000 post_abc
ZADD feed:user123 1707400100 post_def

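# Read the latest 20 posts (newest first):
ZREVRANGE feed:user123 0 19
# ZREVRANGE is deprecated since Redis 6.2; the equivalent is:
# ZRANGE feed:user123 0 19 REV

# Only cache the latest ~500 posts per user, trimming after each write:
ZREMRANGEBYRANK feed:user123 0 -501
# Older posts are fetched from the database on pagination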
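To make the capping behavior concrete, here is a small in-memory stand-in for one feed:{user_id} sorted set. It is a simplified sketch, not a Redis client: CappedFeed and its methods are hypothetical names, and unlike a real sorted set it treats the (timestamp, post_id) pair as the unique key. In Redis the trimming step is commonly done with ZREMRANGEBYRANK.

```python
import bisect


class CappedFeed:
    """Keeps only the newest max_size entries, ordered by timestamp."""

    def __init__(self, max_size=500):
        self.max_size = max_size
        self._entries = []  # ascending list of (timestamp, post_id)

    def add(self, timestamp, post_id):
        entry = (timestamp, post_id)
        i = bisect.bisect_left(self._entries, entry)
        if i < len(self._entries) and self._entries[i] == entry:
            return  # already present, nothing to do
        self._entries.insert(i, entry)
        overflow = len(self._entries) - self.max_size
        if overflow > 0:
            del self._entries[:overflow]  # drop the oldest entries

    def latest(self, count=20, offset=0):
        # Newest first, with offset/count for simple pagination.
        newest_first = self._entries[::-1]
        return [pid for _, pid in newest_first[offset:offset + count]]


feed = CappedFeed(max_size=3)
for ts in range(1, 6):
    feed.add(ts, f"post_{ts}")
print(feed.latest(10))  # only the three newest posts remain
```

Once a reader pages past the cached window, the request falls through to the database, as the comments above describe.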

Ranking / Algorithmic Feed

A chronological feed simply sorts by time. An algorithmic "For You" feed ranks posts by predicted engagement:

Candidate generation: Retrieve a pool of candidate posts (from followed accounts, friend-of-friend, trending, etc.).
Feature extraction: For each post, extract features: post age, author engagement rate, media type, user's historical interactions with the author, text/topic similarity to user interests.
Scoring: A machine learning model (often a deep neural network) predicts the probability of the user engaging with each post (like, comment, share, dwell time).
Re-ranking: Apply business rules: diversity (avoid showing 5 posts from the same author), freshness boost, demotion of low-quality content, policy filters.
Final feed: Return the top-N ranked posts to the client.
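The scoring, re-ranking, and top-N stages can be sketched as follows. The score function is a hand-weighted stand-in for the ML model, with weights chosen only for illustration; the Candidate fields and the max_per_author rule are likewise assumptions, covering just a few of the features and business rules listed above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    post_id: str
    author_id: str
    age_hours: float
    author_engagement_rate: float  # 0..1
    user_author_affinity: float    # 0..1, from past interactions


def score(c):
    """Illustrative stand-in for a model's predicted engagement."""
    freshness = 1.0 / (1.0 + c.age_hours)
    return (0.5 * c.user_author_affinity
            + 0.3 * c.author_engagement_rate
            + 0.2 * freshness)


def rank_feed(candidates, top_n=10, max_per_author=2):
    """Score, then re-rank with a simple per-author diversity cap."""
    ranked = sorted(candidates, key=score, reverse=True)
    per_author = {}
    feed = []
    for c in ranked:
        if per_author.get(c.author_id, 0) >= max_per_author:
            continue  # diversity rule: skip over-represented authors
        per_author[c.author_id] = per_author.get(c.author_id, 0) + 1
        feed.append(c.post_id)
        if len(feed) == top_n:
            break
    return feed
```

Keeping re-ranking separate from scoring means business rules can change without retraining the model.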

Post Storage

Schema
CREATE TABLE posts (
    post_id       BIGINT PRIMARY KEY,    -- Snowflake ID
    author_id     BIGINT NOT NULL,
    content_text  TEXT,
    media_urls    TEXT[],                 -- S3 URLs for images/videos (PostgreSQL array type)
    created_at    TIMESTAMP,
    like_count    BIGINT DEFAULT 0,
    comment_count BIGINT DEFAULT 0,
    share_count   BIGINT DEFAULT 0
);

CREATE TABLE social_graph (
    follower_id   BIGINT,
    followee_id   BIGINT,
    created_at    TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id)
);
CREATE INDEX idx_followee ON social_graph(followee_id);
-- "Who does user X follow?" -> query by follower_id (primary key prefix)
-- "Who follows user Y?" -> query by followee_id (idx_followee)
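The schema comment marks post_id as a Snowflake ID. Below is a minimal sketch of such a generator, assuming the commonly described 64-bit layout: a 41-bit millisecond timestamp, a 10-bit worker id, and a 12-bit per-millisecond sequence. The epoch value and the SnowflakeGenerator name are assumptions for illustration.

```python
import threading
import time

# Hypothetical custom epoch (milliseconds); any fixed past instant works.
EPOCH_MS = 1_700_000_000_000


class SnowflakeGenerator:
    """Generates roughly time-ordered 64-bit ids, unique per worker."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << 22)
                    | (self.worker_id << 12)
                    | self._sequence)
```

Because the timestamp occupies the high bits, sorting by post_id approximates sorting by creation time, which suits feed queries that want the newest posts first.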

Media Handling

Images and videos are uploaded to object storage (S3) and served via a CDN.
On upload, generate multiple resolutions (thumbnail, medium, full) asynchronously using a media processing pipeline.
Store only the media URLs in the posts table, not the media data itself.
Videos are transcoded into multiple bitrates for adaptive streaming (HLS/DASH).

Step 5: Scaling & Optimizations

Caching Layers

Feed cache (Redis): Pre-built feeds for each user. The primary read path.
Post cache: Cache hot posts by post_id. Avoids hitting the database for viral posts being viewed millions of times.
Social graph cache: Cache the follower lists for high-follower accounts to avoid graph queries during fan-out.
CDN: All images, videos, and static assets served from edge locations.

Database Sharding

Posts table: Shard by author_id. All posts by a user are on the same shard, enabling efficient "get all posts by user X" queries.
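Social graph: Shard by follower_id. "Who does user X follow?" is a single-shard query. The reverse lookup ("Who follows user Y?"), which fan-out on write needs, then spans all shards unless a second copy keyed by followee_id is maintained.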
Feed cache: Shard by user_id across Redis cluster nodes.
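A simple way to picture the routing is hash-based shard selection. This sketch assumes plain modulo hashing with hypothetical shard counts; production systems often use consistent hashing or a lookup directory instead, so that adding shards does not remap most keys.

```python
import hashlib

# Hypothetical shard counts, for illustration only.
NUM_POST_SHARDS = 64
NUM_GRAPH_SHARDS = 32


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def posts_shard(author_id):
    # All posts by one author land on the same shard.
    return shard_for(author_id, NUM_POST_SHARDS)


def graph_shard(follower_id):
    # All "who does X follow?" rows for one user land on the same shard.
    return shard_for(follower_id, NUM_GRAPH_SHARDS)


print(posts_shard(12345), graph_shard(12345))
```

A stable hash such as SHA-256 is used rather than Python's built-in hash(), whose string hashing is randomized per process.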

Handling Viral Posts

A viral post generates enormous read traffic (millions of views) and engagement writes (likes, comments).
Read path: Cache the post aggressively. CDN for media. Serve from replicas.
Write path: Buffer like and comment counts in Redis and batch-flush to the database periodically (e.g., every 5 seconds) rather than writing every individual like as a row update.
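The write-path buffering can be sketched as an in-memory counter that is drained in batches. Here a Counter stands in for Redis and flush_fn stands in for the batched database update; both, along with the LikeCounterBuffer name, are assumptions for illustration. In practice a timer or background worker would call flush on the chosen interval.

```python
from collections import Counter


class LikeCounterBuffer:
    """Accumulates per-post like deltas and flushes them in one batch."""

    def __init__(self, flush_fn):
        self._pending = Counter()
        self._flush_fn = flush_fn  # called with {post_id: delta}

    def record_like(self, post_id, delta=1):
        # Cheap in-memory increment instead of a row update per like.
        self._pending[post_id] += delta

    def flush(self):
        """Send all pending deltas downstream; return the number of posts flushed."""
        if not self._pending:
            return 0
        batch, self._pending = dict(self._pending), Counter()
        self._flush_fn(batch)
        return len(batch)


db_like_counts = Counter()  # stand-in for the posts table
buffer = LikeCounterBuffer(db_like_counts.update)
for _ in range(1000):
    buffer.record_like("post_abc")
buffer.flush()  # one batched write instead of 1,000 row updates
```

The trade-off is durability: deltas that have not been flushed are lost if the buffer disappears, which is usually acceptable for approximate counters.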

Architecture Summary

Component	Technology	Purpose
Post storage	PostgreSQL / MySQL (sharded)	Durable post data
Feed cache	Redis Sorted Sets	Pre-built per-user feeds
Social graph	Graph DB or sharded SQL	Follower/following relationships
Fan-out	Kafka + workers	Async fan-out on write
Media	S3 + CDN + transcoder	Image/video storage and delivery
Ranking	ML inference service	Algorithmic feed scoring
Notifications	Push service (APNs, FCM)	New post notifications
Search	Elasticsearch	Post and user search

Key Takeaways

The news feed problem is fundamentally a fan-out problem. Use a hybrid approach: push for regular users, pull for celebrities.
Pre-build feeds in Redis Sorted Sets for fast reads. Merge with real-time celebrity post queries at read time.
Ranking transforms a simple chronological feed into a personalized, engagement-maximizing experience using ML models.
Buffer engagement counters (likes, comments) in Redis and batch-flush to the database to handle viral content spikes.
Shard posts by author, social graph by follower, and feed cache by user for optimal query patterns.
