Design a URL Shortener | System Design Case Study

Step 1: Clarify Requirements

Functional Requirements

Given a long URL, generate a unique short URL.
Given a short URL, redirect the user to the original long URL.
Short links expire after a configurable TTL (optional).
Users can optionally choose a custom alias.
Analytics: track click count, referrers, geographic data (optional).

Non-Functional Requirements

Very low latency for redirects (under 50ms).
Highly available: a broken short link is a broken experience.
The system is read-heavy. Read:write ratio of ~100:1 or higher.
Short URLs must be unique: no collisions.

Step 2: Back-of-Envelope Estimates

EstimationWrite: 100 million new URLs / month
     = ~40 URLs/second

Read:  100:1 read-to-write ratio
     = ~4,000 redirects/second

Storage (5 years):
     100M * 12 * 5 = 6 billion records
     Average record: ~500 bytes (short code + long URL + metadata)
     6B * 500B = 3 TB

Short code length:
     Base62 (a-z, A-Z, 0-9) with 7 characters
     = 62^7 = ~3.5 trillion unique codes
     More than enough for 6 billion records

Step 3: High-Level Design

Step 4: Deep Dive

Short Code Generation

Three main approaches:

Hash + Truncate

Hash the long URL (MD5/SHA-256), take the first 7 characters (base62-encoded).
Collision risk: must check the database and retry with different salt on collision.
Same URL always generates the same code (deterministic).

Counter / Snowflake ID

Use an auto-incrementing counter or a distributed ID generator (Snowflake).
Convert the numeric ID to base62 to get the short code.
No collision risk: every ID is unique by definition.
Requires a distributed counter service for multi-node setups.

Database Schema

SchemaTABLE url_mappings (
    short_code   VARCHAR(7)  PRIMARY KEY,   -- base62 code
    long_url     TEXT        NOT NULL,
    user_id      BIGINT,                    -- creator (nullable)
    created_at   TIMESTAMP   DEFAULT NOW(),
    expires_at   TIMESTAMP,                 -- optional TTL
    click_count  BIGINT      DEFAULT 0
);

INDEX idx_long_url ON url_mappings(long_url);
-- For deduplication: check if long_url already exists

Database Choice

Both SQL and NoSQL work well here:

SQL (PostgreSQL, MySQL): Simple key-value lookup by short_code. ACID guarantees for uniqueness. Partition by short_code range for horizontal scaling.
NoSQL (DynamoDB, Cassandra): Excellent for key-value access patterns. Built-in horizontal scaling. DynamoDB with short_code as the partition key gives single-digit millisecond reads.

Read Path (Redirect)

User visits short.ly/abc123.

Server extracts the short code abc123.

Check the cache (Redis). On cache hit, return 301/302 redirect immediately.

On cache miss, query the database. Store the result in cache for future hits.

Return HTTP 301 (permanent) or 302 (temporary) redirect to the long URL.

Asynchronously emit a click event to the analytics pipeline.

301 vs 302 Redirect

301 (Permanent) tells the browser to cache the redirect: subsequent visits skip the server entirely. Good for performance, bad for analytics (you lose visibility). 302 (Temporary) forces the browser to check the server every time: slightly slower but captures every click for analytics.

Caching Strategy

With a 100:1 read-to-write ratio, caching is critical. A Redis cluster caching the top 20% of URLs (by popularity) can serve ~80%+ of all redirects from memory.

Cache key: short_code
Cache value: long_url
Eviction: LRU with a TTL of 24-48 hours.
Write-through: On URL creation, immediately write to cache.

Step 5: Scaling & Bottlenecks

Database sharding: Shard by the first character of the short_code (range-based) or hash the short_code. 26 shards (a-z) or use consistent hashing for more even distribution.
Geographic distribution: Deploy read replicas and cache servers in multiple regions. Redirects are latency-sensitive.
Rate limiting: Limit URL creation per user/IP to prevent abuse.
Analytics pipeline: Use Kafka to buffer click events asynchronously. Write to a columnar store (ClickHouse, BigQuery) for analytics queries.
Expiration: Run a background job to delete expired URLs. Mark as expired in cache to avoid serving stale redirects.

Key Takeaways

A URL shortener is a classic read-heavy system: optimize the read path with aggressive caching.
Base62 encoding of a unique counter is the simplest collision-free approach for generating short codes.
The system requires only a simple key-value lookup, making it well-suited for both SQL and NoSQL databases.
Use 301 redirects for performance or 302 redirects for analytics, depending on your priorities.
Scale reads with cache and replicas; scale writes with sharding and distributed ID generation.

Case Study: Design a URL Shortener