Search Engine · Part 2 (Hard)

Search Engine 2 - Ranking, Autocomplete & Global Scale

Databases · Caching · Sharding · Analytics · Geo Distribution · Search

This challenge builds on Search Engine 1 - Web Crawler & Index. Complete it first for the best experience.

Problem Statement

FindIt has expanded from developer docs to a full-scale web search engine indexing 1 billion pages. The system must now handle:

- Link-based ranking (PageRank) - compute a global authority score for every page based on the web link graph. This is a massive offline computation that runs periodically over billions of nodes and edges.
- Autocomplete / query suggestions - as the user types, suggest completions from a trie of popular queries (updated hourly). Autocomplete must respond in < 50 ms.
- Personalized results - use search history and click-through data to re-rank results per user.
- Spell correction - handle misspelled queries ("javscript tutorial" → "javascript tutorial").
- Multi-region deployment - serve search results from the nearest data center, with index replicas in each region.
- Query throughput - handle 100,000 search queries per second at peak.
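To make the PageRank requirement concrete, here is a toy single-machine power-iteration sketch. The production job is assumed to run the same rank-update distributed (e.g. via MapReduce or Spark) over the full link graph; the graph, damping factor, and iteration count here are illustrative.

```python
# Toy PageRank via power iteration. Illustrative only: the real
# computation runs distributed over billions of nodes and edges.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Base probability of a random jump to any page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its mass evenly so rank is conserved.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

In this toy graph, `c` ends up with the highest score because it receives links from both `a` and `b`, which is exactly the "authority from inbound links" intuition the offline batch job computes at web scale.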

This is a capstone-level challenge combining information retrieval, graph algorithms, ML ranking, and planetary-scale infrastructure.

What You'll Learn

Scale to 1 B pages with PageRank, autocomplete, personalized results, and multi-region serving. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.


Constraints

- Indexed pages: ~1,000,000,000
- Index size: ~100 TB
- PageRank computation: runs daily (batch)
- Query throughput: ~100,000 QPS
- Search latency (P99): < 300 ms
- Autocomplete latency: < 50 ms
- Regions: 6
- Availability target: 99.99%

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Scale to 1 B pages with PageRank, autocomplete, personalized results, and multi-region serving.
  • Design for the stated peak of ~100,000 QPS, plus 2-3x burst headroom.
  • Indexed pages: ~1,000,000,000
  • Index size: ~100 TB
  • PageRank computation: Runs daily (batch)
  • Query throughput: ~100,000 QPS
  • Search latency (P99): < 300 ms

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.
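The capacity-planning steps above can be turned into rough numbers. The per-shard size, per-replica QPS, and latency split below are illustrative assumptions for the sketch, not figures from the spec; only the 100 TB index, 100,000 QPS peak, and 300 ms P99 target come from the constraints.

```python
# Back-of-envelope capacity math from the stated constraints.
# Assumed knobs: 500 GB per index shard, 2,000 QPS per serving
# replica, 3x safety margin (top of the 2-3x rule above).
import math

INDEX_GB = 100 * 1000        # ~100 TB index
PEAK_QPS = 100_000
SHARD_GB = 500               # assumed max index size per shard
QPS_PER_REPLICA = 2_000      # assumed sustainable QPS per replica
MARGIN = 3

shards = math.ceil(INDEX_GB / SHARD_GB)
# Document-partitioned index: every query fans out to all shards,
# so each shard must absorb the full peak QPS (with margin).
replicas_per_shard = math.ceil(PEAK_QPS * MARGIN / QPS_PER_REPLICA)
total_nodes = shards * replicas_per_shard   # split across 6 regions

# One defensible per-hop split of the 300 ms P99 budget.
budget_ms = {"edge + LB": 20, "parse + spellcheck": 30,
             "shard fan-out + merge": 180, "rank + personalize": 50,
             "serialize + respond": 20}
assert sum(budget_ms.values()) == 300
```

Under these assumptions the math lands on 200 shards and 150 replicas per shard (30,000 serving nodes globally, i.e. 5,000 per region), which is the kind of explicit number worth defending in review.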

3) Architecture Decisions

  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • Caching: Put cache on hot read paths first and pick cache-aside or write-through explicitly.
  • Sharding: Choose shard keys around access patterns and growth hotspots, not just data size.
  • Analytics: Maintain separate OLTP and analytics paths; stream events into a warehouse/time-series layer.
  • Geo Distribution: Route users to nearest region/edge while keeping write-consistency boundaries explicit.
  • Search: Use primary store for writes and async index updates for search relevance + scale.
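The cache-aside choice named above can be sketched as follows. This is a minimal in-process stand-in (a dict plays the role of Redis, any mapping plays the role of the system-of-record); the TTL and invalidation hook mirror the staleness-bounding point in the reliability section.

```python
import time

class CacheAside:
    """Cache-aside read path: check cache, fall back to the
    system-of-record on a miss, then populate the cache with a TTL."""

    def __init__(self, store, ttl_seconds=60):
        self.store = store            # system-of-record (dict-like here)
        self.ttl = ttl_seconds
        self._cache = {}              # key -> (value, expires_at)

    def get(self, key):
        hit = self._cache.get(key)
        if hit is not None and hit[1] > time.time():
            return hit[0]                             # cache hit
        value = self.store[key]                       # miss: read DB
        self._cache[key] = (value, time.time() + self.ttl)
        return value

    def invalidate(self, key):
        # Invalidation hook: call on writes to bound staleness.
        self._cache.pop(key, None)

db = {"q:python tutorial": ["result-1", "result-2"]}
cache = CacheAside(db, ttl_seconds=60)
first = cache.get("q:python tutorial")   # miss, populates cache
second = cache.get("q:python tutorial")  # hit
```

The explicit `invalidate` call is the "cache-aside picked explicitly" part: writes go to the store first, then evict, so the next read repopulates from the source of truth.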

4) Reliability and Failure Strategy

  • Use strong write constraints (transactions or conditional writes) and explicit backup/restore strategy.
  • Bound staleness with TTL + invalidation hooks for critical entities.
  • Support rebalancing and hotspot detection from day one.
  • Version event schemas and monitor drop/late-event rates.
  • Design region failover and data residency controls as first-class requirements.

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.
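The idempotency check above can be made concrete with a deduplicating consumer. This is a sketch under the assumption that every event carries a unique `event_id`; a production version would persist the seen-set durably (with a TTL) rather than in memory.

```python
class IdempotentConsumer:
    """Skip duplicate deliveries so retried events are applied once."""

    def __init__(self, apply_fn):
        self.apply_fn = apply_fn
        self.seen = set()   # production: durable store with TTL, not memory

    def handle(self, event):
        if event["event_id"] in self.seen:
            return False            # duplicate delivery: no-op
        self.apply_fn(event)        # apply side effect exactly once
        self.seen.add(event["event_id"])
        return True

clicks = []
consumer = IdempotentConsumer(lambda e: clicks.append(e["query"]))
event = {"event_id": "evt-1", "query": "javascript tutorial"}
consumer.handle(event)
consumer.handle(event)   # retried delivery: ignored
```

Recording the `event_id` only after `apply_fn` succeeds means a crash between the two leads to a retry, not a lost event, which is the at-least-once-plus-dedupe behavior the validation plan should exercise.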

6) Trade-offs to Call Out in Interviews

  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • Caching: Higher hit rate cuts latency/cost, but stale data and invalidation bugs become primary risks.
  • Sharding: Sharding improves horizontal scale but makes cross-shard queries and transactions harder.
  • Analytics: Analytics pipeline unlocks insights, but adds eventual consistency and governance overhead.
  • Geo Distribution: Global latency improves, but cross-region consistency and operations become harder.

Practical Notes

  • PageRank is computed offline using a distributed computation framework (MapReduce / Spark) over the link graph.
  • Autocomplete can use a prefix trie stored in-memory (or Redis) with the top-k completions pre-computed per prefix.
  • Partition the index by document so each query fans out to all shards in parallel, then merge-sort results.
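The autocomplete note above can be sketched as a trie that caches the top-k completions at every prefix node, so a suggestion lookup is a pure pointer walk with no subtree traversal. The queries and counts are made up for illustration; the hourly rebuild mentioned above would regenerate this structure from fresh query logs.

```python
class AutocompleteTrie:
    """Prefix trie with top-k completions pre-computed per node,
    so lookups stay well under the 50 ms budget."""

    def __init__(self, k=5):
        self.k = k
        self.root = {"children": {}, "top": []}  # top: [(count, query)]

    def insert(self, query, count):
        node = self.root
        for ch in query:
            node = node["children"].setdefault(
                ch, {"children": {}, "top": []})
            # Keep only the k most popular completions at this prefix.
            node["top"].append((count, query))
            node["top"].sort(reverse=True)
            del node["top"][self.k:]

    def suggest(self, prefix):
        node = self.root
        for ch in prefix:
            node = node["children"].get(ch)
            if node is None:
                return []            # unseen prefix: no suggestions
        return [q for _, q in node["top"]]

trie = AutocompleteTrie(k=5)
trie.insert("javascript tutorial", 100)
trie.insert("javascript", 90)
trie.insert("java", 80)
suggestions = trie.suggest("jav")
```

Because each node stores its own ranked list, the per-keystroke cost is O(prefix length), which is what makes the in-memory (or Redis-backed) variant viable at 100,000 QPS.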

Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Web Clients -> DNS -> Load Balancer -> Core Service -> Primary NoSQL DB -> Replica SQL DB -> Redis Cache -> Event Bus

Design strengths

  • Cache sits on the read path to absorb repeated queries and keep DB pressure stable.
  • Analytics pipeline is separated from OLTP path to avoid reporting workloads impacting transactions.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.