Databases in System Design
Databases are the durability and correctness backbone of distributed systems. Most architectural decisions eventually reduce to data modeling, consistency guarantees, and read-write scaling behavior.
What It Is
A database is the authoritative system that stores and serves application state with defined guarantees for durability, consistency, isolation, and query behavior. Relational, document, key-value, and specialized engines each optimize for different workloads. The right choice is less about hype and more about matching access patterns, correctness needs, and operational constraints.
When to Use It
Use relational databases when transactions, referential integrity, and complex joins are core requirements. Billing, inventory, and identity systems benefit from strong ACID guarantees.
Use document or key-value stores when access patterns are dominated by single-key lookups with flexible schemas. Session stores, user preferences, and content catalogs are common fits.
Use specialized engines (time-series, graph, search) only when a measurable query pattern cannot be served efficiently by a general-purpose store. Each additional engine adds operational cost.
Why Databases Matter
Product features become reliable only when their data model is reliable. Order systems, messaging timelines, billing workflows, and identity services all depend on predictable write semantics and query patterns. A mismatch between data model and workload creates chronic latency and incident risk.
Scaling pressure usually appears first in storage. Traffic can be distributed across stateless services, but write coordination, index maintenance, and transaction boundaries converge in the database layer. Early schema and indexing decisions strongly influence how gracefully the system grows.
Database architecture also determines recovery posture. Backup cadence, replication topology, and failover procedures define real-world RTO and RPO outcomes. Teams that treat these as secondary concerns often discover during incidents that theoretical durability was never operationalized.
Data quality also drives downstream analytics and machine learning outcomes. If operational schemas lack clear ownership, retention rules, and consistency guarantees, downstream reporting drifts from reality. Good database discipline in transactional paths improves decision quality across the entire organization.
Core Concepts and Mental Models
Model data around access paths, not abstract entities alone. If critical queries need low latency, ensure indexes and partition keys align with those exact filters and sort orders. Query-driven modeling reduces expensive rewrites later when traffic and feature complexity increase.
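To make this concrete, here is a minimal sketch of query-driven modeling using SQLite as a stand-in engine (the `orders` schema and index name are illustrative): the composite index is shaped to match the hot query's filter column first and its sort column second, and `EXPLAIN QUERY PLAN` confirms the engine can serve the query from that index.

```python
import sqlite3

# Hypothetical orders table: the hot query filters by user_id and sorts by
# created_at, so the composite index mirrors exactly that shape.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        created_at TEXT NOT NULL,
        total_cents INTEGER NOT NULL
    )
""")
# Filter column first, sort column second: the engine can both locate the
# user's rows and return them in order without a separate sort step.
conn.execute("CREATE INDEX idx_orders_user_created ON orders (user_id, created_at)")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT id, total_cents FROM orders
    WHERE user_id = ? ORDER BY created_at DESC LIMIT 20
""", (42,)).fetchall()
print(plan)  # the plan should report a search using idx_orders_user_created
```

The same habit applies to any engine: write the top queries first, then shape indexes and partition keys around them, not the other way around.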
Separate transactional boundaries from analytical workloads. OLTP systems need tight latency and correctness, while analytics can tolerate batch latency and denormalization. Isolating these patterns protects user-facing paths and gives data teams freedom to optimize heavy reporting separately.
Replication and partitioning solve different problems. Replication improves read scale and availability, while partitioning addresses write throughput and dataset size. Systems that mix these concerns without clear intent often end up with expensive topologies that solve neither bottleneck well.
Key Tradeoffs
| Decision | Upside | Downside | Guidance |
|---|---|---|---|
| SQL vs NoSQL | SQL provides transactions, joins, and referential integrity | SQL schemas are rigid and horizontal write scaling is harder | Choose based on query patterns and consistency needs, not popularity |
| Normalized vs denormalized schema | Normalization reduces write anomalies and duplication | Joins add latency on read-heavy paths | Normalize writes, denormalize read models when latency requires it |
| Single-region vs multi-region | Single-region is simpler and cheaper | A regional outage takes the system down, and remote users see higher read latency | Start single-region; add replication when RTO/RPO or latency targets demand it |
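The "normalize writes, denormalize read models" guidance can be sketched concretely. In this illustrative SQLite example (table names and schema are assumptions), the normalized tables stay authoritative, and a denormalized summary row is refreshed inside the same transaction so the hot read path is a single-row lookup with no join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
                         total_cents INTEGER NOT NULL);
    CREATE TABLE order_summaries (user_id INTEGER PRIMARY KEY,
                                  name TEXT NOT NULL,
                                  order_count INTEGER NOT NULL,
                                  lifetime_cents INTEGER NOT NULL);
""")

def place_order(user_id: int, total_cents: int) -> None:
    # One transaction: the normalized write and the read-model refresh
    # commit together, so the summary can never drift from the source rows.
    with conn:
        conn.execute("INSERT INTO orders (user_id, total_cents) VALUES (?, ?)",
                     (user_id, total_cents))
        conn.execute("""
            INSERT INTO order_summaries
            SELECT u.id, u.name, COUNT(o.id), COALESCE(SUM(o.total_cents), 0)
            FROM users u JOIN orders o ON o.user_id = u.id
            WHERE u.id = ? GROUP BY u.id
            ON CONFLICT(user_id) DO UPDATE SET
                order_count = excluded.order_count,
                lifetime_cents = excluded.lifetime_cents
        """, (user_id,))

conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
place_order(1, 500)
place_order(1, 700)
# Hot read path: single-row primary-key lookup, no join.
summary = conn.execute("SELECT order_count, lifetime_cents FROM order_summaries "
                       "WHERE user_id = 1").fetchone()
print(summary)  # (2, 1200)
```

At larger scale the refresh often moves to an async pipeline instead of the write transaction, trading the in-transaction consistency shown here for lower write latency.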
Common Mistakes
- Over-normalization on hot read paths: excessive joins increase latency. Denormalize read models where access frequency justifies the trade.
- Treating eventual consistency as free: it reduces coordination cost but introduces stale reads and conflict resolution complexity. Reserve it for paths where business rules tolerate temporal lag.
- Database sprawl without justification: each new engine adds backup complexity, operational cost, and incident coordination burden. Add engines only for measurable bottlenecks.
Implementation Playbook
Begin with a schema that captures invariants explicitly. Use constraints, unique indexes, and transaction boundaries to encode business rules near the data layer, not only in application code. This prevents drift across services and reduces silent corruption during edge-case failures.
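A minimal sketch of invariants living next to the data, again using SQLite as a stand-in (the `accounts` schema is illustrative): a `UNIQUE` constraint rejects duplicate emails and a `CHECK` rejects negative balances, no matter which service or script issues the write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Business rules encoded at the data layer: uniqueness and non-negativity
# hold even for writers that bypass the application code.
conn.execute("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0)
    )
""")
conn.execute("INSERT INTO accounts (email, balance_cents) "
             "VALUES ('a@example.com', 100)")

try:
    conn.execute("INSERT INTO accounts (email, balance_cents) "
                 "VALUES ('a@example.com', 50)")
except sqlite3.IntegrityError as exc:
    print("rejected duplicate:", exc)

try:
    conn.execute("UPDATE accounts SET balance_cents = -1 "
                 "WHERE email = 'a@example.com'")
except sqlite3.IntegrityError as exc:
    print("rejected negative balance:", exc)
```

Both bad writes fail at the database, leaving the original row untouched; application-level validation remains useful for friendly errors, but it is no longer the last line of defense.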
Instrument query performance continuously. Track slow queries, lock wait time, replication lag, and index hit ratio. Visibility into these signals lets teams catch regressions from feature launches or migration scripts before they become user-facing incidents.
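The simplest form of this instrumentation is a timing wrapper at the query call site; this is an illustrative sketch (the threshold and helper name are assumptions), while production systems usually rely on the engine's own slow-query log plus client-side tracing.

```python
import logging
import sqlite3
import time

logging.basicConfig(level=logging.WARNING)
SLOW_QUERY_MS = 50  # illustrative threshold; tune per service SLO

def timed_query(conn, sql, params=()):
    """Run a query and emit a warning if it exceeds the slow-query budget."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > SLOW_QUERY_MS:
        logging.warning("slow query (%.1f ms): %s", elapsed_ms, sql.strip())
    return rows

# Example usage against a throwaway table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
rows = timed_query(conn, "SELECT x FROM t")
print(rows)  # [(1,)]
```

Feeding these timings into a histogram per query class is what makes the p95/p99 regressions from a new feature visible before users report them.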
Treat migrations as deployable software. Prefer additive changes, dual-write or backfill phases when needed, and rollback-safe plans. Data migration mistakes are harder to reverse than stateless code rollbacks, so cautious sequencing is a core engineering discipline.
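The additive-then-backfill sequencing looks like this in miniature; the schema and batch size are illustrative, and real backfills add progress tracking and throttling. Phase 1 adds a nullable column so existing code keeps working, phase 2 backfills in small batches so locks stay short, and only later does a phase 3 tighten constraints or remove the old path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("Ada",), ("Grace",), ("Edsger",)])

# Phase 1 (additive): a new nullable column; readers and writers are unaffected.
conn.execute("ALTER TABLE users ADD COLUMN name_lower TEXT")

# Phase 2 (backfill): small batches, each in its own short transaction.
BATCH = 2
while True:
    with conn:
        rows = conn.execute(
            "SELECT id, name FROM users WHERE name_lower IS NULL LIMIT ?",
            (BATCH,)).fetchall()
        if not rows:
            break
        conn.executemany("UPDATE users SET name_lower = ? WHERE id = ?",
                         [(name.lower(), uid) for uid, name in rows])

# Phase 3 would happen later: enforce NOT NULL once all writers populate it.
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE name_lower IS NULL").fetchone()[0]
print("rows left to backfill:", remaining)  # 0
```

Each phase is independently deployable and rollback-safe, which is the property stateless code rollbacks get for free and data migrations must earn.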
Formalize backup and restore drills. A backup policy is only credible when restore time and data integrity are tested regularly. Teams that rehearse restores by service domain recover faster during real incidents and avoid discover-on-fire surprises in tooling, credentials, or data lineage.
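A restore drill in miniature, using SQLite's online backup API as a stand-in for whatever your engine provides: take a backup of a live database, then verify the restored copy rather than trusting that the backup file exists.

```python
import sqlite3

# Source database with some live data.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
src.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",)])
src.commit()

# Online backup into a second database; production engines have equivalents
# (pg_basebackup, mysqldump, snapshot tooling, and so on).
backup = sqlite3.connect(":memory:")
src.backup(backup)

# The drill's point: query the restored copy and check integrity, don't
# just confirm the backup job exited zero.
restored = backup.execute("SELECT COUNT(*) FROM events").fetchone()[0]
assert restored == 2, "restore drill failed: row count mismatch"
print("restore verified:", restored, "rows")
```

Scheduling this kind of verification per service domain is what turns a stated RPO into a number the team has actually measured.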
Practice Path for Databases
Course Chapters
- Database Fundamentals
Storage models, transactions, and index basics for production systems.
- Database Scaling
Replication, partitioning, and write path constraints at scale.
- Data Models and Query Languages
Choosing relational, document, or hybrid schema strategies.
- Storage Engines and Indexing
B-trees, LSM trees, and the read-write amplification tradeoff.
Guided Labs
- Database Replication & Read Scaling
Add read replicas to scale database reads and add failover for high availability.
- NoSQL & Document Databases
When to choose NoSQL over SQL and how to model data for document stores like MongoDB.
- Schema Design Workshop
Balance normalized OLTP schemas, denormalized read models, and zero-downtime migration controls.
- Transactions & Isolation in Practice
Design a checkout flow that keeps correctness under retries, async workers, and partial failures.
Challenge Progression
- 1. Cake Shop 1 - Going Online (Cake Shop · easy)
- 2. IoT Platform 1 - Smart Home Hub (IoT Platform · easy)
- 3. Payment Gateway 1 - Online Checkout (Payment Gateway · easy)
- 4. Social Feed 1 - MVP Launch (Social Feed · easy)
- 5. Bookmark Manager (Starter · easy)
- 6. Contact Form Service (Starter · easy)
Public Solution Walkthroughs
- Cake Shop 1 - Going Online: full solution walkthrough with architecture breakdown
- IoT Platform 1 - Smart Home Hub: full solution walkthrough with architecture breakdown
- Payment Gateway 1 - Online Checkout: full solution walkthrough with architecture breakdown
- Social Feed 1 - MVP Launch: full solution walkthrough with architecture breakdown
Related Articles
Back-of-the-Envelope Estimation for System Design Interviews
A step-by-step framework for capacity estimation: QPS, storage, bandwidth, and memory calculations that interviewers actually expect.
10 min read
Database Scaling Strategies: Replication, Sharding, and Partitioning
A practical guide to scaling databases in system design: when to replicate, when to shard, and how partitioning strategies affect your architecture.
9 min read
Frequently Asked Questions
How do I choose between SQL and NoSQL for a new feature?
Start with required invariants and query patterns. SQL is usually better for transactions and relational constraints. NoSQL fits high-throughput, flexible-schema access paths. Many systems use both, but each store should have a clear bounded purpose.
When should I shard a database?
Shard when single-node vertical scaling, indexing, and replication strategies no longer satisfy throughput or storage requirements. Sharding adds major operational and query complexity, so delay it until objective limits are clear.
What database metrics are most important in production?
Watch p95 and p99 query latency, lock contention, replication lag, connection saturation, and error rates by query class. Pair these with storage growth and checkpoint behavior to anticipate capacity issues.
Can read replicas replace a caching strategy?
They solve different layers of the problem. Replicas distribute read load and improve availability, while caches reduce repeated query cost and latency. High-scale systems usually use both with explicit consistency boundaries.