Relational Databases (SQL)
Relational databases store data in structured tables with rows and columns. Relationships between tables are expressed through foreign keys. They use SQL (Structured Query Language) for querying.
Examples: PostgreSQL, MySQL, Oracle, SQL Server, SQLite.
ACID Properties
Relational databases guarantee ACID transactions:
- AtomicityA transaction either completes entirely or not at all. If any part fails, the entire transaction is rolled back.
- ConsistencyA transaction moves the database from one valid state to another. All constraints (unique, foreign key, check) are enforced.
- IsolationConcurrent transactions behave as if they were executed sequentially. Different isolation levels trade strictness for performance.
- DurabilityOnce a transaction is committed, the data is permanently written, even if the system crashes immediately after.
Indexing
An index is a data structure (usually a B-tree or B+ tree) that speeds up read queries by avoiding full table scans. Without an index, the database must examine every row to find matches. With an index, it can jump directly to the relevant rows.
Normalization
Normalization organizes data to reduce redundancy and improve integrity:
- 1NF: Each column holds atomic values; no repeating groups.
- 2NF: 1NF + every non-key column depends on the entire primary key.
- 3NF: 2NF + no transitive dependencies (non-key columns depend only on the primary key).
Normalization reduces data duplication but can increase the number of JOINs needed for queries. In high-performance systems, deliberate denormalization, duplicating data across tables, is common to avoid expensive joins.
Non-Relational Databases (NoSQL)
NoSQL databases sacrifice some relational guarantees for flexibility, scalability, or performance. There are four primary categories:
Key-Value Stores
The simplest NoSQL model. Data is stored as key-value pairs. Keys are unique identifiers; values can be anything (strings, JSON, binary). Operations are limited: GET, SET, DELETE. Extremely fast for simple lookups.
Use cases: Caching, session storage, feature flags, rate limiting counters.
Document Stores
Store data as documents (typically JSON or BSON). Each document can have a different structure. Documents are grouped into collections. Support rich queries within documents.
Use cases: Content management systems, user profiles, product catalogs, any domain where schema flexibility matters.
Column-Family Stores
Data is organized by columns rather than rows. Each row can have a different set of columns, and columns are grouped into families. Optimized for writing large volumes of data and for reading specific columns across many rows.
Use cases: Time-series data, IoT event logs, analytics, write-heavy workloads.
Graph Databases
Store data as nodes and edges (relationships). Queries traverse the graph efficiently, making them ideal for connected data. Finding the shortest path or all friends of friends is natural and fast.
Use cases: Social networks, fraud detection, recommendation engines, knowledge graphs.
SQL vs. NoSQL: When to Choose Which
Choose SQL When
- Your data has clear relationships and a stable schema.
- You need ACID transactions.
- You need complex queries with JOINs, aggregations, subqueries.
- Data integrity is critical (financial, medical).
- Your read and write patterns are balanced.
Choose NoSQL When
- Your schema evolves frequently or varies per record.
- You need horizontal scalability above all else.
- Your access patterns are simple (key lookups, document reads).
- You handle massive write volumes.
- Eventual consistency is acceptable for your use case.
Transactions and Isolation Levels
When multiple transactions run concurrently, isolation levels control how much they see of each other's uncommitted work:
| Isolation Level | Dirty Reads | Non-Repeatable Reads | Phantom Reads | Performance |
|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Fastest |
| Read Committed | Prevented | Possible | Possible | Fast |
| Repeatable Read | Prevented | Prevented | Possible | Moderate |
| Serializable | Prevented | Prevented | Prevented | Slowest |
Most databases default to Read Committed. PostgreSQL uses Repeatable Read via MVCC (Multi-Version Concurrency Control), which provides each transaction a consistent snapshot of the database without locking reads.
Storage Engines
Under the hood, databases use storage engines that determine how data is physically stored and retrieved:
B-Tree Based (e.g., InnoDB, PostgreSQL)
Organizes data in a balanced tree structure. Each node contains keys and pointers to child nodes. Leaf nodes contain actual data or pointers to data pages. Excellent for read-heavy workloads and range queries. Updates are done in-place.
LSM-Tree Based (e.g., RocksDB, LevelDB, Cassandra)
Log-Structured Merge trees batch writes into in-memory buffers (memtables), then flush them to sorted files (SSTables) on disk. Writes are always sequential, making them extremely fast. Background compaction merges SSTables. Better for write-heavy workloads.
B-Tree
- In-place updates
- Fast random reads
- Write amplification from page splits
- Better read/write balance
LSM-Tree
- Append-only writes
- Higher write throughput
- Read amplification from multiple levels
- Better for write-heavy workloads
Key Takeaways
- Relational databases provide strong guarantees (ACID) and are the right default for structured data with relationships.
- NoSQL databases offer flexibility and horizontal scalability, each type optimized for specific access patterns.
- Indexing is the most impactful optimization for read performance, but over-indexing hurts writes.
- Understand your data access patterns before choosing a database. Read-heavy vs. write-heavy, simple lookups vs. complex queries, single-record vs. range scans.
- Most production systems use multiple databases (polyglot persistence) to match each workload to its optimal store.
Chapter Check-Up
Quick quiz to reinforce what you just learned.