Unique ID Generation

Requirements for Distributed IDs

Before choosing an ID generation strategy, clarify what properties your system needs:

Uniqueness: No two records should ever share the same ID, even across multiple servers, data centers, or time zones. This is the non-negotiable requirement.
Sortability: IDs that are roughly time-ordered allow efficient range queries, make database indexes more cache-friendly, and let you sort by creation time without a separate column.
Scalability: ID generation must not become a bottleneck. Each application server should be able to generate IDs independently without coordinating with a central authority on every request.
Low latency: ID generation should add negligible overhead. Ideally, it should be a local computation, not a network call.
Compactness: Shorter IDs use less storage and are faster to index. A 64-bit integer is far more efficient than a 128-bit UUID in B-tree indexes.

Approach 1: UUID v4 (Random)

A UUID (Universally Unique Identifier) version 4 is a 128-bit randomly generated value, typically represented as a 36-character string like 550e8400-e29b-41d4-a716-446655440000.

How It Works

Generate 122 random bits (6 bits are reserved for version and variant fields). The probability of collision is astronomically low: you would need to generate about 2.7 × 10¹⁸ UUIDs to have a 50% chance of a single collision.

// UUID v4 generation (pseudocode)
function generateUUIDv4():
  bytes = random(16)           // 16 random bytes = 128 bits
  bytes[6] = (bytes[6] & 0x0F) | 0x40  // version 4
  bytes[8] = (bytes[8] & 0x3F) | 0x80  // variant 1
  return formatAsHex(bytes)
  // Example: "f47ac10b-58cc-4372-a567-0e02b2c3d479"

Pros

No coordination needed. Any server, anywhere, can generate a UUID independently.
Extremely low collision probability without any shared state.
Widely supported across all programming languages and databases.

Cons

Not sortable by time. Random UUIDs scatter inserts across the entire B-tree index, causing poor write performance in databases (especially InnoDB which clusters by primary key).
Large size: 128 bits (16 bytes) as binary, 36 characters as string. Significantly larger than a 64-bit integer.
Poor index locality: Random distribution means every insert touches a different page in the B-tree, leading to excessive disk I/O.

Approach 2: Database Auto-Increment

The simplest approach: let the database assign a sequential integer ID using AUTO_INCREMENT (MySQL) or SERIAL / IDENTITY (PostgreSQL).

Why It Breaks in Distributed Systems

Single point of failure: The database is the sole ID generator. If it goes down, no IDs can be created.
Scalability ceiling: All inserts must go through one database (or one master in a primary-replica setup) to maintain sequential ordering.
Merging is difficult: If you have two database shards each using auto-increment, they will produce conflicting IDs (both will have ID 1, 2, 3...).

Mitigation: Offset and Stride

Use different starting offsets and step sizes per shard. For 3 shards:

Shard 1: 1, 4, 7, 10, 13, ...  (start=1, step=3)
Shard 2: 2, 5, 8, 11, 14, ...  (start=2, step=3)
Shard 3: 3, 6, 9, 12, 15, ...  (start=3, step=3)

This ensures uniqueness but IDs are no longer globally sorted by time. Adding or removing shards is also painful because you must change the step size across all shards.

Approach 3: Twitter Snowflake

Twitter's Snowflake is the most widely referenced distributed ID generator. It produces 64-bit IDs that are roughly time-sorted, unique across data centers, and generated without coordination between machines.

Bit Layout

1 bit: Sign bit, always 0 (ensures positive numbers).
41 bits: Millisecond timestamp since a custom epoch. 2⁴¹ milliseconds ≈ 69 years from the epoch before overflow.
5 bits: Datacenter ID (0-31), allowing up to 32 data centers.
5 bits: Machine/worker ID (0-31), allowing 32 machines per data center.
12 bits: Sequence number (0-4095), allowing 4,096 IDs per millisecond per machine.

Throughput

Each machine can generate 4,096 IDs per millisecond = 4,096,000 IDs per second. With 32 data centers × 32 machines = 1,024 machines, the system can generate over 4 billion IDs per second globally.

// Snowflake ID generation (pseudocode)
class SnowflakeGenerator:
  epoch = 1288834974657  // Twitter custom epoch (Nov 4, 2010)
  datacenter_id          // 5 bits, assigned at deployment
  machine_id             // 5 bits, assigned at deployment
  sequence = 0
  last_timestamp = -1

  function generate():
    timestamp = currentTimeMillis() - epoch

    if timestamp == last_timestamp:
      sequence = (sequence + 1) & 0xFFF  // 12-bit mask
      if sequence == 0:
        timestamp = waitForNextMillis()  // Block until next ms
    else:
      sequence = 0

    last_timestamp = timestamp

    id = (timestamp << 22)
       | (datacenter_id << 17)
       | (machine_id << 12)
       | sequence

    return id

Pros

64-bit: fits in a database BIGINT, twice as compact as UUID.
Time-sorted: IDs increase over time, so B-tree inserts are always at the end (excellent write performance).
No coordination: each machine generates IDs independently using only a local clock.
Extractable metadata: you can decode the timestamp, data center, and machine from any ID.

Cons

Clock dependency: If a machine's clock goes backward (NTP adjustment), duplicate IDs could be generated. Snowflake guards against this by refusing to generate IDs when the clock moves backward.
Machine ID assignment: The datacenter and machine IDs must be globally unique and pre-assigned, typically via configuration management (ZooKeeper, etcd, or environment variables).
Not cryptographically random: IDs are predictable and leak information about creation time and infrastructure topology.

Approach 4: ULID (Universally Unique Lexicographically Sortable Identifier)

ULID combines the best of UUID and Snowflake. It is a 128-bit identifier that is lexicographically sortable and encoded as a 26-character Crockford Base32 string.

ULID structure:
  Timestamp:  48 bits (millisecond precision, ~8,900 years)
  Randomness: 80 bits (cryptographically random)

Example: 01ARZ3NDEKTSV4RRFFQ69G5FAV
         |----------|----------------|
          timestamp     randomness
          (10 chars)    (16 chars)

Key properties:
  - Lexicographically sortable (string comparison works)
  - Compatible with UUID format (128 bits)
  - No machine ID coordination needed
  - Monotonic within same millisecond (increment random part)

Pros

Sortable by time using simple string comparison.
No coordination between servers (unlike Snowflake's machine ID requirement).
UUID-compatible: can be stored in UUID columns without schema changes.
URL-safe encoding (Crockford Base32 avoids ambiguous characters).

Cons

128 bits, so larger than Snowflake's 64 bits.
Not as widely adopted as UUID (fewer native database/language support).
Random component is not deterministic, so two ULIDs generated in the same millisecond on the same machine are not strictly ordered (though monotonic implementations exist).

Approach 5: Ticket Servers (Flickr Approach)

Flickr's approach uses dedicated "ticket server" databases whose sole purpose is to generate sequential IDs using REPLACE INTO and LAST_INSERT_ID().

-- Ticket server schema
CREATE TABLE tickets (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  stub CHAR(1) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  UNIQUE KEY stub (stub)
) ENGINE=InnoDB;

-- Generate next ID
REPLACE INTO tickets (stub) VALUES ('a');
SELECT LAST_INSERT_ID();

-- Use two ticket servers with odd/even IDs for HA:
-- Server 1: auto_increment_offset=1, auto_increment_increment=2
-- Server 2: auto_increment_offset=2, auto_increment_increment=2

Pros

Simple to implement and understand.
Produces numeric, sortable IDs.
Two servers provide high availability.

Cons

Network round trip for every ID (adds latency).
Ticket servers become a dependency and potential bottleneck.
Requires careful management of the odd/even server pairing.

Comparison Table

Approach	Size	Sortable	Coordination	Throughput	Best For
UUID v4	128 bits	No	None	Unlimited	Simple systems, cross-system interop
Auto-Increment	32/64 bits	Yes	Single DB	DB limited	Single-server apps, prototypes
Snowflake	64 bits	Yes (time)	Machine ID	4M/s/machine	High-scale distributed systems
ULID	128 bits	Yes (lexicographic)	None	Unlimited	When you need sortable UUIDs
Ticket Server	64 bits	Yes	Ticket DB	DB limited	Medium-scale, simple infra

When to Use What

Building a small to medium web app? Database auto-increment is fine. Do not over-engineer.
Need cross-service unique IDs with no coordination? UUID v4 or ULID. Choose ULID if you need sortability.
Building a high-throughput distributed system (social network, messaging)? Snowflake or a Snowflake-like approach. The 64-bit size and time-sortability make it ideal.
Migrating from UUIDs but need sortability? ULID is a drop-in replacement (same 128-bit size).
System design interview? Start with requirements, then walk through Snowflake. It is the most commonly asked approach.

Interview Tip

When asked about unique ID generation, start by clarifying requirements: "Do we need the IDs to be sortable? How many IDs per second? Do we need them to be compact (64 bits) or is 128 bits acceptable? Do we have multiple data centers?" These questions demonstrate that you understand the trade-offs and are not just memorizing solutions.

Summary

Unique ID generation is a foundational problem in distributed systems. The right approach depends on your scale, sortability needs, and tolerance for coordination overhead. For most large-scale systems, a Snowflake-style generator (64-bit, time-sorted, machine-local) is the gold standard. For simpler needs or UUID compatibility, ULID provides an excellent balance of sortability and simplicity.

Requirements for Distributed IDs

Approach 1: UUID v4 (Random)

How It Works

Pros

Cons

Approach 2: Database Auto-Increment

Why It Breaks in Distributed Systems

Mitigation: Offset and Stride

Approach 3: Twitter Snowflake

Bit Layout

Throughput

Pros

Cons

Approach 4: ULID (Universally Unique Lexicographically Sortable Identifier)

Pros

Cons

Approach 5: Ticket Servers (Flickr Approach)

Pros

Cons

Comparison Table

When to Use What

Interview Tip

Summary

Chapter Check-Up