Easy, Starter

Pastebin Service

Databases, Storage, API Design

Problem Statement

PasteIt is building a text-sharing service where developers paste code snippets and share them via a link. Features:

- Create pastes - submit text content with an optional title, syntax language (for highlighting), and expiration time (10 minutes, 1 hour, 1 day, 1 week, never).
- View pastes - anyone with the link can view the paste with syntax highlighting. No authentication required.
- Private pastes - optionally password-protect a paste or mark it as "unlisted" (not indexed but accessible via direct link).
- Raw view - a /raw endpoint that returns the plain text content (for `curl` / scripts).
- Automatic cleanup - expired pastes are deleted by a background job.

Expected traffic: 50,000 new pastes per day, 500,000 paste views per day.

What You'll Learn

Design a Pastebin-like service for sharing text snippets with syntax highlighting and expiration. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.

Constraints

  • New pastes/day: ~50,000
  • Paste views/day: ~500,000
  • Max paste size: 512 KB
  • View latency: < 200 ms
  • Total stored pastes: ~10 million
  • Availability target: 99.5%

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Design a Pastebin-like service for sharing text snippets with syntax highlighting and expiration.
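  • Design for a peak load of about 100 RPS. The stated traffic averages only about 6.4 RPS (550,000 requests/day over 86,400 seconds), so this target already includes more than 15x burst headroom.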
  • New pastes/day: ~50,000
  • Paste views/day: ~500,000
  • Max paste size: 512 KB
  • View latency: < 200 ms
  • Total stored pastes: ~10 million
  • Availability target: 99.5%

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.
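
The conversion described above can be sketched as a few lines of arithmetic. This is a back-of-the-envelope estimate, not a sizing recommendation. It assumes that "512 KB" means 512 KiB, that every paste is at the maximum size (a deliberate upper bound), and that availability is measured over a 30-day window; none of these details are stated in the constraints.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the constraints list.
new_pastes_per_day = 50_000
views_per_day = 500_000
total_pastes = 10_000_000
availability_target = 0.995
peak_rps_target = 100

# Assumption: "512 KB" is read as 512 KiB.
max_paste_bytes = 512 * 1024

# Average request rates.
avg_write_rps = new_pastes_per_day / SECONDS_PER_DAY
avg_read_rps = views_per_day / SECONDS_PER_DAY
avg_total_rps = avg_write_rps + avg_read_rps

# How much burst headroom the 100 RPS target leaves over the average.
headroom = peak_rps_target / avg_total_rps

# Upper bounds: every paste assumed to be at the maximum size.
worst_case_storage_tib = total_pastes * max_paste_bytes / 1024**4
worst_case_ingest_gib_per_day = new_pastes_per_day * max_paste_bytes / 1024**3

# Assumption: availability is measured over a 30-day window.
allowed_downtime_hours_per_30d = (1 - availability_target) * 30 * 24

print(f"average load: {avg_total_rps:.2f} RPS (headroom {headroom:.1f}x)")
print(f"worst-case storage: {worst_case_storage_tib:.2f} TiB")
print(f"worst-case ingest: {worst_case_ingest_gib_per_day:.1f} GiB/day")
print(f"downtime budget: {allowed_downtime_hours_per_30d:.1f} h per 30 days")
```

Real pastes are usually far smaller than the maximum, so the storage figure is a ceiling rather than an expectation. The main takeaway is that this workload is read-heavy (10:1) and small enough that a single database plus object storage is plausible.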

3) Architecture Decisions

  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • Storage: Use object storage for large blobs and keep metadata/authorization separate in the API tier.
  • API Design: Standardize API boundaries, idempotency keys, pagination, and error contracts first.
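
One way to make the metadata/content split concrete is to sketch the metadata record that the database would hold, with the content stored elsewhere under a key. The field names, the `pastes/<id>` key layout, the 8-character ID length, and the short expiration codes (`10m`, `1h`, ...) are all hypothetical choices for illustration; only the expiration durations and the size limit come from the problem statement.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Expiration choices from the problem statement; the short codes are hypothetical.
EXPIRATION_OPTIONS = {
    "10m": timedelta(minutes=10),
    "1h": timedelta(hours=1),
    "1d": timedelta(days=1),
    "1w": timedelta(weeks=1),
    "never": None,
}
MAX_PASTE_BYTES = 512 * 1024  # assumption: "512 KB" read as 512 KiB


@dataclass
class PasteMetadata:
    paste_id: str                    # appears in the share link
    content_key: str                 # where the content lives in object storage
    title: Optional[str]
    language: Optional[str]          # syntax-highlighting hint
    unlisted: bool
    created_at: datetime
    expires_at: Optional[datetime]   # None means "never"


def new_paste_metadata(content: str, expiration: str,
                       title: Optional[str] = None,
                       language: Optional[str] = None,
                       unlisted: bool = False,
                       now: Optional[datetime] = None) -> PasteMetadata:
    """Validate a create request and build the metadata row for it."""
    if expiration not in EXPIRATION_OPTIONS:
        raise ValueError(f"unsupported expiration: {expiration!r}")
    if len(content.encode("utf-8")) > MAX_PASTE_BYTES:
        raise ValueError("paste exceeds the maximum size")
    now = now or datetime.now(timezone.utc)
    ttl = EXPIRATION_OPTIONS[expiration]
    # 6 random bytes encode to 8 URL-safe characters.
    paste_id = secrets.token_urlsafe(6)
    return PasteMetadata(
        paste_id=paste_id,
        content_key=f"pastes/{paste_id}",
        title=title,
        language=language,
        unlisted=unlisted,
        created_at=now,
        expires_at=None if ttl is None else now + ttl,
    )
```

Random IDs avoid a central counter and make links hard to guess, which matters for unlisted pastes. The trade-off is that collisions become possible, so the write path needs a uniqueness check.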

4) Reliability and Failure Strategy

  • Databases: Use strong write constraints (transactions or conditional writes) and an explicit backup/restore strategy.
  • Storage: Enforce lifecycle policies, retention tiers, and checksum validation on stored objects.
  • API Design: Apply strict input validation and backward-compatible versioning.
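
As an illustration of a conditional write, the sketch below relies on a primary-key constraint to reject a duplicate paste ID, then retries with a fresh ID. It uses SQLite from the Python standard library purely as a stand-in for the primary SQL database; the table layout, the `insert_paste` helper, and the retry count are hypothetical.

```python
import secrets
import sqlite3
from typing import Callable, Optional


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical pastes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE pastes (
            paste_id    TEXT PRIMARY KEY,   -- uniqueness enforced by the database
            content_key TEXT NOT NULL,
            expires_at  INTEGER             -- Unix seconds; NULL means never
        )
        """
    )
    return conn


def insert_paste(conn: sqlite3.Connection,
                 expires_at: Optional[int],
                 id_factory: Callable[[], str] = lambda: secrets.token_urlsafe(6),
                 max_attempts: int = 3) -> str:
    """Insert a metadata row, retrying with a new ID if the ID is already taken."""
    for _ in range(max_attempts):
        paste_id = id_factory()
        try:
            # "with conn" commits on success and rolls back on an exception.
            with conn:
                conn.execute(
                    "INSERT INTO pastes (paste_id, content_key, expires_at) "
                    "VALUES (?, ?, ?)",
                    (paste_id, f"pastes/{paste_id}", expires_at),
                )
            return paste_id
        except sqlite3.IntegrityError:
            continue  # ID collision: try again with a fresh ID
    raise RuntimeError("could not allocate a unique paste ID")
```

Letting the database enforce uniqueness avoids a check-then-insert race between concurrent API instances, because the constraint is evaluated atomically with the write.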

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.
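
To make the SLO check in a load test concrete, here is a minimal sketch that computes p95 latency with the nearest-rank method and compares it against the 200 ms view-latency constraint. The `meets_slo` helper is hypothetical, and the 0.5% error budget is an assumption derived from the 99.5% availability target rather than a stated requirement.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_slo(latencies_ms: Sequence[float], errors: int, total: int,
              p95_budget_ms: float = 200.0,
              max_error_rate: float = 0.005) -> bool:
    """Return True if both the latency and error-rate targets hold for this run."""
    p95 = percentile(latencies_ms, 95)
    error_rate = errors / total
    return p95 < p95_budget_ms and error_rate <= max_error_rate
```

For example, a run in which 95 of 100 sampled views take 100 ms and 5 take 500 ms passes the latency check, while a run with 6 slow views out of 100 does not.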

6) Trade-offs to Call Out in Interviews

  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • Storage: Object storage is cheap and durable, but random low-latency reads are weaker than databases/caches.
  • API Design: Rich APIs improve developer speed but can create long-term compatibility burden.

Practical Notes

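  • Store paste content in an object store (for example, S3) keyed by a unique ID, and keep metadata (title, language, expiration) in a database.
  • A cron job runs periodically to delete expired pastes from both the database and the object store. Because the job only runs periodically, the read path should also check the expiration time so that an expired paste is not served before it is deleted.
  • CDN caching for popular pastes can reduce read load significantly. Cache lifetimes should not exceed a paste's remaining time to live, or the CDN may keep serving it after expiry.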
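
The notes above can be sketched with two dictionaries standing in for the metadata database and the object store. Everything here is illustrative: the function names, the row layout, and the one-hour cache ceiling are hypothetical, and a real cleanup job would query by an indexed expiration column rather than scan every row.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Stand-ins for the real systems (assumption: illustration only).
metadata_db: dict = {}    # paste_id -> {"content_key": str, "expires_at": datetime | None}
object_store: dict = {}   # content_key -> bytes


def is_expired(row: dict, now: datetime) -> bool:
    return row["expires_at"] is not None and row["expires_at"] <= now


def read_paste(paste_id: str, now: datetime) -> Optional[bytes]:
    """Serve a paste only if it exists and has not expired, even if cleanup has not run."""
    row = metadata_db.get(paste_id)
    if row is None or is_expired(row, now):
        return None
    return object_store.get(row["content_key"])


def cache_max_age_seconds(row: dict, now: datetime, ceiling: int = 3600) -> int:
    """Cap the CDN cache lifetime at the paste's remaining time to live."""
    if row["expires_at"] is None:
        return ceiling
    remaining = int((row["expires_at"] - now).total_seconds())
    return max(0, min(ceiling, remaining))


def cleanup_expired(now: datetime) -> int:
    """Delete expired pastes from both stores and return how many were removed."""
    expired = [pid for pid, row in metadata_db.items() if is_expired(row, now)]
    for pid in expired:
        row = metadata_db[pid]
        # Delete the blob first: if the job stops here, the metadata row remains
        # and the next run retries, instead of leaving an untracked blob behind.
        object_store.pop(row["content_key"], None)
        del metadata_db[pid]
    return len(expired)
```

Deleting the content before the metadata is one possible ordering. It makes the job safe to rerun, at the cost of a short window in which a metadata row points at missing content; the read path already returns nothing for such a row once it has expired.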

Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Web Clients -> API Gateway -> API Service -> Primary SQL DB (metadata) and Object Storage (paste content)

Design strengths

  • The architecture keeps synchronous paths short and isolates stateful dependencies behind clear boundaries.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.