Cloud Drive · Part 1 (Medium)

Cloud Drive 1 - Personal File Storage

Storage · Databases · API Design · CDN · Auth

Problem Statement

SkyVault is building a personal cloud storage service (think Dropbox / Google Drive). Users can:

- Upload & download files - drag and drop files up to 5 GB. Large files should use chunked/resumable uploads so a network interruption doesn't require starting over.
- Folder structure - organize files in a hierarchical folder tree.
- Sync - a desktop client syncs a local folder with the cloud. Changes made on one device should appear on others within 30 seconds.
- Sharing - generate shareable links with optional password protection and expiration dates.
- Versioning - keep the last 10 versions of every file so users can restore previous versions.
- Storage quotas - free users get 5 GB; paid users get 2 TB.
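The resumable-upload requirement above can be sketched as a session that tracks which chunks have arrived, so the client only re-sends what is missing after an interruption. This is a minimal in-memory illustration, not SkyVault's actual protocol; the class and method names are assumptions.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, matching the chunk size suggested later

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a file's bytes into fixed-size chunks for resumable upload."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

class ResumableUpload:
    """In-memory stand-in for a server-side upload session (illustrative)."""
    def __init__(self, total_chunks: int):
        self.total_chunks = total_chunks
        self.received = {}  # chunk index -> sha256 of chunk bytes

    def put_chunk(self, index: int, chunk: bytes):
        # Idempotent: re-sending an already-received chunk is a no-op.
        self.received[index] = hashlib.sha256(chunk).hexdigest()

    def missing(self):
        """Chunk indices the client still needs to send after an interruption."""
        return [i for i in range(self.total_chunks) if i not in self.received]

# Simulate an interrupted upload: 3 chunks, connection drops after chunk 0.
data = b"x" * (3 * CHUNK_SIZE)
chunks = split_into_chunks(data)
session = ResumableUpload(total_chunks=len(chunks))
session.put_chunk(0, chunks[0])       # ...network drops here
assert session.missing() == [1, 2]    # client resumes by asking what is missing
for i in session.missing():
    session.put_chunk(i, chunks[i])
assert session.missing() == []
```

Real services (e.g., S3 multipart uploads) work along similar lines: the server tracks completed parts, and the client queries that state to resume.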

SkyVault targets 1 million users storing an average of 10 GB each (10 PB total).

What You'll Learn

Design a Dropbox-like cloud storage service for 1 M users with sync, sharing, and versioning. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.


Constraints

- Registered users: 1,000,000
- Total storage: ~10 PB
- Max file size: 5 GB
- Upload throughput: ~10,000 concurrent uploads
- Sync latency: < 30 seconds
- File versions retained: 10 per file
- Availability target: 99.9%
- Durability target: 99.999999999% (11 nines)
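The storage and availability constraints above can be sanity-checked with back-of-envelope arithmetic:

```python
# Back-of-envelope numbers behind the constraints (decimal units: 1 PB = 1e6 GB).
users = 1_000_000
avg_gb = 10
total_pb = users * avg_gb / 1_000_000
print(total_pb)                          # 10.0 -> the ~10 PB target checks out

# Availability budget: 99.9% allows this much downtime per year.
minutes_per_year = 365 * 24 * 60
downtime_min = (1 - 0.999) * minutes_per_year
print(round(downtime_min))               # ~526 minutes (~8.8 hours) per year
```

The ~8.8 hours/year figure is worth quoting in an interview: it tells you whether you can afford maintenance windows or must design for zero-downtime deploys.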

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Design a Dropbox-like cloud storage service for 1 M users with sync, sharing, and versioning.
  • Design for a peak load target around 100 RPS (including burst headroom).
  • Registered users: 1,000,000
  • Total storage: ~10 PB
  • Max file size: 5 GB
  • Upload throughput: ~10,000 concurrent uploads
  • Sync latency: < 30 seconds

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.
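Per-hop latency budgets can be made concrete by allocating the end-to-end SLO across the tiers in the reference flow. The numbers below are illustrative assumptions, not measured values; the point is that the hop budgets must sum to less than the SLO with headroom left over.

```python
# Hypothetical p95 latency budget for a metadata read, split per hop so the
# overall SLO can be defended hop by hop in review. All values are assumptions.
budget_ms = {
    "cdn_edge": 10,
    "load_balancer": 5,
    "api_gateway": 10,
    "api_service": 50,
    "auth_check": 25,
    "sql_query": 60,
}
slo_ms = 200                      # assumed end-to-end p95 target
total = sum(budget_ms.values())
assert total <= slo_ms, f"over budget: {total} ms"
headroom = slo_ms - total
print(total, headroom)            # 160 ms allocated, 40 ms headroom
```

If a hop later exceeds its slice in load testing, the table tells you exactly where to spend optimization effort.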

3) Architecture Decisions

  • Storage: Use object storage for large blobs and keep metadata/authorization separate in the API tier.
  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • API Design: Standardize API boundaries, idempotency keys, pagination, and error contracts first.
  • CDN: Serve static and cacheable content from edge and keep origin strictly for misses and dynamic requests.
  • Auth: Centralize identity verification and keep authorization checks close to domain resources.

4) Reliability and Failure Strategy

  • Enforce lifecycle policies, retention tiers, and checksum validation.
  • Use strong write constraints (transactions or conditional writes) and explicit backup/restore strategy.
  • Apply strict input validation and backward-compatible versioning.
  • Define cache keys and purge workflows before launch, so a bad entry can be invalidated without serving stale content or triggering a cache-wide outage.
  • Use short-lived tokens and secure key rotation workflows.
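Two of the points above, checksum validation and conditional writes, can be sketched together. The store below is a toy stand-in for a database supporting compare-and-set on a version number (as DynamoDB conditional writes or SQL `WHERE version = ?` updates do); names are illustrative.

```python
import hashlib

def verify_chunk(chunk: bytes, claimed_sha256: str) -> bool:
    """Reject corrupted uploads by checking the client-supplied checksum."""
    return hashlib.sha256(chunk).hexdigest() == claimed_sha256

class MetadataStore:
    """Toy conditional-write store: an update succeeds only if the caller
    saw the latest version, preventing lost updates from concurrent syncs."""
    def __init__(self):
        self.version = 0
        self.record = {}

    def conditional_put(self, record: dict, expected_version: int) -> bool:
        if expected_version != self.version:
            return False  # stale caller must re-read and retry
        self.record, self.version = record, self.version + 1
        return True

store = MetadataStore()
assert store.conditional_put({"name": "a.txt", "ver": 1}, expected_version=0)
# A second device that still thinks the version is 0 is rejected:
assert not store.conditional_put({"name": "a.txt", "ver": 99}, expected_version=0)

chunk = b"hello"
assert verify_chunk(chunk, hashlib.sha256(chunk).hexdigest())
assert not verify_chunk(chunk, "0" * 64)
```

The rejected writer re-reads the current record, merges or reports a conflict, and retries with the fresh version number.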

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.

6) Trade-offs to Call Out in Interviews

  • Storage: Object storage is cheap and durable, but random low-latency reads are weaker than databases/caches.
  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • API Design: Rich APIs improve developer speed but can create long-term compatibility burden.
  • CDN: Long TTL improves latency/cost; short TTL improves freshness.
  • Auth: Central auth simplifies policy, but makes auth service availability/security critical.

Practical Notes

  • Chunked uploads (e.g., 4 MB chunks) enable resumable uploads and deduplication at the chunk level.
  • Content-addressable storage - hash each chunk and deduplicate across users to save space.
  • A metadata service (SQL database) tracks the folder tree, file versions, and permissions; actual bytes go to object storage (S3).
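The content-addressable storage idea above reduces to one trick: key each chunk by its hash, so identical chunks uploaded by different users occupy one slot. A minimal in-memory sketch:

```python
import hashlib

class ChunkStore:
    """Content-addressable chunk store (illustrative): the key is the chunk's
    SHA-256, so identical chunks from different users are stored once."""
    def __init__(self):
        self.blobs = {}  # sha256 hex -> chunk bytes

    def put(self, chunk: bytes) -> str:
        key = hashlib.sha256(chunk).hexdigest()
        self.blobs.setdefault(key, chunk)  # dedupe: no-op if already stored
        return key

store = ChunkStore()
common = b"A" * 1024          # the same chunk uploaded by two users
k1 = store.put(common)
k2 = store.put(common)
k3 = store.put(b"B" * 1024)
assert k1 == k2 and k1 != k3
assert len(store.blobs) == 2  # two uploads of `common`, stored once
```

In the full design, the metadata service maps (file version → ordered list of chunk hashes), and reference counting or garbage collection reclaims chunks no version points to.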


Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Web Clients -> DNS -> CDN Edge -> Load Balancer -> API Gateway -> API Service -> Auth Service -> Primary SQL DB

Design strengths

  • Security controls are enforced at ingress to protect downstream capacity.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.