Security, Authentication & Encryption

Security is not a feature you bolt on at the end. Every system design must consider how users are authenticated, how data is protected in transit and at rest, and how APIs are secured against abuse. This chapter covers the essential security concepts that arise in system design interviews and real-world engineering.

Authentication vs. Authorization

These two concepts are often confused but serve fundamentally different purposes:

  • Authentication (AuthN): Verifying who you are. "Prove your identity." Examples: entering a password, scanning a fingerprint, presenting a certificate.
  • Authorization (AuthZ): Determining what you can do. "Given your identity, do you have permission to perform this action?" Examples: role-based access control (RBAC), access control lists (ACLs), policy engines.

Authentication always happens first. You must know who someone is before you can decide what they are allowed to do. A common mistake in system design is conflating the two or assuming that authentication alone is sufficient.
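To make the split concrete, here is a minimal sketch of an authorization check that runs after authentication has already established the caller's role. The role names, the permission sets, and the is_authorized helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical role-to-permission mapping; a real system would usually
# load this from a policy store rather than hard-code it.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}


def is_authorized(role: str, action: str) -> bool:
    """AuthZ step: assumes AuthN has already told us the caller's role."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

An unknown role maps to an empty permission set, so the check fails closed rather than open.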

Password Hashing

Storing passwords in plaintext is the most fundamental security violation. If your database is breached, every user account is immediately compromised. Instead, passwords must be hashed using a slow, salted, purpose-built hashing algorithm.

How It Works

  • When a user registers, generate a random salt (unique per user).
  • Compute hash = slowHash(password + salt).
  • Store only the hash and salt (never the plaintext password).
  • On login, retrieve the user's salt, recompute the hash, and compare it to the stored hash.
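The steps above can be sketched with Python's standard library, which ships scrypt (one of the algorithms listed in the next section). The cost parameters and the function names here are assumptions for illustration, not tuned recommendations; a production system would more likely use a maintained bcrypt or Argon2id library.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters (an assumption, not a tuned recommendation).
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Registration: generate a per-user salt and derive a slow hash."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest  # store both; never store the plaintext


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Login: recompute with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)
```

Because each user gets a fresh salt, two users with the same password end up with different stored hashes.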

Recommended Algorithms

Algorithm | Type | Key Feature | Status
bcrypt | Adaptive hash | Configurable work factor (cost parameter) | Industry standard, widely supported
Argon2id | Memory-hard hash | Resistant to GPU/ASIC attacks (requires lots of memory) | Winner of Password Hashing Competition (2015), recommended for new systems
scrypt | Memory-hard hash | Tunable memory and CPU cost | Good alternative, used by some cryptocurrency systems
SHA-256 | Fast hash | Fast computation | NOT suitable for passwords (too fast, brute-forceable)
MD5 | Fast hash | Broken, collision-prone | NEVER use for anything security-related

Security Pitfall: Fast Hashes

Never use SHA-256, SHA-1, or MD5 for password hashing. These algorithms are designed to be fast, which means an attacker with a GPU can try billions of passwords per second. bcrypt and Argon2 are intentionally slow and tunable (typically configured to take around 100 ms or more per hash), which makes large-scale brute-force attacks impractical. A cost factor of 10-12 for bcrypt or 64 MB of memory for Argon2id is a reasonable starting point.

Session-Based Authentication vs. Token-Based Authentication (JWT)

After a user proves their identity (login), the system needs a way to remember them across subsequent requests. There are two dominant approaches.

Session-Based Auth

  • User logs in; server creates a session and stores it (in memory, Redis, or database).
  • Server sends a session ID as an HTTP cookie.
  • On each request, browser sends the cookie; server looks up the session to identify the user.
  • State lives on the server.
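The flow above can be sketched as follows. This is a minimal illustration in which an in-process dictionary stands in for Redis or a database; the 30-minute lifetime and all function names are assumptions.

```python
import secrets
import time

SESSION_TTL_SECONDS = 1800  # assumed 30-minute session lifetime
_sessions = {}  # stand-in for Redis or a database: session ID -> session data


def create_session(user_id, now=None):
    """Called after a successful login; the returned ID goes into the cookie."""
    issued_at = time.time() if now is None else now
    session_id = secrets.token_urlsafe(32)  # unguessable random identifier
    _sessions[session_id] = {
        "user_id": user_id,
        "expires_at": issued_at + SESSION_TTL_SECONDS,
    }
    return session_id


def lookup_session(session_id, now=None):
    """Called on every request; returns the user ID or None."""
    current = time.time() if now is None else now
    session = _sessions.get(session_id)
    if session is None or session["expires_at"] <= current:
        _sessions.pop(session_id, None)  # drop expired entries
        return None
    return session["user_id"]


def revoke_session(session_id):
    """Logout or forced revocation is just a delete."""
    _sessions.pop(session_id, None)
```

The cookie carries only the random ID; everything else about the user stays in the server-side store.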

Pros

  • Easy to revoke: delete the session from the store.
  • Session data (roles, preferences) stays server-side, not exposed to the client.
  • Cookie attributes harden the session ID: HttpOnly keeps scripts from reading it (limiting theft via XSS), Secure restricts it to HTTPS, and SameSite mitigates CSRF.

Cons

  • Requires server-side storage (Redis cluster for scale).
  • Sticky sessions or shared session store needed in multi-server deployments.
  • Does not work well for mobile apps or third-party API access.

Token-Based Auth (JWT)

  • User logs in; server creates a JSON Web Token containing user claims (ID, roles, expiry).
  • Token is signed with a secret (HMAC) or key pair (RSA/ECDSA) and sent to the client.
  • Client stores the token (localStorage, cookie, or memory) and sends it in the Authorization: Bearer header.
  • State lives on the client.

Pros

  • Stateless: no server-side session store needed. Any server that holds the verification key can verify the token.
  • Works across domains, mobile apps, microservices, and third-party integrations.
  • Self-contained: the token carries all necessary user information.

Cons

  • Hard to revoke: the token is valid until it expires. Requires a deny-list for immediate revocation.
  • Token size: JWTs can be large (1-2 KB), adding overhead to every request.
  • If stored in localStorage, vulnerable to XSS attacks.

// JWT structure (three Base64URL-encoded parts separated by dots)
// Header.Payload.Signature

// Header
{
  "alg": "RS256",
  "typ": "JWT"
}

// Payload (Claims)
{
  "sub": "user_12345",        // Subject (user ID)
  "name": "Alice",
  "role": "admin",
  "iat": 1708300800,          // Issued at (Unix timestamp)
  "exp": 1708304400           // Expires at (1 hour later)
}

// Signature
RSASHA256(
  base64UrlEncode(header) + "." + base64UrlEncode(payload),
  privateKey
)
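The structure above can be exercised end to end with a small sketch. Note the swap: the example above uses RS256, but Python's standard library has no RSA signing, so this sketch uses HS256 (HMAC with a shared secret) instead. The function names are hypothetical, and a real service would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64URL-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_jwt(token: str, secret: bytes, now: int) -> Optional[dict]:
    """Return the claims if the signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        payload = json.loads(b64url_decode(payload_b64))
        signature = b64url_decode(signature_b64)
    except ValueError:
        return None  # malformed token
    # Pin the algorithm instead of trusting whatever the token claims.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    exp = payload.get("exp") if isinstance(payload, dict) else None
    if not isinstance(exp, (int, float)) or exp <= now:
        return None  # missing or expired "exp" claim
    return payload
```

The payload is only encoded, not encrypted: anyone holding the token can read the claims, so the signature protects integrity, not confidentiality.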

OAuth 2.0

OAuth 2.0 is a delegation protocol that allows a user to grant a third-party application limited access to their resources on another service, without sharing their password. When you click "Login with Google," you are using OAuth 2.0, usually together with OpenID Connect, the identity layer built on top of it.

Key Roles

  • Resource Owner: The user who owns the data (e.g., you).
  • Client: The third-party application requesting access (e.g., a task management app).
  • Authorization Server: The service that authenticates the user and issues tokens (e.g., Google's auth server).
  • Resource Server: The API that holds the user's data (e.g., Google Calendar API).

Authorization Code Flow (with PKCE)

The Authorization Code flow with PKCE (Proof Key for Code Exchange) is the recommended flow for web and mobile applications. PKCE prevents authorization code interception attacks.

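Participants: User (Browser), Your App, Auth Server (Google, etc.), Resource Server (Google API)

  1. User → Your App: Click "Login with Google".
  2. Your App → Auth Server: Redirect to auth server (with code_challenge).
  3. User → Auth Server: User logs in and consents.
  4. Auth Server → Your App: Redirect with auth code.
  5. Your App → Auth Server: Exchange code + code_verifier for access_token + refresh_token.
  6. Auth Server → Your App: Return tokens.
  7. Your App → Resource Server: API call with access_token.
  8. Resource Server → Your App: Return user data.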

PKCE (Proof Key for Code Exchange)

PKCE adds a layer of protection against code interception. The client generates a random code_verifier and sends its Base64URL-encoded SHA-256 hash (code_challenge) in step 2. In step 5, the client sends the original code_verifier. The auth server hashes it and checks that the result matches the code_challenge it received earlier before issuing tokens. This ensures that even if an attacker intercepts the authorization code, they cannot exchange it without the code_verifier.
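As a rough sketch of both sides of that check, the code below derives a code_challenge from a code_verifier using the S256 method from RFC 7636 and shows the comparison an auth server would make. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import secrets


def make_code_verifier() -> str:
    # 32 random bytes become a 43-character URL-safe string,
    # the minimum length RFC 7636 allows (43 to 128 characters).
    return secrets.token_urlsafe(32)


def make_code_challenge(code_verifier: str) -> str:
    """S256 method: Base64URL(SHA-256(code_verifier)) without padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def server_accepts(code_verifier: str, stored_challenge: str) -> bool:
    """Auth-server side: re-derive the challenge and compare in constant time."""
    return hmac.compare_digest(make_code_challenge(code_verifier), stored_challenge)
```

An attacker who sees only the code_challenge and the authorization code cannot produce a matching code_verifier without inverting SHA-256.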

SSO and SAML

Single Sign-On (SSO) allows users to log in once and access multiple applications without re-authenticating. It is standard in enterprise environments.

  • SAML 2.0 (Security Assertion Markup Language): An XML-based protocol primarily used for enterprise SSO. An Identity Provider (IdP) like Okta or Azure AD authenticates the user and sends a signed XML assertion to the Service Provider (SP). Verbose but extremely well-established.
  • OpenID Connect (OIDC): A modern identity layer built on top of OAuth 2.0. Uses JSON instead of XML, and provides an id_token (a JWT) that contains user identity claims. Preferred for new applications.

HTTPS and TLS

HTTPS is HTTP over TLS (Transport Layer Security). It encrypts all data in transit between client and server, preventing eavesdropping and tampering.

TLS Handshake (Simplified)

  1. Client Hello: Client sends supported TLS versions and cipher suites.
  2. Server Hello: Server selects a cipher suite and sends its certificate (containing its public key), signed by a Certificate Authority (CA).
  3. Certificate Verification: Client verifies the certificate chain against trusted CAs (pre-installed in the OS/browser).
  4. Key Exchange: Client and server perform a Diffie-Hellman key exchange (or similar) to establish a shared symmetric session key. The server's private key is used to authenticate the exchange but never transmitted.
  5. Encrypted Communication: All subsequent data is encrypted with the symmetric session key (e.g., AES-256-GCM). Symmetric encryption is used because it is orders of magnitude faster than asymmetric encryption.
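From application code, most of the handshake is handled by the TLS library. As a small sketch using Python's ssl module, the client context below verifies the certificate chain and hostname (step 3) and refuses protocol versions older than TLS 1.2; the minimum version and the helper name are assumptions for illustration.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    # create_default_context() loads the system's trusted CA store and
    # enables certificate and hostname verification by default.
    context = ssl.create_default_context()
    # Assumed policy for this sketch: reject anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A client would then pass this context to context.wrap_socket(sock, server_hostname=...) or to an HTTP client, and the handshake steps above run when the connection is opened.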

Certificate Authorities (CAs)

CAs are trusted third parties that verify domain ownership and issue TLS certificates. The client trusts a set of root CAs pre-installed in the OS or browser. Certificates form a chain: Root CA → Intermediate CA → Server Certificate. Let's Encrypt provides free, automated certificates and is one of the most widely used CAs on the web.

Encryption at Rest vs. In Transit

Encryption In Transit

  • Protects data while it moves between systems (client to server, service to service).
  • Implemented via TLS/HTTPS.
  • Prevents eavesdropping, man-in-the-middle attacks, and data tampering.
  • Standard practice: all modern services should use HTTPS exclusively.

Encryption At Rest

  • Protects data while it is stored on disk (databases, file systems, backups).
  • Typically implemented with AES-256 at the storage layer (disk, volume, or database encryption).
  • Protects against physical theft of disks and unauthorized access to storage.
  • Key management is critical: use a KMS (AWS KMS, HashiCorp Vault) to manage encryption keys. Never store keys alongside the encrypted data.

API Security

APIs are the attack surface of modern applications. Every public endpoint is a potential vector for abuse.

API Keys

  • A long random string sent in a header (X-API-Key) or query parameter.
  • Identifies the calling application (not the user). Used for rate limiting, usage tracking, and billing.
  • Not a substitute for authentication. API keys can be leaked and do not prove user identity.
  • Store hashed (like passwords). Transmit over HTTPS only.

Rate Limiting

  • Protects APIs from abuse, DDoS attacks, and runaway scripts.
  • Implement per-user, per-IP, or per-API-key limits.
  • Return 429 Too Many Requests with a Retry-After header.
  • Use algorithms like Token Bucket or Sliding Window (see Chapter 13).
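As an illustration of the Token Bucket approach, the sketch below keeps one bucket per caller and takes the current time as an argument so the behavior is easy to follow. The class name and parameters are assumptions; a production limiter would usually keep this state in a shared store.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start full
        self.updated_at = now

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would respond with 429 and a Retry-After header
```

With a capacity of 2 and a refill rate of 1 token per second, two immediate requests pass, a third is rejected, and one more is allowed a second later.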

CORS (Cross-Origin Resource Sharing)

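  • Browser mechanism that controls which origins may make cross-origin requests to your API and read the responses. It relaxes the same-origin policy in a controlled way; it does not restrict non-browser clients such as curl or server-side code.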
  • The server responds with Access-Control-Allow-Origin headers specifying allowed origins.
  • Preflight requests (OPTIONS) are sent for non-simple requests (custom headers, PUT/DELETE methods).
  • Never set Access-Control-Allow-Origin: * on authenticated endpoints.
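One common way to follow that last rule is to echo the request's Origin back only when it appears on an explicit allow-list. The sketch below shows the idea; the origins listed and the cors_headers helper are hypothetical.

```python
# Hypothetical allow-list of front-end origins permitted to call the API.
ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}


def cors_headers(request_origin):
    """Return the CORS response headers for a request's Origin header."""
    if request_origin not in ALLOWED_ORIGINS:
        # No CORS headers: the browser will block the cross-origin read.
        return {}
    return {
        "Access-Control-Allow-Origin": request_origin,  # exact origin, never "*"
        "Access-Control-Allow-Credentials": "true",
        "Vary": "Origin",  # keep caches from reusing one origin's response for another
    }
```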

CSRF (Cross-Site Request Forgery)

  • An attack where a malicious site tricks a user's browser into making authenticated requests to your API (exploiting cookies that are automatically sent).
  • Defense: CSRF tokens (a random token included in forms and validated server-side), SameSite=Strict or SameSite=Lax cookie attribute, and requiring custom headers (e.g., X-Requested-With) that cannot be set by cross-origin forms.
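The CSRF-token defense can be sketched as a per-session random value that the server embeds in its forms and checks on every state-changing request. The in-memory store and function names below are assumptions for illustration.

```python
import hmac
import secrets

_csrf_tokens = {}  # hypothetical server-side store: session ID -> CSRF token


def issue_csrf_token(session_id: str) -> str:
    """Generate a token to embed in forms rendered for this session."""
    token = secrets.token_urlsafe(32)
    _csrf_tokens[session_id] = token
    return token


def is_valid_csrf(session_id: str, submitted_token) -> bool:
    """Check the token that came back with a state-changing request."""
    expected = _csrf_tokens.get(session_id)
    if expected is None or not submitted_token:
        return False
    return hmac.compare_digest(expected.encode("utf-8"), submitted_token.encode("utf-8"))
```

A malicious site can make the browser send the session cookie, but it cannot read the token out of your pages, so its forged request fails this check.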

Common Security Mistakes in System Design

  • Storing passwords in plaintext or with fast hashes: Always use bcrypt or Argon2id.
  • Not using HTTPS everywhere: All communication, including internal service-to-service, should use TLS.
  • Storing JWTs in localStorage: Vulnerable to XSS. Prefer HttpOnly cookies or in-memory storage with refresh token rotation.
  • Long-lived tokens without refresh: Access tokens should expire quickly (15 minutes). Use refresh tokens for seamless re-authentication.
  • Exposing internal IDs: Sequential IDs let attackers enumerate resources. Use UUIDs or opaque tokens for public-facing identifiers.
  • Trusting client-side validation: Always validate and sanitize input on the server. Client-side validation is for UX, not security.
  • Missing authorization checks: Authenticating a user is not enough. Every endpoint must verify the user has permission for the specific resource they are accessing (Broken Access Control ranked #1 in the OWASP Top 10 2021).

Defense in Depth

No single security measure is sufficient. Production systems layer multiple defenses:

  • Network layer: Firewalls, VPCs, private subnets, security groups.
  • Transport layer: TLS everywhere (mutual TLS for service-to-service).
  • Application layer: Input validation, parameterized queries (prevent SQL injection), output encoding (prevent XSS).
  • Authentication layer: Multi-factor authentication (MFA), password policies, account lockout.
  • Authorization layer: RBAC, principle of least privilege, resource-level permissions.
  • Data layer: Encryption at rest, key rotation, audit logging.
  • Monitoring layer: Intrusion detection, anomaly detection, security alerts.

Summary

Security in system design spans authentication (proving identity), authorization (enforcing permissions), encryption (protecting data), and API hardening (preventing abuse). In interviews, demonstrating awareness of password hashing, JWT vs. session trade-offs, OAuth flows, TLS, and common vulnerabilities like CSRF and XSS shows that you design systems that are not only functional but also secure. Security is a cross-cutting concern that affects every layer of your architecture.
