⚡ System Design & Machine Learning Playbook

An interactive reference guide for engineers preparing for System Design, Cloud, and ML interviews.

🌐 Open Interactive Site · 📚 Browse All Topics · 🎯 Start Here · 🤝 Contribute

✨ How To Use This Repo

Most interview resources are either too scattered or too theoretical. This repo is organized around three practical tracks:

| Track | Best for | Start here |
| --- | --- | --- |
| Core System Design | Distributed systems, cloud/platform, APIs, storage, scaling | docs/ |
| AI & Machine Learning | ML system design, agents, classic ML, deep learning, LLMs | docs/machine-learning/README.md |
| Reference & Practice Appendix | Templates, cheat sheets, LeetCode patterns, LLD | docs/reference/README.md |

Use the interactive site when you want navigation, quiz mode, and progress tracking. Use the Markdown docs when you want dense references you can skim before an interview.

🚀 What Makes It Useful

| Feature | Description |
| --- | --- |
| 🎯 48 interview-ready topics | Core system design, cloud/platform, AI/ML, security, and interview reference material |
| 🌙 Dark / Light mode | Persisted preference, instant toggle with `d` |
| ✅ Progress tracking | Mark topics as read. Your progress saves locally. |
| 🔖 Bookmarks | Save topics to revisit. Accessible from any page. |
| 🃏 Quiz / Flashcard mode | Randomized flashcard review across all 48 topics |
| 📖 Inline reader | Read every topic without leaving the page, with prev/next navigation |
| ⌨️ Keyboard-first | `/` search, `q` quiz, `b` bookmarks, `?` shortcuts |
| 📊 Visual progress bar | See your overall completion at a glance |
| 🗺️ 3 learning paths | Beginner, Mid-Level, and Advanced tracks |
| 🔍 Live search | Searches title, category, summary, and tags |
| 🎨 Category color coding | Every domain has its own visual identity |
| 🚀 Zero setup | Open in browser. No install. No build step. |

🗂️ Topic Coverage (48 Topics)

🟠 Foundation (4)

📐 The System Design Interview Framework — 4-step universal structure: Clarify → Estimate → Design → Deep Dive
🔢 Numbers Every Engineer Must Know — Latency hierarchy, scale reference points, back-of-envelope formulas
💾 IO Fundamentals: Read vs Write — Latency hierarchy, random vs sequential access, OS page cache, write amplification
🔌 Networking & Concurrency — TCP vs UDP, HTTP/1.1 vs HTTP/2 vs HTTP/3 (QUIC), event loop, goroutines

🟣 Data Storage (5)

🗄️ Database Selection Guide — SQL vs NoSQL tension, 7 database types with when-to-use decision matrix
⚡ Caching Deep Dive — 5 cache layers, read/write patterns, eviction, cache invalidation strategies
📨 Message Queues & Event Streaming — Queue vs Kafka event log, delivery guarantees, DLQ, outbox pattern
🌐 Storage & CDN — Object/block/file storage, CDN pull vs push, cache invalidation
🔩 Database Internals — B-tree vs LSM, indexes, replication, CDC, sharding, ACID vs BASE, isolation levels

🔵 API & Networking (4)

🔌 API Design & API Gateway — REST vs gRPC vs GraphQL, gateway responsibilities, rate limiting algorithms
⚖️ Load Balancing & Networking — L4 vs L7, round-robin/least-connections/consistent hashing, health checks
🔴 Real-time Communication — Polling, SSE, WebSockets compared; scaling stateful WS servers with Redis pub/sub
🚦 Rate Limiting In Depth — Every algorithm compared, distributed Redis implementation, failure modes

☁️ Cloud & Platform (5)

☁️ Cloud Fundamentals & Shared Responsibility — Regions, availability zones, managed services, shared responsibility, environment boundaries
🖥️ Compute & Deployment Patterns — VMs vs containers vs Kubernetes vs serverless, autoscaling, canary/blue-green rollout
🌍 Cloud Networking & Traffic Management — VPCs, subnets, DNS, CDN/WAF, API gateways, service-to-service traffic
🪪 IAM, Secrets & Governance — Least privilege, workload identity, secret rotation, KMS, audit and guardrails
📉 Reliability, Observability & Cost — Multi-AZ vs multi-region, RTO/RPO, SLOs, budget alarms, cost-aware scaling

🟢 Distributed Systems (5)

🌐 Distributed System Fundamentals — CAP, consistency models, consistent hashing, Saga vs 2PC, quorum, vector clocks
🔄 Core Design Patterns — Fan-out (social feed), CQRS, event sourcing, outbox pattern, inventory contention
🧱 Microservices vs Monolith — When to decompose, service discovery, sync vs async communication
🛡️ Resilience Patterns — Timeouts, retries + jitter, circuit breaker, fallbacks, backpressure, load shedding
🔒 Distributed Locking — Why local locks fail, Redis Redlock, fencing tokens

🟡 Search & Analytics (4)

🔍 Search & Typeahead Systems — Inverted index, prefix trie autocomplete, relevance ranking (TF-IDF, BM25)
📊 Stream Processing & Top-K Systems — Count-Min Sketch, Lambda vs Kappa architecture, Flink, windowing
📍 Geo & Location Systems — Geohash, quadtree, proximity queries, Uber-style driver matching
🎲 Probabilistic Data Structures — Bloom filter, HyperLogLog, Count-Min Sketch at massive scale

🟩 Scale & Reliability (6)

📡 Observability & Monitoring — Metrics, logs, traces (three pillars), SLOs, error budgets, OpenTelemetry
📈 High Availability & Auto Scaling — Active-passive vs active-active, autoscaling signals, multi-region patterns
🆔 Unique ID Generation — UUID v4/v7/ULID, Twitter Snowflake, ticket servers — when to use each
📄 API Pagination — Why offset pagination fails, cursor-based and keyset pagination at scale
🔔 Notification System Design — Multi-channel delivery, fan-out at scale, idempotency, retry + DLQ
🔁 Advanced Data Patterns — Pre-computation, materialized views, ETL vs ELT, hot spot problem, backfill

🔴 Security (4)

🔐 Security & Authentication — Sessions vs JWT, OAuth 2.0 flow, API security checklist
🪪 Authorization, SSO & MFA — RBAC/ABAC/ReBAC, OIDC vs SAML, step-up authentication, passkeys
🛡️ Privacy & Data Compliance — PII handling, encryption strategies, GDPR/CCPA, data residency
🔑 Secrets Management & Threat Modeling — Secret rotation, API keys, KMS/HSM, STRIDE, attack paths

🩷 AI & Machine Learning (5)

🤖 Machine Learning in System Design — Feature store, recommendation and ranking systems, rollout strategy, drift, serving latency, rollback
🧠 AI Agent System Design — Planner/reactor loops, function calling, retrieval, observability, agent benchmarks, model routing, budgets, safety
📈 Classic Machine Learning — Bias-variance, Naive Bayes, KNN, bagging vs boosting, SHAP/LIME, calibration, XGBoost, SVM, PCA
🔬 Deep Learning — Weight init, backprop, CNNs, LSTMs, full Transformer deep-dive, GANs, VAEs, diffusion, distillation, GQA/MQA
💬 LLM Interview Questions — Tokenization, RAG, LoRA/QLoRA, RLHF/DPO, scaling laws, MoE, multi-modal models, KV cache, CoT

🩵 Specialized Systems (2)

📝 Real-time Collaboration (Google Docs) — OT vs CRDT, operation logs, full Google Docs architecture
🎣 Webhooks System Design — Signed delivery, exponential retry, idempotency keys, full architecture

🟦 Reference (4)

🎯 Common Scenarios & Solutions — 17 scenario cheat sheets covering classic patterns plus multi-tenant SaaS, webhooks, recommendation/ranking, and multi-region reliability
📋 Reusable Design Templates — 12 full blueprints with architecture diagrams: YouTube, Twitter, WhatsApp, Uber, TinyURL, Rate Limiter, Metrics, TicketMaster, AI Agent, Typeahead, Google Docs, LeetCode
🧩 LeetCode Question Patterns — 21 algorithm patterns with code templates: arrays, two pointers, sliding window, trees, graphs, DP, backtracking, tries, segment tree, and more
🏗️ Low-Level System Design (LLD) — SOLID principles, 10 design patterns with code, 11 classic LLD questions (including LRU Cache, Parking Lot, Elevator, Rate Limiter, ATM, Tic-Tac-Toe, Logger, Library)

🛤️ Learning Paths

Pick a path based on your experience level, then use the interactive site to track your progress.

🌱 Beginner — Build your foundation (6 topics)

Interview Framework → Numbers to Know → Database Selection → Caching Deep Dive → API Design & Gateway → Rate Limiting

🚀 Mid-Level — Master distributed systems and platform basics (9 topics)

Distributed Fundamentals → Cloud Fundamentals → Compute & Deployment → Resilience Patterns → Observability → High Availability → Microservices → Notifications → Authorization / MFA

🏆 Advanced — Push beyond the standard interview (8 topics)

AI Agent System Design → ML System Design → Cloud Networking → IAM / Governance → Reliability, Observability & Cost → Real-time Collaboration → Probabilistic DS → DB Internals

⌨️ Keyboard Shortcuts

Open the interactive site and press `?` to see all shortcuts:

| Key | Action |
| --- | --- |
| `/` | Focus search |
| `d` | Toggle dark mode |
| `q` | Start quiz / flashcard mode |
| `b` | Toggle bookmarks panel |
| `?` | Show all keyboard shortcuts |
| `Esc` | Close reader / clear search / close panel |
| `Space` | Reveal quiz answer |
| `→` / `←` | Next / previous quiz card or topic |

🚀 Quick Start

Option A — Interactive site (recommended)

Open the full Interactive Website here 🌐

No install. Works offline after first load. Progress saves to your browser. Includes an inline reader, quiz/flashcard mode, dark mode, and bookmarks.

Option B — Run locally

git clone https://github.com/Ali-Meh619/System_Design_ML_Principles.git
cd System_Design_ML_Principles
# Open site/index.html in your browser — no server needed

Option C — Read on GitHub

Navigate to docs/ and click any topic. GitHub renders Markdown natively.

📁 Repository Structure

System_Design_ML_Principles/
├── site/                       # Interactive web app (no build step)
│   ├── index.html              # Main SPA — dark mode, quiz, bookmarks, inline reader
│   ├── styles.css              # Full design system with dark/light mode
│   ├── app.js                  # All interactive features
│   └── topics.js               # Topic registry with icons, difficulty, tags, paths
├── docs/                       # 48 topic documents
│   ├── foundation/             # Interview framework, estimation, I/O, networking
│   ├── api-networking/         # APIs, load balancing, rate limiting, realtime
│   ├── cloud-platform/         # Cloud foundations, deployment, networking, IAM, reliability
│   ├── data/                   # Databases, caching, queues, internals
│   ├── distributed/            # CAP, consistency, microservices, resilience, patterns
│   ├── search/                 # Full-text search, typeahead, geo, stream processing
│   ├── scale/                  # Observability, HA, ID gen, pagination, notifications
│   ├── security/               # Auth, AuthZ, privacy, secrets, threat modeling
│   ├── machine-learning/       # ML systems, agents, Classic ML, DL, LLMs
│   ├── specialized/            # Collaboration and webhook-heavy systems
│   └── reference/              # Templates, cheat sheets, LeetCode, LLD
└── assets/                     # Architecture diagram images

📖 Recommended Topic Structure

The strongest docs in this repo use a consistent interview-prep structure. Not every legacy page is identical yet, but new and upgraded docs aim to follow this pattern:

## Problem
What are we solving? When does this come up in an interview?

## Options
What are the main approaches? (with trade-off table)

## Recommended Default
What to pick and why, with the specific caveats.

## Failure Modes
What breaks? How do you detect and recover?

## Metrics
What do you measure to know it's working?

## Interview Answer Sketch
The concise 2-minute answer you'd give under time pressure.
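
Before opening a PR, a contributor may want to confirm that a draft follows the outline above. The sketch below is one possible way to do that; it is not part of this repo. It assumes each section is written as a level-2 Markdown heading with exactly the names listed above, and `missing_sections` and `draft` are hypothetical names used only for illustration.

```python
import re

# Section names copied from the recommended outline above.
REQUIRED_SECTIONS = [
    "Problem",
    "Options",
    "Recommended Default",
    "Failure Modes",
    "Metrics",
    "Interview Answer Sketch",
]


def missing_sections(markdown_text: str) -> list[str]:
    """Return the required sections that have no matching '## ' heading."""
    # Collect the title of every level-2 heading (lines starting with '## ').
    found = {
        match.group(1).strip()
        for match in re.finditer(r"^## +(.+)$", markdown_text, flags=re.MULTILINE)
    }
    # Preserve the outline's order so the report reads top to bottom.
    return [name for name in REQUIRED_SECTIONS if name not in found]


# Hypothetical draft that covers only three of the six sections.
draft = """\
## Problem
Rate limiting a public API.

## Options
Token bucket vs sliding window.

## Metrics
Rejected requests per second.
"""

print(missing_sections(draft))
# ['Recommended Default', 'Failure Modes', 'Interview Answer Sketch']
```

An exact-name match is deliberately strict: a heading such as `## Failure modes` would be reported as missing, which keeps section names uniform across docs at the cost of a few false alarms.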

🤝 Contributing

Contributions make this better for everyone:

1. Fork the repo and create a branch: `git checkout -b feat/your-topic`
2. Follow the recommended topic structure above, especially defaults, trade-offs, failure modes, and metrics
3. Add the topic to site/topics.js with an icon, difficulty, and tags
4. Open a PR using the provided template

Every substantial addition should include:

✅ When to use
❌ When NOT to use
💥 Common failure modes
📊 Measurable success metrics

See CONTRIBUTING.md for the full guide.

⭐ If this helped you

Star the repo — it helps others discover it
Share it with your team or study group
Open issues for topics you'd like to see covered
Submit PRs to improve existing content

📄 License

Built with ❤️ for engineers who take system design seriously.

⭐ Star on GitHub · 🌐 Open Interactive Site · 🐛 Report Issue
