Most interview resources are either too scattered or too theoretical. This repo is organized around three practical tracks:
| Track | Best for | Start here |
|---|---|---|
| Core System Design | Distributed systems, cloud/platform, APIs, storage, scaling | docs/ |
| AI & Machine Learning | ML system design, agents, classic ML, deep learning, LLMs | docs/machine-learning/README.md |
| Reference & Practice Appendix | Templates, cheat sheets, LeetCode patterns, LLD | docs/reference/README.md |
Use the interactive site when you want navigation, quiz mode, and progress tracking. Use the Markdown docs when you want dense references you can skim before an interview.
| Feature | Description |
|---|---|
| 48 interview-ready topics | Core system design, cloud/platform, AI/ML, security, and interview reference material |
| Dark / Light mode | Persisted preference, instant toggle with d |
| Progress tracking | Mark topics as read. Your progress saves locally. |
| Bookmarks | Save topics to revisit. Accessible from any page. |
| Quiz / Flashcard mode | Randomized flashcard review across all 48 topics |
| Inline reader | Read every topic without leaving the page, with prev/next navigation |
| Keyboard-first | / search, q quiz, b bookmarks, ? shortcuts |
| Visual progress bar | See your overall completion at a glance |
| 3 learning paths | Beginner, Mid-Level, and Advanced tracks |
| Live search | Searches title, category, summary, and tags |
| Category color coding | Every domain has its own visual identity |
| Zero setup | Open in browser. No install. No build step. |
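The live search row above names four searchable fields. As a rough illustration only, a filter over those fields could look like the sketch below. The site itself is JavaScript; Python is used here purely for a standalone example, and the `Topic` shape, the `matches` and `search` helpers, and the sample entries are all hypothetical rather than taken from `site/app.js` or `site/topics.js`. It assumes case-insensitive substring matching where every query word must appear somewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical shape; the real entries live in site/topics.js.
    title: str
    category: str
    summary: str
    tags: list[str] = field(default_factory=list)


def matches(topic: Topic, query: str) -> bool:
    """Return True if every query word is a substring of some searchable field."""
    haystack = " ".join(
        [topic.title, topic.category, topic.summary, *topic.tags]
    ).lower()
    return all(word in haystack for word in query.lower().split())


def search(topics: list[Topic], query: str) -> list[Topic]:
    """Filter topics by query; a blank query returns every topic."""
    if not query.strip():
        return list(topics)
    return [t for t in topics if matches(t, query)]


# Sample data invented for this sketch.
TOPICS = [
    Topic("Caching Deep Dive", "Data Storage",
          "Cache layers, eviction, invalidation", ["cache", "redis"]),
    Topic("Rate Limiting In Depth", "API & Networking",
          "Algorithms and a distributed Redis implementation", ["throttling"]),
    Topic("Distributed Locking", "Distributed Systems",
          "Redlock and fencing tokens", ["locks"]),
]

if __name__ == "__main__":
    for topic in search(TOPICS, "redis"):
        print(topic.title)
```

Substring matching keeps the sketch short but also matches inside longer words; a real implementation might tokenize or rank results instead.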
Foundation (4)
- The System Design Interview Framework - 4-step universal structure: Clarify → Estimate → Design → Deep Dive
- Numbers Every Engineer Must Know - Latency hierarchy, scale reference points, back-of-envelope formulas
- IO Fundamentals: Read vs Write - Latency hierarchy, random vs sequential access, OS page cache, write amplification
- Networking & Concurrency - TCP vs UDP, HTTP/1.1 vs HTTP/2 vs HTTP/3 (QUIC), event loop, goroutines
Data Storage (5)
- Database Selection Guide - SQL vs NoSQL tension, 7 database types with when-to-use decision matrix
- Caching Deep Dive - 5 cache layers, read/write patterns, eviction, cache invalidation strategies
- Message Queues & Event Streaming - Queue vs Kafka event log, delivery guarantees, DLQ, outbox pattern
- Storage & CDN - Object/block/file storage, CDN pull vs push, cache invalidation
- Database Internals - B-tree vs LSM, indexes, replication, CDC, sharding, ACID vs BASE, isolation levels
API & Networking (4)
- API Design & API Gateway - REST vs gRPC vs GraphQL, gateway responsibilities, rate limiting algorithms
- Load Balancing & Networking - L4 vs L7, round-robin/least-connections/consistent hashing, health checks
- Real-time Communication - Polling, SSE, WebSockets compared; scaling stateful WS servers with Redis pub/sub
- Rate Limiting In Depth - Every algorithm compared, distributed Redis implementation, failure modes
Cloud & Platform (5)
- Cloud Fundamentals & Shared Responsibility - Regions, availability zones, managed services, shared responsibility, environment boundaries
- Compute & Deployment Patterns - VMs vs containers vs Kubernetes vs serverless, autoscaling, canary/blue-green rollout
- Cloud Networking & Traffic Management - VPCs, subnets, DNS, CDN/WAF, API gateways, service-to-service traffic
- IAM, Secrets & Governance - Least privilege, workload identity, secret rotation, KMS, audit and guardrails
- Reliability, Observability & Cost - Multi-AZ vs multi-region, RTO/RPO, SLOs, budget alarms, cost-aware scaling
Distributed Systems (5)
- Distributed System Fundamentals - CAP, consistency models, consistent hashing, Saga vs 2PC, quorum, vector clocks
- Core Design Patterns - Fan-out (social feed), CQRS, event sourcing, outbox pattern, inventory contention
- Microservices vs Monolith - When to decompose, service discovery, sync vs async communication
- Resilience Patterns - Timeouts, retries + jitter, circuit breaker, fallbacks, backpressure, load shedding
- Distributed Locking - Why local locks fail, Redis Redlock, fencing tokens
Search & Analytics (4)
- Search & Typeahead Systems - Inverted index, prefix trie autocomplete, relevance ranking (TF-IDF, BM25)
- Stream Processing & Top-K Systems - Count-Min Sketch, Lambda vs Kappa architecture, Flink, windowing
- Geo & Location Systems - Geohash, quadtree, proximity queries, Uber-style driver matching
- Probabilistic Data Structures - Bloom filter, HyperLogLog, Count-Min Sketch at massive scale
Scale & Reliability (6)
- Observability & Monitoring - Metrics, logs, traces (three pillars), SLOs, error budgets, OpenTelemetry
- High Availability & Auto Scaling - Active-passive vs active-active, autoscaling signals, multi-region patterns
- Unique ID Generation - UUID v4/v7/ULID, Twitter Snowflake, ticket servers, and when to use each
- API Pagination - Why offset pagination fails, cursor-based and keyset pagination at scale
- Notification System Design - Multi-channel delivery, fan-out at scale, idempotency, retry + DLQ
- Advanced Data Patterns - Pre-computation, materialized views, ETL vs ELT, hot spot problem, backfill
Security (4)
- Security & Authentication - Sessions vs JWT, OAuth 2.0 flow, API security checklist
- Authorization, SSO & MFA - RBAC/ABAC/ReBAC, OIDC vs SAML, step-up authentication, passkeys
- Privacy & Data Compliance - PII handling, encryption strategies, GDPR/CCPA, data residency
- Secrets Management & Threat Modeling - Secret rotation, API keys, KMS/HSM, STRIDE, attack paths
AI & Machine Learning (5)
- Machine Learning in System Design - Feature store, recommendation and ranking systems, rollout strategy, drift, serving latency, rollback
- AI Agent System Design - Planner/reactor loops, function calling, retrieval, observability, agent benchmarks, model routing, budgets, safety
- Classic Machine Learning - Bias-variance, Naive Bayes, KNN, bagging vs boosting, SHAP/LIME, calibration, XGBoost, SVM, PCA
- Deep Learning - Weight init, backprop, CNNs, LSTMs, full Transformer deep-dive, GANs, VAEs, diffusion, distillation, GQA/MQA
- LLM Interview Questions - Tokenization, RAG, LoRA/QLoRA, RLHF/DPO, scaling laws, MoE, multi-modal models, KV cache, CoT
Specialized Systems (2)
- Real-time Collaboration (Google Docs) - OT vs CRDT, operation logs, full Google Docs architecture
- Webhooks System Design - Signed delivery, exponential retry, idempotency keys, full architecture
Reference (4)
- Common Scenarios & Solutions - 17 scenario cheat sheets covering classic patterns plus multi-tenant SaaS, webhooks, recommendation/ranking, and multi-region reliability
- Reusable Design Templates - 12 full blueprints with architecture diagrams: YouTube, Twitter, WhatsApp, Uber, TinyURL, Rate Limiter, Metrics, TicketMaster, AI Agent, Typeahead, Google Docs, LeetCode
- LeetCode Question Patterns - 21 algorithm patterns with code templates: arrays, two pointers, sliding window, trees, graphs, DP, backtracking, tries, segment tree, and more
- Low-Level System Design (LLD) - SOLID principles, 10 design patterns with code, 11 classic LLD questions (including LRU Cache, Parking Lot, Elevator, Rate Limiter, ATM, Tic-Tac-Toe, Logger, Library)
Pick a path based on your experience level, then use the interactive site to track your progress.
- Beginner: Interview Framework → Numbers to Know → Database Selection → Caching Deep Dive → API Design & Gateway → Rate Limiting
- Mid-Level: Distributed Fundamentals → Cloud Fundamentals → Compute & Deployment → Resilience Patterns → Observability → High Availability → Microservices → Notifications → Authorization / MFA
- Advanced: AI Agent System Design → ML System Design → Cloud Networking → IAM / Governance → Reliability, Observability & Cost → Real-time Collaboration → Probabilistic DS → DB Internals
Open the interactive site and press ? to see all shortcuts:
| Key | Action |
|---|---|
| / | Focus search |
| d | Toggle dark mode |
| q | Start quiz / flashcard mode |
| b | Toggle bookmarks panel |
| ? | Show all keyboard shortcuts |
| Esc | Close reader / clear search / close panel |
| Space | Reveal quiz answer |
| → / ← | Next / previous quiz card or topic |
Option A: Interactive site (recommended)
No install. Works offline after first load. Progress saves to your browser. Includes an inline reader, quiz/flashcard mode, dark mode, and bookmarks.
Option B: Run locally
git clone https://github.com/Ali-Meh619/System_Design_ML_Principles.git
cd System_Design_ML_Principles
# Open site/index.html in your browser; no server needed
Option C: Read on GitHub
Navigate to docs/ and click any topic. GitHub renders Markdown natively.
System_Design_ML_Principles/
├── site/                    # Interactive web app (no build step)
│   ├── index.html           # Main SPA: dark mode, quiz, bookmarks, inline reader
│   ├── styles.css           # Full design system with dark/light mode
│   ├── app.js               # All interactive features
│   └── topics.js            # Topic registry with icons, difficulty, tags, paths
├── docs/                    # 48 topic documents
│   ├── foundation/          # Interview framework, estimation, I/O, networking
│   ├── api-networking/      # APIs, load balancing, rate limiting, realtime
│   ├── cloud-platform/      # Cloud foundations, deployment, networking, IAM, reliability
│   ├── data/                # Databases, caching, queues, internals
│   ├── distributed/         # CAP, consistency, microservices, resilience, patterns
│   ├── search/              # Full-text search, typeahead, geo, stream processing
│   ├── scale/               # Observability, HA, ID gen, pagination, notifications
│   ├── security/            # Auth, AuthZ, privacy, secrets, threat modeling
│   ├── machine-learning/    # ML systems, agents, Classic ML, DL, LLMs
│   ├── specialized/         # Collaboration and webhook-heavy systems
│   └── reference/           # Templates, cheat sheets, LeetCode, LLD
└── assets/                  # Architecture diagram images
The strongest docs in this repo use a consistent interview-prep structure. Not every legacy page is identical yet, but new and upgraded docs aim to follow this pattern:
## Problem
What are we solving? When does this come up in an interview?
## Options
What are the main approaches? (with trade-off table)
## Recommended Default
What to pick and why, with the specific caveats.
## Failure Modes
What breaks? How do you detect and recover?
## Metrics
What do you measure to know it's working?
## Interview Answer Sketch
The concise 2-minute answer you'd give under time pressure.
Contributions make this better for everyone:
- Fork the repo and create a branch: git checkout -b feat/your-topic
- Follow the recommended topic structure above, especially defaults, trade-offs, failure modes, and metrics
- Add the topic to site/topics.js with an icon, difficulty, and tags
- Open a PR using the provided template
Every substantial addition should include:
- When to use
- When NOT to use
- Common failure modes
- Measurable success metrics
See CONTRIBUTING.md for the full guide.
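Before opening a PR, it can help to check a draft against the recommended topic structure described earlier. The sketch below is one possible way to do that and is not part of the repo: `REQUIRED_SECTIONS` mirrors the six headings from the recommended structure, while the `missing_sections` helper and the sample draft are hypothetical. It assumes each section is written as a level-2 `## ` heading and does not try to skip headings that appear inside code fences.

```python
import re

# Section headings from the recommended topic structure in this README.
REQUIRED_SECTIONS = [
    "Problem",
    "Options",
    "Recommended Default",
    "Failure Modes",
    "Metrics",
    "Interview Answer Sketch",
]

# Matches level-2 headings only: "## Title" (not "# Title" or "### Title").
H2_PATTERN = re.compile(r"^##[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)


def missing_sections(markdown: str) -> list[str]:
    """Return required sections with no matching '## ' heading, in template order."""
    found = {m.group(1).strip().lower() for m in H2_PATTERN.finditer(markdown)}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in found]


# A made-up draft that is still missing three sections.
DRAFT = """\
## Problem
Why rate limiting comes up in interviews.
## Options
Token bucket vs sliding window.
## Failure Modes
What happens when the shared counter store is unavailable.
"""

if __name__ == "__main__":
    for name in missing_sections(DRAFT):
        print(f"missing section: {name}")
```

To check a real file, read it with `pathlib.Path(path).read_text(encoding="utf-8")` and pass the result to `missing_sections`.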
- Star the repo; it helps others discover it
- Share it with your team or study group
- Open issues for topics you'd like to see covered
- Submit PRs to improve existing content
MIT © 2026. Free to use, share, and build on.
Built with ❤️ for engineers who take system design seriously.
Star on GitHub · Open Interactive Site · Report Issue