A comprehensive interactive course on Transformer architecture, from basic concepts to advanced topics. Designed for Python developers with no prior machine learning background.
This course teaches the complete Transformer architecture through:
- Interactive math explanations with KaTeX rendering
- Runnable Python code using Pyodide (runs in browser)
- Interactive visualizations with D3.js and custom components
- Step-by-step progression from linear algebra to cutting-edge research
The course is organized into twelve modules:

- Foundations - Linear algebra, neural networks, PyTorch primer
- Before Transformers - RNNs, LSTMs, seq2seq attention
- Attention Is All You Need - The groundbreaking 2017 paper
- Self-Attention - Query/Key/Value mechanism deep dive (a NumPy sketch of the core formula follows this list)
- Multi-Head Architecture - Parallel attention and layer details
- Encoder-Decoder - Full architecture walkthrough
- Tokenization - BPE, WordPiece, SentencePiece strategies
- Landmark Models - BERT, GPT series, LLaMA, Mistral evolution
- Training Process - Pre-training, fine-tuning, RLHF, DPO
- Efficient Methods - LoRA, quantization, Flash Attention
- RAG & Applications - Retrieval-Augmented Generation
- Frontier Topics - MoE, Mamba, scaling laws, latest research
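
As a taste of what the Self-Attention module builds up to, here is a minimal NumPy sketch of scaled dot-product attention. It uses toy shapes and random values for illustration only; it is not taken from the course materials:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
    return weights @ V                                  # weighted sum of value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))  # 3 tokens, d_k = 4 (arbitrary toy dimensions)
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```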
Built with:

- Framework: Astro 5 + Starlight (static site generation)
- Math: KaTeX for server-side equation rendering
- Interactive Code: Pyodide (Python WASM runtime)
- Visualizations: D3.js for charts and diagrams
- Styling: Custom CSS with neutral color palette
- Theme: Built-in dark/light mode toggle
Prerequisites:

- Node.js 18+
- npm or yarn
```bash
# Clone the repository
git clone https://github.com/your-username/transformer-course.git
cd transformer-course

# Install dependencies
npm install

# Start development server
npm run dev
```

The site will be available at http://localhost:4321.
Available commands:

```bash
# Start dev server with hot reload
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

# Type check
npm run astro check
```

Inline Python code execution with NumPy support:
```astro
<PyodideRunner
  title="Try Matrix Multiplication"
  initialCode="import numpy as np
A = np.array([[1, 2], [3, 4]])
print('Matrix A:', A)"
/>
```

Math + Code + Visual explanations:
```astro
<ThreeWayView
  title="Dot Product"
  mathContent="$$\mathbf{a} \cdot \mathbf{b} = \sum_{i} a_i b_i$$"
  codeContent="dot_product = np.dot(a, b)"
/>
```
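
To see that the math and the code panes agree, here is a quick self-contained check (the vectors are illustrative values, not course data):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# The formula: a . b = sum_i a_i * b_i
by_formula = sum(a_i * b_i for a_i, b_i in zip(a, b))
by_numpy = np.dot(a, b)
assert by_formula == by_numpy == 32.0
```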
Interactive attention weight visualizations:

```astro
<AttentionHeatmap
  tokens={["The", "cat", "sat"]}
  attentionWeights={[[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]}
/>
```
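
The attentionWeights above are hard-coded for the demo, but they have the shape real attention weights do: each row is a softmax over the tokens and sums to 1. A small sketch of how such a matrix arises from raw scores (the scores here are made up for illustration):

```python
import numpy as np

# Raw query-key scores for 3 tokens (illustrative values, not from a real model)
scores = np.array([[2.0, 0.1, 0.1],
                   [0.3, 1.5, 0.3],
                   [0.1, 0.9, 1.6]])

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

weights = softmax(scores)
print(weights.round(2))     # each row is a probability distribution
print(weights.sum(axis=1))  # [1. 1. 1.]
```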
Compare different tokenization strategies:

```astro
<TokenizerPlayground
  title="BPE vs WordPiece vs SentencePiece"
  initialText="The quick brown fox jumps over the lazy dog."
/>
```
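
The playground wraps real tokenizers; for intuition, the heart of BPE is simply "repeatedly merge the most frequent adjacent pair." A toy single-word sketch of that core loop (real BPE learns its merge table over a whole corpus; this is a simplification):

```python
from collections import Counter

def bpe_merges(word, num_merges):
    """Greedy byte-pair encoding on a single word (toy version)."""
    tokens = list(word)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]  # most frequent adjacent pair
        merges.append(a + b)
        # Merge every occurrence of the winning pair, left to right
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

print(bpe_merges("lowlowlower", 3))
# (['lowlow', 'low', 'e', 'r'], ['lo', 'low', 'lowlow'])
```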
Project structure:

```
transformer-course/
├── astro.config.mjs            # Astro configuration
├── package.json                # Dependencies and scripts
├── tsconfig.json               # TypeScript configuration
├── src/
│   ├── assets/
│   │   ├── custom.css          # Theme and component styles
│   │   └── logo.svg            # Course logo
│   ├── components/
│   │   ├── AttentionHeatmap.astro
│   │   ├── PyodideRunner.astro
│   │   ├── ThreeWayView.astro
│   │   └── TokenizerPlayground.astro
│   └── content/
│       └── docs/               # Course content (MDX files)
│           ├── index.mdx       # Landing page
│           ├── 01-foundations/
│           ├── 02-before-transformers/
│           ├── 03-attention-is-all-you-need/
│           ├── 04-self-attention/
│           └── ... (modules 5-12)
└── public/
    └── images/                 # Static diagrams and assets
```
The course uses a neutral slate/zinc color palette instead of Starlight's default purple theme. Colors are defined in `src/assets/custom.css`:
```css
:root {
  --sl-color-accent: #475569;      /* Slate-600 */
  --sl-color-accent-high: #334155; /* Slate-700 */
  --sl-color-accent-low: #f1f5f9;  /* Slate-100 */
}

[data-theme='dark'] {
  --sl-color-accent: #94a3b8;      /* Slate-400 */
  --sl-color-accent-high: #cbd5e1; /* Slate-300 */
}
```

The course references and builds upon:
- Papers: "Attention Is All You Need", BERT, GPT series, LLaMA, etc.
- Books: "Hands-On Large Language Models" (Alammar), "Build a Large Language Model From Scratch" (Raschka)
- Blogs: Jay Alammar's "Illustrated Transformer", Lilian Weng's blog
- Code: nanoGPT, Hugging Face Transformers, Annotated Transformer
- Courses: Stanford CS224N, Hugging Face NLP Course
Deploy to Vercel:

```bash
npm install -g vercel
vercel --prod
```

Deploy to Netlify:

```bash
npm run build
# Upload dist/ folder to Netlify
```

Deploy to GitHub Pages:

```bash
npm run build
# Push dist/ contents to gh-pages branch
```

Performance highlights:

- Lighthouse scores: 95+ Performance, 100 Accessibility
- Bundle size: ~200KB (Astro ships zero JS by default)
- Math rendering: Server-side KaTeX (no client-side MathJax)
- Interactive components: Loaded only when needed (Astro Islands)
Contributions are welcome! Areas where help is especially appreciated:
- Additional visualizations (3D architecture explorer, training animations)
- More code exercises for each module
- Advanced topics (recent papers, new architectures)
- Accessibility improvements
- Mobile optimization
When contributing, please:

- Use semantic HTML and ARIA labels
- Test interactive components in both themes
- Ensure math renders correctly in KaTeX
- Keep bundle size minimal (leverage Astro Islands)
- Follow the established visual design patterns
Special thanks to:
- Jay Alammar for the "Illustrated Transformer" inspiration
- Andrej Karpathy for his educational approach to AI
- The Hugging Face team for democratizing transformers
- Original transformer paper authors (Vaswani et al.)
Ready to understand the architecture that powers modern AI?
Run `npm run dev` and open http://localhost:4321 to start the course.