Hello fellow ML geeks, my name is Andrey, and I know much, much less about ML than I want to. While I do not think I will ever reach the level of "finally good enough", I still want to learn in good company. Join in!
How it works:
- Read a chapter from the list.
- Jump on a call.
- Listen to me talk over some slides or present yourself (pretty pretty pretty please!)
- Take part in the discussion and learn something.
- Slides will be uploaded to this repo, and recordings to YouTube.
- Join the chat for important announcements and discussion: Discord, Telegram (announcements will be duplicated).
Schedule: every Tuesday, 18:00 (London time)
| Date | Topic | Presented by | Slides | Recording |
|---|---|---|---|---|
| 2 Apr, 2026 | Training Compass | @fxlrnrpt | | youtube |
| 9 Apr, 2026 | Every Big Model Starts with a Small Ablation | @fxlrnrpt | | youtube |
| 16 Apr, 2026 | Architecture: Attention | @fxlrnrpt | | youtube |
| 28 Apr, 2026 | Architecture: Embedding Sharing, Intro to Positional Encoding | @fxlrnrpt | | youtube |
| TBD | Positional Encoding Deep Dive and Long Context | TBD | | |
| TBD | Improving Stability, Additional Considerations | TBD | | |
| TBD | Architecture: MoE, To MoE or Not to MoE | TBD | | |
| TBD | Architecture: Hybrid Models | TBD | | |
| TBD | Architecture: Tokenizer | TBD | | |
| TBD | Architecture: SmolLM3, Rules of Engagement | TBD | | |
| TBD | Optimizers | TBD | | |
| TBD | Learning Rate, Batch Size, Scaling Laws for Hyperparameters, Scaling Laws: How Many Parameters, How Much Data? | TBD | | |
| TBD | Optimizer and Training Hyperparameters: SmolLM3, Rules of Engagement | TBD | | |
| TBD | The Art of Data Curation | TBD | | |
| TBD | The Training Marathon: Checklist, Scaling Surprises, Staying the Course | TBD | | |
| TBD | The Training Marathon: Mid-Training, Wrapping Up | TBD | | |
| TBD | Post-Training Compass | TBD | | |
| TBD | Post-Training: Evals, Tools | TBD | | |
| TBD | SFT | TBD | | |
| TBD | Preference Optimization | TBD | | |
| TBD | Going On-Policy and Beyond Supervised Labels | TBD | | |
| TBD | Distillation (custom extension chapter!) | TBD | | |
| TBD | Continuous pre-training (custom extension chapter!) | TBD | | |
| TBD | PEFT: LoRA, prefix-tuning, etc. (custom extension chapter!) | TBD | | |
| TBD | Infrastructure | TBD | | |
| Date | Topic | Presented by | Slides | Recording |
|---|---|---|---|---|
| 12 Feb, 2026 | Ch.9 Diving into the GPUs [fused kernels:] | @stalkermustang | slides | recording |
| 5 Feb, 2026 | Ch.9 Diving into the GPUs [:fused kernels] | @fxlrnrpt | slides | recording |
| 22 Jan, 2026 | Ch.8 Finding the Best Training Configuration | @fxlrnrpt | slides | recording |
| 15 Jan, 2026 | Ch.7 5D Parallelism in a Nutshell | @fxlrnrpt | slides | recording |
| 18 Dec, 2025 | Ch.6 Expert Parallelism | @stalkermustang | slides | recording |
| 11 Dec, 2025 | Ch.5 Pipeline Parallelism | @fxlrnrpt | slides | recording |
| 4 Dec, 2025 | Ch.4 Context Parallelism | @stalkermustang | slides | recording |
| 27 Nov, 2025 | Ch.3.2 Sequence Parallelism | @fxlrnrpt | slides | recording |
| 20 Nov, 2025 | Ch.3.1 Tensor Parallelism | @fxlrnrpt | slides | recording |
| 13 Nov, 2025 | Ch.2.2 Model Parallelism [ZERO:] | @fxlrnrpt | slides | recording |
| 6 Nov, 2025 | Ch.2.1 Data Parallelism [:ZERO] | @fxlrnrpt | slides | recording |
| 30 Oct, 2025 | Ch.1 First Steps: Training on One GPU | @stalkermustang | slides | recording |
| Date | Topic | Presented by | Slides | Recording |
|---|---|---|---|---|
| 9 Feb, 2026 | Chapter 1: Prompt Chaining | @fxlrnrpt | slides | recording |
| 11 Feb, 2026 | Chapters 2-4: Routing, Parallelization, Reflection | @anyaepie | slides | recording |
| 13 Feb, 2026 | Chapters 5-7: Tool Use, Planning, Multi-Agent | @fxlrnrpt | slides | recording |
| 16 Feb, 2026 | Chapters 8-12: Memory Management, Learning and Adaptation, MCP, Goal Setting and Monitoring, Exception Handling and Recovery | @anyaepie | slides | recording |
| 18 Feb, 2026 | Chapters 13-16, 18: Human-in-the-Loop, Guardrails/Safety Patterns, RAG, A2A, Resource-Aware Optimization | @julia-meshcheryakova, @fxlrnrpt | slides1, slides2 | recording |
| 20 Feb, 2026 | Chapters 19-20: Evaluation and Monitoring, Prioritization | @AnCh7 | slides | recording |
| 27 Feb, 2026 | Chapters 17, 21, Appendix: Reasoning Techniques, Exploration and Discovery, Appendix | @fxlrnrpt | slides | |