TorchCode: The Complete Guide to LeetCode for PyTorch — ML Interview Practice
TorchCode is LeetCode for PyTorch — practice implementing softmax, attention, GPT-2, and more from scratch with instant auto-grading. Jupyter-based; self-host it or try it online. 40+ problems across 6 categories. 1,400+ stars, by duoan.
What Is TorchCode?
Top companies (Meta, Google DeepMind, OpenAI) expect ML engineers to implement core operations from memory. Reading papers isn't enough — you need to write softmax, LayerNorm, MultiHeadAttention, and full Transformer blocks.
TorchCode gives you a structured practice environment: no cloud, no signup, no GPU needed. Just run `make run` locally, or try it on Hugging Face.
- Stars: 1,400+ ⭐
- Forks: 95
- Language: Jupyter Notebook
- Topics: interview, leetcode, pytorch
- Online: HuggingFace Space
40+ Problems in 6 Categories
🧱 Fundamentals — "Implement X from scratch"
Write these without torch.nn:
| # | Problem | Signature |
|---|---|---|
| 01 | ReLU | relu(x) |
| 02 | Softmax | my_softmax(x, dim) |
| 03 | Linear Layer | SimpleLinear (y = xW^T + b) |
| 04 | LayerNorm | my_layer_norm(x, γ, β) |
| 07 | BatchNorm | my_batch_norm(x, γ, β) |
| 08 | RMSNorm | rms_norm(x, weight) |
| 15 | SwiGLU MLP | SwiGLUMLP |
| 16 | Cross-Entropy Loss | cross_entropy_loss(logits, targets) |
| 17 | Dropout | MyDropout |
| 18 | Embedding | MyEmbedding |
| 19 | GELU | my_gelu(x) |
| 20 | Kaiming Init | kaiming_init(weight) |
| 21 | Gradient Clipping | clip_grad_norm(params, max_norm) |
| 22 | Conv2d | my_conv2d(x, weight, ...) |
| 31 | Gradient Accumulation | accumulated_step(model, opt, ...) |
| 40 | Linear Regression | LinearRegression |
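To give a feel for the Fundamentals tier, here is a minimal sketch of problem 02 in the table's own signature. This is my own illustrative solution, not the repo's reference implementation, and the grading harness may check additional edge cases:

```python
import torch

def my_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtract the per-slice max before exponentiating: softmax is
    # shift-invariant, and this avoids overflow for large logits.
    x = x - x.max(dim=dim, keepdim=True).values
    exp = x.exp()
    return exp / exp.sum(dim=dim, keepdim=True)
```

The max-subtraction trick is exactly the kind of detail these problems probe: a naive `x.exp()` passes small test cases but overflows on large logits.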
🧠 Attention Mechanisms
| # | Problem | Signature |
|---|---|---|
| 05 | Scaled Dot-Product | scaled_dot_product_attention(Q, K, V) |
| 06 | Multi-Head Attention | MultiHeadAttention |
| 09 | Causal Self-Attention | causal_attention(Q, K, V) |
| 10 | Grouped Query Attention | GroupQueryAttention |
| 11 | Sliding Window | sliding_window_attention(Q, K, V, w) |
| 12 | Linear Attention | linear_attention(Q, K, V) |
| 14 | KV Cache | KVCacheAttention |
| 23 | Cross-Attention | MultiHeadCrossAttention |
| 24 | RoPE | apply_rope(q, k) |
| 25 | Flash Attention | flash_attention(Q, K, V, block_size) |
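Problem 05 anchors this whole category: every other variant (causal, grouped-query, sliding-window, Flash) modifies this core computation. A sketch under the table's signature, with an optional mask argument added for illustration:

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    # Similarity scores, scaled by sqrt(d_k) so their variance stays ~1
    # regardless of head dimension; unscaled scores saturate the softmax.
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions get -inf so softmax assigns them zero weight.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ V
```

Interviewers commonly probe the `sqrt(d_k)` scaling and the `-inf` masking order (mask before softmax, not after), so it pays to write this one from memory.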
🏗️ Architecture & Adaptation
| # | Problem | Signature |
|---|---|---|
| 13 | GPT-2 Block | GPT2Block |
| 26 | LoRA | LoRALinear |
| 27 | ViT Patch Embedding | PatchEmbedding |
| 28 | Mixture of Experts | MixtureOfExperts |
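The adaptation problems test whether you understand the trick, not just the API. For problem 26, a hypothetical `LoRALinear` sketch (the constructor arguments `r` and `alpha` are my assumption; the repo's template may differ):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update B @ A,
    scaled by alpha / r, following the LoRA recipe."""

    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # A gets small random init; B starts at zero, so at initialization
        # the layer is exactly the frozen base (a common interview check).
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

The zero-init of `B` is the detail graders tend to check: it guarantees the adapted model starts out identical to the base model.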
⚙️ Training & Optimization
| # | Problem | Signature |
|---|---|---|
| 29 | Adam Optimizer | MyAdam |
| 30 | Cosine LR Scheduler | cosine_lr_schedule(step, ...) |
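Problem 30 is pure arithmetic, which makes it a popular warm-up question. A sketch with a linear-warmup phase added for illustration (the warmup parameter is my assumption, not necessarily part of the repo's signature):

```python
import math

def cosine_lr_schedule(step, max_steps, max_lr, min_lr=0.0, warmup_steps=0):
    # Linear warmup from ~0 to max_lr, then cosine decay down to min_lr.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The two boundary values are worth memorizing: at the end of warmup the schedule returns exactly `max_lr`, and at `step == max_steps` it returns exactly `min_lr`.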
🎯 Inference & Decoding
| # | Problem | Signature |
|---|---|---|
| 32 | Top-k / Top-p Sampling | sample_top_k_top_p(logits, ...) |
| 33 | Beam Search | beam_search(log_prob_fn, ...) |
| 34 | Speculative Decoding | speculative_decode(target, draft, ...) |
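For a flavor of this category, here is a sketch of problem 32 combining both filters. The exact signature is my assumption based on the table; the repo's template may split top-k and top-p into separate functions:

```python
import torch

def sample_top_k_top_p(logits, k=50, p=0.9, temperature=1.0):
    logits = logits / temperature
    # Top-k: keep only the k largest logits, mask the rest to -inf.
    if k is not None and k < logits.size(-1):
        kth = torch.topk(logits, k).values[..., -1, None]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    # Top-p (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches p, zero out everything after that set.
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cum = sorted_probs.cumsum(dim=-1)
    outside_nucleus = cum - sorted_probs >= p
    sorted_probs = sorted_probs.masked_fill(outside_nucleus, 0.0)
    probs = torch.zeros_like(probs).scatter(-1, sorted_idx, sorted_probs)
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return torch.multinomial(probs, num_samples=1)
```

A subtle point graders like: the nucleus test uses `cum - sorted_probs >= p` rather than `cum > p`, so the token that crosses the threshold is kept, not dropped.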
🔬 Advanced — Differentiators
| # | Problem | Signature |
|---|---|---|
| 35 | BPE Tokenizer | SimpleBPE |
| 36 | INT8 Quantization | Int8Linear |
| 37 | DPO Loss | dpo_loss(chosen, rejected, ...) |
| 38 | GRPO Loss | grpo_loss(logps, rewards, ...) |
| 39 | PPO Loss | ppo_loss(new_logps, old_logps, ...) |
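The RL-loss problems mostly reduce to translating one equation into a few tensor ops. A sketch of problem 37 with fully spelled-out arguments (the table elides the reference-model log-probs, which the DPO objective also needs; my parameter names are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(chosen_logps, rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO: -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    # i.e. reward the policy for widening the chosen/rejected log-ratio
    # gap relative to the frozen reference model.
    chosen_ratio = chosen_logps - ref_chosen_logps
    rejected_ratio = rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```

A quick sanity check: when the policy equals the reference, both ratios are zero and the loss is `-log sigmoid(0) = log 2 ≈ 0.693`, the standard starting value.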
How It Works
- Open a Jupyter notebook template
- Implement the function/class
- Run the auto-grading cell
- Instant pass/fail feedback
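Conceptually, a grading cell boils down to comparing your implementation against a reference on a batch of test inputs. This is an illustrative stand-in, not the actual torch-judge API:

```python
import torch

def my_relu(x):
    # The function you would implement in the notebook template.
    return torch.clamp(x, min=0.0)

def grade(fn, ref, n_cases=10):
    """Illustrative grader: run both functions on random inputs
    and report pass/fail based on numerical agreement."""
    for _ in range(n_cases):
        x = torch.randn(4, 4)
        if not torch.allclose(fn(x), ref(x), atol=1e-6):
            return "FAIL"
    return "PASS"

print(grade(my_relu, torch.relu))  # PASS
```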
Run it via Docker, pip (torch-judge), Colab, or Hugging Face; no GPU needed.
TorchCode vs Alternatives
TorchCode sits in the PyTorch coding-interview practice category; here is how it compares with the closest alternatives:
| Feature | TorchCode | LeetCode ML | d2l.ai | fast.ai |
|---|---|---|---|---|
| Focus | PyTorch interview practice | Algorithm problems | DL textbook | DL course |
| Stars | 1.4K ⭐ | N/A (platform) | ~24K ⭐ | ~27K ⭐ |
| Auto-Grading | ✅ Instant | ✅ | ❌ | ❌ |
| Problems | 40+ ML-specific | Limited ML | Exercises | Exercises |
| Attention Variants | 10 types | ❌ | 2-3 | 1-2 |
| RL Losses (DPO/PPO) | ✅ | ❌ | ❌ | ❌ |
| Flash Attention | ✅ | ❌ | ❌ | ❌ |
| MoE, LoRA, ViT | ✅ | ❌ | Partial | ❌ |
| No GPU Needed | ✅ | ✅ | ❌ | ❌ |
| Jupyter-Based | ✅ | ❌ | ✅ | ✅ |
| Self-Hosted | ✅ Docker | ❌ Cloud | ❌ | ❌ |
| Online | ✅ HuggingFace | ✅ | ✅ | ❌ |
When to choose TorchCode: You're preparing for ML coding interviews and need to practice implementing PyTorch primitives from scratch with auto-grading.
When to choose d2l.ai: You want a comprehensive DL textbook with explanations and theory.
When to choose fast.ai: You want a practical DL course with a top-down learning approach.
Conclusion
TorchCode fills a critical gap in ML interview prep. While LeetCode covers algorithms, TorchCode covers what ML interviews actually ask: implement softmax, attention, GPT-2 blocks, LoRA, Flash Attention, DPO/PPO losses — all from scratch, all auto-graded, all in Jupyter. With 40+ problems across 6 categories and zero GPU requirement, it's the most targeted PyTorch interview practice platform.
