TorchCode: The Complete Guide to LeetCode for PyTorch — ML Interview Practice
TorchCode is LeetCode for PyTorch — practice implementing softmax, attention, GPT-2, and more from scratch with instant auto-grading. Jupyter-based; self-host it or try it online. 40+ problems across 6 categories. 1,400+ stars, by duoan.
What Is TorchCode?
Top companies (Meta, Google DeepMind, OpenAI) expect ML engineers to implement core operations from memory. Reading papers isn't enough — you need to write softmax, LayerNorm, MultiHeadAttention, and full Transformer blocks.
TorchCode gives you a structured practice environment: no cloud, no signup, no GPU needed. Just run `make run` locally, or try it on Hugging Face.
- Stars: 1,400+ ⭐
- Forks: 95
- Language: Jupyter Notebook
- Topics: interview, leetcode, pytorch
- Online: HuggingFace Space
40+ Problems in 6 Categories
🧱 Fundamentals — "Implement X from scratch"
Write these without torch.nn:
| # | Problem | Signature |
|---|---|---|
| 01 | ReLU | relu(x) |
| 02 | Softmax | my_softmax(x, dim) |
| 03 | Linear Layer | SimpleLinear (y = xW^T + b) |
| 04 | LayerNorm | my_layer_norm(x, γ, β) |
| 07 | BatchNorm | my_batch_norm(x, γ, β) |
| 08 | RMSNorm | rms_norm(x, weight) |
| 15 | SwiGLU MLP | SwiGLUMLP |
| 16 | Cross-Entropy Loss | cross_entropy_loss(logits, targets) |
| 17 | Dropout | MyDropout |
| 18 | Embedding | MyEmbedding |
| 19 | GELU | my_gelu(x) |
| 20 | Kaiming Init | kaiming_init(weight) |
| 21 | Gradient Clipping | clip_grad_norm(params, max_norm) |
| 22 | Conv2d | my_conv2d(x, weight, ...) |
| 31 | Gradient Accumulation | accumulated_step(model, opt, ...) |
| 40 | Linear Regression | LinearRegression |
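To give a feel for the Fundamentals tier, here is a minimal sketch of problem 02 in the table's own signature. This is my own illustrative solution, not the repo's reference implementation, and the grading harness may check additional edge cases:

```python
import torch

def my_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtract the per-slice max before exponentiating: softmax is
    # shift-invariant, and this avoids overflow for large logits.
    x = x - x.max(dim=dim, keepdim=True).values
    exp = x.exp()
    return exp / exp.sum(dim=dim, keepdim=True)
```

The max-subtraction trick is exactly the kind of detail these problems probe: a naive `x.exp()` passes small test cases but overflows on large logits.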
🧠 Attention Mechanisms
| # | Problem | Signature |
|---|---|---|
| 05 | Scaled Dot-Product | scaled_dot_product_attention(Q, K, V) |
| 06 | Multi-Head Attention | MultiHeadAttention |
| 09 | Causal Self-Attention | causal_attention(Q, K, V) |
| 10 | Grouped Query Attention | GroupQueryAttention |
| 11 | Sliding Window | sliding_window_attention(Q, K, V, w) |
| 12 | Linear Attention | linear_attention(Q, K, V) |
| 14 | KV Cache | KVCacheAttention |
| 23 | Cross-Attention | MultiHeadCrossAttention |
| 24 | RoPE | apply_rope(q, k) |
| 25 | Flash Attention | flash_attention(Q, K, V, block_size) |
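Problem 05 anchors this whole category: every other variant (causal, grouped-query, sliding-window, Flash) modifies this core computation. A sketch under the table's signature, with an optional mask argument added for illustration:

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    # Similarity scores, scaled by sqrt(d_k) so their variance stays ~1
    # regardless of head dimension; unscaled scores saturate the softmax.
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions get -inf so softmax assigns them zero weight.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ V
```

Interviewers commonly probe the `sqrt(d_k)` scaling and the `-inf` masking order (mask before softmax, not after), so it pays to write this one from memory.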
🏗️ Architecture & Adaptation
| # | Problem | Signature |
|---|---|---|
| 13 | GPT-2 Block | GPT2Block |
| 26 | LoRA | LoRALinear |
| 27 | ViT Patch Embedding | PatchEmbedding |
| 28 | Mixture of Experts | MixtureOfExperts |
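The adaptation problems test whether you understand the trick, not just the API. For problem 26, a hypothetical `LoRALinear` sketch (the constructor arguments `r` and `alpha` are my assumption; the repo's template may differ):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update B @ A,
    scaled by alpha / r, following the LoRA recipe."""

    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # A gets small random init; B starts at zero, so at initialization
        # the layer is exactly the frozen base (a common interview check).
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

The zero-init of `B` is the detail graders tend to check: it guarantees the adapted model starts out identical to the base model.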
⚙️ Training & Optimization
| # | Problem | Signature |
|---|---|---|
| 29 | Adam Optimizer | MyAdam |
| 30 | Cosine LR Scheduler | cosine_lr_schedule(step, ...) |
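Problem 30 is pure arithmetic, which makes it a popular warm-up question. A sketch with a linear-warmup phase added for illustration (the warmup parameter is my assumption, not necessarily part of the repo's signature):

```python
import math

def cosine_lr_schedule(step, max_steps, max_lr, min_lr=0.0, warmup_steps=0):
    # Linear warmup from ~0 to max_lr, then cosine decay down to min_lr.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The two boundary values are worth memorizing: at the end of warmup the schedule returns exactly `max_lr`, and at `step == max_steps` it returns exactly `min_lr`.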
🎯 Inference & Decoding
| # | Problem | Signature |
|---|---|---|
| 32 | Top-k / Top-p Sampling | sample_top_k_top_p(logits, ...) |
| 33 | Beam Search | beam_search(log_prob_fn, ...) |
| 34 | Speculative Decoding | speculative_decode(target, draft, ...) |
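For a flavor of this category, here is a sketch of problem 32 combining both filters. The exact signature is my assumption based on the table; the repo's template may split top-k and top-p into separate functions:

```python
import torch

def sample_top_k_top_p(logits, k=50, p=0.9, temperature=1.0):
    logits = logits / temperature
    # Top-k: keep only the k largest logits, mask the rest to -inf.
    if k is not None and k < logits.size(-1):
        kth = torch.topk(logits, k).values[..., -1, None]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    # Top-p (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches p, zero out everything after that set.
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cum = sorted_probs.cumsum(dim=-1)
    outside_nucleus = cum - sorted_probs >= p
    sorted_probs = sorted_probs.masked_fill(outside_nucleus, 0.0)
    probs = torch.zeros_like(probs).scatter(-1, sorted_idx, sorted_probs)
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return torch.multinomial(probs, num_samples=1)
```

A subtle point graders like: the nucleus test uses `cum - sorted_probs >= p` rather than `cum > p`, so the token that crosses the threshold is kept, not dropped.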
🔬 Advanced — Differentiators
| # | Problem | Signature |
|---|---|---|
| 35 | BPE Tokenizer | SimpleBPE |
| 36 | INT8 Quantization | Int8Linear |
| 37 | DPO Loss | dpo_loss(chosen, rejected, ...) |
| 38 | GRPO Loss | grpo_loss(logps, rewards, ...) |
| 39 | PPO Loss | ppo_loss(new_logps, old_logps, ...) |
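The RL-loss problems mostly reduce to translating one equation into a few tensor ops. A sketch of problem 37 with fully spelled-out arguments (the table elides the reference-model log-probs, which the DPO objective also needs; my parameter names are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(chosen_logps, rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO: -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    # i.e. reward the policy for widening the chosen/rejected log-ratio
    # gap relative to the frozen reference model.
    chosen_ratio = chosen_logps - ref_chosen_logps
    rejected_ratio = rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```

A quick sanity check: when the policy equals the reference, both ratios are zero and the loss is `-log sigmoid(0) = log 2 ≈ 0.693`, the standard starting value.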
How It Works
- Open a Jupyter notebook template
- Implement the function/class
- Run the auto-grading cell
- Instant pass/fail feedback
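Conceptually, a grading cell boils down to comparing your implementation against a reference on a batch of test inputs. This is an illustrative stand-in, not the actual torch-judge API:

```python
import torch

def my_relu(x):
    # The function you would implement in the notebook template.
    return torch.clamp(x, min=0.0)

def grade(fn, ref, n_cases=10):
    """Illustrative grader: run both functions on random inputs
    and report pass/fail based on numerical agreement."""
    for _ in range(n_cases):
        x = torch.randn(4, 4)
        if not torch.allclose(fn(x), ref(x), atol=1e-6):
            return "FAIL"
    return "PASS"

print(grade(my_relu, torch.relu))  # PASS
```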
Run it via Docker, pip (torch-judge), Colab, or Hugging Face; no GPU needed.
TorchCode vs Alternatives
TorchCode sits in the PyTorch coding-interview practice category; here is how it compares with the closest alternatives:
| Feature | TorchCode | LeetCode ML | d2l.ai | fast.ai |
|---|---|---|---|---|
| Focus | PyTorch interview practice | Algorithm problems | DL textbook | DL course |
| Stars | 1.4K ⭐ | N/A (platform) | ~24K ⭐ | ~27K ⭐ |
| Auto-Grading | ✅ Instant | ✅ | ❌ | ❌ |
| Problems | 40+ ML-specific | Limited ML | Exercises | Exercises |
| Attention Variants | 10 types | ❌ | 2-3 | 1-2 |
| RL Losses (DPO/PPO) | ✅ | ❌ | ❌ | ❌ |
| Flash Attention | ✅ | ❌ | ❌ | ❌ |
| MoE, LoRA, ViT | ✅ | ❌ | Partial | ❌ |
| No GPU Needed | ✅ | ✅ | ❌ | ❌ |
| Jupyter-Based | ✅ | ❌ | ✅ | ✅ |
| Self-Hosted | ✅ Docker | ❌ Cloud | ❌ | ❌ |
| Online | ✅ HuggingFace | ✅ | ✅ | ❌ |
When to choose TorchCode: You're preparing for ML coding interviews and need to practice implementing PyTorch primitives from scratch with auto-grading.
When to choose d2l.ai: You want a comprehensive DL textbook with explanations and theory.
When to choose fast.ai: You want a practical DL course with a top-down learning approach.
Conclusion
TorchCode fills a critical gap in ML interview prep. While LeetCode covers algorithms, TorchCode covers what ML interviews actually ask: implement softmax, attention, GPT-2 blocks, LoRA, Flash Attention, DPO/PPO losses — all from scratch, all auto-graded, all in Jupyter. With 40+ problems across 6 categories and zero GPU requirement, it's the most targeted PyTorch interview practice platform.
