NanoChat: The Complete Guide to Karpathy's $100 ChatGPT Training Harness
NanoChat, by Andrej Karpathy, is a minimal end-to-end harness for training LLMs. Train your own GPT-2-capability model for ~$48 (2 hours on 8×H100), then talk to it in a ChatGPT-like web UI. One dial (`--depth`) auto-configures all other hyperparameters. 45,300+ stars, Python, MIT.
What Is NanoChat?
A minimal, hackable LLM training codebase that covers every stage of building a chatbot: tokenization, pretraining, supervised fine-tuning (SFT), reinforcement learning (RL), evaluation, inference, and a chat UI. Training GPT-2 originally cost roughly $43,000 in 2019; NanoChat reaches comparable capability for about $48, or ~$15 on spot instances.
- Language: Python
- License: MIT
- Stars: 45,300+ ⭐
- Forks: 5,996
- Contributors: 47
- Author: Andrej Karpathy (former Director of AI at Tesla, founding team OpenAI)
The $100 ChatGPT
```bash
bash runs/speedrun.sh       # 2 hours on 8×H100 = ~$48
python -m scripts.chat_web  # talk to your model
```
That's it. One script trains a GPT-2-capability model; one command starts the chat UI. The `--depth` parameter (number of transformer layers) is the single complexity dial: all other hyperparameters (width, heads, learning rate, training horizon, weight decay) are derived from it automatically.
Full LLM Pipeline
1. Tokenizer
| File | Purpose |
|---|---|
| tok_train.py | Train BPE tokenizer |
| tok_eval.py | Evaluate compression rate |
| tokenizer.py | GPT-4-style BPE wrapper |
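The core idea behind BPE training, which `tok_train.py` implements at scale, can be sketched in a few lines. This toy version (names and structure are illustrative, not nanochat's actual code) starts from raw bytes and repeatedly merges the most frequent adjacent token pair:

```python
# Toy BPE training loop: repeatedly merge the most frequent adjacent pair.
# Illustrative only; nanochat's tokenizer.py wraps a GPT-4-style BPE.
from collections import Counter

def train_bpe(text: str, num_merges: int):
    tokens = list(text.encode("utf-8"))     # start from raw bytes (ids 0..255)
    merges = {}                             # (a, b) -> new token id
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)    # most frequent adjacent pair
        merges[pair] = next_id
        out, i = [], 0
        while i < len(tokens):              # replace every occurrence of the pair
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens, next_id = out, next_id + 1
    return merges, tokens

merges, toks = train_bpe("aaabdaaabac", 2)
```

Each merge shrinks the token stream, which is exactly the compression that `tok_eval.py` measures.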
2. Pretraining
| File | Purpose |
|---|---|
| base_train.py | Train base model |
| base_eval.py | CORE score, bits per byte, samples |
| dataloader.py | Tokenizing distributed data loader |
| dataset.py | Download/read utils for pretraining data |
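The idea behind a distributed data loader can be sketched simply: each rank strides through the batch stream so that every batch is seen by exactly one process. This is an illustration of the concept, not `dataloader.py`'s actual strategy or API:

```python
# Illustrative rank-strided sharding: rank r of world_size w takes
# batches r, r+w, r+2w, ...  Names here are assumptions, not nanochat's.
def shard_batches(tokens, batch_size, rank, world_size):
    batches = [tokens[i:i + batch_size]
               for i in range(0, len(tokens) - batch_size + 1, batch_size)]
    return batches[rank::world_size]   # disjoint slices per rank

stream = list(range(16))
rank0 = shard_batches(stream, 4, rank=0, world_size=2)
rank1 = shard_batches(stream, 4, rank=1, world_size=2)
```

Because the shards are disjoint, no gradient step ever sees the same batch twice across the 8 GPUs.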
3. Fine-tuning
| File | Purpose |
|---|---|
| chat_sft.py | Supervised fine-tuning |
| chat_rl.py | Reinforcement learning |
4. Evaluation
| Task | Description |
|---|---|
| CORE | Base model evaluation (DCLM paper) |
| ARC | Multiple choice science questions |
| GSM8K | Grade school math word problems |
| MMLU | Multiple choice, broad topics |
| HumanEval | Simple Python coding |
| SpellingBee | Spell/count letters |
| SmolTalk | Conglomerate chat dataset from Hugging Face |
5. Inference & Chat
| File | Purpose |
|---|---|
| engine.py | Efficient inference with KV cache |
| execution.py | LLM Python code execution (tool use) |
| chat_cli.py | CLI chat interface |
| chat_web.py | ChatGPT-like web UI |
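The KV-cache trick behind efficient inference can be illustrated with a toy single-head attention: keys and values for past tokens are cached and appended once per decoding step, so the model never recomputes attention inputs for the whole prefix. Shapes and names below are assumptions for illustration, not `engine.py`'s API:

```python
# Toy single-head scaled dot-product attention with a growing KV cache.
import math

def attend(q, K, V):
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
    m = max(scores)                              # softmax with max-subtraction
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    w = [x / z for x in w]
    return [sum(wi * v[j] for wi, v in zip(w, V)) for j in range(d)]

class KVCache:
    def __init__(self):
        self.K, self.V = [], []
    def step(self, q, k, v):
        self.K.append(k)                         # cache this step's key/value once...
        self.V.append(v)
        return attend(q, self.K, self.V)         # ...then attend over all cached steps

cache = KVCache()
out1 = cache.step([1.0, 0.0], [1.0, 0.0], [2.0, 0.0])
out2 = cache.step([1.0, 0.0], [0.0, 1.0], [0.0, 2.0])
```

Each `step` touches only one new key/value pair, which is why cached decoding is so much cheaper than re-running the full sequence per token.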
The --depth Dial
NanoChat's key innovation: one parameter controls everything.
```
--depth 26 → GPT-2 capability (~$48)
--depth N  → auto-configures width, heads, LR, training horizon, weight decay
```
The miniseries training script (miniseries.sh) produces a series of compute-optimal models at different depths.
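A sketch of how a single depth dial can derive the other dimensions, assuming a fixed width-per-layer aspect ratio and a fixed head size (the constants and the learning-rate rule here are illustrative guesses, not nanochat's exact values):

```python
# Hypothetical single-dial config. All names and constants are
# illustrative assumptions, not nanochat's actual code.
from dataclasses import dataclass

@dataclass
class ModelConfig:
    depth: int      # number of transformer layers (the one dial)
    width: int      # model/embedding dimension
    n_heads: int    # attention heads
    lr: float       # learning rate, shrunk as width grows

def config_from_depth(depth: int, aspect_ratio: int = 64,
                      head_dim: int = 128) -> ModelConfig:
    width = depth * aspect_ratio                 # e.g. depth 26 -> width 1664
    n_heads = max(1, width // head_dim)
    lr = 0.02 * (768 / width) ** 0.5             # illustrative inverse-sqrt-width rule
    return ModelConfig(depth=depth, width=width, n_heads=n_heads, lr=lr)

cfg = config_from_depth(26)
```

Tying every dimension to depth is what makes a "miniseries" possible: sweeping one integer yields a family of models at different compute budgets.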
Guides by Karpathy
- Beating GPT-2 for <<$100 — The nanochat journey (Feb 2026)
- Miniseries v1 — First nanochat model series (Jan 2026)
- Adding abilities — "Counting r in strawberry" guide
- Infusing identity — Customize personality via synthetic data + SFT mixing
- Original nanochat post — Introduction (Oct 2025)
Hardware Flexibility
| Setup | Time | Cost |
|---|---|---|
| 8×H100 | ~2 hours | ~$48 |
| 8×H100 spot | ~2 hours | ~$15 |
| 8×A100 | Slower | Lower hourly rate |
| Single GPU | ~16 hours | Varies |
| CPU/MPS | Very slow | $0 |
Works on any hardware PyTorch supports. On a single GPU, gradient accumulation kicks in automatically to preserve the effective batch size.
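Gradient accumulation can be sketched in plain Python (a toy SGD loop, not nanochat's trainer): gradients from several micro-batches are summed and applied as one averaged update, so a single GPU emulates a larger batch:

```python
# Toy SGD with gradient accumulation over `accum_steps` micro-batches.
# Illustrative only; a real trainer would use PyTorch tensors/optimizers.
def sgd_with_accumulation(param, grad_fn, micro_batches, accum_steps, lr):
    accum = 0.0
    for i, mb in enumerate(micro_batches):
        accum += grad_fn(param, mb)                  # accumulate instead of stepping
        if (i + 1) % accum_steps == 0:
            param -= lr * (accum / accum_steps)      # one update per accum_steps
            accum = 0.0
    return param

# toy quadratic loss (param - mean(batch))^2, so grad = 2 * (param - mean)
grad = lambda p, mb: 2 * (p - sum(mb) / len(mb))
p = sgd_with_accumulation(0.0, grad, [[1.0], [3.0], [1.0], [3.0]], 2, 0.25)
```

With `accum_steps=2`, four micro-batches produce exactly two parameter updates, matching what two large batches would do.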
NanoChat vs Alternatives
Category: This is a minimal LLM training harness.
| Feature | NanoChat | nanoGPT | LitGPT | Axolotl |
|---|---|---|---|---|
| Focus | Full chatbot pipeline | Pretraining only | Full pipeline | Fine-tuning |
| Stars | 45.3K ⭐ | ~37K ⭐ | ~10K ⭐ | ~8K ⭐ |
| Author | Karpathy | Karpathy | Lightning AI | Community |
| Stages | All 6 (tok → pre → SFT → RL → eval → chat) | Pretraining | Pre + SFT + eval | SFT + RL |
| Single Dial | ✅ --depth | ❌ | ❌ | ❌ |
| Chat UI | ✅ Web + CLI | ❌ | ✅ | ❌ |
| Code Execution | ✅ Tool use | ❌ | ❌ | ❌ |
| RL Training | ✅ | ❌ | ❌ | ✅ |
| Cost to GPT-2 | ~$48 | ~$100s | Varies | N/A (fine-tune) |
| Tokenizer Training | ✅ | ❌ | ❌ | ❌ |
| Hackable | ✅ Minimal | ✅ Minimal | Medium | Config-heavy |
When to choose NanoChat: You want the complete chatbot pipeline from tokenizer to chat UI, designed by Karpathy, with one complexity dial.
When to choose nanoGPT: You want pretraining only in the simplest possible code (NanoChat's predecessor).
When to choose LitGPT: You need a production-ready pipeline with more model architectures.
When to choose Axolotl: You want fine-tuning existing models with many adapter options.
Conclusion
NanoChat is Andrej Karpathy's masterclass in LLM training distilled to its essence. One --depth dial, one speedrun script, every stage from tokenizer training through RL to a ChatGPT-like web UI. Train your own GPT-2 for $48 and talk to it. With 45.3K stars, 6K forks, and guides that teach you how to add abilities and infuse personality, it's the most educational and accessible LLM training codebase in existence.
