NanoChat: The Complete Guide to Karpathy's $100 ChatGPT Training Harness
NanoChat, by Andrej Karpathy, is a minimal end-to-end harness for training LLMs. Train your own GPT-2-capability model for ~$48 (2 hours on 8×H100), then talk to it in a ChatGPT-like web UI. One dial (`--depth`) auto-configures all other hyperparameters. 45,300+ stars, Python, MIT.
What Is NanoChat?
A minimal, hackable LLM training codebase that covers every stage of building a chatbot: tokenization, pretraining, supervised fine-tuning (SFT), reinforcement learning (RL), evaluation, inference, and a chat UI. Training GPT-2 originally cost roughly $43,000 in 2019; NanoChat reaches comparable capability for about $48, or ~$15 on spot instances.
- Language: Python
- License: MIT
- Stars: 45,300+ ⭐
- Forks: 5,996
- Contributors: 47
- Author: Andrej Karpathy (former Director of AI at Tesla, founding team OpenAI)
The $100 ChatGPT
```bash
bash runs/speedrun.sh       # 2 hours on 8×H100 = ~$48
python -m scripts.chat_web  # talk to your model
```
That's it. One script trains a GPT-2-capability model; one command starts the chat UI. The `--depth` parameter (number of transformer layers) is the single complexity dial: all other hyperparameters (width, heads, learning rate, training horizon, weight decay) are derived from it automatically.
Full LLM Pipeline
1. Tokenizer
| File | Purpose |
|---|---|
| tok_train.py | Train BPE tokenizer |
| tok_eval.py | Evaluate compression rate |
| tokenizer.py | GPT-4-style BPE wrapper |
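The core idea behind BPE training, which `tok_train.py` implements at scale, can be sketched in a few lines. This toy version (names and structure are illustrative, not nanochat's actual code) starts from raw bytes and repeatedly merges the most frequent adjacent token pair:

```python
# Toy BPE training loop: repeatedly merge the most frequent adjacent pair.
# Illustrative only; nanochat's tokenizer.py wraps a GPT-4-style BPE.
from collections import Counter

def train_bpe(text: str, num_merges: int):
    tokens = list(text.encode("utf-8"))     # start from raw bytes (ids 0..255)
    merges = {}                             # (a, b) -> new token id
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)    # most frequent adjacent pair
        merges[pair] = next_id
        out, i = [], 0
        while i < len(tokens):              # replace every occurrence of the pair
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens, next_id = out, next_id + 1
    return merges, tokens

merges, toks = train_bpe("aaabdaaabac", 2)
```

Each merge shrinks the token stream, which is exactly the compression that `tok_eval.py` measures.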
2. Pretraining
| File | Purpose |
|---|---|
| base_train.py | Train base model |
| base_eval.py | CORE score, bits per byte, samples |
| dataloader.py | Tokenizing distributed data loader |
| dataset.py | Download/read utils for pretraining data |
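The idea behind a distributed data loader can be sketched simply: each rank strides through the batch stream so that every batch is seen by exactly one process. This is an illustration of the concept, not `dataloader.py`'s actual strategy or API:

```python
# Illustrative rank-strided sharding: rank r of world_size w takes
# batches r, r+w, r+2w, ...  Names here are assumptions, not nanochat's.
def shard_batches(tokens, batch_size, rank, world_size):
    batches = [tokens[i:i + batch_size]
               for i in range(0, len(tokens) - batch_size + 1, batch_size)]
    return batches[rank::world_size]   # disjoint slices per rank

stream = list(range(16))
rank0 = shard_batches(stream, 4, rank=0, world_size=2)
rank1 = shard_batches(stream, 4, rank=1, world_size=2)
```

Because the shards are disjoint, no gradient step ever sees the same batch twice across the 8 GPUs.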
3. Fine-tuning
| File | Purpose |
|---|---|
| chat_sft.py | Supervised fine-tuning |
| chat_rl.py | Reinforcement learning |
4. Evaluation
| Task | Description |
|---|---|
| CORE | Base model evaluation (DCLM paper) |
| ARC | Multiple choice science questions |
| GSM8K | Grade school math word problems |
| MMLU | Multiple choice, broad topics |
| HumanEval | Simple Python coding |
| SpellingBee | Spell/count letters |
| SmolTalk | Conglomerate chat dataset from Hugging Face |
5. Inference & Chat
| File | Purpose |
|---|---|
| engine.py | Efficient inference with KV cache |
| execution.py | LLM Python code execution (tool use) |
| chat_cli.py | CLI chat interface |
| chat_web.py | ChatGPT-like web UI |
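The KV-cache trick behind efficient inference can be illustrated with a toy single-head attention: keys and values for past tokens are cached and appended once per decoding step, so the model never recomputes attention inputs for the whole prefix. Shapes and names below are assumptions for illustration, not `engine.py`'s API:

```python
# Toy single-head scaled dot-product attention with a growing KV cache.
import math

def attend(q, K, V):
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
    m = max(scores)                              # softmax with max-subtraction
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    w = [x / z for x in w]
    return [sum(wi * v[j] for wi, v in zip(w, V)) for j in range(d)]

class KVCache:
    def __init__(self):
        self.K, self.V = [], []
    def step(self, q, k, v):
        self.K.append(k)                         # cache this step's key/value once...
        self.V.append(v)
        return attend(q, self.K, self.V)         # ...then attend over all cached steps

cache = KVCache()
out1 = cache.step([1.0, 0.0], [1.0, 0.0], [2.0, 0.0])
out2 = cache.step([1.0, 0.0], [0.0, 1.0], [0.0, 2.0])
```

Each `step` touches only one new key/value pair, which is why cached decoding is so much cheaper than re-running the full sequence per token.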
The --depth Dial
NanoChat's key innovation: one parameter controls everything.
```
--depth 26 → GPT-2 capability (~$48)
--depth N  → auto-configures width, heads, LR, training horizon, weight decay
```
The miniseries training script (miniseries.sh) produces a series of compute-optimal models at different depths.
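A sketch of how a single depth dial can derive the other dimensions, assuming a fixed width-per-layer aspect ratio and a fixed head size (the constants and the learning-rate rule here are illustrative guesses, not nanochat's exact values):

```python
# Hypothetical single-dial config. All names and constants are
# illustrative assumptions, not nanochat's actual code.
from dataclasses import dataclass

@dataclass
class ModelConfig:
    depth: int      # number of transformer layers (the one dial)
    width: int      # model/embedding dimension
    n_heads: int    # attention heads
    lr: float       # learning rate, shrunk as width grows

def config_from_depth(depth: int, aspect_ratio: int = 64,
                      head_dim: int = 128) -> ModelConfig:
    width = depth * aspect_ratio                 # e.g. depth 26 -> width 1664
    n_heads = max(1, width // head_dim)
    lr = 0.02 * (768 / width) ** 0.5             # illustrative inverse-sqrt-width rule
    return ModelConfig(depth=depth, width=width, n_heads=n_heads, lr=lr)

cfg = config_from_depth(26)
```

Tying every dimension to depth is what makes a "miniseries" possible: sweeping one integer yields a family of models at different compute budgets.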
Guides by Karpathy
- Beating GPT-2 for <<$100 — The nanochat journey (Feb 2026)
- Miniseries v1 — First nanochat model series (Jan 2026)
- Adding abilities — "Counting r in strawberry" guide
- Infusing identity — Customize personality via synthetic data + SFT mixing
- Original nanochat post — Introduction (Oct 2025)
Hardware Flexibility
| Setup | Time | Cost |
|---|---|---|
| 8×H100 | ~2 hours | ~$48 |
| 8×H100 spot | ~2 hours | ~$15 |
| 8×A100 | Slower | Lower hourly rate |
| Single GPU | ~16 hours | Varies |
| CPU/MPS | Very slow | $0 |
Works on any hardware PyTorch supports. On a single GPU, gradient accumulation kicks in automatically to preserve the effective batch size.
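Gradient accumulation can be sketched in plain Python (a toy SGD loop, not nanochat's trainer): gradients from several micro-batches are summed and applied as one averaged update, so a single GPU emulates a larger batch:

```python
# Toy SGD with gradient accumulation over `accum_steps` micro-batches.
# Illustrative only; a real trainer would use PyTorch tensors/optimizers.
def sgd_with_accumulation(param, grad_fn, micro_batches, accum_steps, lr):
    accum = 0.0
    for i, mb in enumerate(micro_batches):
        accum += grad_fn(param, mb)                  # accumulate instead of stepping
        if (i + 1) % accum_steps == 0:
            param -= lr * (accum / accum_steps)      # one update per accum_steps
            accum = 0.0
    return param

# toy quadratic loss (param - mean(batch))^2, so grad = 2 * (param - mean)
grad = lambda p, mb: 2 * (p - sum(mb) / len(mb))
p = sgd_with_accumulation(0.0, grad, [[1.0], [3.0], [1.0], [3.0]], 2, 0.25)
```

With `accum_steps=2`, four micro-batches produce exactly two parameter updates, matching what two large batches would do.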
NanoChat vs Alternatives
Category: This is a minimal LLM training harness.
| Feature | NanoChat | nanoGPT | LitGPT | Axolotl |
|---|---|---|---|---|
| Focus | Full chatbot pipeline | Pretraining only | Full pipeline | Fine-tuning |
| Stars | 45.3K ⭐ | ~37K ⭐ | ~10K ⭐ | ~8K ⭐ |
| Author | Karpathy | Karpathy | Lightning AI | Community |
| Stages | All 6 (tok → pre → SFT → RL → eval → chat) | Pretraining | Pre + SFT + eval | SFT + RL |
| Single Dial | ✅ --depth | ❌ | ❌ | ❌ |
| Chat UI | ✅ Web + CLI | ❌ | ✅ | ❌ |
| Code Execution | ✅ Tool use | ❌ | ❌ | ❌ |
| RL Training | ✅ | ❌ | ❌ | ✅ |
| Cost to GPT-2 | ~$48 | ~$100s | Varies | N/A (fine-tune) |
| Tokenizer Training | ✅ | ❌ | ❌ | ❌ |
| Hackable | ✅ Minimal | ✅ Minimal | Medium | Config-heavy |
When to choose NanoChat: You want the complete chatbot pipeline from tokenizer to chat UI, designed by Karpathy, with one complexity dial.
When to choose nanoGPT: You want pretraining only in the simplest possible code (NanoChat's predecessor).
When to choose LitGPT: You need a production-ready pipeline with more model architectures.
When to choose Axolotl: You want fine-tuning existing models with many adapter options.
Conclusion
NanoChat is Andrej Karpathy's masterclass in LLM training distilled to its essence. One --depth dial, one speedrun script, every stage from tokenizer training through RL to a ChatGPT-like web UI. Train your own GPT-2 for $48 and talk to it. With 45.3K stars, 6K forks, and guides that teach you how to add abilities and infuse personality, it's the most educational and accessible LLM training codebase in existence.
