This is a fork of nanoGPT.
Changes:
- loss is reported in nats per byte (instead of nats per token); see the conversion sketch after this list
- the model forward also returns attention scores of shape `(batch_size, num_layers, num_heads, seq_len, seq_len)`, but this only works when flash attention is not used (see the sketch below)
- vocab/tokenization schemes
  - there are now 2 possibilities:
    - `byte` (default; `ord`/`chr` => encode/decode ASCII text)
    - `gpt2` (using `tiktoken.get_encoding('gpt2')`)
  - for generation, the vocab is set automatically to byte if `config.vocab_size == 256`, else to gpt2
  - for training, you need to specify the `vocab` argument (possible values `'byte'`, `'gpt2'`); see the codec-selection sketch below
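
Reporting in nats per byte only changes the denominator: the total cross-entropy in nats stays the same, it is just divided by the number of bytes the evaluated text occupies rather than by the number of tokens (with the byte vocab the two coincide). A minimal conversion sketch; the helper name and arguments are illustrative, not part of the repo:

```python
import math

def nats_per_byte(nats_per_token: float, n_tokens: int, n_bytes: int) -> float:
    # total nats over the chunk, re-normalized by its byte length
    return nats_per_token * n_tokens / n_bytes

# e.g. 3.0 nats/token over 1000 GPT-2 tokens that decode to ~4000 bytes
npb = nats_per_byte(3.0, 1000, 4000)
print(npb)                # 0.75 nats per byte
print(npb / math.log(2))  # ~1.08 bits per byte, if you prefer bits
```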
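
The attention scores are only available without flash attention because flash attention never materializes the full `(seq_len, seq_len)` softmax matrix. The sketch below shows how per-layer scores from a plain attention implementation could be stacked into the shape listed above; it is an illustration of the idea, not the fork's actual forward code:

```python
import torch
import torch.nn.functional as F

def manual_attention(q, k, v):
    # q, k, v: (batch_size, num_heads, seq_len, head_dim)
    seq_len = q.size(-2)
    att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device))
    att = att.masked_fill(~causal, float('-inf'))
    att = F.softmax(att, dim=-1)  # (batch, heads, seq, seq)
    return att @ v, att

# one score matrix per layer, stacked along a new layer dimension
per_layer = [manual_attention(torch.randn(2, 4, 8, 16),
                              torch.randn(2, 4, 8, 16),
                              torch.randn(2, 4, 8, 16))[1] for _ in range(6)]
scores = torch.stack(per_layer, dim=1)
print(scores.shape)  # torch.Size([2, 6, 4, 8, 8]) = (batch, layers, heads, seq, seq)
```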
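
The byte vocab simply maps each character to its code point and back, which round-trips only for ASCII text and is why the vocabulary has 256 entries. A minimal sketch of what encode/decode amount to under that scheme:

```python
def encode(s: str) -> list[int]:
    # one token id per character; assumes ASCII so every id fits in 0..255
    return [ord(c) for c in s]

def decode(ids: list[int]) -> str:
    return ''.join(chr(i) for i in ids)

assert decode(encode("hello world")) == "hello world"
```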
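
For generation the codec can be inferred from the checkpoint: a 256-entry vocabulary can only be the byte scheme, anything else is treated as GPT-2 BPE via tiktoken. A hedged sketch of that selection logic (the `pick_codec` helper is hypothetical):

```python
import tiktoken

def pick_codec(vocab_size: int):
    """Return (encode, decode) callables for a checkpoint's vocab size."""
    if vocab_size == 256:
        # byte vocab: ord/chr round-trip for ASCII text
        return (lambda s: [ord(c) for c in s],
                lambda ids: ''.join(chr(i) for i in ids))
    # otherwise fall back to the GPT-2 BPE tokenizer
    enc = tiktoken.get_encoding('gpt2')
    return enc.encode, enc.decode

encode, decode = pick_codec(50257)
print(decode(encode("hello")))  # hello
```

For training there is no trained checkpoint to infer from, which is why the `vocab` argument has to be passed explicitly.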