
The simplest, fastest repository for training/finetuning medium-sized GPTs.

This is a fork of karpathy/nanoGPT.

Changes:

  • loss is reported in nats per byte (instead of nats per token); see the conversion sketch after this list
  • the model's forward pass also returns attention scores of shape (batch_size, num_layers, num_heads, seq_len, seq_len); this only works when flash attention is disabled (see the sketch after this list)
  • vocab/tokenization schemes
    • there are now two options:
      1. byte (default; ASCII text is encoded/decoded character-by-character via ord/chr)
      2. gpt2 (using tiktoken.get_encoding('gpt2'))
    • for generation, the scheme is selected automatically: byte if config.vocab_size == 256, otherwise gpt2
    • for training, the vocab argument must be specified explicitly (possible values: 'byte', 'gpt2'); see the usage sketch after this list
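Because loss is reported per byte, a per-token cross-entropy from another run can be compared against it by rescaling with the token-to-byte ratio of the evaluated text. A minimal conversion sketch (the nats_per_byte helper is hypothetical and not part of the fork's code):

    import tiktoken

    def nats_per_byte(nats_per_token: float, text: str, vocab: str = "gpt2") -> float:
        """Rescale a per-token loss to a per-byte loss for a given piece of text.

        Hypothetical helper for comparison only. With the byte vocab one token
        is one byte, so the per-token loss is already per byte; with the gpt2
        vocab, multiply by the tokens-per-byte ratio of the text.
        """
        n_bytes = len(text.encode("ascii", errors="replace"))
        if vocab == "byte":
            n_tokens = n_bytes  # byte-level tokenization: one token per byte
        else:
            n_tokens = len(tiktoken.get_encoding("gpt2").encode(text))
        return nats_per_token * n_tokens / n_bytes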
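The attention scores can be taken straight from the forward pass. The sketch below assumes the forward call returns (logits, loss, att) and that the module layout matches upstream nanoGPT (model.GPT, model.GPTConfig); check model.py for the real return signature and for how flash attention gets disabled:

    import torch
    from model import GPT, GPTConfig  # assumed layout, as in upstream nanoGPT

    # Assumed API: a third return value holding the stacked attention scores.
    # This only works on the non-flash attention path.
    config = GPTConfig(block_size=128, vocab_size=256, n_layer=4, n_head=4, n_embd=128)
    model = GPT(config)

    idx = torch.randint(0, 256, (2, 128))        # (batch_size, seq_len)
    logits, loss, att = model(idx, targets=idx)  # att assumed to be the extra output
    print(att.shape)  # expected: (2, 4, 4, 128, 128) = (batch, layers, heads, seq, seq)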
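The two encode/decode schemes behave as below. The get_codec helper is purely illustrative (the fork's actual encode/decode live in its data and generation code), but the byte branch mirrors the ord/chr rule above and the gpt2 branch is plain tiktoken:

    import tiktoken

    def get_codec(vocab: str):
        """Return (encode, decode) callables for 'byte' or 'gpt2'. Illustrative only."""
        if vocab == "byte":
            # byte scheme: each ASCII character maps to its own token id via ord/chr
            return (lambda s: [ord(c) for c in s],
                    lambda ids: "".join(chr(i) for i in ids))
        if vocab == "gpt2":
            enc = tiktoken.get_encoding("gpt2")
            return enc.encode, enc.decode
        raise ValueError(f"unknown vocab: {vocab!r}")

    encode, decode = get_codec("byte")
    assert decode(encode("hello")) == "hello"

    # Generation-time rule described above: pick the scheme from the vocab size.
    # vocab = "byte" if config.vocab_size == 256 else "gpt2"

For training, assuming the usual nanoGPT-style config overrides, the choice would be passed as something like python train.py --vocab=byte; the exact flag spelling is an assumption, the list above only states that the argument is called vocab.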
