
Small Language Model

This is just a small language model that I made for fun. It's not very good, but it's a good way to learn how GPT-like transformer models actually work.

NOTE: Training is not well optimized, and the dataset is rather small, so there is a lot of room for improvement.

Before using standard methods

Training loss over time

After making some trivial changes

  • Added RMSNorm
  • Added gradient clipping
  • Reduced the learning rate to 1/10th of its previous value
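The first two changes can be sketched in plain Python. This is a minimal, framework-free illustration, not the code used in this repository. It assumes that RMSNorm is applied to each feature vector with a learned per-feature gain (`weight`). It also assumes that gradient clipping rescales all gradients by their global L2 norm, which is how PyTorch's `torch.nn.utils.clip_grad_norm_` behaves. The names `rms_norm`, `clip_grad_norm`, and `BASE_LR` are hypothetical. The third change needs no special code: it simply divides whatever base learning rate was used by 10.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Hypothetical RMSNorm sketch for one feature vector.

    Computes x / sqrt(mean(x^2) + eps) * weight. Unlike LayerNorm,
    it does not subtract the mean and has no bias term.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Hypothetical global-norm gradient clipping sketch.

    `grads` is a list of per-parameter gradient lists. If their combined
    L2 norm exceeds `max_norm`, every gradient is scaled in place by the
    same factor. Returns the norm measured before clipping.
    """
    total = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total > max_norm:
        scale = max_norm / (total + eps)
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= scale
    return total


# Hypothetical base learning rate; the README does not state the original value.
BASE_LR = 3e-4
lr = BASE_LR / 10  # "1/10th the learning rate"

if __name__ == "__main__":
    print(rms_norm([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]))
    grads = [[3.0, 4.0], [0.0, 0.0]]  # global L2 norm is 5
    print(clip_grad_norm(grads, max_norm=1.0), grads)
```

Clipping by the global norm, rather than clipping each value independently, preserves the direction of the overall update and only shrinks its size. This is one common way to tame the loss spikes that a too-large learning rate can cause.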

Training loss over time

Training loss over time

NOTE: The differences between this model and GPT-3 were taken from LLaMA and PaLM.

References