This is just a small language model that I made for fun. It's not very good, but it's a good way to learn how GPT-like transformer models actually work.
NOTE: Training is not well optimized and the dataset is rather small, so there is a lot of room for improvement.
Before using standard methods
After making some trivial changes:
- Added RMSNorm (a minimal sketch follows this list)
- Added gradient clipping (also sketched below)
- Cut the learning rate to 1/10th of its original value
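For reference, here is a minimal RMSNorm sketch, assuming the model is written in PyTorch; the class name and `eps` value are illustrative, not necessarily what this repo uses:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm (as used in LLaMA): scales activations
    by their RMS instead of subtracting the mean like LayerNorm."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 1 / sqrt(mean(x^2) + eps), computed over the feature dimension
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.weight
```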
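And a sketch of the gradient-clipping and learning-rate changes inside a toy training step; the model, data, and base learning rate here are stand-in assumptions, not this repo's actual values:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the snippet runs on its own.
model = nn.Linear(16, 16)
x = torch.randn(8, 16)

# e.g. a base LR of 3e-4 cut to 1/10th -> 3e-5
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

loss = model(x).pow(2).mean()
loss.backward()

# Clip the global gradient norm before stepping so one bad batch
# can't produce an exploding update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```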
Training loss over time
NOTE: The differences between this model and GPT-3 were taken from LLaMA and PaLM.