
FractalFormer

This is a project where I create self-similarity at (hopefully) all levels of a decoder-only transformer. The idea is to take everything I learned in my matryoshkaGPT replication project and, instead of having a single series of Russian nesting dolls inside one another, have each "inside" contain multiple similar nesting dolls. Think of how every triangle in Sierpinski's Triangle has three smaller triangles within it (a minimal code sketch of this nesting follows the list below). I think at some point this will allow me to do interesting things such as

  • multiple small models for MoE speculative decoding in parallel, to increase the chances of a match
  • a new, weird version of MoE where all experts exist simultaneously rather than being gated
  • infinite fusion of transformer models of a given size into transformer models of a larger size
  • taking advantage of the fact that language has a fractal structure^1^2 to create an (infinitely? effectively infinitely?) extendable maximum context length, if I can figure out how to properly borrow ideas from my previous next-concept prediction project and/or from Multi-Word Tokenization for Sequence Compression. More on this later
  • specializing a model for use with conversational swarm intelligence
  • I think I can eventually meet the criteria for consciousness as defined in a psychology of consciousness paper
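
To make the nesting concrete, here is a minimal, hypothetical sketch (not code from this repo) of one way a single linear layer could expose sub-models as blocks of its own weight matrix. The class name FractalLinear, the 2x2 block grid, and the level/idx indexing are all illustrative assumptions rather than the actual FractalFormer scheme.

```python
import torch
import torch.nn as nn

class FractalLinear(nn.Module):
    """Hypothetical illustration: one dim x dim weight matrix whose square
    sub-blocks can each be used on their own as the projection of a smaller
    nested model, and so on recursively."""

    def __init__(self, dim: int):
        super().__init__()
        self.dim = dim
        self.weight = nn.Parameter(torch.randn(dim, dim) / dim ** 0.5)

    def forward(self, x: torch.Tensor, level: int = 0, idx: int = 0) -> torch.Tensor:
        # At level L the matrix is viewed as a (2^L x 2^L) grid of blocks and
        # `idx` selects which block acts as the sub-model's weight.
        n = 2 ** level
        d = self.dim // n
        r, c = divmod(idx, n)
        block = self.weight[r * d:(r + 1) * d, c * d:(c + 1) * d]
        return x @ block.T

# Level 0 runs the full layer; level 1 runs one of four nested quarter-size layers.
layer = FractalLinear(dim=64)
full = layer(torch.randn(8, 64))                  # -> (8, 64)
sub = layer(torch.randn(8, 32), level=1, idx=2)   # -> (8, 32)
```

The point of the sketch is just that a sub-model's parameters are literally a slice of the parent's.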

Repo Guide

  • FractalFormer_base.ipynb: currently the only file that is both functional and readable. This is where I recommend you start if you're curious about the project, because it's not only heavily commented but also full of print statements so you can see for yourself what's happening. I do, however, still need to update all the images and give more thorough walkthroughs in the pre-code markdown cells. Check out the video walkthrough I made of this file.

  • FractalFormer_ModelMerging.ipynb: This notebook is at a very early stage; I think all I've done so far is make a dynamically defined hyperparameter config. Basically, I'd like to train multiple separate non-FractalFormer small models, freeze their weights, and then concatenate & merge them into a proper FractalFormer as defined in the previous notebook (a rough sketch of this merge appears after this list). If you'd like to contribute and want more details on the task at hand, let me know; I think this is one I could properly convey to someone who knows what they're doing coding transformers in PyTorch.
  • FractalFormer_UpstreamResidual.ipynb: Further work on this file is on hold until the above is finished. I'm not sure I can fully convey why I'm doing what I'm doing here, since I'm still working largely off of intuition. Basically, in the base version you have to choose which of the models you want to run inference on, and each is capable of running separately; here in UpstreamResidual, whichever model you run inference on, all of its sub-models also run in parallel, and their residual states are concatenated and added to that of the model of interest (see the residual sketch after this list). This is essentially how I create a connection between all the models on the way to my eventual goal of a kind of hive-mind. I might switch this to an additive or multiplicative LoRA later, to filter the information that gets transferred down to a smaller subspace.
  • FractalFormer_DownstreamResidual.ipynb: Like the previous notebook, except that instead of sending the residual streams from the small models up to the big one, I split apart the residual stream of the big model and send the pieces down to the small ones (the reverse direction in the residual sketch below). I think this may be useful for my MoE idea down the line.
  • FractalFormer_InbuiltTokenizer.ipynb: The idea here is to use byte-level tokens and let the model essentially create its own tokens, thereby completely getting rid of the tokenization step in language modeling while still having a similarly emergent framework that allows an effective vocab size larger than 256. I'm messing around with different potential ways to do this over in weird_embeddings.ipynb, but we're a ways off from my having something concrete to explain. For now I just plan on adding & norming the embeddings of bytes together to create a "concept" and then having the higher-level models think in terms of concepts rather than the 256 vocab options they have (a toy sketch of this appears after this list).
  • config.py, tokenizer.py, and FractalFormer_base.py WERE all code copied directly from FractalFormer_base.ipynb so that the classes & functions can be imported into the other files. I say "were" because I've recently changed the Jupyter notebook and need to remember to copy & paste the new code into the .py files.
  • input.txt is just TinyShakespeare. Eventually we'll branch out, but this is fine for testing on my MacBook.
  • tokenizers/tokenizer.model is a very simple tokenizer that takes the 65 unique characters in TinyShakespeare and turns them into 128 tokens. It was originally made for the base repo that I build all my models off of. It might eventually get deleted if the concept-byte idea works, or it might hang around for even longer.
  • models/ contains all of the models I've trained so far, which as of right now consists of 3 checkpoints of a roughly 1M-parameter model from FractalFormer_base.ipynb. I don't think I'll be going past 1 million parameters for the foreseeable future, really not until I nail down an architecture I like well enough to start doing legit testing.
  • images/ is where I put drawings that help demonstrate what's happening in the code. One of these days I'll make a far easier guide.
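
For the FractalFormer_ModelMerging.ipynb idea above, here is a rough sketch of what "concatenate & merge" could look like at the level of a single linear layer: four frozen small-model weight matrices tiled into one parent matrix. The function name and the single-layer scope are my own simplifications and assumptions, not the notebook's actual code.

```python
import torch
import torch.nn as nn

def merge_linear_weights(small_weights: list[torch.Tensor]) -> torch.Tensor:
    """Hypothetical merge step: tile four frozen (d x d) weight matrices from
    independently trained small models into one (2d x 2d) parent matrix, so
    each small model survives as a block of the merged layer."""
    assert len(small_weights) == 4, "this toy version assumes a 2x2 block grid"
    top = torch.cat([small_weights[0], small_weights[1]], dim=1)
    bottom = torch.cat([small_weights[2], small_weights[3]], dim=1)
    return torch.cat([top, bottom], dim=0)

# Four frozen toy "small models" (bare linear layers here), merged into one parent.
smalls = [nn.Linear(32, 32) for _ in range(4)]
for m in smalls:
    m.weight.requires_grad_(False)
merged = merge_linear_weights([m.weight.detach() for m in smalls])
print(merged.shape)  # torch.Size([64, 64])
```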
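
Similarly, for the FractalFormer_UpstreamResidual.ipynb and FractalFormer_DownstreamResidual.ipynb ideas, here is a toy sketch of both directions of residual traffic, assuming the sub-models' residual widths exactly tile the parent's width; the function names and shapes are illustrative guesses, not the notebooks' code.

```python
import torch

def upstream_residual(parent_resid: torch.Tensor,
                      sub_resids: list[torch.Tensor]) -> torch.Tensor:
    """UpstreamResidual sketch: concatenate the sub-models' residual states back
    up to the parent's width and add them into the parent's residual stream."""
    stacked = torch.cat(sub_resids, dim=-1)
    assert stacked.shape == parent_resid.shape
    return parent_resid + stacked

def downstream_residual(parent_resid: torch.Tensor,
                        sub_resids: list[torch.Tensor]) -> list[torch.Tensor]:
    """DownstreamResidual sketch: the reverse flow, splitting the parent's stream
    and adding each chunk to the corresponding sub-model's residual."""
    chunks = torch.chunk(parent_resid, chunks=len(sub_resids), dim=-1)
    return [s + c for s, c in zip(sub_resids, chunks)]

# Toy shapes: a width-64 parent model with two width-32 sub-models.
parent = torch.randn(2, 16, 64)
subs = [torch.randn(2, 16, 32) for _ in range(2)]
print(upstream_residual(parent, subs).shape)        # torch.Size([2, 16, 64])
print(downstream_residual(parent, subs)[0].shape)   # torch.Size([2, 16, 32])
```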

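Finally, for the FractalFormer_InbuiltTokenizer.ipynb idea, here is a toy sketch of the add-and-norm "concept" embedding, assuming fixed-size groups of bytes per concept; the class name and the grouping scheme are placeholders for whatever weird_embeddings.ipynb ends up settling on.

```python
import torch
import torch.nn as nn

class ByteConceptEmbedding(nn.Module):
    """Hypothetical sketch: embed raw bytes (vocab 256), then sum and normalize
    each group of byte embeddings into a single "concept" vector that the
    larger models operate on instead of individual byte tokens."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(256, dim)  # one embedding per possible byte value
        self.norm = nn.LayerNorm(dim)

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        # byte_ids: (batch, n_concepts, bytes_per_concept) integer byte values
        vecs = self.embed(byte_ids)          # (batch, n_concepts, bytes_per_concept, dim)
        concept = vecs.sum(dim=-2)           # add the byte embeddings in each group...
        return self.norm(concept)            # ...and norm the result into a "concept"

# Group the 12 bytes of "hello world!" into three 4-byte "concepts".
ids = torch.tensor(list(b"hello world!")).view(1, 3, 4)
print(ByteConceptEmbedding()(ids).shape)  # torch.Size([1, 3, 64])
```
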
Relevant inspiration papers that weren't already cited:
