
Fast GPT-2-style byte-level BPE learner and tokenizer


leo-du/bytebpe


bytebpe

A fast, GPT-2-style byte-level BPE learner and tokenizer.

Python bindings are implemented with pybind11.
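For context, GPT-2's reference encoder makes BPE "byte-level" by first mapping each of the 256 byte values to a printable Unicode character. This ensures merges never operate on raw control bytes or bare spaces. The sketch below reproduces that mapping under the name GPT-2's `encoder.py` uses (`bytes_to_unicode`). Whether bytebpe uses this exact table internally is an assumption, since this README does not say.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Map every byte 0-255 to a printable Unicode character (GPT-2 style)."""
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


byte_encoder = bytes_to_unicode()
byte_decoder = {c: b for b, c in byte_encoder.items()}

text = "hello world"
visible = "".join(byte_encoder[b] for b in text.encode("utf-8"))
print(visible)  # helloĠworld
# The mapping is a bijection, so the original bytes are always recoverable.
assert bytes(byte_decoder[c] for c in visible).decode("utf-8") == text
```

The space byte (32) becomes `Ġ` (U+0120). This is why GPT-2 vocabularies contain tokens such as `Ġworld`.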

Build

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j

Then, from the build directory, start a Python REPL and import the module:

>>> import bytebpe
>>> bpe = bytebpe.ByteBPE()
>>> bpe.[tab]
bpe.decode(          bpe.encode_token(    bpe.load_from_file(  
bpe.encode_line(     bpe.learn(           bpe.save_to_file(
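The signatures of `learn`, `encode_token`, `encode_line` and `decode` are not documented here, so they are not shown. As a hedged illustration of what a byte-level BPE learner and tokenizer compute, the following pure-Python reference model does three things: it learns merges over UTF-8 bytes, applies them to a word, and decodes the result. The names `learn_merges`, `encode_word`, `merge_pair` and `decode` are hypothetical and are not the bytebpe API. The sketch also assumes words are pre-split on whitespace, whereas GPT-2 pre-tokenizes with a regular expression.

```python
from collections import Counter

# Hypothetical reference model of byte-level BPE; NOT the bytebpe API.


def merge_pair(symbols: tuple[bytes, ...], pair: tuple[bytes, bytes]) -> tuple[bytes, ...]:
    """Replace non-overlapping occurrences of `pair`, left to right, with one symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(words: list[str], num_merges: int) -> list[tuple[bytes, bytes]]:
    """Learn up to num_merges merges, starting from single bytes."""
    # Each distinct word is a tuple of one-byte symbols, weighted by its frequency.
    vocab = Counter(tuple(bytes([b]) for b in w.encode("utf-8")) for w in words)
    merges: list[tuple[bytes, bytes]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the weighted vocabulary.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by comparing the pairs themselves.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode_word(word: str, merges: list[tuple[bytes, bytes]]) -> list[bytes]:
    """Apply merges to one word, always merging the earliest-learned pair first."""
    ranks = {pair: r for r, pair in enumerate(merges)}
    symbols = tuple(bytes([b]) for b in word.encode("utf-8"))
    while len(symbols) > 1:
        candidates = [p for p in zip(symbols, symbols[1:]) if p in ranks]
        if not candidates:
            break
        symbols = merge_pair(symbols, min(candidates, key=ranks.__getitem__))
    return list(symbols)


def decode(tokens: list[bytes]) -> str:
    """Byte-level tokens are raw bytes, so decoding is concatenation."""
    return b"".join(tokens).decode("utf-8", errors="replace")


words = "low low low lower lowest newer newer wider".split()
merges = learn_merges(words, num_merges=5)
print(merges[:2])  # [(b'o', b'w'), (b'l', b'ow')]
print(encode_word("lowest", merges))  # [b'low', b'e', b's', b't']
assert decode(encode_word("naïve", merges)) == "naïve"
```

This naive learner recounts every pair on each round. Performance-oriented implementations typically update pair counts incrementally after each merge instead. Because the base alphabet is all 256 bytes, any UTF-8 input (here `naïve`) round-trips without an unknown-token fallback.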
