
openai · hauntsaninja · Jan 30, 2024 · Oct 6, 2023 · Jan 30, 2024 · Jan 30, 2024
commit ede70dcc697ee7ad5c3061a43a8843db5f45d162
diff --git a/README.md b/README.md
@@ -22,6 +22,7 @@ The tokeniser API is documented in `tiktoken/core.py`.
 Example code using `tiktoken` can be found in the
 [OpenAI Cookbook](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb).
 
+
 ## Performance
 
 `tiktoken` is between 3-6x faster than a comparable open source tokeniser:
@@ -31,6 +32,7 @@ Example code using `tiktoken` can be found in the
 Performance measured on 1GB of text using the GPT-2 tokeniser, using `GPT2TokenizerFast` from
 `tokenizers==0.13.2`, `transformers==4.24.0` and `tiktoken==0.2.0`.
 
+
 ## Getting help
 
 Please post questions in the [issue tracker](https://github.com/openai/tiktoken/issues).
@@ -40,7 +42,7 @@ If you work at OpenAI, make sure to check the internal documentation or feel fre
 
 ## What is BPE anyway?
 
-Large language models don't see text like you and I, instead they see a sequence of numbers (known as tokens).
+Language models don't see text like you and me; instead, they see a sequence of numbers (known as tokens).
 Byte pair encoding (BPE) is a way of converting text into tokens. It has a couple desirable
 properties:
 1) It's reversible and lossless, so you can convert tokens back into the original text
@@ -65,10 +67,12 @@ enc = SimpleBytePairEncoding.from_tiktoken("cl100k_base")
 enc.encode("hello world aaaaaaaaaaaa")
 ```
 
+
 ## Extending tiktoken
 
 You may wish to extend `tiktoken` to support new encodings. There are two ways to do this.
 
+
 **Create your `Encoding` object exactly the way you want and simply pass it around.**
 
 ```python
@@ -124,4 +128,3 @@ setup(
 
 Then simply `pip install ./my_tiktoken_extension` and you should be able to use your
 custom encodings! Make sure **not** to use an editable install.
-
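The "What is BPE anyway?" section touched by this diff says BPE is reversible and lossless. The toy byte-level BPE below illustrates that property. It is a minimal sketch under stated assumptions. It is not tiktoken's implementation and not the `SimpleBytePairEncoding` class referenced in the hunk header. The helpers `train_bpe`, `encode` and `decode` are hypothetical names, and the regex pre-splitting that real encodings perform is deliberately skipped.

```python
from __future__ import annotations

from collections import Counter


def _merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int) -> dict[tuple[int, int], int]:
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`.

    Ids 0-255 are raw bytes; each learned merge gets the next free id.
    """
    ids = list(text.encode("utf-8"))
    merges: dict[tuple[int, int], int] = {}
    for new_id in range(256, 256 + num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges[best] = new_id
        ids = _merge(ids, best, new_id)
    return merges


def encode(text: str, merges: dict[tuple[int, int], int]) -> list[int]:
    ids = list(text.encode("utf-8"))
    # Apply merges in the order they were learned (lowest id first).
    for pair, new_id in sorted(merges.items(), key=lambda kv: kv[1]):
        ids = _merge(ids, pair, new_id)
    return ids


def decode(ids: list[int], merges: dict[tuple[int, int], int]) -> str:
    # Every token expands back to the exact bytes it was built from.
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in sorted(merges.items(), key=lambda kv: kv[1]):
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


text = "hello world aaaaaaaaaaaa"
merges = train_bpe(text, num_merges=3)
tokens = encode(text, merges)
print(len(text.encode("utf-8")), "bytes ->", len(tokens), "tokens")
print(decode(tokens, merges) == text)
```

Because the 256 single-byte tokens are always in the vocabulary, any string (including text never seen during training) can be encoded, and decoding simply concatenates bytes, so the round trip is exact.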