Add Terminal-Based Visualization Tool for Tokenized Data Points in Tiktoken Tokenizer #314

Open. LVivona wants to merge 12 commits into base: main

@LVivona LVivona commented Jun 18, 2024

Key Features

  • Token Visualization: Displays tokens and their positions in the input text.
  • Interactive Interface: Allows users to input text and see the tokenized output in real-time.
  • Enhanced Debugging: Facilitates better understanding of tokenizer behavior and helps in identifying issues.

Benefits

  • Improved Usability: Provides a straightforward way to visualize and analyze tokenized data.
  • Immediate Feedback: Users can quickly see the effects of changes to the tokenizer.
  • Educational: Helps users grasp model information and the tokenization process more effectively.

import tiktoken

enc = tiktoken.get_encoding("gpt2")
enc.environment()  # environment() is the method introduced by this PR

[Image: Screenshot 2024-06-18 at 12 17 01 PM]
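
For readers without the branch checked out, the token-and-position display described under Key Features can be approximated with tiktoken's existing public API. This is a minimal sketch, not the code in this PR: the helper names `token_spans`, `render_tokens`, and `visualize` are hypothetical, the color palette is an arbitrary choice, and positions are reported as UTF-8 byte offsets because a single token may cover only part of a multi-byte character.

```python
from itertools import cycle

# ANSI background colors used to tell adjacent tokens apart (arbitrary palette).
_COLORS = ["\033[41m", "\033[42m", "\033[44m", "\033[45m", "\033[46m"]
_RESET = "\033[0m"


def token_spans(pieces):
    """Return (start, end, piece) byte offsets for consecutive token byte strings."""
    spans, pos = [], 0
    for piece in pieces:
        spans.append((pos, pos + len(piece), piece))
        pos += len(piece)
    return spans


def render_tokens(pieces, color=True):
    """Render tokens on one line, colored or annotated with byte offsets."""
    out = []
    for (start, end, piece), bg in zip(token_spans(pieces), cycle(_COLORS)):
        # Tokens can split multi-byte characters, so decode leniently.
        shown = piece.decode("utf-8", errors="replace")
        out.append(f"{bg}{shown}{_RESET}" if color else f"[{start}:{end}]{shown!r}")
    return "".join(out) if color else " ".join(out)


def visualize(text, encoding_name="gpt2"):
    """Hypothetical helper: tokenize with tiktoken and return (ids, rendering)."""
    import tiktoken  # imported lazily so the helpers above work without it

    enc = tiktoken.get_encoding(encoding_name)
    tokens = enc.encode(text)
    pieces = [enc.decode_single_token_bytes(t) for t in tokens]
    return tokens, render_tokens(pieces)
```

With tiktoken installed, `ids, line = visualize("hello world"); print(line)` prints the input with each token highlighted in a different background color. Passing `color=False` to `render_tokens` gives a plain rendering with explicit byte offsets, which is easier to paste into an issue or a log.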
