Fix incorrect segmentation for non-ASCII characters #386

Merged
merged 5 commits into from
May 2, 2023

Conversation

ItsLuized
Contributor

@ItsLuized commented May 2, 2023

Recreates @robertkiel's PR.

Changes the segmentation code to respect UTF-8 character boundaries.

Current situation

splitUtf8Strings corrupts the string whenever a non-ASCII (multi-byte) character falls on the maxBytes boundary: the character's bytes are split across two chunks, so joining the chunks yields a different string from the original.

New behavior

Respect characters whose UTF-8 encoding is longer than one byte, and split the string only at character boundaries.
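
As an illustration of the boundary-respecting split described above, here is a minimal sketch. It is not the code merged in this PR: the function name `splitUtf8AtBoundaries`, its signature, and the requirement that `maxBytes` be at least 4 are all assumptions made for the example. It relies on the fact that UTF-8 continuation bytes always have the bit pattern `10xxxxxx`, so a cut point that lands on one can be moved back to the start of the character.

```typescript
// Hypothetical sketch, not the project's actual splitUtf8Strings implementation.
function splitUtf8AtBoundaries(input: string, maxBytes: number): string[] {
  // A single UTF-8 code point takes up to 4 bytes, so smaller limits
  // could make it impossible to place a character in any chunk.
  if (maxBytes < 4) {
    throw new RangeError("maxBytes must be at least 4");
  }

  // Note: TextEncoder replaces lone surrogates with U+FFFD, so the
  // round trip is only exact for well-formed strings.
  const bytes = new TextEncoder().encode(input);
  // ignoreBOM: true keeps a leading U+FEFF in a chunk instead of dropping it.
  const decoder = new TextDecoder("utf-8", { ignoreBOM: true });

  const chunks: string[] = [];
  let start = 0;
  while (start < bytes.length) {
    let end = Math.min(start + maxBytes, bytes.length);
    // If the byte at the cut point is a continuation byte (10xxxxxx),
    // the cut is inside a character: back up to that character's first byte.
    while (end < bytes.length && (bytes[end] & 0xc0) === 0x80) {
      end--;
    }
    chunks.push(decoder.decode(bytes.subarray(start, end)));
    start = end;
  }
  return chunks;
}
```

For example, `splitUtf8AtBoundaries("aaaé", 4)` returns `["aaa", "é"]`: the two bytes of `é` would straddle the 4-byte limit, so the first chunk ends one byte early and the chunks still join back to the original string.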

@ItsLuized added the bug and enhancement labels May 2, 2023
@ItsLuized requested review from nionis and a team May 2, 2023 14:59
@nionis self-assigned this May 2, 2023
@nionis mentioned this pull request May 2, 2023
38 tasks
@diegoalzate merged commit 6c5f475 into main May 2, 2023
@diegoalzate deleted the robertkiel/fix-utf8-representation branch May 2, 2023 15:44
nionis added a commit that referenced this pull request Jun 9, 2023
nionis added a commit that referenced this pull request Jun 9, 2023
nionis pushed a commit that referenced this pull request Jun 12, 2023