hildjj/wrap-segments

wrap-segments

Wrap lines at Unicode word boundaries, using Intl.Segmenter.

Existing wrapping libraries tend to work very well on plain 7-bit ASCII text. However, much of the world's text falls outside ASCII, and it needs to be wrapped too.

You might want to turn this:

f̵̩̣̺ö̶̧̧̢o̶̥̩̗̹ ̶̨̢͔̳b̷̧̥͍̥a̷̛̦͓̜r̴̡͕̳̪

into this:

f̵̩̣̺ö̶̧̧̢o̶̥̩̗̹ ̶̨̢͔̳
b̷̧̥͍̥a̷̛̦͓̜r̴̡͕̳̪

by wrapping every 4 grapheme clusters.
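To make "grapheme clusters" concrete, here is a minimal sketch that uses only the built-in Intl.Segmenter, not this library. The helper names `graphemes` and `chunkGraphemes` are hypothetical and exist only for illustration; the sample string uses a single combining mark per letter rather than the heavily decorated text above.

```javascript
// Count and chunk user-perceived characters with the built-in Intl.Segmenter.
const segmenter = new Intl.Segmenter('en', {granularity: 'grapheme'})

// Hypothetical helper: split a string into its grapheme clusters.
function graphemes(text) {
  return Array.from(segmenter.segment(text), s => s.segment)
}

// Hypothetical helper: start a new line after every `width` grapheme clusters.
function chunkGraphemes(text, width) {
  const clusters = graphemes(text)
  const lines = []
  for (let i = 0; i < clusters.length; i += width) {
    lines.push(clusters.slice(i, i + width).join(''))
  }
  return lines
}

// Six letters, each followed by one combining mark, plus a space.
const sample = 'e\u0301a\u0308o\u0302 u\u0303n\u0303y\u0301'

console.log(sample.length) // 13 UTF-16 code units
console.log(graphemes(sample).length) // 7 grapheme clusters
console.log(chunkGraphemes(sample, 4)) // two lines, split after the space
```

Counting code units would put the break in the middle of a letter and its combining mark; counting grapheme clusters keeps each decorated letter intact.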

Installation

npm install wrap-segments

API

None of the options are required, and you can omit the options object entirely to accept all of the defaults. The example below shows the default options:

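import {SegmentWrapper} from 'wrap-segments'

// identityTransform and DEFAULT_LOCALE are not defined in this snippet;
// they stand in for the library's built-in defaults.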

const w = new SegmentWrapper({
  escape: identityTransform, // Escape inputs before processing
  indent: '', // Can be a string or number
  indentChar: ' ', // If indent is a number, repeat this that many times
  indentEmpty: false, // If the input is empty, still indent?
  indentFirst: true, // Indent the first line?
  isEmpty: /^\s*$/u, // Is a given text segment empty?  Only applies to non-wordLike segments.
  isNewline: /((?![\r\n\v\f\x85\u2028\u2029])\s)*[\r\n\v\f\x85\u2028\u2029]+(\s*)/gu, // Replace newlines matching this with newlineReplacement
  locale: DEFAULT_LOCALE, // Default is calculated by the JS runtime
  newline: '\n', // Insert this at the end of every line
  newlineReplacement: ' ', // What to replace isNewline with
  trim: true, // Trim whitespace from the end of the input
  width: 80, // In grapheme clusters, *including* indent
})

const wrapped = w.wrap('Lorem Ipsum...')
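
The effect of the default isNewline and newlineReplacement options can be tried out in isolation. The sketch below assumes the library applies them roughly like a plain String.prototype.replace call; that is an assumption, and `collapseNewlines` is a hypothetical helper, not part of the API.

```javascript
// The default isNewline pattern, copied from the options above.
const isNewline = /((?![\r\n\v\f\x85\u2028\u2029])\s)*[\r\n\v\f\x85\u2028\u2029]+(\s*)/gu
const newlineReplacement = ' '

// Hypothetical helper; the library's internals may differ.
function collapseNewlines(text) {
  return text.replace(isNewline, newlineReplacement)
}

// Whitespace around a line break is swallowed along with the break itself.
console.log(JSON.stringify(collapseNewlines('foo  \n  bar'))) // "foo bar"

// A run of consecutive line breaks collapses to a single replacement.
console.log(JSON.stringify(collapseNewlines('a\r\n\r\nb'))) // "a b"
```

The pattern matches any whitespace before the line break, one or more line-break characters, and any whitespace after, so the pre-wrapped input is flattened before new breaks are chosen.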

Generated API documentation is available.

Command line

A CLI is available as a separate package.

Caveats

  • This hasn't been tested with enough languages. Please submit an issue or PR if you speak Korean, a language that uses the Devanagari script, a language that uses a right-to-left script such as Arabic or Hebrew, etc.
  • This does not implement the full line breaking algorithm from Unicode TR14. I'm hoping that the Intl.Segmenter word boundaries are "close enough" for most cases. It's hard to get access to all of the needed properties from the JS runtime without including version-specific Unicode data, which I don't want to do. However, there are some rules in that algorithm that would be worth adding, with some careful thought.
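
To illustrate what breaking at Intl.Segmenter word boundaries means, here is a deliberately naive greedy wrapper built only on the built-in API. It is a sketch, not this library's implementation: `naiveWrap` and `width` are hypothetical names, the locale is fixed to 'en', and it ignores the TR14 rules, so punctuation and over-long words can overflow the limit.

```javascript
const graphemeSegmenter = new Intl.Segmenter('en', {granularity: 'grapheme'})
const wordSegmenter = new Intl.Segmenter('en', {granularity: 'word'})

// Width of a string, measured in grapheme clusters.
function width(text) {
  let n = 0
  for (const _ of graphemeSegmenter.segment(text)) n++
  return n
}

// Hypothetical helper: greedy wrap that only breaks before word-like segments.
function naiveWrap(text, maxWidth) {
  const lines = []
  let line = ''
  for (const {segment, isWordLike} of wordSegmenter.segment(text)) {
    // Start a new line when the next word would not fit on the current one.
    if (isWordLike && line !== '' && width(line) + width(segment) > maxWidth) {
      lines.push(line.trimEnd())
      line = ''
    }
    line += segment
  }
  if (line.trimEnd() !== '') {
    lines.push(line.trimEnd())
  }
  return lines
}

console.log(naiveWrap('The quick brown fox', 10)) // [ 'The quick', 'brown fox' ]
```

Because the word segmenter is locale-aware, the same loop finds break opportunities in text that has no spaces between words, although where it breaks depends on the runtime's segmentation data.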
