parse-entities

Parse HTML character references.

What is this?

This is a small and powerful decoder of HTML character references (often called entities).

When should I use this?

You can use this for spec-compliant decoding of character references. It’s small and fast enough to do that well. You can also use this when making a linter, because it emits warnings that each carry a reason and positional info on where they occurred.

Install

This package is ESM only. In Node.js (version 14.14+, 16.0+), install with npm:

npm install parse-entities

In Deno with esm.sh:

import {parseEntities} from 'https://esm.sh/parse-entities@3'

In browsers with esm.sh:

<script type="module">
  import {parseEntities} from 'https://esm.sh/parse-entities@3?bundle'
</script>

Use

import {parseEntities} from 'parse-entities'

console.log(parseEntities('alpha &amp bravo'))
// => alpha & bravo

console.log(parseEntities('charlie &copycat; delta'))
// => charlie ©cat; delta

console.log(parseEntities('echo &copy; foxtrot &#8800; golf &#x1D306; hotel'))
// => echo © foxtrot ≠ golf 𝌆 hotel

API

This package exports the identifier parseEntities. There is no default export.

`parseEntities(value[, options])`

Parse HTML character references.

`options`

Configuration (optional).

`options.additional`

Additional character to accept (string?, default: ''). When this character follows an ampersand, it is allowed without a warning.

`options.attribute`

Whether to parse value as an attribute value (boolean?, default: false). This results in slightly different behavior.

`options.nonTerminated`

Whether to allow nonterminated references (boolean, default: true). For example, &copycat for ©cat. This behavior is compliant with the spec but can lead to unexpected results.

`options.position`

Starting position of value (Position or Point, optional). Useful when dealing with values nested in some sort of syntax tree. The default is:

{line: 1, column: 1, offset: 0}

`options.warning`

Error handler (Function?).

`options.text`

Text handler (Function?).

`options.reference`

Reference handler (Function?).

`options.warningContext`

Context used when calling warning ('*', optional).

`options.textContext`

Context used when calling text ('*', optional).

`options.referenceContext`

Context used when calling reference ('*', optional).
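
As a rough illustration of the context options, the sketch below shows a warning handler that stores its messages on `this`. The `messages` property and the message format are assumptions made for this example and are not part of the package. The call to parseEntities is left as a comment so the snippet runs on its own.

```javascript
// Sketch only: a warning handler that reads its state from `this`.
// It has to be a regular function, because arrow functions do not
// receive their own `this`.
function warning(reason, point, code) {
  this.messages.push(`${point.line}:${point.column}: ${reason} (code ${code})`)
}

// Hypothetical context object; its shape is an assumption for this example.
const warningContext = {messages: []}

// Intended wiring (assumes `parseEntities` was imported from 'parse-entities'):
// parseEntities('foo &amp bar', {warning, warningContext})
// console.log(warningContext.messages)
```

The same pattern should apply to `text` with `textContext` and to `reference` with `referenceContext`.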

Returns

string — decoded value.

`function warning(reason, point, code)`

Error handler.

Parameters

this (*) — refers to warningContext when given to parseEntities
reason (string) — human readable reason for emitting a parse error
point (Point) — place where the error occurred
code (number) — machine readable code for the error

The following codes are used:

Code	Example	Note
`1`	`foo &amp bar`	Missing semicolon (named)
`2`	`foo &#123 bar`	Missing semicolon (numeric)
`3`	`Foo &bar baz`	Empty (named)
`4`	`Foo &#`	Empty (numeric)
`5`	`Foo &bar; baz`	Unknown (named)
`6`	`Foo &#128; baz`	Disallowed reference
`7`	`Foo &#xD800; baz`	Prohibited: outside permissible unicode range
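
For a linter, the codes listed above can be mapped to labels. The following is a minimal sketch under assumptions: `warningLabels`, `createLintCollector`, and the shape of the collected messages are hypothetical names invented here, and only the `warning(reason, point, code)` signature comes from this document.

```javascript
// Labels copied from the notes in the list of codes above.
const warningLabels = {
  1: 'Missing semicolon (named)',
  2: 'Missing semicolon (numeric)',
  3: 'Empty (named)',
  4: 'Empty (numeric)',
  5: 'Unknown (named)',
  6: 'Disallowed reference',
  7: 'Prohibited: outside permissible unicode range'
}

// Hypothetical helper: returns a `warning` handler plus the array it fills.
function createLintCollector() {
  const messages = []

  function warning(reason, point, code) {
    messages.push({
      label: warningLabels[code] || 'Unknown code',
      reason,
      line: point.line,
      column: point.column,
      offset: point.offset
    })
  }

  return {warning, messages}
}

// Intended wiring (assumes `parseEntities` was imported from 'parse-entities'):
// const collector = createLintCollector()
// parseEntities('foo &amp bar', {warning: collector.warning})
// console.log(collector.messages)
```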

`function text(value, position)`

Text handler.

Parameters

this (*) — refers to textContext when given to parseEntities
value (string) — string of content
position (Position) — place where value starts and ends

`function reference(value, position, source)`

Character reference handler.

Parameters

this (*) — refers to referenceContext when given to parseEntities
value (string) — decoded character reference
position (Position) — place where source starts and ends
source (string) — raw source of character reference
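
The `text` and `reference` handlers can be combined to split a value into segments. The sketch below is one possible approach, not the package's own: `createSegmentCollector` and the segment shape are hypothetical, and it assumes that a Position has `start` and `end` points which each carry an `offset`, like the default point shown for `options.position`.

```javascript
// Hypothetical helper: records text runs and decoded references.
function createSegmentCollector() {
  const segments = []

  // Matches the `text(value, position)` signature.
  function text(value, position) {
    segments.push({
      type: 'text',
      value,
      start: position.start.offset,
      end: position.end.offset
    })
  }

  // Matches the `reference(value, position, source)` signature.
  function reference(value, position, source) {
    segments.push({
      type: 'reference',
      value,
      source,
      start: position.start.offset,
      end: position.end.offset
    })
  }

  return {text, reference, segments}
}

// Intended wiring (assumes `parseEntities` was imported from 'parse-entities'):
// const collector = createSegmentCollector()
// parseEntities('echo &copy; foxtrot', {
//   text: collector.text,
//   reference: collector.reference
// })
// console.log(collector.segments)
```

Keeping `source` next to the decoded `value` makes it possible to report or restore the original spelling of a reference later.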

Types

This package is fully typed with TypeScript. It exports the additional types Options, WarningHandler, ReferenceHandler, and TextHandler.

Compatibility

This package is at least compatible with all maintained versions of Node.js. As of now, that is Node.js 14.14+ and 16.0+. It also works in Deno and modern browsers.

Security

This package is safe: it matches the HTML spec to parse character references.

Related

wooorm/stringify-entities — encode HTML character references
wooorm/character-entities — info on character references
wooorm/character-entities-html4 — info on HTML4 character references
wooorm/character-entities-legacy — info on legacy character references
wooorm/character-reference-invalid — info on invalid numeric character references

Contribute

Yes please! See How to Contribute to Open Source.

License
