Parsey
Parsey is a parser for context-free grammars. It utilizes earley parsing, a top-down chart parsing algorithm that can parse a large set of grammars.
Parsey takes a list of rules (which can be defined using the Rule
constructor)
and some input sentence, then constructs a parse tree for the given language.
The many applications of this include abstract syntax tree creation, NLP, and
arithmetic, to name a few.
Installation
Install from npm:
npm install parsey --save
Usage
const Rule = Rule;const Sym = Sym;const parse = parse; const sum = 'sum';const prod = 'prod';const factor = 'factor'; // Rule( [lhs : Sym], [rhs : Array], [valuator : Function] ) let grammar = // sum sum sum '+' prod x + y sum prod x // product prod prod '*' factor x * y prod factor x // factor factor '(' sum ')' x factor /\d+/ ; let expr = '2 + 3 * (1 + 4)';let parseTree = ; console;// => 17
Rules and Sym(bol)s
A Sym is nothing more than an object that references a specific symbol for use
in grammar rules, with a name
attribute that describes the symbol (names need
not be unique).
const Sym = Sym;let sum = 'sum';let prod = 'prod';
Rules describe the structure of a language. They contain a left-hand side consisting of a sole symbol, a right-hand side consisting of a list of symbols, and an optional evaluation function.
const Rule = Rule;// lhs rhs valuatorlet sumRule = sum sum '+' prod x + y;
This rule states that a sum can be any sum
, followed by a +
, followed by
any prod
. The rhs can be populated with Sym
objects, strings, or RegExp
objects.
This rule's valuator is a function that takes 3 arguments (one for each symbol on the rhs), and adds the first and last (we don't care about the '+').
The Rule class is extended from Array, so the rhs symbols correspond to a Rule's indices and all of the methods on Array.prototype will apply to the rhs.
The CFG
The CFG is a glorified array, a container for your rules. It has some methods
like rule()
and getSymbols()
, which can be handy for creating rules from
strings.
const CFG = CFG;let grammar = ; grammar;grammar;grammar; grammar; grammar;grammar;
rule()
will try to find existing symbols in the grammar that have the same
names as the ones that appear in a production string, and will create new Sym
objects if a symbol could not be found.
It will also treat string-looking symbols like "the"
as terminal symbols and
leave them as strings, and likewise will turn things like /regex/
into RegExp
objects, which are also terminals.
Parsing
Given a list of Rules, parsing becomes as simple as
const parse = parse;;
The parse()
function returns a parse tree, which is a node of the following
structure:
item: Rule children: Array
Elements in children can be either nodes of the above structure, or strings.
Example:
// 1 * 2 item: <Rule 'prod' prod '*' factor> children: item: <Rule 'factor' /\d+/> children: '1' '*' item: <Rule 'factor' /\d+/> children: '2'
Examples
Check out examples to see parsey in action for various use cases!
Testing
Testing parsey requres jasmine (2.0+). All specs can be found in spec/
. To run
all tests, simply run npm test
in the project's root.
License
MIT