Zorex: the omnipotent regex engine

Zorex blurs the line between regex engines and the advanced parsing algorithms used to parse programming languages.

Even the most powerful regex engines today can't parse HTML (a context-free language) or XML (a context-sensitive language), but Zorex can.
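To make the limitation concrete: a closing tag has to match the most recent unclosed opening tag, at any nesting depth, and tracking that takes unbounded memory such as a stack. The Python sketch below illustrates only that idea. It is not Zorex code, and `TAG` and `is_well_formed` are hypothetical names invented for this example. It handles only bare lowercase tags with no attributes.

```python
import re

# Hypothetical helper for illustration only; not part of Zorex.
# Matches bare tags like <a> or </a> (lowercase names, no attributes).
TAG = re.compile(r"<(/?)([a-z]+)>")


def is_well_formed(markup: str) -> bool:
    """Return True if every <name> is closed by a matching </name>, properly nested."""
    stack = []  # names of the tags that are currently open
    for match in TAG.finditer(markup):
        closing, name = match.group(1), match.group(2)
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            # A closing tag with nothing open, or one that closes the wrong tag.
            return False
    return not stack  # anything left on the stack was never closed
```

The regex here only finds individual tags. The nesting check lives in the explicit stack, which is the part a classic regular expression cannot express.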

⚠️ Project status: in-development ⚠️

Zorex is under heavy development and is not yet ready for use. Follow me on Twitter for updates.

How does it work?

Behind the scenes, Zorex parses a small DSL (the "zorex syntax", a regex-like syntax that enables opt-in EBNF-like syntax) and then at runtime builds a parser specifically for your input grammar.

It's a bit like a traditional parser generator, but done at runtime (instead of through code generation) and with a deep level of syntactic compatibility with traditional regex engines.

It uses an optimized GLL parser combinator framework called Combn to support fast parsing of some of the most complex languages, including left- and right-recursive context-free languages and some context-sensitive languages.
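As a rough illustration of what "building a parser at runtime from combinators" means, here is a minimal Python sketch. It is not Combn's API and it is not a GLL parser: it uses plain ordered choice with backtracking, so a left-recursive rule would recurse forever, which is the kind of grammar GLL is designed to handle. All names (`literal`, `sequence`, `choice`, `lazy`, `build_balanced`, `matches`) are hypothetical.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the position after the match,
# or None if it does not match at that position.
Parser = Callable[[str, int], Optional[int]]


def literal(s: str) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def sequence(*parsers: Parser) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        current: Optional[int] = pos
        for p in parsers:
            current = p(text, current)
            if current is None:
                return None
        return current
    return parse


def choice(*parsers: Parser) -> Parser:
    # Ordered choice: the first alternative that matches wins.
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse


def lazy(get: Callable[[], Parser]) -> Parser:
    # Defers the lookup so a rule can refer to itself before it is defined.
    def parse(text: str, pos: int) -> Optional[int]:
        return get()(text, pos)
    return parse


def build_balanced() -> Parser:
    # Grammar assembled at runtime: balanced := "(" balanced ")" balanced | ""
    balanced: Parser
    balanced = choice(
        sequence(literal("("), lazy(lambda: balanced),
                 literal(")"), lazy(lambda: balanced)),
        literal(""),
    )
    return balanced


def matches(parser: Parser, text: str) -> bool:
    # The whole input must be consumed for the match to count.
    return parser(text, 0) == len(text)
```

For example, `matches(build_balanced(), "(())()")` succeeds while `matches(build_balanced(), "(()")` does not. No code is generated: the grammar is an ordinary value constructed while the program runs.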

A quick note about academic terminology

Technically, Zorex is "an advanced pattern matching engine", and it is arguably incorrect to call it a regular expression engine because regular expressions by nature cannot parse non-regular languages (such as HTML).

Any regex engine that supports backtracking features such as backreferences, however, is also "not a regular expression engine". As Larry Wall, the author of Perl's regex engine, puts it:

“Regular expressions” […] are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I’m not going to try to fight linguistic necessity here. I will, however, generally call them “regexes” (or “regexen”, when I’m in an Anglo-Saxon mood).

Since the aim of Zorex is to maintain a deep level of syntactical compatibility with other regex engines people are familiar with, and further extend that to support parsing more complex non-regular languages, we call Zorex a regex engine.
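A small example of an everyday regex engine already stepping outside the regular languages, using Python's standard `re` module (the helper name `same_run_on_both_sides` is made up for this sketch). The backreference `\1` demands that the same run of `a` characters appears on both sides of the `b`, and the language of such strings is not regular.

```python
import re

# (a+) captures a run of a's; \1 requires exactly the same run again after the "b".
PATTERN = re.compile(r"(a+)b\1")


def same_run_on_both_sides(text: str) -> bool:
    """Return True if text is n a's, then 'b', then the same n a's (n >= 1)."""
    return PATTERN.fullmatch(text) is not None
```

Here `same_run_on_both_sides("aabaa")` is true, while `same_run_on_both_sides("aabaaa")` is false because the two runs differ in length.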
