hexops/zorex

Zorex: the omnipotent regex engine

Zorex blurs the line between regex engines and the advanced parsing algorithms used to parse programming languages.

Even the most powerful regex engines today cannot parse HTML (a context-free language) or XML (a context-sensitive language). With Zorex, you can.
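To make the limitation concrete, here is a minimal Python sketch. It is not Zorex code and does not use Zorex syntax. The names `FLAT`, `parse_element`, and `is_well_nested` are hypothetical, and the tag grammar is deliberately simplified (no attributes, no self-closing tags). A single non-recursive pattern handles one level of nesting, while arbitrary depth needs something recursive, written here by hand as a small recursive-descent function.

```python
import re

# A non-recursive pattern: one opening tag, tag-free text, the matching close.
# It only ever recognizes a single level of nesting.
FLAT = re.compile(r"<(\w+)>[^<>]*</\1>")

OPEN = re.compile(r"<(\w+)>")
TEXT = re.compile(r"[^<>]*")


def parse_element(s, pos=0):
    """Return the index just past one well-nested element at pos, or None."""
    m = OPEN.match(s, pos)
    if m is None:
        return None
    name, pos = m.group(1), m.end()
    close = f"</{name}>"
    while True:
        pos = TEXT.match(s, pos).end()       # skip plain text (possibly empty)
        if s.startswith(close, pos):
            return pos + len(close)
        child_end = parse_element(s, pos)    # recurse into a nested element
        if child_end is None:
            return None
        pos = child_end


def is_well_nested(s):
    """True if the whole string is exactly one well-nested element."""
    return parse_element(s) == len(s)
```

Under these assumptions, `FLAT.fullmatch("<a><b>x</b></a>")` returns `None`, while `is_well_nested("<a><b>x</b><b>y</b></a>")` is `True` and `is_well_nested("<a><b>x</a></b>")` is `False`.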

⚠️ Project status: in-development ⚠️

Zorex is under heavy development and is not yet ready for use. Follow me on Twitter for updates.

How does it work?

Behind the scenes, Zorex parses a small DSL (the "zorex syntax", a regex-like syntax with opt-in EBNF-like extensions) and then, at runtime, builds a parser specifically for your input grammar.

It's a bit like a traditional parser generator, but done at runtime (instead of through code generation) and with a deep level of syntactic compatibility with traditional regex engines.

It uses an optimized GLL parser combinator framework called Combn to parse some of the most complex languages quickly, including left- and right-recursive context-free languages and some context-sensitive languages.
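The idea of building a parser at runtime from a grammar description can be sketched in a few lines of Python. This is an illustration only: it is neither the zorex syntax nor Combn's API, and every name in it (`lit`, `seq`, `alt`, `ref`, `compile_grammar`, `matches`, `GRAMMAR`) is hypothetical. It also uses plain backtracking combinators rather than GLL, so a left-recursive rule would recurse forever here. Handling that case is precisely what a GLL-based framework is for.

```python
def lit(text):
    """Match a literal string; yield the position after it."""
    def parse(s, pos):
        if s.startswith(text, pos):
            yield pos + len(text)
    return parse


def seq(*parsers):
    """Match each parser in order, backtracking over every alternative."""
    def parse(s, pos):
        def go(i, p):
            if i == len(parsers):
                yield p
            else:
                for q in parsers[i](s, p):
                    yield from go(i + 1, q)
        yield from go(0, pos)
    return parse


def alt(*parsers):
    """Try every alternative at the same position."""
    def parse(s, pos):
        for parser in parsers:
            yield from parser(s, pos)
    return parse


def ref(table, name):
    """Refer to a rule by name; resolved lazily so rules can be recursive."""
    def parse(s, pos):
        yield from table[name](s, pos)
    return parse


def compile_node(node, table):
    kind, *args = node
    if kind == "lit":
        return lit(args[0])
    if kind == "ref":
        return ref(table, args[0])
    children = [compile_node(child, table) for child in args]
    if kind == "seq":
        return seq(*children)
    if kind == "alt":
        return alt(*children)
    raise ValueError(f"unknown node kind: {kind!r}")


def compile_grammar(grammar):
    """Turn a grammar given as plain data into parser functions, at runtime."""
    table = {}
    for name, node in grammar.items():
        table[name] = compile_node(node, table)
    return table


def matches(table, start, s):
    """True if rule `start` can consume the whole string."""
    return any(end == len(s) for end in table[start](s, 0))


# balanced := "(" balanced ")" balanced | ""
GRAMMAR = {
    "balanced": ("alt",
                 ("seq", ("lit", "("), ("ref", "balanced"),
                         ("lit", ")"), ("ref", "balanced")),
                 ("lit", "")),
}
```

With `table = compile_grammar(GRAMMAR)`, `matches(table, "balanced", "(()())")` is `True` and `matches(table, "balanced", "(()")` is `False`. The grammar is ordinary data, so it could just as well come from user input at runtime, with no code generation step.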

A quick note about academic terminology

Technically, Zorex is "an advanced pattern matching engine", and it is arguably incorrect to call it a regular expression engine because regular expressions by nature cannot parse non-regular languages (such as HTML).

Any regex engine that supports backtracking, however, is also "not a regular expression engine". As Larry Wall, the creator of Perl, puts it:

“Regular expressions” […] are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I’m not going to try to fight linguistic necessity here. I will, however, generally call them “regexes” (or “regexen”, when I’m in an Anglo-Saxon mood).
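Backreferences are one concrete case of this. The short sketch below uses Python's standard `re` module (the names `SQUARE` and `is_square` are illustrative) to match strings that consist of some text followed by an exact copy of itself, a language that no true regular expression can describe.

```python
import re

# \1 must repeat exactly what group 1 captured, so this accepts strings of
# the form ww over the alphabet {a, b}. That language is not regular.
SQUARE = re.compile(r"([ab]+)\1")


def is_square(s):
    """True if s is some non-empty string over {a, b} written twice."""
    return SQUARE.fullmatch(s) is not None
```

Here `is_square("abab")` is `True`, while `is_square("abba")` is `False`.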

Since the aim of Zorex is to maintain a deep level of syntactical compatibility with other regex engines people are familiar with, and further extend that to support parsing more complex non-regular languages, we call Zorex a regex engine.

License

Three license files are present: LICENSE (type not detected), LICENSE-APACHE (Apache-2.0), and LICENSE-MIT (MIT).
