This is a fork original LPeg library. It contains unstable modifications. If you search for original library, see this page.
This fork contains some experimental optimizations to the original LPeg library. It's a drop-in replacement and does not change pattern semantics, it just optimize them, on demand. The optimization ratio can be huge (hundreds of times faster in most favorable cases).
The optimizations provided are focused on cases where a pattern has a lot
of alternatives, for instance P'foo' + P'bar' + P'baz' + ...
. The vanilla
implementation would place a checkpoint a the beginning of the string and test
each pattern until it fails, rewind to the start of string and try next
pattern.
When you have hundreds or thousands of alternatives, it is very inefficient
(but it was not the intended use after all). This repository contains a
collection of patches to transform patterns into more efficient forms by
applying some rules pretty munch like the
Aho-Corasick algorithm.
For instance, the previous example is more efficient as
(P'ba' * (P'r' + P'z')) + P'foo'
. By merging the common prefix, LPeg virtual
machine can narrow string matching munch faster.
There is also some optimization to the LPeg VM diriectly: the ITestVector
has been added to test all alternatives in one go for the current character
rather than each alternative separately.
This is currently a work in progress and is considered unstable. If you find a bug, or if optimization changes your pattern semantics an any way: please send a bug report.
If you're interested in details about this work, you may want to read these blog articles about the design and results of these optimizations.
Optimizations are not enabled by default, you have to do it manually by calling
the :optimize
method.
local m = require"lpeg"
-- build pattern normally
p = (m.P'aaa' + m.P'bbb' + m.P'acc' + m.P'ddd')
-- now optimize it
opt_p = p:optimize()
print(opt_p:match 'acc')
The returned pattern is a regular LPeg pattern, that can still be used to build larger patterns (optimizations will be kept).
The optimize()
method takes an optional integer argument to set the limit of
optimization passes. The default limit (500) is more than enough for most
patterns, but huge ones needs up to thousands of passes to be fully optimized.
This can be quite long. the optimizer still needs to be... optimized !