A fast regular expression matcher written in C and accessible from Python3.
Install the latest stable release with:
pip install https://github.com/tchlux/regex/archive/1.1.0.zip
To install the current development version from this repository (potentially less stable), use:
pip install git+https://github.com/tchlux/regex.git
This module can also be installed by copying the regex
subdirectory into any directory on the Python path.
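If the copied folder ends up somewhere Python does not already search, one option is to add the folder's parent directory to sys.path at runtime. The sketch below is not part of this package: the helper name add_to_python_path and the example path are hypothetical. Setting the PYTHONPATH environment variable achieves the same result without code.

```python
import os
import sys


def add_to_python_path(directory):
    """Prepend `directory` (the folder that CONTAINS the copied regex/
    subdirectory) to the module search path of the current process.
    Hypothetical helper for illustration only."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:  # avoid duplicate entries
        sys.path.insert(0, directory)
    return directory


# Hypothetical location: replace with wherever the regex/ folder was copied.
# add_to_python_path("/path/to/folder/containing/regex")
# import regex
```

The change only lasts for the running process; it does not modify any configuration on disk.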
import regex
# "rx" appears in the middle of this string.
reg = "rx"
string = "find the symbol 'rx' in this string"
print(regex.match(reg, string))
# "rx" does not appear anywhere in this string.
string = "the symbol is missing here"
print(regex.match(reg, string))
The match function is described in its help documentation. The full list
of allowed regular expression syntax can be viewed by running
import regex; help(regex) in a Python session.
The module can also be run from the command line:
python3 -m regex "<search-pattern>" "<path-pattern-1>" ["<path-pattern-2>"] [...]
Or
python3 regex.py "<search-pattern>" "<path-pattern-1>" ["<path-pattern-2>"] [...]
This searches for the given regular expression in every file whose path matches one of the path-pattern regular expressions, recursing through the subdirectory tree below the current directory.
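The recursive search described above can be approximated in plain Python as a rough sketch. Several details here are assumptions rather than documented behavior: Python's built-in re module stands in for this package's matcher, matching is done line by line, a file qualifies when its path relative to the starting directory matches at least one path pattern, and the function name search_files is hypothetical.

```python
import os
import re


def search_files(search_pattern, path_patterns, root="."):
    """Yield (relative_path, line_number, line) for every line matching
    `search_pattern` in files under `root` whose relative path matches at
    least one of `path_patterns`.

    Python's built-in `re` module stands in for this package's matcher.
    """
    search_re = re.compile(search_pattern)
    path_res = [re.compile(p) for p in path_patterns]
    for dirpath, _dirnames, filenames in os.walk(root):  # recurse into subdirectories
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root)
            if not any(p.search(rel_path) for p in path_res):
                continue  # path does not match any path pattern
            try:
                with open(full_path, encoding="utf-8", errors="replace") as f:
                    for number, line in enumerate(f, start=1):
                        if search_re.search(line):
                            yield rel_path, number, line.rstrip("\n")
            except OSError:
                continue  # skip files that cannot be opened
```

Because os.walk does not guarantee an order, callers that need stable output can sort the results, for example sorted(search_files("rx", [r"\.txt$"])).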
The regular expression language is deliberately simplified for speed. Each expression is converted into a set of tokens that must be matched, along with jump conditions that point to other tokens; which jump is taken depends on whether the current character of the searched string does or does not match the active token.
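To make the token-and-jump idea concrete, here is a hand-built table for the single pattern "a*b", plus a loop that follows its jumps. This is an illustration only: the tuple layout, the MATCHED/FAILED sentinels, and the function names are hypothetical and are not the data structures used in regex.c, and compiling arbitrary expressions into such tables is not shown.

```python
# Hand-built token table for the pattern "a*b" (illustrative; not regex.c's format).
MATCHED = -1  # hypothetical sentinel: the whole pattern has matched
FAILED = -2   # hypothetical sentinel: no match from this start position

# Each token is (character, jump_if_match, jump_if_mismatch); "." means any character.
A_STAR_B = [
    ("a", 0, 1),             # token 0: on 'a' stay here (repeat); otherwise try token 1
    ("b", MATCHED, FAILED),  # token 1: 'b' completes the pattern; anything else fails
]


def run_tokens(tokens, text, start):
    """Follow jump conditions from `start`; return the end index of a match, or None."""
    i, t = start, 0
    while t >= 0:  # negative values are the terminal sentinels
        char, on_match, on_mismatch = tokens[t]
        if i < len(text) and (char == "." or text[i] == char):
            i += 1           # consume the character and follow the "match" jump
            t = on_match
        else:
            t = on_mismatch  # keep the character and follow the "mismatch" jump
    return i if t == MATCHED else None


def search_tokens(tokens, text):
    """Try each start position in turn; return (start, end) of the first match, or None."""
    for start in range(len(text) + 1):
        end = run_tokens(tokens, text, start)
        if end is not None:
            return (start, end)
    return None
```

For example, search_tokens(A_STAR_B, "xxaab") returns (2, 5). A mismatch jump must never point back to the same token, or the loop would not terminate; a match jump may, because it consumes a character.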
LANGUAGE SPECIFICATION FROM regex.c
// This file provides code for very fast regular expression matching
// over character arrays for regular expressions. Only accepts
// regular expressions of a reduced form, with syntax:
//
//   .    any character
//   *    0 or more repetitions of the preceding token (group)
//   ?    0 or 1 occurrence of the preceding token (group)
//   |    logical OR, either match the preceding token (group) or
//          match the next token (group)
//   ()   define a token group (designate order of operations)
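The meaning of ".", "*" and "?" can be illustrated with a small backtracking matcher. This is a minimal sketch, not the regex.c algorithm: it supports those three operators only when applied to a single preceding character (groups "()" and "|" are omitted), it assumes greedy repetition with backtracking (the specification above does not say), and the (start, end) return convention and the name find_simple are hypothetical.

```python
def _match_here(pattern, text):
    """Return how many leading characters of `text` `pattern` matches, or None."""
    if not pattern:
        return 0
    atom = pattern[0]
    op = pattern[1] if len(pattern) > 1 and pattern[1] in "*?" else ""
    rest = pattern[1 + len(op):]

    def atom_matches(ch):
        return atom == "." or atom == ch  # "." matches any character

    if op == "*":
        # 0 or more repetitions: take as many as possible, then back off one at a time.
        count = 0
        while count < len(text) and atom_matches(text[count]):
            count += 1
        for k in range(count, -1, -1):
            end = _match_here(rest, text[k:])
            if end is not None:
                return k + end
        return None
    if op == "?":
        # 0 or 1 occurrence: try consuming one character first, then zero.
        if text and atom_matches(text[0]):
            end = _match_here(rest, text[1:])
            if end is not None:
                return 1 + end
        return _match_here(rest, text)
    # Plain token: must match exactly one character.
    if text and atom_matches(text[0]):
        end = _match_here(rest, text[1:])
        return None if end is None else 1 + end
    return None


def find_simple(pattern, text):
    """Return (start, end) of the first match of `pattern` anywhere in `text`, or None."""
    for start in range(len(text) + 1):
        length = _match_here(pattern, text[start:])
        if length is not None:
            return (start, start + length)
    return None
```

For example, find_simple("colou?r", "color") returns (0, 5) and find_simple("colou?r", "colour") returns (0, 6).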