Skip to content

Latest commit

 

History

History
113 lines (73 loc) · 4.59 KB

README.md

File metadata and controls

113 lines (73 loc) · 4.59 KB

irrec

Build Status codecov.io

An implementation of regular expressions based on recursion schemes and Kleene algebras.

The name is a shameless rip-off of irreg, which this library was inspired by. It's different than irreg in that it uses recursion schemes, hence the name.

warning

At this point, this library is just me playing around and learning some things. It provides no stability guarantees, and I don't know if I'll ever even get around to publishing it. That said, if you are interested in actually using it, please let me know! If enough people star this repository maybe I'll do something with it.

creating and matching a string regular expression

import ceedubs.irrec.regex._, Regex._

// `*` denotes that the expression on the right should follow the expression on the left.
val animal: Regex[Char] = (oneOf('b', 'c', 'r') | seq("gn")) * seq("at")

val isAnimal: String => Boolean = animal.stringMatcher
isAnimal("bat")

isAnimal("cat")

isAnimal("rat")

isAnimal("gnat")

isAnimal("hat")

isAnimal("toaster")

printing a regular expression

Regular expressions can be printed in a (hopefully) POSIX style:

animal.pprint

converting a regular expression to a Java Pattern

Regular expressions can be converted to a java.util.regex.Pattern:

animal.toPattern

Currently there is no support for converting a Pattern (or its String form) into an irrec Regex.

generating data that matches a regular expression

Irrec provides support for creating Scalacheck generators that produce values that match a regular expression. This generation is done efficiently as opposed to generating a bunch of random values and then filtering the ones that don't match the regular expression (which would quickly lead to Scalacheck giving up on generating matching values).

val n: Regex[Char] = range('2', '9')
val adjective: Regex[Char] = oneOfR(seq("happy"), seq("tired"), seq("feisty"))
val phrase: Regex[Char] = n * lit(' ') * adjective * lit(' ') * animal * lit('s')
import ceedubs.irrec.regex.RegexGen.regexMatchingStringGen
import org.scalacheck.Gen
import org.scalacheck.Arbitrary.arbitrary
import org.scalacheck.rng.Seed

val phraseGen: Gen[String] = regexMatchingStringGen(arbitrary[Char])(phrase)
Gen.listOfN(3, phraseGen).apply(Gen.Parameters.default, Seed(105769L))

optimizing a regular expression

Irrec has some support for optimizing a regular expression, though at this point it probably won't do much to optimize most regular expressions.

val inefficientRegex: Regex[Char] = lit('a').star.star.star
inefficientRegex.pprint
val moreEfficientRegex: Regex[Char] = inefficientRegex.optimize
moreEfficientRegex.pprint

performance

Irrec has been built with algorithmic performance in mind but at this point, it isn't built to be blazingly fast. It is built with a focus on correctness and clean (by some standard) functional code.

Some benchmark results can be viewed in the benchmarks/results directory. In a sentence (that is a really lossy representation of reality), for common use-cases Java's Pattern performs about an order of magnitude better than irrec (think 10M matches per second vs 1M). However for some extreme cases, irrec performs several orders of magnitude better (think 500k matches per second vs 30).

inspiration and credits

A number of libraries and resources were useful as inspiration and reference implementations for the code in this library. A special thanks goes out to these: