Skip to content
/ grime Public

A language for matching two-dimensional patterns, based on Boolean grammars.

License

Notifications You must be signed in to change notification settings

iatorm/grime

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

75 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Grime

Introduction

Grime is a two-dimensional pattern matching language based on Boolean grammars. The basic idea is to construct rectangular patterns from smaller components, and check whether they are found in the input matrix.

You can try Grime online, courtesy of Dennis from PPCG. Be sure to check the tutorial too.

Usage

A Grime program is called a grammar, and the correct file extension for a grammar is .gr, although this is not enforced. The program is called like

runhaskell grime.hs [options] grammarfile matrixfile

where matrixfile is a file containing the character matrix. For example, the digits program would be called like

runhaskell grime.hs digits.gr digit-matrix

If you want a speed boost, you should compile Grime with the command

ghc -O2 grime.hs

and then run the resulting executable:

grime [options] grammarfile matrixfile

There's also a bash script rungrime that compiles Grime if needed and then calls the executable with the provided arguments.

By default, the interpreter prints the first match it finds, but this can be controlled using the option flags:

  • -e: match only the entire matrix, print 1 for match and 0 for no match.
  • -n: print the number of matches, or the entire matrix if -e is also given.
  • -a: print all matches.
  • -p: print also the positions of the matches, in the format (x,y,w,h).
  • -s: do not print the matches themselves.
  • -b: include borders of width 1.
  • -d or -d0: print debug information.
  • -d1: print logs from the matcher.

These can be inserted from the command line, or in the beginning of any line of a grammar file, separated from the rest of the line by a backtick `.

Syntax and Semantics

A Grime grammar consists of one or more definitions, each on a separate line. Each of them defines the value of a nonterminal, and one of them must define the anonymous toplevel nonterminal. The syntax of a definition is either N=E or E, where N is an uppercase letter and E is an expression.

Expressions are constructed as follows.

  • Any character escaped with \ matches any 1x1 rectangle containing that character.
  • . matches any single character.
  • b matches a 1x1 rectangle outside the character matrix. It's only relevant if the -b option is used, or if the input is not rectangular.
  • e matches a 0-height or 0-width segment of the input's edge.
  • $ always matches.
  • f matches any 0-height (flat) rectangle, and t matches any 0-width (thin) rectangle.
  • The pre-defined character groups are digit, uppercase, lowercase, alphabetic, alphanumeric, and symbol.
  • New character classes can be defined by the syntax [a-prt-w,d-gu]. The letters on the left are included, and those on the right are excluded, so this matches exactly the letters abchijklmnoprtvw. If the left side is empty, it is taken to contain all characters and the border. The comma can be omitted if the right side is empty. The characters [],-\ must be escaped with \. \b matches a border.
  • An unescaped uppercase letter is a nonterminal, and matches the expression it is assigned.
  • _ is the anonymous toplevel nonterminal.
  • If P and Q are expressions, then PQ is just their horizontal concatenation, and P/Q is their vertical concatenation, with P on top.
  • P Q is like PQ, but with lower precedence than /. This sometimes saves parentheses.
  • P+ is one or more Ps aligned horizontally, and P/+ is the same aligned vertically.
  • The Boolean operations are denoted P|Q, P&Q and P!. Exclusive OR is P~Q, and difference is P-Q.
  • P? is shorthand for t|P, P/? for f|P, P* for t|P+, and P/* for f|P/+.
  • P{a-b,c-d}, where abcd are nonnegative integers, is a size constraint on P. If P is a character class, then the expression matches any mxn rectangle containing only those characters, provided that m is between a and b inclusive, and n is between c and d inclusive. In other cases, the expression matches any rectangle that has the correct size and that P also matches. If a or c are omitted, they are taken to be 0, and if b or d are omitted, they are infinite. If the hyphen between a and b is omitted, then we use the same number for both ends of the interval. If the entire c-d part is omitted, both axes are constrained. To clarify, {-b} is equivalent to {0-b,0-b}, and {a-,c} is equivalent to {a-infinity,c-c}. The } can be omitted.
  • P:a-b,c-d} is a grid specifier. It matches a rectangle that can be divided into an mxn grid of matches of P, where m is between a and b inclusive, and n is between c and d inclusive. The ranges and } can be omitted, with the same defaulting behavior as in the size constraint, except that the beginning of a range defaults to 1 instead of 0.
  • P#a-b} is a counting specifier. It matches a rectangle that contains between a and b possibly overlapping matches of P, inclusive. The } is optional, a defaults to 1 and b to infinity.
  • PoS, where S is a (greedily parsed) string of the characters 01234567, optionally terminated with }, is an orientation modifier. Each of the characters stands for a rotation (0123) or reflection (4567), which is applied to the expression P. The transformed expressions are combined with Boolean disjunctions. The characters OXNTKHADC encode convenient classes of orientations. The character F fixes an expression, so that its orientation cannot be modified.
  • <P> is a context bracket. It matches any rectangle r that's contained in a larger rectangle that matches P. The rectangle r can be matched by a digit in P; this is 0 by default, and increases with the nesting depth (0 matches the anchor of the innermost brackets, 1 the next one etc).

You can add the prefix ^ to any infix operator or chain of postfix operators to raise its precedence higher than all other operators, or v to lower it; for example, \a\b^|\cv+ is parsed as (\a(\b|\c))+. If i is any infix operator apart from concatenation or /, and p a chain of postfix operators, then PipQ is a shorthand for (PiQ)p; for example, \a|+\b is equivalent to (\a|\b)+. Double quotes "" are used for swapping the \-escaped status of all characters inside them, except for the characters "\/. They allow text grids like "this/is a/grid". Non-literal syntax elements, like nonterminals, operations and parentheses, can be used in quoted expressions by escaping them. Note that quotes define syntax elements, so "ab/cd"+/E will be parsed as (\a\b/\c\d)+/E. All open parentheses, brackets and quotes are implicitly closed at the end of a line. Lines beginning with | are comments, and the parser ignores them.

Notes

Grime does allow paradoxical definitions of the form A=A!; these will result in no match. In general, the matching engine can resolve any grammar where no match of a single nonterminal on a single rectangle depends on itself (although for complex grammars, this can take a while). A nontrivial example of a resolvable grammar is

A=E?\a
B=[ab]+&A!
E=A|B
a`A

Given a one-dimensional string of as and bs, this grammar will match every substring that ends in a. Note how the nonterminal A is self-referential in more than one way (via E directly, and via E through B), but since the E in A must be a proper substring, the system can always be resolved.

Grime supports non-rectangular inputs; the rows are simply aligned to the left, and the gaps can be matched using b.