Parsing and translation framework for rust.
- Procedural macro-based lexers (turn_lexer_derive)
- Procedural macro-based grammars
- ielr parsing
- minimal DFA implementation
- hierarchical token descriptions
- character categories
- sub-expressions
- Two character skipping modes: Regex (regex; defaults to skipping all whitespace; invokes errors) / Permissive (reads until any token starts matching)
- Two regex modes: Token (literal string) / Regex (regular expression)
- `_`: any character
- `*`: repetition of preceding charecter or group 0-`n` times
- `+`: repetition of preceding charecter or group 1-`n` times
- `?`: repetition of preceding charecter or group 0-1 times
- `{m}`: repetition of preceding charecter or group `m` times
- `{m-n}`: repetition of preceding charecter or group `m`-`n` times
- `{-n}`: repetition of preceding charecter or group at most `n` times
- `{m-}`: repetition of preceding charecter or group at least `m` times
- `\`: escape special characters
- `[...]`: set of characters or categories
- `[!...]`: set of forbidden characters or categories
- `A|B`: will match either regex A or B
- `()`: groups regular expression
- `<...>`: reference sub-expression or category
utf-8
- `<lower>`: lowercase utf-8 character
- `<upper>`: uppercase utf-8 character
- `<alpha>`: alphabetic utf-8 character
- `<alnum>`: alphanumeric utf-8 character
- `<digit>`: matches any utf-8 digit
- `<whitespace>`: matches utf-8 whitespace
ASCII
- `<a-z>`: lowercase ASCII letter
- `<A-Z>`: uppercase ASCII letter
- `<a-Z>`: alphabetic ASCII
- `<0-Z>`: alphanumeric ASCII
- `<0b>`: matches binary digits
- `<0-9>`: matches decimal digits
- `<0x>`: matches hex digits
- `< >`: `[ \t\n\r\f\v]`
use turn::Lexer;
#[derive(Lexer)]
#[lexer(skip = r"< >*")]
enum JSONToken {
#[lexer(regex::string_char = "[!\"\\\\]|\\<escaped_char>")]
#[lexer(regex::escaped_char = "[\"\\\\/bfnrt]|u<0x>{4}")]
#[regex = "\"<string_char>*\""]
String,
#[lexer(category::nonzero_digit = "123456789")]
#[lexer(regex::integer = "-?(<nonzero_digit><0-9>*|0)")]
#[lexer(regex::fraction = ".<0-9>*")]
#[lexer(regex::exponent = "(E|e)(\+|-)?<0-9>+")]
#[regex = r"<integer><fraction>?<exponent>?"]
Number,
#[token = "true"]
True,
#[token = "false"]
False,
#[token = "null"]
Null,
#[token = '{']
LBrace,
#[token = '}']
RBrace,
#[token = ',']
Comma,
#[token = ':']
Colon,
#[token = '[']
LBracket,
#[token = ']']
RBracket,
}
- C identifier:
"[_<a-Z>][_<0-Z>]*"
- CamelCaseIndentifier:
"<A-Z><a-Z>*"
- utf-8 CamelCaseIndentifier:
"<upper><alpha>*"
- JSON number:
"-?([123456789]<0-9>*|0)(.<0-9>+)?([eE][+-]?<0-9>+)?"
- ASCII text:
"<a-Z>*"