Skip to content

Latest commit

 

History

History
78 lines (50 loc) · 3.88 KB

Grammar-Analysis.md

File metadata and controls

78 lines (50 loc) · 3.88 KB

Grammar Analysis

The PEGTL contains an analyze()-function that checks a grammar for rules that can go into an infinite loop without consuming input.

Unfortunately, given the expressive power of PEGs and the possibility of arbitrary custom combinator rules, it is impossible to detect all kinds of infinite loops.

It does however catch most cases of left-recursion that are typical for grammars converted from CFGs or other formalisms that gracefully handle left-recursion.

Rule Analysis

In order to run an analysis on a grammar it is necessary to explicitly include <tao/pegtl/analyze.hpp>. Then call tao::pegtl::analyze() with the top-level grammar rule as template argument.

#include <tao/pegtl/analyze.hpp>

const std::size_t issues_found = tao::pegtl::analyze< my_grammar >();

The analyze()-function returns the number of issues found and writes some information about them to std::cout.

Analysing a grammar is usually only done while developing and debugging a grammar, or after changing it.

Regarding the kinds of issues that are detected, consider the following example grammar rules.

struct bar;

struct foo
   : sor< digit, bar > {};

struct bar
   : plus< foo > {};

When attempting to match bar against an input where the next character is not a digit the parser immediately goes into an infinite loop between bar calling foo and then foo calling bar again.

As shown by the example program src/example/pegtl/analyze.cpp, the grammar analysis will correctly detect a cycle without progress in this grammar.

Due to the differences regarding back-tracking and non-deterministic behaviour, this kind of infinite loop is a frequent issue when translating a CFG into a PEG.

Background

In order to look for infinite loops in a grammar, the analyze()-function needs some information about all rules in the grammar. This "information" consists of a classification of the rules according to the following enum, plus, for non-atomic rules, a list of the sub-rules.

// namespace tao::pegtl::analysis

enum class rule_type : char
{
   ANY,
   OPT,
   SEQ,
   SOR
};

This enum value and rule list are provided to the analyze()-function via an analyze_t type member that all rules that are part of a grammar that is to be analysed with analyze() need to define.

The names of the enum values correspond to one of the PEGTL rule classes that has this rule type, however some rule types are used by many different classes.

  • ANY is for rules where "success implies consumption" is true; assumes bounded repetition of conjunction of sub-rules.
  • OPT is for rules where "success implies consumption" is false; assumes bounded repetition of conjunction of sub-rules.
  • SEQ is for rules where consumption on success depends on non-zero bounded repetition of the conjunction of sub-rules.
  • SOR is for rules where consumption on success depends on non-zero bounded repetition of the disjunction of sub-rules.

At the beginning of an analyze()-run the function R::analyze_t::insert() is called for all rules R in the grammar in order to insert the information about the rule R into a data structure.

Custom Rules

For custom rules it should usually be sufficient to follow the lead of the rules supplied with the PEGTL and define analyze_t to either tao::pegtl::analysis::generic or tao::pegtl::analysis::counted. In both cases, the rule_type and the list of sub-rules must be supplied as template parameters. Class tao::pegtl::analysis::counted additionally takes an integer argument Count with the assumption being that a count of zero indicates that everything the rule type is OPT while a non-zero count uses the rule type given as template parameter.

When a custom rule goes beyond what can be currently expressed and all other questions, please contact the authors at taocpp(at)icemx.net.

Copyright (c) 2014-2018 Dr. Colin Hirsch and Daniel Frey