
Grammar File Syntax

Conqu3red edited this page Sep 4, 2021 · 5 revisions

Grammar (.gram) files use a PEG-like syntax to declare parsers, with some additions that help link your parser to C++ code.

A Grammar file consists of a finite number of configuration options and parse rules.

Configuration Options

All configuration options follow the pattern @name = 'value'. They control some behaviours of the code generator. The following options have a special meaning:

  • header - code that is placed at the top of the generated source file. Put any includes or other code you need here.
  • footer - code that is placed at the end of the generated source file.
  • class_name - the name of the generated parser class, default: CustomParser
  • inherits_from - the class your generated parser inherits from, change this if you want to add custom functionality to the Parser class, default: Parser
  • disable_left_recursion - set this to any non-empty value to disable left recursion handling and memoization caches.
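
As a rough illustration of how these options could map onto the generated file, the sketch below lays out a hand-written source file in the order the list describes. It is not real generator output: the empty Parser stand-in, LoggingParser, CalcParser and make_parser are all hypothetical names chosen for this example, and the option values in the leading comment are assumptions.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical options in the .gram file that this layout imitates:
//   @header        = '<the LoggingParser class below>'
//   @footer        = '<the make_parser function below>'
//   @class_name    = 'CalcParser'
//   @inherits_from = 'LoggingParser'

// Stand-in for the library's Parser base class (its real interface is not
// documented on this page, so it is left empty here).
class Parser {};

// ---- header: user code placed at the top of the generated file ----
// A user-defined base class that adds custom functionality to Parser.
class LoggingParser : public Parser {
public:
    std::string last_message;
    void log(const std::string& msg) { last_message = msg; }
};

// ---- generated class: named by class_name, derived from inherits_from ----
class CalcParser : public LoggingParser {
    // The generated rule methods would appear here.
};

// ---- footer: user code placed at the end of the generated file ----
inline CalcParser make_parser() { return CalcParser(); }

// The custom base class still derives from Parser, so the generated class does too.
static_assert(std::is_base_of<Parser, CalcParser>::value,
              "CalcParser should derive from Parser");
```

With this layout, code in the footer can use the generated class, and the generated class can call anything the header-defined base class provides, such as log in this sketch.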

Parse Rules

A parse rule consists of a name, a return type, and a set of statements: rule_name[return_type] <statements>

Statements

Statements allow you to write detailed grammar expressions.

A token name may contain only the characters A-Z, 0-9 and _.

An ID referring to another rule may contain only the characters a-z, 0-9 and _.

Expressions

  • A - Match token A
  • e - Match rule e
  • e* - Match the sub-expression e zero or more times, matches as many as possible
  • e+ - Match e one or more times
  • e? - Match e zero or one times, returns an optional
  • e | e2 - NOT IMPLEMENTED
  • (a b) - grouped expression, returns tuple
  • v=e - assign result of sub-expression e to variable v
  • &e - and predicate: invokes sub-expression e, succeeding if e succeeds and failing if e fails, but never consuming any input in either case.
  • !e - not predicate: succeeds if e fails and fails if e succeeds, again consuming no input in either case.
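
The repetition and predicate semantics in the list above can be sketched with a small hand-written cursor over token names. This is only an illustration of the PEG behaviour, not the generator's API: TokenStream, Expr, token, and_pred, not_pred and zero_or_more are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a token stream: token names plus a cursor.
struct TokenStream {
    std::vector<std::string> names;
    std::size_t pos = 0;
};

// An expression tries to match at the cursor and reports success or failure.
using Expr = std::function<bool(TokenStream&)>;

// `A`: match one token with the given name, consuming it on success.
Expr token(const std::string& name) {
    return [name](TokenStream& ts) {
        if (ts.pos < ts.names.size() && ts.names[ts.pos] == name) {
            ++ts.pos;
            return true;
        }
        return false;
    };
}

// `&e`: succeeds exactly when e succeeds, and always restores the cursor.
Expr and_pred(Expr e) {
    return [e](TokenStream& ts) {
        const std::size_t saved = ts.pos;
        const bool ok = e(ts);
        ts.pos = saved;  // never consume input
        return ok;
    };
}

// `!e`: succeeds exactly when e fails, and always restores the cursor.
Expr not_pred(Expr e) {
    return [e](TokenStream& ts) {
        const std::size_t saved = ts.pos;
        const bool ok = e(ts);
        ts.pos = saved;  // never consume input
        return !ok;
    };
}

// `e*`: greedy repetition; returns how many times e matched (possibly zero).
std::size_t zero_or_more(const Expr& e, TokenStream& ts) {
    std::size_t count = 0;
    while (true) {
        assert(ts.pos <= ts.names.size());
        const std::size_t before = ts.pos;
        if (!e(ts)) {
            ts.pos = before;  // undo any partial progress of the failed attempt
            break;
        }
        if (ts.pos == before) break;  // e matched without consuming: stop to avoid looping forever
        ++count;
    }
    return count;
}
```

On a stream holding the tokens A A B, and_pred(token("A")) succeeds and leaves the cursor at 0, while zero_or_more(token("A"), ts) matches twice and leaves the cursor at 2.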

Multiple Statements

stmt[int]
    : e e2 { 1 };
    : e2 e { 2 };

Statements are matched from top to bottom, returning on the first successful one (if any). The statement actions control what each statement returns, although each value must be convertible to the rule's return type. Actions are put into the source code as return action;, so make sure you write valid C++ code.
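
To make the top-to-bottom matching concrete, here is a minimal hand-written sketch of the two-statement rule above. It is not the generator's output: Cursor and expect are hypothetical helpers, the tokens E and E2 stand in for the sub-rules e and e2, and the result is passed through an out-parameter only because this page does not describe how the generated code signals failure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token cursor: token names plus the current position.
struct Cursor {
    std::vector<std::string> names;
    std::size_t pos = 0;
};

// Consume one token with the given name, or fail without moving.
bool expect(Cursor& c, const std::string& name) {
    assert(c.pos <= c.names.size());
    if (c.pos < c.names.size() && c.names[c.pos] == name) {
        ++c.pos;
        return true;
    }
    return false;
}

// stmt[int]
//     : E E2 { 1 };
//     : E2 E { 2 };
// Statements are tried in order; the first one that matches decides the result.
bool stmt(Cursor& c, int& result) {
    const std::size_t start = c.pos;

    // First statement: E E2, action { 1 }.
    if (expect(c, "E") && expect(c, "E2")) {
        result = 1;  // the generator would emit `return 1;` here
        return true;
    }
    c.pos = start;  // rewind before trying the next statement

    // Second statement: E2 E, action { 2 }.
    if (expect(c, "E2") && expect(c, "E")) {
        result = 2;  // the generator would emit `return 2;` here
        return true;
    }
    c.pos = start;  // no statement matched: leave the cursor untouched
    return false;
}
```

Rewinding the cursor between statements is what lets a later statement start from the same position after an earlier one has partially matched.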

Examples

stmt[int]
   : A B { 1 };

stmt returns an integer. It has one statement, which matches the token A then the token B, returning 1 on success.

stmt[int]
    : x=A { x.value.length() };

The A token is assigned to x, so you can refer to x in your action. In this case the action returns the length of the token's value.

Note: x is of type Token as A is a token
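
As a small sketch of what such an action amounts to in C++, the code below wraps two of the actions from this page in ordinary functions. The Token struct is a hypothetical minimal stand-in: only a string-like value member is implied by the examples here, and the type member is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal Token; the real class probably carries more fields.
struct Token {
    std::string type;   // assumed field, e.g. "A" or "INT"
    std::string value;  // the matched text, as used by the actions on this page
};

// stmt[int] : x=A { x.value.length() };
// The action is pasted as `return action;`, so the std::size_t returned by
// length() is implicitly converted to the rule's return type, int.
int stmt_action(const Token& x) {
    return x.value.length();
}

// item[int] : n=INT { std::stoi(n.value) };
// std::stoi already yields an int, so no conversion is involved.
int item_action(const Token& n) {
    return std::stoi(n.value);
}
```

Some compilers warn about the implicit std::size_t to int conversion in the first action. Writing the action as static_cast<int>(x.value.length()) avoids the warning.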

This grammar definition recognises the basic mathematical operations.

start[int]
    :  e=expr EOF  { e };

expr[int]
    :  left=expr ADD right=term   { left + right };
    :  left=expr SUB right=term   { left - right };
    :  e=term { e };

term[int]
    :  left=term MUL right=factor { left * right };
    :  left=term DIV right=factor { left / right };
    :  e=factor { e };

factor[int]
    :  left=item POW right=factor { pow(left, right) };
    :  e=item { e };

item[int]
    :  n=INT { std::stoi(n.value) };
    :  LPAREN e=expr RPAREN { e };

All token definitions are in examples/calc/calc_lexer.hpp and the entire grammar file is in examples/calc/calc.gram. With this grammar, the parser accepts simple mathematical expressions such as 2 * (3 + 4).
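
To show what this grammar computes, the following hand-written recursive-descent evaluator follows the same rules. It is a sketch under assumptions, not the generated parser: it reads characters directly instead of using the lexer, the spellings + - * / ^ ( ) for the tokens are guesses (the real ones are in calc_lexer.hpp), the left-recursive expr and term rules are rewritten as loops, and an integer loop replaces pow. Calc and evaluate are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

class Calc {
public:
    explicit Calc(const std::string& text) : src(text) {}

    // start : e=expr EOF { e }
    int start() {
        const int e = expr();
        skip_spaces();
        if (pos != src.size()) throw std::runtime_error("expected end of input");
        return e;
    }

private:
    std::string src;
    std::size_t pos = 0;

    void skip_spaces() {
        assert(pos <= src.size());
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    }

    // Consume the character c if it comes next (after optional spaces).
    bool accept(char c) {
        skip_spaces();
        if (pos < src.size() && src[pos] == c) { ++pos; return true; }
        return false;
    }

    // expr : expr ADD term | expr SUB term | term
    // The left recursion becomes a loop, which keeps left associativity.
    int expr() {
        int left = term();
        while (true) {
            if (accept('+')) left = left + term();
            else if (accept('-')) left = left - term();
            else return left;
        }
    }

    // term : term MUL factor | term DIV factor | factor
    int term() {
        int left = factor();
        while (true) {
            if (accept('*')) {
                left = left * factor();
            } else if (accept('/')) {
                const int right = factor();
                if (right == 0) throw std::runtime_error("division by zero");
                left = left / right;
            } else {
                return left;
            }
        }
    }

    // factor : item POW factor | item   (right recursive, so right associative)
    int factor() {
        const int left = item();
        if (accept('^')) return int_pow(left, factor());
        return left;
    }

    // item : INT | LPAREN expr RPAREN
    int item() {
        if (accept('(')) {
            const int e = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return e;
        }
        const std::size_t begin = pos;  // accept() already skipped spaces
        while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos]))) ++pos;
        if (begin == pos) throw std::runtime_error("expected integer");
        return std::stoi(src.substr(begin, pos - begin));
    }

    // Integer power by repeated multiplication; negative exponents are rejected.
    static int int_pow(int base, int exp) {
        if (exp < 0) throw std::runtime_error("negative exponent");
        int result = 1;
        for (int i = 0; i < exp; ++i) result *= base;
        return result;
    }
};

int evaluate(const std::string& text) { return Calc(text).start(); }
```

For example, evaluate("2 * (3 + 4)") yields 14, and evaluate("10 - 3 - 2") yields 5 because subtraction associates to the left, just as the left-recursive expr rule specifies.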
