Grammar File Syntax
Grammar (.gram) files define a PEG-like syntax for declaring parsers. The syntax includes some additions that help link your parser to C++ code.
A Grammar file consists of a finite number of configuration options and parse rules.
All configuration options follow the pattern @name = 'value'. These options control some behaviours of the code generator.
Special configurations:
- header - code that is placed at the top of the generated source file; include other code you need in here.
- footer - code that is placed at the end of the generated source file.
- class_name - the name of the generated parser class, default: CustomParser
- inherits_from - the class your generated parser inherits from; change this if you want to add custom functionality to the Parser class, default: Parser
- disable_left_recursion - set this to any non-empty value to disable left recursion handling and memoization caches.
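As a rough illustration of how class_name and inherits_from might shape the output, the sketch below is a hand-written stand-in for a generated class. The Parser stub, its pos member, and the names MyParser and MyBaseParser are hypothetical and are not taken from the generator; the configuration lines in the comment only follow the @name = 'value' pattern described above.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the library's Parser base class.
// The real class has its own interface, which is not shown in this document.
class Parser {
public:
    virtual ~Parser() = default;
protected:
    std::size_t pos = 0;  // assumed: current position in the token stream
};

// A user-defined base that adds custom functionality (assumed name).
class MyBaseParser : public Parser {
public:
    int error_count() const { return errors; }
protected:
    int errors = 0;
};

// Roughly what the generator might emit for:
//   @class_name = 'MyParser'
//   @inherits_from = 'MyBaseParser'
class MyParser : public MyBaseParser {
    // generated rule methods would go here
};

// The custom base still derives from Parser, so the generated class does too.
static_assert(std::is_base_of<Parser, MyParser>::value,
              "MyParser must derive from Parser");
```

The point of the design is that custom helpers live in your own base class, so regenerating the parser does not overwrite them.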
A parse rule consists of a name, a return type, and a set of statements:
rule_name[return_type] <statements>
Statements allow you to write detailed grammar expressions.
A token name must contain only A-Z, 0-9 and _ characters.
An ID referring to another rule must contain only a-z, 0-9 and _ characters.
- A - match token A
- e - match rule e
- e* - match the sub-expression e zero or more times, matches as many as possible
- e+ - match e one or more times
- e? - match e zero or one times, gives an optional back
- e | e2 - NOT IMPLEMENTED
- (a b) - grouped expression, returns tuple
- v=e - assign result of sub-expression e to variable v
- &e - and predicate, invokes sub-expression e and then succeeds if e succeeds and fails if e fails, but in either case never consumes any input
- !e - not predicate, succeeds if e fails and fails if e succeeds, again consuming no input in either case
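The behaviour of the repetition operators and predicates in the list above can be sketched with a small token cursor. This is a minimal illustration, not the generator's actual output: the Cursor struct and all of its member names are hypothetical, and tokens are reduced to plain strings.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical token cursor; the generated parser's real state differs.
struct Cursor {
    std::vector<std::string> tokens;
    std::size_t pos = 0;

    // Match one token by name, consuming it on success (like `A`).
    bool match(const std::string& name) {
        if (pos < tokens.size() && tokens[pos] == name) { ++pos; return true; }
        return false;
    }

    // e*: greedy, consumes as many matches as possible, never fails.
    std::size_t star(const std::string& name) {
        std::size_t count = 0;
        while (match(name)) ++count;
        return count;
    }

    // e+: like e*, but fails (empty optional) when nothing matched.
    std::optional<std::size_t> plus(const std::string& name) {
        std::size_t count = star(name);
        if (count == 0) return std::nullopt;
        return count;
    }

    // e?: always succeeds; the optional says whether e was present.
    std::optional<std::string> opt(const std::string& name) {
        if (match(name)) return name;
        return std::nullopt;
    }

    // &e: succeeds if e matches, but restores the position afterwards.
    bool and_pred(const std::string& name) {
        const std::size_t saved = pos;
        const bool ok = match(name);
        pos = saved;
        return ok;
    }

    // !e: succeeds if e does not match; also consumes nothing.
    bool not_pred(const std::string& name) { return !and_pred(name); }
};
```

With the tokens A A B, and_pred("A") succeeds and leaves pos at 0, star("A") then consumes both A tokens, and a following plus("A") fails because nothing is left to match.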
stmt[int]
: e e2 { 1 };
: e2 e { 2 };
Statements are matched from top to bottom, returning on the first successful one (if any). The statement actions control what each statement returns, although each value must be convertible to the rule's return type. Actions are put into the source code as return action; so make sure you write valid C++ code.
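To make the top-to-bottom matching concrete, here is a hand-written sketch of what a function for the two-statement stmt rule could look like. It is an assumption about the general shape, not the generator's real output: MiniParser, expect, and the use of std::optional for failure are hypothetical, and the rules e and e2 are simplified to single tokens named "E" and "E2".

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical hand-written equivalent of
//   stmt[int]
//     : e e2 { 1 };
//     : e2 e { 2 };
struct MiniParser {
    std::vector<std::string> tokens;
    std::size_t pos = 0;

    // Consume the next token if it has the given name.
    bool expect(const std::string& name) {
        if (pos < tokens.size() && tokens[pos] == name) { ++pos; return true; }
        return false;
    }

    std::optional<int> stmt() {
        const std::size_t start = pos;
        // First statement: tried first, wins if it matches.
        if (expect("E") && expect("E2")) {
            return 1;  // the action `{ 1 }` becomes `return 1;`
        }
        pos = start;  // backtrack before trying the next statement
        // Second statement: only tried when the first one failed.
        if (expect("E2") && expect("E")) {
            return 2;  // the action `{ 2 }` becomes `return 2;`
        }
        pos = start;
        return std::nullopt;  // no statement matched
    }
};
```

Because each action is pasted after return, anything that is not a valid C++ expression of a convertible type would surface as a compile error in the generated source rather than in the grammar file.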
stmt[int]
: A B { 1 };
stmt returns an integer. It has one statement, which matches the token A then the token B, returning 1 on success.
stmt[int]
: x=A { x.value.length() };
The A token is assigned to x, so you can refer to x in your action, in this case returning the length of the token's value.
Note: x is of type Token, as A is a token.
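The sketch below shows what that action computes once x is bound. Only the value member is mentioned in this document; the Token layout here, including the type field, and the function name stmt_action are assumptions made for illustration.

```cpp
#include <string>

// Hypothetical minimal Token; the library's real Token may have
// more or different members. Only `value` appears in the text above.
struct Token {
    std::string type;   // assumed: the token name, e.g. "A"
    std::string value;  // the matched text
};

// Roughly what the action in `: x=A { x.value.length() };` computes
// once x has been bound to the matched A token.
int stmt_action(const Token& x) {
    // length() returns std::size_t; the rule's return type is int,
    // so the value is converted on return.
    return static_cast<int>(x.value.length());
}
```

The explicit cast is only there to make the size_t to int conversion visible; in a grammar action the conversion to the rule's return type happens implicitly.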
This grammar definition recognises the basic mathematical operations.
start[int]
: e=expr EOF { e };
expr[int]
: left=expr ADD right=term { left + right };
: left=expr SUB right=term { left - right };
: e=term { e };
term[int]
: left=term MUL right=factor { left * right };
: left=term DIV right=factor { left / right };
: e=factor { e };
factor[int]
: left=item POW right=factor { pow(left, right) };
: e=item { e };
item[int]
: n=INT { std::stoi(n.value) };
: LPAREN e=expr RPAREN { e };
All token definitions are in examples/calc/calc_lexer.hpp and the entire grammar file in examples/calc/calc.gram. Using this grammar it is possible to input simple mathematical expressions like 2 * (3 + 4).
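To show what the grammar computes, here is a hand-written recursive-descent evaluator that mirrors the rules one by one. It is a sketch, not the generated parser: the class name Calc is hypothetical, it reads characters directly instead of using the lexer, and the spellings +, -, *, /, ^, ( and ) for ADD, SUB, MUL, DIV, POW, LPAREN and RPAREN are assumptions, since the real token definitions live in calc_lexer.hpp. The left-recursive rules expr and term are written as loops, which keeps them left-associative without needing left recursion support.

```cpp
#include <cctype>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

class Calc {
public:
    explicit Calc(std::string text) : src(std::move(text)) {}

    // start: e=expr EOF { e }
    int start() {
        int e = expr();
        skip_ws();
        if (pos != src.size()) throw std::runtime_error("expected end of input");
        return e;
    }

private:
    std::string src;
    std::size_t pos = 0;

    void skip_ws() {
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    }

    // Consume character c if it is next (after whitespace).
    bool eat(char c) {
        skip_ws();
        if (pos < src.size() && src[pos] == c) { ++pos; return true; }
        return false;
    }

    // expr: expr ADD term | expr SUB term | term
    int expr() {
        int left = term();
        for (;;) {
            if (eat('+')) left = left + term();
            else if (eat('-')) left = left - term();
            else return left;
        }
    }

    // term: term MUL factor | term DIV factor | factor
    int term() {
        int left = factor();
        for (;;) {
            if (eat('*')) {
                left = left * factor();
            } else if (eat('/')) {
                int right = factor();
                if (right == 0) throw std::runtime_error("division by zero");
                left = left / right;
            } else {
                return left;
            }
        }
    }

    // factor: item POW factor | item  (right-recursive, so right-associative)
    int factor() {
        int left = item();
        if (eat('^')) {
            int right = factor();
            // Rounded to guard against floating-point error in std::pow.
            return static_cast<int>(std::lround(std::pow(left, right)));
        }
        return left;
    }

    // item: INT | LPAREN expr RPAREN
    int item() {
        if (eat('(')) {
            int e = expr();
            if (!eat(')')) throw std::runtime_error("expected ')'");
            return e;
        }
        skip_ws();
        const std::size_t begin = pos;
        while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos]))) ++pos;
        if (begin == pos) throw std::runtime_error("expected integer");
        return std::stoi(src.substr(begin, pos - begin));
    }
};
```

For example, Calc("2 * (3 + 4)").start() evaluates to 14, and Calc("10 - 4 - 3").start() evaluates to 3 because subtraction groups to the left.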