Error recovery using lexer/parser separation #80


Draft · wants to merge 21 commits into base: 0.6.x

Conversation


@ProphetLamb commented Nov 14, 2022

Currently, compilation is unable to produce more than one error, even if the error is recoverable.
This PR implements a diagnostic pipeline that allows the definition of arbitrary diagnostics and logging during parsing.

Error recovery using a single-stage parser is not really feasible, so we split the parser and lexer.
The lexer only allocates memory when unescaping string literals; otherwise it produces a stream of slices.
I am currently looking into whether error recovery is possible in PEG without producing a bad grammar.
For now, all solutions I could find are ugly, because PEG doesn't allow us to access the underlying stream. So I am probably going to port the Backus-Naur-like PEG syntax to a parser combinator (nom).
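To make the lexer-with-recovery idea concrete, here is a minimal sketch (not code from this PR) of a nom-based tokenizer that reports an unexpected character and skips past it instead of aborting, so later errors can still surface. The Token enum and the plain-string diagnostics are placeholders for illustration.

use nom::{branch::alt, bytes::complete::tag, character::complete::{alpha1, digit1}, combinator::map, IResult};

#[derive(Debug)]
enum Token<'a> {
    Ident(&'a str),
    Int(&'a str),
    Plus,
}

// Lex the whole input; unknown characters are reported and skipped so
// lexing continues and can produce further diagnostics.
fn tokenize(mut input: &str) -> (Vec<Token<'_>>, Vec<String>) {
    let mut tokens = Vec::new();
    let mut diagnostics = Vec::new();
    loop {
        input = input.trim_start();
        if input.is_empty() {
            break;
        }
        let result: IResult<&str, Token> = alt((
            map(alpha1, Token::Ident),
            map(digit1, Token::Int),
            map(tag("+"), |_| Token::Plus),
        ))(input);
        match result {
            Ok((rest, tok)) => {
                tokens.push(tok);
                input = rest;
            }
            Err(_) => {
                // Recovery point: record a diagnostic and skip one character
                // instead of bubbling the failure up.
                let bad = input.chars().next().unwrap();
                diagnostics.push(format!("unexpected character `{bad}`"));
                input = &input[bad.len_utf8()..];
            }
        }
    }
    (tokens, diagnostics)
}

The key part is the Err arm: because the lexer owns the loop over the raw stream, it can decide where to resume, which is exactly what a single-stage PEG grammar makes awkward.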

How to report diagnostics

The diag_report! macro accepts either a Span or a Range<Span> to report an error. Diagnostic messages must be predefined as constants.

fn str_char_invalid(is: Span) -> IResult<Option<char>> {
    // Consume a backslash followed by any character.
    let (i, c) = preceded(char('\\'), anychar)(is.clone())?;
    // Report the invalid escape over the consumed span, then recover by
    // yielding no character instead of failing the parse.
    diag_report!((&is..&i), ERR_INVALID_ESCAPE, c);
    Ok((i, None))
}

diag produces error messages similar to Rust's:

error[ELN0001]: Invalid integer `02439853427592345284395923845320459823457324953247640359104519845834634538`, number too large to fit in target type
--> test.reds:1:0 to l1:74
| 02439853427592345284395923845320459823457324953247640359104519845834634538

TODO

  • Is PEG error recovery possible? If yes, implement it; if not, port to nom
  • Actually integrate the ported AST into the compiler pipeline
  • Don't use a boxed AST; instead use a pooled AST via context references (needed for syntax receivers for additional diagnostics and quick fixes)

@jac3km4
Owner

jac3km4 commented Nov 14, 2022

Hi, this looks like a nice addition!
Some suggestions/ideas:

  • I haven't used nom much, so I don't know how difficult it'd be, but if we're introducing a new parsing lib, I'd rather have it replace peg entirely than mix the two, to avoid bloat.
  • I'd avoid changing the AST to prevent this change from spilling all over the codebase. I think it should focus on the parser only and remain compatible with the old AST. If a new AST is really crucial, you can add the new parser in a new workspace crate and provide a conversion into the existing AST, so that a potential switch can be made in steps (and maybe the old AST can be abandoned one day if it works out).

Your change coincided with me switching the main branch to one with preliminary work for the next minor release, which is mostly not tested yet, apologies for that!

@ProphetLamb
Author

ProphetLamb commented Nov 15, 2022

@jac3km4 The changes to the AST mostly amount to replacing boxed slices with vectors created via bump allocation.
Bump allocation creates a lifetime-bound buffer that you can build a vector in at roughly the speed of a stack allocation, because it effectively is one. Growing a vec that is not on top of the bump stack moves it to the heap, without clearing the reserved space.
This is the best case for allocating a parse tree that is as flat as possible.

71a7580#diff-d951465f240179b5ea43bf25dbc628cb985688c0574305fa4fc71a3eae81a00dR8-R19

You can see this illustrated in the example rule linked above. Note that these are allocator-aware vectors from bumpalo::collections::Vec and have the same lifetime as the parse tree.

Alternatively, we can use the allocator API for std::vec::Vec instead; the bumpalo vec is just a reexport of a specific version of the std lib vec known to work.
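For illustration, here is a minimal sketch of what bump-allocated nodes look like with bumpalo (requires its collections feature); the Expr type is hypothetical, not the redscript AST.

use bumpalo::collections::Vec as BumpVec;
use bumpalo::Bump;

// Hypothetical AST node borrowing everything from the arena lifetime 'ast.
#[derive(Debug)]
enum Expr<'ast> {
    Ident(&'ast str),
    Call(&'ast Expr<'ast>, BumpVec<'ast, Expr<'ast>>),
}

fn main() {
    // One arena per parse; dropping it frees the whole tree at once.
    let arena = Bump::new();

    // The argument list grows inside the bump buffer; while it sits on top
    // of the bump "stack", growing it is effectively a pointer bump.
    let mut args = BumpVec::new_in(&arena);
    args.push(Expr::Ident(arena.alloc_str("x")));
    args.push(Expr::Ident(arena.alloc_str("y")));

    let callee = arena.alloc(Expr::Ident(arena.alloc_str("f")));
    let call = Expr::Call(callee, args);
    println!("{call:?}");
}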

@jac3km4
Owner

jac3km4 commented Nov 15, 2022

@ProphetLamb I've experimented with a pooled approach early on in the project and found there was no discernible difference in the CPU profile; it's far from being a bottleneck, and it was more complicated, so I dropped the idea.

@ProphetLamb
Author

ProphetLamb commented Nov 15, 2022

> @ProphetLamb I've experimented with a pooled approach early on in the project and found there was no discernible difference in the CPU profile; it's far from being a bottleneck, and it was more complicated, so I dropped the idea.

Luckily, bumpalo does all the work for us, and Box<[T<'ast>]> should be entirely replaceable with Vec<'ast, T<'ast>>.
Using a vec also allows syntax analyzers to modify the AST post-parse for features like metaprogramming, instead of hardcoded ModuleExists annotations, because we can allocate into 'ast at any time.

@jac3km4
Owner

jac3km4 commented Nov 15, 2022

@ProphetLamb I think bumpalo is the one I've tried, but I can't remember at the moment. I'd be open to this kind of change if it's shown to make a difference in performance, but that's not what I've seen.

@ProphetLamb
Author

Interesting. I transitioned the scripting language used for business logic at the company I work at to pooled allocation, and we got around 30% lower compilation times. But that one was written in C++; maybe Rust is just better at memory management overall.

@jac3km4
Owner

jac3km4 commented Nov 15, 2022

At the moment the compiler is pretty heavily IO-bound because it reads and writes fairly big cache files. These files contain bytecode for ~500k LoC written by the original game devs. I think there are a number of optimizations that could be applied there to significantly cut the compilation time.
Maybe then this would be more easily visible in the profile.
I'd be open to bringing in bump allocation, but I think it needs to be based on some observable benefit, because it does complicate the code a little bit in many places.

@ProphetLamb
Author

That totally makes sense. If it does complicate things, it is better to use the native approach, though this is not the case with nom parsers, since they allow accessing the underlying stream context.

@jac3km4
Owner

jac3km4 commented Nov 15, 2022

I think bump allocation in the code that parses the binaries could be worth a shot, and maybe zero-copy parsing of strings, which I avoided initially because the data loaded from these files gets mutated and mixed in with data produced by the compiler, but there are ways around that, like using Cow.
But IMO all of this is out of scope for the change you're working on and should be considered as a separate thing.
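As a rough sketch of the Cow idea (a toy escape scheme, not the actual cache-file format): borrow the input when nothing needs rewriting, and allocate only when the data has to change.

use std::borrow::Cow;

// Zero-copy where possible: borrow the input when it contains no escapes,
// allocate a new String only when the literal has to be rewritten.
fn unescape(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        return Cow::Borrowed(raw);
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some(other) => out.push(other),
            None => {} // trailing backslash: dropped in this toy example
        }
    }
    Cow::Owned(out)
}

fn main() {
    assert!(matches!(unescape("plain"), Cow::Borrowed(_)));
    assert_eq!(unescape(r"a\nb"), "a\nb");
}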

@ProphetLamb
Author

That is a good point, maybe for another PR. Should I create an issue for that?

@jac3km4
Owner

jac3km4 commented Nov 15, 2022

sure!
