Error recovery using lexer parser separation #80
base: 0.6.x
Hi, this looks like a nice addition!
Your change coincided with me switching the main branch to one with preliminary work for the next minor release, which is mostly not tested yet, apologies for that!
@jac3km4 The changes to the AST mostly consist of replacing Box slices with vectors created using bump allocation. 71a7580#diff-d951465f240179b5ea43bf25dbc628cb985688c0574305fa4fc71a3eae81a00dR8-R19 You can see this illustrated in the example rule linked. Note that these are allocator vectors from Alternatively, we can use the allocator API for
@ProphetLamb I experimented with a pooled approach early on in the project and found no discernible difference in the CPU profile. Allocation is far from being a bottleneck, and the pooled approach was more complicated, so I dropped the idea.
Luckily, bumpalo does all the work for us. And
@ProphetLamb I think bumpalo is the one I've tried, but I can't remember at the moment. I'd be open to this kind of change if it's shown to make a difference in performance, but that's not what I've seen.
Interesting. I transitioned the scripting language for the business logic used at the company I work at to pooled allocation, and we got around 30% lower compilation times. But that one was written in C++. Maybe Rust is just better at memory management overall, idk.
At the moment the compiler is pretty heavily IO-bound because it reads and writes pretty big cache files. These files contain bytecode for ~500k LoC written by the original game devs. I think there are a number of optimizations that could be applied there to significantly cut the compilation time.
That totally makes sense. If it complicates things, it is better to use the native approach, though this is not the case with nom parsers, since they allow accessing the underlying stream context.
I think bump allocation in the code that parses the binaries could be worth a shot, and maybe zero-copy parsing of strings, which I avoided initially because the data loaded from these files gets mutated and mixed in with data produced by the compiler, but there are ways around that like using
That is a good point, maybe for another PR. Should I create an issue for that?
sure!
Currently, the compiler is unable to report more than one error, even when the error is recoverable.
This PR implements a diagnostic pipeline that allows defining arbitrary diagnostics and logging them during parsing.
Error recovery using a single-stage parser is not really feasible, so we split the work between a lexer and a parser.
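To illustrate why a token stream makes recovery easier, here is a minimal sketch of panic-mode recovery over pre-lexed tokens. It is not the code in this PR: the `Token`, `Span`, `Diagnostic`, and `parse_program` names and the toy `let <ident> = <num>;` grammar are all made up for the example. On a failed statement the parser records a diagnostic, skips past the next `;`, and keeps going, so one run can report several errors.

```rust
/// Byte range into the source text (hypothetical; not the PR's `Span`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Toy token set; identifiers and numbers are slices of the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Token<'a> {
    Let,
    Ident(&'a str),
    Eq,
    Num(&'a str),
    Semi,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Diagnostic {
    pub message: &'static str,
    pub span: Span,
}

/// Builds a diagnostic at token `pos`, or at end of input if `pos` is past it.
fn error_at(tokens: &[(Token<'_>, Span)], pos: usize, message: &'static str) -> Diagnostic {
    let span = match tokens.get(pos) {
        Some((_, span)) => *span,
        None => {
            let end = tokens.last().map_or(0, |(_, s)| s.end);
            Span { start: end, end }
        }
    };
    Diagnostic { message, span }
}

/// Parses one `let <ident> = <num>;` statement starting at `pos`.
fn parse_let<'a>(
    tokens: &[(Token<'a>, Span)],
    pos: usize,
) -> Result<(&'a str, usize), Diagnostic> {
    if !matches!(tokens.get(pos), Some((Token::Let, _))) {
        return Err(error_at(tokens, pos, "expected `let`"));
    }
    let name = match tokens.get(pos + 1) {
        Some((Token::Ident(name), _)) => *name,
        _ => return Err(error_at(tokens, pos + 1, "expected identifier")),
    };
    if !matches!(tokens.get(pos + 2), Some((Token::Eq, _))) {
        return Err(error_at(tokens, pos + 2, "expected `=`"));
    }
    if !matches!(tokens.get(pos + 3), Some((Token::Num(_), _))) {
        return Err(error_at(tokens, pos + 3, "expected number"));
    }
    if !matches!(tokens.get(pos + 4), Some((Token::Semi, _))) {
        return Err(error_at(tokens, pos + 4, "expected `;`"));
    }
    Ok((name, pos + 5))
}

/// Parses all statements, returning the names that parsed and every diagnostic.
pub fn parse_program<'a>(tokens: &[(Token<'a>, Span)]) -> (Vec<&'a str>, Vec<Diagnostic>) {
    let mut names = Vec::new();
    let mut diags = Vec::new();
    let mut pos = 0;
    while pos < tokens.len() {
        match parse_let(tokens, pos) {
            Ok((name, next)) => {
                names.push(name);
                pos = next;
            }
            Err(diag) => {
                diags.push(diag);
                // Panic-mode recovery: resume just past the next `;`.
                while pos < tokens.len() && tokens[pos].0 != Token::Semi {
                    pos += 1;
                }
                pos = (pos + 1).min(tokens.len());
            }
        }
    }
    (names, diags)
}
```

Synchronizing on `;` is only one possible choice; a real grammar would likely pick its synchronization tokens per rule.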
The lexer allocates memory only when unescaping string literals; otherwise it produces just a stream of slices into the source.
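A rough sketch of the allocate-only-when-unescaping idea, using `std::borrow::Cow`. This is an illustration, not the lexer from this PR: the `unescape` function and the set of escapes it accepts (`\n`, `\t`, `\"`, `\\`) are assumptions for the example. A literal without backslashes is returned as a borrowed slice; only a literal that contains an escape gets a new `String`.

```rust
use std::borrow::Cow;

/// Unescapes the body of a string literal (the text between the quotes).
///
/// Borrows from the input when there is nothing to unescape and allocates
/// otherwise. On an unknown or dangling escape, returns the byte offset of
/// the offending backslash so the caller can attach a span to it.
pub fn unescape(body: &str) -> Result<Cow<'_, str>, usize> {
    // Fast path: no backslash means the slice can be used as-is.
    if !body.contains('\\') {
        return Ok(Cow::Borrowed(body));
    }
    let mut out = String::with_capacity(body.len());
    let mut chars = body.char_indices();
    while let Some((offset, c)) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // Assumed escape set; the real lexer may support more or fewer.
        match chars.next() {
            Some((_, 'n')) => out.push('\n'),
            Some((_, 't')) => out.push('\t'),
            Some((_, '"')) => out.push('"'),
            Some((_, '\\')) => out.push('\\'),
            _ => return Err(offset),
        }
    }
    Ok(Cow::Owned(out))
}
```

Returning the offset instead of panicking keeps the lexer usable for error recovery: the caller can turn it into a diagnostic and continue with the next token.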
I am currently looking into whether error recovery is possible in PEG without producing a bad grammar.
For now, all the solutions I could find are ugly, because PEG doesn't allow us to access the underlying stream. So I am probably going to port the Backus-Naur-like PEG syntax to a parser combinator (nom).
How to report diagnostics
The diag_report! macro accepts either a Span or a Range<Span> to report an error. Diagnostic messages must be predefined as constants. diag produces error messages similar to Rust's.
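As a rough illustration of that shape (not the actual `diag_report!` macro, whose definition is not shown here), the sketch below uses a plain function instead of a macro. Every name in it is hypothetical: `IntoSpan`, `DiagnosticKind`, `ERR_UNTERMINATED_STRING`, `report`, and `render`. It shows messages predefined as constants, a report site that takes either one span or a range of spans, and a rustc-flavoured rendering. Columns are counted in bytes, so it assumes ASCII source and a span that stays on one line.

```rust
use std::ops::Range;

/// Byte range into the source text (hypothetical stand-in for the PR's `Span`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Lets a report site pass either a single span or a range of spans.
pub trait IntoSpan {
    fn into_span(self) -> Span;
}

impl IntoSpan for Span {
    fn into_span(self) -> Span {
        self
    }
}

impl IntoSpan for Range<Span> {
    /// Covers everything from the start of the first span to the end of the last.
    fn into_span(self) -> Span {
        Span { start: self.start.start, end: self.end.end }
    }
}

/// A predefined diagnostic: the code and message are fixed at compile time.
pub struct DiagnosticKind {
    pub code: &'static str,
    pub message: &'static str,
}

/// Example constant; the code and wording are invented for this sketch.
pub const ERR_UNTERMINATED_STRING: DiagnosticKind = DiagnosticKind {
    code: "E0001",
    message: "unterminated string literal",
};

pub struct Diagnostic {
    pub kind: &'static DiagnosticKind,
    pub span: Span,
}

/// Records a diagnostic instead of aborting, so parsing can continue.
pub fn report(sink: &mut Vec<Diagnostic>, kind: &'static DiagnosticKind, at: impl IntoSpan) {
    sink.push(Diagnostic { kind, span: at.into_span() });
}

/// Renders a diagnostic with the offending source line and a caret underline.
pub fn render(diag: &Diagnostic, source: &str) -> String {
    let Span { start, end } = diag.span;
    let line_start = source[..start].rfind('\n').map_or(0, |i| i + 1);
    let line_end = source[start..].find('\n').map_or(source.len(), |i| start + i);
    let line_no = source[..line_start].matches('\n').count() + 1;
    let column = start - line_start;
    // Clamp the underline to the current line and show at least one caret.
    let width = end.min(line_end).saturating_sub(start).max(1);
    format!(
        "error[{}]: {}\n --> line {}, column {}\n{}\n{}{}",
        diag.kind.code,
        diag.kind.message,
        line_no,
        column + 1,
        &source[line_start..line_end],
        " ".repeat(column),
        "^".repeat(width),
    )
}
```

Keeping the messages in constants means every diagnostic has a stable code, and a report site cannot invent ad-hoc wording.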
TODO