Add support for some F# syntax features by Thorium · Pull Request #166 · ionide/tree-sitter-fsharp

Thorium · 2026-03-26T18:56:32Z

Add support for the following F# syntax features (all tests passing):

open type declarations
FSI directives (#time, #I, #help, #quit)
XML doc comments (/// as distinct xml_doc node)
Type test pattern in atomic patterns (:? Type)
Quotation splicing (%%) prefix operator
fixed expressions
Range expressions in computation expressions
Preprocessor boolean conditions (&&, ||, !, parens, true/false)
Extern/P/Invoke declarations
SRTP trait call expressions (^a : (static member ...) )
Triple-quoted string interpolation ($""" {expr} """)
Operator precedence for && and || (split into 3 levels)
module rec (recursive modules)
and! in computation expressions
struct tuple type annotations (struct (int * int))
Add optional type_argument_constraints to _function_or_value_defn_body for SRTP 'when' constraints on return type annotations
Add optional 'struct' to anon_record_type for struct anonymous record types
Add optional 'then' clause to additional_constr_defn for secondary constructor initialization expressions
Expand fsharp_signature parser: named_module, namespace (global/rec), module_defn, type_definition, exception_definition, import_decl, module_abbrev, compiler_directive_decl, preproc_if support
Fix indentation bug in type extension with ($) identifier test
Add quotation expression support (<@ @>, <@@ @@>) with external tokens
Add multi-dollar triple-quoted string interpolation ($$"""...""", $$$"""...""")
Add module ... = begin...end with begin as external token
Add exception named fields (of field1: type * field2: type)
Add multiline type provider support via _multiline_generic_type
Add signature parser named parameters (curried_spec)
Fix scanner serialize/deserialize bugs (clamped count, bounds check, off-by-one)
Update highlights, injections, indents queries

Scanner changes:

Add FORMAT_TRIPLE_QUOTE_CONTENT external token that stops at unescaped { for interpolation support

Grammar changes:

fsharp/grammar.js: new rules (trait_call_expression, extern_binding, extern_param, and_bang, struct_type, _preproc_expression, xml_doc), extended existing rules (import_decl, prefixed_expression, module_defn, named_module, infix_expression, format_triple_quoted_string)
fsharp_signature/grammar.js: added conflict for operator precedence
common/scanner.h: FORMAT_TRIPLE_QUOTE_CONTENT with {{ escape handling
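
The stopping rule described for FORMAT_TRIPLE_QUOTE_CONTENT can be illustrated with a small model. This is only a sketch in Python, not the C code from common/scanner.h; the function name `scan_format_content` is hypothetical, and it assumes the single-dollar form where `{{` is an escaped literal brace and a lone `{` opens an interpolation hole.

```python
def scan_format_content(text: str, pos: int = 0) -> int:
    """Return the index where a content token starting at `pos` would end.

    Simplified model: the token stops before an unescaped '{' or before the
    closing triple quote, and '{{' is consumed as an escaped literal brace.
    """
    i = pos
    while i < len(text):
        if text.startswith('"""', i):
            break                  # closing delimiter ends the content
        if text[i] == "{":
            if text.startswith("{{", i):
                i += 2             # escaped brace stays inside the content
                continue
            break                  # unescaped brace: interpolation begins here
        i += 1
    return i


# The content token for `abc {x} def"""` covers only "abc ".
print(scan_format_content('abc {x} def"""'))
```

The grammar would then take over at the `{`, parse the expression, and call the scanner again for the next content chunk.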

Thorium · 2026-03-26T19:05:43Z

After this, #76 and #156 are fixed.

Thorium · 2026-03-26T19:22:19Z

OK, after the next commit all open issues should also be fixed.
I'm trying to get this repo out of "wip" status.

After this there are still a few known "issues", but I don't consider them that important:

No native query support (but it works like any other CE)
No units of measure (they are just generics?)
Signature file support is improved but not complete (who cares?)
CE code is still partially in comments...
Anything else?

Thorium · 2026-03-27T19:28:29Z

A test in test\highlight\type_definitions.fsx failed:
That's a very interesting test. It says < should be marked as an operator, but it is marked as a function. That would be wrong if < were used as an operator, as in f < 4 (that is, (<) f 4), as opposed to f<int>. However, I argue that the full call ResizeArray<int>() is actually a function call (the constructor of ResizeArray), so the < in this context is part of a generic function call, not a separate operator. I think it's justified to change the test. If you search for "give me all the operators", the generic type argument of a ResizeArray constructor call shouldn't be part of the result.

Thorium · 2026-03-27T19:57:47Z

I took a random Fantomas-linted project that compiles:
12 out of 216 *.fs files failed (24 errors).
So there is still some work to do.


…atterns (ionide#134, ionide#149)

- Fix infinite loop during error recovery by returning false from scanner when ERROR_SENTINEL is set, preventing zero-length DEDENT loop
- Fix multiline record patterns by adding indent/dedent alternative in record_pattern grammar rule so scanner-emitted INDENT tokens between fields on different lines are handled correctly
- Add test case for multiline record patterns in match expressions

…essions

The application_expression highlight query previously used a wildcard (_) @function.call that captured the entire first child node. For generic constructor calls like ResizeArray<string>(), this meant the typed_expression spanning 'ResizeArray<string>' was tagged as function.call, causing the '<' at column 19 to incorrectly receive the function.call highlight instead of a bracket highlight.

Changes:

- Replace the single broad application_expression query with four specific patterns that target only the identifier within long_identifier_or_op, dot_expression, and their typed_expression variants
- Add typed_expression '>' @punctuation.bracket to highlight the closing angle bracket consistently with generic_type (the opening '<' uses the _tyapp_open external token which is anonymous and unmatchable in queries)
- Update test expectations: remove assertions for '<' (unmatchable) and change '>' from operator to punctuation.bracket

Nsidorenco

Really nice you're picking this up! I fixed the workflow so the CI now tests the parser against the FSharp.Core testsuite again - that should give a pretty good indication of the state of the parser.

Nsidorenco · 2026-03-29T13:39:18Z

+    // During error recovery, all valid_symbols are true and tree-sitter
+    // restores scanner state before each attempt. Emitting zero-length
+    // tokens (DEDENT/PREPROC_END) here causes infinite loops: the parser
+    // can't use the token, recovers, restores state (undoing the pop),
+    // and the scanner emits the same token again forever.
+    // Return false to let tree-sitter's built-in error recovery skip
+    // the problematic character and move on.
+    return false;


If you do not return DEDENT/PREPROC_END tokens during error recovery, you get a much worse parse tree while typing, since with those tokens the parser will in many cases be able to identify a partial parse tree.

Effectively, if you use tree-sitter for syntax highlighting and write something like

match x with

It will fail to highlight anything since it lacks the DEDENT token to identify this is a partially correct match-statement

We need to change this before we can merge.

Just because we're in the error recovery case does not mean we can give up in the external scanner.

If we can identify that an INDENT or similar token is valid, we should emit it. Likewise, if a DEDENT token is valid or we have reached EOF, we should emit that. The tree-sitter error recovery mechanism cannot emit external scanner tokens, so we need to emit them ourselves whenever they can help the error recovery.
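
One way to read the two positions in this thread: during error recovery every entry of valid_symbols is true, so a scanner that emits DEDENT merely because it is "valid" can loop, while a scanner that always gives up loses the partial tree. A middle ground is to decide from layout evidence (the indent stack, the current column, EOF) rather than from valid_symbols alone. The following is a toy Python model of that idea, not the repository's scanner; `choose_layout_token` and its parameters are hypothetical names.

```python
def choose_layout_token(valid, indent_stack, column, at_eof):
    """Pick a layout token from indentation evidence, or None to emit nothing.

    `valid` maps token names to booleans. In error recovery all of them are
    true, so the decision must not rest on `valid` alone.
    """
    top = indent_stack[-1] if indent_stack else 0
    # A block is open and the input either ended or moved left of it.
    if valid.get("DEDENT") and indent_stack and (at_eof or column < top):
        return "DEDENT"
    # The line starts to the right of the innermost open block.
    if valid.get("INDENT") and not at_eof and column > top:
        return "INDENT"
    # No layout evidence: emit nothing and let built-in recovery skip ahead.
    return None


recovery = {"ERROR_SENTINEL": True, "DEDENT": True, "INDENT": True}
# Unfinished `match x with` at end of input: close the open block.
print(choose_layout_token(recovery, [0], 0, True))
# Same column as the open block, not at EOF: nothing to emit.
print(choose_layout_token(recovery, [0, 4], 4, False))
```

In this model a zero-length DEDENT is only produced when the indent stack actually justifies it, which is the property that avoids re-emitting the same token forever.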

This should be fixed now.

Nsidorenco · 2026-03-29T14:04:05Z

We should generally be wary of the size of the parser. It went from ~30 MB to ~50 MB here, and 30 MB was already rather large. An increase in parser size generally comes from increased ambiguity within the grammar, and is probably one of those things where mimicking the language spec won't necessarily lead to a performant tree-sitter parser.

Hmm, is the correct approach to try to keep it small, or to first get general F# parsing working and then make it more efficient? The parser is auto-generated, so I guess there are no easy wins with C function pointers (like higher-order functions in F#) or other tricks to make it small, but the grammar should be structured a certain way instead?

I can see a potential issue with iterative development: when an auto-generated parser.c is in source control and the generated output changes by several megabytes per commit, the git repo will grow very quickly.

I had a word with tree-sitter maintainers, and they basically said that a) parser.c doesn't belong to source control b) don't worry about parser.c size, that is more intentionally kept as uncompressed and large, worry more about the binary size.

I had a word with tree-sitter maintainers, and they basically said that a) parser.c doesn't belong to source control

That seems a bit contradictory to what the actual state of the tree-sitter ecosystem looks like (tree-sitter/tree-sitter#5269). If we were to remove the parser.c we would AFAIK break support for downstream consumers like nvim-treesitter, which depends on the parser.c to build the parser.

b) don't worry about parser.c size, that is more intentionally kept as uncompressed and large, worry more about the binary size.

Sure, but you still have to download that uncompressed file before you can use the parser. And a large parser.c will nevertheless also result in a larger binary.

I'm fine with us moderately increasing the parser size while working on a more complete grammar, but in my experience the way to reduce parser size is to structure the grammar differently, so the more we increase the parser size, the more work we have to redo to reduce the size again. This guide gives a pretty good indication of where the large size comes from.

I tried many different tricks to shrink parser.c while still supporting all the features in this branch, and only shaved off about a megabyte, which doesn't really help when it's already 50MB+. If we start to accept compromises, like "treat all numeric types equal (int32=int64)", then we get the size down, but at the cost of quality. What would actually cut around 40% of the size is a change to the parser file structure on the tree-sitter side, like tree-sitter/tree-sitter#5488, but that's not a quick win.

I see that nvim-treesitter made it a requirement that downstream users have the tree-sitter cli installed, so we could remove the parser.c from the repo, which I think is worthwhile.

There is definitely some structural change we could make to the grammar, which would bring us further from the language spec but might make it a better tree-sitter parser. Not sure about int32=int64, but it might be something. The grammar already does things like not differentiating between expressions and expressions inside a computation expression, since that leads to a blowup in parser size for the very small gain of not being able to write let! in a normal expression block.

So if you find any construct where you want to merge them with the trade-off of a loss of accuracy wrt. the language spec I think it is worthwhile to experiment with.

I pushed to my fork if you want to check, but as I said, these are not massive wins, and the question is the possible drawbacks. It seems the large size comes from symbol_count × state_count, so I tried to reduce those.
I tried removing _module_expression (one commit after this branch):
https://github.com/Thorium/tree-sitter-fsharp/tree/remove-module-expression
And then I tried a few other things (4 commits to this branch):
https://github.com/Thorium/tree-sitter-fsharp/tree/misc-testings
But these were experiments; they seemed to work, but the wins were not big enough to justify PRs.
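
The symbol_count × state_count observation can be turned into a back-of-the-envelope estimate. The sketch below assumes a fully dense table with 2-byte entries, which is only an upper-bound model (generated parsers also use compressed tables for some states), and the counts are made-up numbers, not measurements from this grammar.

```python
def dense_table_bytes(state_count: int, symbol_count: int, bytes_per_entry: int = 2) -> int:
    """Size of a fully dense parse table: one entry per (state, symbol) pair."""
    return state_count * symbol_count * bytes_per_entry


# Hypothetical counts, chosen only to show how the product behaves.
baseline = dense_table_bytes(5000, 600)
fewer_symbols = dense_table_bytes(5000, 590)
print(baseline, baseline - fewer_symbols)
```

Under this model, removing one symbol saves one entry per state, so merging rules only pays off noticeably when it also reduces the number of states.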

Thorium · 2026-03-30T18:38:13Z

Before this branch, sample files with any error:
1750/5317

With this branch, sample files with any error:
1286/5317

That still sounds like a lot, but remember, we are not measuring parsed lines:
we are measuring files with zero parsing errors, and the success rate is over 75%.
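
The quoted rate follows directly from the two counts above. A small Python check (the helper name `clean_file_rate` is just for illustration):

```python
def clean_file_rate(files_with_errors: int, total_files: int) -> float:
    """Fraction of sample files that parse with zero errors."""
    return 1 - files_with_errors / total_files


TOTAL = 5317
before = clean_file_rate(1750, TOTAL)
after = clean_file_rate(1286, TOTAL)
print(f"before: {before:.1%}, after: {after:.1%}")  # before: 67.1%, after: 75.8%
```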

Thorium · 2026-04-01T10:02:59Z

I tried to continue, but it just got worse. I need a faster (parallel) way to evaluate results, and a better understanding of why parser.c grows, before I can continue. I think this PR is now "ready".
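
A parallel evaluation loop of the kind wished for here could look roughly like the sketch below. It is not part of this PR. It assumes the tree-sitter CLI is on PATH and that `tree-sitter parse --quiet <file>` exits non-zero when the tree contains errors; the function names and the worker count are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def parses_cleanly(path) -> bool:
    # Assumption: the CLI returns a non-zero exit code when the file has parse errors.
    result = subprocess.run(
        ["tree-sitter", "parse", "--quiet", str(path)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def failing_files(paths, check=parses_cleanly, workers=8):
    """Run `check` on every path concurrently and return the paths that failed."""
    paths = list(paths)
    # Threads are enough here: the real work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, paths))
    return [p for p, ok in zip(paths, results) if not ok]
```

Calling `failing_files(pathlib.Path("samples").rglob("*.fs"))` on a sample directory (the directory name is a placeholder) would give the list of files with errors, whose length is the numerator of the figures reported in this thread.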

- Scanner: add * to is_infix_op_start() so multiplication on continuation lines is recognized as an infix operator
- Grammar: add 3-part from..step..to alternative to _slice_range_special for step range expressions (e.g. [0..2..10])
- Scanner: emit DEDENT/NEWLINE before returning false in the MULTI_DOLLAR_TRIPLE_QUOTE_START handler, fixing interpolated strings on dedented lines being absorbed into previous let bindings

Require 'with' keyword for standalone type_extension rule, preventing bodyless type definitions (e.g. [<Measure>] type Cent) from being parsed as type extensions that greedily consume following declarations. Record and union type definitions retain support for members both with and without the 'with' keyword via type_extension_elements.

Add _argument_type and _curried_return_type type subsets to correctly parse member signatures. Before this fix, `string -> string * string` in a member signature would incorrectly parse `*` as part of a tuple argument type rather than a tuple return type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Port of PR ionide#171 - adds srtp_call_expression rule for parsing SRTP trait member invocations like (^T : (member Method : ...) arg). Uses restricted _srtp_type_argument matching only ^-prefixed type params to avoid conflicts with char literals.

…WLINE scanner token

Port of PR ionide#172 - adds type_declaration rule for bodyless type definitions (e.g. [<Measure>] type Dollars). Uses a new TYPE_DECL_NEWLINE external scanner token that fires at newline/EOF when the next non-blank line is not more indented, disambiguating bare declarations from types with bodies.

Note: measure_op_type was omitted as the existing measure/measure_quotient rules already handle A/B division in type contexts.
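
The lookahead rule for TYPE_DECL_NEWLINE can be modelled on whole lines. This is a simplified Python sketch of the stated condition only (spaces for indentation, no comments or tabs), not the C scanner; `is_bare_type_declaration` is a hypothetical name.

```python
def is_bare_type_declaration(lines, decl_index):
    """True if the declaration at `decl_index` has no body by the indentation rule.

    The token would fire when the next non-blank line is not more indented
    than the declaration, or when the input ends first.
    """
    def indent(line):
        return len(line) - len(line.lstrip(" "))

    decl_indent = indent(lines[decl_index])
    for line in lines[decl_index + 1:]:
        if line.strip() == "":
            continue               # blank lines do not decide anything
        return indent(line) <= decl_indent
    return True                    # reached EOF: nothing can belong to a body


source = ["[<Measure>] type Dollars", "", "let x = 1"]
print(is_bare_type_declaration(source, 0))
```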

Scanner fixes (common/scanner.h):
- Bug 1: 'begin' keyword check no longer corrupts state for identifier 'b'
- Bug 2: '@' operator on continuation line no longer produces zero-width ERROR
- Bug 3: Trailing semicolon in array comprehension no longer breaks parsing

Grammar fixes (fsharp/grammar.js):
- Bug 4: '..' range operator now works in multi-line [| |] and [ ] arrays/lists by adding optional(_newline) before _comp_or_range_expression and slice_ranges
- Bug 5: [<assembly:...>] attributes followed by bare expressions like () now parse correctly via new _attribute_expression rule in _module_elem

All 422 tests pass with no regressions. Parser.c size unchanged (~60MB).

- Regenerated fsharp_signature parser (inherits from fsharp/grammar.js which was modified but signature parser was not regenerated)
- Removed unnecessary conflict entry [preproc_if, preproc_if_in_expression] eliminating the tree-sitter generate warning

Thorium · 2026-04-08T17:22:09Z

We seem to be at 1112/5317 now. Known issues:

Many of these are intentionally broken test files from the F# compiler's own test suite (diagnostic tests for parse errors, type errors, etc.)
#if/#else/#endif inside match expressions and let bindings
while x && y do and other do keyword ambiguities. Keywords like do, with and in are heavily overloaded in F#. (For example, in within a query { ... } is different from the in that stands in for end-of-line in non-light syntax.)
Complex SRTP member constraints (not typical F# usage)

Thorium · 2026-04-14T13:44:39Z

@Nsidorenco can we get this merged? This would improve a lot of existing issues already.

Nsidorenco · 2026-04-15T19:35:41Z

Yes @Thorium. Looks great, thank you for working on this!

Thorium · 2026-04-16T17:30:26Z

Thanks. Would it be possible to get a release with the version bumped by 0.1, so I could test this more easily with other tools?

Nsidorenco · 2026-04-16T17:50:28Z

Sure, a new version has been released

Nsidorenco force-pushed the missing-features-added branch from d1ac90c to 5d5adc0 Compare March 29, 2026 13:33

Thorium added 4 commits March 29, 2026 15:51

Some real-world testing based improvements

b3866ac

Nsidorenco force-pushed the missing-features-added branch from 5d5adc0 to 6af9a0f Compare March 29, 2026 13:51

Nsidorenco reviewed Mar 29, 2026

Thorium added 2 commits March 30, 2026 19:01

Reduced signature parser size, fixed EOF handling, and improved numer…

626b889

…ic directives

match scoping is a known issue for now, this new test is failing.

4760499

Thorium force-pushed the missing-features-added branch from 020ac3a to 4760499 Compare March 31, 2026 17:43

Address PR feedback from @Nsidorenco

f59929c

alexdarch mentioned this pull request Apr 7, 2026

add support for SRTP call expressions #171

Closed

Thorium and others added 9 commits April 7, 2026 11:06

Add let...in expression support via external _in token and _expressio…

25fa03d

…n_block_for_let

Add hex escape support (\xNN) in character and string literals

26fe84c

Nsidorenco merged commit b576ecf into ionide:main Apr 15, 2026
7 checks passed


3 participants
