Add support for some F# syntax features #166

Merged
Nsidorenco merged 16 commits into ionide:main from Thorium:missing-features-added
Apr 15, 2026

Contributor

Thorium commented Mar 26, 2026

Add support for the following F# syntax features (all tests passing):

  • open type declarations
  • FSI directives (#time, #I, #help, #quit)
  • XML doc comments (/// as distinct xml_doc node)
  • Type test pattern in atomic patterns (:? Type)
  • Quotation splicing (%%) prefix operator
  • fixed expressions
  • Range expressions in computation expressions
  • Preprocessor boolean conditions (&&, ||, !, parens, true/false)
  • Extern (P/Invoke) declarations
  • SRTP trait call expressions (^a : (static member ...) )
  • Triple-quoted string interpolation ($""" {expr} """)
  • Operator precedence for && and || (split into 3 levels)
  • module rec (recursive modules)
  • and! in computation expressions
  • struct tuple type annotations (struct (int * int))
  • Add optional type_argument_constraints to _function_or_value_defn_body for SRTP 'when' constraints on return type annotations
  • Add optional 'struct' to anon_record_type for struct anonymous record types
  • Add optional 'then' clause to additional_constr_defn for secondary constructor initialization expressions
  • Expand fsharp_signature parser: named_module, namespace (global/rec), module_defn, type_definition, exception_definition, import_decl, module_abbrev, compiler_directive_decl, preproc_if support
  • Fix indentation bug in type extension with ($) identifier test
  • Add quotation expression support (<@ @>, <@@ @@>) with external tokens
  • Add multi-dollar triple-quoted string interpolation ($$"""...""", $$$"""...""")
  • Add module ... = begin...end with begin as external token
  • Add exception named fields (of field1: type * field2: type)
  • Add multiline type provider support via _multiline_generic_type
  • Add signature parser named parameters (curried_spec)
  • Fix scanner serialize/deserialize bugs (clamped count, bounds check, off-by-one)
  • Update highlights, injections, indents queries

Scanner changes:

  • Add FORMAT_TRIPLE_QUOTE_CONTENT external token that stops at unescaped { for interpolation support

Grammar changes:

  • fsharp/grammar.js: new rules (trait_call_expression, extern_binding, extern_param, and_bang, struct_type, _preproc_expression, xml_doc), extended existing rules (import_decl, prefixed_expression, module_defn, named_module, infix_expression, format_triple_quoted_string)
  • fsharp_signature/grammar.js: added conflict for operator precedence
  • common/scanner.h: FORMAT_TRIPLE_QUOTE_CONTENT with {{ escape handling
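
The "stops at unescaped {" behaviour with {{ escape handling can be sketched roughly as follows. This is only an illustration under assumptions: the function name triple_quote_content_len is hypothetical, it works on a plain C string instead of the tree-sitter lexer, and it ignores multi-dollar strings and }} handling. It is not the code in common/scanner.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration; not the code in common/scanner.h.
 * Returns the number of bytes at the start of `src` that form one content
 * chunk of a $"""...""" string. The chunk ends just before an unescaped '{'
 * (an interpolation hole) or before the closing """. A doubled "{{" is an
 * escaped literal brace and stays inside the chunk. */
size_t triple_quote_content_len(const char *src) {
  assert(src != NULL);
  size_t i = 0;
  while (src[i] != '\0') {
    if (src[i] == '{') {
      if (src[i + 1] == '{') {
        i += 2; /* "{{" is an escape: consume both braces */
        continue;
      }
      break; /* unescaped '{': stop so the grammar can parse the hole */
    }
    if (strncmp(src + i, "\"\"\"", 3) == 0) {
      break; /* closing delimiter */
    }
    i++;
  }
  return i;
}
```

In a real external scanner the same decision would be made one character at a time through the lexer, with the content token ending where this function returns.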

Contributor Author

Thorium commented Mar 26, 2026

With this PR, #76 and #156 are fixed.

Contributor Author

Thorium commented Mar 26, 2026

OK, after the next commit all open issues should also be fixed.
I'm trying to get this repo out of "wip" status.

After this there are still a few known "issues", but I don't consider them very important:

  • No native query support (but it works like any other CE)
  • No units of measure (they are just generics?)
  • Signature file support is improved but not complete (who cares?)
  • CE code is still partially in comments...
  • Anything else?

Contributor Author

Thorium commented Mar 27, 2026

The test in test\highlight\type_definitions.fsx failed:
That's a very interesting test. It says < should be marked as an operator, but it is marked as a function. That would be wrong if < were used as a comparison operator, as in f < 4 (that is, (<) f 4), rather than as a type application like f<int>. However, I argue the full call ResizeArray<int>() is actually a function call (the constructor of ResizeArray), so the < in this context is part of a generic function call, not a separate operator. I think it's justified to change the test? If you search for "give me all the operators", the generic type argument of a ResizeArray constructor call shouldn't be part of the result.

Contributor Author

Thorium commented Mar 27, 2026

I took a random Fantomas-linted project that compiles:
12 of its 216 *.fs files failed to parse (24 errors).
So there is still some work to do.

@Nsidorenco Nsidorenco force-pushed the missing-features-added branch from d1ac90c to 5d5adc0 Compare March 29, 2026 13:33
Thorium added 4 commits March 29, 2026 15:51
…atterns (ionide#134, ionide#149)

- Fix infinite loop during error recovery by returning false from scanner
  when ERROR_SENTINEL is set, preventing zero-length DEDENT loop
- Fix multiline record patterns by adding indent/dedent alternative in
  record_pattern grammar rule so scanner-emitted INDENT tokens between
  fields on different lines are handled correctly
- Add test case for multiline record patterns in match expressions
…essions

The application_expression highlight query previously used a wildcard
(_) @function.call that captured the entire first child node. For generic
constructor calls like ResizeArray<string>(), this meant the typed_expression
spanning 'ResizeArray<string>' was tagged as function.call, causing the '<'
at column 19 to incorrectly receive the function.call highlight instead of
a bracket highlight.

Changes:
- Replace the single broad application_expression query with four specific
  patterns that target only the identifier within long_identifier_or_op,
  dot_expression, and their typed_expression variants
- Add typed_expression '>' @punctuation.bracket to highlight the closing
  angle bracket consistently with generic_type (the opening '<' uses the
  _tyapp_open external token which is anonymous and unmatchable in queries)
- Update test expectations: remove assertions for '<' (unmatchable) and
  change '>' from operator to punctuation.bracket
@Nsidorenco Nsidorenco force-pushed the missing-features-added branch from 5d5adc0 to 6af9a0f Compare March 29, 2026 13:51
Member

Nsidorenco left a comment

Really nice that you're picking this up! I fixed the workflow so that CI again tests the parser against the FSharp.Core test suite; that should give a pretty good indication of the state of the parser.

Comment thread common/scanner.h Outdated
Comment on lines +267 to +274
// During error recovery, all valid_symbols are true and tree-sitter
// restores scanner state before each attempt. Emitting zero-length
// tokens (DEDENT/PREPROC_END) here causes infinite loops: the parser
// can't use the token, recovers, restores state (undoing the pop),
// and the scanner emits the same token again forever.
// Return false to let tree-sitter's built-in error recovery skip
// the problematic character and move on.
return false;
Member

If you do not return DEDENT/PREPROC_END tokens during error recovery, you get a much worse parse tree while typing, since with those tokens the parser will in many cases be able to identify a partial parse tree.

Effectively, if you use tree-sitter for syntax highlighting and write something like

match x with

it will fail to highlight anything, since it lacks the DEDENT token needed to identify this as a partially correct match statement.

Member

We need to change this before we can merge.

Just because we're in the error recovery case does not mean we can give up in the external scanner.

If we can identify that an INDENT or similar token is valid, we should emit it. Likewise, if a DEDENT token is valid or we have reached EOF, we should emit that. The tree-sitter error recovery mechanism cannot emit external scanner tokens, so we need to emit them ourselves if they can help the error recovery.
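
One way to read this request is to base the decision on layout evidence (current column, indent stack, EOF) when valid_symbols cannot be trusted. The sketch below is only that reading, under assumptions: every name in it (RecoveryToken, IndentStack, recovery_layout_token) is hypothetical, and it does not establish whether such a rule avoids the zero-length token loop described in the diff comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; this is not the code in common/scanner.h. */
enum RecoveryToken { RECOVERY_NONE, RECOVERY_INDENT, RECOVERY_DEDENT };

typedef struct {
  int columns[64]; /* indentation columns of the currently open blocks */
  int depth;       /* number of open blocks */
} IndentStack;

/* Decide which layout token, if any, to emit while the parser is in error
 * recovery. valid_symbols is not consulted because it is all-true in that
 * mode; the decision rests on layout evidence only. */
enum RecoveryToken recovery_layout_token(const IndentStack *stack, int column,
                                         bool at_eof) {
  assert(stack != NULL && stack->depth >= 0 && stack->depth <= 64);
  int top = stack->depth > 0 ? stack->columns[stack->depth - 1] : 0;
  if (stack->depth > 0 && (at_eof || column < top)) {
    return RECOVERY_DEDENT; /* a block really closes here */
  }
  if (!at_eof && column > top) {
    return RECOVERY_INDENT; /* a deeper block really opens here */
  }
  return RECOVERY_NONE; /* nothing to add; let built-in recovery proceed */
}
```

With open blocks at columns 0 and 4, a token at column 0 or an EOF yields a DEDENT, a token at column 8 yields an INDENT, and a token at column 4 yields nothing.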

Contributor Author

This should be fixed now.

Comment thread fsharp/src/parser.c
Member

We should generally be wary of the size of the parser. It went from ~30 MB to ~50 MB here, and 30 MB was already rather large. An increase in parser size generally comes from increased ambiguity within the grammar, and this is probably one of those cases where mimicking the language spec won't necessarily lead to a performant tree-sitter parser.

Contributor Author

Hmm, is the correct approach to try to keep it small, or to first get general F# parsing working and then make it more efficient? The parser is auto-generated, so I guess there are no easy wins from C function pointers (like higher-order functions in F#) or other tricks to make it small, and the grammar has to be structured in a certain way instead?

Contributor Author

I can see a potential issue with iterative development: when an auto-generated parser.c is in source control and the generated output changes by several megabytes per commit, the git repo will grow very quickly.

Contributor Author

I had a word with tree-sitter maintainers, and they basically said that a) parser.c doesn't belong in source control b) don't worry about parser.c size, it is intentionally kept uncompressed and large; worry more about the binary size.

Member

@Nsidorenco Nsidorenco Apr 4, 2026

> I had a word with tree-sitter maintainers, and they basically said that a) parser.c doesn't belong in source control

That seems a bit contradictory to the actual state of the tree-sitter ecosystem (tree-sitter/tree-sitter#5269). If we were to remove parser.c, we would AFAIK break support for downstream consumers like nvim-treesitter, which depends on parser.c to build the parser.

> b) don't worry about parser.c size, it is intentionally kept uncompressed and large; worry more about the binary size.

Sure, but you still have to download that uncompressed file before you can use the parser. And a large parser.c will nevertheless also result in a larger binary.

I'm fine with us moderately increasing the parser size while working on a more complete grammar, but in my experience the way to reduce parser size is to structure the grammar differently, so the more we increase the parser size, the more work we have to redo to reduce it again. This guide gives a pretty good indication of where the large size comes from.

Contributor Author

I tried many different tricks to shrink parser.c while still supporting all the features in this branch, and only shaved off about a megabyte, which doesn't really help when it's already 50 MB+. If we start to accept compromises, like "treat all numeric types as equal (int32=int64)", then we get the size down, but at the cost of quality. What would actually cut the size by around 40% is a change to the parser file structure on the tree-sitter side, like tree-sitter/tree-sitter#5488, but that's not a quick win.

Member

I see that nvim-treesitter made it a requirement that downstream users have the tree-sitter cli installed, so we could remove the parser.c from the repo, which I think is worthwhile.

There is definitely some structural change we could make to the grammar, which would bring us further from the language spec but might make it a better tree-sitter parser. Not sure about int32=int64, but it might be something. The grammar already does things like not differentiating between expressions and expressions inside computation expressions, since that distinction leads to a blowup in parser size for the very small gain of not being able to write let! in a normal expression block.

So if you find any construct where you want to merge them with the trade-off of a loss of accuracy wrt. the language spec I think it is worthwhile to experiment with.

Contributor Author

I pushed to my fork if you want to check, but as I said, these are not massive wins, and the question is the possible drawbacks. It seems the large size comes from symbol_count × state_count, so I tried to reduce those.
I tried removing _module_expression (one commit after this branch):
https://github.com/Thorium/tree-sitter-fsharp/tree/remove-module-expression
And then I tried a few other things (4 commits to this branch):
https://github.com/Thorium/tree-sitter-fsharp/tree/misc-testings
These were experiments; they seemed to work, but the wins were not big enough to justify PRs.
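
The symbol_count × state_count observation can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a dense table with one fixed-size entry per (state, symbol) pair; the function name and the entry size are illustrative assumptions, generated parsers may store part of the table in a compressed form, and parser.c is text, so this illustrates the scaling and is not a measurement of the file.

```c
#include <assert.h>
#include <stdint.h>

/* Rough estimate only: bytes taken by a dense parse table that stores one
 * fixed-size entry per (state, symbol) pair. The counts passed in are
 * whatever the generated parser reports; none are taken from this PR. */
uint64_t dense_table_bytes(uint32_t state_count, uint32_t symbol_count,
                           uint32_t bytes_per_entry) {
  assert(bytes_per_entry > 0);
  return (uint64_t)state_count * symbol_count * bytes_per_entry;
}
```

Because the estimate is a product, removing a rule helps twice when it eliminates both symbols and the states that only existed to parse them, which is consistent with trying to reduce both counts.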

Contributor Author

Thorium commented Mar 30, 2026

Sample files with at least one error before this branch:
1750/5317

Sample files with at least one error on this branch:
1286/5317

That still sounds like a lot, but remember, we are not measuring parsed lines:
we are measuring files with zero parsing errors, and the success rate is over 75%.
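
For reference, the per-file success rate follows directly from the two counts. The helper name clean_file_percent is just for illustration.

```c
#include <assert.h>

/* Percentage of sample files that parse with zero errors. */
double clean_file_percent(int files_with_errors, int total_files) {
  assert(total_files > 0 && files_with_errors >= 0 &&
         files_with_errors <= total_files);
  return 100.0 * (double)(total_files - files_with_errors) /
         (double)total_files;
}
```

With the numbers above, 1750/5317 failing files corresponds to about 67.1% clean files, and 1286/5317 to about 75.8%.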

@Thorium Thorium force-pushed the missing-features-added branch from 020ac3a to 4760499 Compare March 31, 2026 17:43
Contributor Author

Thorium commented Apr 1, 2026

I tried to continue, but it just got worse. I need a faster (parallel) way to evaluate results and a better understanding of why parser.c grows before I can continue. I think this PR is now "ready".

Thorium and others added 9 commits April 7, 2026 11:06
- Scanner: add * to is_infix_op_start() so multiplication on
  continuation lines is recognized as an infix operator
- Grammar: add 3-part from..step..to alternative to _slice_range_special
  for step range expressions (e.g. [0..2..10])
- Scanner: emit DEDENT/NEWLINE before returning false in the
  MULTI_DOLLAR_TRIPLE_QUOTE_START handler, fixing interpolated strings
  on dedented lines being absorbed into previous let bindings
Require 'with' keyword for standalone type_extension rule, preventing
bodyless type definitions (e.g. [<Measure>] type Cent) from being
parsed as type extensions that greedily consume following declarations.

Record and union type definitions retain support for members both with
and without the 'with' keyword via type_extension_elements.
Add _argument_type and _curried_return_type type subsets to correctly
parse member signatures. Before this fix, `string -> string * string`
in a member signature would incorrectly parse `*` as part of a tuple
argument type rather than a tuple return type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port of PR ionide#171 - adds srtp_call_expression rule for parsing SRTP
trait member invocations like (^T : (member Method : ...) arg).
Uses restricted _srtp_type_argument matching only ^-prefixed type
params to avoid conflicts with char literals.
…WLINE scanner token

Port of PR ionide#172 - adds type_declaration rule for bodyless type
definitions (e.g. [<Measure>] type Dollars). Uses a new
TYPE_DECL_NEWLINE external scanner token that fires at newline/EOF
when the next non-blank line is not more indented, disambiguating
bare declarations from types with bodies.

Note: measure_op_type was omitted as the existing measure/
measure_quotient rules already handle A/B division in type contexts.
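
The TYPE_DECL_NEWLINE condition described above ("fires at newline/EOF when the next non-blank line is not more indented") can be sketched as a lookahead over plain text. This is an illustration under assumptions: the function name type_decl_ends_here is hypothetical, indentation is counted in spaces only, and tabs, carriage returns and comments are ignored. It is not the scanner code from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lookahead: `pos` is the offset just after a type header such
 * as "type Dollars", and `decl_indent` is the column of the `type` keyword.
 * Returns true when the declaration has no body, i.e. the header is followed
 * by a newline or EOF and the next non-blank line is not more indented. */
bool type_decl_ends_here(const char *src, size_t pos, int decl_indent) {
  assert(src != NULL && decl_indent >= 0);
  if (src[pos] != '\n' && src[pos] != '\0') {
    return false; /* something else follows on the same line */
  }
  size_t i = pos;
  while (src[i] != '\0') {
    i++; /* step over the newline */
    int indent = 0;
    while (src[i] == ' ') {
      indent++;
      i++;
    }
    if (src[i] == '\0') {
      return true; /* only whitespace remained before EOF */
    }
    if (src[i] == '\n') {
      continue; /* blank line: keep looking */
    }
    return indent <= decl_indent; /* first non-blank line decides */
  }
  return true; /* EOF directly after the header */
}
```

Under these assumptions, "type Cent" followed by another top-level "type Dollars" ends at the newline, while "type T" followed by an indented member line does not.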
Scanner fixes (common/scanner.h):
- Bug 1: 'begin' keyword check no longer corrupts state for identifier 'b'
- Bug 2: '@' operator on continuation line no longer produces zero-width ERROR
- Bug 3: Trailing semicolon in array comprehension no longer breaks parsing

Grammar fixes (fsharp/grammar.js):
- Bug 4: '..' range operator now works in multi-line [| |] and [ ] arrays/lists
  by adding optional(_newline) before _comp_or_range_expression and slice_ranges
- Bug 5: [<assembly:...>] attributes followed by bare expressions like () now
  parse correctly via new _attribute_expression rule in _module_elem

All 422 tests pass with no regressions. Parser.c size unchanged (~60MB).
- Regenerated fsharp_signature parser (inherits from fsharp/grammar.js
  which was modified but signature parser was not regenerated)
- Removed unnecessary conflict entry [preproc_if, preproc_if_in_expression]
  eliminating the tree-sitter generate warning
Contributor Author

Thorium commented Apr 8, 2026

We seem to be at 1112/5317 now. Known issues:

  • Many of these are intentionally broken test files from the F# compiler's own test suite (diagnostic tests for parse errors, type errors, etc.)
  • #if/#else/#endif inside match expressions and let bindings
  • while x && y do and other do keyword ambiguities. Keywords like do, with and in are heavily overloaded in F#. For example, the in inside a query { ... } is different from the end-of-line in of non-light syntax.
  • Complex SRTP member constraints (not typical F# usage)

Contributor Author

Thorium commented Apr 14, 2026

@Nsidorenco can we get this merged? This would already help with a lot of existing issues.

@Nsidorenco
Member

Yes @Thorium. Looks great, thank you for working on this!

@Nsidorenco Nsidorenco merged commit b576ecf into ionide:main Apr 15, 2026
7 checks passed
Contributor Author

Thorium commented Apr 16, 2026

Thanks. Is it possible to get a release with a 0.1 version bump, so I can test this more easily with other tools?

@Nsidorenco
Member

Sure, a new version has been released
