
feat(semantic): add set of AstType that exist in AST #12226


Open · wants to merge 1 commit into base: 07-11-feat_ast_compute_max_value_that_can_be_mapped_to_asttype_

Conversation

camchenry (Member) commented Jul 11, 2025

Adds a bitset that we compute as we push nodes. Since we have to call add_node for every AST node anyway, adding a small bit of code here to insert a single bit into the set should be very cheap. This enables us to do optimizations later on in the linter to quickly check if a file has any of a specific node kind. I've added some debug assertions to check that the bounds checks are respected so that we can do an insert_unchecked call. Not sure if this is good enough or not though.
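A minimal, self-contained sketch of the idea described above. The names here (`AstType`, `AST_TYPE_MAX`, `NodeKindSet`) are stand-ins, not the real oxc types, and the hand-rolled `Vec<u64>` stands in for the `FixedBitSet` the PR actually uses; it only illustrates why setting one bit per pushed node is cheap:

```rust
/// Hypothetical stand-in for oxc's `AstType` discriminant.
#[derive(Clone, Copy)]
enum AstType {
    Program = 0,
    IfStatement = 1,
    CallExpression = 2,
}

/// Stand-in for the real `AST_TYPE_MAX` constant.
const AST_TYPE_MAX: usize = 2;

/// One bit per possible `AstType` value.
struct NodeKindSet {
    bits: Vec<u64>,
}

impl NodeKindSet {
    fn new() -> Self {
        Self { bits: vec![0; (AST_TYPE_MAX + 1).div_ceil(64)] }
    }

    /// Called once per node from `add_node`: a single OR into a word
    /// that is almost certainly in cache, so the per-node cost is tiny.
    fn insert(&mut self, ty: AstType) {
        let n = ty as usize;
        debug_assert!(n <= AST_TYPE_MAX);
        self.bits[n / 64] |= 1 << (n % 64);
    }

    /// Lets the linter quickly check whether a file contains any node
    /// of a given kind before running a rule.
    fn contains(&self, ty: AstType) -> bool {
        let n = ty as usize;
        self.bits[n / 64] & (1 << (n % 64)) != 0
    }
}
```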

@github-actions github-actions bot added A-semantic Area - Semantic C-enhancement Category - New feature or request labels Jul 11, 2025
camchenry (Member, Author) commented Jul 11, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.


codspeed-hq bot commented Jul 11, 2025

CodSpeed Instrumentation Performance Report

Merging #12226 will not alter performance

Comparing 07-11-feat_semantic_add_set_of_asttype_that_exist_in_ast (891c8d1) with 07-11-feat_ast_compute_max_value_that_can_be_mapped_to_asttype_ (ea30a17)

Summary

✅ 34 untouched benchmarks

camchenry force-pushed the 07-11-feat_semantic_add_set_of_asttype_that_exist_in_ast branch from 42a07b3 to 6a10b4d on July 11, 2025 16:27
camchenry marked this pull request as ready for review on July 11, 2025 16:28
camchenry requested a review from Dunqing as a code owner on July 11, 2025 16:28
camchenry force-pushed the 07-11-feat_ast_compute_max_value_that_can_be_mapped_to_asttype_ branch from 5d8fc44 to f14ce37 on July 11, 2025 16:29
camchenry requested a review from overlookmotel as a code owner on July 11, 2025 16:29
camchenry force-pushed the 07-11-feat_semantic_add_set_of_asttype_that_exist_in_ast branch from 6a10b4d to 74e03f9 on July 11, 2025 16:29
camchenry force-pushed the 07-11-feat_ast_compute_max_value_that_can_be_mapped_to_asttype_ branch from f14ce37 to ea30a17 on July 11, 2025 17:20
camchenry force-pushed the 07-11-feat_semantic_add_set_of_asttype_that_exist_in_ast branch from 74e03f9 to 891c8d1 on July 11, 2025 17:20
@@ -210,6 +226,12 @@ impl<'a> AstNodes<'a> {
let node_id = self.parent_ids.push(parent_node_id);
let node = AstNode::new(kind, scope_id, cfg_id, flags, node_id);
self.nodes.push(node);
debug_assert!((kind.ty() as usize) < self.node_kinds_set.len());
Contributor
The bounds check should be updated to use <= instead of < since the FixedBitSet was created with capacity AST_TYPE_MAX + 1, making AST_TYPE_MAX a valid index. Consider changing to:

debug_assert!((kind.ty() as usize) <= AST_TYPE_MAX as usize);

This ensures the assertion correctly validates that the index is within the allocated capacity.

Suggested change
debug_assert!((kind.ty() as usize) < self.node_kinds_set.len());
debug_assert!((kind.ty() as usize) <= AST_TYPE_MAX as usize);

Spotted by Diamond


overlookmotel (Contributor) commented Jul 11, 2025

I have a branch I never got around to polishing and pushing containing a fixed-size bitset (genuinely fixed size - size is const, and determined at compile time).

It's a generalization of this thing:

https://github.com/oxc-project/oxc/blob/242e7210aa37e6eef14457b28aebbc31035d3c28/tasks/ast_tools/src/schema/derives.rs

That might be useful here as it stores the data on stack instead of heap, so should be a bit cheaper, and it encapsulates the unsafe code.

Here is a cut-down version, specialized for AstType:

const USIZE_BITS: usize = usize::BITS as usize;

/// Number of `usize`s required for a bit set which can represent all [`AstType`]s.
const NUM_USIZES: usize = (AST_TYPE_MAX + 1).div_ceil(USIZE_BITS);

/// Bit set with a bit for each [`AstType`].
pub struct AstTypesBitset([usize; NUM_USIZES]);

impl AstTypesBitset {
    /// Create empty [`AstTypesBitset`] with no bits set.
    pub const fn new() -> Self {
        Self([0; NUM_USIZES])
    }

    /// Returns `true` if bit is set for provided [`AstType`].
    pub const fn has(&self, ty: AstType) -> bool {
        let (index, mask) = Self::index_and_mask(ty);
        (self.0[index] & mask) != 0
    }

    /// Set bit for provided [`AstType`].
    pub const fn set(&mut self, ty: AstType) {
        let (index, mask) = Self::index_and_mask(ty);
        self.0[index] |= mask;
    }

    /// Returns `true` if any bit is set in both `self` and `other`.
    pub fn intersects(&self, other: &Self) -> bool {
        let mut intersection = 0;
        for (&a, &b) in self.0.iter().zip(other.0.iter()) {
            intersection |= a & b;
        }
        intersection != 0
    }

    /// Get index and mask for an [`AstType`].
    /// Returned `index` is guaranteed not to be out of bounds of the array.
    const fn index_and_mask(ty: AstType) -> (usize, usize) {
        let n = ty as usize;
        let index = n / USIZE_BITS;
        let mask = 1usize << (n % USIZE_BITS);
        (index, mask)
    }
}

Note: The unsafe code may not even be necessary. Maybe compiler can prove from its knowledge of AstType's variants that the index operations cannot be out of bounds, so self.0[index] produces same code as self.0.get_unchecked(index). You could check that in Godbolt.
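To make the linter use case concrete, here is a condensed, self-contained variant of the bitset above with `intersects` driving a "should this rule run at all?" check. The `AstType` enum, `AST_TYPE_MAX`, and the `rule_targets`/`should_run_rule` helpers are hypothetical stand-ins for illustration, not oxc APIs:

```rust
// Hypothetical stand-in for oxc's `AstType`; variants chosen arbitrarily.
#[derive(Clone, Copy)]
enum AstType {
    IfStatement = 0,
    CallExpression = 1,
    Class = 2,
}

const AST_TYPE_MAX: usize = 2;
const USIZE_BITS: usize = usize::BITS as usize;
const NUM_USIZES: usize = (AST_TYPE_MAX + 1).div_ceil(USIZE_BITS);

struct AstTypesBitset([usize; NUM_USIZES]);

impl AstTypesBitset {
    const fn new() -> Self {
        Self([0; NUM_USIZES])
    }

    fn set(&mut self, ty: AstType) {
        let n = ty as usize;
        self.0[n / USIZE_BITS] |= 1 << (n % USIZE_BITS);
    }

    fn intersects(&self, other: &Self) -> bool {
        self.0.iter().zip(other.0.iter()).any(|(&a, &b)| a & b != 0)
    }
}

// A lint rule could declare the node kinds it cares about once...
fn rule_targets() -> AstTypesBitset {
    let mut targets = AstTypesBitset::new();
    targets.set(AstType::Class);
    targets
}

// ...and the linter skips the rule entirely when the file's observed
// node kinds have no overlap with the rule's targets.
fn should_run_rule(file_kinds: &AstTypesBitset, targets: &AstTypesBitset) -> bool {
    file_kinds.intersects(targets)
}
```

One word-wise AND pass over a handful of `usize`s is far cheaper than walking the AST to discover a rule has nothing to do.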

overlookmotel (Contributor) commented Jul 11, 2025

Yes, it looks like the compiler is smart enough to remove the bounds checks. No unsafe needed. https://godbolt.org/z/EjPnEde3G

I've edited the code above to remove the unsafe.
