Store only immediate descendants; compute transitive on demand#753
Draft
Store only immediate descendants; compute transitive on demand#753
Conversation
e7d7900 to
b451fa7
Compare
Each Namespace previously stored its full transitive descendant set, populated during ancestor linearization by walking every class's ancestor chain and inserting into each ancestor's descendants set. Heavy mixin ancestors like Object, Kernel, Module, and ActiveSupport::Concern accumulated tens of thousands of entries each. This change stores only immediate children (direct superclass, include, prepend, or — for singleton classes — extend). Transitive descendants are computed on demand via a BFS iterator over the direct-child DAG with visited-set dedup. External API behavior is preserved: the MCP get_descendants tool and the FFI rdx_declaration_descendants iterator return the same sets as before. Impact (measured on an internal Ruby monorepo, 1.47M declarations / 1.63M definitions): Max RSS: 5568.58 MB → 4884.39 MB (-684 MB, -12.3%) Total index: 53.05s → 42.26s (-20.3%) Resolution: 43.01s → 30.71s (-28.6%) Resolution time drops because propagate_descendants — which wrote to every ancestor on every recursion — is gone. The write cost is now O(1) per class instead of O(ancestor_chain_depth).
b451fa7 to
ae7a582
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces per-namespace transitive-descendant storage with immediate-only edges. Transitive descendants are now computed on demand via a BFS iterator over the direct-child DAG.
Previously, each
Namespacestored its full transitive descendant set, populated during ancestor linearization by walking every class's ancestor chain and inserting into each ancestor's set. Heavy mixin ancestors (Object,Kernel,Module,ActiveSupport::Concern, etc.) accumulated tens of thousands of descendant entries each.After this change, each namespace stores only its direct children (via superclass, include, prepend, or — for singleton classes — extend). A new
Graph::transitive_descendants(root)iterator does a BFS with visited-set dedup to recover the transitive set on demand.External API behavior is preserved: the MCP
get_descendantstool and the FFIrdx_declaration_descendantsiterator return the same sets as before (root included first, matching prior stored-set semantics).Performance
Three runs per side on an internal Ruby monorepo (1.47M declarations, 1.63M definitions). Release build, same machine, back-to-back runs, stable power.
Timing: robust improvement. Every branch run beat every main run by ≥10s end-to-end, with ~3s spread on each side and no overlap. The resolution-phase win is the dominant effect: eliminating
propagate_descendants— which wrote to every ancestor on every recursion — takes the edge-write cost fromO(ancestor_chain_depth)per class toO(1).Memory: inconclusive from benchmarks. RSS varies by ~700 MB between back-to-back runs on either side (OS / file-system cache effects), which swamps any delta a 3-sample comparison could resolve. The theoretical storage reduction is clear — previously
O(N × avg_ancestor_chain_depth)descendant entries, nowO(N × avg_direct_parents)— but quantifying it would need heap profiling or many more samples under controlled conditions.Declaration and definition counts are identical across sides (no semantic drift).
Design notes
linearize_parent_class,linearize_mixins, and the singleton branch oflinearize_parent_ancestors. Each call site invokes a smallResolver::record_immediate_descendant(parent_id, child_id)helper right before walking into the parent/mixin. The helper is idempotent and skips self-edges (handles cyclic self-inclusion likemodule A; include A; end).linearize_ancestorsis unchanged in shape — it does not take a child parameter. Recording at the call sites keeps the edge responsibility with the relationship being established, and naturally avoids recording edges during read-only traversals likesearch_ancestors.Ancestor::Partialwithout callinglinearize_ancestors, so no edge is recorded against unknown parents — matches prior behavior.DeclarationIds instead of cloning the fullVec<Ancestor>.