split a sub sub circuit#41
Draft
cattabiani wants to merge 23 commits into
Draft
Conversation
ff96fcc to
db26599
Compare
Groundwork for the split_subcircuit external merge fix: - Add 'brainbuilder sonata visualize' CLI command for debugging circuit connectivity (graphviz-based graph of populations and edges) - Add nested extraction equivalence tests (C_B_A vs C_A) that will drive the upcoming fix for external population merging - Add 'viz' optional dependency (graphviz) - Add .kiro to .gitignore split_population.py is unchanged — tests for the external merge bug are expected to fail until the fix is implemented.
6653547 to
5676685
Compare
In _write_subcircuit_external, when multiple edges share the same external source population but their wanted_src_ids have zero overlap with the existing mapping, the max() of an empty Series returned NaN. Fall back to existing_mapping max when is_existing is all False.
The new_nodes dict was overwritten on each edge iteration, so only the last edge's source IDs were written to the nodes.h5 file. Use the accumulated id_mapping index instead, which correctly merges IDs across all edges sharing the same external source population.
External populations (created by create_external) are typed as virtual in the circuit config but should not be re-extracted by do_virtual in nested subcircuit extractions. Filter them out by name prefix.
…subcircuit + _write_subcircuit Separates filtering/gathering logic from writing, allowing callers to interject between the two steps. The do_externals flag selects whether to process genuine virtuals or external_* populations.
…into _set_original_ids Single function with required parent_mapping parameter (None for root). Cleaner caller in _write_mapping using walrus operator.
…ternal_subcircuit both return (write_edge_configs, nodes_to_write) - _filter_virtual_typed_subcircuit now builds WriteEdgeConfigs internally - _write_subcircuit_external replaced by _gather_new_external_subcircuit - _write_subcircuit accepts (write_edge_configs, nodes_to_write) and is reused by both virtual and external paths
When extracting C from B (where B was extracted from A with create_external=True), external_A nodes from both sources (carried-over from B_A's external_A and new from B_A's biophysical A) are now merged into a single external_A population. Key changes: - _write_subcircuit accepts nodes_to_write as list of (source, ids) tuples per population, enabling multi-source node writes - _gather_new_external_subcircuit stores secondary contributions separately (id_mapping_secondary) to avoid index collisions between different parent populations - _write_mapping serializes secondary parent as parent2_id/parent2_name/ original2_id fields (backward-compatible: only present when needed) - _copy_filtered_edges supports appending to existing edge groups - _orchestrate_write_subcircuit_edges defers index writing to after all configs are processed (handles append case) - Equivalence test: C_B_A matches C_A in populations, node/edge counts, and original_ids
Since original_name is asserted to be the same for both parents, original_id can be a single combined list. Add length invariant assert for parent_id + parent2_id == new_id == original_id.
…elper Tests 3 levels of nesting: D_A ≈ D_B_A ≈ D_C_A ≈ D_C_B_A
WriteEdgeConfig now accepts lists for input_path/src_edge_name/src_mapping. _copy_filtered_edges opens input files internally and iterates over all sources in one pass. Removes: append-detection in _copy_filtered_edges, delete-before-recreate in _finalize_edges, deferred indexing in _orchestrate_write_subcircuit_edges.
…StrEnum - PopulationType enum with color property and from_population classmethod - Remove unused id_mapping parameter and redundant src/tgt node assignments
…fusion with population names (A, B, C)
…e_counts _copy_filtered_edges now opens/closes the output file internally and returns edge_count. Node counts derived from WriteEdgeConfig properties (source_node_count, target_node_count). Simplifies callers significantly.
2-level nesting works. 3-level nesting has offset bug: edge file writes correct sgid (offset applied to WriteEdgeConfig.src_mapping) but id_mapping.json doesn't include the offset new_id in the combined list. Debug shows merge loop fires correctly. Issue is in serialization path.
_gather gets a clean id_mapping and applies offset at creation time. No post-hoc DataFrame mutations or copies needed. Merge step just moves already-correct DataFrames into id_mapping_secondary. Removes debug logging added during investigation.
Current approach uses id_mapping_secondary for multi-parent external populations. A future refactor could encapsulate id_mapping as an object that handles multi-source populations internally (auto-offset on add, combined view on read).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Context
This is a reference to be split in multiple sub-issues. DO NOT MERGE
fix: https://github.com/openbraininstitute/prod-build-circuit/issues/14
Test
Since it is easy to lose track of the explanations in the issue I focused on the circuit in
tests/unit/data/sonata/split_subcircuitand on these extractions:With the convention:
B_Ameans subcircuit B extracted from A. I also created a nice viz tool to understanding at first glance what is going on. Here is the initial state of the art:Objective: the last 2 graphs should be the same (apart numbering)