Skip to content

split a sub sub circuit#41

Draft
cattabiani wants to merge 23 commits into
mainfrom
katta/split_sub_sub_circuit
Draft

split a sub sub circuit#41
cattabiani wants to merge 23 commits into
mainfrom
katta/split_sub_sub_circuit

Conversation

@cattabiani
Copy link
Copy Markdown
Contributor

@cattabiani cattabiani commented May 7, 2026

Context

This is a reference to be split in multiple sub-issues. DO NOT MERGE

fix: https://github.com/openbraininstitute/prod-build-circuit/issues/14

Test

Since it is easy to lose track of the explanations in the issue I focused on the circuit in tests/unit/data/sonata/split_subcircuit and on these extractions:

    subset_B_A = {
        "subset_B_A": ["subset_B_A_popA", "subset_B_A_popB", "subset_B_A_popC"],
        "subset_B_A_popA": {"population": "A", "node_id": [0, 2, 4, 5]},
        "subset_B_A_popB": {"population": "B", "node_id": [0, 2, 4, 5]},
        "subset_B_A_popC": {"population": "C", "node_id": [0, 2, 4, 5]},
    }
    # subset_C_A picks A:{0,2}, B:{0,2}, C:{0,2} from A
    subset_C_A = {
        "subset_C_A": ["subset_C_A_popA", "subset_C_A_popB", "subset_C_A_popC"],
        "subset_C_A_popA": {"population": "A", "node_id": [2, 5]},
        "subset_C_A_popB": {"population": "B", "node_id": [0, 2, 4, 5]},
        "subset_C_A_popC": {"population": "C", "node_id": [0, 2, 4, 5]},
    }
    # subset_C_B_A picks A:{0,1}, B:{0,1}, C:{0,1} from B_A (renumbered IDs of original {0,2})
    subset_C_B_A = {
        "subset_C_B_A": ["subset_C_B_A_popA", "subset_C_B_A_popB", "subset_C_B_A_popC"],
        "subset_C_B_A_popA": {"population": "A", "node_id": [1, 3]},
        "subset_C_B_A_popB": {"population": "B", "node_id": [0, 1, 2, 3]},
        "subset_C_B_A_popC": {"population": "C", "node_id": [0, 1, 2, 3]},
    }

With the convention: B_A means subcircuit B extracted from A. I also created a nice viz tool to understanding at first glance what is going on. Here is the initial state of the art:

Screenshot 2026-05-08 at 12 38 28

Objective: the last 2 graphs should be the same (apart numbering)

@cattabiani cattabiani self-assigned this May 7, 2026
@cattabiani cattabiani changed the base branch from main to katta/fix-curate-dtype May 7, 2026 14:41
@cattabiani cattabiani force-pushed the katta/split_sub_sub_circuit branch from ff96fcc to db26599 Compare May 8, 2026 08:22
@cattabiani cattabiani changed the base branch from katta/fix-curate-dtype to main May 11, 2026 06:37
Groundwork for the split_subcircuit external merge fix:

- Add 'brainbuilder sonata visualize' CLI command for debugging circuit
  connectivity (graphviz-based graph of populations and edges)
- Add nested extraction equivalence tests (C_B_A vs C_A) that will
  drive the upcoming fix for external population merging
- Add 'viz' optional dependency (graphviz)
- Add .kiro to .gitignore

split_population.py is unchanged — tests for the external merge bug
are expected to fail until the fix is implemented.
@cattabiani cattabiani force-pushed the katta/split_sub_sub_circuit branch from 6653547 to 5676685 Compare May 11, 2026 06:41
cattabiani added 22 commits May 11, 2026 08:57
In _write_subcircuit_external, when multiple edges share the same
external source population but their wanted_src_ids have zero overlap
with the existing mapping, the max() of an empty Series returned NaN.

Fall back to existing_mapping max when is_existing is all False.
The new_nodes dict was overwritten on each edge iteration, so only the
last edge's source IDs were written to the nodes.h5 file. Use the
accumulated id_mapping index instead, which correctly merges IDs across
all edges sharing the same external source population.
External populations (created by create_external) are typed as virtual
in the circuit config but should not be re-extracted by do_virtual in
nested subcircuit extractions. Filter them out by name prefix.
…subcircuit + _write_subcircuit

Separates filtering/gathering logic from writing, allowing callers to
interject between the two steps. The do_externals flag selects whether
to process genuine virtuals or external_* populations.
…into _set_original_ids

Single function with required parent_mapping parameter (None for root).
Cleaner caller in _write_mapping using walrus operator.
…ternal_subcircuit both return (write_edge_configs, nodes_to_write)

- _filter_virtual_typed_subcircuit now builds WriteEdgeConfigs internally
- _write_subcircuit_external replaced by _gather_new_external_subcircuit
- _write_subcircuit accepts (write_edge_configs, nodes_to_write) and is
  reused by both virtual and external paths
When extracting C from B (where B was extracted from A with
create_external=True), external_A nodes from both sources (carried-over
from B_A's external_A and new from B_A's biophysical A) are now merged
into a single external_A population.

Key changes:
- _write_subcircuit accepts nodes_to_write as list of (source, ids) tuples
  per population, enabling multi-source node writes
- _gather_new_external_subcircuit stores secondary contributions separately
  (id_mapping_secondary) to avoid index collisions between different parent
  populations
- _write_mapping serializes secondary parent as parent2_id/parent2_name/
  original2_id fields (backward-compatible: only present when needed)
- _copy_filtered_edges supports appending to existing edge groups
- _orchestrate_write_subcircuit_edges defers index writing to after all
  configs are processed (handles append case)
- Equivalence test: C_B_A matches C_A in populations, node/edge counts,
  and original_ids
Since original_name is asserted to be the same for both parents,
original_id can be a single combined list. Add length invariant assert
for parent_id + parent2_id == new_id == original_id.
…elper

Tests 3 levels of nesting: D_A ≈ D_B_A ≈ D_C_A ≈ D_C_B_A
WriteEdgeConfig now accepts lists for input_path/src_edge_name/src_mapping.
_copy_filtered_edges opens input files internally and iterates over all
sources in one pass. Removes: append-detection in _copy_filtered_edges,
delete-before-recreate in _finalize_edges, deferred indexing in
_orchestrate_write_subcircuit_edges.
…StrEnum

- PopulationType enum with color property and from_population classmethod
- Remove unused id_mapping parameter and redundant src/tgt node assignments
…e_counts

_copy_filtered_edges now opens/closes the output file internally and
returns edge_count. Node counts derived from WriteEdgeConfig properties
(source_node_count, target_node_count). Simplifies callers significantly.
2-level nesting works. 3-level nesting has offset bug: edge file writes
correct sgid (offset applied to WriteEdgeConfig.src_mapping) but
id_mapping.json doesn't include the offset new_id in the combined list.
Debug shows merge loop fires correctly. Issue is in serialization path.
_gather gets a clean id_mapping and applies offset at creation time.
No post-hoc DataFrame mutations or copies needed. Merge step just
moves already-correct DataFrames into id_mapping_secondary.

Removes debug logging added during investigation.
Current approach uses id_mapping_secondary for multi-parent external
populations. A future refactor could encapsulate id_mapping as an object
that handles multi-source populations internally (auto-offset on add,
combined view on read).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant