Decouple reference workflow from pymatgen internals#67
Conversation
calculate_species_distances() now accepts arrays (frac_coords, lattice_matrix, species lists) instead of Structure objects. Add indices_for_species() helper replacing structure.indices_from_symbol(). StructureAligner updated to extract arrays at the boundary. Part of #57
site_index_mapping() now accepts arrays instead of Structure objects. Uses all_mic_distances instead of lattice.get_all_distances. Part of #57
…ation_indices get_coordination_indices() now accepts arrays (frac_coords, lattice_matrix, species) instead of a Structure. Computes the full centre-to-coordination distance matrix and filters by cutoff, replacing the per-site get_neighbors() call. CoordinationEnvironmentFinder extracts arrays in __init__. Tests updated to pass arrays directly and to account for minimum-image semantics (each atom pair yields one distance, no periodic-image duplication). Part of #57
map_coordinating_atoms() accepts arrays instead of Structure objects. Uses all_mic_distances instead of lattice.get_all_distances. ReferenceBasedSites._map_environments updated to pass arrays. Part of #57
_create_objective_function now works with pure numpy arrays. _validate_structures accepts species lists instead of Structure composition. The optimisation loop no longer calls structure.copy() or performs per-site assignment on every iteration. Part of #57
ReferenceBasedSites now stores frac_coords, lattice_matrix, and species lists from both structures at construction time. Internal methods use these stored arrays instead of accessing Structure objects directly. Part of #57
Remove dead _translate_coords method and duplicate scipy.optimize import from StructureAligner. Remove unnecessary .copy() on arrays already returned as copies by pymatgen. Extract _effective_ref_coords property to avoid repeating the aligned/reference conditional.
Fix critical issues: - Fix wrong deprecation target: get_coordinating_indices -> get_coordination_indices - Fix incorrect aligned_structure docstring: reference is translated to match target, None when align=False Fix important issues: - Add species/frac_coords length validation at public boundaries for get_coordination_indices, calculate_species_distances, site_index_mapping, and IndexMapper.map_coordinating_atoms - Narrow except Exception to except ValueError in ReferenceBasedSites to avoid masking programming errors - Add trailing newlines to tools.py and reference_based_sites.py - Accept bare string for species filters in site_index_mapping - Fix American spellings in coord_finder.py module docstring - Update stale tools.py module docstring to reflect array-based API Address suggestions: - Add full docstring to StructureAligner.align - Remove stale vertex_species terminology comment
There was a problem hiding this comment.
Pull request overview
This PR advances the pymatgen-decoupling effort by moving the reference workflow’s internal computations (coordination finding, index mapping, and alignment objective evaluation) onto NumPy arrays + a lattice matrix, keeping pymatgen.Structure usage primarily at public API boundaries.
Changes:
- Refactors
site_analysis.toolshelpers (get_coordination_indices,site_index_mapping,calculate_species_distances) to operate on(frac_coords, lattice_matrix, species)rather thanStructure. - Removes temporary
Structureconstruction fromStructureAligner’s optimisation loop by using array-based objective evaluation. - Updates reference-workflow components (
CoordinationEnvironmentFinder,IndexMapper,ReferenceBasedSites) and associated tests to pass/store arrays eagerly.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
site_analysis/tools.py |
Converts key helpers to array-based APIs and uses all_mic_distances for MIC distance matrices. |
site_analysis/reference_workflow/structure_aligner.py |
Makes the optimisation objective and metric evaluation purely array-based; keeps Structure at boundaries. |
site_analysis/reference_workflow/index_mapper.py |
Updates mapping logic to use array inputs + MIC distance matrices, with optional species filtering. |
site_analysis/reference_workflow/coord_finder.py |
Extracts arrays/species in __init__ and delegates to array-based get_coordination_indices. |
site_analysis/reference_workflow/reference_based_sites.py |
Eagerly caches arrays/species; internal mapping uses stored arrays rather than Structures. |
tests/test_tools.py |
Updates tests to call array-based tool APIs; adds/adjusts coordination and distance tests accordingly. |
tests/test_reference_workflow/test_structure_aligner.py |
Updates helper-distance checks and mocks for the array-based aligner internals. |
tests/test_reference_workflow/test_index_mapper.py |
Rewrites tests for array-based IndexMapper.map_coordinating_atoms. |
tests/test_reference_workflow/test_coord_finder.py |
Adjusts coordination tests to reflect MIC-based neighbour counting (often via supercells). |
tests/test_reference_workflow/test_reference_based_sites.py |
Updates mocks/assertions to verify stored-array plumbing in ReferenceBasedSites. |
tests/test_reference_workflow/test_reference_based_sites_integration.py |
Updates expectations for small-cell coordination behavior under MIC distances (no periodic-image duplication). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Fix _validate_structures docstring (not always sorted). Update error message in get_coordination_indices to not reference Structure. Handle empty ref_indices in IndexMapper._find_closest_atom_mapping. Add validation for empty species2_filter match in site_index_mapping.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| species1 = ['Na', 'Cl'] | ||
| sites1 = [PeriodicSite(species=s, coords=c, lattice=lattice) for s, c in zip(species1, coords1)] | ||
| sites2 = [PeriodicSite(species='Na', coords=c, lattice=lattice) for c in coords2] | ||
| structure1 = Structure.from_sites(sites1) | ||
| structure2 = Structure.from_sites(sites2) | ||
| mapping = site_index_mapping(structure1, structure2, species1='Na') | ||
| species2 = ['Na', 'Na'] | ||
| mapping = site_index_mapping(coords1, coords2, lattice_matrix, species1, species2, species1_filter=['Na']) | ||
| np.testing.assert_array_equal(mapping, np.array([1])) |
There was a problem hiding this comment.
This test is named ..._species1_filter_as_string, but it passes species1_filter=['Na'] (a list). That means the code path that normalizes a string filter to a list is never exercised here, and the test name becomes misleading. Pass species1_filter='Na' (string) in this test (and keep the list variant in the _as_list test).
There was a problem hiding this comment.
Fixed in ca164bb. Tests now pass bare strings ('Na') to exercise the string normalisation path.
| structure2 = Structure.from_sites(sites2) | ||
| mapping = site_index_mapping(structure1, structure2, species2='Na', one_to_one_mapping=False) | ||
| mapping = site_index_mapping(coords1, coords2, lattice_matrix, species1, species2, | ||
| species2_filter=['Na'], one_to_one_mapping=False) |
There was a problem hiding this comment.
This test is named ..._species2_filter_as_string, but it passes species2_filter=['Na'] (a list). As a result, the string-handling branch in site_index_mapping() isn’t covered and the test name is inaccurate. Pass species2_filter='Na' in this test and keep the list form in the _as_list test.
| species2_filter=['Na'], one_to_one_mapping=False) | |
| species2_filter='Na', one_to_one_mapping=False) |
The _as_string test variants were passing lists, not exercising the string normalisation code path.
Summary
Removes pymatgen Structure/Lattice usage from all reference workflow internals. Structure objects are now only accepted at public API boundaries and arrays are extracted eagerly.
calculate_species_distances()accepts arrays + species lists instead of Structure objects. Addindices_for_species()helper replacingstructure.indices_from_symbol()site_index_mapping()accepts arrays instead of Structure objectsget_coordination_indices()usesall_mic_distancesinstead ofstructure.get_neighbors(). Computes the full centre-to-coordination distance matrix and filters by cutoffIndexMapper.map_coordinating_atoms()accepts arrays instead of Structure objectsStructureAligneroptimisation loop no longer creates temp-Structure objects per iteration._validate_structuresand_create_objective_functionwork with pure numpy arraysCoordinationEnvironmentFinderextracts arrays in__init__and passes them toget_coordination_indicesReferenceBasedSiteseagerly extracts arrays from both structures in constructor. Internal methods use stored arrays via_effective_ref_coordspropertyPart of the pymatgen decoupling effort (#59).
Closes #57