Skip to content

Decouple reference workflow from pymatgen internals#67

Merged
bjmorgan merged 10 commits intomainfrom
57-reference-workflow-decoupling
Mar 22, 2026
Merged

Decouple reference workflow from pymatgen internals#67
bjmorgan merged 10 commits intomainfrom
57-reference-workflow-decoupling

Conversation

@bjmorgan
Copy link
Copy Markdown
Owner

Summary

Removes pymatgen Structure/Lattice usage from all reference workflow internals. Structure objects are now only accepted at public API boundaries and arrays are extracted eagerly.

  • calculate_species_distances() accepts arrays + species lists instead of Structure objects. Add indices_for_species() helper replacing structure.indices_from_symbol()
  • site_index_mapping() accepts arrays instead of Structure objects
  • get_coordination_indices() uses all_mic_distances instead of structure.get_neighbors(). Computes the full centre-to-coordination distance matrix and filters by cutoff
  • IndexMapper.map_coordinating_atoms() accepts arrays instead of Structure objects
  • StructureAligner optimisation loop no longer creates temp-Structure objects per iteration. _validate_structures and _create_objective_function work with pure numpy arrays
  • CoordinationEnvironmentFinder extracts arrays in __init__ and passes them to get_coordination_indices
  • ReferenceBasedSites eagerly extracts arrays from both structures in constructor. Internal methods use stored arrays via _effective_ref_coords property

Part of the pymatgen decoupling effort (#59).

Closes #57

calculate_species_distances() now accepts arrays (frac_coords,
lattice_matrix, species lists) instead of Structure objects. Add
indices_for_species() helper replacing structure.indices_from_symbol().
StructureAligner updated to extract arrays at the boundary.

Part of #57
site_index_mapping() now accepts arrays instead of Structure objects.
Uses all_mic_distances instead of lattice.get_all_distances.

Part of #57
…ation_indices

get_coordination_indices() now accepts arrays (frac_coords, lattice_matrix,
species) instead of a Structure. Computes the full centre-to-coordination
distance matrix and filters by cutoff, replacing the per-site
get_neighbors() call. CoordinationEnvironmentFinder extracts arrays in
__init__.

Tests updated to pass arrays directly and to account for minimum-image
semantics (each atom pair yields one distance, no periodic-image
duplication).

Part of #57
map_coordinating_atoms() accepts arrays instead of Structure objects.
Uses all_mic_distances instead of lattice.get_all_distances.
ReferenceBasedSites._map_environments updated to pass arrays.

Part of #57
_create_objective_function now works with pure numpy arrays.
_validate_structures accepts species lists instead of Structure
composition. The optimisation loop no longer calls structure.copy()
or performs per-site assignment on every iteration.

Part of #57
ReferenceBasedSites now stores frac_coords, lattice_matrix, and
species lists from both structures at construction time. Internal
methods use these stored arrays instead of accessing Structure
objects directly.

Part of #57
Remove dead _translate_coords method and duplicate scipy.optimize
import from StructureAligner. Remove unnecessary .copy() on arrays
already returned as copies by pymatgen. Extract _effective_ref_coords
property to avoid repeating the aligned/reference conditional.
Fix critical issues:
- Fix wrong deprecation target: get_coordinating_indices ->
  get_coordination_indices
- Fix incorrect aligned_structure docstring: reference is translated
  to match target, None when align=False

Fix important issues:
- Add species/frac_coords length validation at public boundaries
  for get_coordination_indices, calculate_species_distances,
  site_index_mapping, and IndexMapper.map_coordinating_atoms
- Narrow except Exception to except ValueError in
  ReferenceBasedSites to avoid masking programming errors
- Add trailing newlines to tools.py and reference_based_sites.py
- Accept bare string for species filters in site_index_mapping
- Fix American spellings in coord_finder.py module docstring
- Update stale tools.py module docstring to reflect array-based API

Address suggestions:
- Add full docstring to StructureAligner.align
- Remove stale vertex_species terminology comment
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR advances the pymatgen-decoupling effort by moving the reference workflow’s internal computations (coordination finding, index mapping, and alignment objective evaluation) onto NumPy arrays + a lattice matrix, keeping pymatgen.Structure usage primarily at public API boundaries.

Changes:

  • Refactors site_analysis.tools helpers (get_coordination_indices, site_index_mapping, calculate_species_distances) to operate on (frac_coords, lattice_matrix, species) rather than Structure.
  • Removes temporary Structure construction from StructureAligner’s optimisation loop by using array-based objective evaluation.
  • Updates reference-workflow components (CoordinationEnvironmentFinder, IndexMapper, ReferenceBasedSites) and associated tests to pass/store arrays eagerly.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
site_analysis/tools.py Converts key helpers to array-based APIs and uses all_mic_distances for MIC distance matrices.
site_analysis/reference_workflow/structure_aligner.py Makes the optimisation objective and metric evaluation purely array-based; keeps Structure at boundaries.
site_analysis/reference_workflow/index_mapper.py Updates mapping logic to use array inputs + MIC distance matrices, with optional species filtering.
site_analysis/reference_workflow/coord_finder.py Extracts arrays/species in __init__ and delegates to array-based get_coordination_indices.
site_analysis/reference_workflow/reference_based_sites.py Eagerly caches arrays/species; internal mapping uses stored arrays rather than Structures.
tests/test_tools.py Updates tests to call array-based tool APIs; adds/adjusts coordination and distance tests accordingly.
tests/test_reference_workflow/test_structure_aligner.py Updates helper-distance checks and mocks for the array-based aligner internals.
tests/test_reference_workflow/test_index_mapper.py Rewrites tests for array-based IndexMapper.map_coordinating_atoms.
tests/test_reference_workflow/test_coord_finder.py Adjusts coordination tests to reflect MIC-based neighbour counting (often via supercells).
tests/test_reference_workflow/test_reference_based_sites.py Updates mocks/assertions to verify stored-array plumbing in ReferenceBasedSites.
tests/test_reference_workflow/test_reference_based_sites_integration.py Updates expectations for small-cell coordination behavior under MIC distances (no periodic-image duplication).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread site_analysis/reference_workflow/structure_aligner.py Outdated
Comment thread site_analysis/tools.py Outdated
Comment thread site_analysis/reference_workflow/index_mapper.py
Comment thread site_analysis/tools.py
Fix _validate_structures docstring (not always sorted). Update
error message in get_coordination_indices to not reference Structure.
Handle empty ref_indices in IndexMapper._find_closest_atom_mapping.
Add validation for empty species2_filter match in site_index_mapping.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_tools.py
Comment on lines 105 to 108
species1 = ['Na', 'Cl']
sites1 = [PeriodicSite(species=s, coords=c, lattice=lattice) for s, c in zip(species1, coords1)]
sites2 = [PeriodicSite(species='Na', coords=c, lattice=lattice) for c in coords2]
structure1 = Structure.from_sites(sites1)
structure2 = Structure.from_sites(sites2)
mapping = site_index_mapping(structure1, structure2, species1='Na')
species2 = ['Na', 'Na']
mapping = site_index_mapping(coords1, coords2, lattice_matrix, species1, species2, species1_filter=['Na'])
np.testing.assert_array_equal(mapping, np.array([1]))
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is named ..._species1_filter_as_string, but it passes species1_filter=['Na'] (a list). That means the code path that normalizes a string filter to a list is never exercised here, and the test name becomes misleading. Pass species1_filter='Na' (string) in this test (and keep the list variant in the _as_list test).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ca164bb. Tests now pass bare strings ('Na') to exercise the string normalisation path.

Comment thread tests/test_tools.py Outdated
structure2 = Structure.from_sites(sites2)
mapping = site_index_mapping(structure1, structure2, species2='Na', one_to_one_mapping=False)
mapping = site_index_mapping(coords1, coords2, lattice_matrix, species1, species2,
species2_filter=['Na'], one_to_one_mapping=False)
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is named ..._species2_filter_as_string, but it passes species2_filter=['Na'] (a list). As a result, the string-handling branch in site_index_mapping() isn’t covered and the test name is inaccurate. Pass species2_filter='Na' in this test and keep the list form in the _as_list test.

Suggested change
species2_filter=['Na'], one_to_one_mapping=False)
species2_filter='Na', one_to_one_mapping=False)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ca164bb.

The _as_string test variants were passing lists, not exercising
the string normalisation code path.
@bjmorgan bjmorgan merged commit 8cb80d1 into main Mar 22, 2026
6 checks passed
@bjmorgan bjmorgan deleted the 57-reference-workflow-decoupling branch March 22, 2026 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reference workflow decoupling from pymatgen

2 participants