🐛[BUG]: Data layer gaps and limitations #21

@zubatyuk

Version

0.0

On which installation method(s) does this occur?

No response

Describe the issue

Validation and mutation

  1. Silent mutation in validators. AtomicData validators fill missing fields (energy, forces, velocities, charges, masses) with zeros and coerce all floating-point dtypes to match the positions tensor. Users have no way to provide a different dtype.

  2. No shape validation at batch level. AtomicData validates shapes at construction time via jaxtyping annotations, but direct assignment to Batch bypasses all shape checks: overwriting positions with a tensor of the wrong dimensions is accepted silently.

  3. Silent uniform fallback. When a group is declared as variable-length (segmented) but no segment lengths are provided, the code silently treats it as fixed-length instead of raising an error. There is also no way to specify a custom group beyond node/edge/system (e.g., a variable-length collection of bonds to freeze).
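The behavior requested in items 2 and 3 could look roughly like the sketch below. It is not the library's code: `CheckedBatch`, `TRAILING_SHAPES`, and `require_segment_lengths` are hypothetical names, and any object with a `.shape` attribute stands in for a tensor so the example runs without dependencies.

```python
from types import SimpleNamespace

# Hypothetical spec table: attribute name -> required trailing shape.
# The leading dimension (number of atoms, edges, ...) is left free.
TRAILING_SHAPES = {"positions": (3,), "forces": (3,)}


class CheckedBatch:
    """Illustrative stand-in for Batch; not the library's class."""

    def __setattr__(self, name, value):
        expected = TRAILING_SHAPES.get(name)
        if expected is not None:
            # Compare everything after the leading dimension.
            actual = tuple(value.shape)[1:]
            if actual != expected:
                raise ValueError(
                    f"{name}: expected trailing shape {expected}, got {actual}"
                )
        object.__setattr__(self, name, value)


def require_segment_lengths(group, segmented, lengths):
    """Fail loudly instead of falling back to fixed-length handling."""
    if segmented and lengths is None:
        raise ValueError(
            f"group {group!r} is segmented but has no segment lengths"
        )
    return lengths


batch = CheckedBatch()
batch.positions = SimpleNamespace(shape=(5, 3))  # accepted
try:
    batch.positions = SimpleNamespace(shape=(5, 2))  # wrong last dimension
except ValueError as err:
    print(err)
```

Checking in `__setattr__` keeps the rule in one place, so assignment after construction goes through the same validation as construction itself.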

Schema bugs and limitations

  1. LevelSchema set() bug. Reassigning an attribute to a different group leaves a stale reference in the old group, producing contradictory schema state.

  2. No user-definable schema contract. Users cannot define custom dtypes or shapes for attributes in known groups. Nor can they define custom groups beyond node/edge/system through AtomicData.
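One way to avoid the stale reference described in item 1 is to track each attribute's current group and drop it from the old group on reassignment. The sketch below uses a hypothetical `GroupSchema` class as a stand-in; it does not reproduce the actual LevelSchema implementation, and the custom "bonds" group is only an illustration of item 2.

```python
class GroupSchema:
    """Illustrative stand-in for LevelSchema; names are assumptions."""

    def __init__(self, groups):
        self._members = {group: set() for group in groups}
        self._group_of = {}  # attribute name -> current group

    def set(self, attr, group):
        if group not in self._members:
            raise KeyError(f"unknown group {group!r}")
        old = self._group_of.get(attr)
        if old is not None and old != group:
            # Remove the stale entry before recording the new group.
            self._members[old].discard(attr)
        self._members[group].add(attr)
        self._group_of[attr] = group

    def members(self, group):
        return frozenset(self._members[group])


# Groups are passed in, so a user-defined group such as "bonds" is allowed.
schema = GroupSchema(["node", "edge", "system", "bonds"])
schema.set("charges", "node")
schema.set("charges", "system")  # reassignment
print(sorted(schema.members("node")), sorted(schema.members("system")))
```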

Index handling

  1. Hard-coded index offset. Batch construction adjusts indices for exactly one hard-coded attribute (edge_index). The offset logic is spread across four locations (batch construction, select, append, zarr read). Adding a new index-bearing attribute requires modifying all four.

  2. Hard-coded edge_index transpose. The shape convention differs between AtomicData (2, E), Batch internal storage (E, 2), and zarr (2, E). The conversions between these layouts are hard-coded as transposes in four separate locations.
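A possible direction for item 1 is a single registry of index-bearing attributes, consulted by one offset routine. The sketch below is an assumption about how that might be structured, using plain lists of index pairs in (E, 2) layout; `INDEX_ATTRIBUTES`, `batch_offsets`, `collate_indices`, and the `frozen_bonds` attribute are hypothetical.

```python
from itertools import accumulate

# Hypothetical registry of attributes whose values are node indices.
# Registering a new index-bearing attribute means adding one name here.
INDEX_ATTRIBUTES = ("edge_index", "frozen_bonds")


def batch_offsets(node_counts):
    """Starting node index of each sample in the concatenated batch."""
    return [0, *accumulate(node_counts)][:-1]


def collate_indices(samples, node_counts):
    """Concatenate index attributes, shifting each sample by its offset.

    samples: list of dicts mapping attribute name -> list of index pairs.
    node_counts: number of nodes in each sample, in the same order.
    """
    merged = {name: [] for name in INDEX_ATTRIBUTES}
    for sample, offset in zip(samples, batch_offsets(node_counts)):
        for name in INDEX_ATTRIBUTES:
            for pair in sample.get(name, []):
                merged[name].append(tuple(i + offset for i in pair))
    return merged


samples = [
    {"edge_index": [(0, 1)]},
    {"edge_index": [(0, 1), (1, 2)], "frozen_bonds": [(0, 2)]},
]
merged = collate_indices(samples, node_counts=[2, 3])
print(merged)
```

If select, append, and the zarr reader all went through the same registry and offset helper, adding an attribute would no longer require touching each of them separately.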

Architecture

  1. Duplicated group membership knowledge. Which attributes belong to which group is defined in three separate places: AtomicData class-level key sets, LevelSchema DEFAULT_ATTRIBUTE_MAP, and batch.py module-level frozensets. They must be kept in sync manually.

  2. Global mutable state for custom attributes. add_node_property/add_edge_property/add_system_property mutate class-level key sets, affecting all instances. Addressed in #20 (Fix class-level key set mutation in AtomicData).
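Both points could be addressed by deriving every key set from one read-only map and keeping user additions on a per-instance copy. The sketch below is only an illustration: `ATTRIBUTE_GROUPS` and `AttributeRegistry` are hypothetical names, and the group assigned to each attribute is an assumption made for the example.

```python
from types import MappingProxyType

# Single source of truth (read-only). Group assignments are illustrative.
ATTRIBUTE_GROUPS = MappingProxyType({
    "positions": "node",
    "forces": "node",
    "edge_index": "edge",
    "energy": "system",
})


class AttributeRegistry:
    """Per-instance view of group membership; defaults are never mutated."""

    def __init__(self, base=ATTRIBUTE_GROUPS):
        self._groups = dict(base)  # private copy for this instance

    def add(self, name, group):
        self._groups[name] = group

    def keys_for(self, group):
        # Key sets are derived on demand instead of being stored separately.
        return frozenset(n for n, g in self._groups.items() if g == group)


first = AttributeRegistry()
second = AttributeRegistry()
first.add("spin", "node")  # custom attribute on one registry only
print(sorted(first.keys_for("node")), sorted(second.keys_for("node")))
```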

Minimum reproducible example

Relevant log output

Environment details
