Description
Version
0.0
On which installation method(s) does this occur?
No response
Describe the issue
Validation and mutation
- Silent mutation in validators. AtomicData validators fill missing fields (energy, forces, velocities, charges, masses) with zeros and coerce all floating-point dtypes to match the positions tensor. The user has no way to supply a different dtype.
- No shape validation at batch level. AtomicData validates shapes at construction time via jaxtyping annotations, but direct assignment to Batch bypasses all shape checks: overwriting positions with a tensor of the wrong dimensions is accepted silently.
- Silent uniform fallback. When a group is declared as variable-length (segmented) but no segment lengths are provided, the code silently treats it as fixed-length instead of raising an error. There is also no way to specify a custom group beyond node/edge/system (e.g., a variable-length collection of bonds to freeze).
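To illustrate what "loud" behaviour could look like for the shape and segment cases above, here is a minimal sketch in plain Python. It is not the library's API: `GroupSpec`, `check_segments`, and `CheckedBatch` are hypothetical names, and nested lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class GroupSpec:
    """Hypothetical description of an attribute group."""
    name: str
    segmented: bool  # True for variable-length groups


def check_segments(spec: GroupSpec, num_rows: int,
                   lengths: Optional[Sequence[int]]) -> None:
    """Raise instead of silently falling back to fixed-length."""
    if not spec.segmented:
        return
    if lengths is None:
        raise ValueError(
            f"group {spec.name!r} is segmented but no segment lengths were given"
        )
    if sum(lengths) != num_rows:
        raise ValueError(
            f"group {spec.name!r}: segment lengths sum to {sum(lengths)}, "
            f"expected {num_rows}"
        )


class CheckedBatch:
    """Toy container that validates row width on every assignment."""

    def __init__(self, widths: Dict[str, int]) -> None:
        # Bypass our own __setattr__ for internal bookkeeping.
        object.__setattr__(self, "_widths", dict(widths))

    def __setattr__(self, name: str, rows: List[List[float]]) -> None:
        width = self._widths.get(name)
        if width is None:
            raise AttributeError(f"unknown attribute {name!r}")
        for i, row in enumerate(rows):
            if len(row) != width:
                raise ValueError(
                    f"{name}[{i}] has {len(row)} components, expected {width}"
                )
        object.__setattr__(self, name, rows)


batch = CheckedBatch({"positions": 3})
batch.positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]  # accepted
```

With checks like these, assigning two-component rows to `positions`, or declaring a segmented group without lengths, fails immediately with a `ValueError` rather than producing a batch that is wrong downstream.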
Schema bugs and limitations
- LevelSchema `set()` bug. Reassigning an attribute to a different group leaves a stale reference in the old group, producing contradictory schema state.
- No user-definable schema contract. Users cannot define custom dtypes or shapes for attributes in known groups. Users cannot define custom groups beyond node/edge/system through AtomicData.
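The stale-reference problem comes down to keeping two views of the same mapping consistent. The sketch below shows the invariant under the assumption that the schema tracks both "attribute to group" and "group to attributes"; `GroupSchema` is a hypothetical stand-in, not the actual LevelSchema implementation.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


class GroupSchema:
    """Hypothetical schema mapping attributes to groups."""

    def __init__(self) -> None:
        self._group_of: Dict[str, str] = {}
        self._members: Dict[str, Set[str]] = defaultdict(set)

    def set(self, attribute: str, group: str) -> None:
        """Assign an attribute to a group, removing it from any old group."""
        old = self._group_of.get(attribute)
        if old is not None and old != group:
            # Without this step the old group keeps a stale reference.
            self._members[old].discard(attribute)
        self._group_of[attribute] = group
        self._members[group].add(attribute)

    def group_of(self, attribute: str) -> str:
        return self._group_of[attribute]

    def members(self, group: str) -> FrozenSet[str]:
        return frozenset(self._members.get(group, ()))


schema = GroupSchema()
schema.set("charges", "node")
schema.set("charges", "system")  # reassignment moves the attribute
```

After the reassignment, `charges` appears only under `system`, so the two views cannot contradict each other.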
Index handling
- Hard-coded index offset. Batch construction adjusts indices for exactly one hard-coded attribute (`edge_index`). The offset logic is spread across 4 locations (batch construction, select, append, zarr read), so adding a new index-bearing attribute requires modifying all four.
- Hard-coded `edge_index` transpose. The shape convention differs between AtomicData `(2, E)`, Batch internal storage `(E, 2)`, and zarr `(2, E)`. Each boundary applies its own hard-coded transpose, spread across 4 separate locations.
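One possible direction is to keep the offset and transpose logic in a single pair of helpers driven by a registry of index-bearing attributes. The following is only a sketch under that assumption: `INDEX_ATTRIBUTES`, `batch_index`, `unbatch_index`, and the other helper names are hypothetical, and nested lists stand in for tensors.

```python
from typing import Dict, List, Sequence

# Hypothetical registry: index attribute -> group whose counts give the offset.
INDEX_ATTRIBUTES: Dict[str, str] = {"edge_index": "node"}


def exclusive_cumsum(counts: Sequence[int]) -> List[int]:
    """Offsets such that sample k starts at the sum of counts[:k]."""
    offsets, total = [], 0
    for count in counts:
        offsets.append(total)
        total += count
    return offsets


def to_rows(two_by_e: Sequence[Sequence[int]]) -> List[List[int]]:
    """(2, E) convention -> (E, 2) convention."""
    return [[i, j] for i, j in zip(two_by_e[0], two_by_e[1])]


def to_columns(e_by_two: Sequence[Sequence[int]]) -> List[List[int]]:
    """(E, 2) convention -> (2, E) convention, also valid for E == 0."""
    return [[row[0] for row in e_by_two], [row[1] for row in e_by_two]]


def batch_index(samples, node_counts: Sequence[int]) -> List[List[int]]:
    """Concatenate per-sample (2, E) indices into offset (E_total, 2) rows."""
    rows: List[List[int]] = []
    for sample, offset in zip(samples, exclusive_cumsum(node_counts)):
        rows.extend([i + offset, j + offset] for i, j in to_rows(sample))
    return rows


def unbatch_index(rows, edge_counts: Sequence[int],
                  node_counts: Sequence[int]):
    """Inverse of batch_index: split rows and subtract the offsets."""
    samples, start = [], 0
    for n_edges, offset in zip(edge_counts, exclusive_cumsum(node_counts)):
        chunk = rows[start:start + n_edges]
        samples.append(to_columns([[i - offset, j - offset] for i, j in chunk]))
        start += n_edges
    return samples


def batch_registered(name: str, samples, counts_by_group: Dict[str, Sequence[int]]):
    """Look up which group's counts offset this attribute, then batch it."""
    return batch_index(samples, counts_by_group[INDEX_ATTRIBUTES[name]])
```

Under this design, supporting a new index-bearing attribute would mean adding one registry entry, and every call site (construction, select, append, storage read) would share the same offset and transpose helpers.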
Architecture
- Duplicated group membership knowledge. Which attributes belong to which group is defined in 3 separate places: AtomicData class-level key sets, LevelSchema `DEFAULT_ATTRIBUTE_MAP`, and batch.py module-level frozensets. They must be kept in sync manually.
- Global mutable state for custom attributes. `add_node_property`/`add_edge_property`/`add_system_property` mutate class-level key sets, affecting all instances. Addressed in #20 (Fix class-level key set mutation in AtomicData).
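Both architecture points can be pictured with a single attribute-to-group map from which every key set is derived, copied per instance so that custom registrations do not leak. This is a minimal sketch, not the fix applied in #20: `DEFAULT_GROUPS` and `AttributeRegistry` are hypothetical names, and the group assigned to each attribute here is illustrative only.

```python
from types import MappingProxyType
from typing import Dict, FrozenSet, Mapping

# Hypothetical read-only defaults; the groupings are illustrative assumptions.
DEFAULT_GROUPS: Mapping[str, str] = MappingProxyType({
    "positions": "node",
    "forces": "node",
    "edge_index": "edge",
    "energy": "system",
})


class AttributeRegistry:
    """Single source of truth for group membership, copied per instance."""

    def __init__(self, base: Mapping[str, str] = DEFAULT_GROUPS) -> None:
        # Each instance owns its copy, so registrations stay local.
        self._groups: Dict[str, str] = dict(base)

    def register(self, attribute: str, group: str) -> None:
        self._groups[attribute] = group

    def keys_for(self, group: str) -> FrozenSet[str]:
        """Derive the key set on demand instead of storing it separately."""
        return frozenset(a for a, g in self._groups.items() if g == group)


first = AttributeRegistry()
second = AttributeRegistry()
first.register("spin", "node")  # visible on `first` only
```

Because the key sets are computed from one map, there is nothing to keep in sync by hand, and the read-only defaults cannot be mutated through an instance.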