feat: add optional boundary datastore support #635

Open
sadamov wants to merge 24 commits into mllam:main from sadamov:feat/boundary-datastore

@sadamov sadamov commented May 11, 2026

Collaborator

Describe your changes

Add optional boundary datastore support to WeatherDataset and WeatherDataModule, enabling LAM models to ingest boundary forcing from a separate domain (e.g. ERA5 for a COSMO/DANRA interior).

  • NeuralLAMConfig accepts an optional datastore_boundary field
  • load_config_and_datastore returns a 3-tuple (config, datastore, datastore_boundary)
  • WeatherDataset.__getitem__ returns a 5-tuple (init_states, target_states, forcing, boundary, target_times) where the boundary tensor is empty (last dim 0) when no boundary datastore is configured
  • New CLI args --num_past_boundary_steps / --num_future_boundary_steps control the boundary forcing window size
  • ForecasterModule.common_step unpacks the boundary tensor but does not yet wire it into the forward pass (model-side integration is planned as a separate PR)
  • Boundary datastore can be any registered datastore type (mdp, npyfilesmeps, etc.)
  • MDPDatastore and NpyFilesDatastoreMEPS both handle boundary-only configs (forcing + static, no state variables) gracefully
  • ERA5 boundary test configs added at tests/datastore_examples/mdp/era5_1000hPa_danra_100m_winds/ (WeatherBench2 64x32 equiangular grid as boundary for DANRA interior), with init_datastore_boundary_example() fixture in conftest.py
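
To make the two window arguments concrete, here is a minimal sketch of how they could be parsed and combined into a window length. The defaults and the `+ 1` for the step at the target time are assumptions that mirror the usual past/future forcing-window convention; they are not copied from `train_model.py`, and `build_parser` / `boundary_window_length` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser holding only the two boundary-window flags."""
    parser = argparse.ArgumentParser()
    # Defaults are illustrative assumptions, not the project's real defaults.
    parser.add_argument("--num_past_boundary_steps", type=int, default=1)
    parser.add_argument("--num_future_boundary_steps", type=int, default=1)
    return parser


def boundary_window_length(num_past: int, num_future: int) -> int:
    """Past steps + the step at the target time + future steps (assumed)."""
    return num_past + 1 + num_future


args = build_parser().parse_args(
    ["--num_past_boundary_steps", "2", "--num_future_boundary_steps", "1"]
)
window = boundary_window_length(
    args.num_past_boundary_steps, args.num_future_boundary_steps
)
```

Under that assumption, two past steps and one future step give a window of four boundary time steps per target time.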

This is PR A in the boundary datastore plan outlined in #108. PR B (model-side boundary handling) and PR C (#636, boundary plotting) will follow.

Issue Link

refs #108

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Note on breaking change: WeatherDataset.__getitem__ now returns a 5-tuple instead of 4-tuple, and load_config_and_datastore returns a 3-tuple instead of 2-tuple. All callers in the repo have been updated.

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the README to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging). This applies only if you have write access to the repo, otherwise feel free to tag a maintainer to add a reviewer and assignee.

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug
    • maintenance: when your contribution relates to repo maintenance, e.g. CI/CD or documentation

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • (if the PR is not just maintenance/bugfix) the PR is assigned to the next milestone. If it is not, propose it for a future milestone.
  • author has added an entry to the changelog (and designated the change as added, changed, fixed or maintenance)
  • Once the PR is ready to be merged, squash commits and merge the PR.

refs #138

sadamov and others added 2 commits May 11, 2026 16:00
Add support for loading boundary forcing from a separate datastore,
enabling LAM models to ingest boundary conditions from a different
domain (e.g. ERA5 boundaries for a COSMO/DANRA interior).

- NeuralLAMConfig accepts optional `datastore_boundary` field
- load_config_and_datastore returns 3-tuple (config, datastore, datastore_boundary)
- WeatherDataset loads, windows, and standardizes boundary forcing
- __getitem__ returns 5-tuple (init_states, target_states, forcing, boundary, target_times)
- New CLI args --num_past_boundary_steps / --num_future_boundary_steps
- ForecasterModule.common_step unpacks boundary (not yet wired to forward)
- 4 new boundary-specific tests, all 157 tests pass

refs mllam#108

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sadamov sadamov self-assigned this May 11, 2026
@sadamov sadamov added the enhancement New feature or request label May 11, 2026
@sadamov sadamov added this to the v0.8.0 milestone May 11, 2026
sadamov added a commit to sadamov/neural-lam that referenced this pull request May 11, 2026
Pass datastore_boundary through train_model.py into ForecasterModule.
During --eval, plot_examples loads raw boundary forcing and overlays it
underneath prediction/target panels via vis.plot_prediction. Add four
boundary plotting tests using BoundaryDummyDatastore from PR mllam#635.
Update README to document boundary plotting during evaluation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sadamov and others added 9 commits May 11, 2026 17:17
Add MDP-based ERA5 boundary example at
tests/datastore_examples/mdp/era5_1000hPa_danra_100m_winds/ with
config.yaml, era5.datastore.yaml (WeatherBench2 64x32 equiangular),
and danra.datastore.yaml (DANRA 100m winds interior).

Add DATASTORES_BOUNDARY_EXAMPLES dict and
init_datastore_boundary_example() to conftest.py for use in boundary
integration tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MDPDatastore.__init__ crashed with KeyError when loading a datastore
that has only forcing+static (no state), e.g. ERA5 boundary data.
Fix is_ensemble check to guard against missing state, and
grid_shape_state to fall back to forcing/static categories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make _get_analysis_times fall back to forcing file patterns when no
state files exist, guard get_dataarray("state") against empty var_names,
and prevent empty feature list from matching state loading path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Guard against missing state/static feature keys in the zarr, not just
forcing. Boundary-only datastores (e.g. ERA5) may lack state_feature
entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Return len(grid_index) directly instead of computing from
grid_shape_state, which is more robust for boundary-only datastores.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Shrink the ERA5 boundary test dataset to 2022-03-30..2022-04-12 (was
  1990-2022), add per-input lat/lon coord_ranges and enable mllam's
  convex-hull domain_cropping with include_interior_points=true. Stats
  computation drops from minutes to seconds and the cached zarr stays
  under 1 MB.
- Stack [longitude, latitude] directly into grid_index in the era5
  dim_mapping (per mllam's example.era5_cropped.yaml) so the original
  coord names survive for the convex-hull crop -- removes the need for
  any rename-preserve workaround in neural-lam.
- Generalise the MDPDatastore units loop over self.spatial_coordinates
  with sensible defaults for x/y (m) and longitude/latitude/lon/lat
  (degrees_*) so ERA5-style geographic datastores work.
- Register a pytest `slow` marker and a `--run-slow` CLI flag so the
  ERA5 boundary integration test (added in this PR) is skipped by
  default and can be run on demand.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sadamov added a commit to sadamov/neural-lam that referenced this pull request May 14, 2026
Pass datastore_boundary through train_model.py into ForecasterModule.
During --eval, plot_examples loads raw boundary forcing and overlays it
underneath prediction/target panels via vis.plot_prediction. Add four
boundary plotting tests using BoundaryDummyDatastore from PR mllam#635.
Update README to document boundary plotting during evaluation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop state metadata after init and raise KeyError on `state` lookups so
plotting/model code that accidentally queries state on a boundary fails
loudly. Real ERA5-style boundary datastores expose only forcing fields,
and the existing boundary tests (test_datasets.py) only ever access
forcing on the boundary, so making the dummy state-less brings it
closer to real boundary semantics without changing test behaviour.
sadamov added a commit to sadamov/neural-lam that referenced this pull request May 15, 2026
Pass datastore_boundary through train_model.py into ForecasterModule.
During --eval, plot_examples loads raw boundary forcing and overlays it
underneath prediction/target panels via vis.plot_prediction. Add four
boundary plotting tests using BoundaryDummyDatastore from PR mllam#635.
Update README to document boundary plotting during evaluation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sadamov added a commit to sadamov/neural-lam that referenced this pull request May 15, 2026
Asserts plot_prediction works against a boundary datastore with no state
category (forcing-only), exercising the get_xy("forcing") / get_lat_lon(
"forcing") path in vis.plot_on_axis. Pairs with the BoundaryDummyDatastore
state-less change in mllam#635.
sadamov and others added 2 commits May 18, 2026 09:22
Resolve conflicts from mllam#239 (normalize on GPU):
- weather_dataset.py: drop the CPU standardization path (state/forcing/boundary
  stats setup, _compute_std_safe, in-__getitem__ scaling, and the
  standardize= plumbing in WeatherDataModule); keep the boundary-datastore
  feature and the 5-tuple sample (init, target, forcing, boundary, times).
- models/module.py: on_after_batch_transfer now unpacks/returns the 5-tuple,
  standardizing state+forcing on-device and passing boundary through unchanged
  (boundary is not yet consumed by the forecaster on this branch).
- tests/test_datasets.py: drop the dataset-standardization tests mllam#239 removed
  (incl. boundary standardization, now a GPU concern); keep the structural
  boundary tests without the removed standardize= kwarg.
- tests/test_gpu_normalization.py: feed/expect the 5-tuple batch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@joeloskarsson joeloskarsson left a comment

Collaborator

It's so great that you started the work on this 😄 I had a look through everything except the tests now. Had one major comment about the interior-boundary data alignment in the dataset, otherwise mostly small things.

Comment thread neural_lam/datastore/mdp.py Outdated
Comment on lines +481 to +482
else:
raise ValueError("Dataset has no state, forcing, or static data")

Collaborator

Is this really possible now? Then we would have a fully empty datastore. I feel like we should raise an error well before this point (in the constructor).

Collaborator Author

Fair point, fully empty was never meant to be valid. Added a check in __init__ that requires state or forcing, and simplified grid_shape_state to just pick between those two (dropped the static fallback and the defensive else). Fixed in cf323a8.
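
A minimal sketch of that check, assuming the available data categories are known as a set of strings. `pick_grid_category` is a hypothetical stand-in that merges the constructor validation and the simplified `grid_shape_state` pick into one function; the real `MDPDatastore` inspects its dataset instead.

```python
def pick_grid_category(categories: set[str]) -> str:
    """Return the category used to derive the grid shape.

    Hypothetical helper: prefers state, falls back to forcing, and rejects
    datastores that have neither (static-only or fully empty).
    """
    if "state" in categories:
        return "state"
    if "forcing" in categories:
        return "forcing"
    # Fail early rather than deep inside a property access.
    raise ValueError(
        "Datastore must contain state or forcing data, got: "
        f"{sorted(categories)}"
    )
```

A boundary-only store such as `{"forcing", "static"}` resolves to `"forcing"`, while `{"static"}` alone raises immediately.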

Comment thread neural_lam/models/module.py Outdated
Comment on lines +246 to +247
normalized here so the work runs on the accelerator. The boundary
forcing is passed through unchanged.

Collaborator

Is there a good reason to not do the standardization of boundary data here? This feels confusing and inconsistent to me. Best to do all standardization for the batch in the same place.

Collaborator Author

As we discussed over lunch, I would argue that this now belongs in the model-side PR B, since we use the on_after_batch_transfer hook now and have relieved WeatherDataset of its standardization duties. But we agreed to implement it here nonetheless, so: ForecasterModule.__init__ now takes an optional datastore_boundary and registers boundary_mean/boundary_std buffers, and on_after_batch_transfer standardizes the boundary exactly like the interior forcing. With no boundary datastore the buffers stay None and the tensor passes through unchanged. Covered by the two new tests in tests/test_gpu_normalization.py. Done in 9a9af31.
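
The standardize-or-pass-through behaviour can be sketched as follows. This is an illustration only: plain Python lists stand in for the on-device tensors and registered buffers, and `standardize_boundary` is a hypothetical name, not a method of `ForecasterModule`.

```python
from typing import Optional, Sequence


def standardize_boundary(
    values: Sequence[float],
    mean: Optional[Sequence[float]],
    std: Optional[Sequence[float]],
) -> list[float]:
    """Standardize per-feature values, or pass through when stats are None."""
    if mean is None or std is None:
        # No boundary datastore configured: leave the values untouched.
        return list(values)
    return [(v - m) / s for v, m, s in zip(values, mean, std)]
```

Keeping the `None` branch in the same function as the scaling means a batch without boundary data takes the same code path as one with it.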

Comment on lines +275 to +281
(
init_states,
target_states,
forcing_features,
boundary_features,
batch_times,
) = batch

Collaborator

Suggested change (adds a NOTE after the unpack):
(
init_states,
target_states,
forcing_features,
boundary_features,
batch_times,
) = batch
# NOTE: For now we do not use the boundary features from here. This is yet
# to be implemented on the model side.

Collaborator Author

Applied with a small reword (0ec56f5). Boundary is now standardized in on_after_batch_transfer (comment #2) but still not consumed by the forward pass, so in 9a9af31 I extended the NOTE to point at the standardization and reference #108 for the model-side wiring.

# init_states: (2, N_grid, d_features)
# target_states: (ar_steps, N_grid, d_features)
# forcing: (ar_steps, N_grid, d_windowed_forcing)
# boundary: (ar_steps, N_boundary_grid, d_windowed_boundary)

Collaborator

I am wondering if it is wise to return an empty tensor here, or if it would just be better for this to be None? I suppose that with None we would need a custom (but simple) collate function for the batching.
My reasoning is that if you, for example, forget to specify a boundary datastore, many things will still work and this becomes a silent bug; potentially even the forward pass would work in some scenario. I cannot see a case where we would not want a model to explicitly behave differently depending on whether the boundary forcing is present, and None would be a good signifier of this. Otherwise everything that processes the boundary would need access to the boundary datastore to check whether it is None.
I am not sure about this, and could probably be convinced either way. But would be happy to hear your thoughts.

Collaborator Author

I went with an empty tensor to avoid a custom collate and keep __getitem__ shape-stable for code that just unpacks the 5-tuple. The silent-bug risk is real though, and I think the model wiring (PR B) is the right place to catch it: the model knows whether it expects boundary, so we can assert there that boundary.shape[-1] > 0 matches datastore_boundary is not None. Keeps the loader simple but still fails loudly when they get out of sync.
What do you think?
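
A rough sketch of the assertion proposed for PR B, under the assumption that the model can see both the boundary tensor's shape and the configured boundary datastore. `check_boundary_consistency` is a hypothetical helper and takes a plain shape tuple rather than a tensor.

```python
from typing import Optional, Sequence


def check_boundary_consistency(
    boundary_shape: Sequence[int], datastore_boundary: Optional[object]
) -> bool:
    """Return whether boundary data is present; raise if out of sync."""
    has_features = boundary_shape[-1] > 0
    has_datastore = datastore_boundary is not None
    if has_features != has_datastore:
        raise ValueError(
            f"Boundary tensor has last dim {boundary_shape[-1]} but "
            f"datastore_boundary is {'set' if has_datastore else 'None'}"
        )
    return has_features
```

The returned flag gives the model a single place to branch on, which covers part of what a `None` sentinel would offer without requiring a custom collate function.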

Comment thread neural_lam/weather_dataset.py Outdated
if self.da_boundary_forcing is not None:
da_boundary_windowed = self._slice_forcing_time(
da_forcing=self.da_boundary_forcing,
idx=sample_idx,

Collaborator

I don't think this is the correct idx for the boundary. What about when the boundary has a different time step? What about when the boundary is a forecast? I feel like we used to have a lot of logic to find this alignment, that I can't see now, which makes me fear that we dropped something important.

But this might also be a matter of the scope of this PR, if you were intending this to restrict to reanalysis boundary with the same timestep? I would though prefer to make what we merge in as similar as possible to our contribution in the paper :)

Collaborator Author

You were right, I messed up the partial port from the research branch, trying to reduce the scope of the PR.
Ported the time-based alignment from the research branch, so all four combinations work now: analysis/forecast interior crossed with analysis/forecast boundary. See 140caf5.

A few intentional deviations from the research branch:

  • No interior_subsample_step / boundary_subsample_step (buggy, orthogonal)
  • No window_time_deltas, dynamic_time_deltas, time_slice concat in the boundary tensor (didn't help, can be added later)
  • Per step, the window still centers on the target time (the boundary condition for the interior at the predicted time), but only after the launch is fixed correctly at init time. See a543ce9.
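
For readers unfamiliar with the pad lookup, the time-based windowing can be sketched with the standard library alone. This is a simplified illustration for an analysis-mode boundary with sorted times; `pad_index` and `boundary_window` are hypothetical names and do not reproduce `_window_forcing_in_time`.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def pad_index(times: list[datetime], target: datetime) -> int:
    """Index of the latest entry at or before `target` (pad lookup)."""
    idx = bisect_right(times, target) - 1
    if idx < 0:
        raise ValueError(f"No boundary time at or before {target}")
    return idx


def boundary_window(
    times: list[datetime], target: datetime, num_past: int, num_future: int
) -> list[datetime]:
    """Boundary times from `num_past` before to `num_future` after the match."""
    centre = pad_index(times, target)
    start, stop = centre - num_past, centre + num_future + 1
    if start < 0 or stop > len(times):
        raise ValueError("Window falls outside boundary time coverage")
    return times[start:stop]


# 6-hourly boundary, hourly interior with a target at 07:00.
t0 = datetime(2022, 4, 1)
boundary_times = [t0 + timedelta(hours=6 * i) for i in range(5)]
window = boundary_window(boundary_times, t0 + timedelta(hours=7), 1, 1)
```

Here the 07:00 target is matched to the 06:00 boundary step, so the window spans 00:00, 06:00 and 12:00 regardless of the interior step length.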

Comment thread tests/test_datasets.py

Collaborator

Could we also port over a version of https://github.com/joeloskarsson/neural-lam-dev/blob/research/tests/test_time_slicing.py? I remember this being very useful to figure out all the alignment between interior and boundary data.

Collaborator Author

Good idea, that file is really useful. Extended the existing tests/test_time_slicing.py rather than adding a new one. SinglePointDummyDatastore now supports forecast mode, plus a new BoundaryOnlyDummyDatastore that mirrors a real boundary store (forcing-only, state access raises KeyError).
Added in cbc1c14.

sadamov and others added 5 commits May 28, 2026 05:03
…pe_state

Previously a datastore with no state, forcing or static would silently
reach `grid_shape_state` and fail with a confusing fallback error. Now
the constructor raises immediately if neither state nor forcing is
present, and the fallback in `grid_shape_state` collapses to a simple
`state if present else forcing` pick.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Boundary features are unpacked from the batch but the forecaster
forward pass does not consume them yet; the model-side wiring lands
in a follow-up PR (mllam#108).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…is/forecast modes

Replace integer-idx boundary slicing with time-based nearest-neighbor
(pad) lookup so the boundary datastore can have a different step length
than the interior, and either side may be in analysis or forecast mode.

- Add `get_time_step`, `check_time_overlap`, `crop_time_if_needed`
  helpers in `neural_lam.utils` (ported from the research branch in
  joeloskarsson/neural-lam-dev, with an extra guard against silent
  argmax-on-all-false cropping).
- Refactor `WeatherDataset`: precompute the within-sample state step
  and any forecast lead-time step in __init__; run
  `crop_time_if_needed` + `check_time_overlap` against the boundary so
  the first/last samples never fall outside boundary coverage; replace
  `_slice_forcing_time` with `_window_forcing_in_time` for time-aligned
  windowing of cross-datastore boundary; preserve the original integer-
  idx fast path as `_window_same_forecast_by_idx` for same-datastore
  forecast forcing (npyfilesmeps has non-unique analysis_time so the
  pandas pad-lookup cannot be used there).
- Window alignment matches the existing forcing convention (target
  time, i.e. `state_times[init_steps + i]`).
- Split `test_boundary_dataset_length_unchanged` into a no-crop and a
  cropping case to document the new behaviour.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
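
The cropping and overlap check described in the commit above can be illustrated roughly as follows. This sketch assumes a regularly spaced, sorted boundary time axis and uses a conservative bound; `crop_to_boundary_coverage` is a hypothetical function and does not reproduce the signatures of `crop_time_if_needed` or `check_time_overlap`.

```python
from datetime import datetime, timedelta


def crop_to_boundary_coverage(
    interior_times: list[datetime],
    boundary_times: list[datetime],
    boundary_step: timedelta,
    num_past: int,
    num_future: int,
) -> list[datetime]:
    """Keep interior times whose full boundary window is covered."""
    # Earliest/latest target times that still leave room for the window.
    earliest = boundary_times[0] + num_past * boundary_step
    latest = boundary_times[-1] - num_future * boundary_step
    kept = [t for t in interior_times if earliest <= t <= latest]
    if not kept:
        # Fail loudly instead of silently returning an empty crop.
        raise ValueError("Interior and boundary times do not overlap enough")
    return kept


t0 = datetime(2022, 4, 1)
interior = [t0 + timedelta(hours=h) for h in range(24)]
boundary = [t0 + timedelta(hours=6 * i) for i in range(5)]
kept = crop_to_boundary_coverage(interior, boundary, timedelta(hours=6), 1, 1)
```

With one past and one future boundary step, the hourly interior is cropped to 06:00 through 18:00, and a boundary that does not overlap at all raises instead of producing an empty dataset.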
Port a slimmed-down version of the alignment tests from
joeloskarsson/neural-lam-dev to exercise the new time-based boundary
windowing in WeatherDataset.

- Extend `SinglePointDummyDatastore` with forecast-mode support.
- Add a boundary-only variant whose state lookup raises KeyError, to
  catch any path that accidentally asks the boundary for state.
- `test_time_slicing_boundary_analysis`: parametrised over past/future
  window sizes, asserts exact boundary window values around each
  target state time.
- `test_boundary_step_length_mismatch_supported`: 1h interior with a
  6h boundary, verifies the pad-matched lookup.
- `test_forecast_interior_with_analysis_boundary` and
  `test_analysis_interior_with_forecast_boundary`: the two mixed
  analysis/forecast combinations.
- `test_check_time_overlap_insufficient_raises`: surface the cropping
  failure path with a clear error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sadamov and others added 2 commits May 28, 2026 17:08
Boundary forcing was previously passed through unchanged in
`on_after_batch_transfer`, leaving the only normalization step on a
separate code path and inconsistent with how interior state/forcing are
handled. Wire it through the same on-device hook.

- `ForecasterModule.__init__` takes a new optional `datastore_boundary`
  arg (excluded from `save_hyperparameters` alongside `datastore` and
  `forecaster`, so it must be passed at `load_from_checkpoint` time).
- Register `boundary_mean` / `boundary_std` from
  `datastore_boundary.get_standardization_dataarray("forcing")` when a
  boundary datastore is provided; otherwise leave both as None.
- `on_after_batch_transfer` standardizes the boundary tensor the same
  way it standardizes forcing: feature-major `(feature, window)`
  per-feature stats are tiled once on the first batch and cached.
- Update the NOTE in `common_step` to reflect that boundary is now
  standardized but still not consumed by the forecaster (mllam#108).
- Pass `datastore_boundary` from `train_model.main` to the module.
- New tests: `test_boundary_standardized_when_datastore_provided` and
  `test_boundary_passthrough_when_no_boundary_datastore`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The forecast-boundary path selected its analysis_time by pad-matching
the first target time, which can pick a boundary forecast launched after
model init - unavailable operationally. Anchor on the model init time
(state_times[init_steps - 1], strictly before) instead, matching the
research branch, and assert the per-step boundary valid time never runs
ahead of the target.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
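
To illustrate the init-time anchoring from the commit above, here is a small sketch with hypothetical helpers `select_launch` and `select_valid_time`. It assumes sorted launch times and a regular lead-time step, and it treats a forecast launched exactly at init time as available; whether the real code allows that case is an assumption here.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def select_launch(launch_times: list[datetime], init_time: datetime) -> datetime:
    """Latest boundary forecast launched at or before the model init time."""
    idx = bisect_right(launch_times, init_time) - 1
    if idx < 0:
        raise ValueError(f"No boundary forecast launched by {init_time}")
    return launch_times[idx]


def select_valid_time(
    launch: datetime, lead_step: timedelta, target: datetime
) -> datetime:
    """Latest forecast valid time at or before `target` (pad-matched lead)."""
    n_steps = (target - launch) // lead_step
    valid = launch + n_steps * lead_step
    # The boundary valid time must never run ahead of the target.
    assert valid <= target
    return valid


t0 = datetime(2022, 4, 1)
launches = [t0 + timedelta(hours=6 * i) for i in range(3)]
init_time = t0 + timedelta(hours=5)      # last initial state
first_target = t0 + timedelta(hours=6)   # first predicted step
launch = select_launch(launches, init_time)
```

In this example, pad-matching on the first target (06:00) would pick the 06:00 launch, which starts after model init at 05:00; anchoring on the init time selects the 00:00 launch instead.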
@sadamov sadamov requested a review from joeloskarsson May 28, 2026 19:27
@observingClouds

Contributor

Hi @sadamov,
Thanks for starting and advancing this feature integration. As this is something that we at DMI need to make our operational AI model, let me and @SimonKamuk know if we can help to bring this over the finish line. We might have some comments too 😛

sadamov commented Jun 2, 2026

Collaborator Author

@observingClouds yes please! Feel free to leave your reviews. If there are major pieces missing, you should have push access to my PR-branch here and after a quick ping you can also push directly.

Resolved conflicts:
- neural_lam/datastore/mdp.py: kept the state-or-forcing-required validation
  while taking main's local _ds variable pattern (refactored upstream).
- neural_lam/train_model.py: moved --num_past_boundary_steps and
  --num_future_boundary_steps to use data_group.add_argument, matching the
  argument-groups refactor from mllam#641.
- neural_lam/weather_dataset.py: kept mllam#635's renamed
  _window_same_forecast_by_idx and shared_kwargs setup pattern; added type
  hints to match the post-mllam#631 type-hint sweep. Updated __getitem__ and
  __iter__ return types from 4-tuple to 5-tuple to reflect the new
  boundary tensor. Typed shared_kwargs as dict[str, Any] to satisfy mypy
  on the **kwargs splat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sadamov added a commit to sadamov/neural-lam that referenced this pull request Jun 7, 2026
Resolved conflicts:
- neural_lam/train_model.py: moved --num_past_boundary_steps and
  --num_future_boundary_steps to use data_group.add_argument, matching
  the argument-groups refactor from mllam#641 (same as mllam#635 resolution).
- neural_lam/weather_dataset.py: kept mllam#636's _slice_forcing_time
  signature with the extra num_past_steps/num_future_steps params,
  added type hints from main. Typed shared_kwargs as dict[str, Any] to
  satisfy mypy on the **kwargs splat. Updated _build_item_dataarrays,
  __getitem__, and __iter__ return types from 4-tuple to 5-tuple to
  reflect the new boundary tensor.
- neural_lam/vis.py: kept mllam#636's boundary_da / boundary_datastore /
  boundary_margin_degrees args on plot_on_axis, plot_prediction, and
  plot_spatial_error. crop_to_interior is preserved as a deprecated
  parameter on plot_prediction and plot_spatial_error (already handled
  with a deprecation warning earlier on this branch). Type hints from
  main's post-mllam#631 sweep are applied throughout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matches the CI invocation introduced in mllam#651 so this branch's eventual
ERA5 boundary integration test (and any other @pytest.mark.slow on this
branch) is exercised by CI, not just by local --run-slow invocations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@leifdenby

Member

@sadamov I added #652 to say how I would generalise the config. I am not saying that we need to use that config layout with this PR, but you could maybe have a look at that PR and see what you think? If you do like to use it then we could maybe consider using a structure in this PR that could remain relatively unchanged down the line.

Per the simplification in mllam#651, the custom --run-slow flag is being
dropped in favour of relying on pytest's native -m marker selection.
Remove the corresponding pytest_addoption + pytest_collection_modifyitems
from conftest and the --run-slow flag from CI on this branch too, so
this PR doesn't re-introduce conflicts when mllam#651 lands. The slow marker
registration in pyproject.toml stays, ready for use on any future
@pytest.mark.slow test on this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sadamov added a commit to sadamov/neural-lam that referenced this pull request Jun 9, 2026
… constructor only)

Pre-emptively land the public CONFIG-LAYER shape proposed in
mllam#652 on top of the boundary-datastore work in mllam#635 so
the model-side adapter (Joel's follow-up) doesn't have to break the
schema again later. The return type of `WeatherDataset.__getitem__`
stays as a tuple; the per-sample boundary tensor is dropped from the
public output entirely (it's loaded internally for the data
infrastructure that mllam#635 introduced, but not surfaced) because the
model isn't consuming it today and the multi-source consumption path
under mllam#652 will reintroduce it via a model-side `ForecastBatch` with
per-source dicts.

Config schema (neural_lam/config.py):
- Replace `datastore` + `datastore_boundary` top-level keys with a
  single `datastores: Dict[str, DatastoreSelection]` mapping. The dict
  key becomes the canonical source name used throughout the pipeline.
- `DatastoreSelection` grows optional `inputs:` and `outputs:` per-
  category variable include-lists (parsed but not honoured at runtime
  yet; a follow-up filters tensors by them once Joel's model-side
  consumption lands and decides the shape).
- Validate at config load: raise InvalidConfigError when two
  datastores declare the same variable as an output, pointing at
  mdp's `dim_mapping.name_format` or `xr.Dataset.assign_coords` on
  the existing zarr's small `{category}_feature` coord.
- `load_config_and_datastore` returns `(config, Dict[str, BaseDatastore])`
  rather than the legacy `(config, interior, boundary)` triple.

WeatherDataset (neural_lam/weather_dataset.py):
- Constructor takes `(datastores, selections, ...)` dicts directly.
  Internally resolves the single interior + optional boundary pair
  for its existing slicing/windowing logic (which is unchanged).
- Boundary loading + windowing infrastructure stays intact - the
  boundary datastore is still loaded, the boundary forcing is still
  built per sample - but the per-sample tensor is intentionally
  not surfaced in __getitem__.
- __getitem__ returns the pre-mllam#635 4-tuple
  `(init_states, target_states, forcing, target_times)` with an
  explicit TODO(mllam#652) marker at the construction site pointing
  Joel at where the model-side ForecastBatch will plug in.
- `create_dataarray_from_tensor` refactored to expose a
  `build_dataarray_from_tensor` staticmethod so model.py can build
  DataArrays without instantiating a full WeatherDataset under the
  new constructor signature.

ForecasterModule (neural_lam/models/module.py):
- `on_after_batch_transfer` reverts to the pre-mllam#635 4-tuple unpack.
- `common_step` reverts to the pre-mllam#635 4-tuple unpack.
- `plot_examples` updates `time = batch[4]` to `time = batch[3]`.
- Boundary normalisation buffers and tiled caches removed; the
  boundary datastore reference (`self.datastore_boundary`) is still
  held so that the future model-side adapter (mllam#652 follow-up) can
  re-introduce normalisation alongside the per-source dict consumption.
- `_create_dataarray_from_tensor` uses the new
  `WeatherDataset.build_dataarray_from_tensor` staticmethod.

Production call site (neural_lam/train_model.py):
- Unpacks the new `(config, datastores)` return shape.
- Uses `_resolve_datastore_roles` from weather_dataset to pick out
  the interior + boundary for the legacy ForecasterModule constructor.

Example YAMLs (tests/datastore_examples/):
- Single-source danra: `datastores: {danra: ...}`.
- danra + era5 boundary: `datastores: {interior: ..., boundary: ...}`
  with explicit `outputs: {state: }` on interior so the resolver
  knows which one is the prognostic source.

Intentionally deferred to mllam#652 follow-up:
- Surfacing boundary in the per-sample output via a model-side
  ForecastBatch with per-source dicts.
- Honouring `inputs:` / `outputs:` include-lists at runtime.
- Diagnostic outputs (parsed in schema, not yet wired through).

Refs mllam#635, refs mllam#652.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
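
The duplicate-output validation described in the commit above could look roughly like this. It is a sketch under assumptions: the selections are modelled as plain dicts (datastore name -> category -> output variable names), `ValueError` stands in for `InvalidConfigError`, and the function names and variable names such as `u100m` are illustrative only.

```python
from collections import defaultdict


def find_duplicate_outputs(
    selections: dict[str, dict[str, list[str]]]
) -> dict[str, list[str]]:
    """Map each output variable declared by several datastores to their names."""
    owners: dict[str, list[str]] = defaultdict(list)
    for source, outputs in selections.items():
        # Collapse categories so a datastore counts once per variable.
        declared = {var for variables in outputs.values() for var in variables}
        for var in sorted(declared):
            owners[var].append(source)
    return {var: names for var, names in owners.items() if len(names) > 1}


def validate_outputs(selections: dict[str, dict[str, list[str]]]) -> None:
    """Raise if two datastores declare the same output variable."""
    duplicates = find_duplicate_outputs(selections)
    if duplicates:
        raise ValueError(
            f"Variables declared as output by several datastores: {duplicates}"
        )
```

Returning the offending variables together with their datastore names makes it straightforward to build an error message that tells the user which entries to rename.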
sadamov added a commit to sadamov/neural-lam that referenced this pull request Jun 9, 2026
Replace the single `datastore:` top-level config field with a
`datastores:` mapping keyed by user-chosen names. Each entry is a
DatastoreSelection with optional per-category `inputs` / `outputs`
declarations; one datastore must declare outputs (the interior /
prognostic source) and zero or more may contribute input-only sources
that are reserved for the model-side multi-source consumption (the
mllam#652 follow-up).

WeatherDataset and WeatherDataModule take `datastores` and
`selections` dicts; their per-sample return shape and the model unpack
are unchanged from current main, so this is a config + data-loader
constructor refactor only. Internally the dataset still operates on
the interior datastore alone.

load_config_and_datastore returns (config, Dict[str, BaseDatastore]).
A config-time validator rejects two datastores declaring the same
output variable name, with an error message pointing at mdp's
`dim_mapping.name_format` and `xr.Dataset.assign_coords` as the two
ways to disambiguate.
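
A minimal sketch of such a uniqueness check is shown below. It assumes the declared outputs have already been flattened to one list of variable names per datastore. The function names and the error wording are hypothetical and are not taken from the neural-lam source.

```python
from typing import Dict, List


def find_duplicate_outputs(declared: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each output name declared by several datastores to those datastores."""
    owners: Dict[str, List[str]] = {}
    for ds_name, variables in declared.items():
        # set() so a name repeated within one datastore is not counted twice.
        for var in set(variables):
            owners.setdefault(var, []).append(ds_name)
    return {var: sorted(names) for var, names in owners.items() if len(names) > 1}


def validate_unique_outputs(declared: Dict[str, List[str]]) -> None:
    """Raise ValueError if two datastores declare the same output variable."""
    duplicates = find_duplicate_outputs(declared)
    if duplicates:
        details = "; ".join(
            f"{var!r} declared by {names}" for var, names in sorted(duplicates.items())
        )
        raise ValueError(
            "Output variable names must be unique across datastores: "
            f"{details}. Rename the variables in one source to disambiguate."
        )


# Passes silently: no name is shared between the two hypothetical datastores.
validate_unique_outputs({"a": ["t2m"], "b": ["u10"]})

# A clash is reported with every datastore that declares the shared name.
clash = find_duplicate_outputs({"a": ["t2m", "u10"], "b": ["t2m"]})
```

Running the check at config-load time surfaces the clash before any data is opened, which is the behaviour the commit message describes.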

Other callers updated:
- train_model.py resolves interior + boundary roles for the legacy
  single-source model side.
- create_graph.py and plot_graph.py resolve the interior datastore
  via `_resolve_datastore_roles` instead of the old 2-tuple.
- module.py refactors `_create_dataarray_from_tensor` to use a new
  `WeatherDataset.build_dataarray_from_tensor` staticmethod so the
  model doesn't need to instantiate a full WeatherDataset with the
  new dict signature.

This PR is an alternative to mllam#635: it adopts the public schema
proposed in mllam#652 without bringing in mllam#635's internal boundary
loading. Boundary forcing, multi-source inputs and diagnostic
outputs land via the mllam#652 model-side follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@sadamov sadamov force-pushed the feat/boundary-datastore branch from 6d237d7 to 78a4d52 Compare June 9, 2026 07:25
sadamov added a commit to sadamov/neural-lam that referenced this pull request Jun 9, 2026
Match the marker registration used on mllam#635/mllam#651 so any future
@pytest.mark.slow test on this branch is recognised by pytest
without warnings. No tests currently use the marker on mllam#656.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@sadamov

sadamov commented Jun 9, 2026

Collaborator Author

@sadamov I added #652 to show how I would generalise the config. I am not saying that we need to use that config layout in this PR, but could you have a look at that PR and see what you think? If you would like to use it, we could consider adopting a structure in this PR that can remain relatively unchanged down the line.

#652 (comment)

Copilot AI pushed a commit that referenced this pull request Jun 9, 2026
Register a `slow` marker in `pyproject.toml` and a matching
`--run-slow` CLI flag in `tests/conftest.py`. Tests carrying
`@pytest.mark.slow` are skipped by default and can be opted into
via `pytest --run-slow`.

Mark `test_training` (parametrised over all datastores) and
`test_training_output_std` as slow because they run a real
`trainer.fit` loop on the MDP / npyfilesmeps datastores and take
minutes per parametrisation. The fast unit tests in the suite
(`test_all_gather_cat_*`, datastore tests against the dummy fixture,
etc.) continue to run on every `pytest` invocation.

Motivated by #635, which already needed this mechanism for the ERA5
boundary integration test. Landing the infra separately makes it
reusable across the suite (e.g. the `test_state_only_datastore_*`
training test on #231 and future trainer.fit-based regressions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Labels

discussion enhancement New feature or request


4 participants