Skip to content

Respect STAC cube:dimensions as source of truth in metadata_from_stac + align tests with dimension policy#867

Open
suriyahgit wants to merge 10 commits into
Open-EO:masterfrom
Eurac-Research-Institute-for-EO:dim_policy
Open

Respect STAC cube:dimensions as source of truth in metadata_from_stac + align tests with dimension policy#867
suriyahgit wants to merge 10 commits into
Open-EO:masterfrom
Eurac-Research-Institute-for-EO:dim_policy

Conversation

@suriyahgit
Copy link
Copy Markdown
Member

Problem

metadata_from_stac() previously injected openEO default dimensions (x, y, t, bands) even when a STAC object already defined dimensions via cube:dimensions.

This led to:

  • dimension names being overridden or altered,
  • inconsistencies between STAC metadata and openEO metadata,
  • tests relying on implicit default behavior instead of STAC semantics.

Fix

Introduce a clear dimension resolution policy:

  • If cube:dimensions is present
    → treat it as the source of truth and preserve dimension names and order.

  • If cube:dimensions is absent
    → fall back to openEO defaults (x, y, optional t).

The metadata parsing logic was refactored accordingly and aligned with PySTAC datacube capabilities.

Testing

  • Updated existing tests to no longer assume an always-present band dimension.

  • Added checks verifying:

    • fallback behavior when cube:dimensions is missing,
    • preservation of dimension names when cube:dimensions exists,
    • correct temporal dimension handling across STAC object types.
  • Tests now explicitly enforce the new dimension policy to prevent regressions.

This change makes metadata parsing more STAC-compliant while keeping backward-compatible fallback behavior.

Refer:
https://github.com/destine-datalake-cube/dedl-openeo-coordination/issues/3
Also solves: #743

Copy link
Copy Markdown
Member

@soxofaan soxofaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just could give this a quick scan to make some notes

Comment thread openeo/metadata.py
Comment thread openeo/metadata.py Outdated
Comment thread openeo/metadata.py Outdated
Comment thread tests/test_metadata.py Outdated
Comment thread tests/test_metadata.py Outdated
Comment thread tests/test_metadata.py
Comment thread tests/test_metadata.py Outdated
Comment thread tests/test_metadata.py Outdated
Comment thread tests/test_metadata.py Outdated
suriyahgit and others added 2 commits March 10, 2026 10:10
…se-correction for Dict to support python3.8

Co-authored-by: Stefaan Lippens <soxofaan@users.noreply.github.com>
@suriyahgit
Copy link
Copy Markdown
Member Author

suriyahgit commented Mar 10, 2026

ready for review @soxofaan

@clausmichele
Copy link
Copy Markdown
Member

Hi @soxofaan , please let us know if there is something we can do to move on with this PR!

Comment thread tests/test_metadata.py

# Dimension name resolution policy (STAC cube:dimensions vs openEO defaults)
@pytest.mark.parametrize(
["stac_dict", "expected_dims"],
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can a STAC 1.1 style test case be added which uses the new 'bands' common metadata and no datacube extension?
Collection metadata example:
https://github.com/radiantearth/stac-spec/blob/master/examples/collection-only/collection.json#L152
Bands spec:
https://github.com/radiantearth/stac-spec/blob/master/commons/common-metadata.md#band-object

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a STAC 1.1 bands common metadata test without the datacube extension in test_metadata_from_stac_stac_1_1_common_bands_without_datacube_extension. The test verifies that metadata_from_stac() still builds the fallback openEO dimensions and extracts the common bands metadata correctly.

Copy link
Copy Markdown
Collaborator

@jdries jdries left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, Stefaan is on holiday
Added one comment myself.
Please also add a changelog entry.
Otherwise I'm inclined to say it looks good, assuming this change mostly preserves existing behaviour. (Biggest worry are collections with faulty datacube extension metadata that would suddenly have a change in behavior.)

Copy link
Copy Markdown
Member

@soxofaan soxofaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more notes

Comment thread openeo/metadata.py Outdated
TemporalDimension(name="t", extent=self.infer_temporal_extent(stac_object)),
]
if isinstance(stac_object, (pystac.Collection, pystac.Item)):
dimensions.append(BandDimension(name="bands", bands=list(bands)))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why conditionally add the band dimension here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this conditional injection. When cube:dimensions is present, the declared dimensions are now treated as the source of truth and no fallback bands dimension is added.

Comment thread openeo/metadata.py Outdated
cube_dimensions = self.cube_dimensions_dict(stac_object)
return isinstance(cube_dimensions, dict) and len(cube_dimensions) > 0

def cube_dimensions_dict(self, stac_object: pystac.STACObject) -> Dict[str, dict]:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this has to be in the public interface

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I made these parser helpers private: _has_cube_dimensions(), _cube_dimensions_dict(), and _parse_declared_dimensions().

Comment thread openeo/metadata.py Outdated
return [Rfc3339(propagate_none=True).normalize(d) for d in interval]

if isinstance(stac_object, pystac.Item):
props = getattr(stac_object, "properties", {}) or {}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering: why using getattr here? Isn't the presence of properties guaranteed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to use stac_obj.properties directly for items.

Comment thread openeo/metadata.py Outdated
norm = Rfc3339(propagate_none=True).normalize(dt_)
return [norm, norm]
start = props.get("start_datetime")
end = props.get("end_datetime")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the "datetime" field a required field, so is this code even reachable?
Or to put differently: shouldn't "start_datetime"/"end_datetime" get higher precedence over "datetime"?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the item temporal fallback logic so start_datetime / end_datetime take precedence over datetime and added test coverage for that precedence.

Comment thread openeo/metadata.py Outdated
dimensions = self.dimensions_from_stac_object(stac_object=stac_object, bands=bands)
return CubeMetadata(dimensions=dimensions)

def dimensions_from_stac_object(self, stac_object: pystac.STACObject, bands: _BandList) -> List[Dimension]:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why having bands as argument here? Shouldn't it be the responsibility of this method to detect the bands? Note that this is a public method, so it's weird to have an argument for something you actually want to extract

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. dimensions_from_stac_object() now detects bands internally through bands_from_stac_object() instead of requiring a bands argument. I didn't notice that this bands_from stac_object() exists earlier, my bad.

Comment thread openeo/metadata.py Outdated
dimensions: List[Dimension] = [
SpatialDimension(name="x", extent=[None, None]),
SpatialDimension(name="y", extent=[None, None]),
TemporalDimension(name="t", extent=self.infer_temporal_extent(stac_object)),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reuse get_temporal_dimension here instead ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the usage of this temporal extent helper, and reusing get_temporal_dimension here instead.

Comment thread openeo/metadata.py Outdated

return ext or [None, None]

def parse_declared_dimensions(self, stac_object: pystac.STACObject, bands: _BandList) -> List[Dimension]:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as noted above, it looks weird to have this required bands argument here without explanation

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the required bands argument from the public-facing parser method. dimensions_from_stac_object() now detects bands internally through bands_from_stac_object(), no longer needed to pass extracted bands into the method. This behaviour is consistent across PR now!

@suriyahgit suriyahgit requested review from jdries and soxofaan May 15, 2026 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants