Skip to content

Conversation

@jsignell
Copy link
Contributor

@jsignell jsignell commented Nov 11, 2025

@jsignell jsignell changed the title Index variables Coerce it IndexVariable to Variable when assigning to data variables or coordinates Nov 11, 2025
@jsignell jsignell changed the title Coerce it IndexVariable to Variable when assigning to data variables or coordinates Coerce IndexVariable to Variable when assigning to data variables or coordinates Nov 11, 2025
Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you!


def _assert_variable_invariants(var: Variable, name: Hashable = None):
def _assert_variable_invariants(
var: Variable | Any,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check now happens within this function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with https://github.com/pydata/xarray/pull/10909/files#r2528970158, it may be better to keep the IndexVariable vs Variable check at the level of Dataset and/or DataArray invariants.

Actually, the right place should probably be _assert_indexes_invariants_checks, where we could add the check that non-index variables are not instances of IndexVariable.

stacked.coords["variable"].drop_vars(["z", "variable", "y"]),
expected_stacked_variable,
stacked["variable"].variable.to_base_variable(),
expected_stacked_variable.variable,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the only test change that was needed. Basically stacked["variable"].variable is an IndexVariable because it is represents a coord that is part of a MultiIndex. But you can't know that if you only have access to the DataArray with no coords. So I just switched it to be comparing variable and to coerce to the Variable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah multi-indexes are where IndexVariable really bites us

e.g. #8887

@jsignell jsignell requested a review from dcherian November 12, 2025 22:07
Co-authored-by: Illviljan <[email protected]>
for k, v in da._coords.items():
_assert_variable_invariants(v, k)
_assert_variable_invariants(
v, k, check_default_indexes=check_default_indexes, is_index=k in da._indexes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
v, k, check_default_indexes=check_default_indexes, is_index=k in da._indexes
v, k, is_index=k in da._indexes

"check default indexes" doesn't mean anything for Variable, which is only data + named dims + attrs

def _assert_variable_invariants(
var: Variable | Any,
name: Hashable = None,
check_default_indexes: bool = True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
check_default_indexes: bool = True,

Copy link
Member

@benbovy benbovy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jsignell!

Just added two small comments.


def _assert_variable_invariants(var: Variable, name: Hashable = None):
def _assert_variable_invariants(
var: Variable | Any,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with https://github.com/pydata/xarray/pull/10909/files#r2528970158, it may be better to keep the IndexVariable vs Variable check at the level of Dataset and/or DataArray invariants.

Actually, the right place should probably be _assert_indexes_invariants_checks, where we could add the check that non-index variables are not instances of IndexVariable.

Comment on lines +4784 to +4787
def test_setitem_uses_base_variable_class_even_for_index_variables(self) -> None:
ds = Dataset(coords={"x": [1, 2, 3]})
ds["y"] = ds["x"]
_assert_internal_invariants(ds, check_default_indexes=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_setitem_uses_base_variable_class_even_for_index_variables(self) -> None:
ds = Dataset(coords={"x": [1, 2, 3]})
ds["y"] = ds["x"]
_assert_internal_invariants(ds, check_default_indexes=True)
def test_setitem_uses_base_variable_class_even_for_index_variables(self) -> None:
ds = Dataset(coords={"x": [1, 2, 3]})
ds["y"] = ds["x"]
# explicit check
assert isintance(ds["x"].variable, IndexVariable)
assert not isinstance(ds["y"].variable, IndexVariable)
# test internal invariant checks when comparing the datasets
expected = Dataset(coords={"x": [1, 2, 3], "y": ("x", [1, 2, 3])})
assert_identical(ds, expected)

Nit suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

5 participants