Currently no way to create a Coordinates object without indexes for 1D variables #8704

Closed
@TomNicholas

What happened?

The workaround described in #8107 (comment) does not seem to work on main, so I think there is currently no way to create an xr.Coordinates object without 1D variables being coerced to indexes. This in turn means there is no way to create a Dataset object without 1D variables becoming IndexVariables (i.e. being coerced to indexes).

What did you expect to happen?

I expected to at least be able to use the workaround described in #8107 (comment), i.e.

xr.Coordinates({'x': ('x', uarr)}, indexes={})

where uarr is an un-indexable array-like.

Minimal Complete Verifiable Example

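from typing import Self  # Python 3.11+; use typing_extensions.Self on older versions

import numpy as np
import xarray as xr


class UnindexableArrayAPI: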
    ...


class UnindexableArray:
    """
    Presents like an N-dimensional array but doesn't support changes of any kind, 
    nor can it be coerced into a np.ndarray or pd.Index.
    """
    
    _shape: tuple[int, ...]
    _dtype: np.dtype
    
    def __init__(self, shape: tuple[int, ...], dtype: np.dtype) -> None:
        self._shape = shape
        self._dtype = dtype
        self.__array_namespace__ = UnindexableArrayAPI

    @property
    def dtype(self) -> np.dtype:
        return self._dtype
    
    @property
    def shape(self) -> tuple[int, ...]:
        return self._shape
    
    @property
    def ndim(self) -> int:
        return len(self.shape)

    @property
    def size(self) -> int:
        # np.prod returns a NumPy scalar; cast so the annotated return type holds
        return int(np.prod(self.shape))

    @property
    def T(self) -> Self:
        raise NotImplementedError()

    def __repr__(self) -> str:
        return f"UnindexableArray(shape={self.shape}, dtype={self.dtype})"

    def _repr_inline_(self, max_width):
        """
        Format to a single line with at most max_width characters. Used by xarray.
        """
        return self.__repr__()

    def __getitem__(self, key, /) -> Self:
        """
        Only supports extremely limited indexing.
        
        I only added this method because xarray will apparently attempt to index into its lazy indexing classes even if the operation would be a no-op anyway.
        """
        from xarray.core.indexing import BasicIndexer
        
        if isinstance(key, BasicIndexer) and key.tuple == ((slice(None),) * self.ndim):
            # no-op
            return self
        else:
            raise NotImplementedError()

    def __array__(self) -> np.ndarray:
        raise NotImplementedError("UnindexableArrays can't be converted into numpy arrays or pandas Index objects")


uarr = UnindexableArray(shape=(3,), dtype=np.dtype('int32'))

xr.Variable(data=uarr, dims=['x'])  # works fine

xr.Coordinates({'x': ('x', uarr)}, indexes={})  # works in xarray v2023.08.0

In versions after that, however, the same call triggers the NotImplementedError raised in __array__:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[59], line 1
----> 1 xr.Coordinates({'x': ('x', uarr)}, indexes={})

File ~/Documents/Work/Code/xarray/xarray/core/coordinates.py:301, in Coordinates.__init__(self, coords, indexes)
    299 variables = {}
    300 for name, data in coords.items():
--> 301     var = as_variable(data, name=name)
    302     if var.dims == (name,) and indexes is None:
    303         index, index_vars = create_default_index_implicit(var, list(coords))

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:159, in as_variable(obj, name)
    152     raise TypeError(
    153         f"Variable {name!r}: unable to convert object into a variable without an "
    154         f"explicit list of dimensions: {obj!r}"
    155     )
    157 if name is not None and name in obj.dims and obj.ndim == 1:
    158     # automatically convert the Variable into an Index
--> 159     obj = obj.to_index_variable()
    161 return obj

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:572, in Variable.to_index_variable(self)
    570 def to_index_variable(self) -> IndexVariable:
    571     """Return this variable as an xarray.IndexVariable"""
--> 572     return IndexVariable(
    573         self._dims, self._data, self._attrs, encoding=self._encoding, fastpath=True
    574     )

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:2642, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
   2640 # Unlike in Variable, always eagerly load values into memory
   2641 if not isinstance(self._data, PandasIndexingAdapter):
-> 2642     self._data = PandasIndexingAdapter(self._data)

File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:1481, in PandasIndexingAdapter.__init__(self, array, dtype)
   1478 def __init__(self, array: pd.Index, dtype: DTypeLike = None):
   1479     from xarray.core.indexes import safe_cast_to_index
-> 1481     self.array = safe_cast_to_index(array)
   1483     if dtype is None:
   1484         self._dtype = get_valid_numpy_dtype(array)

File ~/Documents/Work/Code/xarray/xarray/core/indexes.py:469, in safe_cast_to_index(array)
    459             emit_user_level_warning(
    460                 (
    461                     "`pandas.Index` does not support the `float16` dtype."
   (...)
    465                 category=DeprecationWarning,
    466             )
    467             kwargs["dtype"] = "float64"
--> 469     index = pd.Index(np.asarray(array), **kwargs)
    471 return _maybe_cast_to_cftimeindex(index)

Cell In[55], line 63, in UnindexableArray.__array__(self)
     62 def __array__(self) -> np.ndarray:
---> 63     raise NotImplementedError("UnindexableArrays can't be converted into numpy arrays or pandas Index objects")

NotImplementedError: UnindexableArrays can't be converted into numpy arrays or pandas Index objects
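
To make the failure path in the traceback easier to follow, here is a toy, stdlib-only model of it. None of these names are xarray's: `ToyUnindexable`, `ToyVariable`, `ToyIndexVariable`, `toy_as_variable` and `toy_coordinates` are hypothetical stand-ins, and the `auto_convert` flag is just one conceivable opt-out, not a claim about how xarray does or should solve this. The sketch only mirrors the control flow visible above: the 1D variable whose name matches its dimension is converted regardless of `indexes={}`, and the conversion eagerly touches the data.

```python
class ToyUnindexable:
    """Stand-in for UnindexableArray: has a shape but refuses conversion."""

    def __init__(self, shape):
        self.shape = shape

    def to_index(self):
        # Plays the role of __array__ raising in the real example.
        raise NotImplementedError("cannot be converted into an index")


class ToyVariable:
    def __init__(self, dims, data):
        self.dims = tuple(dims)
        self.data = data
        self.ndim = len(self.dims)


class ToyIndexVariable(ToyVariable):
    def __init__(self, dims, data):
        super().__init__(dims, data)
        # Mirrors the eager conversion done in IndexVariable.__init__.
        self.index = data.to_index()


def toy_as_variable(obj, name=None, auto_convert=True):
    """Model of the branch at variable.py:157-159 in the traceback.

    `auto_convert` is a hypothetical opt-out flag used for illustration.
    """
    dims, data = obj
    if isinstance(dims, str):
        dims = (dims,)
    var = ToyVariable(dims, data)
    if auto_convert and name is not None and name in var.dims and var.ndim == 1:
        # The conversion happens here, before `indexes` is ever consulted.
        var = ToyIndexVariable(var.dims, var.data)
    return var


def toy_coordinates(coords, indexes=None, auto_convert=True):
    # `indexes` is accepted but, as in the traceback, does not guard the conversion.
    return {
        name: toy_as_variable(data, name=name, auto_convert=auto_convert)
        for name, data in coords.items()
    }


uarr = ToyUnindexable(shape=(3,))

# Passing indexes={} does not help: the conversion still runs and raises.
try:
    toy_coordinates({"x": ("x", uarr)}, indexes={})
    outcome = "ok"
except NotImplementedError:
    outcome = "NotImplementedError"

# With the hypothetical opt-out, the data is never touched.
plain = toy_coordinates({"x": ("x", uarr)}, indexes={}, auto_convert=False)["x"]
```

In this model the error comes from the conversion step itself rather than from index creation, which is why passing an empty `indexes` mapping is not enough on its own.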

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

Context is #8699

Environment

Versions described above
