groupby_bins().mean() fails on time series data #10217

Open
@relativistic

Description

What happened?

I'm not sure if this is a bug, or just surprising behavior.

When a dataset contains time series (datetime) variables and I run groupby_bins followed by mean(), those variables are silently dropped from the result instead of being aggregated.

What did you expect to happen?

I expect groupby_bins aggregations to be applied to time series data whenever the aggregation is well defined for it. For example, in the code below, mean() should have returned the average time in each bin.

Some aggregation operations might not be well defined for time (arguably sum(), for example). In such cases I'd expect the result to be NaN, or an error to be raised.
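To make the expected result concrete, here is a rough sketch of how the per-bin average time could be computed by hand. It is only a workaround under my own assumptions, not how xarray does or should implement it: the names `ref`, `offsets_ns`, and `mean_times` are illustrative, and the timestamps are averaged as nanosecond offsets from the earliest timestamp.

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {
        "measurement": ("trial", np.arange(0, 100, 10)),
        "time": ("trial", pd.date_range("2024-01-01T15:00", "2024-01-01T15:01", periods=10)),
    },
    coords={"trial": np.arange(10)},
)

# Aggregate the numeric variable the usual way.
ds_agged = ds[["measurement"]].groupby_bins("trial", 5).mean()

# Express each timestamp as an integer nanosecond offset from the earliest one
# (assumption: averaging offsets is an acceptable definition of "mean time").
times = ds["time"].values.astype("datetime64[ns]")
ref = times.min()
offsets_ns = xr.DataArray(
    (times - ref).astype("int64"),
    dims="trial",
    coords={"trial": ds["trial"].values},
)

# Average the offsets per bin, then convert back to datetimes.
mean_offsets = offsets_ns.groupby_bins("trial", 5).mean()
mean_times = ref + np.round(mean_offsets.values).astype("int64").astype("timedelta64[ns]")

ds_agged["time"] = ("trial_bins", mean_times)
print(ds_agged)
```

Subtracting a reference time before averaging keeps the numbers small, so the float mean does not lose precision the way raw nanoseconds since the epoch could.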

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np
import pandas as pd

ds = xr.Dataset(
    {
        'measurement': ('trial', np.arange(0, 100, 10)),
        'time': ('trial', pd.date_range("20240101T1500", "20240101T1501", periods=10)),
    },
    coords={'trial': np.arange(10)},
)
ds_agged = ds.groupby_bins('trial', 5).mean()

# The 'time' variable is missing from the result, but 'measurement' is present
print(ds_agged)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

<xarray.Dataset> Size: 80B
Dimensions:      (trial_bins: 5)
Coordinates:
  * trial_bins   (trial_bins) object 40B (-0.009, 1.8] (1.8, 3.6] ... (7.2, 9.0]
Data variables:
    measurement  (trial_bins) float64 40B 5.0 25.0 45.0 65.0 85.0

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.16 | packaged by conda-forge | (main, Dec 5 2024, 14:16:10) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-52-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2025.3.1
pandas: 2.2.3
numpy: 2.1.3
scipy: 1.15.2
netCDF4: 1.7.2
pydap: 3.5.4
h5netcdf: 1.6.1
h5py: 3.13.0
zarr: 2.18.3
cftime: 1.6.4
nc_time_axis: 1.4.1
iris: 3.11.0
bottleneck: 1.4.2
dask: 2025.3.0
distributed: 2025.3.0
matplotlib: 3.10.1
cartopy: 0.24.0
seaborn: 0.13.2
numbagg: 0.9.0
fsspec: 2025.3.2
cupy: None
pint: 0.24.4
sparse: 0.16.0
flox: None
numpy_groupies: None
setuptools: 75.8.0
pip: 25.0
conda: None
pytest: None
mypy: None
IPython: 8.32.0
sphinx: None
