-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
What happened?
Hello,
I have a special case where I would like to use xarray.map_blocks to distribute a work load which wouldn't otherwise hold in memory.
The operation to perform is relatively simple: value assignment at specific pixel indices.
The how to get those pixel indices is less trivial but the xarray logic around coordinates simplifies it a lot!! Hence me using xarray and not directly dask (which doesn't support value assignment along indices officially).
What happened:
I have 3 inputs:
- the array on which I'll set the values
- the array which is used to compute the pixel indices
- the array with the values
While they all have the same chunk size, they do not share the same number of chunks, but array 2 and 3 have same size and chunk numbers.
When going through map_blocks, I see the following:
- we go through all the chunks of the first array
- the chunks of array 2 and 3 are smaller (hidden rechunking to match the number of chunks)
- only the last chunk of array 2 and 3 is used by
map_blocks
This behavior made me first think that they was an error in the writing of the files afterwards.
What did you expect to happen?
I expect :
- all sub-chunks of the provided
argsto be used - or a big error being provided!
Minimal Complete Verifiable Example
import dask.array
import xarray
import numpy as np
x = np.arange(565)
y = np.arange(1000, 1190)
t = np.arange(221)
z = np.arange(900)
# The first array
projected = xarray.DataArray(
dask.array.full((x.size, y.size), np.nan, chunks=(100, 100)),
coords={"x": x, "y": y},
name="projected"
)
# The array whose values will be used
original = xarray.DataArray(
np.full((t.size, z.size), 12.345) ,
coords={"t": t, "z": z},
dims=["t", "z"]
).chunk({"t": 100})
t2x = np.linspace(0, x.size, t.size)
z2y = np.linspace(y.min(), y.max(), z.size)
tmp = t2x[..., None] @ z2y[None, ...]
# The array with the coordinate correspondance
pp = np.full((t.size, z.size, 2), np.nan)
pp[..., 0] = tmp / z2y.max()
pp[..., 1] = tmp / t2x.max()
pixel_positions = xarray.DataArray(
pp,
coords={"t": t, "z": z, "nav": ["x", "y"]},
dims=["t", "z", "nav"]
).chunk({"t": 100})
def match_coordinates(projected, pixel_positions):
coordinates = projected.sel(x=pixel_positions[..., 0], y = pixel_positions[..., 1], method="nearest").coords
return {"x": coordinates["x"], "y": coordinates["y"]}
def place_pixel_on_grid(projected, pixel_positions, original):
coordinates = match_coordinates(projected, pixel_positions)
print(f"{projected.shape} -- {original.shape}")
print(f"pixel_positions coords {pixel_positions.coords}")
projected_chunk = projected.copy()
projected_chunk.loc[{"x": coordinates["x"], "y": coordinates["y"]}] = original
return projected_chunk
xarray.map_blocks(place_pixel_on_grid, projected, args=[pixel_positions, original], template=projected).compute()Steps to reproduce
Run code provided above: the print statement should show:
projected.shapevaries and matches the chunksoriginal.shapeis constant and equal (21, 900)pixel_position coordsare constants and match to the last chunk of data
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
(65, 90) -- (21, 900)
pixel_positions coords Coordinates:
* t (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
* z (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
* nav (nav) <U1 8B 'x' 'y'
(100, 100) -- (21, 900)
pixel_positions coords Coordinates:
* t (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
* z (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
* nav (nav) <U1 8B 'x' 'y'
(100, 100) -- (21, 900)
pixel_positions coords Coordinates:
* t (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
* z (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
* nav (nav) <U1 8B 'x' 'y'
(65, 100) -- (21, 900)
(100, 100) -- (21, 900)
(100, 100) -- (21, 900)
(100, 90) -- (21, 900)
pixel_positions coords Coordinates:
* t (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
* z (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
* nav (nav) <U1 8B 'x' 'y'
pixel_positions coords Coordinates:
* t (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
* z (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
* nav (nav) <U1 8B 'x' 'y'
(100, 90) -- (21, 900)
pixel_positions coords Coordinates:
* t (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
* z (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
* nav (nav) <U1 8B 'x' 'y'
pixel_positions coords Coordinates:
* t (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
* z (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
* nav (nav) <U1 8B 'x' 'y'Anything else we need to know?
No response