Skip to content

map_blocks takes only one block from additional arguments when the sizes don't match #10908

@a2nalea

Description

@a2nalea

What happened?

Hello,

I have a special case where I would like to use xarray.map_blocks to distribute a work load which wouldn't otherwise hold in memory.
The operation to perform is relatively simple: value assignment at specific pixel indices.
The how to get those pixel indices is less trivial but the xarray logic around coordinates simplifies it a lot!! Hence me using xarray and not directly dask (which doesn't support value assignment along indices officially).

What happened:
I have 3 inputs:

  • the array on which I'll set the values
  • the array which is used to compute the pixel indices
  • the array with the values

While they all have the same chunk size, they do not share the same number of chunks, but array 2 and 3 have same size and chunk numbers.

When going through map_blocks, I see the following:

  1. we go through all the chunks of the first array
  2. the chunks of array 2 and 3 are smaller (hidden rechunking to match the number of chunks)
  3. only the last chunk of array 2 and 3 is used by map_blocks

This behavior made me first think that they was an error in the writing of the files afterwards.

What did you expect to happen?

I expect :

  • all sub-chunks of the provided args to be used
  • or a big error being provided!

Minimal Complete Verifiable Example

import dask.array
import xarray
import numpy as np


x = np.arange(565)
y = np.arange(1000, 1190)
t = np.arange(221)
z = np.arange(900)

# The first array
projected = xarray.DataArray(
    dask.array.full((x.size, y.size), np.nan, chunks=(100, 100)),
    coords={"x": x, "y": y},
    name="projected"
)

# The array whose values will be used
original = xarray.DataArray(
    np.full((t.size, z.size), 12.345) ,
    coords={"t": t, "z": z},
    dims=["t", "z"]
).chunk({"t": 100})

t2x = np.linspace(0, x.size, t.size)
z2y = np.linspace(y.min(), y.max(), z.size)
tmp = t2x[..., None] @ z2y[None, ...]

# The array with the coordinate correspondance
pp = np.full((t.size, z.size, 2), np.nan)
pp[..., 0] = tmp / z2y.max()
pp[..., 1] = tmp / t2x.max()

pixel_positions = xarray.DataArray(
    pp,
    coords={"t": t, "z": z, "nav": ["x", "y"]},
    dims=["t", "z", "nav"]
).chunk({"t": 100})

def match_coordinates(projected, pixel_positions):
    coordinates = projected.sel(x=pixel_positions[..., 0], y = pixel_positions[..., 1], method="nearest").coords
    return {"x": coordinates["x"], "y": coordinates["y"]}

def place_pixel_on_grid(projected, pixel_positions, original):
    coordinates = match_coordinates(projected, pixel_positions)
    print(f"{projected.shape} -- {original.shape}")
    print(f"pixel_positions coords {pixel_positions.coords}")
    projected_chunk = projected.copy()
    projected_chunk.loc[{"x": coordinates["x"], "y": coordinates["y"]}] = original
    return projected_chunk

xarray.map_blocks(place_pixel_on_grid, projected, args=[pixel_positions, original], template=projected).compute()

Steps to reproduce

Run code provided above: the print statement should show:

  • projected.shape varies and matches the chunks
  • original.shape is constant and equal (21, 900)
  • pixel_position coords are constants and match to the last chunk of data

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

(65, 90) -- (21, 900)
pixel_positions coords Coordinates:
  * t        (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
  * z        (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
  * nav      (nav) <U1 8B 'x' 'y'
(100, 100) -- (21, 900)
pixel_positions coords Coordinates:
  * t        (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
  * z        (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
  * nav      (nav) <U1 8B 'x' 'y'
(100, 100) -- (21, 900)
pixel_positions coords Coordinates:
  * t        (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
  * z        (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
  * nav      (nav) <U1 8B 'x' 'y'
(65, 100) -- (21, 900)
(100, 100) -- (21, 900)
(100, 100) -- (21, 900)
(100, 90) -- (21, 900)
pixel_positions coords Coordinates:
  * t        (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
  * z        (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
  * nav      (nav) <U1 8B 'x' 'y'
pixel_positions coords Coordinates:
  * t        (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
  * z        (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
  * nav      (nav) <U1 8B 'x' 'y'
(100, 90) -- (21, 900)
pixel_positions coords Coordinates:
  * t        (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
  * z        (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
  * nav      (nav) <U1 8B 'x' 'y'
pixel_positions coords Coordinates:
  * t        (t) int64 168B 200 201 202 203 204 205 ... 215 216 217 218 219 220
  * z        (z) int64 7kB 0 1 2 3 4 5 6 7 8 ... 892 893 894 895 896 897 898 899
  * nav      (nav) <U1 8B 'x' 'y'

Anything else we need to know?

No response

Environment

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugneeds triageIssue that has not been reviewed by xarray team member

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions