Writing a multiscale element to disk fails when configuring Dask to use "processes" #1024

Description

@ArneDefauw

When configuring Dask to use the "processes" scheduler instead of "threads", writing a multiscale element to disk fails with the following error:

File ~/VIB/harpy/.venv_harpy/lib/python3.12/site-packages/spatialdata/_core/spatialdata.py:1177, in SpatialData.write(self, file_path, overwrite, consolidate_metadata, update_sdata_path, sdata_formats)
   1174 store.close()
   1176 for element_type, element_name, element in self.gen_elements():
-> 1177     self._write_element(
   1178         element=element,
   1179         zarr_container_path=file_path,
...
--> 120 if group.metadata.zarr_format == 3 and len(multiscales := group.metadata.attributes["ome"]["multiscales"]) != 1:
    121     len_scales = len(multiscales)
    122     raise ValueError(f"The length of multiscales metadata should be 1, found the length to be {len_scales}")

KeyError: 'ome'
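The KeyError suggests that the "ome" key never makes it into the group attributes when the write runs under the processes scheduler. As a quick sanity check, the attributes of the written group can be inspected directly (a sketch; the exact group path inside the store is an assumption and should be adjusted to wherever the failed write landed):

import zarr

# Hypothetical path to the multiscale image group inside the store.
group = zarr.open_group("sdata.zarr/images/blobs_multiscale_image", mode="r")
print(group.metadata.zarr_format)       # 3
print(list(group.metadata.attributes))  # "ome" is expected here but missing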

Minimal example to reproduce:

import os

import dask

from spatialdata.datasets import blobs

with dask.config.set(scheduler="processes"):
    # Writing a single-scale image works.
    sdata = blobs()
    sdata = sdata.subset(element_names=["blobs_image"])
    sdata.write(os.path.join(os.environ.get("TMPDIR"), "sdata.zarr"), overwrite=True)
    print("done")

    # Writing a multiscale image fails.
    sdata = blobs()
    sdata = sdata.subset(element_names=["blobs_multiscale_image"])
    sdata.write(os.path.join(os.environ.get("TMPDIR"), "sdata.zarr"), overwrite=True)
    # This fails with:
    # --> 120 if group.metadata.zarr_format == 3 and len(multiscales := group.metadata.attributes["ome"]["multiscales"]) != 1:
    #     121     len_scales = len(multiscales)
    #     122     raise ValueError(f"The length of multiscales metadata should be 1, found the length to be {len_scales}")
    #
    # KeyError: 'ome'

with dask.config.set(scheduler="threads"):
    # With the "threads" scheduler, both writes succeed.
    sdata = blobs()
    sdata = sdata.subset(element_names=["blobs_image"])
    sdata.write(os.path.join(os.environ.get("TMPDIR"), "sdata.zarr"), overwrite=True)
    print("done")

    sdata = blobs()
    sdata = sdata.subset(element_names=["blobs_multiscale_image"])
    sdata.write(os.path.join(os.environ.get("TMPDIR"), "sdata.zarr"), overwrite=True)
    print("done")

I tested this on macOS and CentOS and got the same error message.

This is with spatialdata==0.6.0, spatial_image==1.2.3, and multiscale_spatial_image==2.0.3.

I also noticed that, for images and labels, the layer in the dask graph whose name starts with "from-zarr-" is no longer materialized when the spatialdata object is backed by a zarr store (zarr>=3). This is probably unrelated to the current issue, but I found it a bit strange, since this was not the case in earlier versions of spatialdata (<0.5.0), see https://github.com/saeyslab/harpy/blob/609d639c7578a4c64c3eede4c974d4e90f982910/src/harpy/_tests/test_image/test_manager.py#L47
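For reference, the kind of check I use for this looks roughly like the following (a sketch; it assumes the element is a single-scale image whose .data attribute is a dask array):

import spatialdata as sd

# Read back a backed SpatialData object and list the dask graph layer
# names of an image element, looking for the "from-zarr-" layer.
sdata = sd.read_zarr("sdata.zarr")
arr = sdata["blobs_image"].data  # the dask array underneath the DataArray
print([name for name in arr.dask.layers if name.startswith("from-zarr")])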
