Description
When configuring dask to use the "processes" scheduler instead of "threads", writing a multiscale element to disk fails. I get the following error message:
File ~/VIB/harpy/.venv_harpy/lib/python3.12/site-packages/spatialdata/_core/spatialdata.py:1177, in SpatialData.write(self, file_path, overwrite, consolidate_metadata, update_sdata_path, sdata_formats)
1174 store.close()
1176 for element_type, element_name, element in self.gen_elements():
-> 1177 self._write_element(
1178 element=element,
1179 zarr_container_path=file_path,
...
--> 120 if group.metadata.zarr_format == 3 and len(multiscales := group.metadata.attributes["ome"]["multiscales"]) != 1:
121 len_scales = len(multiscales)
122 raise ValueError(f"The length of multiscales metadata should be 1, found the length to be {len_scales}")
KeyError: 'ome'
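For reference, this is how I looked at the attributes that the failing check reads. A minimal sketch, assuming the zarr-python v3 API (zarr.open_group, group.metadata.zarr_format, group.attrs) and that the multiscale element ends up under images/blobs_multiscale_image in the store; the exact path may differ:

import zarr

# Open the group of the written multiscale element
# (the path inside the store is an assumption).
group = zarr.open_group("sdata.zarr/images/blobs_multiscale_image", mode="r")

# The failing check expects an "ome" key holding the "multiscales"
# metadata whenever the group has zarr_format 3.
print(group.metadata.zarr_format)
print(dict(group.attrs))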
Minimal example to reproduce:
import os
from spatialdata.models import Image2DModel
from spatialdata.datasets import blobs
import dask

with dask.config.set(scheduler='processes'):
    sdata = blobs()
    sdata = sdata.subset(element_names=["blobs_image"])
    sdata.write(os.path.join(os.environ.get("TMPDIR"), "sdata.zarr"), overwrite=True)  # this works
    print("done")

    sdata = blobs()
    sdata = sdata.subset(element_names=["blobs_multiscale_image"])
    sdata.write(os.path.join(os.environ.get("TMPDIR"), "sdata.zarr"), overwrite=True)
    # this fails with
    # --> 120 if group.metadata.zarr_format == 3 and len(multiscales := group.metadata.attributes["ome"]["multiscales"]) != 1:
    #     121     len_scales = len(multiscales)
    #     122     raise ValueError(f"The length of multiscales metadata should be 1, found the length to be {len_scales}")
    # KeyError: 'ome'

with dask.config.set(scheduler='threads'):
    sdata = blobs()
    sdata = sdata.subset(element_names=["blobs_image"])
    sdata.write(os.path.join(os.environ.get("TMPDIR"), "sdata.zarr"), overwrite=True)
    print("done")

    sdata = blobs()
    sdata = sdata.subset(element_names=["blobs_multiscale_image"])
    sdata.write(os.path.join(os.environ.get("TMPDIR"), "sdata.zarr"), overwrite=True)
    print("done")

I tested this on macOS and CentOS and got the same error message.
I am working with spatialdata==0.6.0, spatial_image==1.2.3, and multiscale_spatial_image==2.0.3.
I also noticed that, for images and labels, the layer in the dask graph that starts with "from-zarr-" is no longer materialized when the SpatialData object is backed by a zarr store (zarr>=3). This is probably unrelated to the current issue, but I found it a bit strange, since this was not the case in earlier versions of spatialdata (<0.5.0), see https://github.com/saeyslab/harpy/blob/609d639c7578a4c64c3eede4c974d4e90f982910/src/harpy/_tests/test_image/test_manager.py#L47
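For completeness, this is roughly how I check for that. A minimal sketch, assuming spatialdata.read_zarr for reading the store back, indexing the SpatialData object by element name, and dask's HighLevelGraph layer API (graph.layers, Layer.is_materialized()); the store path and element name are just the ones from the example above:

import spatialdata

# Read the object back so the image element is backed by the zarr store.
sdata = spatialdata.read_zarr("sdata.zarr")

# The single-scale image element wraps a dask array.
darr = sdata["blobs_image"].data

# Walk the high-level graph and report whether the "from-zarr-" layer is materialized.
graph = darr.__dask_graph__()
for name, layer in graph.layers.items():
    if name.startswith("from-zarr-"):
        print(name, layer.is_materialized())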