I discovered this issue when reporting this other one: seung-lab/igneous#199.
In the steps below I download a valid dataset and then corrupt it by editing the .zarray metadata in labels/nuclei, so that chunks and shape in .zarray have different lengths. The corrupted data causes a parsing error in both ome-zarr-py and ngff-zarr, while ome-ngff-validator still reports the dataset as valid.
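The mismatch described above can be detected with a simple rank comparison. The following is a minimal sketch, assuming Zarr v2 metadata in which .zarray is a JSON file with "shape" and "chunks" lists. The function names are hypothetical and are not part of ome-ngff-validator, ome-zarr-py or ngff-zarr.

```python
import json
from pathlib import Path


def chunk_shape_rank_mismatch(zarray_path):
    """Return (len(shape), len(chunks)) if the two differ, else None.

    Hypothetical helper: assumes a Zarr v2 .zarray JSON file that
    contains both a "shape" and a "chunks" list.
    """
    meta = json.loads(Path(zarray_path).read_text())
    shape, chunks = meta["shape"], meta["chunks"]
    if len(shape) != len(chunks):
        return len(shape), len(chunks)
    return None


def find_rank_mismatches(root):
    """Check every .zarray file below root and collect the inconsistent ones."""
    problems = {}
    for zarray in sorted(Path(root).rglob(".zarray")):
        mismatch = chunk_shape_rank_mismatch(zarray)
        if mismatch is not None:
            problems[str(zarray)] = mismatch
    return problems
```

Calling find_rank_mismatches on the labels/nuclei directory should return an empty dict for the original dataset and a non-empty one after the edit described in "The bug".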
To reproduce
Preparation
- Download the dataset described here.
- cd 20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/0/labels/nuclei
- Run cat 0/.zarray; you should see this:
- Run http-server --cors
- Open https://ome.github.io/ome-ngff-validator/?source=http://127.0.0.1:8080. It should report that the dataset is valid.
- Run the following script (adjust the paths). It should run without errors, i.e. the data can be parsed with both ome-zarr-py and ngff-zarr:
##
from pathlib import Path
image = Path(
"/Users/macbook/ssd/biodata/ome-zarr/20200812"
"-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/0"
)
labels = Path(
"/Users/macbook/ssd/biodata/ome-zarr/20200812"
"-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/0/labels/nuclei"
)
assert image.exists()
assert labels.exists()
##
import ngff_zarr as nz
im = nz.from_ngff_zarr(image)
la = nz.from_ngff_zarr(labels)
##
from ome_zarr.io import parse_url
from ome_zarr.reader import Reader
# read the image data
store = parse_url(image, mode="r").store
reader = Reader(parse_url(image))
# nodes may include images, labels etc
nodes = list(reader())
# first node will be the image pixel data
image_node = nodes[0]
dask_data = image_node.data
The bug
- Now open the .zarray file in a text editor and change
"chunks": [
1,
1080,
1280
],
into
"chunks": [
1,
1,
1,
1080,
1280
],
- ome-ngff-validator will still think that the data is valid.
- The script above, however, will fail on la = nz.from_ngff_zarr(labels), i.e. ngff-zarr can't parse the data. If you comment out that line, the script will fail on nodes = list(reader()), i.e. ome-zarr-py can't parse the data either.
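The manual edit above can also be scripted, which makes the corruption easier to repeat. This is a minimal sketch, assuming .zarray is plain JSON with a "chunks" list as shown above. The helper name and the 0/.zarray path in the usage note are assumptions, not part of any of the libraries mentioned.

```python
import json
from pathlib import Path


def prepend_chunk_dims(zarray_path, extra=(1, 1)):
    """Prepend extra entries to "chunks" in a .zarray file (hypothetical helper).

    With the default extra=(1, 1), chunks [1, 1080, 1280] becomes
    [1, 1, 1, 1080, 1280], while "shape" is left untouched.
    """
    path = Path(zarray_path)
    meta = json.loads(path.read_text())
    meta["chunks"] = list(extra) + list(meta["chunks"])
    # Overwrite the file in place; keep a backup if the data matters.
    path.write_text(json.dumps(meta, indent=4))
    return meta
```

For example, running prepend_chunk_dims("0/.zarray") from inside labels/nuclei should reproduce the edit shown above, assuming that is the file being modified.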