
Validation doesn't catch length mismatch between chunks and shape in .zarray #51

Description

@LucaMarconato

I discovered this issue when reporting this other one: seung-lab/igneous#199.

In the steps below I will download a dataset that is valid and then corrupt it by modifying the `.zarray` metadata in `labels/nuclei`, so that `chunks` and `shape` in `.zarray` have different lengths. This leads to parsing errors in ome-zarr-py and ngff-zarr, while ome-ngff-validator still reports that the dataset is valid.
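The missing check is small. Here is a minimal sketch of it, assuming the Zarr v2 `.zarray` layout in which `shape` and `chunks` are both JSON lists with one entry per array dimension. `chunk_shape_errors` is a hypothetical helper written for illustration. It is not code from ome-ngff-validator, ngff-zarr or ome-zarr-py.

```python
def chunk_shape_errors(zarray: dict) -> list[str]:
    """Return problems with the shape/chunks pair of a parsed Zarr v2 .zarray dict.

    Hypothetical helper for illustration only; not an API of any tool in this issue.
    """
    shape = zarray.get("shape")
    chunks = zarray.get("chunks")
    # Both keys must be present and be JSON arrays (loaded as Python lists)
    if not isinstance(shape, list) or not isinstance(chunks, list):
        return ["'shape' and 'chunks' must both be lists"]
    errors = []
    # One chunk length per array dimension: the counts must match
    if len(chunks) != len(shape):
        errors.append(f"'chunks' has {len(chunks)} entries but 'shape' has {len(shape)}")
    return errors


# Made-up shape values; only the number of entries matters for this check
corrupted = {"shape": [1, 2160, 2560], "chunks": [1, 1, 1, 1080, 1280]}
print(chunk_shape_errors(corrupted))
# prints: ["'chunks' has 5 entries but 'shape' has 3"]
```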

To reproduce

Preparation

  1. Download the dataset described here.
  2. `cd 20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/0/labels/nuclei`
  3. Run `cat 0/.zarray`. You should see this: [screenshot of the `.zarray` contents]
  4. Run `http-server --cors`.
  5. Open https://ome.github.io/ome-ngff-validator/?source=http://127.0.0.1:8080. It should report that the dataset is valid.
  6. Run the following script (adjust the paths). It should run without errors, i.e. the data can be parsed with both ome-zarr-py and ngff-zarr.
```python
##
from pathlib import Path

# Adjust these paths to where the dataset was downloaded
image = Path(
    "/Users/macbook/ssd/biodata/ome-zarr/20200812"
    "-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/0"
)
labels = Path(
    "/Users/macbook/ssd/biodata/ome-zarr/20200812"
    "-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/0/labels/nuclei"
)
assert image.exists()
assert labels.exists()

##
import ngff_zarr as nz

# Parse the image and the label image with ngff-zarr
im = nz.from_ngff_zarr(image)
la = nz.from_ngff_zarr(labels)

##
from ome_zarr.io import parse_url
from ome_zarr.reader import Reader

# read the image data with ome-zarr-py
store = parse_url(image, mode="r").store

reader = Reader(parse_url(image))
# nodes may include images, labels etc.
nodes = list(reader())
# first node will be the image pixel data
image_node = nodes[0]

dask_data = image_node.data
```
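Before corrupting anything, it can help to confirm that every array in the downloaded data is consistent. This is a minimal sketch, assuming a Zarr v2 store on local disk where each array has a `.zarray` JSON file, and using only the standard library. `find_mismatched_arrays` is a hypothetical helper, not part of any of the tools above.

```python
import json
from pathlib import Path


def find_mismatched_arrays(root: Path) -> dict[str, tuple[int, int]]:
    """Map each .zarray under root whose chunks/shape lengths differ to (len(chunks), len(shape)).

    Hypothetical helper for illustration; assumes a Zarr v2 directory store.
    """
    mismatches = {}
    # pathlib's rglob also matches dot-files such as ".zarray"
    for zarray_path in sorted(root.rglob(".zarray")):
        meta = json.loads(zarray_path.read_text())
        n_chunks, n_shape = len(meta["chunks"]), len(meta["shape"])
        if n_chunks != n_shape:
            # Report paths relative to the scanned root for readability
            mismatches[str(zarray_path.relative_to(root))] = (n_chunks, n_shape)
    return mismatches
```

With the paths from the script, `find_mismatched_arrays(image)` should return an empty dict on the unmodified download. After the edit described below, it should list `labels/nuclei/0/.zarray`.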

The bug

  1. Now open `0/.zarray` in a text editor and change

```
    "chunks": [
        1,
        1080,
        1280
    ],
```

into

```
    "chunks": [
        1,
        1,
        1,
        1080,
        1280
    ],
```

  2. ome-ngff-validator will still report that the data is valid.
  3. Instead, the script above will fail on `la = nz.from_ngff_zarr(labels)` (i.e. ngff-zarr can't parse the data). If you comment out that line, the script will fail on `nodes = list(reader())`, i.e. ome-zarr-py can't parse the data either.
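The manual edit in step 1 can also be scripted, which makes the reproduction repeatable. This is a minimal sketch using only the standard library. It assumes the `.zarray` is plain JSON as shown above. `prepend_singleton_chunks` is a hypothetical helper that adds extra leading `1` entries to `chunks` while leaving `shape` untouched.

```python
import json
from pathlib import Path


def prepend_singleton_chunks(zarray_path: Path, n_extra: int = 2) -> dict:
    """Corrupt a Zarr v2 .zarray by adding n_extra leading 1s to "chunks" only.

    Hypothetical helper for illustration. It overwrites the file, so run it on a copy.
    """
    meta = json.loads(zarray_path.read_text())
    # [1, 1080, 1280] becomes [1, 1, 1, 1080, 1280] with the default n_extra=2
    meta["chunks"] = [1] * n_extra + list(meta["chunks"])
    # "shape" is left as-is, so its length no longer matches "chunks"
    zarray_path.write_text(json.dumps(meta, indent=4))
    return meta
```

For example, `prepend_singleton_chunks(labels / "0" / ".zarray")` with `labels` from the script reproduces the edit above.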
