ZEP10: Generic extensions proposal #67

joshmoore · 2025-05-16T14:43:08Z

This is a follow-on to ZEP9 (#65). #66 limits the scope of ZEP9 solely to phase 1 so that it can be moved to accepted (zarr-developers/zarr-specs#330 is merged and v3.1 released). This ZEP is equivalent to phase 2 of the original ZEP9 draft and introduces a top-level generic extensions field.

This ZEP will follow the process laid out in ZEP0 and invites votes from the newly refreshed @zarr-developers/implementation-council. This PR may be proactively merged as a draft, but will not be moved to "accepted" until the related PR on zarr-specs is voted on, merged, and v3.2 released.

Please see zarr-developers/zarr-specs#344 for detailed changes.

alimanfoo · 2025-05-16T17:33:25Z

Hi @joshmoore, just a process question, it would seem beneficial to get this PR merged asap so it becomes visible as a draft zep on the zeps website. Who needs to approve that, and what checks would need to be done at this stage to allow merging? E.g., does someone just need to check that the document has the right structure for a ZEP? If so, I'd be happy to approve.


jbms · 2025-05-16T18:14:07Z

> Hi @joshmoore, just a process question, it would seem beneficial to get this PR merged asap so it becomes visible as a draft zep on the zeps website. Who needs to approve that, and what checks would need to be done at this stage to allow merging? E.g., does someone just need to check that the document has the right structure for a ZEP? If so, I'd be happy to approve.

I know we've done that in the past for ZEPs but then it is actually harder to comment on it --- I'd need to open a separate issue for each comment.


joshmoore · 2025-05-16T18:47:24Z

> @alimanfoo a process question, it would seem beneficial to get this PR merged asap so it becomes visible as a draft zep on the zeps website. Who needs to approve that, and what checks would need to be done at this stage to allow merging? E.g., does someone just need to check that the document has the right structure for a ZEP? If so, I'd be happy to approve.

For merging as a "Draft", yes, that suffices. From https://zarr.dev/zeps/active/ZEP0000.html#submitting-a-zep:

"...The Zarr Steering Council and the Zarr Implementations Council will not unreasonably deny publication of a ZEP. Reasons for denying ZEP include duplication of effort, being technically unsound, not providing proper motivation or addressing backwards compatibility, or not taking care of Zarr CODE OF CONDUCT."

> @jbms I know we've done that in the past for ZEPs but then it is actually harder to comment on it --- I'd need to open a separate issue for each comment.

I'm certainly all for leaving it open for a bit, especially for the discussion of the material that is only here (as @jbms has done above). I can manage having it open and synchronizing with the specs PR. That being said, if possible, I'd like to get it merged as a "Draft" and then will also keep updating it as necessary to stay in step with discussions on zarr-developers/zarr-specs#344

d-v-b · 2025-05-16T18:49:38Z

> Hi @joshmoore, just a process question, it would seem beneficial to get this PR merged asap so it becomes visible as a draft zep on the zeps website. Who needs to approve that, and what checks would need to be done at this stage to allow merging? E.g., does someone just need to check that the document has the right structure for a ZEP? If so, I'd be happy to approve.

seconding @jbms, I rate the ability to discuss the ZEP as a single PR much higher than seeing it listed on the ZEP web site, so I would rather we keep this PR open until it's clear that all the questions have been answered.


d-v-b · 2025-05-16T19:17:05Z

draft/ZEP0010.md

+Note that in this example of the extension is ``must_understand=true`` meaning
+an implementation which does not support the ``example.offset`` extension
+should raise an error.


when should that error be raised? when reading metadata, or when reading chunks?

If the impl doesn't know the example.offset extension, it must fail when parsing the metadata.
It may fail with an out-of-bounds error when reading/writing data outside the domain. But that would be up to the specification for this extension to define.

> If the impl doesn't know the example.offset extension, it must fail when parsing the metadata.

It seems to me that a zarr-compatible application should be able to say, for example, "this is an array with shape <shape>, but I can't load chunks for you because of <unknown extension>". Your suggestion that the metadata document should be effectively unreadable prevents this.

> It seems to me that a zarr-compatible application should be able to say, for example, "this is an array with shape <shape>, but I can't load chunks for you because of <unknown extension>".

I think that would be a good implementation.

> I think that would be a good implementation.

Since the behavior I described relies on reading the metadata without an error, this PR should clarify the distinction between reading metadata documents and other IO operations (e.g., reading chunks, in this example).
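A minimal sketch of the distinction being asked for, not taken from any existing Zarr implementation. `ArrayView`, `UnknownExtensionError`, and `SUPPORTED_EXTENSIONS` are hypothetical names, and the extension objects are assumed to carry `name` and `must_understand` keys as in the draft's example. Metadata parsing succeeds and the shape stays inspectable, while chunk access fails and names the unknown extension.

```python
import json

# Hypothetical: extension names this toy implementation can handle (none).
SUPPORTED_EXTENSIONS = set()


class UnknownExtensionError(RuntimeError):
    """Raised when an operation needs an extension we do not implement."""


class ArrayView:
    """Toy array handle that separates metadata access from chunk IO."""

    def __init__(self, metadata_json):
        self.metadata = json.loads(metadata_json)
        # Record, but do not yet fail on, unknown must_understand extensions.
        # Assumption: an absent must_understand is treated as true.
        self.blocking = [
            ext["name"]
            for ext in self.metadata.get("extensions", [])
            if ext.get("must_understand", True)
            and ext["name"] not in SUPPORTED_EXTENSIONS
        ]

    @property
    def shape(self):
        return tuple(self.metadata["shape"])

    def read_chunk(self, coords):
        if self.blocking:
            raise UnknownExtensionError(
                f"cannot load chunk {coords}: unknown extension(s) {self.blocking}"
            )
        raise NotImplementedError("chunk IO is out of scope for this sketch")


doc = json.dumps({
    "zarr_format": 3,
    "node_type": "array",
    "shape": [100, 100],
    "extensions": [{"name": "example.offset", "must_understand": True}],
})
view = ArrayView(doc)
print(view.shape)  # metadata is still readable
try:
    view.read_chunk((0, 0))
except UnknownExtensionError as err:
    print(err)  # chunk IO is refused
```

Whether the spec should permit this deferred failure is exactly the open question in this thread; the sketch only shows that the two failure points can be separated.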

If you are purely displaying information to a user and including a warning that an unknown extension was encountered, then displaying whatever information can be heuristically extracted from the metadata successfully may be reasonable.

In general though if there is an unknown extension, you can't really make any assumptions about the meaning of the metadata and any programmatic use is problematic.

For example, the offset extension may mean that the upper bound of the array is no longer indicated by shape but by offset + shape, and the chunk grid starts at offset rather than (0, ...). Maybe there is some program that partitions zarr arrays according to the chunking and then hands off those zarr arrays to worker processes. If the partition program does not support the offset extension, but the worker program does support the offset extension, then the partition program will perform the partitioning incorrectly, but the worker processes may process them without errors, but not correctly aligned to the chunk grid.
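To make the misalignment concrete, here is a small 1-D worked example. It assumes the hypothetical semantics described above (the domain becomes `[offset, offset + shape)` and the chunk grid starts at `offset`); the actual `example.offset` extension is only an illustration in the draft and defines none of this.

```python
def chunk_intervals(shape, chunk, offset=0):
    """Half-open chunk intervals of a 1-D array whose grid starts at `offset`."""
    lower, upper = offset, offset + shape
    return [
        (start, min(start + chunk, upper))
        for start in range(lower, upper, chunk)
    ]


# Partitioner that does not know the extension: it assumes the grid starts at 0.
naive = chunk_intervals(shape=10, chunk=4)
# Worker that honours the (assumed) offset of 3.
aware = chunk_intervals(shape=10, chunk=4, offset=3)

print(naive)  # [(0, 4), (4, 8), (8, 10)]
print(aware)  # [(3, 7), (7, 11), (11, 13)]
```

None of the naive partitions coincides with a real chunk, and the naive partitioning also covers `[0, 3)`, which lies outside the domain, while missing `[10, 13)`.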

Concretely, I'd say that if there is an unknown must_understand=true extension, zarr.open and similar interfaces should not appear to succeed and allow querying properties like the chunk grid, dtype, etc. unless the user explicitly opts into ignoring unknown extensions.
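A sketch of that behavior, under stated assumptions: `open_metadata` and its `ignore_unknown_extensions` flag are hypothetical and are not part of `zarr.open` or any other existing API. The point is only that the default path refuses to return usable metadata, and the user has to opt in explicitly to get past that.

```python
import json

# Hypothetical set of extension names this toy implementation understands.
SUPPORTED_EXTENSIONS = {"example.known"}


class UnsupportedExtensionError(ValueError):
    """An unknown extension with must_understand=true was encountered."""


def open_metadata(metadata_json, ignore_unknown_extensions=False):
    """Parse a metadata document, failing on unknown must_understand extensions."""
    metadata = json.loads(metadata_json)
    unknown = [
        ext["name"]
        for ext in metadata.get("extensions", [])
        # Assumption: an absent must_understand is treated as true.
        if ext.get("must_understand", True)
        and ext["name"] not in SUPPORTED_EXTENSIONS
    ]
    if unknown and not ignore_unknown_extensions:
        raise UnsupportedExtensionError(
            f"unknown must_understand extension(s): {unknown}"
        )
    return metadata


doc = json.dumps({
    "shape": [4],
    "extensions": [{"name": "example.offset", "must_understand": True}],
})

try:
    open_metadata(doc)
except UnsupportedExtensionError as err:
    print("strict open failed:", err)

# Explicit opt-in: the caller accepts that the metadata may be misinterpreted.
print(open_metadata(doc, ignore_unknown_extensions=True)["shape"])
```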

> In general though if there is an unknown extension, you can't really make any assumptions about the meaning of the metadata and any programmatic use is problematic.

I find this outcome concerning, as it amounts to fragmenting the zarr ecosystem.


d-v-b · 2025-05-16T19:20:03Z

I think this document should explain why the pre-existing attributes field is insufficient for the purposes of this ZEP.


jbms · 2025-05-16T19:35:14Z

> I think this document should explain why the pre-existing attributes field is insufficient for the purposes of this ZEP.

For must_understand: true extensions, like specifying the array content inline, transposing the array, etc. an attribute would definitely not work. However, all of the examples given would work as attributes reasonably well.


d-v-b · 2025-05-22T19:18:13Z

> > I think this document should explain why the pre-existing attributes field is insufficient for the purposes of this ZEP.

> For must_understand: true extensions, like specifying the array content inline, transposing the array, etc. an attribute would definitely not work. However, all of the examples given would work as attributes reasonably well.

to be clear, the specific thing that would not work if all extensions were in attributes is that we could not prevent non-compliant implementations from accessing data. Extension-compliant implementations on the other hand would have no trouble reading extensions from attributes.

This makes me wonder: how important is it really to exclude non-compliant implementations from accessing (and possibly misinterpreting) data? I.e., how much weight should we assign to this feature? Are there real examples of negative outcomes from misinterpreting specialized zarr data? Or is this purely hypothetical?
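The gating difference can be sketched as follows. This is an illustration only: `readable` stands in for whatever check an extension-unaware reader performs, and the `offset` payload and its placement under `attributes` or `configuration` are hypothetical.

```python
SUPPORTED_EXTENSIONS = set()  # this toy reader knows no extensions at all


def readable(metadata):
    """Return True if an extension-unaware reader would go on to read data."""
    return not any(
        # Assumption: an absent must_understand is treated as true.
        ext.get("must_understand", True)
        and ext["name"] not in SUPPORTED_EXTENSIONS
        for ext in metadata.get("extensions", [])
    )


# The same information, stored two ways.
as_attribute = {"shape": [10], "attributes": {"example.offset": [3]}}
as_extension = {
    "shape": [10],
    "extensions": [
        {
            "name": "example.offset",
            "must_understand": True,
            "configuration": {"offset": [3]},
        }
    ],
}

print(readable(as_attribute))  # True: the offset is silently ignored
print(readable(as_extension))  # False: the reader stops before touching data
```

An extension-aware reader could interpret either form; only the second one stops a reader that is not aware.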

joshmoore · 2025-05-24T06:24:25Z

Thanks for the feedback, all. I've pushed a number of clarification commits based on them, and tried to resolve the threads appropriately. I have ideas on further examples (esp. encryption as recently discussed on Zulip), but I'd very much welcome any others that may be floating around (as PRs, comments, etc.)

There are a few remaining conversations:

- extensions array vs. one of the alternatives
- clarifying boundaries between attributes and extensions usage
- advanced must_understand semantics

These may be easier on a call to work toward consensus rather than extended back and forth here. Since the previous ZEP meeting spot was cancelled, I'd suggest we start with a one-off. Finding time this coming week (May 26+) may be difficult but two options are:

- June 2, 2025 – 20:00–21:00 CEST
- June 4, 2025 – 20:00–21:00 CEST

I'd still also like to encourage other implementer voices, @zarr-developers/implementation-council. To ensure everyone feels comfortable contributing, it might be helpful for those who have already shared their perspective to give others space to chime in without feeling the need to immediately respond or defend their points.

LDeakin · 2025-05-25T00:01:19Z

Thanks Josh and Norman, this looks pretty great! My thoughts based on the PR and comments so far:

- An extension array seems the most flexible, as it permits ordered / repeated extensions
- The distinction between extensions and attributes is certainly not as simple as automated vs human. The way I see it:
  - An extension may affect chunk operations, metadata parsing, store operations, and other fundamental Zarr functionality in unforeseen ways. Such an extension requires support from a Zarr implementation like zarr-python, tensorstore, zarrs, etc.
  - An attribute may change how data/metadata is interpreted, and would be the responsibility of downstream libraries (like ome-zarr-py), but implementations could support them too. These would fit under the banner of ZEP04.
- I'm conflicted on isolating reading/writing with must_understand, but leaning to keeping it true/false because
  - must_understand: true/false is backwards-compatible
  - must_understand: true clearly means an implementation must support reading/writing
  - must_understand: false in an extension implies to me that an implementation should support reading, but should not write unless it is actually aware of the extension and knows that it is okay to do so
    - An extension that never needs to be understood for reading or writing seems like it should just be an attribute
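One possible reading of the must_understand bullets above, sketched as a small decision function. The function name and the `supported` argument are hypothetical; the rule encoded is that writing always requires understanding every extension, while reading is blocked only by unknown extensions with must_understand true.

```python
def allowed_operations(extensions, supported):
    """Return (can_read, can_write) for a node with the given extensions."""
    can_read = True
    can_write = True
    for ext in extensions:
        if ext["name"] in supported:
            continue
        # Unknown extension: never write, whatever must_understand says.
        can_write = False
        # Assumption: an absent must_understand is treated as true.
        if ext.get("must_understand", True):
            can_read = False
    return can_read, can_write


optional = [{"name": "example.hint", "must_understand": False}]
required = [{"name": "example.offset", "must_understand": True}]

print(allowed_operations(optional, supported=set()))  # (True, False)
print(allowed_operations(required, supported=set()))  # (False, False)
print(allowed_operations(required, supported={"example.offset"}))  # (True, True)
```

`example.hint` is a made-up name used only to have an optional extension in the example.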

jbms · 2025-05-25T05:01:57Z

Can someone give an example of how order-dependent extensions might be used/specified?

It seems to me that it would be potentially confusing to have most extensions be order-independent but in certain cases have order dependence. It also seems like it would be quite challenging to specify the order-dependent behavior unless you can map the extension to some composable "interface" like a codec or storage transformer. But if there is such a composable interface, the extension should just be defined as specifying a list of "things" that conform to that interface, and that interface becomes a new extension point, e.g. {"name": "my_new_interface_list", "configuration": {"list": [{"name": "my_new_interface_item", "configuration": {...}}...]}}
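A sketch of that pattern: the ordering lives inside a single extension's configuration, so the top-level extensions can stay order-independent. The names `my_new_interface_list` and `my_new_interface_item` come from the comment above; the `label` configuration key and the `ordered_items` helper are hypothetical.

```python
extension = {
    "name": "my_new_interface_list",
    "configuration": {
        "list": [
            {"name": "my_new_interface_item", "configuration": {"label": "first"}},
            {"name": "my_new_interface_item", "configuration": {"label": "second"}},
        ]
    },
}


def ordered_items(ext):
    """Return (name, configuration) pairs in the order the list declares them."""
    return [
        (item["name"], item.get("configuration", {}))
        for item in ext["configuration"]["list"]
    ]


for name, config in ordered_items(extension):
    print(name, config)
```

Repeated items are also expressible this way, since list entries need not be unique.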


maxrjones · 2025-05-26T21:50:10Z

draft/ZEP0010.md

+    ...,
+    "extensions": [
+        {
+            "name": "example.offset",


Is the example part of this name meaningful? I think it would be useful to either define in this document what the naming conventions are or link to the relevant external convention.

Sorry, that's from that "Extensions naming" section added by ZEP9.


maxrjones · 2025-05-26T22:41:02Z

Is there anything in this proposal that motivates its restriction to Zarr specification 3 rather than both Zarr specification 2 and 3?

normanrz · 2025-05-27T07:09:29Z

> Is there anything in this proposal that motivates its restriction to Zarr specification 3 rather than both Zarr specification 2 and 3?

At least from my pov, there is no desire to further evolve the v2 specification. Extensibility was one major motivation of the v3 specification. I think it would be confusing to continue evolving both.
In most cases, v2 data can be upgraded to v3 with metadata-only updates.

joshmoore · 2025-05-27T14:18:53Z

> jbms (Jeremy Maitin-Shepard) 2 days ago
> Can someone give an example of how order-dependent extensions might be used/specified?

I've not come up with a compelling one, @jbms. My intuition is that there would be a chance for one member of the pipeline (extA) to be able to update some state (the metadata?) before a later one (extB). Practically, though, I don't see how extA could know enough about extB to inject itself into the list at the right point. So other than "high-priority" extensions which add themselves at the beginning and "low-priority" ones which add themselves at the end, I still don't have a concrete example.

But, generally, 👍 on a general "SequenceExtension" style that others can adopt. Perhaps this speaks to a "generic extensions conventions" (or "idioms") section.

jbms · 2025-05-27T18:07:34Z

I agree with what @LDeakin said about must_understand --- must understand for writing should always implicitly be true and must_understand applies only for reading. That simplifies things nicely.

I am not in favor of using extensions as an attribute namespace for things like ome-zarr that are logically layered on top of zarr itself and don't require changes/deep integration with the zarr implementation, for several reasons:

- zarr implementations intended for general use will have to provide an API for users to directly read and write extensions metadata very similar to attributes
- we would need to complicate must_understand to indicate reading and writing separately.
- it muddies the distinction between things that change the core zarr model itself and things purely layered on top.

Instead we should add an attribute section to the zarr-extensions repo (related to the zarr conventions proposal). While technically this is a breaking change in that currently no part of the attribute namespace is reserved for registered names, in practice we could use some prefix or other naming convention for registered attributes such that conflicts with existing uses are very unlikely.

As I see it, extensions do have a high cost as far as fragmenting the ecosystem and therefore should be introduced with care, mostly for things that could reasonably be added to the core spec also.


joshmoore · 2025-06-18T15:39:29Z

As mentioned in zarr-developers/zarr-specs#344 (comment), since we didn’t manage to have an in-person discussion about any of the above topics and we’re now running into holiday times, the timeline for a vote has been put on hold. Here I want to summarize the main decisions and raise a few lingering questions.

1. My understanding from the must_understand conversations above (0, 1, 2) is that the text of the ZEP should be updated to express that must_understand is always true for writing, e.g., an implementation will not modify a zarr.json or any of the contents of the node if an extension is active that it does not understand. There are some questions about precisely when to raise exceptions on must_understand=True (for reading) that need to be worked through.
   (tl;dr: modify draft)
2. There’s a proposal on the table to use the extensions repo for attributes as well. Since this ZEP10 is focused on adding an element, my preference would be to not handle that here. I think that needs to have its own discussion, perhaps with a new champion who also drives the decision on ZEP4.
   (tl;dr: defer to ZEP4 or a follow-on ZEP)
3. Finally and most importantly, there’s a proposal on the table to move the extensions to the top-level.
   1. A benefit of the array representation is that we can continue to use the same definition of the extension object from https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#extension-definition . By moving to the top-level, we would no longer be able to do this. Are we generally ok with having differing schemas for these objects? (meaning that section would need to be updated to explain this) Or does that point to these objects actually being of a different type?
   2. If we use a top-level object, we will need a mechanism to prevent top-level conflicts. Shall we prefix all extensions with ext: (or similar)? (This is essentially what the "extensions" keyword is doing) Without some clear naming mechanism, we run the risk of pushing the conflict avoidance onto the zarr-extensions managers.
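To compare the two layouts side by side, here is a rough sketch. The `ext:` prefix is the one floated above; the shape of the prefixed objects (a `configuration` key and no `name` field) is an assumption, since the thread has not settled it, and the helper names are hypothetical.

```python
PREFIX = "ext:"  # hypothetical prefix, as floated in the comment above


def from_array(metadata):
    """Layout A: a single "extensions" array of extension objects."""
    return [
        (ext["name"], ext.get("configuration", {}))
        for ext in metadata.get("extensions", [])
    ]


def from_prefixed_keys(metadata):
    """Layout B: one top-level key per extension, marked by a prefix."""
    return [
        (key[len(PREFIX):], value.get("configuration", {}))
        for key, value in metadata.items()
        if key.startswith(PREFIX)
    ]


layout_a = {
    "shape": [10],
    "extensions": [
        {"name": "example.offset", "configuration": {"offset": [3]}}
    ],
}
layout_b = {
    "shape": [10],
    "ext:example.offset": {"configuration": {"offset": [3]}},
}

print(from_array(layout_a))
print(from_prefixed_keys(layout_b))
```

In layout B the name moves into the key, which is the schema divergence raised in point 1. Layout B also cannot reliably express the same extension twice, because a JSON object should not repeat a key.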

d-v-b · 2025-06-18T15:48:29Z

one thing that is missing for me is a clear demonstration of an individual or group who needs the functionality added in this ZEP. As in: what, in concrete terms, is blocked by the lack of this feature? Alternatively, what onerous thing are we doing today, that we could simplify with this proposal? The PR does contain examples but they are contrived. I would appreciate real examples, i.e. references to real projects or code.

If there are none, then I guess that's fine, but I would find the contents of this proposal 100000 times easier to think about if I could see it as a solution to an acute problem.

joshmoore · 2025-06-24T13:26:30Z

Thanks for the 👍, @LDeakin, but I probably put too much in one comment 😄 So for clarity, let me spell out point 3 in an additional comment to get reactions to it:

Is there support for updating the text of ZEP10 to:

- remove the extensions object;
- add a prefix to keys representing such objects; and
- clarify their type since it differs from what is written in ZEP9?

cc: @jbms

joshmoore added 8 commits May 12, 2025 21:53

Copy of ZEP9

e83d6b4

Laptop WIP

7176586

Desktop WIP

ba35fd6

Prepare individual examples

e3dbe93

Explanations and cleanup

d41b161

Wrap up first draft

c255d3f

Apply Norman's feedback on the examples

db7ae37

Introduce 'generic extensions' nomenclature

48c319c

joshmoore mentioned this pull request May 16, 2025

ZEP10: Generic extensions (v3.2 spec changes) zarr-developers/zarr-specs#344

Open

Update spec PR number

ffda708

jbms reviewed May 16, 2025

jbms reviewed May 16, 2025

jbms reviewed May 16, 2025

d-v-b reviewed May 16, 2025

d-v-b reviewed May 16, 2025

d-v-b reviewed May 16, 2025

d-v-b reviewed May 16, 2025

d-v-b reviewed May 16, 2025

d-v-b reviewed May 16, 2025

d-v-b reviewed May 16, 2025

d-v-b reviewed May 16, 2025

d-v-b reviewed May 16, 2025

d-v-b reviewed May 16, 2025

joshmoore and others added 2 commits May 16, 2025 16:07

Update ZEP0010.md

f4dea6f

Co-authored-by: Davis Bennett

Update draft/ZEP0010.md

f4b3b2a

Co-authored-by: Davis Bennett

LDeakin mentioned this pull request May 16, 2025

Tracking issue for Zarr spec, ZEPs, and extensions support zarrs/zarrs#191

Open

normanrz reviewed May 22, 2025


joshmoore added 6 commits May 24, 2025 07:40

Intro clarification

b82d3d4

Remove opengl ref and clarify processing

a1495bb

Update multiscales example

dcc2246

Replace dataset with node

547c501

Clarify sub-nodes

dfe741b

Move extensions v attributes to the top

35087ba

maxrjones reviewed May 26, 2025


maxrjones mentioned this pull request May 27, 2025

RFC: Reformat GeoZarr as a registration of Zarr translations of well-supported open standards and extensions zarr-developers/geozarr-spec#67

Draft

9 tasks

sanketverma1704 reviewed May 28, 2025


Update draft/ZEP0010.md

0cad1d1

Co-authored-by: Sanket Verma
