From e83d6b46c2b60bd5461652f7f8e52f35543735dc Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Mon, 12 May 2025 21:53:12 +0200 Subject: [PATCH 01/18] Copy of ZEP9 --- draft/ZEP0010.md | 746 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 746 insertions(+) create mode 100644 draft/ZEP0010.md diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md new file mode 100644 index 0000000..e458785 --- /dev/null +++ b/draft/ZEP0010.md @@ -0,0 +1,746 @@ +--- +layout: default +title: ZEP010 +description: This ZEP proposes a new top-level extensions container object. +parent: draft ZEPs +nav_order: 10 +--- + +# ZEP 10 — Zarr Extensions Container + +Authors: + +- [Norman Rzepka](https://github.com/normanrz), scalable minds +- [Josh Moore](https://github.com/joshmoore), German BioImaging + +Status: Draft + +Type: Specification + +Created: 2025-05-12 + +## Abstract + +The Zarr v3 specification is unclear on several matters regarding the +definition of extensions in the `zarr.json` metadata. This proposal defines +immediate clarifications for the specification which will allow the centralized +creation of "raw" names as well as the decentralized use of URIs for +extensions. At the same time, we propose a longer-term strategy for evolving +extensions and extension points for discussion and potential approval by the +community to be rolled out to implementations at a future time. + +## Introduction + +The intention of the Zarr v3 spec was to provide a framework where +community-driven extensions can be created and managed by the community itself. +However, the process for that was unclear and, therefore, implicitly linked to +the ZEP process. Issues have been surfaced by users, who wanted to create and +use new extensions. 
For example, users wanted to +[create new codecs](https://github.com/zarr-developers/zarr-specs/pull/256), +[data types](https://github.com/zarr-developers/zarr-specs/pull/257) and +[an extension for consolidated metadata](https://github.com/zarr-developers/zarr-specs/pull/309). +We thank the members of the community for their efforts in discovering and communicating +these issues. + +The underlying problem is that the extension mechanism for v3 lacks detail in +its definition. While several extension points are defined in the core spec, +there is no advice about selecting names for these extensions or how naming +conflicts are avoided. Additionally, there is no mechanism for defining new +extensions that do not fit into the existing extension points. In fact, there +are contradictions in what is entirely permissible, including for example +whether codecs are within or required by the core v3 spec. + +Implementations have started to use the v3 spec and are making use of +extensions (e.g `numcodecs.*` codecs and `zstd` in zarr-python), which means +that any changes made to the v3 spec need to be compatible with the current +reality in the community. + +## Proposal + +This proposal has three phases. + +The [first phase](#Phase-1-Immediate-clarifications) clarifies that extensions +can be created by the community without the ZEP process. Extensions with +URI-based names can be created without any further coordination and extensions +with raw names only need to get the name registered in the +[`zarr-extensions` Github repository](https://github.com/zarr-developers/zarr-extensions). +The process for registering will be a lightweight PR-based process. + +The naming mechanism as well as clarifications in the core spec documented will +be implemented by the ZSC through +[PR330](https://github.com/zarr-developers/zarr-specs/pull/330) in the zarr-specs repository for +discussion concurrently with this ZEP. 
The changes we are proposing, marked +with a "🛠️" below, are interpretations of the current Zarr v3 core spec as well +as additions that are in-line with the spec evolution policy of the Zarr core +spec. The Zarr core spec document will be bumped to version 3.1. The metadata +in the `zarr.json` files will remain unchanged `zarr_format=3`. + +The [second phase](#Phase-2-New-extension-points) defines new extension points +and therefore will follow the active ZEP process. The goal is to encourage more +exploration by the Zarr community outside of what's currently defined with the +core specification. + +The [third phase](#Phase-3-Future-name-evolution) defines a possible future +evolution of the naming strategy, largely as a justification for decisions made +in the first two phases. It is non-normative and can be further discussed in +future ZEPs. + +Revisions to ZEP 0 will be paused until we are clear about the extension +mechanism and the implications for the role of the ZEP process in the future. +The ZEP process remains in place for changes to the Zarr v3 core spec including +the adoption of extensions into the core spec. + +## Definitions + +- "Core" refers to the Zarr v3 core specification as defined in the document + and not + necessarily if something is a MUST for implementations. +- "Metadata" is the Zarr metadata as stored in `zarr.json` files for arrays and + groups. +- "Extensions" as used in this document are components used in the metadata to + define and configure how metadata are interpreted by implementations. These + components include codecs, data types, chunk key encodings, chunk grids and + storage transformers. +- "Extension points" are locations within metadata where extension-related + metadata can be found. Current extension points are listed in the core spec, + e.g. `codecs`, `data_type`. 
As part of the [second phase](#Phase-2-New-extension-points), + we intend to add another general-purpose extension point called [`extensions`](#Extension-points). +- "Extension maintainers" are a person or a team that is responsible for + creating and maintaining an extension. + +## Phase 1: Immediate clarifications + +### Extension naming + +🛠️ We propose defining two categories of names for immediate use by extensions: +raw names and URI-based. + +1. **Raw names** MUST be assigned within a central repository and follow the + compatibility and versioning [v3 stability + policy](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#stability-policy). + The name assignment is managed through the [`zarr-extensions` Github + repository](https://github.com/zarr-developers/zarr-extensions), where + each extension is listed and either contains a spec document or links to + a spec document. Names are never unassigned or reassigned. The ZSC or by + delegation a maintainer team reserves the right to refuse name assignment + at its own discretion. + + - **Example:** `zstd` + - **Acceptd regex:** `^[a-z0-9-_.]+$` + +2. **URI-based names** can be used by anyone without further coordination + though the assumption is that users reasonably "own" the URI. Users MAY + make use of a persistent redirecting URL like [PURL](https://purl.archive.org/). + URIs have been chosen due to their potential for being self-documenting and + *strongly recommend* that the URL SHOULD resolve to a human-readable explanation of the extension, but + implementations SHOULD NOT attempt to resolve the URL during processing. + There are no guarantees in terms of versioning or compatibility. However, + preserving backwards-compatibility is strongly encouraged. See the + [versioning section](#Versioning-and-spec-evolution) below. 
+ + - **Example:** `https://example.com/zarr3/consolidated-metadata` + - **Accepted regex:** `^https?:\/\/[^/?#]+[^?#]*$` + + +All names currently listed in the v3 specification are raw names. URIs were +mentioned in previous drafts of the v3 specification and are still referenced +under the codecs section: + +> "Each codec must be defined via a separate specification. In order to refer +> to codecs in array metadata documents, each codec must have a unique +> identifier, which is a URI that dereferences to a human-readable +> specification of the codec." +> (). + +🛠️ This proposal would downgrade the MUST requirement on the dereferencing of the +URI to a SHOULD and more widely specify that implementations MAY use URI-based +key names throughout the specification. (See +[community registry](#Community-registry) below for a proposal on coordinating the use of +URIs.) + +### Extension definitions + +Extensions are defined in the `zarr.json` metadata either as objects or as +short-hand names. + +🛠️ In order to unify the general design of future extensions we would add a +MUST requirement for future extension objects to adhere to the following +definition. + +Objects have the following keys: +```json +{ + "name": "", + "configuration": { ... } # optional +} +``` + +If such an object is present, the field `must_understand` is implicitly set to +`True`. Implementations which encounter a `must_understand` extension that they +have not implemented MUST raise an exception. +An extension object can explicitly set `must_understand=False` if +implementations can ignore its presence, following the current guidelines in +the v3 specification. + +Instead of extension objects, short-hand names MAY continue to be used if no +configuration metadata is required. They would be equivalent to extension +objects with just a `name` key. This is in-line with +[the current wording of the spec](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#id11).
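
The rules in "Extension definitions" could be sketched roughly as follows. This is only an illustration in Python, not part of the patch or of any existing library: the helper names `normalize_extension` and `select_supported` are hypothetical, and the `supported` set stands in for whatever registry an implementation actually keeps.

```python
from typing import Any, Dict, Iterable, List, Set, Union

# An extension is written either as a short-hand name or as an object.
ExtensionSpec = Union[str, Dict[str, Any]]


def normalize_extension(spec: ExtensionSpec) -> Dict[str, Any]:
    """Expand a short-hand name and make the implicit must_understand explicit."""
    if isinstance(spec, str):
        # A short-hand name is equivalent to an object with just a `name` key.
        return {"name": spec, "must_understand": True}
    if "name" not in spec:
        raise ValueError("extension object is missing the required 'name' key")
    normalized = dict(spec)
    # must_understand is implicitly true unless explicitly set to false.
    normalized.setdefault("must_understand", True)
    return normalized


def select_supported(
    specs: Iterable[ExtensionSpec], supported: Set[str]
) -> List[Dict[str, Any]]:
    """Return the extensions to apply, failing on unknown required ones."""
    usable = []
    for spec in specs:
        ext = normalize_extension(spec)
        if ext["name"] in supported:
            usable.append(ext)
        elif ext["must_understand"]:
            raise NotImplementedError(
                f"unsupported required extension: {ext['name']}"
            )
        # Unsupported extensions with must_understand=False are skipped.
    return usable
```

Normalizing first keeps the later check uniform: short-hand names and full objects go through the same `must_understand` test.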
+ +### Extension specifications + +There is no strict requirement for extensions to have a formal specification. +However, for adoption in the community it is STRONGLY RECOMMENDED to write a +specification. + +For an extension with a raw name, the +[zarr-extensions](https://github.com/zarr-developers/zarr-extensions) +repository SHOULD be used to either publish the specification +directly or link to another location which does so. For extensions with URI-based names, it +is RECOMMENDED to publish the specification under the URI of the extension. + +### Extension points + +The v3 core spec defines a number of extension points for arrays and groups +that can hold extensions following the above recommendations: + +- `data_type` (array only) +- `codecs` (array only) +- `chunk_grid` (array only) +- `chunk_key_encoding` (array only) +- `storage_transformers` (array only) // extendable to groups + +The Zarr v3 core spec was originally designed to not contain any extensions, +such as data types, codecs etc. However, over time and through changing +authorship, some extensions were, in fact, listed in +[the core spec document](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#data-types). + +🛠️ We propose to acknowledge that the Zarr v3 core spec includes a few +extensions that are expected to be supported by all implementations and fix the +wording in the core spec document accordingly. Wording in each section will be +updated to clarify whether or not an implementation `MUST` support the +extension point. Other extensions become "core" by being listed in the core +spec document through a ZEP. The use of a "raw name" does **NOT** automatically +make an extension "core". + +The following extensions are (currently) listed or referred to in the core spec +and would be declared as "core" extensions under this proposal.
+ + * Data types: `(u)int{8,16,32,64}`, `float{32,64}`, `complex{64,128}`, `r*` + * Codecs: `bytes`, `transpose` + * Chunk grids: `regular` + * Chunk key encoding: `default`, `v2` + * Storage transformers: (none) + +### Example + +The following example represents an Array showing many of the proposed changes +described above: + +```javascript +{ + "zarr_format": 3, + "data_type": "https://example.com/zarr/string", // URI-based name, short-hand name + "chunk_key_encoding": { + "name": "default", // core + "configuration": { "separator": "." } + }, + "codecs": [ + { + "name": "https://numcodecs.dev/vlen-utf8" // URI-based name + }, + { + "name": "zstd", // raw name + "configuration": { ... } + } + ], + "chunk_grid": { + "name": "regular", // core + "configuration": { "chunk_shape": [ 32 ] } + }, + "shape": [ 128 ], + "dimension_names": [ "x" ], + "attributes": { ... }, + "storage_transformers": [] +} + +``` + +### Discussion + +#### Community registry + +Externally to this ZEP, we will work towards unification of these extension +points. We propose that the community register and discuss their extensions on the +[zarr-extensions](https://github.com/zarr-developers/zarr-extensions) repository. +Eventually, we recommend that a maturity ranking be included in those listings +as in other plugin ecosystems. + +#### Reassigning or unassigning names + +We consider it a design goal to not allow datasets written with one set of +naming expectations to be unintentionally interpreted with *other* names by a +future version of an implementation. Without further mechanics, this means that +raw and URI-based names, once assigned, cannot be changed. (For a possible +evolution of this naming scheme, see [phase 3](#Phase-3-Future-name-evolution) below.) +With this limitation, current implementations either already support the above +proposal or can quickly be updated to do so, since no additional logic is necessary +beyond checking `must_understand`.
+ +#### Name assignment + +Raw names will be assigned through the [`zarr-extensions`](https://github.com/zarr-developers/zarr-extensions) repository. While the Zarr steering council will initially maintain +this repository, it is intended that a community team will be formed to maintain +the repository long-term. + +An alternative would be to use the `zarr-specs` repository to deposit spec +documents for every assigned name. The process would be that extension +maintainers would open a PR (with a template) to register their desired name. +Extension maintainers could choose to publish their extension's spec in the +repository directly or link to an externally hosted spec. Specs +hosted in `zarr-specs` would be updated through PRs. A zarr-specs maintainer +team would be required to ensure a timely assignment of names and updates to +the specs. + +A third option would be to only store the names of the extensions in the +`zarr-specs` repository and always link out to an externally hosted spec. In +that case, the external host could also be another repo under `zarr-developers` +that manages some extensions through its own maintenance structure. + +#### Review process + +To register an extension with a raw name, a community member needs to open a new +PR in the [zarr-extensions](https://github.com/zarr-developers/zarr-extensions) +repository. + +Each extension MUST have a README.md file that describes the extension and its metadata specification. Extensions SHOULD have a schema.json file that contains the JSON schema for the metadata, unless the README.md provides a link to an external schema. Please note that all extension documents will be licensed under the Creative Commons Attribution 3.0 Unported License. Only open a PR if you are willing to license your extension under this license. + +The PR will be reviewed by the Zarr steering council. We aim to be very open about registering extensions.
The review will be done largely based on avoiding confusing extension names and preventing malicious activity as well as maintaining the formal requirements of the extensions. Extension maintainers are responsible for their extensions. Updates to the extensions will also be reviewed by the steering council. + +## Phase 2: New extension points + +Beyond the immediate clarifications that are outlined in phase 1, we believe +that additional extension points will further improve the Zarr v3 +specification. Since these introduce new interfaces that implementations SHOULD +be aware of, we discuss them here for consideration as a ZEP. + +#### Using `must_understand` + +The v3 core spec allows for the addition of new metadata keys as part of spec +evolution. This is defined in the +["Extension Points" section](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#extension-points) +of the core spec. Currently, implementations MUST parse the entire metadata and +fail if they find keys they cannot parse, unless those are marked by +`must_understand=False`. We propose using this mechanism to evolve the spec and +add the keys that are necessary to achieve a well-defined extension mechanism. + +New keys in the metadata MAY contain objects that have a `must_understand` key +with value `false`. In that case, the key may be ignored by implementations +that cannot parse it. This is useful for extensions that aren't strictly +required for interaction with the data. If the new key holds a scalar value or +an array or does not contain the `must_understand` key, it is implicitly +`must_understand=True`. In that case, implementations MUST fail if they cannot +parse the key. + +### "extensions" extension point + +🛠️ To provide for more flexible, immediate, and de-centralized use cases, we +propose to also add another general-purpose extension point `extensions` on +both arrays and groups into which extensions MAY be added.
+ +The `extensions` object holds an array of extension definitions. The held array +MUST either have one or more extensions or the object MUST be omitted entirely. + +The key itself is implicitly `must_understand=True`. Implementations MAY set +`must_understand=False` if they can reliably determine that all extensions are +also `must_understand=False`. + +### Additional extension points + +We support the creation of additional extension points in the future but their +introduction should follow the ZEP process. In general, the overall number of +"core" extension points should be well-maintained and provide clear APIs which +can be implemented by a large number of libraries. ZEPs are the appropriate +mechanism to encourage a wide variety of opinions and consensus building. + +### Versioning and spec evolution + +We propose leaving the versioning of the core spec unchanged. That means the +value of `zarr_format=3` and new keys to the metadata MUST be understood by +implementations, i.e. they MUST fail if they find a key they don't know, and +all changes to the core spec MUST go through the ZEP process. + +Extensions SHOULD follow the +[stability policy defined in the Zarr v3 core spec](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#stability-policy). +However, this is not a strict requirement and extension maintainers can define +their own evolution processes. + +It is recommended that extensions evolve in a backwards-compatible manner +without explicitly stored versions, meaning (credit to Jeremy Maitin-Shepard): + +- Any metadata compatible with a previous version of the extension continues to + be correctly interpreted by implementations of the new version. +- New metadata written according to the new version of the extension either: + (a) is correctly interpreted by existing implementations of previous versions of the extension, or + (b) causes existing implementations of previous versions of the extension to report an error and not load it. 
+- New metadata written according to the new version of the extension MUST NOT + be successfully loaded by existing implementations of previous versions of + the extension with an incorrect interpretation. + +While it is recommended to maximize backwards-compatible changes, it is also +possible to evolve the extension in an intentionally backwards-incompatible +way, e.g.: + +- Choose a new name, e.g. append a version number to the existing name. +- Add a new key to the configuration (like `version`) that was disallowed by + the previous version of the extension, such that existing implementations + will fail when they encounter it. + +### Example + +The following example represents an Array showing many of the proposed changes +described above: + +```javascript +{ + "zarr_format": 3, + ..., + "extensions": [ // new general-purpose extension point + { + "name": "https://example.com/zarr/offset", // uri-based name + "configuration": { "offset": [ 12 ] } + }, + { + "name": "https://example.com/zarr/array-statistics", // uri-based name + "configuration": { + "min": 5, + "max": 12 + }, + "must_understand": false // optional extension + }, + { + "name": "https://example.com/zarr/consolidated-metadata", // uri-based name + "configuration": { ... }, + "must_understand": false // optional extension + } + ], +} + +``` + +### Discussion + +#### Alternatives for the `extensions` extension point + +This proposal contains a new general-purpose extension point `extensions`, +which holds an array of extensions. This design allows the same +extension definition syntax to be used across all extension points. It also avoids using +URIs as keys in JSON metadata and reduces pollution of the top-level namespace +in a `zarr.json`. Thus, the addition of top-level metadata keys remains +reserved to changes in the core spec. This MAY happen as part of the core spec +adopting functionality of an extension.
+ +There are alternative designs to be considered: + +##### Top-level metadata keys + +Instead of a general-purpose extension point, we could also add new top-level +extension keys to the metadata. + +```javascript +{ + "zarr_format": 3, + ... + "https://example.com/zarr/offset": { "offset": [ 12 ] }, + "https://example.com/zarr/array-statistics": { + "min": 5, + "max": 12 + }, + "https://example.com/zarr/consolidated-metadata": { + "must_understand": false, + ... + }, // optional extension + ... +} +``` + +In this case, there would be no explicit `configuration` key within an +extension definition, but instead all the keys of such a configuration would be +in the object itself. This would mean that there are two separate types of +extension definitions, i.e. `{"name":"", "configuration": {...}}` in +specialized extension points (e.g. `codecs`) and `"": {...}` for other +extensions. + +It would still be recommended to use an object with keys for the extension +definition to allow for evolution of the extension. + +In case an extension becomes adopted into the core spec, implementations +wouldn't need to be changed (only when changing the name from URI-based to raw +name). + +There has been some controversy about using URLs as keys in JSON metadata. +However, it has also been used effectively in formats such as JSON-LD (see +below). + +##### `extensions` object + +Instead of an array that holds the extension definitions, we could also use an object. + +```javascript +{ + "zarr_format": 3, + ... + "extensions": { + "https://example.com/zarr/offset": { "offset": [ 12 ] }, + "https://example.com/zarr/array-statistics": { + "min": 5, + "max": 12 + }, + "https://example.com/zarr/consolidated-metadata": { + "must_understand": false, + ... + } // optional extension + }, + ... +} +``` + +This alternative is similar to the top-level keys, with mostly the same implications. 
+ +However, this alternative would reserve the top-level namespace for changes to +the core spec and, therefore, reduce pollution of the top-level namespace. + + +#### Application to subnodes + +Conceptually, we propose that extensions defined on groups should be valid for +their child nodes. However, the details of how an implementation should +identify which extensions are active within a hierarchy are unclear. Relying +on traversing the hierarchy towards the root node is undesirable from a +performance point of view. By writing *some* metadata within the contained +subgroups and arrays this could be made easier. Options for what this metadata +could be include: + +1. A copy of the metadata + +```javascript +{ + "extensions": [ + { + "name": "https://example.com/my-extension", + "configuration": { ... full copy of the metadata ...} + } + ] + +} +``` + +2. A reference to the metadata as part of the extension itself + +```javascript +{ + "extensions": [ + { + "name": "https://example.com/my-extension", + "configuration": { + "reference": "../.." + } + } + ] + +} +``` + +3. A complementary reference extension + +```javascript +{ + "extensions": [ + { + "name": "https://example.com/my-extension-ref", + "configuration": { + "reference": "../.." + } + } + ] + +} +``` + +4. A shared or even core reference extension + +```javascript +{ + "extensions": [ + { + "name": "https://zarr.dev/extensions/parent-ref", + "configuration": { + "reference": "../.." + } + } + ] +} +``` + +## Phase 3: Future name evolution + +The phase 1 proposal above prioritizes changes that can be made to the +specification in line with the current text, aware that there are +contradictions and a lack of clarity. + +It does not provide, however, a mechanism for evolving the assigned names. + +An extension that begins life as a URI that eventually migrates to having a raw +name will require implementations to be updated to check for both values.
+ +This section outlines why choices above were made (e.g., use of full URIs +rather than a shorter alternative) as the basis for a possible pathway to +evolving the naming strategy. These changes **would** require updating +implementations and therefore would require an additional ZEP but would be +backwards compatible with the clarifications above. + +### URIs everywhere + +By having a mechanism which maps all extension names to a global URI, the two +names (URI and raw) associated with an extension can be treated as equivalent. In fact, the original name +assigned to the extension becomes its permanent internal representation even if +it can later be referred to by a shorter name. + +To demonstrate, under this representation the example above could be made +equivalent to: + +```javascript +{ + "https://zarr.dev/v3/array/data_type": "https://example.com/zarr/string", + "https://zarr.dev/v3/array/chunk_key_encoding": { + "name": "https://zarr.dev/v3/chunk_key_encodings/default", + }, + "https://zarr.dev/v3/array/codecs": [ + { + "name": "https://numcodecs.dev/vlen-utf8" + }, + { + "name": "https://zarr.dev/v3/codecs/zstd", + } + ], + "https://zarr.dev/v3/array/chunk_grid": { + "name": "https://zarr.dev/v3/chunk_grid/regular", + }, + "https://zarr.dev/v3/extensions": [ + { + "name": "https://example.com/zarr/offset", + }, + { + "name": "https://example.com/zarr/array-statistics", + }, + { + "name": "https://example.com/zarr/consolidated-metadata", + } + ], + "https://zarr.dev/v3/array/dimension_names": [ "x" ], + "https://zarr.dev/v3/attributes": { ... }, + "https://zarr.dev/v3/storage_transformers": [] +} +``` + +where all identifiers are now prefixed with a URI fully scoping the value, +in this example `https://zarr.dev/v3/`. The +specification would clearly list which prefix applies to all "raw" names within +the specification and any new "raw" names which are brought into the +specification **might** exist within URIs outside of `https://zarr.dev/` .
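
A minimal sketch of the name-to-URI mapping described in "URIs everywhere", written in Python for illustration only. The prefix table reuses the URIs from the example above, which the text itself treats as provisional; `RAW_NAME_PREFIXES` and `to_uri` are hypothetical names, not an existing API.

```python
# Hypothetical prefix table, keyed by extension point. The URIs mirror the
# example in the text and are not final.
RAW_NAME_PREFIXES = {
    "codecs": "https://zarr.dev/v3/codecs/",
    "chunk_key_encoding": "https://zarr.dev/v3/chunk_key_encodings/",
    "chunk_grid": "https://zarr.dev/v3/chunk_grid/",
}


def to_uri(extension_point: str, name: str) -> str:
    """Map an extension name to its URI form without any network access."""
    if name.startswith(("http://", "https://")):
        # URI-based names are already in their permanent form.
        return name
    # Raw names are scoped by the prefix registered for their extension point.
    return RAW_NAME_PREFIXES[extension_point] + name
```

With such a mapping, an implementation could key its internal registry on URIs only, so that a raw name and its URI form resolve to the same entry.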
+ +### Implicit Naming Context + +For all intents and purposes, the immediate proposal defined in this document +defines a single, unversioned global naming context owned by the Zarr +organization which can provide such a mapping. This can be seen as a JSON +document of the form: + +``` +"@context": { + "bool": "https://zarr.dev/bool", // used for brevity and doesn't represent the final URI + ...etc... +} + +``` +This context is not written into each Zarr document but is implied. + +### Explicit Naming Context + +Through the ZEP process, we could make it possible to define an **explicit** +context within a zarr.json file which would override the default, unchanging +naming context. This would let us in the future slowly correct raw names as +necessary without a full breaking change to the format. + +``` +"@context": "https://zarr.dev/context/v3.1" +``` + +This explicit context reference would be written in the zarr.json itself. + + +### Relationship with JSON-LD + +Readers familiar with [JSON-LD](https://json-ld.org/) will recognize the +"@context" definition: + +- Each JSON-LD file MAY have a "@context" key which defines how name resolution + works. With that it's possible to either load a remote context field, or + define namespaces and fields inline. (For Zarr naming resolution, we would + likely want to avoid loading remote resources in favor of well-known contexts + which are cached within the implementations.) +- All keys within the JSON-LD are then resolved against this context. + *EVERYTHING* is a URI but can alternatively be referred to as + "https://example.com/field", "example:field" or for the default namespace + "field" + + +#### Prefixes and aliases + +As an existing and well-defined resolution process, other features of the +JSON-LD naming mechanism might be worth considering for future adoption. For +example, prefixes can be defined within the context, whether implicit or +explicit, which removes the need to use the full URI.
For example, with the +context: + +```javascript +"@context": { + "example": "https://example.com/some-prefix/", +``` + +```javascript +{ + "zarr_format": 3, + "data_type": "https://example.com/some-prefix/utf-16-string", +``` +becomes: +```javascript +{ + "zarr_format": 3, + "data_type": "example:utf-16-string", +``` + +A more advanced feature might be to introduce a short name inline with an +explicit context such that all uses of a given string can be replaced: + + +``` +"@context": { + "shrt": "https://example.com/some-prefix/long-name", +}, +... +"shrt": values +... +``` + +### Internal implications + +Under the hood, all name resolutions, whether via implicit, explicit, or inline +context, would convert identifier names to a URI which implementations could +reliably use for determining support. Once a URI is assigned, however, it +should never be changed. + + +## Copyright + +This proposal is licensed under [the Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). From 7176586dfe15f1cbca05c66e27669648be24782f Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Mon, 12 May 2025 22:31:42 +0200 Subject: [PATCH 02/18] Laptop WIP --- draft/ZEP0010.md | 507 ++++------------------------------------------- 1 file changed, 38 insertions(+), 469 deletions(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index e458785..1a127eb 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -21,326 +21,19 @@ Created: 2025-05-12 ## Abstract -The Zarr v3 specification is unclear on several matters regarding the -definition of extensions in the `zarr.json` metadata. This proposal defines -immediate clarifications for the specification which will allow the centralized -creation of "raw" names as well as the decentralized use of URIs for -extensions. At the same time, we propose a longer-term strategy for evolving -extensions and extension points for discussion and potential approval by the -community to be rolled out to implementations at a future time.
## Introduction -The intention of the Zarr v3 spec was to provide a framework where -community-driven extensions can be created and managed by the community itself. -However, the process for that was unclear and, therefore, implicitly linked to -the ZEP process. Issues have been surfaced by users, who wanted to create and -use new extensions. For example, users wanted to -[create new codecs](https://github.com/zarr-developers/zarr-specs/pull/256), -[data types](https://github.com/zarr-developers/zarr-specs/pull/257) and -[an extension for consolidated metadata](https://github.com/zarr-developers/zarr-specs/pull/309). -We thank the members of the community for their efforts in discovering and communicating -these issues. - -The underlying problem is that the extension mechanism for v3 lacks detail in -its definition. While several extension points are defined in the core spec, -there is no advice about selecting names for these extensions or how naming -conflicts are avoided. Additionally, there is no mechanism for defining new -extensions that do not fit into the existing extension points. In fact, there -are contradictions in what is entirely permissible, including for example -whether codecs are within or required by the core v3 spec. - -Implementations have started to use the v3 spec and are making use of -extensions (e.g `numcodecs.*` codecs and `zstd` in zarr-python), which means -that any changes made to the v3 spec need to be compatible with the current -reality in the community. +Zarr specification version 3 currently defines four extension points, each +associated with a specific (array) metadata field. Additional extension points +may be added by future ZEPs. Until that time, however, third-parties may want +to add arbitrary extension objects to either arrays or groups. This proposal +introduces a top-level "extensions" field that serves as a container for such a +list of extensions. ## Proposal -This proposal has three phases. 
- -The [first phase](#Phase-1-Immediate-clarifications) clarifies that extensions -can be created by the community without the ZEP process. Extensions with -URI-based names can be created without any further coordination and extensions -with raw names only need to get the name registered in the -[`zarr-extensions` Github repository](https://github.com/zarr-developers/zarr-extensions). -The process for registering will be a lightweight PR-based process. - -The naming mechanism as well as clarifications in the core spec documented will -be implemented by the ZSC through -[PR330](https://github.com/zarr-developers/zarr-specs/pull/330) in the zarr-specs repository for -discussion concurrently with this ZEP. The changes we are proposing, marked -with a "🛠️" below, are interpretations of the current Zarr v3 core spec as well -as additions that are in-line with the spec evolution policy of the Zarr core -spec. The Zarr core spec document will be bumped to version 3.1. The metadata -in the `zarr.json` files will remain unchanged `zarr_format=3`. - -The [second phase](#Phase-2-New-extension-points) defines new extension points -and therefore will follow the active ZEP process. The goal is to encourage more -exploration by the Zarr community outside of what's currently defined with the -core specification. - -The [third phase](#Phase-3-Future-name-evolution) defines a possible future -evolution of the naming strategy, largely as a justification for decisions made -in the first two phases. It is non-normative and can be further discussed in -future ZEPs. - -Revisions to ZEP 0 will be paused until we are clear about the extension -mechanism and the implications for the role of the ZEP process in the future. -The ZEP process remains in place for changes to the Zarr v3 core spec including -the adoption of extensions into the core spec. 
- -## Definitions - -- "Core" refers to the Zarr v3 core specification as defined in the document - and not - necessarily if something is a MUST for implementations. -- "Metadata" is the Zarr metadata as stored in `zarr.json` files for arrays and - groups. -- "Extensions" as used in this document are components used in the metadata to - define and configure how metadata are interpreted by implementations. These - components include codecs, data types, chunk key encodings, chunk grids and - storage transformers. -- "Extension points" are locations within metadata where extension-related - metadata can be found. Current extension points are listed in the core spec, - e.g. `codecs`, `data_type`. As part of the [second phase](#Phase-2-New-extension-points), - we intend to add another general-purpose extension point called [`extensions`](#Extension-points). -- "Extension maintainers" are a person or a team that is responsible for - creating and maintaining an extension. - -## Phase 1: Immediate clarifications - -### Extension naming - -🛠️ We propose defining two categories of names for immediate use by extensions: -raw names and URI-based. - -1. **Raw names** MUST be assigned within a central repository and follow the - compatibility and versioning [v3 stability - policy](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#stability-policy). - The name assignment is managed through the [`zarr-extensions` Github - repository](https://github.com/zarr-developers/zarr-extensions), where - each extension is listed and either contains a spec document or links to - a spec document. Names are never unassigned or reassigned. The ZSC or by - delegation a maintainer team reserves the right to refuse name assignment - at its own discretion. - - - **Example:** `zstd` - - **Acceptd regex:** `^[a-z0-9-_.]+$` - -2. **URI-based names** can be used by anyone without further coordination - though the assumption is that users reasonably "own" the URI. 
Users MAY - make use of a persistent redirecting URL like [PURL](https://purl.archive.org/). - URIs have been chosen due to their potential for being self-documenting and - *strongly recommend* that the URL SHOULD resolve to a human-readable explanation of the extension, but - implementations SHOULD NOT attempt to resolve the URL during processing. - There are no guarantees in terms of versioning or compatibility. However, - preserving backwards-compatibility is strongly encouraged. See the - [versioning section](#Versioning-and-spec-evolution) below. - - - **Example:** `https://example.com/zarr3/consolidated-metadata` - - **Accepted regex:** `^https?:\/\/[^/?#]+[^?#]*$` - - -All names currently listed in the v3 specification are raw names. URIs were -mentioned in previous drafts of the v3 specification and are still referenced -under the codecs section: - -> "Each codec must be defined via a separate specification. In order to refer -> to codecs in array metadata documents, each codec must have a unique -> identifier, which is a URI that dereferences to a human-readable -> specification of the codec." -> (). - -🛠️ This proposal would drop the MUST requirement on the dereferencing of the -URI to a SHOULD and more widely specify that implemenations MAY use URI-based -key names throughout the specification. (See -[community registry](#Community-registry) below for a proposal on coordinating the use of -URIs.) - -### Extension definitions - -Extensions are defined in the `zarr.json` metadata either as objects or as -short-hand names. - -🛠️ In order to unify the general design of future extensions we would add a -MUST requirement for future extension objects to adhere to the following -definition. - -Objects have the following keys: -```json -{ - "name": "", - "configuration": { ... } # optional -} -``` - -If such an object is present, the field `must_understand` is implicitly set to -`True`. 
Implementations which encounter a `must_understand` extension that they -have not implemented MUST raise an exception. -An extension object can explicitly set `must_understand=False` if -implementations can ignore its presence, following the current guidelines in -the v3 specification. - -Instead of extension objects, short-hand names MAY continue to be used if no -configuration metadata is required. They would be equivalent to extension -objects with just a `name` key. This is in-line with -[the current wording of the spec](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#id11). - -### Extension specifications - -There is no strict requirement for extensions to have a formal specification. -However, for adoption in the community it is STRONGLY RECOMMENDED to write a -specification. - -For an extension with a raw name, the -[zarr-extensions](https://github.com/zarr-developers/zarr-extensions) -repository SHOULD be used to either publish the specification -directly or link to another location which does so. For extensions with URI-based names, it -is RECOMMENDED to publish the specification under the URI of the extension. - -### Extension points - -The v3 core spec defines a number of extension points for arrays and groups -that can hold extensions following the above recommendations: - -- `data_type` (array only) -- `codecs` (array only) -- `chunk_grids` (array only) -- `chunk_key_encoding` (array only) -- `storage_transformers` (array only) // extendable to groups - -The Zarr v3 core spec was originally designed to not contain any extensions, -such as data types, codecs etc. However, over time and through changing -authorship, some extensions were, in fact, listed in -[the core spec document](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#data-types). 
- -🛠️ We propose to acknowledge that the Zarr v3 core spec includes a few -extensions that are expected to be supported by all implementations and fix the -wording in the core spec document accordingly. Wording in each section will be -updated to clarify whether or not an implementation `MUST` support the -extension point. Other extensions become "core" by being listed in the core -spec document through a ZEP. The use of a "raw name" does **NOT** automatically -make an extension "core". - -The following extensions are (currently) listed or referred to in the core spec -and would be declared as "core" extensions under this proposal. - - * Data types: `(u)int{8,16,32,64}`, `float{32,64}`, `complex{64,128}`, `r*` - * Codecs: `bytes`, `transpose` - * Chunk grids: `regular` - * Chunk key encoding: `default`, `v2` - * Storage transformers: (none) - -### Example - -The following example represents an Array showing many of the proposed changes -described above: - -```javascript -{ - "zarr_format": 3, - "data_type": "https://example.com/zarr/string", // URI-based name, short-hand name - "chunk_key_encoding": { - "name": "default", // core - "configuration": { "separator": "." } - }, - "codecs": [ - { - "name": "https://numcodecs.dev/vlen-utf8" // URI-based name - }, - { - "name": "zstd", // raw name - "configuration": { ... } - } - ], - "chunk_grid": { - "name": "regular", // core - "configuration": { "chunk_shape": [ 32 ] } - }, - "shape": [ 128 ], - "dimension_names": [ "x" ], - "attributes": { ... }, - "storage_transformers": [] -} - -``` - -### Discussion - -#### Community registry - -Externally to this ZEP, we will work towards unification of these extension -points. We propose that community register and discuss their extensions on the -[zarr-extensions](https://github.com/zarr-developers/zarr-extensions) repository. -Eventually, we recommend that a maturity ranking be included in those listings -as in other plugin ecosystems. 
- -#### Reassigning or unassigning names - -We consider it a design goal to not allow datasets written with one set of -naming expectations to be unintentionally interpreted with *other* names by a -future version of an implementation. Without further mechanics, this means that -raw and URI-based names, once assigned, cannot be changed. (For a possible -evolution of this naming scheme, see [phase 3](#Phase-3-name-evolution) below.) -With this limitation, current implementations either already or quickly can be -updated to support the above proposal since no additional logic is necessary -beyond checking `must_understand`. - -#### Name assignment - -Raw names will be assigned through the [`zarr-extension`](https://github.com/zarr-developers/zarr-extensions) repository. While the Zarr steering council will initially maintain -this repository, it is intended that a community team will be formed to maintain -the repository long-term. - -An alternative would be to use the `zarr-specs` repository to deposit spec -documents for every assigned name. The process would be that extension -maintainers would open a PR (with a template) to register their desired name. -Extension maintainers could choose to publish their extensions spec in the -repository directly or link to an externally hosted spec. Updates of specs -hosted in `zarr-specs`, would be updated through PRs. A zarr-specs maintainer -team would be required to ensure a timely assignment of names and updates to -the specs. - -A third option would be to only store the names of the extensions in -`zarr-specs` repository and always link out to an externally hosted spec. In -that case, externally hosted could also be another repo under `zarr-developers` -that manages some extensions through its own maintenance structure. - -#### Review process - -To register an extension with a raw name, a community member needs to open a new -PR in the [zarr-extensions](https://github.com/zarr-developers/zarr-extensions) -repository. 
- -Each extension MUST have a README.md file that describes the extension and its metadata specification. Extensions SHOULD have a schema.json file that contains the JSON schema for the metadata, if the README.md does not provide a link to an external schema. Please note that all extensions documents will be licensed under the Creative Commons Attribution 3.0 Unported License. Only open a PR if you are willing to license your extension under this license. - -The PR will be reviewed by the Zarr steering council. We aim to be very open about registering extensions. The review will be done largely based on avoiding confusing extension names and preventing malicious activity as well as maintaining the formal requirements of the extensions. Extension maintainers are responsible for their extensions. Updates to the extensions will also be reviewed by the steering council. - -## Phase 2: New extension points - -Beyond the immediate clarifications that are outlined in phase 1, we believe -that additional extension points will further improve the Zarr v3 -specification. Since these introduce new interfaces that implementations SHOULD -be aware, we discuss them here for consideration as a ZEP. - -#### Using `must_understand` - -The v3 core spec allows for the addition of new metadata keys as part of spec -evolution. This is defined in the -["Extension Points" section](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#extension-points) -of the core spec. Currently, implementations MUST parse the entire metadata and -fail, if they find keys they cannot parse unless those are marked by -`must_understand=False`. We propose using this mechanism to evolve the spec and -add the keys that are necessary to achieve a well-defined extension mechanism. - -New keys in the metadata MAY contain objects that have a `must_understand` key -with value `false`. In that case, the key may be ignored by implementations -that cannot parse it. 
This is useful for extensions that aren't strictly -required for interaction with the data. If the new key holds a scalar value or -an array or doesn't not contain the `must_understand` key, it is implictily -`must_understand=True`. In that case, implementations MUST fail if they cannot -parse the key. +TODO - top matter ### "extensions" extension point @@ -396,6 +89,35 @@ way, e.g.: the previous version of the extension, such that existing implementations will fail when they encounter it. +### Definition and naming + +Each extension object will follow the rules laid out by ZEP0009 + +### Voting + +As such, this ZEP will follow the active ZEP process. The goal is to encourage more +exploration by the Zarr community outside of what's currently defined with the +core specification. + + +#### Using `must_understand` (TODO: remove) + +The v3 core spec allows for the addition of new metadata keys as part of spec +evolution. This is defined in the +["Extension Points" section](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#extension-points) +of the core spec. Currently, implementations MUST parse the entire metadata and +fail, if they find keys they cannot parse unless those are marked by +`must_understand=False`. We propose using this mechanism to evolve the spec and +add the keys that are necessary to achieve a well-defined extension mechanism. + +New keys in the metadata MAY contain objects that have a `must_understand` key +with value `false`. In that case, the key may be ignored by implementations +that cannot parse it. This is useful for extensions that aren't strictly +required for interaction with the data. If the new key holds a scalar value or +an array or doesn't not contain the `must_understand` key, it is implictily +`must_understand=True`. In that case, implementations MUST fail if they cannot +parse the key. 
+ ### Example The following example represents an Array showing many of the proposed changes @@ -582,163 +304,10 @@ could be include: } ``` -## Phase 3: Future name evolution - -The phase 1 proposal above prioritizes changes that can be made to the -specification in line with the current text, aware that there are -contradictions and a lack of clarity. - -It does not provide, however, a mechanism for evolving the assigned names. - -An extension that begins life as a URI that eventually migrates to having a raw -name will require implementations to be updated to check for both values. - -This section outlines why choices above were made (e.g., use of full URIs -rather than a shorter alternative) as the basis for a possible pathway to -evolving the naming strategy. These changes **would** require updating -implementations and therefore would require an additional ZEP but would be -backwards compatible with the clarifications above. - -### URIs everywhere - -By having a mechanism which maps all extension names to a global URI, the two -names (URI and raw) associated with an extensinon. In fact, the original name -assigned to the extension becomes its permanent internal representation even if -it can later be referred to by a shorter name. 
- -To demonstrate, under this representation the example above could be made -equivalent to: - -```javascript -{ - "https://zarr.dev/v3/array/data_type": "https://example.com/zarr/string", - "https://zarr.dev/v3/array/chunk_key_encoding": { - "name": "https://zarr.dev/v3/chunk_key_encodings/default", - }, - "https://zarr.dev/v3/array/codecs": [ - { - "name": "https://numcodecs.dev/vlen-utf8" - }, - { - "name": "https://zarr.dev/v3/codecs/zstd", - } - ], - "https://zarr.dev/v3/array/chunk_grid": { - "name": "https://zarr.dev/v3/chunk_grid/regular", - }, - "https://zarr.dev/v3/extensions": - { - "name": "https://example.com/zarr/offset", - }, - { - "name": "https://example.com/zarr/array-statistics", - }, - { - "name": "https://example.com/zarr/consolidated-metadata", - } - }, - "https://zarr.dev/v3/array/dimension_names": [ "x" ], - "https://zarr.dev/v3/attributes": { ... }, - "https://zarr.dev/v3/storage_transformers": [] -} -``` - -where all identifiers are now prefixed with a URI full scoping the value, -prefixed in this example with `https://zarr.dev/v3/`. The -specification would clearly list which prefix applies to all "raw" names within -the specification and any new "raw" names which are brought into the -specification **might** exist within URIs outside of `https://zarr.dev/` . - -### Implicit Naming Context - -For all intents and purposes, the immediate proposal defined in this document -defines a single, unversioned global naming context owned by the Zarr -organization which can provide such a mapping. This can be seen as a JSON -document of the form: - -``` -"@context": { - "bool": "https://zarr.dev/bool", // used for brevity and doesn't represent the final URI - ...etc... -} - -``` -This context is not written into each Zarr document but is implied. - -### Explicit Naming Context - -Through the ZEP process, we could make it possible to define an **explicit** -context within a zarr.json file which would override the default, unchaning -naming context. 
This would let us in the future slowly correct raw names as -necessary without a full breaking change to the format. - -``` -"@context": "https://zarr.dev/context/v3.1" -``` - -This explicit context reference would be written in the zarr.json itself. - - -### Relationship with JSON-LD - -Readers familiar with [JSON-LD](https://json-ld.org/) will recognize the -"@context" definition: - -- Each JSON-LD file MAY have a "@context" key which defines how name resolution - works. With that it's possible to either load a remote context field, or - define namespaces and fields inline. (For Zarr naming resolution, we would - likely want to avoid loading remote resources in favor of well-known contexts - which are cached within the implementations.) -- All keys within the JSON-LD are then resolved against this context. - *EVERYTHING* is a URI but can alternatively be referred to as - "https://example.com/field", "example:field" or for the default namespace - "field" - - -#### Prefixes and aliases - -As an existing and well-defined resolution process, other features of the -JSON-LD naming mechanism might be worth considering for future adoption. For -example, prefixes can be defined within the context, whether implicit or -explicit, which prevents the need to use the full URI. For example, with the -context: - -```javascript -"@context": { - "example": "https://example.com/some-prefix/", -``` - -```javascript -{ - "zarr_format": 3, - "data_type": "https://example.com/some-prefix/utf-16-string", -``` -becomes: -```javascript -{ - "zarr_format": 3, - "data_type": "example:utf-16-string", -``` - -A more advanced feature might be to introduce a short name inline with an -explicit context such that all uses of a given string can be replaced: - - -``` -"@context": { - "shrt": "https://example.com/some-prefix/long-name", -}, -... -"shrt": values -... 
-``` +## Changelog -### Internal implications + - 2025-05-12: Migrate phase 2 of the original ZEP9 -Under the hood, all name resolutions whether via implicit, explicit, or inline -context would convert identifer names to a URI which implementations could -reliably use for determining support. Once a URI is assigned, however, it -should never be changed. ## Copyright From ba35fd639033881cc7b448c3ab08814cf02e1927 Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Tue, 13 May 2025 23:50:58 +0200 Subject: [PATCH 03/18] Desktop WIP --- draft/ZEP0010.md | 80 ++++++++---------------------------------------- 1 file changed, 12 insertions(+), 68 deletions(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index 1a127eb..180ef36 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -1,7 +1,7 @@ --- layout: default title: ZEP010 -description: This ZEP proposes a new top-level extensions container object. +description: This ZEP proposes a new top-level extensions container field. parent: draft ZEPs nav_order: 10 --- @@ -33,61 +33,14 @@ list of extensions. ## Proposal -TODO - top matter - -### "extensions" extension point - -🛠️ To provide for more flexible, immediate, and de-centralized use cases, we -propose to also add another general-purpose extension point `extensions` on +To provide for more flexible, immediate, and de-centralized use cases, we +propose to add a general-purpose extension point `extensions` on both arrays and groups into which extensions MAY be added. -The `extensions` object holds an array of extension definitions. The held array -MUST either have one or more extensions or the object MUST be omitted entirely. - -The key itself is implicitly `must_understand=True`. Implementations MAY set -`must_understand=False` if they can reliably determine that all extensions are -also `must_understand=False`. - -### Additional extension points - -We support the creation of additional extension points in the future but their -introduction should follow the ZEP process. 
In general, the overall number of -"core" extension points should be well-maintained and provide clear APIs which -can be implemented by a large number of libraries. ZEPs are the appropriate -mechanism to encourage a wide variety of opinions and consensus building. - -### Versioning and spec evolution +The `extensions` field if present MUST contain an array of extension +definitions. The held array MUST either have one or more extensions or the +object MUST be omitted entirely. -We propose leaving the versioning of the core spec unchanged. That means the -value of `zarr_format=3` and new keys to the metadata MUST be understood by -implementations, i.e. they MUST fail if they find a key they don't know, and -all changes to the core spec MUST go through the ZEP process. - -Extensions SHOULD follow the -[stability policy defined in the Zarr v3 core spec](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#stability-policy). -However, this is not a strict requirement and extension maintainers can define -their own evolution processes. - -It is recommended that extensions evolve in a backwards-compatible manner -without explicitly stored versions, meaning (credit to Jeremy Maitin-Shepard): - -- Any metadata compatible with a previous version of the extension continues to - be correctly interpreted by implementations of the new version. -- New metadata written according to the new version of the extension either: - (a) is correctly interpreted by existing implementations of previous versions of the extension, or - (b) causes existing implementations of previous versions of the extension to report an error and not load it. -- New metadata written according to the new version of the extension MUST not - be successfully loaded by existing implementations of previous versions of - the extension with an incorrect interpretation. 
- -While it is recommended to maximize backwards-compatible changes, it is also -possible to evolve the extension in an intentionally backwards-incompatible -way, e.g.: - -- Choose a new name, e.g. append a version number to the existing name. -- Add a new key to the configuration (like `version`) that was disallowed by - the previous version of the extension, such that existing implementations - will fail when they encounter it. ### Definition and naming @@ -95,28 +48,19 @@ Each extension object will follow the rules laid out by ZEP0009 ### Voting +TODO: for PR? + As such, this ZEP will follow the active ZEP process. The goal is to encourage more exploration by the Zarr community outside of what's currently defined with the core specification. -#### Using `must_understand` (TODO: remove) +### Use cases -The v3 core spec allows for the addition of new metadata keys as part of spec -evolution. This is defined in the -["Extension Points" section](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#extension-points) -of the core spec. Currently, implementations MUST parse the entire metadata and -fail, if they find keys they cannot parse unless those are marked by -`must_understand=False`. We propose using this mechanism to evolve the spec and -add the keys that are necessary to achieve a well-defined extension mechanism. +TODO +consolidated metadata +domain specific (why not just in attributes / claiming a namespace) -New keys in the metadata MAY contain objects that have a `must_understand` key -with value `false`. In that case, the key may be ignored by implementations -that cannot parse it. This is useful for extensions that aren't strictly -required for interaction with the data. If the new key holds a scalar value or -an array or doesn't not contain the `must_understand` key, it is implictily -`must_understand=True`. In that case, implementations MUST fail if they cannot -parse the key. 
### Example From e3dbe934fc50ec88f75a5d06f399836958600faf Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Wed, 14 May 2025 00:10:33 +0200 Subject: [PATCH 04/18] Prepare individual examples --- draft/ZEP0010.md | 120 +++++++++++++++++++++++++++++++++-------------- 1 file changed, 85 insertions(+), 35 deletions(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index 180ef36..1b24da7 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -38,65 +38,103 @@ propose to add a general-purpose extension point `extensions` on both arrays and groups into which extensions MAY be added. The `extensions` field if present MUST contain an array of extension -definitions. The held array MUST either have one or more extensions or the +definitions. The contained array MUST either have one or more extensions or the object MUST be omitted entirely. ### Definition and naming -Each extension object will follow the rules laid out by ZEP0009 - -### Voting - -TODO: for PR? - -As such, this ZEP will follow the active ZEP process. The goal is to encourage more -exploration by the Zarr community outside of what's currently defined with the -core specification. +Each extension object will follow the rules laid out by ZEP0009. +### Processing -### Use cases +TODO as in spec -TODO -consolidated metadata -domain specific (why not just in attributes / claiming a namespace) +### Examples -### Example +The following examples represent a few realistic use cases of the top-level +``extensions`` container. 
-The following example represents an Array showing many of the proposed changes -described above: +#### Offset (array) ```javascript { "zarr_format": 3, + "node_type": "array", ..., - "extensions": [ // new general-purpose extension point + "extensions": [ { - "name": "https://example.com/zarr/offset", // uri-based name + "name": "example.offset", "configuration": { "offset": [ 12 ] } - }, + } + ] +} +``` + +#### Statistics (array) + +```javascript +{ + "zarr_format": 3, + "node_type": "array", + ..., + "extensions": [ { - "name": "https://example.com/zarr/array-statistics", // uri-based name + "name": "example.array-statistics", + "must_understand": false "configuration": { "min": 5, "max": 12 - }, - "must_understand": false // optional extension - }, + } + } + ] +} +``` + +#### Domain metadata (group) + +```javascript +{ + "zarr_format": 3, + "node_type": "group", + ..., + "extensions": [ { - "name": "https://example.com/zarr/consolidated-metadata", // uri-based name - "configuration": { ... } - "must_understand": false // optional extension + "name": "domain.metadata", + "must_understand": false + "configuration": { + "multiscale": { + "datasets": [ + "path/to/array/1", + "path/to/array/2", + "path/to/array/3" + ] + } + } } ], } +``` + +domain specific (why not just in attributes / claiming a namespace) +difference to attributes +#### Debugging (group or array) + +```javascript +{ + "zarr_format": 3, + ..., + "extensions": [ + "example.debugging" + ] +} ``` -### Discussion +## Discussion -#### Alternatives for the `extensions` extension point +### Alternatives for the `extensions` extension point This proposal contains a new general-purpose extension point `extensions`, which holds an array of extensions. This design allows to have the same @@ -108,7 +146,7 @@ adopting functionality of an extension. 
 There are alternative designs to be considered: -##### Top-level metadata keys +### Top-level metadata keys Instead of a general-purpose extension point, we could also add new top-level extension keys to the metadata. @@ -148,7 +186,7 @@ There has been some controversy about using URLs as keys in JSON metadata. However, it has also been used effectively in formats such as JSON-LD (see below). -##### `extensions` object +### `extensions` object Instead of an array that holds the extension definitions, we could also use an object. @@ -177,15 +215,17 @@ However, this alternative would reserve the top-level namespace for changes to the core spec and, therefore, reduce pollution of the top-level namespace. -#### Applicaton to subnodes +### Application to subnodes Conceptually, we propose that extensions defined on groups should be valid for their child nodes. However, the details of how an implementation should identify which extensions are active within an hierarchy are unclear. Relying on traversing the hierarchy towards the root node is undesirable from a -performance point of view. By writing *some* metadata within the contained -subgroups and arrays this could be made easier. Options for what this metadata -could be include: +performance point of view. + +As a workaround, extension authors can choose to write *some* metadata within +the contained subgroups and arrays to make this easier. Options for what +this metadata could include: 1. A copy of the metadata @@ -248,10 +288,20 @@ could be include: } ``` +As further experience is gained by the community of extension authors, +one or more of these methods may be adopted into the core spec. + ## Changelog - 2025-05-12: Migrate phase 2 of the original ZEP9 +### Voting + +TODO: for PR? + +As such, this ZEP will follow the active ZEP process. The goal is to encourage more +exploration by the Zarr community outside of what's currently defined with the +core specification.
 ## Copyright From d41b16195ec59d1c43f2cd379363989fa11b2ef3 Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Wed, 14 May 2025 00:26:29 +0200 Subject: [PATCH 05/18] Explanations and cleanup --- draft/ZEP0010.md | 208 ++++++++++++++++++++++++++--------------------- 1 file changed, 115 insertions(+), 93 deletions(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index 1b24da7..deba7d7 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -48,6 +48,8 @@ Each extension object will follow the rules laid out by ZEP0009. ### Processing +Implementations SHOULD .. + TODO as in spec @@ -66,12 +68,21 @@ The following examples represent a few realistic use cases of the top-level "extensions": [ { "name": "example.offset", - "configuration": { "offset": [ 12 ] } + "configuration": { "offset": [ 12, 24 ] } } ] } ``` +The ``example.offset`` extension contains an array of the same order as the +shape of the containing array specifying which element of the array should be +considered as the origin, e.g., `[0, 0]`. This allows the reuse of subregions +of an array without the need to rewrite the data. + +Note that in this example the extension is ``must_understand=True``, meaning +an implementation which does not support the ``example.offset`` extension +should raise an error. + #### Statistics (array) ```javascript @@ -82,16 +93,20 @@ The following examples represent a few realistic use cases of the top-level "extensions": [ { "name": "example.array-statistics", - "must_understand": false + "must_understand": false, "configuration": { "min": 5, - "max": 12 + "max": 1023 } } ] } ``` +The ``example.array-statistics`` extension contains two fields, ``min`` +and ``max``, specifying the range of values which are present in the array, +reducing the need to read every byte.
+ #### Domain metadata (group) ```javascript @@ -101,8 +116,8 @@ The following examples represent a few realistic use cases of the top-level ..., "extensions": [ { - "name": "domain.metadata", - "must_understand": false + "name": "example.domain-metadata", + "must_understand": false, "configuration": { "multiscale": { "datasets": [ @@ -117,8 +132,14 @@ The following examples represent a few realistic use cases of the top-level } ``` -domain specific (why not just in attributes / claiming a namespace) -difference to attributes +Domain-specific metadata is introduced in the ``example.domain-metadata`` +extension which allows encoding a relationship between multiple arrays at the +group level. Here, a "multiscale pyramid" of arrays is being defined which is +a common idiom in the geospatial and bioimaging uses of Zarr. + +The value of specifying this metadata as an extension as opposed to a +user-attribute is the clear registration of the extension name, providing a +namespace for the metadata to prevent collisions. #### Debugging (group or array) @@ -132,88 +153,8 @@ difference to attributes } ``` -## Discussion - -### Alternatives for the `extensions` extension point - -This proposal contains a new general-purpose extension point `extensions`, -which holds an array of extensions. This design allows to have the same -extension definition syntax across all extension points. It also avoids using -URIs as keys in JSON metadata and reduces pollution of the top-level namespace -in a `zarr.json`. Thus, the addition of top-level metadata keys remains -reserved to changes in the core spec. This MAY happen as part of the core spec -adopting functionality of an extension. - -There are alternative designs to be considered: - -### Top-level metadata keys - -Instead of a general-purpose extension point, we could also add new top-level -extension keys to the metadata. - -```javascript -{ - "zarr_format": 3, - ... 
- "https://example.com/zarr/offset": { "offset": [ 12 ] }, - "https://example.com/zarr/array-statistics": { - "min": 5, - "max": 12 - }, - "https://example.com/zarr/consolidated-metadata": { - "must_understand": false, - ... - }, // optional extension - ... -} -``` - -In this case, there would be no explicit `configuration` key within an -extension definition, but instead all the keys of such a configuration would be -in the object itself. This would mean that there are two separate types of -extension definitions, i.e. `{"name":"", "configuration": {...}}` in -specialized extension points (e.g. `codecs`) and `"": {...}` for other -extensions. - -It would still be recommended to use an object with keys for the extension -definition to allow for evolution of the extension. - -In case an extension becomes adopted into the core spec, implementations -wouldn't need to be changed (only when changing the name from URI-based to raw -name). - -There has been some controversy about using URLs as keys in JSON metadata. -However, it has also been used effectively in formats such as JSON-LD (see -below). - -### `extensions` object - -Instead of an array that holds the extension definitions, we could also use an object. - -```javascript -{ - "zarr_format": 3, - ... - "extensions": { - "https://example.com/zarr/offset": { "offset": [ 12 ] }, - "https://example.com/zarr/array-statistics": { - "min": 5, - "max": 12 - }, - "https://example.com/zarr/consolidated-metadata": { - "must_understand": false, - ... - } // optional extension - }, - ... -} -``` - -This alternative is similar to the top-level keys, with mostly the same implications. - -However, this alternative would reserve the top-level namespace for changes to -the core spec and, therefore, reduce pollution of the top-level namespace. - +The ``example.debugging`` extension is being referenced here by name since +no further configuration is necessary. 
### Applicaton to subnodes @@ -233,7 +174,7 @@ this metadata could be include: { "extensions": [ { - "name": "https://example.com/my-extension", + "name": "example.my-extension", "configuration": { ... full copy of the metadata ...} } ] @@ -247,7 +188,7 @@ this metadata could be include: { "extensions": [ { - "name": "https://example.com/my-extension", + "name": "example.my-extension", "configuration": { "reference": "../.." } @@ -263,7 +204,7 @@ this metadata could be include: { "extensions": [ { - "name": "https://example.com/my-extension-ref", + "name": "example.my-extension-ref", "configuration": { "reference": "../.." } @@ -279,7 +220,7 @@ this metadata could be include: { "extensions": [ { - "name": "https://zarr.dev/extensions/parent-ref", + "name": "example.parent-ref", "configuration": { "reference": "../.." } @@ -291,6 +232,87 @@ this metadata could be include: As further experience is gained by the community of extension authors, one or more of these methods may be adopted into the core spec. +### Alternatives for the `extensions` extension point + +This proposal contains a new general-purpose extension point `extensions`, +which holds an array of extensions. This design allows having the same +extension definition syntax across all extension points and reduces pollution +of the top-level namespace in a `zarr.json`. Thus, the addition of top-level +metadata keys remains reserved to changes in the core spec. This MAY happen as +part of the core spec adopting functionality of an extension. + +Alternative designs that were considered are listed below along with their +pros and cons. + +#### Top-level metadata keys + +Instead of a general-purpose extension point, we could also add new top-level +extension keys to the metadata. + +```javascript +{ + "zarr_format": 3, + ... 
+ "https://example.com/zarr/offset": { "offset": [ 12 ] }, + "https://example.com/zarr/array-statistics": { + "min": 5, + "max": 12 + }, + "https://example.com/zarr/consolidated-metadata": { + "must_understand": false, + ... + }, // optional extension + ... +} +``` + +In this case, there would be no explicit `configuration` key within an +extension definition, but instead all the keys of such a configuration would be +in the object itself. This would mean that there are two separate types of +extension definitions, i.e. `{"name":"", "configuration": {...}}` in +specialized extension points (e.g. `codecs`) and `"": {...}` for other +extensions. + +It would still be recommended to use an object with keys for the extension +definition to allow for evolution of the extension. + +In case an extension becomes adopted into the core spec, implementations +wouldn't need to be changed (only when changing the name from URI-based to raw +name). + +There has been some controversy about using URLs as keys in JSON metadata. +However, it has also been used effectively in formats such as JSON-LD (see +below). + +#### `extensions` object + +Instead of an array that holds the extension definitions, we could also use an object. + +```javascript +{ + "zarr_format": 3, + ... + "extensions": { + "https://example.com/zarr/offset": { "offset": [ 12 ] }, + "https://example.com/zarr/array-statistics": { + "min": 5, + "max": 12 + }, + "https://example.com/zarr/consolidated-metadata": { + "must_understand": false, + ... + } // optional extension + }, + ... +} +``` + +This alternative is similar to the top-level keys, with mostly the same implications. + +However, this alternative would reserve the top-level namespace for changes to +the core spec and, therefore, reduce pollution of the top-level namespace. 
+ + ## Changelog - 2025-05-12: Migrate phase 2 of the original ZEP9 From c255d3ff8e337e1246ae85a1d481f5ac28ed6aa9 Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Wed, 14 May 2025 09:30:57 +0200 Subject: [PATCH 06/18] Wrap up first draft --- draft/ZEP0010.md | 136 ++++++++++++++++++++++++++++++++--------------- 1 file changed, 93 insertions(+), 43 deletions(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index deba7d7..04e7ef3 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -21,6 +21,14 @@ Created: 2025-05-12 ## Abstract +This proposal defines a new general-purpose extension point, ``extensions``, to be +included in the metadata of Zarr v3 arrays and groups. The extensions field +provides a consistent mechanism for attaching additional metadata that does not +fit into existing extension points defined by the core specification. Extension +entries within this field follow the naming and structure rules established in +ZEP0009. This mechanism enables third parties to define and share metadata +extensions without requiring changes to the core specification or introducing +new top-level keys. ## Introduction @@ -41,17 +49,32 @@ The `extensions` field if present MUST contain an array of extension definitions. The contained array MUST either have one or more extensions or the object MUST be omitted entirely. +Further details on the specification changes can be found in +TODO PR NUMBER. ### Definition and naming -Each extension object will follow the rules laid out by ZEP0009. +Each extension object will follow the rules laid out in the "Zarr extensions" +section of the v3 specification. ### Processing -Implementations SHOULD .. - -TODO as in spec - +In a manner similar to OpenGL’s extension mechanism, where implementations must +explicitly advertise and support extensions they recognize, Zarr implementers +are expected to inspect the extensions array and determine whether each listed +extension is supported. 
If an extension includes "must_understand": true and +the implementation does not support it, the dataset must not be loaded and an +appropriate error should be raised. For extensions without must_understand, +implementers may safely ignore unrecognized entries. + +To support a given extension, an implementation must either (1) check for known +extension names and invoke appropriate logic according to the extension’s +specification at the correct point in its processing pipeline (e.g., during +metadata interpretation, data access, or layout resolution), or (2) delegate +that logic via a callback or plugin mechanism that allows third-party code to +handle the extension dynamically. This modular approach enables implementers to +support a flexible and evolving set of extensions while maintaining core +compatibility. ### Examples @@ -79,7 +102,7 @@ shape of the containing array specifying which element of the array should be considered as the origin, e.g., `[0, 0]`. This allows the reuse of subregions of an array without the need to rewrite the data. -Note that in this example of the extension is ``must_understand=True`` meaning +Note that in this example of the extension is ``must_understand=true`` meaning an implementation which does not support the ``example.offset`` extension should raise an error. @@ -105,7 +128,8 @@ should raise an error. The ``example.array-statistics`` extension contains two fields -- ``min`` and ``max`` specifying the range of values which are present in the array, -reducing the need to read every byte. +reducing the need to read every byte. ``must_understand`` is false, so +implementations can safely ignored the extension. #### Domain metadata (group) @@ -141,22 +165,23 @@ The value of specifying this metadata as an extension as opposed to a user-attribute is the clear registration of the extension name, providing a namespace for the metadata to prevent collisions. 
-#### Debugging (group or array) +#### Tracing (group or array) ```javascript { "zarr_format": 3, ..., "extensions": [ - "example.debugging" + "example.tracing" ] } ``` -The ``example.debugging`` extension is being referenced here by name since -no further configuration is necessary. +The ``example.tracing`` extension is being referenced here by name since +no further configuration is necessary. This could, for example, activate +detailed information of the processing pipeline within the implementation. -### Applicaton to subnodes +### Application to sub-nodes Conceptually, we propose that extensions defined on groups should be valid for their child nodes. However, the details of how an implementation should @@ -246,19 +271,19 @@ pros and cons. #### Top-level metadata keys -Instead of a general-purpose extension point, we could also add new top-level -extension keys to the metadata. +Instead of a general-purpose extension point, new top-level +extension keys could be added to the metadata:: ```javascript { "zarr_format": 3, ... - "https://example.com/zarr/offset": { "offset": [ 12 ] }, - "https://example.com/zarr/array-statistics": { + "example.offset": { "offset": [ 12 ] }, + "example.array-statistics": { "min": 5, "max": 12 }, - "https://example.com/zarr/consolidated-metadata": { + "example.consolidated-metadata": { "must_understand": false, ... }, // optional extension @@ -268,37 +293,32 @@ extension keys to the metadata. In this case, there would be no explicit `configuration` key within an extension definition, but instead all the keys of such a configuration would be -in the object itself. This would mean that there are two separate types of +in the object itself. Using an object rather than directly for example +an array of values would allow for evolution of the extension. + +This would mean, however, that there are two separate types of extension definitions, i.e. `{"name":"", "configuration": {...}}` in specialized extension points (e.g. 
`codecs`) and `"": {...}` for other extensions. -It would still be recommended to use an object with keys for the extension -definition to allow for evolution of the extension. +A benefit would be that if an extension becomes adopted into the core spec, implementations +would not need to be updated to support their move from the ``extensions`` object. -In case an extension becomes adopted into the core spec, implementations -wouldn't need to be changed (only when changing the name from URI-based to raw -name). +#### Simple `extensions` object -There has been some controversy about using URLs as keys in JSON metadata. -However, it has also been used effectively in formats such as JSON-LD (see -below). - -#### `extensions` object - -Instead of an array that holds the extension definitions, we could also use an object. +Instead of an array that holds the extension definitions, an object could alternatively be used:: ```javascript { "zarr_format": 3, ... "extensions": { - "https://example.com/zarr/offset": { "offset": [ 12 ] }, - "https://example.com/zarr/array-statistics": { + "example.offset": { "offset": [ 12 ] }, + "example.array-statistics": { "min": 5, "max": 12 }, - "https://example.com/zarr/consolidated-metadata": { + "example.consolidated-metadata": { "must_understand": false, ... } // optional extension @@ -309,22 +329,52 @@ Instead of an array that holds the extension definitions, we could also use an o This alternative is similar to the top-level keys, with mostly the same implications. -However, this alternative would reserve the top-level namespace for changes to -the core spec and, therefore, reduce pollution of the top-level namespace. +This alternative would continue to reserve the top-level namespace for changes to +the core spec and, therefore, reduce pollution of the top-level namespace. Downsides include +that only a single use of each extension would be possible since the key is the extension +name and there would be no ordering of the extensions. 
+#### Complex `extensions` object -## Changelog +Finally, a more complex ``extensions`` object could be defined:: - - 2025-05-12: Migrate phase 2 of the original ZEP9 - -### Voting +```javascript +{ + "zarr_format": 3, + ... + "extensions": { + "version": 1, + "contents": [ + { + "name": "example.offset", + "configuration": { "offset": [ 12 ] } + }, + { + "name": "example.array-statistics", + "configuration": { + "min": 5, + "max": 12 + } + }, + { + "name": "example.consolidated-metadata", + "must_understand": false, + "configuration": { + ... + } + } + ] + }, + ... +} +``` -TODO: for PR? +This strategy combines the object strategy for extensibility with the uniformity +of using a list of extension definitions, at the cost of a more complex object to parse. -As such, this ZEP will follow the active ZEP process. The goal is to encourage more -exploration by the Zarr community outside of what's currently defined with the -core specification. +## Changelog + - 2025-05-12: Migrate phase 2 of the original ZEP9 ## Copyright From db7ae377dd4fca3a31682bc65e00f9c27705661c Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Thu, 15 May 2025 13:35:07 +0200 Subject: [PATCH 07/18] Apply Norman's feedback on the examples --- draft/ZEP0010.md | 49 ++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 41 insertions(+), 8 deletions(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index 04e7ef3..239a141 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -79,7 +79,8 @@ compatibility. ### Examples The following examples represent a few realistic use cases of the top-level -``extensions`` container. +``extensions`` container. This ZEP is putting in place the mechanism so the +community can experiment with such extensions before their standardization. #### Offset (array) @@ -131,6 +132,25 @@ and ``max`` specifying the range of values which are present in the array, reducing the need to read every byte.
``must_understand`` is false, so implementations can safely ignored the extension. +#### Skip empty chunks (array) + +```javascript +{ + "zarr_format": 3, + ..., + "extensions": [ + "example.skip_empty_chunks" + ] +} +``` + +Currently the "write_empty_chunks" flag in zarr-python is not propagated +to the zarr.json file. An extension like ``example.skip_empty_chunks`` +could serve as a no-configuration flag in the metadata to inform +implementations that empty chunks should not be written. + + + #### Domain metadata (group) ```javascript @@ -163,23 +183,36 @@ a common idiom in the geospatial and bioimaging uses of Zarr. The value of specifying this metadata as an extension as opposed to a user-attribute is the clear registration of the extension name, providing a -namespace for the metadata to prevent collisions. +namespace for the metadata to prevent collisions. Conceptually, the extensions +space is more for use by software and automated processes while the attributes +are more for human use. -#### Tracing (group or array) +#### Tiered storage (group) ```javascript { "zarr_format": 3, + "node_type": "group", ..., "extensions": [ - "example.tracing" - ] + { + "name": "example.tiered-storage", + "must_understand": false, + "configuration": { + "slow-arrays": [ + "path/to/array/1" + ] + } + } + ], } ``` -The ``example.tracing`` extension is being referenced here by name since -no further configuration is necessary. This could, for example, activate -detailed information of the processing pipeline within the implementation. +Related to the multiscales example above, an ``example.tiered-storage`` +extension could identify arrays within a group which have been put on +slower or even archived filesystems which will encourage more overhead +and potentially costs if they are accessed. An implementation might +warn users before opening the array. 
### Application to sub-nodes From 48c319ccfe208a32a7f23d6ef7e9982b1b6b4890 Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Fri, 16 May 2025 15:23:49 +0200 Subject: [PATCH 08/18] Introduce 'generic extensions' nomenclature --- draft/ZEP0010.md | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index 239a141..68dd31e 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -1,12 +1,12 @@ --- layout: default title: ZEP010 -description: This ZEP proposes a new top-level extensions container field. +description: This ZEP proposes a new generic extensions field. parent: draft ZEPs nav_order: 10 --- -# ZEP 10 — Zarr Extensions Container +# ZEP 10 — Zarr Generic Extensions Authors: @@ -21,8 +21,8 @@ Created: 2025-05-12 ## Abstract -This proposal defines a new general-purpose extension point, ``extensions``, to be -included in the metadata of Zarr v3 arrays and groups. The extensions field +This proposal defines a new generic extension point, ``extensions``, to be +included in the metadata of Zarr v3 arrays and groups. The ``extensions`` field provides a consistent mechanism for attaching additional metadata that does not fit into existing extension points defined by the core specification. Extension entries within this field follow the naming and structure rules established in @@ -36,16 +36,16 @@ Zarr specification version 3 currently defines four extension points, each associated with a specific (array) metadata field. Additional extension points may be added by future ZEPs. Until that time, however, third-parties may want to add arbitrary extension objects to either arrays or groups. This proposal -introduces a top-level "extensions" field that serves as a container for such a +introduces a generic ``extensions`` field that serves as a container for such a list of extensions. 
## Proposal To provide for more flexible, immediate, and de-centralized use cases, we -propose to add a general-purpose extension point `extensions` on +propose to add a generic extension point ```extensions`` on both arrays and groups into which extensions MAY be added. -The `extensions` field if present MUST contain an array of extension +The ``extensions`` field if present MUST contain an array of extension definitions. The contained array MUST either have one or more extensions or the object MUST be omitted entirely. @@ -62,10 +62,10 @@ section of the v3 specification. In a manner similar to OpenGL’s extension mechanism, where implementations must explicitly advertise and support extensions they recognize, Zarr implementers are expected to inspect the extensions array and determine whether each listed -extension is supported. If an extension includes "must_understand": true and -the implementation does not support it, the dataset must not be loaded and an -appropriate error should be raised. For extensions without must_understand, -implementers may safely ignore unrecognized entries. +extension is supported. If an extension includes ``"must_understand": true`` +(the default) and the implementation does not support it, the dataset must not +be loaded and an appropriate error should be raised. For extensions with +``"must_understand": false``, implementers may safely ignore unrecognized entries. To support a given extension, an implementation must either (1) check for known extension names and invoke appropriate logic according to the extension’s @@ -292,8 +292,7 @@ one or more of these methods may be adopted into the core spec. ### Alternatives for the `extensions` extension point -This proposal contains a new general-purpose extension point `extensions`, -which holds an array of extensions. 
This design allows having the same +The current design allows having the same extension definition syntax across all extension points and reduces pollution of the top-level namespace in a `zarr.json`. Thus, the addition of top-level metadata keys remains reserved to changes in the core spec. This MAY happen as @@ -304,7 +303,7 @@ pros and cons. #### Top-level metadata keys -Instead of a general-purpose extension point, new top-level +Instead of a generic extension point, new top-level extension keys could be added to the metadata:: ```javascript From ffda70895882164938fb8809c902165690a18572 Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Fri, 16 May 2025 16:53:06 +0200 Subject: [PATCH 09/18] Update spec PR number --- draft/ZEP0010.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index 68dd31e..605239c 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -50,7 +50,7 @@ definitions. The contained array MUST either have one or more extensions or the object MUST be omitted entirely. Further details on the specification changes can be found in -TODO PR NUMBER. +. ### Definition and naming From f4dea6f26ba16a386b4bf4007ef96189e20d899e Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Fri, 16 May 2025 16:07:54 -0400 Subject: [PATCH 10/18] Update ZEP0010.md Co-authored-by: Davis Bennett --- draft/ZEP0010.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index 605239c..a4f151a 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -130,7 +130,7 @@ should raise an error. The ``example.array-statistics`` extension contains two fields -- ``min`` and ``max`` specifying the range of values which are present in the array, reducing the need to read every byte. ``must_understand`` is false, so -implementations can safely ignored the extension. +implementations can safely ignore the extension. 
#### Skip empty chunks (array) From f4b3b2a86159de2dbe7f5306768f999303002b14 Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Fri, 16 May 2025 22:35:54 +0200 Subject: [PATCH 11/18] Update draft/ZEP0010.md Co-authored-by: Davis Bennett --- draft/ZEP0010.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index a4f151a..7ecfe8b 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -42,7 +42,7 @@ list of extensions. ## Proposal To provide for more flexible, immediate, and de-centralized use cases, we -propose to add a generic extension point ```extensions`` on +propose to add a generic extension point ``extensions`` on both arrays and groups into which extensions MAY be added. The ``extensions`` field if present MUST contain an array of extension From b82d3d438dc761c1ff5953dba6fd1bf50efa8de4 Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Sat, 24 May 2025 07:40:13 +0200 Subject: [PATCH 12/18] Intro clarification --- draft/ZEP0010.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index 7ecfe8b..121c9e5 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -39,6 +39,18 @@ to add arbitrary extension objects to either arrays or groups. This proposal introduces a generic ``extensions`` field that serves as a container for such a list of extensions. +These general purpose extensions are not limited by the scopes of existing +extension points and require no heavy-weight process to add functionality or alter +behavior of arrays and groups. +The intent is to facilitate decentralized and low-friction +innovation within the Zarr ecosystem by enabling third parties to experiment +with new features without requiring immediate changes +to the core specification. +By tolerating a broader range of experimental extensions, the community can +explore diverse use cases and patterns. 
Over time, widely adopted extensions +may serve as the foundation for future standardization through new ZEPs which +introduce new extension points or even core features. + ## Proposal To provide for more flexible, immediate, and de-centralized use cases, we From a1495bbc9e681fb39510a3f0c041048cef1a4d49 Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Sat, 24 May 2025 07:40:32 +0200 Subject: [PATCH 13/18] Remove opengl ref and clarify processing --- draft/ZEP0010.md | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index 121c9e5..53af558 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -71,22 +71,25 @@ section of the v3 specification. ### Processing -In a manner similar to OpenGL’s extension mechanism, where implementations must -explicitly advertise and support extensions they recognize, Zarr implementers +Zarr implementers are expected to inspect the extensions array and determine whether each listed extension is supported. If an extension includes ``"must_understand": true`` (the default) and the implementation does not support it, the dataset must not be loaded and an appropriate error should be raised. For extensions with ``"must_understand": false``, implementers may safely ignore unrecognized entries. -To support a given extension, an implementation must either (1) check for known +To support a given extension, an implementation may hard-code a check for known extension names and invoke appropriate logic according to the extension’s specification at the correct point in its processing pipeline (e.g., during -metadata interpretation, data access, or layout resolution), or (2) delegate +metadata interpretation, data access, or layout resolution). +Where possible, however, implementations are encouraged to delegate that logic via a callback or plugin mechanism that allows third-party code to -handle the extension dynamically.
This modular approach enables implementers to -support a flexible and evolving set of extensions while maintaining core -compatibility. +handle the extension dynamically. + +As the set of extensions evolves, certain interfaces may arise which allow +this modular approach for a subset of extensions. Where possible, these +interfaces will be added to the specification. Feedback from implementers +on such matters is highly encouraged. ### Examples From dcc2246be9f7870fda77f0f781e883548692b282 Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Sat, 24 May 2025 07:53:53 +0200 Subject: [PATCH 14/18] Update multiscales example --- draft/ZEP0010.md | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index 53af558..afce85a 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -166,7 +166,7 @@ implementations that empty chunks should not be written. -#### Domain metadata (group) +#### Multiscale arrays (group) ```javascript { @@ -175,7 +175,7 @@ implementations that empty chunks should not be written. ..., "extensions": [ { - "name": "example.domain-metadata", + "name": "example.multiscale-arrays", "must_understand": false, "configuration": { "multiscale": { @@ -191,16 +191,22 @@ implementations that empty chunks should not be written. } ``` -Domain-specific metadata is introduced in the ``example.domain-metadata`` +Metadata is introduced in the ``example.multiscale-arrays`` extension which allows encoding a relationship between multiple arrays at the -group level. Here, a "multiscale pyramid" of arrays is being defined which is -a common idiom in the geospatial and bioimaging uses of Zarr. - -The value of specifying this metadata as an extension as opposed to a -user-attribute is the clear registration of the extension name, providing a -namespace for the metadata to prevent collisions. 
Conceptually, the extensions -space is more for use by software and automated processes while the attributes -are more for human use. +group level. This defines a "multiscale pyramid" of arrays which is +a common idiom in both the geospatial and bioimaging uses of Zarr. +Implementations may choose to return a different subclass or backend when +detecting such metadata. In this case, a "datatree" which allows similar +operations on all levels of the pyramid might be preferred. + +Conceptually, the extensions space is intended primarily for use by software +and automated processes, with the potential to influence behavior or processing +logic, whereas attributes are generally intended for human interpretation and +serve as passive metadata or provenance information. The boundary, however, is +not always clear. +Specifying this metadata as an extension as opposed to a +user-attribute allows the clear registration of the extension name, providing a +namespace for the metadata to prevent collisions. #### Tiered storage (group) From 547c501b996cd4b061523274c8179653561c1792 Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Sat, 24 May 2025 07:55:50 +0200 Subject: [PATCH 15/18] Replace dataset with node --- draft/ZEP0010.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index afce85a..f6dd88d 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -72,9 +72,9 @@ section of the v3 specification. ### Processing Zarr implementers -are expected to inspect the extensions array and determine whether each listed +are expected to inspect the extensions for each node and determine whether each listed extension is supported. If an extension includes ``"must_understand": true`` -(the default) and the implementation does not support it, the dataset must not +(the default) and the implementation does not support it, the node must not be loaded and an appropriate error should be raised. 
For extensions with ``"must_understand": false``, implementers may safely ignore unrecognized entries. From dfe741b1f90c39a75053cfd74843dbea871b5c84 Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Sat, 24 May 2025 08:06:45 +0200 Subject: [PATCH 16/18] Clarify sub-nodes --- draft/ZEP0010.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index f6dd88d..83524f2 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -237,7 +237,10 @@ warn users before opening the array. ### Application to sub-nodes -Conceptually, we propose that extensions defined on groups should be valid for +This ZEP does not try to define the behavior for application to sub-nodes +itself, but defers this to actual extensions. + +Conceptually, we propose that extensions defined on groups may be valid for their child nodes. However, the details of how an implementation should identify which extensions are active within an hierarchy are unclear. Relying on traversing the hierarchy towards the root node is undesirable from a From 35087ba55c6fa2bd7cecdf6eb9c02378d7f717a9 Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Sat, 24 May 2025 08:06:32 +0200 Subject: [PATCH 17/18] Move extensions v attributes to the top --- draft/ZEP0010.md | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index 83524f2..3c3c4c2 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -57,9 +57,21 @@ To provide for more flexible, immediate, and de-centralized use cases, we propose to add a generic extension point ``extensions`` on both arrays and groups into which extensions MAY be added. -The ``extensions`` field if present MUST contain an array of extension -definitions. The contained array MUST either have one or more extensions or the -object MUST be omitted entirely. +This field is similar in flexibility to the ``attributes`` field. 
Conceptually, +``extensions`` is intended primarily for use by software and automated +processes, with the potential to influence behavior or processing logic, +whereas ``attributes`` are generally intended for human interpretation and +serve as passive metadata or provenance information, though the boundaries are +not always distinct. + +By adding a new field, the specification can assert restrictions that, if added +to ``attributes``, would amount to a breaking change. If present, the +``extensions`` field MUST contain an array of extension definitions. The +contained array MUST either have one or more extensions or the object MUST be +omitted entirely. Specifying metadata within ``extensions`` as opposed to +``attributes`` allows the clear registration of the extension name, providing a +namespace for the metadata to prevent collisions, and activates the +``must_understand`` handling logic. Further details on the specification changes can be found in . @@ -199,15 +211,6 @@ Implementations may choose to return a different subclass or backend when detecting such metadata. In this case, a "datatree" which allows similar operations on all levels of the pyramid might be preferred. -Conceptually, the extensions space is intended primarily for use by software -and automated processes, with the potential to influence behavior or processing -logic, whereas attributes are generally intended for human interpretation and -serve as passive metadata or provenance information. The boundary, however, is -not always clear. -Specifying this metadata as an extension as opposed to a -user-attribute allows the clear registration of the extension name, providing a -namespace for the metadata to prevent collisions. 
- #### Tiered storage (group) ```javascript From 0cad1d11ac434e8b45d1ef0afec079c72c765cf3 Mon Sep 17 00:00:00 2001 From: Josh Moore Date: Wed, 28 May 2025 19:54:38 +0200 Subject: [PATCH 18/18] Update draft/ZEP0010.md Co-authored-by: Sanket Verma --- draft/ZEP0010.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/draft/ZEP0010.md b/draft/ZEP0010.md index 3c3c4c2..a56f911 100644 --- a/draft/ZEP0010.md +++ b/draft/ZEP0010.md @@ -1,6 +1,6 @@ --- layout: default -title: ZEP010 +title: ZEP0010 description: This ZEP proposes a new generic extensions field. parent: draft ZEPs nav_order: 10
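
The "Processing" rules touched by this patch series (inspect the ``extensions`` of each node, refuse to load the node when an unsupported entry has ``"must_understand": true``, and ignore unsupported entries marked ``"must_understand": false``) can be sketched in Python as below. This is only an illustration under stated assumptions, not part of the ZEP or of any implementation: ``check_extensions``, ``UnsupportedExtensionError`` and the ``supported`` set are hypothetical names, and every extension is assumed to be a JSON object carrying a ``name`` key.

```python
class UnsupportedExtensionError(Exception):
    """Hypothetical error for a required extension the reader does not support."""


def check_extensions(metadata, supported):
    """Return the extensions of one node that this reader will act on.

    `metadata` is the parsed zarr.json of a single array or group and
    `supported` is the set of extension names the reader implements
    (both are assumptions made for this sketch).
    """
    active = []
    # The field is optional; when absent there is nothing to check.
    for ext in metadata.get("extensions", []):
        name = ext["name"]
        if name in supported:
            active.append(ext)
        elif ext.get("must_understand", True):
            # "must_understand" defaults to true: the node must not be loaded.
            raise UnsupportedExtensionError(
                f"extension {name!r} is required but not supported"
            )
        # Otherwise the entry is marked "must_understand": false and is ignored.
    return active


# Usage: an optional extension is skipped, a required one stops loading.
group_metadata = {
    "zarr_format": 3,
    "node_type": "group",
    "extensions": [
        {
            "name": "example.multiscale-arrays",
            "must_understand": False,
            "configuration": {},
        },
    ],
}
print(check_extensions(group_metadata, supported=set()))

required = {"extensions": [{"name": "example.required-extension"}]}
try:
    check_extensions(required, supported=set())
except UnsupportedExtensionError as err:
    print(f"refusing to load node: {err}")
```

The check runs per node, matching the wording "the extensions for each node"; whether extensions declared on a group also apply to its children is left to the individual extension, as the "Application to sub-nodes" section states.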