Recommended dtype when storing an array of strings, UnstableSpecificationWarning #35

@ofk123

Hi,
I have not been following the discussions on data types, so apologies if this has been asked elsewhere.

I previously stored a dataset in Zarr V2 that contains an array similar to the message arrays below, where each element holds a short string describing the corresponding element of another array (optimization results).

import numpy as np
import xarray as xr

gridshape = (181, 360, 1)
gridded_data = np.random.rand(*gridshape)
opt_message_U30 = np.full(gridshape, "", dtype="U30")
opt_message_S30 = np.full(gridshape, "", dtype="S30")
opt_message_U30[0, 0, 0] = "Converged"  # example message
opt_message_S30[0, 0, 0] = "Converged"  # same message, stored as bytes
ds = xr.Dataset(
    data_vars={
        "data": (["lat", "lon", "p"], gridded_data),
        "message_U30": (["lat", "lon", "p"], opt_message_U30),
        "message_S30": (["lat", "lon", "p"], opt_message_S30),
    },
    coords={"lat": np.arange(181), "lon": np.arange(360), "p": [0]},
)
ds.chunk().to_zarr("test.zarr", zarr_format=3)

Now I want to store the same data as Zarr V3 (the to_zarr call above), and I get the following warnings:

/srv/conda/envs/notebook/lib/python3.12/site-packages/zarr/core/dtype/npy/string.py:248: UnstableSpecificationWarning: The data type (FixedLengthUTF32(length=30, endianness='little')) does not have a Zarr V3 specification. That means that the representation of arrays saved with this data type may change without warning in a future version of Zarr Python. Arrays stored with this data type may be unreadable by other Zarr libraries. Use this data type at your own risk! Check https://github.com/zarr-developers/zarr-extensions/tree/main/data-types for the status of data type specifications for Zarr V3.
  v3_unstable_dtype_warning(self)
/srv/conda/envs/notebook/lib/python3.12/site-packages/zarr/core/dtype/npy/bytes.py:383: UnstableSpecificationWarning: The data type (NullTerminatedBytes(length=30)) does not have a Zarr V3 specification. That means that the representation of arrays saved with this data type may change without warning in a future version of Zarr Python. Arrays stored with this data type may be unreadable by other Zarr libraries. Use this data type at your own risk! Check https://github.com/zarr-developers/zarr-extensions/tree/main/data-types for the status of data type specifications for Zarr V3.
  v3_unstable_dtype_warning(self)
/srv/conda/envs/notebook/lib/python3.12/site-packages/zarr/api/asynchronous.py:228: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
  warnings.warn(

(I used two different dtypes above to show the two different UnstableSpecificationWarnings.)

Versions:

np.__version__, xr.__version__
('2.2.6', '2025.7.1')

Is there a recommended dtype for storing a DataArray of strings (fewer than 30 characters at each grid point)?

I know I can index these "optimization result messages" and store an array of integer indices instead, but I want a way to store the strings themselves, so that I can be certain they are always identical to the strings returned by the underlying library (e.g. scipy.optimize.minimize or similar).
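
For reference, the integer-index workaround mentioned above could look roughly like the sketch below. It only uses NumPy; the names `messages`, `labels`, `codes` and `restored` are made up for illustration, and the int32 code dtype is an arbitrary choice. It does not answer the dtype question, and the `labels` lookup table would still have to be stored somewhere (for example as an attribute), which is why I would rather store the strings directly.

```python
import numpy as np

gridshape = (181, 360, 1)
messages = np.full(gridshape, "", dtype="U30")
messages[0, 0, 0] = "Converged"  # example message

# Sorted unique messages, plus the index of its message for each grid point.
# Flattening first keeps the shape of the inverse unambiguous across NumPy versions.
labels, inverse = np.unique(messages.ravel(), return_inverse=True)
codes = inverse.reshape(gridshape).astype(np.int32)

# Indexing the lookup table with the codes recovers the original strings.
restored = labels[codes]
```

Here `codes` is a plain integer array with the same shape as the grid, so it avoids the string dtypes entirely, at the cost of keeping `labels` alongside it.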
