Hi,
I have not followed the discussions on data types, so excuse me if this has been asked elsewhere.
I previously stored a dataset in Zarr V2 that contains an array similar to the message arrays below, holding a short string that describes another array (optimization results).
import numpy as np
import xarray as xr
gridshape = (181, 360, 1)
gridded_data = np.random.rand(*gridshape)
opt_message_U30 = np.full(gridshape, "", dtype="U30")
opt_message_S30 = np.full(gridshape, "", dtype="S30")
opt_message_U30[0, 0, 0] = "Converged" # for example
opt_message_S30[0, 0, 0] = "Converged"
ds = xr.Dataset(
    data_vars={
        "data": (["lat", "lon", "p"], gridded_data),
        "message_U30": (["lat", "lon", "p"], opt_message_U30),
        "message_S30": (["lat", "lon", "p"], opt_message_S30),
    },
    coords={"lat": np.arange(181), "lon": np.arange(360), "p": [0]},
)
ds.chunk().to_zarr("test.zarr", zarr_format=3)
Now I want to store the same data as Zarr V3, and I get the following warnings:
/srv/conda/envs/notebook/lib/python3.12/site-packages/zarr/core/dtype/npy/string.py:248: UnstableSpecificationWarning: The data type (FixedLengthUTF32(length=30, endianness='little')) does not have a Zarr V3 specification. That means that the representation of arrays saved with this data type may change without warning in a future version of Zarr Python. Arrays stored with this data type may be unreadable by other Zarr libraries. Use this data type at your own risk! Check https://github.com/zarr-developers/zarr-extensions/tree/main/data-types for the status of data type specifications for Zarr V3.
v3_unstable_dtype_warning(self)
/srv/conda/envs/notebook/lib/python3.12/site-packages/zarr/core/dtype/npy/bytes.py:383: UnstableSpecificationWarning: The data type (NullTerminatedBytes(length=30)) does not have a Zarr V3 specification. That means that the representation of arrays saved with this data type may change without warning in a future version of Zarr Python. Arrays stored with this data type may be unreadable by other Zarr libraries. Use this data type at your own risk! Check https://github.com/zarr-developers/zarr-extensions/tree/main/data-types for the status of data type specifications for Zarr V3.
v3_unstable_dtype_warning(self)
/srv/conda/envs/notebook/lib/python3.12/site-packages/zarr/api/asynchronous.py:228: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
warnings.warn(
(I use two different dtypes above to show the two different UnstableSpecificationWarning messages.)
Versions:
np.__version__, xr.__version__
('2.2.6', '2025.7.1')
Is there a recommended dtype for storing a DataArray of strings (fewer than 30 characters at each grid point)?
I know I can index these optimization result messages and store an array of integer indices instead, but I want a way to store the strings themselves, so that I am certain they are always identical to the strings returned by the underlying library (e.g. scipy.optimize.minimize or similar).
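For reference, the integer-index workaround mentioned above could look roughly like the sketch below. It uses only NumPy; the helper names encode_messages and decode_messages are hypothetical and not part of Zarr or xarray, and the small grid is just for illustration.

```python
import numpy as np


def encode_messages(messages):
    """Split a string array into unique labels and an integer code array.

    Hypothetical helper: `labels` is a sorted 1-D array of the unique
    strings, `codes` has the same shape as `messages` and indexes `labels`.
    """
    # Flatten first so the inverse is 1-D regardless of NumPy version.
    labels, codes = np.unique(messages.ravel(), return_inverse=True)
    return labels, codes.reshape(messages.shape).astype(np.int32)


def decode_messages(labels, codes):
    """Rebuild the original string array from labels and codes."""
    return labels[codes]


# Small stand-in for the (181, 360, 1) grid in the example above.
messages = np.full((3, 4, 1), "", dtype="U30")
messages[0, 0, 0] = "Converged"

labels, codes = encode_messages(messages)
restored = decode_messages(labels, codes)
```

The integer codes array can be written with a plain integer dtype, but the labels table still has to be stored somewhere (for example as an attribute), which is exactly the extra bookkeeping I would like to avoid.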