Hi,
I have used VirtualiZarr to combine several .nc files into a single Parquet (.parq) store.
I noticed that when I then open the saved dataset, the first value of its depth index is replaced with NaN.
I therefore suspect that virtualize.to_kerchunk() might have a bug.
Here is how to reproduce the issue:
import numpy as np
import xarray as xr
from virtualizarr import open_virtual_dataset

filename = "test"
# synthetic xarray.Dataset inspired by xarray's documentation
temperature = 15 + 8 * np.random.randn(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
depths = np.arange(150, step=50)
da = xr.DataArray(
    data=temperature,
    dims=["x", "y", "depth"],
    coords=dict(
        lon=(["x", "y"], lon),
        lat=(["x", "y"], lat),
        depth=depths,
    ),
    attrs=dict(
        description="Ambient temperature.",
        units="degC",
    ),
)
ds = da.to_dataset(name="temperature")
ds.to_netcdf(f"{filename}.nc")
vds = open_virtual_dataset(
    f"{filename}.nc",
    indexes={},
    decode_times=True,
    loadable_variables=["lon", "lat", "depth"],
)
print("depth index of vds:\t\t", vds.depth.to_numpy())
# depth index of vds: [ 0 50 100]
# saves as parq/ folder
vds.virtualize.to_kerchunk(f"{filename}.parq", format="parquet")
loaded_ds = xr.open_dataset(f"{filename}.parq", engine="kerchunk", chunks={})
print("depth index of the loaded vds:\t", loaded_ds.depth.to_numpy())
# depth index of the loaded vds: [ nan 50. 100.]
# temporary fix
loaded_ds.coords["depth"].values[0] = 0.
print("index after fix:\t\t", loaded_ds.depth.to_numpy())
# index after fix: [ 0. 50. 100.]

I am a beginner and therefore have no idea what the cause is.
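One thing that might be worth checking, purely as a guess: the symptom (an integer `0` turning into `nan` while the array becomes float) looks like what xarray's CF decoding does when a variable carries a `_FillValue` equal to 0. The sketch below does not touch the Parquet store at all; it only uses a hypothetical in-memory stand-in (`depth_values` on a made-up dimension `z`, with an assumed `_FillValue` of 0) to show that this mechanism alone reproduces the pattern. Whether the written references really contain such a fill value is an assumption that nothing above confirms.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the stored "depth" array: integer values plus
# a _FillValue attribute of 0 (an assumption, not read from the real store).
raw = xr.Dataset(
    {"depth_values": ("z", np.array([0, 50, 100]), {"_FillValue": 0})}
)

# Default CF decoding masks every element equal to _FillValue.
masked = xr.decode_cf(raw)
# With masking disabled, the integer values are left untouched.
unmasked = xr.decode_cf(raw, mask_and_scale=False)

masked_values = masked["depth_values"].to_numpy()
unmasked_values = unmasked["depth_values"].to_numpy()

print("with masking:   ", masked_values)
print("without masking:", unmasked_values)
```

With masking enabled, the first element comes back as NaN and the array is promoted to float; with `mask_and_scale=False` the original integers survive. If the same difference shows up when the saved store is opened with `mask_and_scale=False`, that would point towards a fill-value problem rather than corrupted data.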