-
Notifications
You must be signed in to change notification settings - Fork 29
Continue v3 prototype #223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This reduces the test diff
This also reduces the test diff
Change VersionedStore to FormattedStore
TODO: Fix Zarr v3 type strings
|
@mkitti I continued with your v3 prototype here. I added a python v3 version to compared to, however it looks like we are still missing |
|
How should we proceed? Should we merge this into master or should we merge this into my branch? |
|
I could PR to your branch 🙂, and continue there? But I would like to finish it this week 😬😅. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to check the shatd index. I think it is transposed from what I would expect it to be.
src/Codecs/V3/V3.jl
Outdated
| Empty chunks are marked with (MAX_UINT64, MAX_UINT64) | ||
| """ | ||
| struct ShardIndex{N} | ||
| offsets_and_lengths::Array{UInt64, N} # Shape: (chunks_per_shard..., 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting. I usually think of this as a matrix 2xN where N is the number of chunks. I wonder if the N should come first since when linearized that is the order of the information.
Perhaps better yet, we should have
struct ChunkShardInfo
offset::UInt64
nbytes::UInt64
end
Then we will have an array of ChunkShardInfo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some updates, but I'm out of my depth here, please feel totally free to push directly to this branch, you should have access to it now (after accepting the invite).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we don't have a shards argument on our Julia implementation, right? we need that.
I will not touch this further today, maybe late at night, but not during the day, so please if you can squeeze some commits / minutes in between that would be ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to think about how to best implement this. My current sense is that a shards argument would only be a convenience to a more advanced API that would allow multiple levels of sharding. Let's focus on getting thr codec chain or perhaps codec tree right first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a lot of these things come from: https://github.com/zarr-developers/zarr-python/blob/main/src/zarr/codecs/sharding.py
src/Codecs/V3/V3.jl
Outdated
| function encode_shard_index(index::ShardIndex{N}, index_codecs::Vector{V3Codec}) where N | ||
| # Index array is stored in C order (row-major) | ||
| # Convert to bytes: the index is an array of UInt64 values | ||
| index_bytes = reinterpret(UInt8, vec(index.offsets_and_lengths)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this will end up in the correct order.
Thr numbers should alternate between offset and nbytes.
mkitti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I caught a few more things. I would be happy to do this myself. Basically ShardIndex looks like it should be an AbstractArray of ShardChunkInfo.
We can merge into master if @meggart has no other plans.
I think it might be good to release the JSON compat fixes first though.
| function set_chunk_slice!(idx::ShardIndex, chunk_coords::NTuple{N,Int}, offset::Int, nbytes::Int) where N | ||
| idx.chunks[chunk_coords...] = ChunkShardInfo(UInt64(offset), UInt64(nbytes)) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should defer the casting to UInt64 to the ChunkShardInfo constructor.
Also, I think we could relax the arguments to Integer rather than Int. Also, I am definitely sure Int should not depend on the system word size (Int64 or Int32).
| function set_chunk_empty!(idx::ShardIndex, chunk_coords::NTuple{N,Int}) where N | ||
| idx.chunks[chunk_coords...] = ChunkShardInfo() | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it should be a setindex! implementation on the °ShardIndex`.
| function calculate_chunks_per_shard(shard_shape::NTuple{N,Int}, chunk_shape::NTuple{N,Int}) where N | ||
| return ntuple(i -> div(shard_shape[i], chunk_shape[i]), N) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it should be length on the ShardIndex.
Pull Request Test Coverage Report for Build 19420720700Details
💛 - Coveralls |
test fix CI
TODOs
Tests: