Conversation

@lazarusA
Contributor

@lazarusA lazarusA commented Nov 16, 2025

test fix CI

TODOs

Tests:

  • PythonCall v3 creation
  • Zarr.jl v3 creation
  • compare !

mkitti and others added 30 commits March 11, 2025 22:11
This reduces the test diff
This also reduces the test diff
Change VersionedStore to FormattedStore
TODO: Fix Zarr v3 type strings
@lazarusA
Contributor Author

@mkitti I continued with your v3 prototype here. I added a Python v3 version to compare against; however, it looks like sharding is still not fully implemented, though examples are in place to test it. On the bright side, it looks like everything else worked 😕.

@mkitti
Member

mkitti commented Nov 17, 2025

How should we proceed? Should we merge this into master or should we merge this into my branch?

@lazarusA
Contributor Author

I could open a PR against your branch 🙂 and continue there? But I would like to finish it this week 😬😅.

Member

@mkitti mkitti left a comment

I need to check the shard index. I think it is transposed from what I would expect it to be.

Empty chunks are marked with (MAX_UINT64, MAX_UINT64)
"""
struct ShardIndex{N}
    offsets_and_lengths::Array{UInt64, N} # Shape: (chunks_per_shard..., 2)
Member

This is interesting. I usually think of this as a matrix 2xN where N is the number of chunks. I wonder if the N should come first since when linearized that is the order of the information.

Perhaps better yet, we should have

struct ChunkShardInfo
     offset::UInt64
     nbytes::UInt64
end

Then we will have an array of ChunkShardInfo.
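
A minimal sketch of that suggestion, for illustration only. The zero-argument constructor for empty chunks and the `is_empty_chunk` helper are assumptions, not code from this PR. Because `ChunkShardInfo` is a plain struct of two `UInt64` fields, reinterpreting a vector of them as `UInt64` yields values that alternate between offset and nbytes:

```julia
# Sketch only: field names follow the suggestion above; the empty-chunk
# constructor and `is_empty_chunk` are illustrative assumptions.
struct ChunkShardInfo
    offset::UInt64
    nbytes::UInt64
end

# Empty chunks are marked with (MAX_UINT64, MAX_UINT64).
ChunkShardInfo() = ChunkShardInfo(typemax(UInt64), typemax(UInt64))

is_empty_chunk(c::ChunkShardInfo) =
    c.offset == typemax(UInt64) && c.nbytes == typemax(UInt64)

# One entry per chunk, in the order the chunks appear in the index.
infos = [ChunkShardInfo(0, 100), ChunkShardInfo(100, 50), ChunkShardInfo()]

# Viewing the struct array as UInt64 gives offset, nbytes, offset, nbytes, ...
flat = collect(reinterpret(UInt64, infos))
```

With this layout the pairing of offset and nbytes is carried by the element type rather than by an extra array dimension.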

Contributor Author

I did some updates, but I'm out of my depth here. Please feel totally free to push directly to this branch; you should have access to it now (after accepting the invite).

Contributor Author

Also, we don't have a shards argument in our Julia implementation, right? We need that.

I will not touch this further today, maybe late at night, but not during the day. So if you can squeeze in some commits or a few minutes in between, please go ahead.

Member

I'm trying to think about how to best implement this. My current sense is that a shards argument would only be a convenience on top of a more advanced API that would allow multiple levels of sharding. Let's focus on getting the codec chain, or perhaps codec tree, right first.

Contributor Author

@lazarusA lazarusA Nov 17, 2025

function encode_shard_index(index::ShardIndex{N}, index_codecs::Vector{V3Codec}) where N
    # Index array is stored in C order (row-major)
    # Convert to bytes: the index is an array of UInt64 values
    index_bytes = reinterpret(UInt8, vec(index.offsets_and_lengths))
Member

I'm not sure if this will end up in the correct order.

The numbers should alternate between offset and nbytes.
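
To illustrate the ordering concern, here is a small sketch with made-up numbers for three chunks in a 1-D shard. It is not the PR's code. Julia arrays are column-major, so `vec` on an array shaped `(chunks_per_shard..., 2)` emits all offsets first and then all lengths, whereas putting the pair dimension first gives the alternating order:

```julia
# Made-up offsets and lengths for 3 chunks in a 1-D shard.
offsets = UInt64[0, 100, 150]
nbytes  = UInt64[100, 50, 25]

# Layout A: shape (chunks_per_shard..., 2), as in the ShardIndex comment.
a = hcat(offsets, nbytes)        # size (3, 2)
flat_a = vec(a)                  # all offsets, then all nbytes

# Layout B: shape (2, chunks_per_shard...), pair dimension first.
b = permutedims(a)               # size (2, 3)
flat_b = vec(b)                  # offset, nbytes, offset, nbytes, ...

# Bytes in host byte order; a real encoder may also need to fix endianness.
index_bytes = collect(reinterpret(UInt8, flat_b))
```

For chunk grids with more than one dimension, reversing every dimension before calling `vec` should reproduce the C-order layout, though that is worth verifying against an index written by the Python implementation.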

Member

@mkitti mkitti left a comment

I caught a few more things. I would be happy to do this myself. Basically, ShardIndex looks like it should be an AbstractArray of ChunkShardInfo.

We can merge into master if @meggart has no other plans.

I think it might be good to release the JSON compat fixes first though.

Comment on lines +207 to +209
function set_chunk_slice!(idx::ShardIndex, chunk_coords::NTuple{N,Int}, offset::Int, nbytes::Int) where N
    idx.chunks[chunk_coords...] = ChunkShardInfo(UInt64(offset), UInt64(nbytes))
end
Member

Perhaps we should defer the casting to UInt64 to the ChunkShardInfo constructor.

Also, I think we could relax the arguments to Integer rather than Int. Int depends on the system word size (Int64 or Int32), and I am not sure these signatures should.
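
A sketch of what the two suggestions above could look like together. The `ShardIndex` here is a minimal stand-in that assumes only the `chunks` field seen in the diff. Julia's default constructor already calls `convert` on its arguments, so passing any `Integer` works and a negative value raises an `InexactError`:

```julia
struct ChunkShardInfo
    offset::UInt64
    nbytes::UInt64
end

# Minimal stand-in for the PR's ShardIndex; only the `chunks` field is assumed.
struct ShardIndex{N}
    chunks::Array{ChunkShardInfo,N}
end

# Accept any Integer and let the ChunkShardInfo constructor convert to UInt64.
function set_chunk_slice!(idx::ShardIndex{N}, chunk_coords::NTuple{N,Integer},
                          offset::Integer, nbytes::Integer) where N
    idx.chunks[chunk_coords...] = ChunkShardInfo(offset, nbytes)
end

idx = ShardIndex(fill(ChunkShardInfo(0, 0), 2, 2))
set_chunk_slice!(idx, (1, 2), Int32(128), 64)   # mixed integer types are fine
```

Leaving the conversion to the constructor keeps the range check in one place instead of repeating `UInt64(...)` at every call site.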

Comment on lines +216 to +218
function set_chunk_empty!(idx::ShardIndex, chunk_coords::NTuple{N,Int}) where N
    idx.chunks[chunk_coords...] = ChunkShardInfo()
end
Member

This looks like it should be a `setindex!` implementation on the `ShardIndex`.

Comment on lines +225 to +227
function calculate_chunks_per_shard(shard_shape::NTuple{N,Int}, chunk_shape::NTuple{N,Int}) where N
    return ntuple(i -> div(shard_shape[i], chunk_shape[i]), N)
end
Member

This looks like it should be length on the ShardIndex.
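
The review suggestions in this thread (an AbstractArray of ChunkShardInfo, `setindex!`, and `length`) could be combined roughly as follows. This is a hedged sketch, not the PR's implementation: the wrapper layout and the tuple constructor are assumptions made for illustration.

```julia
struct ChunkShardInfo
    offset::UInt64
    nbytes::UInt64
end
# Empty chunks are marked with (MAX_UINT64, MAX_UINT64).
ChunkShardInfo() = ChunkShardInfo(typemax(UInt64), typemax(UInt64))

# Hypothetical layout: one ChunkShardInfo entry per inner chunk.
struct ShardIndex{N} <: AbstractArray{ChunkShardInfo,N}
    chunks::Array{ChunkShardInfo,N}
end

# Start with every chunk marked empty.
ShardIndex(chunks_per_shard::NTuple{N,Integer}) where N =
    ShardIndex(fill(ChunkShardInfo(), chunks_per_shard...))

# Minimal AbstractArray interface.
Base.size(idx::ShardIndex) = size(idx.chunks)
Base.getindex(idx::ShardIndex{N}, I::Vararg{Int,N}) where N = idx.chunks[I...]
Base.setindex!(idx::ShardIndex{N}, v::ChunkShardInfo, I::Vararg{Int,N}) where N =
    (idx.chunks[I...] = v)

idx = ShardIndex((2, 3))
idx[1, 2] = ChunkShardInfo(0, 128)   # plays the role of set_chunk_slice!
idx[2, 3] = ChunkShardInfo()         # plays the role of set_chunk_empty!
```

With this interface, `size(idx)` returns the per-dimension chunk counts (the tuple that `calculate_chunks_per_shard` computes) and `length(idx)` returns the total number of chunks in the shard.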

@coveralls

Pull Request Test Coverage Report for Build 19420720700

Details

  • 105 of 505 (20.79%) changed or added relevant lines in 11 files are covered.
  • 3 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-22.5%) to 66.817%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| src/Storage/http.jl | 8 | 11 | 72.73% |
| src/ZArray.jl | 14 | 18 | 77.78% |
| src/ZGroup.jl | 19 | 24 | 79.17% |
| src/Codecs/Codecs.jl | 1 | 7 | 14.29% |
| src/metadata.jl | 22 | 37 | 59.46% |
| src/Compressors/v3.jl | 0 | 16 | 0.0% |
| src/Storage/formattedstore.jl | 32 | 84 | 38.1% |
| src/metadata3.jl | 1 | 128 | 0.78% |
| src/Codecs/V3/V3.jl | 0 | 172 | 0.0% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| src/ZGroup.jl | 1 | 80.21% |
| src/Storage/Storage.jl | 2 | 79.27% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 19048552340 | -22.5% |
| Covered Lines | 1039 |
| Relevant Lines | 1555 |

💛 - Coveralls

4 participants