keller-mark commented Nov 5, 2024

Fixes #91

These changes are from both me and @Artur-man

The main public-facing changes here are:

  • The ZarrAnnData class
  • read_zarr and write_zarr top-level functions
  • Support for from_Seurat(output_class="ZarrAnnData")
  • Support for from_SingleCellExperiment(output_class="ZarrAnnData")

Internally:

  • read_zarr_helpers.R is the zarr analog of read_h5ad_helpers.R
  • write_zarr_helpers.R is the zarr analog of write_h5ad_helpers.R
  • Test fixtures within inst/extdata/example.zarr (this makes the diff noisy, apologies)
  • Lots of tests:
    • test-Zarr-read.R (35 new tests)
    • test-Zarr-write.R (70)
    • test-ZarrAnnData.R (26)
    • test-h5ad-zarr.R (17)

Several of these functions currently emit warnings in the R console. The warnings are meant as reminders of code to follow up on, and they should probably be resolved since end users may not appreciate them. The tests pass despite these warnings.

Known things that are not implemented here:

  • support for recarrays
  • usage of mode = c("r", "r+", "a", "w", "w-", "x") parameter value
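
For reference, these mode strings conventionally carry the meanings documented for h5py and zarr-python file modes. A minimal sketch of that convention follows, assuming a hypothetical helper named `resolve_mode` that is not part of this PR or of any Zarr library:

```python
# Hypothetical helper illustrating the usual meaning of each mode string.
# It is not part of anndataR, zarr-python, or h5py.
MODES = ("r", "r+", "a", "w", "w-", "x")


def resolve_mode(mode: str, exists: bool) -> dict:
    """Describe what opening a store with `mode` would do.

    `exists` says whether the store is already present on disk.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode in ("r", "r+"):
        # Read-only or read/write; the store must already exist.
        if not exists:
            raise FileNotFoundError(f"mode {mode!r} requires an existing store")
        return {"writable": mode == "r+", "create": False, "truncate": False}
    if mode == "a":
        # Read/write; create the store only if it is missing.
        return {"writable": True, "create": not exists, "truncate": False}
    if mode == "w":
        # Always start from an empty store, overwriting any existing one.
        return {"writable": True, "create": True, "truncate": exists}
    # "w-" and "x": create a new store, refusing to touch an existing one.
    if exists:
        raise FileExistsError(f"mode {mode!r} refuses to overwrite a store")
    return {"writable": True, "create": True, "truncate": False}
```

Calling `resolve_mode("a", exists=False)` reports that the store would be created, while `resolve_mode("x", exists=True)` raises `FileExistsError`.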

Member

rcannood left a comment

Fantastic work @keller-mark and @Artur-man !

I went through the PR for the first time and left some minor comments. Next, I will review the code by running it a couple of times :)

attrs <- g$get_attrs()$to_list()

if (!all(c("encoding-type", "encoding-version") %in% names(attrs))) {
path <- name
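
The check in the hunk above follows the AnnData on-disk format, in which each stored element declares `encoding-type` and `encoding-version` attributes. A rough Python analog is sketched here, assuming a Zarr v2 directory store where a group's attributes live in a `.zattrs` JSON file; the helper name `has_encoding_attrs` is hypothetical:

```python
import json
from pathlib import Path

# Attribute keys that the AnnData on-disk format expects on each element.
REQUIRED_KEYS = ("encoding-type", "encoding-version")


def has_encoding_attrs(group_dir) -> bool:
    """Return True if the group's .zattrs file declares both encoding keys.

    Assumes a Zarr v2 directory store; a missing .zattrs counts as False.
    """
    zattrs = Path(group_dir) / ".zattrs"
    if not zattrs.is_file():
        return False
    attrs = json.loads(zattrs.read_text())
    return all(key in attrs for key in REQUIRED_KEYS)
```

A reader could use such a check to decide whether to dispatch on the declared encoding or fall back to a legacy code path.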

There are a lot of linting issues in this file -- could you run lintr::lint_package() and fix any issues that pop up?

done ... did a full lint_package check and corrected some R check issues too!

This file should probably be removed

done

Artur-man commented Apr 12, 2025

I quickly checked whether the pizzarr utilities could be replaced with https://github.com/grimbough/Rarr. Unfortunately, there is a set of limitations to the BioC-native package, which

I will stay in touch to see whether these are resolved in the future. For now, no Zarr R package is both on CRAN/BioC and functionally complete.

Artur-man commented Nov 10, 2025

There is some progress in the Rarr package, would you guys like a clean PR (since there were so many updates since) or continuing here is fine ?

Collaborator

lazappi commented Nov 11, 2025

> There is some progress in the Rarr package, would you guys like a clean PR (since there were so many updates since) or continuing here is fine ?

Probably whatever is easiest for you and @keller-mark/whatever makes the PR easiest to understand. There have been a lot of changes to the package since this was opened so we would need to make sure those are included here.

I saw that {Rarr} is planning to have Zarr v3 support for the next release so I think that makes sense in terms of which backend package to use.

@Artur-man

I adapted the changes; it was fairly easy. I will continue here and refer back to Hugo again if needed.

lazappi removed this from the 1.1.0 milestone on Nov 20, 2025

Collaborator

lazappi commented Nov 20, 2025

I'm going to make this a draft for now, just to help with our organisation. Please let us know when it's ready to review.

lazappi marked this pull request as draft on November 20, 2025, 11:51

Artur-man commented Nov 24, 2025

Guys, I am close... I will add a new example.zarr here. However, let's add a yml file so that environments can be created quickly and the same files can be regenerated later. Also, would you like a separate example_zarr.py file?

# python v3.13.5
import anndata # anndata v0.11.4
import scanpy # scanpy v1.11.4
import numpy # numpy v2.2.6
import pandas # pandas v2.3.0
import scipy.sparse # scipy v1.14.1

Collaborator

lazappi commented Nov 25, 2025

We tried to cover as many cases as possible in the example dataset so I would probably modify the script to also output a (tarred/zipped) Zarr as well as an H5AD (and maybe rename it to something like create_example.py) rather than creating a new one.
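
A minimal sketch of the archiving step, assuming the script has already written a Zarr directory store next to the H5AD file. The function name `archive_zarr` and any paths are hypothetical, and only the Python standard library is used:

```python
import shutil
from pathlib import Path


def archive_zarr(zarr_dir, fmt: str = "zip") -> Path:
    """Pack a Zarr directory store into a single archive beside it.

    `fmt` is any format accepted by shutil.make_archive, e.g. "zip" or "gztar".
    The store's own directory name is kept as the top-level entry.
    """
    zarr_dir = Path(zarr_dir).resolve()
    archive = shutil.make_archive(
        base_name=str(zarr_dir),        # the archive extension is appended to this
        format=fmt,
        root_dir=str(zarr_dir.parent),  # archive paths are relative to this directory
        base_dir=zarr_dir.name,         # only the store itself is packed
    )
    return Path(archive)
```

With `fmt="zip"`, a store at `example.zarr` would be packed into `example.zarr.zip` in the same parent directory.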

I noted down the versions just in case it became important later but maybe it makes sense to store them in a separate file. If you are regenerating the dataset I would probably also update the environment. I think there is also a version number in the script that should be bumped and a changelog that should be updated.
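
One possible way to keep the versions in a separate file is to record them at generation time. The sketch below assumes hypothetical names (`collect_versions`, `write_versions`) and a JSON output format, neither of which is prescribed in this thread:

```python
import json
import sys
from importlib import metadata
from pathlib import Path


def collect_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def write_versions(path, packages):
    """Write the collected versions to `path` as indented JSON."""
    versions = collect_versions(packages)
    Path(path).write_text(json.dumps(versions, indent=2) + "\n")
    return versions
```

The generation script could then call `write_versions("versions.json", ["anndata", "scanpy", "numpy", "pandas", "scipy"])`, where the file name is again only an assumption.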

Artur-man commented Nov 25, 2025

I have tested this on a couple of anndata examples as well as on test datasets from https://github.com/HelenaLC/SpatialData and https://github.com/HelenaLC/SpatialData.data, and it seems to be working.

We are currently waiting for structured data and scalar support from Rarr:

Development

Successfully merging this pull request may close these issues.

Implement Zarr backend

4 participants