Skip to content

Conversation

@stephenwilliams22
Copy link
Contributor

@stephenwilliams22 stephenwilliams22 commented Dec 10, 2024

This PR adds support for cellranger 9.0 which performs automatic cell type annotation. Creates a simple loading function similar to Load10x_spatial for 10x single cell data which will also put cell annotations in the meta.data.

Changes to other .Rd files result from the roxygenize() function

decided to let @dcollins15 roxigenize on his end to get formatting and documentation that matches the rest of the repo

@rharao
Copy link
Contributor

rharao commented Dec 11, 2024

Thanks for this! I'm excited to try the automatic cell type calling in Cellranger 9.0.
I'm not part of the developer team, but would you be open to some comments on this addition from a user's perspective?

@stephenwilliams22
Copy link
Contributor Author

Thanks for this! I'm excited to try the automatic cell type calling in Cellranger 9.0. I'm not part of the developer team, but would you be open to some comments on this addition from a user's perspective?

100%. Comments welcome.

Copy link
Contributor

@rharao rharao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate it!

@dcollins15 dcollins15 self-requested a review December 11, 2024 18:40
@stephenwilliams22
Copy link
Contributor Author

@dcollins15 sorry for the little formatting edits. Not really sure where those were introduced. Feel free to fix and roxygenise and push directly to my branch as it seems what I'm going on my side is introducing little diffs.

@stephenwilliams22
Copy link
Contributor Author

@evolvedmicrobe I wanted to give you a heads up on this PR

Copy link
Contributor

@rharao rharao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

)
})

for (i in 1:seq_along(seurat.list)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1:seq_along(seurat.list)

should be seq_along(seurat.list).
Alternatively, you could replace the for loop with:

seurat.list <- lapply(seurat.list, \(x) if (Assays(x) %in% c("Gene Expression", "RNA")) {Add_10X_CellTypes(data.dir, x)} else {x})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did also notice that if Add_10X_CellTypes doesn't find a cell types csv, it returns the original without warning or error, so if you want to rely on that, you could skip the conditional here.

counts <- Read10X_h5(counts.path, ...)

if (to.upper) {
counts <- imap(counts, ~{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: because this formula operates on the counts matrices only and doesn't use the list's names, I think the imap call could be replaced by base::sapply.
Frankly, I'm not sure why this package imports purrr::imap anyway, as I can't find it used anywhere in the codebase, not even in Load10X_Spatial. But that's an issue for another time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally, I like the simplicity and readability of tidy functions. I'll leave it up to the seruat developers to decide if they mind the extra import. I haven't heard any complaints since we wrote this function a few years ago.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really an issue; the choice one way or another doesn't affect me. (Personally, I agree; I always prefer the purrr functions in my own scripts.) I only wanted to mention it for the attention of the dev team.
Thanks for your receptiveness 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to replacing the imap call with a sapply to match the current implementation of Load10X_Spatial.

Frankly, I'm not sure why this package imports purrr::imap anyway, as I can't find it used anywhere in the codebase, not even in Load10X_Spatial. But that's an issue for another time.

This is also a good point, @importFrom purrr imap should be dropped from Load10X_Spatial and as a suggested dependency. Now that you mention it, @importFrom grid rasterGrob could also be dropped from the function. I'll include these tweaks in #9556 👌

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like imap for its readability, but here I agree with @/rharao that because the names aren't used, it doesn't make much of a difference. Not to mention, purrr has since been dropped from package imports.

Copy link
Contributor

@dcollins15 dcollins15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great 🎉 Big thanks to @rharao for the code review 🙌

For these types of loader functions, I think the two main priorities should be:

  1. Backward compatibility
  2. Consistency

You've done a great job emulating the existing Load10X_Spatial but I wonder if the two functions can or should share more logic 🤔

I’d also appreciate a bit more clarity on how Load10X is intended to relate to Read10X. Specifically:

  • What versions of Cell Ranger output are compatible with each?
  • As far as I can tell, Read10X hasn't been updated since just before the release of Cell Ranger v7.1.0.

I suspect that we could be doing a better job of making Load10X_Spatial and LoadXenium more mutually coherent cohesive but I'm not very familiar with the datatypes or output structure to propose specific suggestions.

On a related note, what's the easiest way to access 10x datasets in their "canonical" structure? I typically download datasets from the 10x Portal but end up having to manually configure the data.dir for Seurat. I assume there’s a better method that I’m just ignorant of.

This leads nicely into my other concern: testing. Creating small, representative datasets for testing was one of the more tedious parts of the Visium HD updates. Any thoughts on streamlining this process would be great.


@stephenwilliams22, now that I’ve had a thorough look, I think it would be best to hold off on merging this until after the next release (v5.2.0).

counts <- Read10X_h5(counts.path, ...)

if (to.upper) {
counts <- imap(counts, ~{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to replacing the imap call with a sapply to match the current implementation of Load10X_Spatial.

Frankly, I'm not sure why this package imports purrr::imap anyway, as I can't find it used anywhere in the codebase, not even in Load10X_Spatial. But that's an issue for another time.

This is also a good point, @importFrom purrr imap should be dropped from Load10X_Spatial and as a suggested dependency. Now that you mention it, @importFrom grid rasterGrob could also be dropped from the function. I'll include these tweaks in #9556 👌

@anashen anashen self-requested a review September 11, 2025 18:50
@anashen anashen changed the base branch from develop to main September 11, 2025 19:20
@anashen anashen force-pushed the stephen/cellranger-9.0 branch from 1d575a1 to 7e84ebd Compare September 11, 2025 19:21
Copy link
Member

@anashen anashen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenwilliams22 Looks good! Verified functionality using this data & pre-merge checks pass successfully after rebasing (I also switched the base branch of the PR to main). I'm hoping we can get this in with the next release. Made a couple of comments, would appreciate it if you could take a look at those as well as the remaining threads. Thank you!

counts <- Read10X_h5(counts.path, ...)

if (to.upper) {
counts <- imap(counts, ~{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like imap for its readability, but here I agree with @/rharao that because the names aren't used, it doesn't make much of a difference. Not to mention, purrr has since been dropped from package imports.

@stephenwilliams22
Copy link
Contributor Author

stephenwilliams22 commented Sep 16, 2025

This is looking great 🎉 Big thanks to @rharao for the code review 🙌

For these types of loader functions, I think the two main priorities should be:

  1. Backward compatibility
  2. Consistency

You've done a great job emulating the existing Load10X_Spatial but I wonder if the two functions can or should share more logic 🤔

I’d also appreciate a bit more clarity on how Load10X is intended to relate to Read10X. Specifically:

  • What versions of Cell Ranger output are compatible with each?
  • As far as I can tell, Read10X hasn't been updated since just before the release of Cell Ranger v7.1.0.

I suspect that we could be doing a better job of making Load10X_Spatial and LoadXenium more mutually coherent cohesive but I'm not very familiar with the datatypes or output structure to propose specific suggestions.

On a related note, what's the easiest way to access 10x datasets in their "canonical" structure? I typically download datasets from the 10x Portal but end up having to manually configure the data.dir for Seurat. I assume there’s a better method that I’m just ignorant of.

This leads nicely into my other concern: testing. Creating small, representative datasets for testing was one of the more tedious parts of the Visium HD updates. Any thoughts on streamlining this process would be great.

@stephenwilliams22, now that I’ve had a thorough look, I think it would be best to hold off on merging this until after the next release (v5.2.0).

I agree with most of this. We can definitely take the time to share code between all the 10x functions. Please let me know if you want to do that for this release.

As far as the specific code here

  • What versions of Cell Ranger output are compatible with each?
    • Both functions should be compatible will all versions of cellranger. The differences are
      • Read10x will not put cell types in the metadata
  • Read10X
    • As you mentioned, this is a very old function which doesn't take advantage of the "packaged" h5 matrix. Personally, I think that we might think of an end of life for Read10X. Not that 10x is going to get rid of the filtered_feature_bc_matrix directory any times soon but new features will most certainly be focused on the h5 matrix

With regard to testing datasets we are in talks internally about this. I'm going to try and make each release of the rangers come with a tiny dataset with the correct structure for testing.

Datasets on our public data portal should be in the structure of any normal ranger run. if they are not I'd like to hear about the issue so we can provide instruction or correct things on our side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants