

@ilan-gold (Collaborator) commented Nov 11, 2025

This PR does a few things to minimize memory usage from obs:

  1. Delays reading the full obs into memory until we are ready to write.
  2. Tracks categoricals across datasets so we don't lose those dtypes, and adds tests that categories are maintained across shards.
  3. Because we end up instantiating AnnData objects along the way, whether via anndata.concat or via io, I think we should make remove_unused_categories=False the default in the library where possible. This matters especially for the target application, deep learning, where you really want the categories and their codes to be correct. I've added a note to the README about this.
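Items 2 and 3 can be illustrated with plain pandas. This is a minimal sketch, not annbatch's implementation: unified_categories and align_categories are hypothetical helpers, and the two small DataFrames stand in for per-dataset obs. It shows that concatenating categoricals with mismatched categories loses the categorical dtype, that taking the union of categories first keeps both the dtype and one consistent code mapping, and that removing unused categories renumbers the codes.

```python
import pandas as pd
from pandas.api.types import union_categoricals


def unified_categories(shards: list[pd.DataFrame], column: str) -> pd.Index:
    """Hypothetical helper: union of `column`'s categories across all shards.

    union_categoricals unions the *categories* (not only the observed values),
    so unused categories are kept, in first-seen order.
    """
    return union_categoricals([shard[column] for shard in shards]).categories


def align_categories(shards: list[pd.DataFrame], column: str) -> list[pd.DataFrame]:
    """Hypothetical helper: give every shard the same categorical dtype for `column`."""
    categories = unified_categories(shards, column)
    # set_categories keeps each value and remaps its code onto the shared category list.
    return [
        shard.assign(**{column: shard[column].cat.set_categories(categories)})
        for shard in shards
    ]


# Two toy "shards" of obs; shard_a carries an unused category ("NK").
shard_a = pd.DataFrame(
    {"cell_type": pd.Categorical(["T", "B"], categories=["B", "NK", "T"])}
)
shard_b = pd.DataFrame({"cell_type": pd.Categorical(["Mono", "T"])})

# Naive concatenation: mismatched categories fall back to a non-categorical dtype.
naive = pd.concat([shard_a, shard_b], ignore_index=True)
print(naive["cell_type"].dtype)

# Aligned concatenation: the categorical dtype and a single code mapping survive.
combined = pd.concat(align_categories([shard_a, shard_b], "cell_type"), ignore_index=True)
print(list(combined["cell_type"].cat.categories))  # ['B', 'NK', 'T', 'Mono']
print(combined["cell_type"].cat.codes.tolist())  # [2, 0, 3, 2]

# Removing unused categories renumbers codes: "T" moves from code 2 to code 1.
trimmed = shard_a["cell_type"].cat.remove_unused_categories()
print(list(trimmed.cat.categories), trimmed.cat.codes.tolist())  # ['B', 'T'] [1, 0]
```

Taking the union before concatenating is what keeps the code-to-category mapping identical across shards; trimming unused categories on any one shard afterwards would silently shift its codes, which is the concern behind item 3.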

h5 performance is still relatively poor memory-wise because string arrays have to be read into memory (this comes from anndata; the alternative, wrapping them as dask arrays, is somehow even slower).

codecov bot commented Nov 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.24%. Comparing base (18c4e5a) to head (0560f16).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
+ Coverage   90.39%   91.24%   +0.84%     
==========================================
  Files           8        8              
  Lines         656      674      +18     
==========================================
+ Hits          593      615      +22     
+ Misses         63       59       -4     
| Files with missing lines | Coverage Δ |
|---|---|
| src/annbatch/io.py | 94.79% <100.00%> (+2.83%) ⬆️ |

@ilan-gold ilan-gold marked this pull request as ready for review November 13, 2025 11:57
@ilan-gold ilan-gold requested a review from felix0097 November 13, 2025 12:00
