
Add depth command for computing coverage depth across all sequences #125

Open
unavailable-2374 wants to merge 8 commits into pangenome:main from unavailable-2374:pull-123

Conversation

@unavailable-2374

The depth command iterates through all sequences as references, calculates coverage depth (unique sample count) for each position using a sweep-line algorithm, and outputs a TSV table with window_id, depth, and sample columns.
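As a rough illustration of the sweep-line idea described above, the sketch below counts distinct samples per position from half-open intervals. It is not the PR's implementation: `SampleInterval` and `sweep_unique_depth` are hypothetical names, and the real code may represent events and samples differently.

```rust
use std::collections::HashMap;

/// One aligned interval on the reference, half-open [start, end),
/// tagged with a sample index (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy)]
pub struct SampleInterval {
    pub sample: u16,
    pub start: i64,
    pub end: i64,
}

/// Sweep over interval boundaries and return (start, end, depth) runs,
/// where depth is the number of distinct samples covering the run.
pub fn sweep_unique_depth(intervals: &[SampleInterval]) -> Vec<(i64, i64, usize)> {
    // Event tuple: (position, delta, sample). Tuple ordering sorts by
    // position first; at equal positions, ends (-1) come before starts (+1).
    let mut events: Vec<(i64, i8, u16)> = Vec::with_capacity(intervals.len() * 2);
    for iv in intervals {
        if iv.start < iv.end {
            events.push((iv.start, 1, iv.sample));
            events.push((iv.end, -1, iv.sample));
        }
    }
    events.sort_unstable();

    // Per-sample count of currently open intervals; the number of keys
    // is the unique-sample depth.
    let mut active: HashMap<u16, usize> = HashMap::new();
    let mut out: Vec<(i64, i64, usize)> = Vec::new();
    let mut prev = 0i64;
    let mut i = 0;
    while i < events.len() {
        let pos = events[i].0;
        // Emit the run [prev, pos) if anything covers it.
        if !active.is_empty() && prev < pos {
            let depth = active.len();
            let extend = matches!(out.last(), Some(&(_, e, d)) if e == prev && d == depth);
            if extend {
                out.last_mut().unwrap().1 = pos;
            } else {
                out.push((prev, pos, depth));
            }
        }
        // Apply every event located at this position.
        while i < events.len() && events[i].0 == pos {
            let (_, delta, sample) = events[i];
            if delta > 0 {
                *active.entry(sample).or_insert(0) += 1;
            } else if let Some(count) = active.get_mut(&sample) {
                *count -= 1;
                if *count == 0 {
                    active.remove(&sample);
                }
            }
            i += 1;
        }
        prev = pos;
    }
    out
}
```

Two overlapping intervals from the same sample contribute a depth of 1, not 2, because the depth is the number of keys in `active` rather than the number of open intervals.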

Key features:

  • Two-phase processing: parallel overlap queries + sequential deduplication
  • Union approach for A-B/B-A alignment asymmetry (merges all intervals per sample)
  • Groups by PanSN sample name (sample#haplotype#chr -> sample)
  • Supports transitive queries, custom reference ordering, and adjacent window merging
  • Each position is output only once via ProcessedRegions tracking
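The PanSN grouping and per-sample union from the list above could look roughly like the following. This is a minimal sketch under stated assumptions: `pansn_sample`, `merge_intervals`, and `union_by_sample` are hypothetical helpers, intervals are half-open, and a name without `#` is treated as its own sample.

```rust
use std::collections::BTreeMap;

/// Extract the sample from a PanSN-style name `sample#haplotype#chr`.
/// A name without '#' is returned unchanged (assumption of this sketch).
pub fn pansn_sample(name: &str) -> &str {
    name.split('#').next().unwrap_or(name)
}

/// Merge overlapping or touching half-open intervals into a sorted union.
pub fn merge_intervals(mut ivs: Vec<(i64, i64)>) -> Vec<(i64, i64)> {
    ivs.sort_unstable();
    let mut merged: Vec<(i64, i64)> = Vec::new();
    for (s, e) in ivs {
        if let Some(last) = merged.last_mut() {
            if s <= last.1 {
                last.1 = last.1.max(e);
                continue;
            }
        }
        merged.push((s, e));
    }
    merged
}

/// Group hits by PanSN sample and take the union of each sample's intervals,
/// so A-B and B-A alignments of the same pair collapse into one footprint.
pub fn union_by_sample<'a>(
    hits: &[(&'a str, i64, i64)],
) -> BTreeMap<&'a str, Vec<(i64, i64)>> {
    let mut grouped: BTreeMap<&str, Vec<(i64, i64)>> = BTreeMap::new();
    for &(name, s, e) in hits {
        grouped.entry(pansn_sample(name)).or_default().push((s, e));
    }
    grouped
        .into_iter()
        .map(|(sample, ivs)| (sample, merge_intervals(ivs)))
        .collect()
}
```

Because both haplotypes of a sample map to the same key, a position covered by `HG002#1#chr1` and `HG002#2#chr1` counts once toward depth.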

unavailable-2374 and others added 3 commits February 4, 2026 19:04
The depth command iterates through all sequences as references, calculates
coverage depth (unique sample count) for each position using a sweep-line
algorithm, and outputs a TSV table with window_id, depth, and sample columns.

Key features:
- Two-phase processing: parallel overlap queries + sequential deduplication
- Union approach for A-B/B-A alignment asymmetry (merges all intervals per sample)
- Groups by PanSN sample name (sample#haplotype#chr -> sample)
- Supports transitive queries, custom reference ordering, and adjacent window merging
- Each position is output only once via ProcessedRegions tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add the new subset_filter parameter (as None) to query_transitive_dfs
and query_transitive_bfs calls to match updated method signatures.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ptimization

Major enhancements to the depth command:
- Add --stats mode for global depth statistics across all sequences
- Add --combined-output for merged BED output with sample lists
- Add sample filtering via --samples and --samples-file options
- Add --memory-efficient mode using compressed bitmaps (~50x less memory)
- Add --fai-list for filling uncovered regions with depth=1
- Add --ref option for targeted mode (single reference sample)
- Add --merge-tolerance for combining adjacent intervals
- Add region query support via -r/--target-range and -b/--target-bed

Technical changes:
- Implement SampleFilter for flexible sample inclusion/exclusion
- Add parallel computation with configurable thread pool
- Optimize memory usage for TB-scale datasets
- Support both per-sample and combined output modes
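To make the inclusion/exclusion idea concrete, here is a small sketch of a sample filter. It is purely illustrative: `SampleSelection` and its variants are hypothetical and are not the PR's `SampleFilter`, whose actual fields and CLI wiring are not shown in this conversation.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a sample filter: keep everything, keep only a
/// listed set, or drop a listed set.
pub enum SampleSelection {
    All,
    Include(HashSet<String>),
    Exclude(HashSet<String>),
}

impl SampleSelection {
    /// Build an inclusion filter from sample names, e.g. parsed from a
    /// comma-separated option value or from the lines of a file.
    pub fn include<I, S>(names: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        SampleSelection::Include(names.into_iter().map(Into::into).collect())
    }

    /// Return true if intervals from `sample` should contribute to depth.
    pub fn keeps(&self, sample: &str) -> bool {
        match self {
            SampleSelection::All => true,
            SampleSelection::Include(set) => set.contains(sample),
            SampleSelection::Exclude(set) => !set.contains(sample),
        }
    }
}
```

Applying the filter before the sweep keeps filtered samples out of both the depth count and the sample list.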

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
unavailable-2374 and others added 5 commits February 8, 2026 13:24
Major improvements to depth command:

1. New --windowed mode (compute_depth_windowed_v2):
   - Parallel batch processing of sequences (100 seqs/batch)
   - Sparse sample storage (Vec<(u16,u32,i64,i64)> vs Vec<Option>)
   - Streaming output per sequence (reduced memory)
   - Numeric sorting by seq_id instead of string comparison

2. Output format changes:
   - Column 1: ID (row number) instead of seq_name
   - Anchor sample column shows anchor sequence coordinates

3. Self-alignment filtering:
   - Default: filter out same-sample transitive alignments
   - Use --include-self-alignments to include intra-genome duplications

4. Merge tolerance for windowed mode:
   - --merge-tolerance now works with --windowed (default: 0.05)
   - Adjacent intervals with depth diff <= tolerance are merged
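The merge step in item 4 can be sketched as below. The commit message does not say how the depth difference is measured; since the default is 0.05, this sketch assumes a relative difference against the larger of the two depths. `merge_adjacent` is a hypothetical name.

```rust
/// Merge adjacent windows (start, end, depth) whose depths differ by at most
/// `tolerance`, measured relative to the larger depth. The relative metric is
/// an assumption of this sketch, not something the PR states.
pub fn merge_adjacent(windows: &[(i64, i64, u32)], tolerance: f64) -> Vec<(i64, i64, u32)> {
    let mut out: Vec<(i64, i64, u32)> = Vec::new();
    for &(start, end, depth) in windows {
        if let Some(last) = out.last_mut() {
            let hi = last.2.max(depth) as f64;
            let diff = (last.2 as f64 - depth as f64).abs();
            let close = hi == 0.0 || diff / hi <= tolerance;
            // Only merge windows that touch; a gap always starts a new run.
            if last.1 == start && close {
                last.1 = end; // the run keeps the depth of its first window
                continue;
            }
        }
        out.push((start, end, depth));
    }
    out
}
```

Comparing each window against the first depth of the current run, rather than the previous window, prevents a slow drift in depth from being merged into one long interval.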

Performance: ~10x memory reduction, full CPU utilization with -t flag

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Major improvements to depth computation correctness and performance:

- Add jemalloc allocator to fix glibc malloc fragmentation with 128 threads
- Add query_raw_overlapping() for O(n+k) range queries instead of full interval loading
- Add clear_sub_index_cache() to bound memory for per-file indexing mode
- Two-phase processing: hub sequences (Phase 1) complete before leaves (Phase 2)
  - With --ref: ref sequences are Phase 1
  - Without --ref: auto-detect hubs via alignment degree pre-scan
- Phase 1 uses chunk-level parallelism (5MB chunks) for transitive mode
- Degree-based sorting ensures high-connectivity sequences always anchor first
- Fixes incorrect depth in star topologies when hub wasn't processed first
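A minimal sketch of the degree-based ordering described above, assuming alignments are available as (query id, target id) pairs. The function names are hypothetical and this is not the PR's `compute_alignment_degrees()`; it only illustrates why a hub in a star topology ends up first.

```rust
use std::collections::HashMap;

/// Count, for each sequence id, how many alignment records touch it.
pub fn alignment_degrees(pairs: &[(u32, u32)]) -> HashMap<u32, usize> {
    let mut deg = HashMap::new();
    for &(q, t) in pairs {
        *deg.entry(q).or_insert(0) += 1;
        if q != t {
            *deg.entry(t).or_insert(0) += 1;
        }
    }
    deg
}

/// Order sequence ids so that high-degree sequences (hubs) come first.
/// Ties are broken by id to keep the order deterministic.
pub fn hub_first_order(pairs: &[(u32, u32)]) -> Vec<u32> {
    let deg = alignment_degrees(pairs);
    let mut ids: Vec<u32> = deg.keys().copied().collect();
    ids.sort_unstable_by(|a, b| deg[b].cmp(&deg[a]).then(a.cmp(b)));
    ids
}
```

Processing in this order means the hub anchors its regions before any leaf can claim them with only a partial view of the alignments.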

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Restructure help headings: "Mode selection" for --ref/--ref-only/-r/-b,
  "Statistics add-on" for --stats/--combined-output
- Update CLI descriptions to reflect 3 modes: ref-anchored, ref-only, region query
- Update log messages: "Ref-anchored mode" / "Ref-only mode" / "region query mode"
- Update compute_depth_global doc comments to match hub-first design

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Default transitive depth now uses raw-interval BFS with linear
  interpolation instead of CIGAR BFS, avoiding chunk boundary gaps
  and improving performance. CIGAR BFS retained via --use-BFS flag.
- Remove --approximate flag (redundant with raw BFS default).
- Fix depth counting: depth now consistently counts total unique
  samples covering each position (including the anchor sample).
  Previously anchor was excluded from aligned regions but included
  in gap regions, creating an off-by-one inconsistency.
- sweep_line_depth() is now a pure sweep-line with no anchor
  special-casing; callers add a synthetic anchor alignment.
- Fix 200+GB RAM during pre-scan by disabling tree cache before
  compute_alignment_degrees() and clearing sub-index cache after.
- Fix BFS exploration: explore with full alignment extents rather than
  linearly interpolated clipped coordinates, so alignments at hop
  boundaries are not missed.
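For readers unfamiliar with raw-interval projection, the sketch below shows linear interpolation of a clipped query range onto the target without consulting the CIGAR. It assumes forward-strand, half-open coordinates; `RawAlignment` and `project` are hypothetical names, not identifiers from this PR.

```rust
/// An alignment reduced to its raw extents (no CIGAR), half-open,
/// forward strand assumed for simplicity.
pub struct RawAlignment {
    pub q_start: i64,
    pub q_end: i64,
    pub t_start: i64,
    pub t_end: i64,
}

/// Clip [start, end) to the query extent of `aln`, then project the clipped
/// range onto the target by linear interpolation. Returns None when the
/// range does not overlap the alignment.
pub fn project(aln: &RawAlignment, start: i64, end: i64) -> Option<(i64, i64)> {
    let s = start.max(aln.q_start);
    let e = end.min(aln.q_end);
    if s >= e {
        return None;
    }
    // s < e within [q_start, q_end] guarantees a positive query length.
    let q_len = (aln.q_end - aln.q_start) as f64;
    let t_len = (aln.t_end - aln.t_start) as f64;
    let map = |q: i64| {
        let frac = (q - aln.q_start) as f64 / q_len;
        aln.t_start + (frac * t_len).round() as i64
    };
    Some((map(s), map(e)))
}
```

Interpolation is approximate wherever indels are unevenly distributed along the alignment, which is the trade-off against walking the CIGAR.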

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>