feat!: Use longest compatible remaining transcript #439

jarbesfeld · 2025-10-08T13:39:14Z

closes #438

zealws · 2025-10-08T16:24:59Z

This change does not address the inconsistency in the return values of UtaDatabase.get_transcripts, but by removing that call, this PR does fix the inconsistency in the transcript from fusor's point of view.

I re-ran the test I used to identify #435 and it now produces consistent results:

» rm -f transcripts.txt ; for i in $(seq 50) ; do fusor-annotator.py . ; jq -r '.fusor.structure[0].transcript' arriba_fusor.jsonl >> transcripts.txt ; done
...

» cat transcripts.txt | sort | uniq -c
  50 refseq:NM_001320454.2

I still have concerns about the behavior of UtaDatabase.get_transcripts. I think that function should probably be modified to produce more consistent results, but with the change from this PR, I don't think that fix is as high-priority as it was.
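As an illustration of what "more consistent results" could mean here, a minimal sketch of an order-independent selection follows. `TranscriptCandidate`, `pick_longest`, and the placeholder accessions and lengths are hypothetical names made up for this example; this is not the code in cool-seq-tool or in this PR, only one way to make a "longest transcript" choice independent of the order in which rows come back.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TranscriptCandidate:
    """Hypothetical record for illustration; not the actual UTA row shape."""

    accession: str
    length: int


def pick_longest(candidates: Sequence[TranscriptCandidate]) -> Optional[TranscriptCandidate]:
    """Return the longest candidate, or None if there are no candidates.

    Ties on length are broken by accession string, so the result does not
    depend on the order of the input.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda c: (c.length, c.accession))


# Placeholder data (assumed values, not real transcript lengths).
rows = [
    TranscriptCandidate("TX_A.1", 1200),
    TranscriptCandidate("TX_B.1", 3400),
    TranscriptCandidate("TX_C.1", 3400),
]
chosen = pick_longest(rows)
chosen_reversed = pick_longest(list(reversed(rows)))
```

Sorting on an explicit key with a tie-breaker, rather than taking the first row of an unordered query result, is the design choice that removes the dependence on row order.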

zealws · 2025-10-08T16:37:54Z

@jarbesfeld this isn't super pressing, but it'd be nice to have a unit test that fails before this change and passes after.

Maybe you could write a test that calls ExonGenomicCoordsMapper.genomic_to_tx_segment to confirm that the longest transcript (NM_001320454.2) is selected for the gene MIR9-1HG (chromosome 1, position 156421555)? It would probably have to be run several times to confirm the failure, since the current behavior is non-deterministic, but it should pass consistently after your change.
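A rough sketch of the "run it several times" idea, kept independent of the real API: `assert_deterministic` is a hypothetical helper, and `fn` stands in for whatever zero-argument wrapper ends up calling the mapper and returning the selected transcript. How the mapper is actually invoked is not shown here and would need to follow the existing tests.

```python
from collections import Counter
from typing import Callable, Hashable


def assert_deterministic(fn: Callable[[], Hashable], runs: int = 10) -> Hashable:
    """Call fn `runs` times and fail if it ever returns different values.

    Returns the single observed value so the caller can also compare it
    against the expected transcript.
    """
    counts = Counter(fn() for _ in range(runs))
    assert len(counts) == 1, f"non-deterministic results: {dict(counts)}"
    return next(iter(counts))


# Stand-in callables for illustration only.
stable_value = assert_deterministic(lambda: "always-the-same")

flaky_values = iter(["a", "b", "a"])
try:
    assert_deterministic(lambda: next(flaky_values), runs=3)
    flaky_detected = False
except AssertionError:
    flaky_detected = True
```

Repeating the call inside one test only raises the chance of catching the old behavior; it cannot guarantee a failure on any single run.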

jarbesfeld · 2025-10-08T17:21:45Z

@zealws I added the MIR9-1HG example as a test case. I was unsure whether the test should be repeated multiple times, so please let me know if you have any feedback.

zealws · 2025-10-08T17:28:35Z

Awesome!

As suspected, that test fails randomly when run without the associated change to genomic_to_tx_segment.

~/dev/cool-seq-tool » git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   tests/mappers/test_exon_genomic_coords.py

~/dev/cool-seq-tool » for x in $(seq 10) ; do pytest tests/mappers/test_exon_genomic_coords.py::test_genomic_to_transcript_fusion_context | grep '^tests/mappers/test_exon_genomic_coords.py' ; done
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError

1 pass, 9 failures, and the returned transcript usually differs between the failed runs.
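For anyone repeating this with more runs, the tally can be automated. The sketch below assumes the output has been filtered to lines shaped like the ones above; `tally` and `PROGRESS` are hypothetical names, and the regular expression only covers this one-file progress format.

```python
import re
from collections import Counter
from typing import Iterable

# Matches pytest progress lines such as
# "tests/mappers/test_exon_genomic_coords.py F      [100%]".
# Traceback location lines ("...py:1082: AssertionError") do not match.
PROGRESS = re.compile(r"^\S+\.py\s+([.FEsx]+)\s+\[\s*\d+%\]$")


def tally(lines: Iterable[str]) -> Counter:
    """Count pytest outcome characters ('.', 'F', ...) across progress lines."""
    counts: Counter = Counter()
    for line in lines:
        match = PROGRESS.match(line.strip())
        if match:
            counts.update(match.group(1))
    return counts


# Sample input mirroring the shape of the output above: nine failures, one pass.
sample = []
for outcome in "FFFF.FFFFF":
    sample.append(f"tests/mappers/test_exon_genomic_coords.py {outcome}      [100%]")
    if outcome == "F":
        sample.append("tests/mappers/test_exon_genomic_coords.py:1082: AssertionError")

result = tally(sample)
```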

zealws · 2025-10-08T17:30:30Z

For comparison, the same test passes 10/10 against the test-lc branch:

~/dev/cool-seq-tool » for x in $(seq 10) ; do pytest tests/mappers/test_exon_genomic_coords.py::test_genomic_to_transcript_fusion_context | grep '^tests/mappers/test_exon_genomic_coords.py' ; done
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]

Use longest compatible remaining

9309535

jarbesfeld self-assigned this Oct 8, 2025

jarbesfeld requested a review from a team as a code owner October 8, 2025 13:39

jarbesfeld added the bug, enhancement, and priority:medium labels Oct 8, 2025

Add MIR9-1HG test

6a58add

korikuzma approved these changes Oct 13, 2025

jarbesfeld merged commit 6c8cbad into main Oct 13, 2025
18 checks passed

jarbesfeld deleted the test-lc branch October 13, 2025 12:24

zealws mentioned this pull request Oct 13, 2025

UTA Query for transcript is non-deterministic #435

Open
