Conversation

@jarbesfeld
Contributor

closes #438

@jarbesfeld jarbesfeld self-assigned this Oct 8, 2025
@jarbesfeld jarbesfeld requested a review from a team as a code owner October 8, 2025 13:39
@jarbesfeld jarbesfeld added the bug, enhancement, and priority:medium labels Oct 8, 2025
@zealws
Contributor

zealws commented Oct 8, 2025

This change does not address the inconsistency in the return values of UtaDatabase.get_transcripts, but by removing that call, this PR does fix the inconsistency in the transcript from fusor's point of view.

I re-ran the test I used to identify #435 and it now produces consistent results:

» rm -f transcripts.txt ; for i in $(seq 50) ; do fusor-annotator.py . ; jq -r '.fusor.structure[0].transcript' arriba_fusor.jsonl >> transcripts.txt ; done
...

» cat transcripts.txt | sort | uniq -c
  50 refseq:NM_001320454.2

I still have concerns about the behavior of UtaDatabase.get_transcripts. I think that function should probably be modified to produce more consistent results, but with the change from this PR, I don't think that fix is as high-priority as it was.
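
One way a lookup like this could be made order-independent is to rank candidates with an explicit tie-breaker instead of relying on the order in which rows come back. The sketch below is illustrative only: `TranscriptHit`, `pick_longest`, and the accessions are hypothetical, and it assumes the inconsistency stems from result ordering, which this thread does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptHit:
    """Hypothetical record; not the real shape of a UTA result row."""
    accession: str
    length: int


def pick_longest(hits):
    """Return the longest hit, breaking ties by accession.

    Because the sort key is total (length, then accession), the result
    does not depend on the order of `hits`.
    """
    if not hits:
        return None
    return min(hits, key=lambda h: (-h.length, h.accession))


# Made-up candidates: two share the maximum length on purpose.
hits = [
    TranscriptHit("NM_HYPOTHETICAL_A.1", 1200),
    TranscriptHit("NM_HYPOTHETICAL_B.1", 3400),
    TranscriptHit("NM_HYPOTHETICAL_C.1", 3400),
]

print(pick_longest(hits).accession)
print(pick_longest(list(reversed(hits))).accession)
```

Both calls select the same record regardless of input order, which is the property the shell loop above checks from the outside.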

@zealws
Contributor

zealws commented Oct 8, 2025

@jarbesfeld this isn't super pressing, but it'd be nice to have a unit test that fails before this change and passes after.

Maybe you could write a test that calls ExonGenomicCoordsMapper.genomic_to_tx_segment to confirm the longest transcript (NM_001320454.2) is selected for this gene MIR9-1HG (chromosome 1, position 156421555)? It'd probably have to be run several times to confirm the failure since the current behavior is non-deterministic, but it should pass consistently after your change.
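
A minimal sketch of the "run it several times" idea, written against stand-in functions rather than the real mapper. `assert_deterministic`, `flaky_pick`, `stable_pick`, and the `CANDIDATES` table are all hypothetical; in a real test, the callable would wrap the `ExonGenomicCoordsMapper.genomic_to_tx_segment` call, whose arguments are not shown in this thread.

```python
import random
from collections import Counter

# Hypothetical transcript names mapped to made-up lengths.
CANDIDATES = {"tx_short": 900, "tx_mid": 2100, "tx_long": 3400}


def assert_deterministic(fn, expected, runs=20):
    """Call fn `runs` times and require `expected` every single time."""
    results = Counter(fn() for _ in range(runs))
    assert results == Counter({expected: runs}), f"inconsistent results: {dict(results)}"


def flaky_pick(rng):
    """Stand-in for the old behavior: any candidate may be returned."""
    return rng.choice(sorted(CANDIDATES))


def stable_pick():
    """Stand-in for the fixed behavior: always the longest candidate."""
    return max(CANDIDATES, key=lambda name: (CANDIDATES[name], name))


# Passes: the stable picker returns the longest candidate on every call.
assert_deterministic(stable_pick, "tx_long")

# Would raise AssertionError: the flaky picker returns different candidates.
# rng = random.Random(0)
# assert_deterministic(lambda: flaky_pick(rng), "tx_long", runs=200)
```

Repeating the call inside a single test trades a little runtime for a much lower chance that a non-deterministic regression passes by luck.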

@jarbesfeld
Contributor Author

jarbesfeld commented Oct 8, 2025

@zealws I added the MIR9-1HG example as a test case. I wasn't sure whether the test should repeat the call multiple times, so please let me know if you have any feedback.

@zealws
Contributor

zealws commented Oct 8, 2025

Awesome!

As suspected, that test fails randomly when run without the associated change to genomic_to_tx_segment.

~/dev/cool-seq-tool » git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   tests/mappers/test_exon_genomic_coords.py

~/dev/cool-seq-tool » for x in $(seq 10) ; do pytest tests/mappers/test_exon_genomic_coords.py::test_genomic_to_transcript_fusion_context | grep '^tests/mappers/test_exon_genomic_coords.py' ; done
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError
tests/mappers/test_exon_genomic_coords.py F                              [100%]
tests/mappers/test_exon_genomic_coords.py:1082: AssertionError

1 pass, 9 failures. The transcript returned in the failing runs also usually differs from one run to the next.
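
The same tally can be scripted instead of read off the output by eye. This is a rough sketch, not part of the PR: `tally_runs` is a hypothetical helper that relies only on the process exit code (pytest exits with 0 when all selected tests pass and non-zero otherwise).

```python
import subprocess
import sys
from collections import Counter


def tally_runs(cmd, runs=10):
    """Run `cmd` repeatedly and count passes (exit code 0) and failures."""
    outcomes = Counter()
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
        outcomes["pass" if proc.returncode == 0 else "fail"] += 1
    return outcomes


if __name__ == "__main__":
    # Test ID taken from the shell loop above; run from the repository root.
    test_id = (
        "tests/mappers/test_exon_genomic_coords.py"
        "::test_genomic_to_transcript_fusion_context"
    )
    print(dict(tally_runs([sys.executable, "-m", "pytest", "-q", test_id])))
```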

@zealws
Contributor

zealws commented Oct 8, 2025

For comparison, the same loop run against the test-lc branch passes 10/10:

~/dev/cool-seq-tool » for x in $(seq 10) ; do pytest tests/mappers/test_exon_genomic_coords.py::test_genomic_to_transcript_fusion_context | grep '^tests/mappers/test_exon_genomic_coords.py' ; done
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]
tests/mappers/test_exon_genomic_coords.py .                              [100%]

@jarbesfeld jarbesfeld merged commit 6c8cbad into main Oct 13, 2025
18 checks passed
@jarbesfeld jarbesfeld deleted the test-lc branch October 13, 2025 12:24

Development

Successfully merging this pull request may close these issues.

Use get_longest_compatible_transcript in _genomic_to_tx_segment

4 participants