UTA Query for transcript is non-deterministic

### Describe the bug

The query for transcripts against the UTA database is non-deterministic, and returns an arbitrary matching transcript each time it's run.

This is a result of a join between `associated_accessions` and `tx_exon_aln_v` which doesn't fully constrain the rows in the `tx_exon_aln_v` table, resulting in the database choosing one arbitrarily. The fields of this arbitrary row are used to order the results, resulting in a random transcript being selected and returned.

I ran fusor 50 times using a script, and it resulted in a variety of transcripts being selected for the 5' partner (columns are count and transcript ID):
```
   8 refseq:NM_001320453.1
   6 refseq:NM_001320454.1
   4 refseq:NM_001320454.2
  11 refseq:NM_001320455.1
   3 refseq:NM_001320455.2
  14 refseq:NM_006365.2
   4 refseq:NM_006365.3
```

### Steps to reproduce

1. Download [this arriba fusion](https://gist.github.com/zealws/b8c5f598b6ab4894f499f11d2a9e9a97).
2. Use [fusor](https://github.com/cancervariants/fusor) to annotate this fusion.
3. Check the 5' partner's transcript.
4. Rerun fusor, and check the transcript again.

### Expected behavior

The chosen transcript is deterministic and does not change from one run to another.

### Current behavior

The chosen transcript is non-deterministic, and returns an arbitrary matching transcript each time it is run.

### Possible reason(s)

[The query](https://github.com/GenomicMedLab/cool-seq-tool/blob/main/src/cool_seq_tool/sources/uta_database.py#L793-L819) joins the tables `associated_accessions` with `tx_exon_aln_v` on `associated_accessions.tx_ac = tx_exon_aln_v.tx_ac` (this happens transitively through a join on the `transcript` with the same column).
For a given value of `associated_accessions.tx_ac`, there may be several matching `tx_exon_aln_v` rows (example below).

As a result, the join chooses an arbitrary row from `tx_exon_aln_v`. These rows differ in their `tx_start_i` and `tx_end_i`, which is used to order the transcripts, resulting in the transcript order varying each time the query is run.

Example query of matching values of `tx_exon_aln_v` for a given `tx_ac`:

```sql
uta=> select tx_ac, tx_start_i, tx_end_i from tx_exon_aln_v where tx_ac = 'NM_001320453.1';
     tx_ac      | tx_start_i | tx_end_i
----------------+------------+----------
 NM_001320453.1 |          0 |      173
 NM_001320453.1 |        173 |      248
 NM_001320453.1 |        248 |      344
 NM_001320453.1 |        344 |      444
 NM_001320453.1 |        444 |      794
 NM_001320453.1 |          0 |      173
 NM_001320453.1 |        173 |      248
 NM_001320453.1 |        248 |      344
 NM_001320453.1 |        344 |      444
 NM_001320453.1 |        444 |      794
(10 rows)
```

### Suggested fix

Find a way to make the join from `associated_accessions.tx_ac = tx_exon_aln_v.tx_ac` deterministic.

### Branch, commit, and/or version

main

### Screenshots

_No response_

### Environment details

_Not relevant_

### Additional details

_No response_

### Contribution

I can help prepare a PR, but I would like someone with a better understanding of the structure of the UTA database to determine how best to make the join deterministic.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

UTA Query for transcript is non-deterministic #435

Describe the bug

Steps to reproduce

Expected behavior

Current behavior

Possible reason(s)

Suggested fix

Branch, commit, and/or version

Screenshots

Environment details

Additional details

Contribution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

UTA Query for transcript is non-deterministic #435

Description

Describe the bug

Steps to reproduce

Expected behavior

Current behavior

Possible reason(s)

Suggested fix

Branch, commit, and/or version

Screenshots

Environment details

Additional details

Contribution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions