Off-by-one error in postprocess pipeline #58

### Problem
In the postprocess pipeline script for making negatives, we attempt to sort the data by the miRNA family column, but we combine `nl -v 0` with `sort -k`. `nl -v 0` starts numbering at 0, whereas `sort -k` counts fields from 1, so the column selected for sorting is off by one. As a result, the data is sorted by miRNA name instead of miRNA family. This bug is present up to https://github.com/BioGeMT/miRBench_paper/releases/tag/v1.0.0.
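The mismatch can be reproduced on a toy table. This is only a minimal sketch: the column names (`gene`, `mirna_name`, `mirna_fam`), the data, and the `tr`/`awk` header lookup are assumptions made for illustration, not the pipeline's actual code.

```shell
# Fix the locale so the sort order is deterministic.
export LC_ALL=C
tab=$(printf '\t')

# Hypothetical header and rows; the real pipeline's columns may differ.
header=$(printf 'gene\tmirna_name\tmirna_fam')
rows=$(printf 'g1\tmiR-b\tfam1\ng2\tmiR-a\tfam2\ng3\tmiR-c\tfam1')

# Number the header fields, starting at 0 and at 1, and look up the family column.
idx0=$(printf '%s\n' "$header" | tr '\t' '\n' | nl -v 0 | awk '$2 == "mirna_fam" {print $1}')
idx1=$(printf '%s\n' "$header" | tr '\t' '\n' | nl -v 1 | awk '$2 == "mirna_fam" {print $1}')

# sort -k counts fields from 1, so the 0-based index selects the
# column to the left of the intended one (mirna_name, not mirna_fam).
sorted_by_idx0=$(printf '%s\n' "$rows" | sort -t "$tab" -k "${idx0},${idx0}")
sorted_by_idx1=$(printf '%s\n' "$rows" | sort -t "$tab" -k "${idx1},${idx1}")

echo "index from nl -v 0: $idx0"
printf '%s\n' "$sorted_by_idx0"
echo "index from nl -v 1: $idx1"
printf '%s\n' "$sorted_by_idx1"
```

With the 0-based index the rows come out ordered by `mirna_name`; with the 1-based index they are ordered by `mirna_fam`.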

### Consequences
Since the data is sorted by miRNA name and the make_neg_sets.py script processes the data one miRNA family block at a time, the same miRNA family may be processed more than once. Additionally, since genes are blacklisted based on the cluster IDs within a specific block (now a miRNA name block rather than a miRNA family block), genes assigned to other miRNAs from the _same_ miRNA family are not blacklisted and are pooled as candidate genes to be sampled from.

However, as a sanity check, the `miraw_analysis` was carried out on the `gene` column, and the evaluation metric was no better than random (APS = 0.50; see the miRBench publication), suggesting no notable effect.
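The repeated processing of a family can be illustrated by counting consecutive runs of the family column under each sort order. This sketch assumes that a "block" is a run of adjacent rows sharing the same family value; the column layout and data are hypothetical, and make_neg_sets.py itself is not reproduced here.

```shell
# Fix the locale so the sort order is deterministic.
export LC_ALL=C
tab=$(printf '\t')

# Hypothetical columns: gene, mirna_name, mirna_fam.
rows=$(printf 'g1\tmiR-a\tfam1\ng2\tmiR-b\tfam2\ng3\tmiR-c\tfam1')

# uniq collapses adjacent duplicates, so each output line is one block
# of consecutive rows with the same family.
blocks_by_name=$(printf '%s\n' "$rows" | sort -t "$tab" -k 2,2 | cut -f 3 | uniq | wc -l | tr -d ' ')
blocks_by_fam=$(printf '%s\n' "$rows" | sort -t "$tab" -k 3,3 | cut -f 3 | uniq | wc -l | tr -d ' ')

echo "family blocks when sorted by miRNA name:   $blocks_by_name"
echo "family blocks when sorted by miRNA family: $blocks_by_fam"
```

Sorting by name splits `fam1` into two separate blocks in this toy data, whereas sorting by family keeps each family in a single block.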

### Solution
In future versions, `nl -v 0` should be changed to `nl -v 1` so that the column index matches the 1-based field numbering expected by `sort -k`.



