Stef/make negs filtered interactions #57
base: main
Member comment: In the post-processing pipeline, we now have two similar modules -
`@@ -0,0 +1,32 @@`

```python
from miRBench.encoder import get_encoder
from miRBench.predictor import get_predictor
import pandas as pd
import argparse


def add_seeds(df):
    """Add one column per miRBench seed type to df (modifies df in place) and return it."""
    seed_types = ["Seed6mer", "Seed6merBulgeOrMismatch"]
    for tool in seed_types:
        # Each seed type has a matching miRBench encoder/predictor pair
        encoder = get_encoder(tool)
        predictor = get_predictor(tool)
        encoded_input = encoder(df)
        output = predictor(encoded_input)
        df[tool] = output
    return df


def main():
    parser = argparse.ArgumentParser(description="Add seed types via miRBench")
    parser.add_argument("--ifile", type=str, required=True, help="Input TSV file with noncodingRNA and gene columns")
    parser.add_argument("--ofile", type=str, required=True, help="Output TSV file with seed types")
    args = parser.parse_args()

    # Read input file
    df = pd.read_csv(args.ifile, sep='\t')

    # Add seed types
    df_seedtypes = add_seeds(df)

    # Write seed types to file
    df_seedtypes.to_csv(args.ofile, sep='\t', index=False)


if __name__ == "__main__":
    main()
```
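The `add_seeds` function delegates the actual seed detection to miRBench. As a rough intuition for what the `Seed6mer` column flags, the sketch below checks whether a target sequence contains the reverse complement of miRNA nucleotides 2-7 (the canonical 6mer seed). This is a simplified illustration, not miRBench's implementation: it assumes the `noncodingRNA` column holds the miRNA 5'→3' and the `gene` column holds the target site in mRNA orientation, and it ignores G:U wobble pairs, bulges and mismatches. `has_seed6mer` and `reverse_complement` are hypothetical helper names.

```python
# Complement table for DNA-alphabet sequences (U is mapped to T before use).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_seed6mer(mirna: str, target: str) -> bool:
    """Return True if target contains a perfect match to miRNA nucleotides 2-7.

    Simplified illustration only: exact Watson-Crick pairing, no G:U wobble,
    no bulges or mismatches (unlike Seed6merBulgeOrMismatch).
    """
    mirna = mirna.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    seed = mirna[1:7]  # nucleotides 2-7 (1-based) of the miRNA
    return reverse_complement(seed) in target


mir21 = "TAGCTTATCAGACTGATGTTGA"
print(has_seed6mer(mir21, "GGGTAAGCTGGG"))  # True
print(has_seed6mer(mir21, "GGGGGGGGGGGG"))  # False
```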
`@@ -0,0 +1,32 @@`

````markdown
# Seed Type Annotation

## Overview

This script annotates each noncodingRNA:gene pair with seed types using the miRBench package.

## Features

- Indicates the presence of Seed6mer and Seed6merBulgeOrMismatch per noncodingRNA:gene pair, using the **miRBench** seed encoders and predictors.

## Requirements

- Python 3.8
- Required Python packages:
  - `miRBench`
  - `pandas`
- `argparse` (part of the Python standard library; no installation needed)

## Usage

Run the script with the following command:

```bash
python add_seed_types.py --ifile <input_file> --ofile <output_file>
```

### Arguments

- `--ifile` (required): Path to input file with `noncodingRNA` and `gene` columns
- `--ofile` (required): Path to output file with added seed types (columns: `Seed6mer`, `Seed6merBulgeOrMismatch`)
````
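Because the miRBench encoders work on the `noncodingRNA` and `gene` columns listed under `--ifile`, it can help to fail early when one of them is missing. A minimal standard-library sketch; `missing_columns` and `REQUIRED_COLUMNS` are hypothetical names, not part of the script or of miRBench:

```python
import csv
import io

# Columns the README lists for --ifile
REQUIRED_COLUMNS = ("noncodingRNA", "gene")


def missing_columns(lines, required=REQUIRED_COLUMNS):
    """Return required column names absent from a TSV header.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    header = next(csv.reader(lines, delimiter="\t"), [])
    return [col for col in required if col not in header]


ok = io.StringIO("noncodingRNA\tgene\nTAGCTTATCAGACTGATGTTGA\tGGGTAAGCTGGG\n")
bad = io.StringIO("gene\nGGGTAAGCTGGG\n")
print(missing_columns(ok))   # []
print(missing_columns(bad))  # ['noncodingRNA']
```

In `main()`, a check like this could run on the opened `--ifile` before `pd.read_csv`; once the DataFrame is loaded, the equivalent is a list comprehension over `df.columns`.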
`@@ -0,0 +1,43 @@`

```python
import pandas as pd
import argparse


def filter_interactions(df):
    # Canonical seed: Seed6mer is 1
    df_canonical = df[df['Seed6mer'] == 1].copy()
    df_canonical = df_canonical.drop(columns=['Seed6mer', 'Seed6merBulgeOrMismatch'])

    # Note that in the miRBench package, Seed6merBulgeOrMismatch is inclusive of Seed6mer
    # Non-canonical seed: Seed6merBulgeOrMismatch is 1 AND Seed6mer is 0
    df_noncanonical = df.loc[(df["Seed6merBulgeOrMismatch"] == 1) & (df["Seed6mer"] == 0)].copy()
    df_noncanonical = df_noncanonical.drop(columns=['Seed6mer', 'Seed6merBulgeOrMismatch'])

    # No seed: Seed6merBulgeOrMismatch is 0 (by inclusivity, Seed6mer is then also 0)
    df_noseed = df[df['Seed6merBulgeOrMismatch'] == 0].copy()
    df_noseed = df_noseed.drop(columns=['Seed6mer', 'Seed6merBulgeOrMismatch'])

    return df_canonical, df_noncanonical, df_noseed


def write_interactions(df, ofile):
    df.to_csv(ofile, sep='\t', index=False)


def main():
    parser = argparse.ArgumentParser(description="Filter canonical/non-canonical/no-seed interactions, for all Manakov datasets")
    parser.add_argument("--ifile", type=str, required=True, help="Input TSV file with seed types")
    parser.add_argument("--canonical_ofile", type=str, required=True, help="Output file for canonical-seed interactions")
    parser.add_argument("--noncanonical_ofile", type=str, required=True, help="Output file for non-canonical-seed interactions")
    parser.add_argument("--nonseed_ofile", type=str, required=True, help="Output file for no-seed interactions")
    args = parser.parse_args()

    # Read file with seed types
    df_seed_types = pd.read_csv(args.ifile, sep='\t')

    # Filter canonical/non-canonical/non-seed interactions
    df_canonical, df_noncanonical, df_noseed = filter_interactions(df_seed_types)

    # Write interactions to file
    write_interactions(df_canonical, args.canonical_ofile)
    write_interactions(df_noncanonical, args.noncanonical_ofile)
    write_interactions(df_noseed, args.nonseed_ofile)


if __name__ == "__main__":
    main()
```
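The three boolean masks in `filter_interactions` form a clean partition only if every row holds 0/1 values and `Seed6merBulgeOrMismatch` is 1 whenever `Seed6mer` is 1 (the inclusivity noted in the code comment). The row-level sketch below mirrors those masks and shows what happens otherwise: a row with `Seed6mer == 1` and `Seed6merBulgeOrMismatch == 0` would satisfy both the canonical and the no-seed mask and be written to two files. `seed_category` is a hypothetical helper for illustration only.

```python
def seed_category(seed6mer, bulge_or_mismatch):
    """Return the output a row would go to, mirroring the masks in filter_interactions.

    Raises ValueError if the row matches zero or several masks.
    """
    masks = {
        "canonical": seed6mer == 1,
        "noncanonical": bulge_or_mismatch == 1 and seed6mer == 0,
        "noseed": bulge_or_mismatch == 0,
    }
    hits = [name for name, hit in masks.items() if hit]
    if len(hits) != 1:
        raise ValueError(
            f"Seed6mer={seed6mer}, Seed6merBulgeOrMismatch={bulge_or_mismatch} "
            f"matches {hits or 'no'} filter(s)"
        )
    return hits[0]


# The last combination violates the inclusivity assumption.
for row in [(1, 1), (0, 1), (0, 0), (1, 0)]:
    try:
        print(row, "->", seed_category(*row))
    except ValueError as err:
        print(row, "->", err)
```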
`@@ -0,0 +1,37 @@`

````markdown
# Filter interactions

## Overview

This script splits a file with annotated seed types into canonical, non-canonical, and non-seed interactions.

## Features

- Defines canonical, non-canonical, and non-seed interactions.
- Filters canonical, non-canonical, and non-seed interactions and saves them into three separate files.

## Requirements

- Python 3.8
- Required Python packages:
  - `pandas`
- `argparse` (part of the Python standard library; no installation needed)

## Usage

Run the script with the following command:

```bash
python filter_interactions.py \
    --ifile <input_file_with_seed_types> \
    --canonical_ofile <output_file_with_canonical_interactions> \
    --noncanonical_ofile <output_file_with_noncanonical_interactions> \
    --nonseed_ofile <output_file_with_nonseed_interactions>
```

### Arguments

- `--ifile`: Path to input file with seed type annotations (columns: `Seed6mer`, `Seed6merBulgeOrMismatch`). This is the output of the `add_seed_types.py` script.
- `--canonical_ofile`: Path to output file containing canonical (Seed6mer) interactions.
- `--noncanonical_ofile`: Path to output file containing non-canonical (Seed6merBulgeOrMismatch but not Seed6mer) interactions.
- `--nonseed_ofile`: Path to output file containing non-seed (no Seed6merBulgeOrMismatch) interactions.

## Note

In the miRBench package, Seed6merBulgeOrMismatch is the loosest seed type and therefore includes all other seed types defined in the package, including Seed6mer.
````