Description
I'm using the REDItools find-repeats command to generate a BED file of repetitive regions for filtering RNA editing sites. However, the output appears to contain only homopolymeric repeats (runs of a single nucleotide, such as AAAAA, CCCCC, TTTTT, or GGGGG) and no other types of repetitive elements (such as tandem repeats, microsatellites, or transposable elements).
Command Used
python -m reditools find-repeats --min-length 5 --output repeats.min5.bed reference.fa
Observed Output
The generated BED file (repeats.min5.bed) has the following format:
1 1023 1029 6 C
1 1189 1194 5 C
1 2241 2248 7 T
1 3014 3021 7 A
1 3453 3460 7 A
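- Column 1: Chromosome
- Column 2: Start position (0-based, inclusive)
- Column 3: End position (exclusive, per the BED convention; e.g. 1029 - 1023 = 6)
- Column 4: Length of the repeat (end minus start)
- Column 5: Repeated nucleotide (A, C, G, or T)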
Every entry reports a single nucleotide (A, C, G, or T) in column 5, which indicates that only homopolymeric regions are reported.
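For comparison, the observed output is exactly what a plain run-length scan would produce. The sketch below is inferred only from the BED columns above and is not taken from the REDItools source; `homopolymer_runs` and `to_bed_lines` are hypothetical names, and upper-casing soft-masked bases and skipping runs of N are assumptions.

```python
from itertools import groupby
from typing import Iterator, List, NamedTuple


class Run(NamedTuple):
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive (BED convention)
    length: int
    base: str


def homopolymer_runs(seq: str, min_length: int = 5) -> Iterator[Run]:
    """Yield runs of one repeated base that are at least min_length long."""
    pos = 0
    # Assumption: lowercase (soft-masked) bases are treated like uppercase.
    for base, group in groupby(seq.upper()):
        n = sum(1 for _ in group)
        # Assumption: only A/C/G/T runs are reported, so runs of N are skipped.
        if n >= min_length and base in "ACGT":
            yield Run(pos, pos + n, n, base)
        pos += n


def to_bed_lines(chrom: str, seq: str, min_length: int = 5) -> List[str]:
    """Format runs as chrom, start, end, length, base (the columns observed above)."""
    return [
        f"{chrom}\t{r.start}\t{r.end}\t{r.length}\t{r.base}"
        for r in homopolymer_runs(seq, min_length)
    ]


if __name__ == "__main__":
    for line in to_bed_lines("1", "GTAAAAACGCCCCCCTT"):
        print(line)
```

A single pass like this can never report a motif such as ATAT, because it only groups identical neighbouring bases.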
Question
Is this the intended behavior of find-repeats? Specifically:
- Is find-repeats designed to detect only homopolymeric repeats (consecutive identical nucleotides), or should it also detect other repetitive elements like:
  - Tandem repeats (e.g., ATATAT, GCGCGC)
  - Microsatellites
  - Transposable elements
  - Other repetitive motifs
- If it's intended to detect only homopolymers, is there a different REDItools command or parameter to detect broader repetitive elements?
- If it should detect broader repeats, could this be a bug or limitation in the current implementation?
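A dinucleotide repeat such as ATATAT is invisible to a run-length scan, because no two adjacent bases are equal. Detecting it requires comparing consecutive copies of a candidate motif, as in the naive sketch below. This is not a REDItools feature and not a substitute for Tandem Repeats Finder, which tolerates mismatches and indels; `tandem_repeats` and its defaults (periods of 2 to 6 bp, at least 3 perfect copies) are hypothetical. Transposable elements cannot be found this way at all, since they are usually identified by similarity to a consensus library, as RepeatMasker does.

```python
from typing import Iterator, NamedTuple, Optional


class TandemRepeat(NamedTuple):
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    motif: str
    copies: int


def tandem_repeats(seq: str, min_period: int = 2, max_period: int = 6,
                   min_copies: int = 3) -> Iterator[TandemRepeat]:
    """Greedy scan for perfect short tandem repeats (no mismatches or indels)."""
    seq = seq.upper()
    i = 0
    while i < len(seq):
        best: Optional[TandemRepeat] = None
        for p in range(min_period, max_period + 1):
            motif = seq[i:i + p]
            # Skip truncated motifs at the end, motifs containing N, and
            # single-base motifs (those are homopolymers, e.g. "AA").
            if len(motif) < p or "N" in motif or len(set(motif)) == 1:
                continue
            j = i + p
            while seq[j:j + p] == motif:  # extend over exact copies only
                j += p
            copies = (j - i) // p
            # Keep the period that covers the longest span; ties keep the shorter period.
            if copies >= min_copies and (best is None or j - i > best.end - best.start):
                best = TandemRepeat(i, j, motif, copies)
        if best is not None:
            yield best
            i = best.end  # resume after the reported repeat
        else:
            i += 1


if __name__ == "__main__":
    print(list(tandem_repeats("CCATATATATGG")))
    # prints: [TandemRepeat(start=2, end=10, motif='AT', copies=4)]
```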
Additional Context
I notice that REDItools also has an --omopolymeric-span parameter in the analyze command that filters sites within homopolymer regions. This suggests that homopolymer filtering is important, but I'm wondering if there's a way to also filter broader repetitive elements using find-repeats or another method.
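Independently of what find-repeats reports, homopolymer intervals and any broader repeat annotation can be combined outside REDItools: pool the BED intervals, merge overlaps per chromosome, and test each candidate editing site with a binary search. The sketch below assumes 0-based site positions (subtract 1 from a 1-based coordinate first) and whitespace-separated BED columns; `load_bed`, `merge_intervals`, and `in_repeat` are hypothetical helpers, not REDItools options.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Merged = Dict[str, Tuple[List[int], List[int]]]


def load_bed(lines: Iterable[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Collect (start, end) pairs per chromosome from BED-like lines (any number of files)."""
    intervals: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        # Assumption: columns are whitespace-separated; extra columns are ignored.
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((int(start), int(end)))
    return intervals


def merge_intervals(intervals: Dict[str, List[Tuple[int, int]]]) -> Merged:
    """Sort and merge overlapping or book-ended intervals on each chromosome."""
    merged: Merged = {}
    for chrom, ivs in intervals.items():
        starts: List[int] = []
        ends: List[int] = []
        for start, end in sorted(ivs):
            if ends and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)  # extend the current merged interval
            else:
                starts.append(start)
                ends.append(end)
        merged[chrom] = (starts, ends)
    return merged


def in_repeat(merged: Merged, chrom: str, pos: int) -> bool:
    """True if the 0-based position lies in a merged half-open interval."""
    if chrom not in merged:
        return False
    starts, ends = merged[chrom]
    k = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return k >= 0 and pos < ends[k]


if __name__ == "__main__":
    homopolymers = ["1 1023 1029 6 C", "1 1189 1194 5 C"]
    merged = merge_intervals(load_bed(homopolymers))
    print(in_repeat(merged, "1", 1025), in_repeat(merged, "1", 1029))
    # prints: True False
```

Because `load_bed` accepts any iterable of lines, open file handles from several BED files can be chained together with `itertools.chain` before merging.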
Thank you for your clarification!
Note: If this is expected behavior, I would appreciate any recommendations for tools or methods to detect broader repetitive elements (like RepeatMasker or Tandem Repeats Finder) that could be integrated into the RNA editing analysis pipeline.
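If RepeatMasker is used, its `.out` report can be turned into a BED file for the same kind of filtering. The sketch below assumes the standard `.out` layout (query sequence in column 5, 1-based inclusive begin/end in columns 6 and 7, strand `+` or `C` in column 9, repeat name and class/family in columns 10 and 11) and skips the header lines by requiring an integer score in column 1; `rmsk_out_to_bed` is a hypothetical name.

```python
from typing import Iterable, Iterator


def rmsk_out_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Convert RepeatMasker .out records to BED6 lines (assumes the standard column layout)."""
    for line in lines:
        fields = line.split()
        # Header and blank lines do not start with an integer Smith-Waterman score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        chrom = fields[4]
        begin, end = int(fields[5]), int(fields[6])  # 1-based, inclusive
        strand = "-" if fields[8] == "C" else "+"    # "C" marks the complement strand
        name = f"{fields[9]}|{fields[10]}"           # repeat name | class/family
        # BED is 0-based half-open: shift the start by one, keep the end.
        yield f"{chrom}\t{begin - 1}\t{end}\t{name}\t0\t{strand}"
```

Chromosome names must match the reference used for find-repeats (for example, `1` versus `chr1`); otherwise no site will overlap. Keeping the class/family in the name column makes it easy to restrict filtering to, say, `Simple_repeat` or `SINE/Alu` entries.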