
Does find-repeats detect only homopolymeric repeats or all repetitive elements? #46

@AchiadChenzi

Description

I'm using the REDItools find-repeats command to generate a BED file of repetitive regions for filtering RNA editing sites. However, the output appears to contain only homopolymeric repeats (runs of a single nucleotide, such as AAAAA, CCCCC, TTTTT, GGGGG) and no other types of repetitive elements (such as tandem repeats, microsatellites, or transposable elements).

Command Used

python -m reditools find-repeats --min-length 5 --output repeats.min5.bed reference.fa

Observed Output

The generated BED file (repeats.min5.bed) has the following format:
1 1023 1029 6 C
1 1189 1194 5 C
1 2241 2248 7 T
1 3014 3021 7 A
1 3453 3460 7 A

Column 1: Chromosome
Column 2: Start position (0-based)
Column 3: End position (apparently exclusive, since end minus start equals the length in column 4)
Column 4: Length of repeat
Column 5: Nucleotide type (A, C, G, or T)

All entries show single nucleotide types (A, C, G, T), indicating these are homopolymeric regions only.
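
For anyone wanting to reproduce this observation on a full file, here is a minimal sketch that tallies the repeat unit in column 5 and checks that end minus start matches column 4. It assumes the whitespace-separated five-column layout shown above; `summarize_repeat_bed` is a hypothetical helper name, not part of REDItools.

```python
from collections import Counter


def summarize_repeat_bed(lines):
    """Summarize five-column find-repeats output (layout assumed from the rows above).

    Returns (unit_counts, inconsistent): a tally of the repeat unit in
    column 5, and the rows whose end - start differs from column 4.
    """
    unit_counts = Counter()
    inconsistent = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or line.startswith("#"):
            continue  # skip blank, comment, or malformed lines
        chrom, start, end, length, unit = fields[:5]
        unit_counts[unit] += 1
        if int(end) - int(start) != int(length):
            inconsistent.append((chrom, int(start), int(end)))
    return unit_counts, inconsistent


# The five rows quoted in this issue; for a real file, pass an open file handle.
rows = [
    "1 1023 1029 6 C",
    "1 1189 1194 5 C",
    "1 2241 2248 7 T",
    "1 3014 3021 7 A",
    "1 3453 3460 7 A",
]
counts, bad = summarize_repeat_bed(rows)
only_homopolymers = all(len(unit) == 1 for unit in counts)
print(counts, bad, only_homopolymers)
```

If `only_homopolymers` is true for the whole file, every reported region is a run of one nucleotide.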

Question

Is this the intended behavior of find-repeats? Specifically:

  1. Is find-repeats designed to detect only homopolymeric repeats (consecutive identical nucleotides), or should it also detect other repetitive elements like:

    • Tandem repeats (e.g., ATATAT, GCGCGC)
    • Microsatellites
    • Transposable elements
    • Other repetitive motifs
  2. If it's intended to detect only homopolymers, is there a different REDItools command or parameter to detect broader repetitive elements?

  3. If it should detect broader repeats, could this be a bug or limitation in the current implementation?
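
To make the distinction in the questions above concrete, here is a rough sketch contrasting a homopolymer scan with a naive tandem-repeat scan. This is not the REDItools implementation; both function names are hypothetical, and the 2 to 6 bp unit size and three-copy minimum are arbitrary illustration choices. A regex like this will not find transposable elements, which need an annotation-based tool.

```python
import re


def find_homopolymers(seq, min_length=5):
    """Return (start, end, length, base) for runs of one base, 0-based half-open."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (m.start(), m.end(), m.end() - m.start(), m.group(1))
        for m in pattern.finditer(seq.upper())
    ]


def find_tandem_repeats(seq, min_copies=3, max_unit=6):
    """Return (start, end, unit) for short units repeated back to back.

    The lazy quantifier prefers the shortest unit, so ATATAT is
    reported with unit AT rather than ATAT.
    """
    pattern = re.compile(r"([ACGT]{2,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) > 1:  # drop units like "AA", which are homopolymers
            hits.append((m.start(), m.end(), unit))
    return hits


seq = "GGCAAAAAAGTATATATCG"
homopolymers = find_homopolymers(seq)
tandems = find_tandem_repeats(seq)
print(homopolymers)
print(tandems)
```

On this toy sequence the first scan reports only the A run, while the TA repeat shows up only in the second scan, which matches the kind of region missing from my BED file.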

Additional Context

I notice that REDItools also has an --omopolymeric-span parameter in the analyze command that filters sites within homopolymer regions. This suggests that homopolymer filtering is important, but I'm wondering if there's a way to also filter broader repetitive elements using find-repeats or another method.

Thank you for your clarification!


Note: If this is expected behavior, I would appreciate any recommendations for tools or methods to detect broader repetitive elements (like RepeatMasker or Tandem Repeats Finder) that could be integrated into the RNA editing analysis pipeline.
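
As a starting point for that kind of integration, here is a minimal sketch of filtering candidate sites against any BED file of repeat intervals (for example, one derived from RepeatMasker or Tandem Repeats Finder output). It assumes 0-based, half-open BED coordinates; `load_intervals` and `in_repeat` are hypothetical helpers, and site positions would need converting to 0-based first if the site table is 1-based.

```python
from bisect import bisect_right
from collections import defaultdict


def load_intervals(bed_lines):
    """Group (start, end) intervals by chromosome and merge overlapping ones."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2])))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:  # overlapping or touching: extend the last one
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_repeat(merged, chrom, pos0):
    """True if the 0-based position pos0 lies inside a merged interval."""
    intervals = merged.get(chrom, [])
    # Index of the last interval whose start is <= pos0.
    i = bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


merged = load_intervals(["1 1023 1029 6 C", "1 1189 1194 5 C", "1 1027 1035"])
sites = [("1", 1022), ("1", 1030), ("1", 1194), ("2", 1025)]
kept = [site for site in sites if not in_repeat(merged, *site)]
print(kept)
```

Merging first keeps the lookup to one binary search per site, since the merged intervals on each chromosome are sorted and disjoint.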

Metadata

Labels

documentation (Improvements or additions to documentation)
