@katysloz katysloz commented Jul 8, 2025

This PR incorporates the programme anvi-report-dgrs into master. It is still in progress as of 07.07.2025.

This programme searches for diversity-generating retroelements (DGRs; see papers 1, 2 & 3). We use read recruitment information and single-nucleotide variants to search for putative variable regions, and then use BLAST to find template regions matching those. Finally, the tool identifies reverse transcriptases by homology, using models from here and PFAM (RVT_1, RVT_2, RVT_3, RVT_N, originally from Simon and Zimmerly); a default HMM collection called Reverse_Transcriptase has therefore been added.
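To illustrate the first step for reviewers, here is a minimal sketch of how densely packed SNVs could be grouped into putative variable regions. It is not the code in this PR: the function name `find_snv_clusters` and the thresholds `max_gap` and `min_snvs` are hypothetical, and the input is assumed to be a plain list of SNV positions on one contig.

```python
def find_snv_clusters(snv_positions, max_gap=5, min_snvs=3):
    """Group SNV positions into clusters of nearby variants.

    Consecutive SNVs that are at most `max_gap` bases apart go into the
    same cluster; clusters with fewer than `min_snvs` SNVs are dropped.
    Returns a list of (start, end, snv_count) tuples.
    Both thresholds are illustrative assumptions, not anvi'o defaults.
    """
    clusters = []
    current = []
    for pos in sorted(snv_positions):
        # A large gap closes the current cluster and starts a new one.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_snvs:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Do not forget the last open cluster.
    if len(current) >= min_snvs:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Made-up SNV positions on a single contig.
example_positions = [10, 12, 15, 18, 200, 410, 412, 413, 900]
clusters = find_snv_clusters(example_positions)
print(clusters)
```

Each reported window would then be a candidate variable region to use as a BLAST query when looking for a matching template region.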

The tool can also profile the variability of the DGRs using the short-read information: it uses a built-in function to search for 'primers' of the variable regions in the short reads. This captures more diversity, including reads that would not match the reference. The user can then use an external program such as oligotyping to visualise these changes across samples.
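The primer idea can be sketched roughly as follows. This is an illustration under assumptions, not the built-in function itself: `extract_variable_regions` is a hypothetical name, the reads are plain strings, and the two 'primers' are assumed to be short exact-match sequences on either side of the variable region.

```python
from collections import Counter


def extract_variable_regions(reads, left_primer, right_primer):
    """Count the sequences found between two primers in raw reads.

    A read contributes only if it contains `left_primer` followed later
    by `right_primer`; the sequence in between is taken as the variable
    region. Exact matching is a simplifying assumption.
    """
    counts = Counter()
    for read in reads:
        start = read.find(left_primer)
        if start == -1:
            continue  # left primer not in this read
        vr_start = start + len(left_primer)
        end = read.find(right_primer, vr_start)
        if end == -1:
            continue  # right primer missing after the left one
        counts[read[vr_start:end]] += 1
    return counts


# Made-up reads; no alignment to a reference is needed.
reads = [
    "GGACGTAAACCTTGGCC",
    "ACGTAGACCTTGG",
    "GGACGTAAACCTTGGCC",
    "CCCCCCCC",
]
vr_counts = extract_variable_regions(reads, "ACGT", "TTGG")
print(vr_counts.most_common())
```

Because reads are matched on the primers rather than aligned to the reference, divergent variable region sequences are still counted, and the resulting per-sample counts are the kind of input an external tool could then visualise.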

Some work that needs doing before merging:

  • double check DGR variability profiling works across multiple examples and update code
  • remove old code that uses only FASTA files and no read recruitment
  • create anvi-self-test for the programme
  • final checks, including checking all parameters, making sure outputs are tab-delimited, and reviewing the overall code flow (maybe some functions could be split, especially for @FlorianTrigodet and his favourite function get_blast_results), etc.
