PHLAME is a complete pipeline for creating intraspecies reference databases and for detecting, in metagenomic data, intraspecies clades, their relative frequencies, and their estimated Divergence from the reference phylogeny.
The accepted raw inputs to PHLAME are:
- [1] A species-specific assembled reference genome in .fasta format
- [2] A collection of whole genome sequences of the same species in .fastq or .fasta format
- [3] Metagenomic sequencing data, either as raw .fastq reads or as aligned .bam files
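Before starting a run, it can help to sanity-check that each input file has one of the formats listed above. The sketch below does this by file extension only. The helper name `check_input`, the role labels, and the extra accepted variants (.fa, .fna, .fq, and gzip-compressed FASTA/FASTQ) are assumptions for illustration, not part of PHLAME itself.

```python
from pathlib import Path

# Accepted extensions per input role, following the list above.
# Assumption: common aliases (.fa, .fna, .fq) and gzip-compressed
# FASTA/FASTQ (.gz) are also acceptable; confirm against the PHLAME docs.
ACCEPTED = {
    "reference": {".fasta", ".fa", ".fna"},                # [1] reference genome
    "isolate": {".fastq", ".fq", ".fasta", ".fa", ".fna"},  # [2] isolate genomes
    "metagenome": {".fastq", ".fq", ".bam"},               # [3] metagenomic data
}


def check_input(role: str, path: str) -> bool:
    """Hypothetical helper: True if `path` has an extension accepted for `role`."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]  # look at the extension under the .gz
    ext = suffixes[-1] if suffixes else ""
    if gzipped and ext == ".bam":
        return False  # BAM is already compressed; .bam.gz is not expected
    return ext in ACCEPTED[role]


if __name__ == "__main__":
    print(check_input("reference", "ref.fasta"))           # True
    print(check_input("metagenome", "sample.R1.fastq.gz"))  # True
    print(check_input("reference", "reads.bam"))            # False
```

An extension check is cheap but shallow: it will not catch a truncated or malformed file, so it complements rather than replaces tools such as `samtools quickcheck` for BAM input.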
The preprint describing PHLAME is available here.
The data used in the manuscript is available here.
Pre-built reference databases are available in the classifiers/ directory.
You can install PHLAME using either pip or conda:
pip install phlame
conda install -c bioconda phlame
PHLAME is in active development. Please make sure you have the latest version installed before running.
PHLAME has the following dependencies:
- python >=3.8, <3.13
- numpy - (tested with v1.20.3)
- matplotlib - (tested with v3.4.2)
- pandas - (tested with v1.2.5)
- biopython - (tested with v1.79)
- scipy - (tested with v1.6.2)
- statsmodels - (tested with v0.13.1)
- ete3 - (tested with v3.1.2)
- samtools (>=v1.15)
- bcftools (>=v1.2)
- RAxML - (tested with v8.2.13)
- Additionally, starting from raw sequencing reads requires a read aligner (such as bowtie2).
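A quick way to confirm that an environment meets the requirements above is to check the Python version, query installed package versions with `importlib.metadata`, and look for the command-line tools on `PATH` with `shutil.which`. This is a minimal sketch, not a PHLAME utility. The executable names are assumptions: RAxML v8 is commonly installed as `raxmlHPC` (or variants such as `raxmlHPC-PTHREADS`), and bowtie2 is only needed when starting from raw reads.

```python
import shutil
import sys
from importlib import metadata

PY_MIN, PY_MAX = (3, 8), (3, 13)  # supported range: >=3.8, <3.13
PY_PACKAGES = ["phlame", "numpy", "matplotlib", "pandas", "biopython",
               "scipy", "statsmodels", "ete3"]
# Assumption: adjust these executable names to match your installation.
EXECUTABLES = ["samtools", "bcftools", "raxmlHPC", "bowtie2"]


def python_supported(version=None) -> bool:
    """True if (major, minor) falls within the supported Python range."""
    v = tuple((version or sys.version_info)[:2])
    return PY_MIN <= v < PY_MAX


def report() -> None:
    """Print the Python version, package versions, and tool locations."""
    py = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {py}: {'OK' if python_supported() else 'UNSUPPORTED'}")
    for pkg in PY_PACKAGES:
        try:
            print(f"{pkg}: {metadata.version(pkg)}")
        except metadata.PackageNotFoundError:
            print(f"{pkg}: NOT INSTALLED")
    for exe in EXECUTABLES:
        print(f"{exe}: {shutil.which(exe) or 'NOT FOUND on PATH'}")


if __name__ == "__main__":
    report()
```

The printed `phlame` version can then be compared by hand against the latest release on PyPI or bioconda. Version numbers for samtools and bcftools are not checked here; running `samtools --version` and `bcftools --version` shows them.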
PHLAME uses intraspecies reference databases to profile the strain-level diversity of individual species from metagenomic data. PHLAME is novelty-aware: it quantifies the proportion of diversity in a sample that cannot be explained by existing reference genomes. This is made possible by PHLAME's Divergence (DVb) metric, which estimates the point along an individual branch of the phylogeny at which novel strains in a sample are inferred to diverge from known references.
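The intuition behind a branch-level divergence estimate can be illustrated with a deliberately naive proxy. If mutations accumulate roughly evenly along a branch, a novel strain that split off partway along it should carry about that same fraction of the branch's clade-specific mutations. The sketch below computes that fraction from per-position read counts. It is a hypothetical illustration only: it is not PHLAME's DVb estimator, its output is not necessarily on the same scale or in the same direction as DVb, and it ignores sequencing error, low coverage, and mixtures of several strains from one clade. See the preprint for how DVb is actually estimated.

```python
from typing import Sequence


def naive_branch_split(clade_allele_reads: Sequence[int],
                       total_reads: Sequence[int],
                       min_depth: int = 1) -> float:
    """Hypothetical proxy for where a sample strain splits from a branch.

    Returns the fraction of the branch's clade-specific SNP positions
    (with depth >= min_depth) at which any read carries the clade allele:
      ~1.0 -> the strain carries all branch mutations, like known references
      ~0.0 -> the strain split off near the start of the branch
    """
    if len(clade_allele_reads) != len(total_reads):
        raise ValueError("inputs must have the same length")
    covered = [(c, t) for c, t in zip(clade_allele_reads, total_reads)
               if t >= min_depth]
    if not covered:
        raise ValueError("no covered positions on this branch")
    # Naive presence call: a single supporting read counts as present.
    present = sum(1 for c, _ in covered if c > 0)
    return present / len(covered)


if __name__ == "__main__":
    # 10 clade-specific positions on one branch, all covered.
    clade = [3, 0, 5, 0, 0, 2, 0, 1, 0, 0]
    depth = [5, 4, 6, 3, 5, 4, 6, 2, 5, 4]
    print(naive_branch_split(clade, depth))  # 0.4
```

Here the clade allele is seen at 4 of 10 covered positions, which under the even-accumulation assumption suggests a strain that split off about 40% of the way from the ancestral node toward the known references.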
This tutorial uses the small set of files found in example/ and is designed to be run inside the example/ directory.
0. Conceptual introduction to PHLAME
- Collecting genomes for your species of interest
- Sequence data to candidate mutation table
- Creating a phylogeny
- Making a PHLAME database
- Integrating existing strain-level classifications into PHLAME
