Real-time genomic pathogen, resistance, and host range characterization from passive water sampling of wetland ecosystems (PRJEB96272)

Project Overview

This project investigates the microbial and viral communities, vertebrate biodiversity, and prevalence of antimicrobial resistance (AMR) genes in 12 European wetlands across Germany, France, and Spain, along the east atlantic flyway, a major bird migration corridor. Samples were collected from sites categorized by land use (anthropogenic vs. natural) using torpedo-shaped passive water samplers. From dual DNA/RNA extractions, four parallel analyses were conducted using Oxford Nanopore Technologies sequencing (MinION Mk1D/Mk1C, R10.4.1 flow cells):

Shotgun Metagenomics (DNA): Characterization of microbial communities and AMR genes.
Vertebrate Metabarcoding (12S rRNA rRNA): Identification of local vertebrate fauna using 12S rRNA rRNA amplicons.
Avian Influenza Virus (AIV) Analysis (RNA): Targeted whole-genome sequencing of AIV-positive samples.
RNA Viromics (RNA): Untargeted analysis of RNA virus communities.

This repository contains the bioinformatic workflows used to process the sequencing data.

ENA Files: https://www.ebi.ac.uk/ena/browser/view/PRJEB96272 for Metagenomic, 12S rRNA, and Virome data.

AIV Files: GSAID EPI4628641 (Note: Account required for download!)

Sample Information and Barcoding Scheme

⚠️ IMPORTANT: This table is essential for reproducing our analyses. It maps sampling locations to their corresponding barcodes and FASTQ files.

Site Coding System

Sites are coded using a [Country][Environment][Number] format:

Country: G (Germany), F (France), S (Spain)
Environment: A (Anthropogenic), N (Natural)
Number: Sequential identification within each category

1. DNA Shotgun Metagenomics Samples

Location	Sample ID	ONT Barcode	FASTQ Filename
Germany - Anthropogenic
GA1	GA1.1	BC01	SQK-RBK114-24_barcode01.GA1.1.fastq.tar.gz
GA1	GA1.2	BC01	SQK-RBK114-24_barcode01.GA1.2.fastq.tar.gz
GA2	GA2.1	BC03	SQK-RBK114-24_barcode03.GA2.1.fastq.tar.gz
GA2	GA2.2	BC03	SQK-RBK114-24_barcode03.GA2.2.fastq.tar.gz
GA3	GA3.1	BC05	SQK-RBK114-24_barcode05.GA3.1.fastq.tar.gz
GA3	GA3.2	BC05	SQK-RBK114-24_barcode05.GA3.2.fastq.tar.gz
Germany - Natural
GN1	GN1.1	BC17	SQK-RBK114-24_barcode17.GN1.1.fastq.tar.gz
GN1	GN1.2	BC07	SQK-RBK114-24_barcode07.GN1.2.fastq.tar.gz
GN2	GN2.1	BC07	SQK-RBK114-24_barcode07.GN2.1.fastq.tar.gz
GN2	GN2.2	BC07	SQK-RBK114-24_barcode07.GN2.2.fastq.tar.gz
GN3	GN3.1	BC09	SQK-RBK114-24_barcode09.GN3.1.fastq.tar.gz
GN3	GN3.2	BC09	SQK-RBK114-24_barcode09.GN3.2.fastq.tar.gz
France - Anthropogenic
FA1	FA1.1	BC19	SQK-RBK114-24_barcode19.FA1.1.fastq.tar.gz
FA1	FA1.2	BC19	SQK-RBK114-24_barcode19.FA1.2.fastq.tar.gz
FA2	FA2.1	BC21	SQK-RBK114-24_barcode21.FA2.1.fastq.tar.gz
FA2	FA2.2	BC13	SQK-RBK114-24_barcode13.FA2.2.fastq.tar.gz
France - Natural
FN1	FN1.1	BC11	SQK-RBK114-24_barcode11.FN1.1.fastq.tar.gz
FN1	FN1.2	BC11	SQK-RBK114-24_barcode11.FN1.2.fastq.tar.gz
Spain - Anthropogenic
SA1	SA1.1	BC23	SQK-RBK114-24_barcode23.SA1.1.fastq.tar.gz
SA1	SA1.2	BC15	SQK-RBK114-24_barcode15.SA1.2.fastq.tar.gz
Spain - Natural
SN1	SN1.1	BC13	SQK-RBK114-24_barcode13.SN1.1.fastq.tar.gz
SN1	SN1.2	BC13	SQK-RBK114-24_barcode13.SN1.2.fastq.tar.gz
SN2	SN2.1	BC15	SQK-RBK114-24_barcode15.SN2.1.fastq.tar.gz
SN2	SN2.2	BC15	SQK-RBK114-24_barcode15.SN2.2.fastq.tar.gz

2. 12S rRNA Vertebrate Metabarcoding Samples

Raw Data: torpedo12s_all.fastq.gz (multiplexed file containing all samples)

Location	Sample ID	ONT Barcode	9bp Tag
Germany - Anthropogenic
GA1	1	BC01	acataacgc
GA2	2	BC02	tagtggttc
GA3	3	BC03	caatagtga
Germany - Natural
GN1	4	BC04	taatacggt
GN2	5	BC05	agtcaaggc
GN3	6	BC06	actattggt
France - Anthropogenic
FA1	7	BC07	agtgtccag
FA2	8	BC08	aagacttcc
France - Natural
FN1	9	BC09	ttacagatc
Spain - Anthropogenic
SA1	10	BC10	ctccacttc
Spain - Natural
SN1	11	BC11	ctatccgcc
SN2	12	BC12	ctgcgagca
Controls
Negative PCR	c_neg_pcr	BC13	ttagtccac
Negative Extraction	c_neg_ext	BC14	cgcacagcc
Positive Control	c_pos_sample_dani	BC15	ttatatatg

3. AIV (RNA) Analysis Samples

AIV Files: GSAID EPI4628641

4. RNA Viromics Samples

Location	Sample ID	ONT Barcode	FASTQ Filename
Germany - Anthropogenic
GA1	GA1	BC01	SQK-RPB114-24_barcode01.GA1.fastq.tar.gz
GA2	GA2	BC02	SQK-RPB114-24_barcode02.GA2.fastq.tar.gz
GA3	GA3	BC03	SQK-RPB114-24_barcode03.GA3.fastq.tar.gz
Germany - Natural
GN1	GN1	BC04	SQK-RPB114-24_barcode04.GN1.fastq.tar.gz
GN2	GN2	BC05	SQK-RPB114-24_barcode05.GN2.fastq.tar.gz
GN3	GN3	BC06	SQK-RPB114-24_barcode06.GN3.fastq.tar.gz
France - Anthropogenic
FA1	FA1	BC08	SQK-RPB114-24_barcode08.FA1.fastq.tar.gz
FA2	FA2	BC02	SQK-RPB114-24_barcode02.FA2.fastq.tar.gz
France - Natural
FN1	FN1	BC01	SQK-RPB114-24_barcode01.FN1.fastq.tar.gz
Spain - Anthropogenic
SA1	SA1	BC11	SQK-RPB114-24_barcode11.SA1.fastq.tar.gz
Spain - Natural
SN1	SN1	BC10	SQK-RPB114-24_barcode10.SN1.fastq.tar.gz
SN2	SN2	BC12	SQK-RPB114-24_barcode12.SN2.fastq.tar.gz

Site Descriptions

Anthropogenic sites (n=6):
- GA1: Urban landfill pond
- GA2: Village wetland
- GA3: Urban pond
- FA1: Intensive duck farm
- FA2: Intensive duck farm
- SA1: Sewage treatment plant
Natural sites (n=6):
- GN1-3: Coastal marshes/peat bogs
- FN1: Wetland reserve
- SN1-2: Wetland reserves

Notes:

DNA shotgun samples: Duplicate extractions per site (_1 and _2 suffixes)
12S rRNA rRNA, AIV, Virome samples: Single extraction per site (location = sample ID)
12S rRNA rRNA samples: Include both ONT barcodes and 9bp tags for demultiplexing
AIV samples: Only AIV-positive samples proceed to consensus genome generation
All sites are lentic (non-flowing) wetlands
Raw FASTQ files are available at ENA under accession PRJEB96272

Analysis Pipeline Overview

This repository outlines four main analysis pipelines:

DNA Shotgun Metagenomics: Processes FASTQ files from environmental DNA for community analysis, assembly, and AMR/pathogen detection.
AIV (RNA) Analysis: Aligns reads from AIV-positive samples to generate consensus genomes.
Viral Metagenomics (Viromics): Processes cDNA reads from total environmental RNA to characterize the RNA virome.
12S rRNA rRNA Vertebrate Genomics: Processes 12S rRNA gene amplicons to identify vertebrate species.

flowchart TD
    A[Passive Water Sample] --> B[DNA]
    A --> C[RNA]
    
    %% DNA Branch
    B --> D[DNA Shotgun Metagenomics]
    B --> E[12S rRNA Vertebrate Genomics]
    
    %% DNA Shotgun Pipeline
    D --> D1[Basecalling & Demultiplexing<br/>Dorado v5.0.0]
    D1 --> D2[Adapter Trimming<br/>Porechop v0.2.4]
    D2 --> D3[Quality Filtering<br/>NanoFilt v2.8.0<br/>≥100 bp]
    D3 --> D4{Analysis Strategy}
    
    D4 --> D5[Community Analysis]
    D4 --> D6[Hazard Detection]
    
    D5 --> D5a[Rarefaction<br/>Seqkit v2.3.0<br/>87,000 reads]
    D5a --> D5b[Taxonomic Classification<br/>Kraken2 v2.1.2<br/>NCBI nt_core]
    D5a --> D5c[Pathogen ID - Reads<br/>Minimap2 v2.24 +<br/>MEGAN-CE v6.21.1]
    
    D6 --> D6a[Full Dataset]
    D6a --> D6b[De novo Assembly<br/>nanoMDBG v1.1]
    D6b --> D6c[Pathogen ID - Contigs<br/>Minimap2 v2.24 +<br/>MEGAN-CE v6.21.1]
    D6a --> D6h[Pathogen ID - Reads<br/>Minimap2 v2.24 +<br/>MEGAN-CE v6.21.1]
    D6b --> D6d[AMR Detection<br/>AMRFinderPlus v4.0.23<br/>+ Prodigal v2.6.3]
    D6b --> D6e[Plasmid Detection<br/>PlasmidFinder v2.1.6]
    D6a --> D6f[Virulence Factors<br/>DIAMOND v2.1.13<br/>vs VFDB]
    
    %% 12S rRNA Pipeline
    E --> E1[PCR Amplification<br/>12S rRNAV05 primers<br/>+ human blocking]
    E1 --> E2[Demultiplexing<br/>OBITools4 v1.3.1<br/>9 bp tags]
    E2 --> E3[Primer Trimming<br/>Cutadapt v4.2]
    E3 --> E4[Quality Filtering<br/>VSEARCH v2.21<br/>maxEE 1.0]
    E4 --> E5[Dereplication &<br/>Singleton Removal<br/>VSEARCH]
    E5 --> E6[Chimera Removal<br/>VSEARCH]
    E6 --> E7[OTU Clustering<br/>VSEARCH<br/>97% similarity]
    E7 --> E8[Taxonomic Assignment<br/>VSEARCH<br/>vs MIDORI2]
    
    %% RNA Branch
    C --> F[AIV Analysis]
    C --> G[Viral Metagenomics]
    
    %% AIV Pipeline
    F --> F1[Basecalling & Demultiplexing<br/>Dorado v5.0.0]
    F1 --> F2[Quality Filtering]
    F2 --> F3[Alignment to AIV References<br/>Minimap2 v2.28<br/>-ax map-ont]
    F3 --> F4[Best Reference Selection<br/>SAMtools v1.17]
    F4 --> F5[Consensus Generation<br/>BCFtools v1.17<br/>8 segments]
    
    %% Viromics Pipeline
    G --> G1[DNase Treatment]
    G1 --> G2[cDNA Synthesis<br/>SMART-9N protocol]
    G2 --> G3[Adapter Trimming<br/>Porechop v0.2.4]
    G3 --> G4[Quality Filtering<br/>NanoFilt v2.8.0]
    G4 --> G5[Translated Alignment<br/>DIAMOND BLASTx v2.1.13<br/>vs NCBI NR]
    G5 --> G6[RNA Virome Profile]
    
    %% Styling
    classDef dna fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef rna fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    classDef tool fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    
    class B,D,E,D1,D2,D3,D4,D5,D5a,D5b,D5c,D6,D6a,D6b,D6c,D6d,D6e,D6f,D6g,D6h,E1,E2,E3,E4,E5,E6,E7,E8 dna
    class C,F,G,F1,F2,F3,F4,F5,G1,G2,G3,G4,G5,G6 rna

Tools Used

This project integrates the following key bioinformatics tools:

Basecalling, Trimming and File Transformation:
- Dorado: Basecaller for ONT data (v5.0.0).
- Porechop: Adapter and barcode trimming (v0.2.4).
- NanoFilt: Quality and length filtering for ONT reads (v2.8.0).
- Seqkit: Toolkit for FASTA/Q sequence manipulation (v2.3.0).
Metagenomics & Assembly:
- Kraken2: K-mer based taxonomic classification (v2.1.2).
- metaFlye: Long-read assembler for metagenomes (v2.9.6).
- nanoMDBG: Long-read assembler for metagenomes (v1.1).
- Minimap2: Long-read alignment (v2.28).
- Racon: Consensus correction/polishing for assemblies (v1.5.0).
- Medaka: Consensus correction/polishing for assemblies (v1.7.2).
- Prokka: Prokaryotic genome annotation (v1.14.5).
Pathogen & AMR Analysis:
- AMRFinderPlus: Detection of AMR genes (v4.0.23).
- Prodigal: ORF prediction (v2.6.3).
- DIAMOND: Protein sequence alignment (v2.1.13).
- MEGAN-CE: Interactive metagenomic analysis (v6.21.1).
- PlasmidFinder: Plasmid detection from assemblies (v2.1.6).
Vertebrate & Virome Analysis:
- OBITools4: Barcode demultiplexing for amplicons (v1.3.1).
- Cutadapt: Primer trimming (v4.2).
- VSEARCH: Sequence analysis toolkit (clustering, chimera removal) (v2.21).
AIV & General Utilities:
- SAMtools: Utilities for SAM/BAM alignment files (v1.17).
- BCFtools: Utilities for variant calling and consensus generation (v1.17).

Repository Structure

dna_shotgun_analysis_pipeline.md: Detailed commands for the DNA shotgun metagenomics workflow.
aiv_rna_analysis_pipeline.md: Detailed commands for the AIV (RNA) analysis workflow.
virome_analysis_pipeline.md: Detailed commands for the viral metagenomics workflow.
rrna_vertebrate_analysis.md: Detailed commands for the 12S rRNA vertebrate analysis workflow.
Installation_tutorial.md: Guide for installing all required tools and databases.

Usage Workflow

DNA Shotgun Metagenomics Workflow

Read Processing: Reads were demultiplexed and trimmed with Porechop, then filtered to remove reads < 100 bp with NanoFilt.
Taxonomic Classification: Reads were classified using Kraken2 against the NCBI nt_core database. A dual strategy was used:
- For community analysis (beta-diversity): Samples were rarefied to 87,000 reads using Seqkit.
- For hazard detection (pathogens/AMR): The complete, non-rarefied dataset was used to maximize sensitivity.
Metagenome Assembly & Annotation: De novo assembly was performed with nanoMDBG, chosen over metaFlye because its lack of a minimum read length requirement prevented the >50% data loss that metaFlye would have caused. Assemblies were functionally annotated with Prokka.
Pathogen & AMR Gene Detection:
- Pathogen ID: Reads and contigs were aligned to nt_core with Minimap2 and assigned taxonomy with MEGAN-CE using a conservative Lowest Common Ancestor (LCA) approach.
- Virulence Factors: For specific samples (e.g., high Vibrio cholerae), reads were screened against the Virulence Factor Database (VFDB) using DIAMOND BLASTx.
- AMR Genes: AMRFinderPlus (in --plus mode with Prodigal for ORF prediction) was run on both reads and contigs.
- AMR Mobility: Contigs were screened for plasmids using PlasmidFinder.
For detailed commands, see dna_shotgun_analysis_pipeline.md.

AIV (RNA) Analysis Workflow

Read Processing: After basecalling and demultiplexing with Dorado (which removes primers), reads were quality filtered.
Alignment to Reference Genomes: Filtered reads were aligned to a custom European AIV reference database using Minimap2 (v2.28) with the -ax map-ont setting.
Consensus Sequence Generation: SAMtools was used to process alignments and identify the best-matching reference for each of the eight AIV segments. A consensus sequence for each segment was then generated using BCFtools.

For detailed commands, see aiv_rna_analysis_pipeline.md.

Viral Metagenomics (Viromics) Workflow

Library Prep & Read Processing: RNA was DNase-treated and then converted to cDNA using the Rapid SMART-9N protocol, which employs random priming and template switching. Barcoded amplicons were generated and sequenced. Reads were trimmed (Porechop) and filtered (NanoFilt).
Taxonomic Classification: Reads were taxonomically classified by translated alignment using DIAMOND BLASTx against the NCBI non-redundant (NR) protein database. Viral hits were tallied to profile the RNA virome of each sample.

For detailed commands, see virome_analysis_pipeline.md.

12S rRNA Vertebrate Genomics Workflow

Library Prep: A ~97 bp fragment of the 12S rRNA rRNA gene was amplified using 12S rRNAV05 primers with 9 bp tags. A human-blocking oligonucleotide was included. The final library was prepared with the Ligation Sequencing Kit (SQK-LSK114).
Read Processing: Reads were demultiplexed by their 9 bp tags using OBITools4. Primers were trimmed with Cutadapt.
OTU Clustering & Classification: The VSEARCH pipeline was used to:
- Filter reads by expected error (maxEE 1.0).
- Dereplicate sequences and remove singletons.
- Remove chimeras.
- Cluster sequences into Operational Taxonomic Units (OTUs) at 97% similarity.
Taxonomic Assignment: OTU representative sequences were identified via global alignment against the MIDORI2 reference database.

For detailed commands, see rrna_vertebrate_analysis.md.

Name		Name	Last commit message	Last commit date
Latest commit History 109 Commits
Binning.md		Binning.md
Installation_tutorial.md		Installation_tutorial.md
README.md		README.md
dna_shotgun_analysis_pipeline.md		dna_shotgun_analysis_pipeline.md
rrna_vertebrate_analysis.md		rrna_vertebrate_analysis.md
virus_analysis_pipeline.md		virus_analysis_pipeline.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Real-time genomic pathogen, resistance, and host range characterization from passive water sampling of wetland ecosystems (PRJEB96272)

Project Overview

ENA Files: https://www.ebi.ac.uk/ena/browser/view/PRJEB96272 for Metagenomic, 12S rRNA, and Virome data.

AIV Files: GSAID EPI4628641 (Note: Account required for download!)

Sample Information and Barcoding Scheme

Site Coding System

1. DNA Shotgun Metagenomics Samples

2. 12S rRNA Vertebrate Metabarcoding Samples

2. 12S rRNA Vertebrate Metabarcoding Samples

3. AIV (RNA) Analysis Samples

4. RNA Viromics Samples

Site Descriptions

Notes:

Analysis Pipeline Overview

Tools Used

Repository Structure

Usage Workflow

DNA Shotgun Metagenomics Workflow

AIV (RNA) Analysis Workflow

Viral Metagenomics (Viromics) Workflow

12S rRNA Vertebrate Genomics Workflow

About

Uh oh!

Releases

Packages

ttmgr/Wetland_Health

Folders and files

Latest commit

History

Repository files navigation

Real-time genomic pathogen, resistance, and host range characterization from passive water sampling of wetland ecosystems (PRJEB96272)

Project Overview

ENA Files: https://www.ebi.ac.uk/ena/browser/view/PRJEB96272 for Metagenomic, 12S rRNA, and Virome data.

AIV Files: GSAID EPI4628641 (Note: Account required for download!)

Sample Information and Barcoding Scheme

Site Coding System

1. DNA Shotgun Metagenomics Samples

2. 12S rRNA Vertebrate Metabarcoding Samples

2. 12S rRNA Vertebrate Metabarcoding Samples

3. AIV (RNA) Analysis Samples

4. RNA Viromics Samples

Site Descriptions

Notes:

Analysis Pipeline Overview

Tools Used

Repository Structure

Usage Workflow

DNA Shotgun Metagenomics Workflow

AIV (RNA) Analysis Workflow

Viral Metagenomics (Viromics) Workflow

12S rRNA Vertebrate Genomics Workflow

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages