PNMI_ML, Plankton Diversity Analysis Pipeline

WP9 case study on the Parc Naturel Marin d'Iroise

This pipeline computes diversity metrics for plankton communities and explores their relationships with environmental variables from CMEMS. The analysis progresses from basic diversity index calculations through correlation analysis to advanced machine learning models (XGBoost and Boosted Regression Trees) for predicting diversity patterns based on oceanographic conditions.
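
The indices used by the scripts are not listed here. The sketch below assumes the common abundance-based ones that vegan's diversity() provides (index = "shannon", "simpson", "invsimpson"), plus Pielou's evenness. It is a minimal pure-Python illustration of the formulas, not the pipeline's R code. As in vegan, Shannon uses the natural logarithm and "simpson" means the Gini-Simpson form 1 − Σp².

```python
import math
from collections.abc import Sequence


def proportions(abundances: Sequence[float]) -> list[float]:
    """Relative proportions p_i of the taxa present (zero counts are dropped)."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("a sample needs a positive total abundance")
    return [a / total for a in abundances if a > 0]


def shannon(abundances: Sequence[float]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i), using the natural logarithm."""
    return -sum(p * math.log(p) for p in proportions(abundances))


def simpson(abundances: Sequence[float]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2): probability two draws differ in taxon."""
    return 1.0 - sum(p * p for p in proportions(abundances))


def inverse_simpson(abundances: Sequence[float]) -> float:
    """Inverse Simpson 1 / sum(p_i^2): an effective number of equally common taxa."""
    return 1.0 / sum(p * p for p in proportions(abundances))


def pielou_evenness(abundances: Sequence[float]) -> float:
    """Pielou's J = H' / ln(S), where S is the number of taxa present.

    Undefined (returned as NaN) when fewer than two taxa are present.
    """
    richness = sum(1 for a in abundances if a > 0)
    if richness < 2:
        return float("nan")
    return shannon(abundances) / math.log(richness)


if __name__ == "__main__":
    # Hypothetical counts of four taxa in one sample (not PNMI data).
    sample = [10, 10, 10, 10]
    print(f"H'={shannon(sample):.4f}  Simpson={simpson(sample):.2f}  "
          f"invSimpson={inverse_simpson(sample):.1f}  J={pielou_evenness(sample):.2f}")
```

A perfectly even sample of four taxa gives H' = ln 4, Simpson = 0.75, inverse Simpson = 4 and J = 1. Any unevenness lowers all four values.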

Data source

PNMI Plankton Data
Long-term monitoring dataset (2010–2023) from the Parc Naturel Marin d'Iroise, including:
• Zooplankton abundance and morphological characteristics (UVP5 imaging data, ~650 samples with >655,000 individual organism images)
• Phytoplankton abundance at species and genus levels (~785 samples)
• Environmental variables: temperature, salinity, and other water column properties
The data are available here; after downloading, rename each file to match its name on the website. The dataset is described in this data paper.

CMEMS Data
Copernicus Marine Environment Monitoring Service (CMEMS) oceanographic variables from reanalysis/analysis products:
• Physical: surface temperature, surface salinity, mixed layer thickness
• Chemical: nutrients (PO₄, Si, NO₃, NH₄), dissolved oxygen (O₂), dissolved inorganic carbon (DIC), pH
• Biological: chlorophyll, phytoplankton biomass, zooplankton biomass, net primary productivity
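
The correlation script relates diversity indices to CMEMS variables like these. The coefficient it uses is not stated here. The sketch below assumes Spearman's rank correlation, which in R is cor(x, y, method = "spearman"). It also assumes each plankton sample has already been matched to a CMEMS value at the same date and location. Spearman's rho is the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
import math
from collections.abc import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical paired values: Shannon H' per sample vs. matched surface temperature.
    shannon_h = [1.2, 1.5, 1.9, 2.1, 1.7]
    surface_temp = [11.0, 12.5, 14.0, 15.5, 13.1]
    print(f"Spearman rho = {spearman(shannon_h, surface_temp):.2f}")
```

Rank correlation captures any monotonic relationship, such as diversity rising non-linearly with temperature, and is less sensitive to outliers than Pearson's r. A matrix of such coefficients is what corrplot is typically used to display.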

Validation metrics

Machine Learning Model Validation:

Regression Metrics (for diversity/biomass prediction):
• Mean Absolute Error (MAE)
• Root Mean Square Error (RMSE)
• R² coefficient of determination
• Cross-validation (5-fold CV)
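
The regression metrics and the 5-fold split can be sketched as follows. This is a minimal pure-Python illustration, not the pipeline's caret/MLmetrics code. Here R² is computed as 1 − SS_res/SS_tot, the definition used by MLmetrics::R2_Score. caret's postResample instead reports the squared correlation between observed and predicted values, which can differ, so it is worth checking which one a reported R² refers to. The mean-predictor baseline is a hypothetical reference model for calibrating the CV scores. In R, the split corresponds to caret's trainControl(method = "cv", number = 5).

```python
import math
import random
from collections.abc import Iterator, Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of the target (e.g. Shannon H')."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error; penalizes large errors more than MAE does."""
    n = _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination 1 - SS_res / SS_tot (negative if worse than the mean)."""
    n = _check(y_true, y_pred)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n: int, k: int = 5, seed: int = 42) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists; every sample lands in exactly one test fold."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, sorted(folds[i])


def cv_rmse_mean_baseline(y: Sequence[float], k: int = 5, seed: int = 42) -> float:
    """k-fold CV RMSE of a baseline that always predicts the training-fold mean."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        train_mean = sum(y[i] for i in train) / len(train)
        scores.append(rmse([y[i] for i in test], [train_mean] * len(test)))
    return sum(scores) / len(scores)
```

A model's CV RMSE is only informative relative to a baseline like this one. If a diversity model does not beat the fold-mean prediction, the environmental predictors add little.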

Classification Metrics (if applicable):

• Precision, recall, F1-score
• Area Under the Receiver Operating Characteristic Curve (AUC-ROC)
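
For a binary task the classification metrics reduce to the few lines below. An example of such a task would be, hypothetically, flagging samples as "low" or "high" diversity from a threshold; the README does not define one. This is a pure-Python sketch, and MLmetrics offers R functions for the same quantities. AUC-ROC uses its rank (Mann-Whitney) formulation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counting one half.

```python
from collections.abc import Sequence


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> tuple[float, float, float]:
    """Precision, recall and F1 for one positive class (0.0 when a ratio is undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc_roc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2.

    Compares every positive/negative pair: O(n_pos * n_neg), fine for small datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))
```

Precision, recall and F1 depend on the chosen decision threshold. AUC is computed from the raw scores and summarizes ranking quality over all thresholds, so the two are complementary.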

Dependencies

Core data manipulation & visualization

tidyverse (2.0.0) # ggplot2, dplyr, tidyr
lubridate (1.9.4) # Date handling
reshape2 (1.4.4) # Data reshaping
patchwork (1.2.0) # Combining plots
ggpubr (0.6.0) # Publication-ready plots
RColorBrewer (1.1.3) # Color palettes
data.table (1.17.0) # Fast file reading

Statistical & diversity analysis

vegan (2.6.6.1) # Diversity indices & multivariate statistics
caret (7.0.1) # Machine learning framework
MLmetrics (1.1.3) # Model performance metrics

Visualization

corrplot (0.92) # Correlation matrix visualization

Machine learning

randomForest (4.7.1.2) # Random forest models
xgboost (1.7.9.1) # Gradient boosting

File organization

• 0_setup.R
Common settings, packages, and utility functions used across all scripts.

• 8_Diversity_indices-1_1_Computation_zooplankton.R
Compute diversity indices for zooplankton communities and analyze temporal patterns.

• 8_Diversity_indices-1_2_Computation_phytoplankton.R
Compute diversity indices for phytoplankton at multiple taxonomic levels and depths.

• 8_Diversity_indices-2_1_Correlation_Plankton_PNMI_VS_CMEMS.R
Explore correlations between plankton diversity and CMEMS oceanographic variables.

• 8_Diversity_indices-2_2_BRT_Plankton_PNMI_VS_CMEMS.R
Predictive modeling of plankton diversity using machine learning (XGBoost and Boosted Regression Trees).
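
XGBoost and Boosted Regression Trees share the same core idea. They build an additive ensemble of small trees, where each tree is fitted to the residuals of the current ensemble (for squared-error loss, the negative gradient) and added with a shrinkage factor. The sketch below shows that loop in pure Python with depth-1 trees (stumps) and synthetic data. It is an illustration of the technique, not the xgboost library and not the script's configuration: it has no regularization, second-order gradients, subsampling or deeper trees. Its n_rounds and learning_rate play roughly the role of xgboost's nrounds and eta.

```python
from collections.abc import Sequence

Stump = tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def fit_stump(x: Sequence[Sequence[float]], residuals: Sequence[float]) -> Stump:
    """Depth-1 regression tree: the single split that minimizes squared error."""
    best = None  # (sse, stump) of the best split found so far
    for f in range(len(x[0])):
        values = sorted({row[f] for row in x})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(x, residuals) if row[f] <= thr]
            right = [r for row, r in zip(x, residuals) if row[f] > thr]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, (f, thr, lv, rv))
    if best is None:  # every feature is constant: fall back to the mean residual
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1]


def predict_stump(stump: Stump, row: Sequence[float]) -> float:
    f, thr, lv, rv = stump
    return lv if row[f] <= thr else rv


def fit_boosting(x, y, n_rounds: int = 100, learning_rate: float = 0.1):
    """Squared-error gradient boosting: fit each stump to the current residuals."""
    base = sum(y) / len(y)  # start from the mean of the target
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrinkage: only a fraction of each stump's correction is applied.
        pred = [p + learning_rate * predict_stump(stump, row) for p, row in zip(pred, x)]
    return base, learning_rate, stumps


def predict_boosting(model, row: Sequence[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


if __name__ == "__main__":
    # Synthetic example: a diversity-like target that steps up above a temperature of 4.5.
    x = [[float(t)] for t in range(10)]
    y = [1.0 if t < 5 else 3.0 for t in range(10)]
    model = fit_boosting(x, y)
    print(round(predict_boosting(model, [2.0]), 3), round(predict_boosting(model, [8.0]), 3))
```

A smaller learning rate needs more rounds but usually generalizes better. In practice, both are tuned with the cross-validation described under Validation metrics.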

Repository layout

• data/
• figures/
• scripts/
• .gitignore, CODEOWNERS, LICENSE, NECCTON_PNMI.Rproj, README.md
