WP9 case study on the Parc Naturel Marin d'Iroise
This pipeline computes diversity metrics for plankton communities and explores their relationships with environmental variables from CMEMS. The analysis progresses from basic diversity index calculations through correlation analysis to advanced machine learning models (XGBoost and Boosted Regression Trees) for predicting diversity patterns based on oceanographic conditions.
PNMI Plankton Data
Long-term monitoring dataset (2010–2023) from the Parc Naturel Marin d'Iroise, including:
• Zooplankton abundance and morphological characteristics (UVP5 imaging data, ~650 samples with >655,000 individual organism images)
• Phytoplankton abundance at species and genus levels (~785 samples)
• Environmental variables: temperature, salinity, and other water column properties
The data can be found here; after downloading, rename each file to match its name on the website. The dataset was presented in this data paper.
CMEMS Data: Copernicus Marine Environment Monitoring Service oceanographic variables (reanalysis/analysis products):
• Physical: surface temperature, surface salinity, mixed layer thickness
• Chemical: nutrients (PO₄, Si, NO₃, NH₄), dissolved oxygen (O₂), dissolved inorganic carbon (DIC), pH
• Biological: chlorophyll, phytoplankton biomass, zooplankton biomass, net primary productivity
Regression Metrics (for diversity/biomass prediction):
• Mean Absolute Error (MAE)
• Root Mean Square Error (RMSE)
• Coefficient of determination (R²)
Validation:
• Cross-validation (5-fold CV)
Classification Metrics:
• Precision, recall, F1-score
• Area Under the Receiver Operating Characteristic Curve (AUC-ROC)
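As a rough illustration of how the regression metrics and the 5-fold cross-validation listed above fit together, the base-R sketch below scores a model on held-out folds. The simulated data, the column names `sst` and `shannon`, the helper functions `mae`, `rmse` and `r2`, and the linear model standing in for the boosted models are all assumptions made for this example; none of them are taken from the pipeline scripts.

```r
# Hypothetical data: a diversity index driven by one environmental variable.
set.seed(42)
n <- 100
dat <- data.frame(sst = runif(n, 10, 18))
dat$shannon <- 0.5 + 0.1 * dat$sst + rnorm(n, sd = 0.2)

# Regression metrics computed from observed and predicted values.
mae  <- function(obs, pred) mean(abs(obs - pred))
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
r2   <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

# Assign each row to one of k folds at random.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Fit on k - 1 folds and evaluate on the held-out fold.
# lm() is only a stand-in here; the pipeline itself uses boosted models.
cv_scores <- do.call(rbind, lapply(seq_len(k), function(i) {
  train <- dat[fold != i, ]
  test  <- dat[fold == i, ]
  fit   <- lm(shannon ~ sst, data = train)
  pred  <- predict(fit, newdata = test)
  data.frame(fold = i,
             MAE  = mae(test$shannon, pred),
             RMSE = rmse(test$shannon, pred),
             R2   = r2(test$shannon, pred))
}))

# Per-fold scores, then their average across folds.
print(cv_scores)
print(colMeans(cv_scores[, c("MAE", "RMSE", "R2")]))
```

Averaging the per-fold scores gives an estimate of out-of-sample error that depends less on any single train/test split.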
tidyverse (2.0.0) # ggplot2, dplyr, tidyr
lubridate (1.9.4) # Date handling
reshape2 (1.4.4) # Data reshaping
patchwork (1.2.0) # Combining plots
ggpubr (0.6.0) # Publication-ready plots
RColorBrewer (1.1.3) # Color palettes
data.table (1.17.0) # Fast file reading
vegan (2.6.6.1) # Diversity indices & multivariate statistics
caret (7.0.1) # Machine learning framework
MLmetrics (1.1.3) # Model performance metrics
corrplot (0.92) # Correlation matrix visualization
randomForest (4.7.1.2) # Random forest models
xgboost (1.7.9.1) # Gradient boosting
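One way to check whether the packages listed above are available before running the scripts is sketched below. It is not taken from `0_setup.R`; the variable names `pkgs`, `installed`, `versions` and `missing_pkgs` are hypothetical, and the sketch only reports versions rather than enforcing the ones given above.

```r
# Package names taken from the list above.
pkgs <- c("tidyverse", "lubridate", "reshape2", "patchwork", "ggpubr",
          "RColorBrewer", "data.table", "vegan", "caret", "MLmetrics",
          "corrplot", "randomForest", "xgboost")

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Installed version of each package (NA when the package is missing).
versions <- vapply(pkgs, function(p) {
  if (installed[[p]]) as.character(utils::packageVersion(p)) else NA_character_
}, character(1))

print(data.frame(package = pkgs, installed = installed, version = versions,
                 row.names = NULL))

# List anything that still needs to be installed.
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```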
• 0_setup.R
Common settings, packages, and utility functions used across all scripts.
• 8_Diversity_indices-1_1_Computation_zooplankton.R
Compute diversity indices for zooplankton communities and analyze temporal patterns.
• 8_Diversity_indices-1_2_Computation_phytoplankton.R
Compute diversity indices for phytoplankton at multiple taxonomic levels and depths.
• 8_Diversity_indices-2_1_Correlation_Plankton_PNMI_VS_CMEMS.R
Explore correlations between plankton diversity and CMEMS oceanographic variables.
• 8_Diversity_indices-2_2_BRT_Plankton_PNMI_VS_CMEMS.R
Predictive modeling of plankton diversity using machine learning (XGBoost and Boosted Regression Trees).
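To give a sense of what the computation and correlation steps involve, here is a minimal base-R sketch. The scripts above do not state which indices or which correlation method they use, so the choice of richness, Shannon, Gini-Simpson and Pielou's evenness, the Spearman correlation, the toy community matrix `comm`, and the environmental vector `sst` are all assumptions made for illustration.

```r
# Hypothetical community matrix: rows are samples, columns are taxa (counts).
comm <- matrix(
  c(10, 10, 10, 10,
    37,  1,  1,  1,
    20, 20,  0,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("sample", 1:3), paste0("taxon", 1:4))
)

# Relative abundance of each taxon within each sample.
p <- comm / rowSums(comm)

# Richness: number of taxa present in each sample.
richness <- rowSums(comm > 0)

# Shannon index H' = -sum(p * log(p)); absent taxa contribute 0.
shannon <- apply(p, 1, function(x) -sum(x[x > 0] * log(x[x > 0])))

# Gini-Simpson index: 1 - sum(p^2).
simpson <- 1 - rowSums(p^2)

# Pielou's evenness: H' / log(richness).
evenness <- shannon / log(richness)

# Hypothetical environmental value per sample (e.g. surface temperature).
sst <- c(12.1, 16.4, 14.0)

# Rank correlation between a diversity index and the environmental variable.
rho <- cor(shannon, sst, method = "spearman")

print(data.frame(richness, shannon, simpson, evenness))
print(rho)
```

With the vegan package loaded, `diversity(comm, index = "shannon")`, `diversity(comm, index = "simpson")` and `specnumber(comm)` return the same Shannon, Gini-Simpson and richness values as the hand-written versions above.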