neccton-algo/PNMI_ML


PNMI_ML, Plankton Diversity Analysis Pipeline

WP9 study case on the Parc Naturel Marin d'Iroise

This pipeline computes diversity metrics for plankton communities and explores their relationships with environmental variables from CMEMS. The analysis progresses from basic diversity index calculations through correlation analysis to advanced machine learning models (XGBoost and Boosted Regression Trees) for predicting diversity patterns based on oceanographic conditions.
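To illustrate the kind of diversity metrics mentioned above, here is a minimal base-R sketch. The README does not state which indices the scripts compute, so the choice of richness, Shannon, Simpson, and Pielou evenness is an assumption, and `diversity_indices` is a hypothetical helper, not a function from this repository. With the `vegan` package, `vegan::diversity(x, index = "shannon")` and `vegan::specnumber(x)` provide the corresponding calculations.

```r
# Hypothetical helper (not from the repository): common diversity indices
# for one sample, given a vector of taxon abundances.
diversity_indices <- function(abund) {
  abund <- abund[abund > 0]          # taxa absent from the sample do not count
  p <- abund / sum(abund)            # relative abundances
  shannon <- -sum(p * log(p))        # Shannon index, natural logarithm
  c(
    richness = length(abund),              # number of taxa present
    shannon  = shannon,
    simpson  = 1 - sum(p^2),               # Gini-Simpson form
    pielou   = shannon / log(length(abund)) # evenness; NaN when only one taxon
  )
}

# Made-up abundances for a single sample
sample_counts <- c(copepods = 120, chaetognaths = 15, appendicularians = 40, absent_taxon = 0)
diversity_indices(sample_counts)
```

Applying the helper row by row to a samples-by-taxa table (for example with `apply(tab, 1, diversity_indices)`) would give one set of indices per sample.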

Data source

PNMI Plankton Data
Long-term monitoring dataset (2010–2023) from the Parc Naturel Marin d'Iroise, including:
• Zooplankton abundance and morphological characteristics (UVP5 imaging data, ~650 samples with >655,000 individual organism images)
• Phytoplankton abundance at species and genus levels (~785 samples)
• Environmental variables: temperature, salinity, and other water column properties
The data can be found here. After downloading, rename each file to match its name on the website. The dataset is described in this data paper.

CMEMS Data
Copernicus Marine Environment Monitoring Service oceanographic variables (reanalysis/analysis products):
• Physical: surface temperature, surface salinity, mixed layer thickness
• Chemical: nutrients (PO₄, Si, NO₃, NH₄), dissolved oxygen (O₂), dissolved inorganic carbon (DIC), pH
• Biological: chlorophyll, phytoplankton biomass, zooplankton biomass, net primary productivity
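As a rough illustration of how a diversity index could be related to these CMEMS variables, the sketch below computes one correlation coefficient per environmental column. The README does not say which coefficient the correlation script uses, so Spearman is an assumption here. The helper `correlate_with_env` and the column names `sst` and `chl` are hypothetical.

```r
# Hypothetical helper: correlate one diversity index with each environmental
# variable (one column per variable, one row per sample).
correlate_with_env <- function(diversity, env, method = "spearman") {
  vapply(
    env,
    function(x) cor(diversity, x, method = method, use = "complete.obs"),
    numeric(1)
  )
}

# Made-up values, one row per sample matched to a CMEMS grid cell and date
shannon <- c(1.8, 2.1, 2.4, 2.0, 2.6, 2.3)
env <- data.frame(
  sst = c(11.2, 12.5, 14.8, 12.0, 16.1, 15.0),  # surface temperature (assumed name)
  chl = c(1.9, 1.4, 0.8, 1.6, 0.5, 0.7)         # chlorophyll (assumed name)
)
correlate_with_env(shannon, env)
```

A rank-based coefficient is a cautious default for ecological data because it does not assume a linear relationship. Passing `method = "pearson"` switches to the linear coefficient.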

Validation metrics

Machine Learning Model Validation:

Regression Metrics (for diversity/biomass prediction):
• Mean Absolute Error (MAE)
• Root Mean Square Error (RMSE)
• R² coefficient of determination
• Cross-validation (5-fold CV)
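The regression metrics listed above can be sketched in base R as follows. `regression_metrics` is a hypothetical helper, and R² is computed here as 1 − SSE/SST, which is one common definition. Some packages report the squared correlation between observed and predicted values instead, so the scripts may use a different convention.

```r
# Hypothetical helper: MAE, RMSE and R^2 for observed vs. predicted values.
regression_metrics <- function(obs, pred) {
  stopifnot(length(obs) == length(pred))
  err <- obs - pred
  c(
    MAE  = mean(abs(err)),                            # mean absolute error
    RMSE = sqrt(mean(err^2)),                         # root mean square error
    R2   = 1 - sum(err^2) / sum((obs - mean(obs))^2)  # 1 - SSE/SST
  )
}

# Made-up observed and predicted diversity values
obs  <- c(2.1, 2.4, 1.9, 2.8, 2.5)
pred <- c(2.0, 2.5, 2.1, 2.6, 2.5)
regression_metrics(obs, pred)
```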

Classification Metrics (if applicable):

• Precision, recall, F1-score
• Area Under the Receiver Operating Characteristic Curve (AUC-ROC)
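For the classification case, a minimal base-R sketch of these metrics is given below. Both helpers are hypothetical and assume a binary problem with labels coded 0/1. The AUC is computed with the rank-based (Mann-Whitney) formula, which is equivalent to the area under the ROC curve.

```r
# Hypothetical helper: precision, recall and F1 for binary labels.
classification_metrics <- function(truth, pred, positive = 1) {
  tp <- sum(pred == positive & truth == positive)  # true positives
  fp <- sum(pred == positive & truth != positive)  # false positives
  fn <- sum(pred != positive & truth == positive)  # false negatives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  c(
    precision = precision,
    recall    = recall,
    F1        = 2 * precision * recall / (precision + recall)
  )
}

# Hypothetical helper: AUC-ROC from predicted scores via the rank-sum formula.
auc_roc <- function(truth, score, positive = 1) {
  is_pos <- truth == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(rank(score)[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up labels, hard predictions and scores
truth <- c(1, 1, 1, 0, 0)
pred  <- c(1, 1, 0, 1, 0)
score <- c(0.9, 0.8, 0.3, 0.4, 0.1)
classification_metrics(truth, pred)
auc_roc(truth, score)
```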

Dependencies

Core data manipulation & visualization

tidyverse (2.0.0) # ggplot2, dplyr, tidyr
lubridate (1.9.4) # Date handling
reshape2 (1.4.4) # Data reshaping
patchwork (1.2.0) # Combining plots
ggpubr (0.6.0) # Publication-ready plots
RColorBrewer (1.1.3) # Color palettes
data.table (1.17.0) # Fast file reading

Statistical & diversity analysis

vegan (2.6.6.1) # Diversity indices & multivariate statistics
caret (7.0.1) # Machine learning framework
MLmetrics (1.1.3) # Model performance metrics

Visualization

corrplot (0.92) # Correlation matrix visualization

Machine learning

randomForest (4.7.1.2) # Random forest models
xgboost (1.7.9.1) # Gradient boosting

File organization

• 0_setup.R
Common settings, packages, and utility functions used across all scripts.

• 8_Diversity_indices-1_1_Computation_zooplankton.R
Compute diversity indices for zooplankton communities and analyze temporal patterns.

• 8_Diversity_indices-1_2_Computation_phytoplankton.R
Compute diversity indices for phytoplankton at multiple taxonomic levels and depths.

• 8_Diversity_indices-2_1_Correlation_Plankton_PNMI_VS_CMEMS.R
Explore correlations between plankton diversity and CMEMS oceanographic variables.

• 8_Diversity_indices-2_2_BRT_Plankton_PNMI_VS_CMEMS.R
Predictive modeling of plankton diversity using machine learning (XGBoost and Boosted Regression Trees).
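
The 5-fold cross-validation listed under the validation metrics can be sketched as follows. To keep the example dependency-free, a plain linear model (`lm`) stands in for XGBoost and Boosted Regression Trees, so this is not the modeling code from the script above. `kfold_cv_rmse` is a hypothetical helper, and the column names `shannon` and `sst` are assumptions.

```r
# Hypothetical helper: k-fold cross-validated RMSE.
# lm() is a stand-in for the boosted models named in this README.
kfold_cv_rmse <- function(data, formula, k = 5, seed = 42) {
  set.seed(seed)
  # Randomly assign each row to one of k folds of near-equal size
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]  # name of the left-hand-side variable
  vapply(seq_len(k), function(i) {
    train <- data[folds != i, , drop = FALSE]
    test  <- data[folds == i, , drop = FALSE]
    fit   <- lm(formula, data = train)
    pred  <- predict(fit, newdata = test)
    sqrt(mean((test[[response]] - pred)^2))  # RMSE on the held-out fold
  }, numeric(1))
}

# Made-up data: diversity as a noisy linear function of temperature
sst <- seq(10, 19.5, by = 0.5)
toy <- data.frame(
  sst     = sst,
  shannon = 1 + 0.1 * sst + 0.05 * sin(seq_along(sst))
)
fold_rmse <- kfold_cv_rmse(toy, shannon ~ sst)
mean(fold_rmse)
```

Swapping in a boosted model would mean replacing the `lm` and `predict` calls inside the loop while keeping the fold assignment and scoring unchanged.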
