This repository contains the official implementation of the paper "Towards a Unified Copernicus Foundation Model for Earth Vision" (ICCV 2025 oral).
- 🌍 Copernicus-Pretrain: A massive-scale pretraining dataset with 18.7M aligned images from all major Copernicus Sentinel missions, spanning from the Earth's surface to its atmosphere.
- 🤖 Copernicus-FM: A unified foundation model capable of processing any spectral or non-spectral sensor modality using extended dynamic hypernetworks and flexible metadata encoding.
- 📊 Copernicus-Bench: A systematic evaluation benchmark with 15 hierarchical downstream tasks ranging from preprocessing to specialized applications for each Sentinel mission.
- 🌐 Copernicus-Embed-025deg: An embedding dataset that provides a global embedding map (721x1440x768) at 0.25°, integrating various sources of satellite observations at an extremely high compression ratio.
Copernicus-Pretrain is an extension of the SSL4EO-S12 dataset to all major Sentinel missions (S1-S5P). The images are organized into ~310K regional grids (0.25°x0.25°, consistent with ERA5), densely covering the whole land surface and near-land ocean with time series from eight distinct Sentinel modalities.
🔽 Dataset access:
- Raw format (GeoTIFF): This version is available on HuggingFace.
- Streaming format (WebDataset): This version is available on HuggingFace.
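Under the WebDataset convention, samples are stored in tar shards. Files that appear consecutively and share a key (the path up to the first dot of the file's basename) form one sample. The sketch below reads such a shard using only the Python standard library, to show how that grouping works. The member names (for example `grid_000001.s1_grd.tif`) and the helpers `split_key`, `iter_samples` and `build_shard` are hypothetical and do not describe the actual layout of the Copernicus-Pretrain shards. In practice the `webdataset` Python package does this grouping for you.

```python
import io
import tarfile
from itertools import groupby


def split_key(name):
    """Split a tar member name into (key, extension), WebDataset-style:
    the key is everything up to the first dot of the basename."""
    dirname, _, basename = name.rpartition("/")
    stem, _, ext = basename.partition(".")
    key = f"{dirname}/{stem}" if dirname else stem
    return key, ext


def iter_samples(tar_bytes):
    """Yield one dict per sample from an in-memory shard."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        members = [m for m in tar.getmembers() if m.isfile()]
        # A sample is a run of consecutive members sharing the same key.
        for key, group in groupby(members, key=lambda m: split_key(m.name)[0]):
            sample = {"__key__": key}
            for member in group:
                sample[split_key(member.name)[1]] = tar.extractfile(member).read()
            yield sample


def build_shard(files):
    """Write (name, bytes) pairs, in order, into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Hypothetical shard with two grid cells (placeholder bytes, not real GeoTIFFs).
shard = build_shard([
    ("grid_000001.s1_grd.tif", b"a"),
    ("grid_000001.s2_toa.tif", b"bb"),
    ("grid_000002.s1_grd.tif", b"c"),
])
for sample in iter_samples(shard):
    print(sample["__key__"], sorted(k for k in sample if k != "__key__"))
```

Streaming consecutive members like this means a reader never has to seek inside the archive. That is why the format is suited to large-scale pretraining from remote storage.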
📂 Further details: Copernicus-Pretrain/
Copernicus-FM is an extension of the DOFA foundation model that can process any spectral or non-spectral sensor modality using extended dynamic hypernetworks and flexible metadata encoding. The model is pretrained on the Copernicus-Pretrain dataset with masked image modeling and continual distillation.
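A dynamic hypernetwork lets a single patch-embedding layer accept any number of spectral bands. Instead of storing one fixed kernel per band, a small network generates each band's kernel from a description of that band. The NumPy sketch below illustrates the general idea under simplifying assumptions. Each band is described only by its central wavelength in micrometres, which is sinusoidally encoded and passed through a two-layer MLP. The MLP outputs a per-band projection, and projections are summed over bands, so the token count and width do not depend on how many bands the input has. The class `DynamicPatchEmbed`, its sizes and its random initialisation are hypothetical. This is not the Copernicus-FM implementation, and it does not cover the non-spectral variables that the model also supports.

```python
import numpy as np


def sincos_encoding(values, dim):
    """Sinusoidal encoding of scalars (e.g. wavelengths) -> array of shape (C, dim)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = values[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


class DynamicPatchEmbed:
    """Hypothetical wavelength-conditioned patch embedding (illustration only)."""

    def __init__(self, patch=16, embed_dim=64, enc_dim=32, hidden=128, seed=0):
        rng = np.random.default_rng(seed)
        self.patch, self.embed_dim, self.enc_dim = patch, embed_dim, enc_dim
        # Hypernetwork weights (random here; learned in a real model).
        self.w1 = rng.normal(0.0, 0.02, (enc_dim, hidden))
        self.w2 = rng.normal(0.0, 0.02, (hidden, embed_dim * patch * patch))

    def weights_for(self, wavelengths):
        """Generate one kernel per band: shape (C, embed_dim, patch*patch)."""
        enc = sincos_encoding(np.asarray(wavelengths, dtype=float), self.enc_dim)
        h = np.maximum(enc @ self.w1, 0.0)  # ReLU
        w = h @ self.w2
        return w.reshape(len(wavelengths), self.embed_dim, self.patch * self.patch)

    def __call__(self, image, wavelengths):
        """image: (C, H, W) with H and W divisible by the patch size."""
        c, h, w = image.shape
        p = self.patch
        assert h % p == 0 and w % p == 0 and c == len(wavelengths)
        # Cut into non-overlapping patches: (num_patches, C, p*p).
        patches = image.reshape(c, h // p, p, w // p, p).transpose(1, 3, 0, 2, 4)
        patches = patches.reshape((h // p) * (w // p), c, p * p)
        kernels = self.weights_for(wavelengths)
        # Project every band with its generated kernel and sum over bands.
        return np.einsum("ncq,cdq->nd", patches, kernels)


embed = DynamicPatchEmbed(patch=16, embed_dim=64)
rgb = np.random.default_rng(1).random((3, 32, 32))
tokens = embed(rgb, [0.665, 0.560, 0.490])  # approx. Sentinel-2 B4, B3, B2
print(tokens.shape)
```

Because the kernels come from the band descriptions, a 13-band input and a 3-band input of the same spatial size produce token arrays of the same shape.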
🔽 Weights access: The model weights are available on HuggingFace.
📂 Further details: Copernicus-FM/
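"Flexible metadata encoding" means the model can use whatever metadata is available and still work when some of it is missing. One common way to do this is to Fourier-encode each scalar metadata field and substitute a learned placeholder vector when a field is absent. The sketch below follows that approach under our own assumptions. The fields (`lon`, `lat`, `time` in days, `area` in km²), their period ranges, and summing the field encodings into a single vector are hypothetical choices, and the placeholders are random here rather than learned. For the model's actual metadata handling, see Copernicus-FM/.

```python
import math

import numpy as np


def fourier_encode(value, dim, min_period, max_period):
    """Encode a scalar with sin/cos features at geometrically spaced periods."""
    periods = np.geomspace(min_period, max_period, dim // 2)
    angles = 2 * math.pi * value / periods
    return np.concatenate([np.sin(angles), np.cos(angles)])


class MetadataEncoder:
    """Hypothetical metadata encoder with per-field 'missing' placeholders."""

    def __init__(self, dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # Hypothetical (min_period, max_period) per field.
        self.ranges = {
            "lon": (1.0, 360.0),           # degrees
            "lat": (1.0, 180.0),           # degrees
            "time": (1.0, 365.25 * 10),    # days
            "area": (1.0, 1e4),            # km^2
        }
        # Stand-ins for learnable tokens used when a field is not provided.
        self.missing = {k: rng.normal(0.0, 0.02, dim) for k in self.ranges}

    def __call__(self, **meta):
        parts = []
        for field, (lo, hi) in self.ranges.items():
            value = meta.get(field)
            if value is None:
                parts.append(self.missing[field])
            else:
                parts.append(fourier_encode(value, self.dim, lo, hi))
        return np.sum(parts, axis=0)  # one (dim,) vector


enc = MetadataEncoder(dim=32)
full = enc(lon=11.57, lat=48.14, time=9000.0, area=100.0)
partial = enc(lon=11.57, lat=48.14)  # time and area unknown
print(full.shape, partial.shape)
```

Using a placeholder rather than a zero vector lets a model distinguish "value unknown" from a genuine value that happens to encode near zero.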
Copernicus-Bench is a systematic evaluation benchmark with 15 hierarchical downstream datasets, organized into three levels of applications and covering all major Sentinel missions (S1, S2, S3, S5P). Of these, 9 are derived from existing datasets and 6 are newly curated.
| Level | Name | Modality | Task | Source |
|---|---|---|---|---|
| L1 | Cloud-S2 | S2 TOA | segmentation (cloud) | CloudSEN12 |
| L1 | Cloud-S3 | S3 OLCI | segmentation (cloud) | new |
| L2 | EuroSAT-S1 | S1 GRD | classification (LULC) | EuroSAT-SAR |
| L2 | EuroSAT-S2 | S2 TOA | classification (LULC) | EuroSAT |
| L2 | BigEarthNet-S1 | S1 GRD | classification (LULC) | BigEarthNet v2.0 |
| L2 | BigEarthNet-S2 | S2 SR | classification (LULC) | BigEarthNet v2.0 |
| L2 | LC100Cls-S3 | S3 OLCI | classification (LULC) | new |
| L2 | DFC2020-S1 | S1 GRD | segmentation (LULC) | DFC2020 |
| L2 | DFC2020-S2 | S2 TOA | segmentation (LULC) | DFC2020 |
| L2 | LC100Seg-S3 | S3 OLCI | segmentation (LULC) | new |
| L3 | Flood-S1 | S1 GRD | change detection (flood) | Kuro Siwo |
| L3 | LCZ-S2 | S2 TOA | classification (local climate zone) | So2Sat LCZ42 |
| L3 | Biomass-S3 | S3 OLCI | regression (biomass) | new |
| L3 | AQ-NO2-S5P | S5P NO2 | regression (air quality) | new |
| L3 | AQ-O3-S5P | S5P O3 | regression (air quality) | new |
🔽 Dataset access: The benchmark datasets are available on HuggingFace.
📂 Further details: Copernicus-Bench/
Copernicus-Embed-025deg is an embedding dataset that provides a global embedding map (721x1440x768) at 0.25°, integrating various sources of satellite observations at an extremely high compression ratio. It has been shown to be beneficial for linking Earth's surface to the atmosphere, unlocking new possibilities in the development of weather/climate foundation models.
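The map shape 721×1440 matches a regular 0.25° global grid that includes both poles. The sketch below shows how a latitude/longitude pair could be mapped to the nearest grid cell to read out its 768-dimensional embedding. It assumes ERA5-style ordering (rows from 90°N down to 90°S, columns from 0°E eastward to 359.75°E). Please verify this ordering against the released files, since it is an assumption here. The helpers `latlon_to_index` and `lookup_embedding` are hypothetical. If the full map is stored as float32, it holds 721 × 1440 × 768 = 797,368,320 values (about 3.2 GB), so lazy or memory-mapped loading may be preferable.

```python
import math

import numpy as np

N_LAT, N_LON, RES = 721, 1440, 0.25  # assumed ERA5-style 0.25° grid


def latlon_to_index(lat, lon):
    """Nearest (row, col) on the assumed grid.

    Rows run from 90°N (row 0) to 90°S (row 720); columns run from 0°E
    eastward, and longitudes are wrapped into [0, 360)."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be in [-90, 90]")
    row = math.floor((90.0 - lat) / RES + 0.5)
    col = math.floor((lon % 360.0) / RES + 0.5) % N_LON
    return row, col


def lookup_embedding(emb, lat, lon):
    """Return the embedding vector at a coordinate from a (721, 1440, D) array."""
    if emb.shape[:2] != (N_LAT, N_LON):
        raise ValueError("expected an array of shape (721, 1440, D)")
    row, col = latlon_to_index(lat, lon)
    return emb[row, col]


# Small stand-in array (D=2 instead of 768) to keep the example light.
emb = np.zeros((N_LAT, N_LON, 2), dtype=np.float32)
emb[167, 46] = [1.0, 2.0]
print(latlon_to_index(48.14, 11.57), lookup_embedding(emb, 48.14, 11.57))
```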
🔽 Dataset access: The embedding datasets are available on HuggingFace.
📂 Further details: Copernicus-Embed-025deg/
This repo is licensed under the Apache License 2.0, with portions of third-party code licensed under the MIT or CC-BY-NC-4.0 licenses. The Copernicus-Pretrain dataset, the newly curated datasets in Copernicus-Bench, and the pretrained weights of Copernicus-FM are licensed under the CC-BY-4.0 license.
```bibtex
@misc{wang2025unifiedcopernicusfoundationmodel,
  title={Towards a Unified Copernicus Foundation Model for Earth Vision},
  author={Yi Wang and Zhitong Xiong and Chenying Liu and Adam J. Stewart and Thomas Dujardin and Nikolaos Ioannis Bountos and Angelos Zavras and Franziska Gerken and Ioannis Papoutsis and Laura Leal-Taixé and Xiao Xiang Zhu},
  year={2025},
  eprint={2503.11849},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.11849},
}
```


