Copernicus Foundation Model

This repository contains the official implementation of the paper "Towards a Unified Copernicus Foundation Model for Earth Vision" (ICCV 2025 oral).

Key features

🌍 Copernicus-Pretrain: A massive-scale pretraining dataset with 18.7M aligned images from all major Copernicus Sentinel missions, spanning from the Earth's surface to its atmosphere.
🤖 Copernicus-FM: A unified foundation model capable of processing any spectral or non-spectral sensor modality using extended dynamic hypernetworks and flexible metadata encoding.
📊 Copernicus-Bench: A systematic evaluation benchmark with 15 hierarchical downstream tasks ranging from preprocessing to specialized applications for each Sentinel mission.
🌐 Copernicus-Embed-025deg: An embedding dataset that provides a global embedding map (721x1440x768) at 0.25°, integrating various sources of satellite observations at an extremely high compression ratio.

Copernicus-Pretrain

Copernicus-Pretrain is an extension of the SSL4EO-S12 dataset to all major Sentinel missions (S1-S5P). The images are organized into ~310K regional grids (0.25°x0.25°, consistent with ERA5), densely covering the whole land surface and near-land ocean with time series from eight distinct Sentinel modalities.
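
To make the 0.25° gridding concrete, the sketch below maps a latitude/longitude pair to a row/column index on a global 0.25° grid. It assumes the usual ERA5 layout (721 latitude rows running from 90° down to -90°, 1440 longitude columns running from 0° to 359.75°, cell centers on multiples of 0.25°). The function name `latlon_to_grid` is hypothetical, and the dataset's own grid identifiers may be defined differently, so treat this as an illustration rather than the repository's indexing scheme.

```python
def latlon_to_grid(lat: float, lon: float, res: float = 0.25) -> tuple[int, int]:
    """Map (lat, lon) in degrees to (row, col) on an assumed ERA5-style grid.

    Assumption: row 0 is the North Pole (lat = 90), col 0 is lon = 0,
    and longitudes wrap around at 360 degrees.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    n_lon = int(round(360.0 / res))           # 1440 columns at 0.25 degrees
    row = int(round((90.0 - lat) / res))      # 0..720 at 0.25 degrees
    col = int(round((lon % 360.0) / res)) % n_lon  # wrap 360 back to 0
    return row, col


# Example: a point near Munich, expressed on the assumed grid.
munich_cell = latlon_to_grid(48.25, 11.5)
```

Negative longitudes are folded into the 0° to 360° range before indexing, so `latlon_to_grid(0.0, -0.25)` lands in the last column.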

🔽 Dataset access:

Raw format (GeoTiff): This version is available on HuggingFace.
Streaming format (WebDataset): This version is available on HuggingFace.

📂 Further details: Copernicus-Pretrain/

Copernicus-FM

Copernicus-FM is an extension of the DOFA foundation model that can process any spectral or non-spectral sensor modality using extended dynamic hypernetworks and flexible metadata encoding. The model is pretrained on the Copernicus-Pretrain dataset with masked image modeling and continual distillation.
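
For readers unfamiliar with masked image modeling, the following is a minimal, generic sketch of the random patch masking step used in MAE-style pretraining. It is not taken from the Copernicus-FM code. The function name `random_patch_mask`, the 196-patch layout (a 14x14 grid), and the 0.75 mask ratio are all assumptions chosen for illustration.

```python
import random


def random_patch_mask(num_patches: int, mask_ratio: float, seed: int = 0) -> list[bool]:
    """Return one flag per patch; True means the patch is hidden from the encoder."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    num_masked = int(round(num_patches * mask_ratio))
    rng = random.Random(seed)
    # Sample the masked patch indices uniformly without replacement.
    masked_idx = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_idx for i in range(num_patches)]


# Hypothetical setting: 14 x 14 = 196 patches, 75% of them masked.
mask = random_patch_mask(num_patches=196, mask_ratio=0.75)
visible_idx = [i for i, hidden in enumerate(mask) if not hidden]
```

In this kind of setup the encoder only sees the patches in `visible_idx`, and the training objective is to reconstruct the hidden ones.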

🔽 Weights access: The model weights are available on HuggingFace.

📂 Further details: Copernicus-FM/

Copernicus-Bench

Copernicus-Bench is a systematic evaluation benchmark with 15 hierarchical downstream datasets organized into three levels of applications, covering all major Sentinel missions (S1, S2, S3, and S5P). Among them, 9 are derived from existing datasets and 6 are newly curated.

Level	Name	Modality	Task	Source
L1	Cloud-S2	S2 TOA	segmentation (cloud)	CloudSEN12
L1	Cloud-S3	S3 OLCI	segmentation (cloud)	new
L2	EuroSAT-S1	S1 GRD	classification (LULC)	EuroSAT-SAR
L2	EuroSAT-S2	S2 TOA	classification (LULC)	EuroSAT
L2	BigEarthNet-S1	S1 GRD	classification (LULC)	BigEarthNet v2.0
L2	BigEarthNet-S2	S2 SR	classification (LULC)	BigEarthNet v2.0
L2	LC100Cls-S3	S3 OLCI	classification (LULC)	new
L2	DFC2020-S1	S1 GRD	segmentation (LULC)	DFC2020
L2	DFC2020-S2	S2 TOA	segmentation (LULC)	DFC2020
L2	LC100Seg-S3	S3 OLCI	segmentation (LULC)	new
L3	Flood-S1	S1 GRD	change detection (flood)	Kuro Siwo
L3	LCZ-S2	S2 TOA	classification (local climate zone)	So2Sat LCZ42
L3	Biomass-S3	S3 OLCI	regression (biomass)	new
L3	AQ-NO2-S5P	S5P NO2	regression (air quality)	new
L3	AQ-O3-S5P	S5P O3	regression (air quality)	new
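
The table above can also be written as a small data structure, which makes it easy to filter tasks by level, modality, or source. The sketch below simply transcribes the table rows. The variable names (`BENCH`, `by_level`, `new_datasets`) are illustrative and do not correspond to anything in the repository.

```python
from collections import Counter

# (level, name, modality, task, source), transcribed from the table above.
BENCH = [
    ("L1", "Cloud-S2", "S2 TOA", "segmentation (cloud)", "CloudSEN12"),
    ("L1", "Cloud-S3", "S3 OLCI", "segmentation (cloud)", "new"),
    ("L2", "EuroSAT-S1", "S1 GRD", "classification (LULC)", "EuroSAT-SAR"),
    ("L2", "EuroSAT-S2", "S2 TOA", "classification (LULC)", "EuroSAT"),
    ("L2", "BigEarthNet-S1", "S1 GRD", "classification (LULC)", "BigEarthNet v2.0"),
    ("L2", "BigEarthNet-S2", "S2 SR", "classification (LULC)", "BigEarthNet v2.0"),
    ("L2", "LC100Cls-S3", "S3 OLCI", "classification (LULC)", "new"),
    ("L2", "DFC2020-S1", "S1 GRD", "segmentation (LULC)", "DFC2020"),
    ("L2", "DFC2020-S2", "S2 TOA", "segmentation (LULC)", "DFC2020"),
    ("L2", "LC100Seg-S3", "S3 OLCI", "segmentation (LULC)", "new"),
    ("L3", "Flood-S1", "S1 GRD", "change detection (flood)", "Kuro Siwo"),
    ("L3", "LCZ-S2", "S2 TOA", "classification (local climate zone)", "So2Sat LCZ42"),
    ("L3", "Biomass-S3", "S3 OLCI", "regression (biomass)", "new"),
    ("L3", "AQ-NO2-S5P", "S5P NO2", "regression (air quality)", "new"),
    ("L3", "AQ-O3-S5P", "S5P O3", "regression (air quality)", "new"),
]

# Number of datasets per level, and the names of the newly curated ones.
by_level = Counter(level for level, *_ in BENCH)
new_datasets = [name for _, name, _, _, source in BENCH if source == "new"]
```

Counting the rows this way reproduces the totals stated in the text: 15 datasets overall, of which 6 are newly curated.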

🔽 Dataset access: The benchmark datasets are available on HuggingFace.

📂 Further details: Copernicus-Bench/

Copernicus-Embed-025deg

Copernicus-Embed-025deg is an embedding dataset that provides a global embedding map (721x1440x768) at 0.25°, integrating various sources of satellite observations at an extremely high compression ratio. It has been shown to be beneficial for linking Earth's surface to the atmosphere, unlocking new possibilities in the development of weather/climate foundation models.
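
As a back-of-the-envelope check on what the 721x1440x768 shape implies, the sketch below counts grid cells and stored values and estimates the in-memory size. The 4 bytes per value (float32) is an assumption, since the storage dtype is not stated here, and the helper name `embedding_map_stats` is hypothetical.

```python
SHAPE = (721, 1440, 768)  # (latitude rows, longitude columns, embedding dimension)


def embedding_map_stats(shape=SHAPE, bytes_per_value: int = 4) -> dict:
    """Count cells and values in the embedding map and estimate its size.

    bytes_per_value=4 assumes float32 storage, which is an assumption only.
    """
    n_lat, n_lon, dim = shape
    n_cells = n_lat * n_lon      # one embedding vector per 0.25 degree cell
    n_values = n_cells * dim     # total scalar values in the map
    size_gib = n_values * bytes_per_value / 2**30
    return {"cells": n_cells, "values": n_values, "size_gib": size_gib}


stats = embedding_map_stats()
```

Under these assumptions the full global map has about 1.04 million cells and fits in roughly 3 GiB.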

🔽 Dataset access: The embedding datasets are available on HuggingFace.

📂 Further details: Copernicus-Embed-025deg/

License

This repo is licensed under the Apache License 2.0, with portions of third-party code licensed under the MIT or CC-BY-NC-4.0 licenses. The Copernicus-Pretrain dataset, the newly curated datasets in Copernicus-Bench, and the pretrained weights of Copernicus-FM are licensed under the CC-BY-4.0 license.

Citation

@misc{wang2025unifiedcopernicusfoundationmodel,
      title={Towards a Unified Copernicus Foundation Model for Earth Vision}, 
      author={Yi Wang and Zhitong Xiong and Chenying Liu and Adam J. Stewart and Thomas Dujardin and Nikolaos Ioannis Bountos and Angelos Zavras and Franziska Gerken and Ioannis Papoutsis and Laura Leal-Taixé and Xiao Xiang Zhu},
      year={2025},
      eprint={2503.11849},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.11849}, 
}
