LandSegmenter: Towards a Flexible Foundation Model for Land Use and Land Cover Mapping

This repository contains the official implementation of the paper "LandSegmenter: Towards a Flexible Foundation Model for Land Use and Land Cover Mapping".
Chenying Liu¹,²  Wei Huang¹  Xiao Xiang Zhu¹,²
¹Technical University of Munich (TUM)  ²Munich Center for Machine Learning (MCML)

[arXiv][Project]

We propose LandSegmenter, an LULC FM framework that resolves challenges at three stages: the input, model, and output levels.

Key features

  • 👽​​ LandSegmenter: A task-specific Foundation Model (FM) for Land Use and Land Cover (LULC) mapping, characterized by high flexibility in both inputs (multi-band, multi-resolution imagery) and outputs (customizable category definitions), supporting both zero-shot inference and fine-tuning.
  • 🕹️​ Confidence-guided Fusion: A class-wise confidence-guided fusion strategy that boosts LandSegmenter’s zero-shot inference through dynamic integration of CLIP's knowledge to reduce semantic omissions.
  • 🗺️​ LAS: A large-scale weakly supervised dataset for LandSegmenter training, with ~150K globally sampled points from diverse satellite sensors and LULC products, spanning from high-resolution RGB to low-resolution multispectral imagery.

LandSegmenter

LandSegmenter is the first LULC FM trained with LAS, designed with high flexibility to handle diverse input modalities and customizable category settings. It builds on SAM2's backbone for robust multi-scale spatial representation, further enhanced by multispectral features from DOFA and high-frequency components for refined structural details. A text-based prompter derived from GeoRSCLIP, which takes class names as inputs, further strengthens its semantic understanding, enabling concept-aware and adaptable segmentation across heterogeneous data sources.

Fig. 1. Architecture of LandSegmenter. The attention-based fusion module (AFM) is depicted in every block to indicate the additional input it receives at each stage. The embeddings sent to the decoder are the summation of the outputs of Blocks 4 (upsampled) and 3; for simplicity, this operator is omitted in the figure.
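As a rough illustration of the summation step in the caption, the decoder input could be formed as in the PyTorch-style sketch below; the tensor names and shapes are assumptions for illustration, not the repository's actual module interface.

    import torch
    import torch.nn.functional as F

    def decoder_input(block3_feat: torch.Tensor, block4_feat: torch.Tensor) -> torch.Tensor:
        # Upsample the Block 4 output to Block 3's spatial resolution and sum the two;
        # this is the '+' operator omitted in Fig. 1 for simplicity.
        block4_up = F.interpolate(block4_feat, size=block3_feat.shape[-2:],
                                  mode="bilinear", align_corners=False)
        return block3_feat + block4_up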

📦​ Model weights

You can download the model weights from HuggingFace.

📥 Installation

First, clone the repo:

    git clone https://github.com/zhu-xlab/LandSegmenter.git && cd LandSegmenter

We provide two installation options: with or without Docker.

  • With docker (recommended):
    sh launch_docker.sh /path/to/data
    docker exec -it $USER-landseg-1 bash  
    # replace $USER with your username; the trailing number is the $cid set in launch_docker.sh
    conda activate sam2
  • Without docker:
    conda env create -f environment.yml
    conda activate sam2
    pip install -e . --no-build-isolation
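
As an optional sanity check after installation (not part of the repository's scripts), you can confirm from the activated sam2 environment that PyTorch is installed and can see the GPU:

    import torch
    print(torch.__version__, torch.cuda.is_available())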

🎰 Getting started

Download the checkpoints:

    cd checkpoints
    sh download_ckpts.sh
    cd ..

Prepare the data:

data # geotiff
├── exact # accurately labeled training part
│   ├── openearthmap_train.lmdb
│   ├── dynamicearthnet_seasonal.lmdb
├── weak # weakly labeled training part
│   ├── worldcover_v100.lmdb
│   ├── worldcover_v200.lmdb
│   ├── ...
├── ft_train # fine-tuning training sets
│   ├── dw_train_lmdb
│   ├── multisenge_train.lmdb
│   ├── ...
├── test # test sets
│   ├── dw_test_lmdb
│   ├── multisenge_test.lmdb

Change the data path in dataset/datasets_settings.py at Line 3: data_dir=...
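
For reference, the change is a single assignment; the surrounding code in dataset/datasets_settings.py may differ from this sketch:

    # dataset/datasets_settings.py, Line 3
    data_dir = "/path/to/data"  # root of the data/ tree shown above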

​🔮​​ Zero-shot Inference with LandSegmenter

Test on the full set and calculate accuracy:

    sh test_OV.sh $CUDA_ID $DATASET
    # Set your $CUDA_ID for export CUDA_VISIBLE_DEVICES=$CUDA_ID
    # $DATASET: potsdam / loveda / nyc / dw / osm / multisenge

To test a single method, replace ProxyCLIP/test_from_lmdb_all.py with ProxyCLIP/test_from_lmdb_landsegmenter.py, ProxyCLIP/test_from_lmdb_pc.py, or ProxyCLIP/test_from_lmdb_fusion.py to test only LandSegmenter, ProxyCLIP, or the fusion, respectively.
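For intuition, the class-wise confidence-guided fusion used in the fusion setting can be sketched as below; the confidence estimate and weighting here are illustrative assumptions, not the repository's exact implementation.

    import torch

    def confidence_guided_fusion(landseg_probs: torch.Tensor, clip_probs: torch.Tensor) -> torch.Tensor:
        # landseg_probs, clip_probs: (B, C, H, W) per-class probability maps from
        # LandSegmenter and the CLIP-based branch (e.g. ProxyCLIP).
        # Per-class confidence, approximated here by each model's mean class
        # probability over the batch (an assumption for illustration).
        w_land = landseg_probs.mean(dim=(0, 2, 3), keepdim=True)   # (1, C, 1, 1)
        w_clip = clip_probs.mean(dim=(0, 2, 3), keepdim=True)
        weights = torch.softmax(torch.stack([w_land, w_clip]), dim=0)  # normalize per class
        return weights[0] * landseg_probs + weights[1] * clip_probs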

We also provide a demo for playing around without data downloading (demo data are at demo_data/image):

    CUDA_VISIBLE_DEVICES=0 python ProxyCLIP/demo.py --demo_dataset multisenge
    # demo_dataset can be set to: potsdam / loveda / nyc / dw / osm / multisenge

Note that the batch size can affect the final outputs of ProxyCLIP models as well as the fusion results, since ProxyCLIP applies a batch-wise adaptive normalization and masking approach to the similarity matrix. Larger batch sizes generally yield more robust results compared to batch_size=1. All results reported in our paper were generated with batch_size=20.
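
To see why the batch size matters, a batch-wise adaptive normalization of the similarity matrix can be sketched roughly as follows; this is an illustrative sketch of the idea, not ProxyCLIP's actual code.

    import torch

    def batchwise_normalize(sim: torch.Tensor) -> torch.Tensor:
        # sim: (B, N, C) patch-to-class similarities for the whole batch.
        # Statistics are computed over the batch, so results depend on batch size.
        mean, std = sim.mean(), sim.std()
        sim = (sim - mean) / (std + 1e-6)
        # Suppress entries below the batch-level average (illustrative masking step).
        return torch.where(sim > 0, sim, torch.zeros_like(sim))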

👻​ LandSegmenter Training

    sh train_landsegmenter.sh
    # Set your $CUDA_ID for export CUDA_VISIBLE_DEVICES=$CUDA_ID

Note that the script automatically detects and uses all available GPUs. Please set your own CUDA_VISIBLE_DEVICES, or remove it if you launch the job on Slurm with sbatch.

☄️​ LandSegmenter Fine-tuning

    sh train_fine_tune.sh $CUDA_ID $DATASET

LAnd Segment (LAS) dataset

We curate the LAnd Segment (LAS) dataset, which builds upon existing LULC products as weak supervision to tackle the scarcity of medium-to-low resolution annotations for model training. The dataset is sampled from regional grids covering diverse land surface types, with an exact-to-weak label ratio of 1:4. LAS also incorporates a larger number of region-level classes, enriching the semantic representation of Earth surface structures.

Fig. 2: LAS dataset for LandSegmenter training. Middle: geographic distribution of each subset, showing, from left to right, the high-resolution, Sentinel-2 (S2), and Landsat-8/9 (L8/9) subsets. Top and bottom: examples from each subset.

🗒️ Core properties:

Designed to bridge gaps between natural image processing and LULC mapping, LAS addresses:

  • integration of multispectral RS data beyond RGB;
  • adaptation to medium-to-low-resolution RS imagery;
  • domain knowledge of land surface properties.

As a result, LAS includes ∼150k globally distributed sample points (∼311k image patches and ∼200k label masks) across eight subsets:

  • high-resolution RGB subset from OpenEarthMap (GSD: 0.25–0.5m, patch size: 320);
  • RGB-NIR subset from DynamicEarthNet (GSD: 3–4m; patch size: 256);
  • three Sentinel-2 (S2) subsets (GSD: 10m; 12–13 bands; patch size: 264);
  • three Landsat-8/9 (L8/9) subsets (GSD: 30m; 7–11 bands; patch size: 264).

​📥​ Download

Raw data:

The scripts used for downloading the weakly labeled data can be found under data_preparation.

LMDB:

We also provide our processed data (in lmdb format) on HuggingFace: weak & exact.
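
If you want to inspect the provided LMDB files directly, a minimal read loop with the lmdb Python package looks like the sketch below; the key naming and value encoding are assumptions and should be checked against the repository's dataset code.

    import lmdb

    # Example path from the data/ tree above; replace with any downloaded .lmdb file.
    env = lmdb.open("data/exact/openearthmap_train.lmdb", readonly=True, lock=False)
    with env.begin() as txn:
        for key, value in txn.cursor():
            # Keys and values are raw bytes; how to decode them into image/label
            # arrays depends on how the repository serialized the patches.
            print(key, len(value))
            break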

Test datasets

We used six publicly available, accurately labeled LULC datasets for evaluation, corresponding to the $DATASET options above (potsdam, loveda, nyc, dw, osm, multisenge).

For reproducibility, we also provide our processed data in lmdb format on HuggingFace:

Acknowledgements

This repository builds upon SAM2, SAM-Adapter, ProxyCLIP, and CromSS. We sincerely thank the authors for their contributions to the open-source community. All usage of these resources is subject to their original licenses.

Citation

@misc{liu2025landsegmenter,
      title={LandSegmenter: Towards a Flexible Foundation Model for Land Use and Land Cover Mapping}, 
      author={Chenying Liu and Wei Huang and Xiao Xiang Zhu},
      year={2025},
      eprint={2511.08156},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.08156}, 
}
