We propose LandSegmenter, an LULC foundation model (FM) framework that addresses challenges at the input, model, and output levels.
- 👽 LandSegmenter: A task-specific Foundation Model (FM) for Land Use and Land Cover (LULC) mapping, characterized by high flexibility in both inputs (multi-band, multi-resolution imagery) and outputs (customizable category definitions), supporting both zero-shot inference and fine-tuning.
- 🕹️ Confidence-guided Fusion: A class-wise confidence-guided fusion strategy that boosts LandSegmenter’s zero-shot inference by dynamically integrating CLIP's knowledge to reduce semantic omissions (a minimal illustrative sketch follows this list).
- 🗺️ LAS: A large-scale weakly supervised dataset for LandSegmenter training, with ~150K globally sampled points from diverse satellite sensors and LULC products, spanning from high-resolution RGB to low-resolution multispectral imagery.
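The following is a minimal, hypothetical sketch of the class-wise confidence-guided idea: per-class confidences decide how strongly the LandSegmenter and CLIP-branch probability maps each contribute when they are combined. The function and variable names (`fuse_by_class_confidence`, `landseg_probs`, `clip_probs`) and the mean-probability confidence proxy are illustrative assumptions, not the repository's actual implementation.

```python
import torch

def fuse_by_class_confidence(landseg_probs: torch.Tensor,
                             clip_probs: torch.Tensor) -> torch.Tensor:
    """Toy class-wise confidence-guided fusion of two probability maps.

    landseg_probs, clip_probs: (B, C, H, W) softmax outputs from
    LandSegmenter and the CLIP-based branch, respectively.
    """
    # A simple per-class confidence proxy: the spatial mean probability.
    conf_ls = landseg_probs.mean(dim=(2, 3))    # (B, C)
    conf_clip = clip_probs.mean(dim=(2, 3))     # (B, C)

    # Turn the two confidences into class-wise fusion weights.
    w_ls = conf_ls / (conf_ls + conf_clip + 1e-8)   # (B, C)
    w_ls = w_ls[:, :, None, None]                   # broadcast to (B, C, 1, 1)

    # Weighted combination, renormalized over the class dimension.
    fused = w_ls * landseg_probs + (1.0 - w_ls) * clip_probs
    return fused / fused.sum(dim=1, keepdim=True).clamp_min(1e-8)
```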
LandSegmenter is the first LULC FM trained with LAS, designed with high flexibility to handle diverse input modalities and customizable category settings. It builds on SAM2's backbone for robust multi-scale spatial representation, enhanced by multispectral features from DOFA and high-frequency components for refined structural details. A text-based prompter derived from GeoRSCLIP, which takes class names as inputs, further strengthens its semantic understanding, enabling concept-aware and adaptable segmentation across heterogeneous data sources.
Fig. 1. Architecture of LandSegmenter. The attention-based fusion module (AFM) is drawn for each block to indicate that it supplies an additional input at every stage. The embeddings sent to the decoder are the sum of the outputs of Block 4 (upsampled) and Block 3; for simplicity, this operation is omitted in the figure.
You can download the model weights from HuggingFace.
First, clone the repo:
```bash
git clone https://github.com/zhu-xlab/LandSegmenter.git && cd LandSegmenter
```

We provide two installation options: with or without Docker.
- With Docker (recommended):

  ```bash
  sh launch_docker.sh /path/to/data
  docker exec -it $USER-landseg-1 bash
  # replace $USER with your username; the trailing number corresponds to the $cid set in launch_docker.sh
  conda activate sam2
  ```

- Without Docker:

  ```bash
  conda env create -f environment.yml
  conda activate sam2
  pip install -e . --no-build-isolation
  ```

Download the checkpoints:
```bash
cd checkpoints
sh download_ckpts.sh
cd ..
```

Prepare the data:
- All the data can be downloaded from HuggingFace-LAS. For details, see LAnd Segment (LAS) dataset and Test datasets.
- For test purposes, you can prepare only the test data.
- Please structure your data as follows:
```text
data # geotiff
├── exact # accurately labeled training part
│   ├── openearthmap_train.lmdb
│   ├── dynamicearthnet_seasonal.lmdb
├── weak # weakly labeled training part
│   ├── worldcover_v100.lmdb
│   ├── worldcover_v200.lmdb
│   ├── ...
├── ft_train # fine-tuning training sets
│   ├── dw_train_lmdb
│   ├── multisenge_train.lmdb
│   ├── ...
├── test # test sets
│   ├── dw_test_lmdb
│   ├── multisenge_test.lmdb
```

Change the data path in dataset/datasets_settings.py at Line 3: data_dir=...
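For example, assuming the setting is a plain string assignment (the path below is a placeholder for your own data root):

```python
# dataset/datasets_settings.py, Line 3
data_dir = "/path/to/data"  # root folder containing exact/, weak/, ft_train/, test/
```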
Test on the full set and calculate accuracy:
```bash
sh test_OV.sh $CUDA_ID $DATASET
# Set your $CUDA_ID for export CUDA_VISIBLE_DEVICES=$CUDA_ID
# $DATASET: potsdam / loveda / nyc / dw / osm / multisenge
```

To test a single method, replace ProxyCLIP/test_from_lmdb_all.py with ProxyCLIP/test_from_lmdb_landsegmenter.py, ProxyCLIP/test_from_lmdb_pc.py, or ProxyCLIP/test_from_lmdb_fusion.py to test only LandSegmenter, ProxyCLIP, or the fusion, respectively.
We also provide a demo that can be run without downloading the data (demo data are in demo_data/image):

```bash
CUDA_VISIBLE_DEVICES=0 python ProxyCLIP/demo.py --demo_dataset multisenge
# --demo_dataset can be set to: potsdam / loveda / nyc / dw / osm / multisenge
```

Note that the batch size can affect the final outputs of the ProxyCLIP models as well as the fusion results, since ProxyCLIP applies a batch-wise adaptive normalization and masking approach to the similarity matrix. Larger batch sizes generally yield more robust results than batch_size=1. All results reported in our paper were generated with batch_size=20.
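To see why the batch composition matters, below is a minimal, hypothetical sketch of batch-wise adaptive normalization and masking of a similarity matrix; the statistics and the threshold are computed over the whole batch, so the normalized values for any single image depend on the other images in the batch. The function name, the choice of statistics, and the masking rule are assumptions for illustration, not ProxyCLIP's actual code.

```python
import torch

def batchwise_normalize(sim: torch.Tensor) -> torch.Tensor:
    """Illustrative batch-wise adaptive normalization and masking.

    sim: (B, N, C) similarity matrix between N patch tokens and C class texts.
    """
    # Statistics computed over the *entire batch*, not per image, so the
    # normalized values for one image depend on the other images present.
    mean = sim.mean()
    std = sim.std().clamp_min(1e-8)
    sim = (sim - mean) / std

    # Mask entries below a batch-level threshold; with more images in the
    # batch, the threshold is estimated from more data and is more stable.
    thresh = sim.median()
    return sim.masked_fill(sim < thresh, float("-inf"))
```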
```bash
sh train_landsegmenter.sh
# Set your $CUDA_ID for export CUDA_VISIBLE_DEVICES=$CUDA_ID
# $DATASET: potsdam / loveda / nyc / dw / osm / multisenge
```

Note that the script automatically detects and uses all available GPUs. Please set your own CUDA_VISIBLE_DEVICES, or remove CUDA_VISIBLE_DEVICES if you use Slurm with sbatch.
```bash
sh train_fine_tune.sh $CUDA_ID $DATASET
```

We curate the LAnd Segment (LAS) dataset, which builds upon existing LULC products as weak supervision to tackle the scarcity of medium-to-low-resolution annotations for model training. The dataset is sampled from regional grids covering diverse land surface types, with an exact-to-weak label ratio of 1:4. LAS also incorporates a larger number of region-level classes, enriching the semantic representation of Earth surface structures.
Fig. 2. The LAS dataset for LandSegmenter training. Middle: geographic distributions of each subset; from left to right, the high-resolution, Sentinel-2 (S2), and Landsat-8/9 (L8/9) subsets. Top and bottom: examples from each subset.
Designed to bridge gaps between natural image processing and LULC mapping, LAS addresses:
- integration of multispectral RS data beyond RGB;
- adaptation to medium-to-low-resolution RS imagery;
- domain knowledge of land surface properties.
As a result, LAS includes ∼150k globally distributed sample points (∼311k image patches and ∼200k label masks) across eight subsets:
- high-resolution RGB subset from OpenEarthMap (GSD: 0.25–0.5m; patch size: 320);
- RGB-NIR subset from DynamicEarthNet (GSD: 3–4m; patch size: 256);
- three Sentinel-2 (S2) subsets (GSD: 10m; 12–13 bands; patch size: 264);
- three Landsat-8/9 (L8/9) subsets (GSD: 30m; 7–11 bands; patch size: 264).
- Six weakly labeled subsets can be downloaded from HuggingFace.
- Two accurately labeled subsets are publicly available: OpenEarthMap & DynamicEarthNet.
The scripts used for downloading the weakly labeled data can be found under data_preparation.
We also provide our processed data (in lmdb format) on HuggingFace: weak & exact.
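If you want to inspect one of the processed lmdb files, a generic reader along the lines below works without assuming anything about how images and masks are serialized inside; the file path is only an example.

```python
import lmdb

# Peek at the first few entries of a processed lmdb file (path is an example).
# Depending on whether the .lmdb is a single file or a directory, you may need
# subdir=False or subdir=True, respectively.
env = lmdb.open("data/test/multisenge_test.lmdb", readonly=True, lock=False, subdir=False)
with env.begin() as txn:
    for i, (key, value) in enumerate(txn.cursor()):
        print(key.decode(errors="replace"), len(value), "bytes")
        if i >= 9:  # only the first ten entries
            break
env.close()
```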
We use six publicly available, accurately labeled LULC datasets for evaluation:
For reproducibility, we also provide our processed data in lmdb format on HuggingFace:
This repository builds upon SAM2, SAM-Adapter, ProxyCLIP, and CromSS. We sincerely thank the authors for their contributions to the open-source community. All usage of these resources is subject to their original licenses.
@misc{liu2025landsegmenter,
title={LandSegmenter: Towards a Flexible Foundation Model for Land Use and Land Cover Mapping},
author={Chenying Liu and Wei Huang and Xiao Xiang Zhu},
year={2025},
eprint={2511.08156},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.08156},
}