Weihao Xia
Cengiz Oztireli
University of Cambridge
AAAI 2026
- 2025/11/15: Code is released.
- 2025/11/08: BASIC is accepted to AAAI 2026 in Singapore.
Current evaluation protocols are limited in:
- Discriminative power: metrics often saturate and fail to reveal meaningful differences.
- Neuroscientific grounding: they poorly capture perceptual plausibility or human-like alignment.
- Visual cognition coverage: they overlook the multi-level nature of visual understanding.
We introduce BASIC (Brain-Aligned Structural, Inferential, and Contextual similarity), a discriminative, interpretable, and comprehensive framework designed to fill key gaps in brain visual decoding evaluation. BASIC combines semantic reasoning and structural matching to measure how closely decoded outputs align with reference stimuli across multiple levels.
BASIC evaluates decoding from three complementary perspectives:
- Structural: measures spatial organization and category boundaries using granularity-aware correspondence.
- Inferential: assesses semantic accuracy via object categories, attributes, and relations from MLLM captions.
- Contextual: evaluates global plausibility via MLLM-based scene reasoning, quantifying narrative coherence.
The visual decoding results should be structured as follows:
res_decoding
├── cc2017
│   ├── decofuse
│   ├── mind-video
│   └── neuroclips
├── eeg-neuro3d
│   └── neuro3d
├── eeg-things
│   ├── atm
│   ├── cognitioncapturer
│   └── images
├── fmri-shape
│   ├── images
│   ├── mind3d
│   └── mind3d++
├── nsd
│   ├── braindiffuser
│   ├── brainguard
│   ├── dream
│   ├── images
│   ├── mindbridge
│   ├── mindeye
│   ├── mindeye2
│   ├── mindtuner
│   ├── neuropictor
│   ├── sdrecon
│   ├── sepbrain
│   ├── sttm
│   ├── test_images
│   ├── umbrae
│   ├── unibrain
│   └── neurovla
└── seed-dv
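Before running the pipeline, it can help to confirm the layout is in place. The sketch below only checks a few example folders from the NSD subset above; adjust the list to the datasets and methods you actually have:

# Sanity check (example paths from the NSD subset above; adjust to your data)
for d in nsd/images nsd/mindeye/sub01 nsd/mindeye2/sub01; do
    if [ -d "res_decoding/$d" ]; then
        echo "found   res_decoding/$d"
    else
        echo "missing res_decoding/$d"
    fi
done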
Download LLaVA checkpoints into detailcap/llava/model_weights.
from huggingface_hub import snapshot_download

ckpt = 'llava-v1.5-7b'  # or: 'llava-v1.5-13b', 'llava-next-vicuna-7b', 'llava-next-vicuna-13b'
snapshot_download(repo_id=f"liuhaotian/{ckpt}", local_dir=f"detailcap/llava/model_weights/{ckpt}")
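If you prefer git over the Hub client, the same checkpoint can also be cloned directly (a sketch; it requires git-lfs, and the target directory mirrors the path expected by the scripts below):

# Alternative: clone a checkpoint repo from the Hugging Face Hub with git-lfs
git lfs install
git clone https://huggingface.co/liuhaotian/llava-v1.5-7b detailcap/llava/model_weights/llava-v1.5-7b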
# Step 1: Generate detailed captions for GT images and decoded images
models=("llava-v1.5-7b")  # "llava-v1.5-13b" "llava-next-vicuna-7b" "llava-next-vicuna-13b" "llava-next-34b"
methods=(
"nsd/images"
"nsd/sdrecon/sub01"
"nsd/braindiffuser/sub01"
"nsd/mindeye/sub01"
"nsd/umbrae/sub01"
"nsd/dream/sub01"
"nsd/mindbridge/sub01"
"nsd/mindeye2/sub01"
"nsd/neurovla/sub01"
"nsd/sepbrain/sub01"
"nsd/unibrain/sub01"
"nsd/brainguard/sub01"
"nsd/neuropictor/sub01"
)
for model in "${models[@]}"; do
for method in "${methods[@]}"; do
echo "Processing model: $model, data: $method"
python detailcap/generate_detailed_image_caption.py --model_path "detailcap/llava/model_weights/${model}" --prompt detailcap/prompts/basic.txt \
--data_path "res_decoding/${method}" --save_path "res_decoding/captions/${model}/${method}"
done
done

# Step 2: Evaluate the detailed captions
for model in "${models[@]}"; do
refs="res_decoding/captions/${model}/nsd/images/fmricap.json"
for method in "${methods[@]}"; do
if [ "$method" == "nsd/images" ]; then
echo "Skipping evaluation for GT data: $method"
continue
fi
echo "Evaluating model: $model, data: $method"
preds="res_decoding/captions/${model}/${method}/fmricap.json"
python capture/eval_detailcap.py "${refs}" "${preds}" --is_strict True
done
done

Note: Set `--is_strict False` when performing VINDEX evaluation.
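For reference, a non-strict run reuses the Step 2 command with the flag flipped; the `refs`/`preds` variables are the same as above, with `preds` pointed at the caption file produced by your VINDEX setup:

# Non-strict evaluation (e.g. for VINDEX); refs/preds as defined in Step 2
python capture/eval_detailcap.py "${refs}" "${preds}" --is_strict False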
Download the code and pretrained models from IDEA-Research/Grounded-SAM-2, and follow the official instructions for environment setup.
cd basic/grounded-sam2
git clone https://github.com/IDEA-Research/Grounded-SAM-2

We provide processed hierarchical categories in grounded-sam2/hierarchical_classes.
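The Grounded SAM 2 weights themselves are fetched with the download scripts shipped in the upstream repo (per its README at the time of writing; if the layout has changed, defer to the official instructions):

# Download SAM 2 and Grounding DINO checkpoints via the upstream scripts
cd Grounded-SAM-2
(cd checkpoints && bash download_ckpts.sh)
(cd gdino_checkpoints && bash download_ckpts.sh)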
# Step 1: Generate hierarchical class captions for images
model="llava-next-vicuna-13b"
for data in "nsd/images"
do
python detailcap/generate_detailed_image_caption.py --model_path "detailcap/llava/model_weights/${model}" \
--data_path "res_decoding/${data}" --save_path "res_decoding/captions/${model}/${data}/hierarchical_classes" \
--prompt detailcap/prompts/hierarchical_class.txt
done

# Step 2: Generate multigranular segmentation masks based on the hierarchical classes
prompt_path='basic/grounded-sam2/hierarchical_classes/nsd/class_dict.json'
methods=(
"nsd/images"
"nsd/sdrecon/sub01"
"nsd/braindiffuser/sub01"
"nsd/mindeye/sub01"
"nsd/umbrae/sub01"
"nsd/dream/sub01"
"nsd/mindbridge/sub01"
"nsd/mindeye2/sub01"
"nsd/neurovla/sub01"
"nsd/sepbrain/sub01"
"nsd/unibrain/sub01"
"nsd/brainguard/sub01"
"nsd/neuropictor/sub01"
)
tth=0.3
bth=0.25
for data in "${methods[@]}"; do
echo "Processing $data with box threshold $bth and text threshold $tth"
python batch_process_grounded_sam2_hierclass.py --text-prompt-path $prompt_path \
--img-path "res_decoding/${data}" --output-dir "res_decoding/${data}_b${bth}_t${tth}_hierarchical_llava" \
--box-threshold $bth --text-threshold $tth --img-size 512
done

# Step 3: Evaluate the segmentation masks
methods=("sdrecon" "braindiffuser" "mindeye" "umbrae" "dream" "mindbridge" "mindeye2" "neurovla" "sepbrain" "unibrain" "brainguard" "neuropictor")
for method in "${methods[@]}"; do
echo "Calculating matching ratio for method: $method (bth=$bth, tth=$tth)"
python cal_hierarchical_matching_ratio_ap.py --dataset nsd --method "$method" --thbox "$bth" --thtxt "$tth"
done

The codebase is built upon the following excellent projects: LLaVA, CAPTURE, and Grounded SAM.
We also highlight a series of our works on multimodal brain decoding and benchmarking:
- DREAM: mirrors pathways in the human visual system for stimulus reconstruction.
- UMBRAE: interprets brain activations into multimodal explanations with task-specific MLLM prompting.
- VINDEX: explores the impact of different image feature spaces on fine-grained multimodal decoding.
- MEVOX: introduces Multi-Expert Vision systems for Omni-contextual eXplanations.
- BASIC: provides interpretable and multigranular benchmarking for brain visual decoding.
@inproceedings{xia2026basic,
title = {Multigranular Evaluation for Brain Visual Decoding},
author = {Xia, Weihao and {\"O}ztireli, Cengiz},
booktitle = {AAAI Conference on Artificial Intelligence (AAAI)},
year = {2026},
}