[WIP] MAEB task selection #3867
base: maeb
Conversation
Implements new task selection approach using correlation analysis and clustering for MAEB evaluation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4 <[email protected]>
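The notebook itself isn't rendered in this thread, so for reviewers, here is a minimal sketch of the correlation step it describes. The `scores` DataFrame (models × tasks, one main score per task) and the choice of Spearman are assumptions, not necessarily what the notebook does.

```python
# Hedged sketch of correlation-based pair detection (not the notebook's code).
# Assumes `scores` is a DataFrame of shape (n_models, n_tasks) of main scores.
import pandas as pd


def correlated_pairs(scores: pd.DataFrame, threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return task pairs whose score vectors correlate above `threshold`."""
    corr = scores.corr(method="spearman")  # task-by-task correlation across models
    pairs = []
    for i, a in enumerate(corr.columns):
        for b in corr.columns[i + 1:]:
            c = float(corr.loc[a, b])
            if c >= threshold:
                pairs.append((a, b, c))
    return sorted(pairs, key=lambda p: -p[2])
```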
- Add domain, category, and language checks to is_candidate_valid_removal to preserve at least one task from each unique domain, category, and language - Add top 5 longest tasks display for CLAP model reference timing - Add diagnostic cell for tasks with many negative correlations - Expand correlation thresholds to include 0.8 and 0.9 - Add Languages, Domains, Categories columns to summary table - Comment out license filtering to include all tasks - Handle empty model coverage gracefully with fallback logic 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
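A rough sketch of the validity check this commit describes, assuming each task object exposes `domains`, `category`, and `languages` attributes (the attribute names are guesses for illustration, not mteb's actual metadata fields):

```python
# Hypothetical sketch of is_candidate_valid_removal; attribute names are assumptions.
def is_candidate_valid_removal(candidate, remaining) -> bool:
    """Reject removal if the candidate is the last remaining task covering any of
    its domains, categories, or languages."""
    others = [t for t in remaining if t is not candidate]
    for attr in ("domains", "category", "languages"):
        values = getattr(candidate, attr)
        cand_vals = set(values) if isinstance(values, (list, set, tuple)) else {values}
        other_vals = set()
        for t in others:
            values = getattr(t, attr)
            other_vals |= set(values) if isinstance(values, (list, set, tuple)) else {values}
        if not cand_vals <= other_vals:  # the candidate uniquely covers something
            return False
    return True
```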
…ased tasks_to_keep - Move UMAP+HDBSCAN clustering right after initial correlation matrix - Define tasks_to_keep from outlier cluster (label -1) instead of empty list - Split function definitions to break circular dependency - Add domain counts cell after results DataFrame - Add model coverage distribution analysis (models at each task count) - Use models with >= 50 tasks for runtime estimation - Show task coverage in runtime output (N/M tasks with eval times) 🤖 Generated with [Claude Code](https://claude.ai/claude-code) Co-Authored-By: Claude <[email protected]>
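For context, a minimal sketch of what the UMAP+HDBSCAN step could look like, keeping the outlier cluster (label -1); the parameters are illustrative assumptions, not the notebook's actual settings.

```python
# Rough sketch of the clustering step (parameters are assumptions).
import hdbscan
import pandas as pd
import umap


def outlier_tasks(corr: pd.DataFrame) -> list[str]:
    """Embed each task's correlation profile with UMAP, cluster with HDBSCAN,
    and keep the outliers (label -1): tasks that behave unlike any cluster."""
    embedding = umap.UMAP(n_components=2, random_state=42).fit_transform(corr.values)
    labels = hdbscan.HDBSCAN(min_cluster_size=3).fit_predict(embedding)
    return [task for task, label in zip(corr.columns, labels) if label == -1]
```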
- Add get_pairs_above_threshold helper to get all correlated pairs - Track skipped_pairs where neither task can be removed - Continue to next pair when current pair is protected - Clear skipped_pairs when task set changes after removal - Only stop when all pairs above threshold have been tried 🤖 Generated with [Claude Code](https://claude.ai/claude-code) Co-Authored-By: Claude <[email protected]>
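Putting the pieces together, the loop described here could look roughly like the sketch below, reusing the hypothetical `correlated_pairs` and `is_candidate_valid_removal` helpers sketched above (the `tasks` mapping and the removal preference are assumptions).

```python
# Hedged sketch of the removal loop; `tasks` is assumed to map name -> task object.
def prune_tasks(scores, tasks, threshold=0.9):
    remaining = dict(tasks)
    skipped = set()  # pairs above threshold where neither side may be removed
    while True:
        pairs = correlated_pairs(scores[list(remaining)], threshold)
        pairs = [(a, b) for a, b, _ in pairs if frozenset((a, b)) not in skipped]
        if not pairs:  # every pair above threshold has been tried
            break
        a, b = pairs[0]
        removed = None
        for name in (b, a):  # arbitrarily prefer dropping the second task of the pair
            if is_candidate_valid_removal(remaining[name], remaining.values()):
                removed = remaining.pop(name)
                break
        if removed is None:
            skipped.add(frozenset((a, b)))  # protected pair: move on to the next one
        else:
            skipped.clear()  # task set changed, so re-check previously skipped pairs
    return list(remaining.values())
```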
Visualizes results_df with: - Blue gradient colormap (light to dark) - White background for NaN values - Adaptive text color (white for high scores, black for low) - Dynamic figure sizing based on data dimensions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
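A small matplotlib sketch in the same spirit (the shape and 0-1 value range of `results_df` are assumptions):

```python
# Hedged sketch of the heatmap described above; not the notebook's actual cell.
import matplotlib
import matplotlib.pyplot as plt
import numpy as np


def plot_results(results_df):
    data = results_df.to_numpy(dtype=float)
    n_rows, n_cols = data.shape
    fig, ax = plt.subplots(figsize=(0.6 * n_cols + 2, 0.4 * n_rows + 2))  # dynamic sizing
    cmap = matplotlib.colormaps["Blues"].copy()
    cmap.set_bad("white")  # NaN cells stay white
    ax.imshow(np.ma.masked_invalid(data), cmap=cmap, vmin=0, vmax=1, aspect="auto")
    for i in range(n_rows):
        for j in range(n_cols):
            if not np.isnan(data[i, j]):
                color = "white" if data[i, j] > 0.6 else "black"  # adaptive text color
                ax.text(j, i, f"{data[i, j]:.2f}", ha="center", va="center", color=color)
    ax.set_xticks(range(n_cols), results_df.columns, rotation=90)
    ax.set_yticks(range(n_rows), results_df.index)
    fig.tight_layout()
    return fig
```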
- Add MAEB(audio-text) benchmark with 17 cross-modal retrieval tasks (8 audio-to-text, 9 text-to-audio) selected via correlation threshold 0.95 - Inline task lists directly in MAEB benchmark objects - Add threshold 0.95 to task selection notebook - Convert comparison plot from 1x5 to 2x3 layout for 6 thresholds - Fix tasks_to_select_from to use modality-filtered tasks - Use models with complete eval times for runtime estimation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Expand MAEB(audio-text) benchmark from 17 to 29 tasks (14 A2T + 15 T2A) - Fix msclap model revision from "N/A" to "no_revision" to match results cache - Update benchmark contacts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Script generates top 10 model rankings for MAEB(audio) and MAEB(audio-text) benchmarks using Borda count, with per-category averages. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
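For reference, Borda count over a (models × tasks) score table can be computed roughly like this (the `scores` frame and the tie handling are assumptions; the actual script may differ):

```python
# Minimal Borda-count sketch, not the repo's script.
import pandas as pd


def borda_ranking(scores: pd.DataFrame) -> pd.Series:
    """Each task ranks the models; a model earns (n_models - rank) points per task,
    and models are ordered by total points across tasks."""
    n_models = len(scores)
    ranks = scores.rank(axis=0, ascending=False, method="average")  # rank 1 = best
    points = (n_models - ranks).sum(axis=1)
    return points.sort_values(ascending=False)
```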
I generally like marimo, but damn this is not the easiest thing to review. This is one of the cases where you really need the results to know what is filtered and why (having to git pull and run it to see seems like a big drawback). Is it possible to convert it to an .ipynb or .md for the results?
Yeah, I can export a PDF or HTML or something?
Created an overview table of the tasks and where they're used. There's also a version for Google Sheets: https://docs.google.com/spreadsheets/d/1wyTvW0q6TIat7RMmfimlNKXri9O7cs_S0uebGTNya0c/edit?usp=sharing

Script used to generate the table:
```python
import mteb
import pandas as pd

tasks = mteb.get_tasks(modalities=["audio"])
audio_tasks_names = [t.metadata.name for t in mteb.get_benchmark("MAEB(audio)")]
audio_text_tasks_names = [t.metadata.name for t in mteb.get_benchmark("MAEB(audio-text)")]

row = []
for task in tasks:
    print(task.metadata.name)
    in_audio = task.metadata.name in audio_tasks_names
    in_audio_text = task.metadata.name in audio_text_tasks_names
    row.append(
        {
            "Task Name": task.metadata.name,
            "Task description": task.metadata.description,
            "Task type": task.metadata.type,
            "Task language(s)": ", ".join(task.metadata.eval_langs)
            if isinstance(task.metadata.eval_langs, list)
            else ", ".join(v[0] for v in task.metadata.eval_langs.values()),
            "In MAEB(audio)": "Yes" if in_audio else "No",
            "In MAEB(audio-text)": "Yes" if in_audio_text else "No",
        }
    )

df = pd.DataFrame(row)
df = df.sort_values(by=["Task Name", "Task type"]).reset_index(drop=True)
df.to_csv("audio_tasks_table.csv", index=False)
df.to_markdown("audio_tasks_table.md")
```
We could probably create an English-only version, but I'm not sure it's relevant, because most of the tasks are English-only.
Where are all the multilingual tasks?
I think we can create
But this might be complicated for users to understand.
Why would it be complicated? Seems clear to me.
Hmm I would maybe do:
However, I would probably argue we could just make two columns that are
PS: We have to fix the language annotations; birdset, for example, is not English.
How should we name it? Just
For the leaderboard I agree, but for users I'm not sure, because this can create problems at inference.
Ah, I get it now: only maintain MAEB. Do we bother filtering out similar tasks, or use the entire collection?
MAEB is the full Massive Audio Embedding Benchmark (v1), containing all tasks with audio modality across 7 task types: classification (35), clustering (10), pair classification (5), reranking (6), zero-shot classification (5), audio-to-text retrieval (18), and text-to-audio retrieval (17). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
I'm a bit afraid that if we use only one benchmark, users may want to evaluate on only part of it, e.g. audio only. They would then need to filter tasks themselves.
What if we have an English list, an audio list, and a "the rest of the collection" list, and MAEB is English + audio + "the rest"? We can still have MAEB(eng)v1, MAEB(audio)v1, and MAEBv1?
Rename UrbanSound8kZeroshotClassification to UrbanSound8kClassification in audio_classification module to avoid collision with the identically named class in audio_zeroshot_classification module. Both classes had the same Python name but different task names: - audio_classification: task name "UrbanSound8k" - audio_zeroshot_classification: task name "UrbanSound8kZeroshot" The * imports caused the zeroshot version to overwrite the classification version, leaving only "UrbanSound8kZeroshot" registered in the task registry and breaking MAEB benchmarks that reference "UrbanSound8k". 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
The dill/datasets library had a pickle incompatibility with Python 3.14. Datasets v4+ resolves this issue. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
The v0.02 task class was defined but not exported in __init__.py, causing KeyError when referenced in benchmarks. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Renamed classes to match their metadata names so they can be found in the task registry: - JamAltArtist → JamAltArtistA2ARetrieval - JamAltLyricsT2A → JamAltLyricT2ARetrieval - JamAltLyricsA2T → JamAltLyricA2TRetrieval Also added explicit imports and exports for proper registration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Force-pushed from 2631fc8 to 411a4ce.
This reverts commit b244226.
This reverts commit 3147c20.
Possible splits we could have:

- MAEB (English): the standard "default" benchmark.
- MAEB (Multilingual): for multilingual evaluation.
- MAEB (Audio-Only): pure audio tasks (no text encoders required).
- MAEB (Audio-Text): cross-modal tasks.
- MAEB (Environmental)
- MAEB (Music)
- MAEB (Bioacoustics)

Note on language tags: for the specialized domains (Music, Bio, Env), I suggest we use the language tag zxx (No Linguistic Content) instead of eng-Latn. This clarifies that they are universal and fit into both the English and Multilingual suites.
I don't think we should split between audio-text and multilingual/English.
I don't think we can group a handful of tasks and call them "massive benchmarks". It's likely better to say we cover ABC domains in a single benchmark. For practicality, modality will be the biggest driver, and I think we should keep the split by modalities.
- Export MAEB benchmark from benchmarks/__init__.py - Add Audio tab after Image tab in benchmark selector with MAEB(audio), MAEB(audio-text), and MAEB benchmarks - Add skip_cache_file option to load results from local cache path - Configure local maeb-results path for development testing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
# Conflicts: # mteb/leaderboard/app.py
- Add 5 zero-shot classification tasks to MAEB_AUDIO_TEXT benchmark - Update description to reflect 34 total tasks - Add KennethEnevoldsen and Samoed to all MAEB benchmark contacts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Zero-shot classification tasks require text modality and are now only in MAEB(audio-text). MAEB(audio) now has 24 tasks across 4 task types. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
I resolved #3877 and removed zeroshot tasks from the audio-only benchmark.
Resolved conflict in any_2_any_retrieval/__init__.py, keeping correct class names: - JamAltArtistA2ARetrieval - JamAltLyricA2TRetrieval - JamAltLyricT2ARetrieval 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Just added the MAEB audio extended and lite benchmarks. These have 54 tasks / 38 models and 19 tasks / 44 models respectively. MAEB audio-text lite has 30 tasks and 10 models. This is done by finding the set of tasks that gives the most models with completed eval runs on all of them; no other filtering is applied. @AdnanElAssadi56 @KennethEnevoldsen @Samoed would love a quick pair of eyes on these. I'd say we can probably start with these and start filling in the relevant paper subsections.

- Audio, Extended
- Audio, Lite
- Audio-Text, Lite
Replace MAEB(audio) and MAEB(audio-text) with new benchmarks optimized for maximum model coverage: - MAEB(audio, lite): 19 tasks, 44 models with complete results - MAEB(audio, extended): 54 tasks, 38 models with complete results - MAEB(audio-text, lite): 30 tasks, 10 models with complete results Tasks selected via greedy algorithm maximizing models with all tasks. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
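One way to implement the greedy selection mentioned here is sketched below; `has_result` is an assumed boolean (models × tasks) DataFrame marking which eval results exist, and the notebook's actual heuristic may differ.

```python
# Hedged sketch of a greedy coverage-maximizing task selection.
import pandas as pd


def greedy_task_selection(has_result: pd.DataFrame, n_tasks: int) -> list[str]:
    selected: list[str] = []
    remaining = list(has_result.columns)
    for _ in range(n_tasks):
        def coverage(task: str) -> int:
            # models that have results for every already-selected task plus this one
            return int(has_result[selected + [task]].all(axis=1).sum())
        best = max(remaining, key=coverage)
        selected.append(best)
        remaining.remove(best)
    return selected
```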
The prerun loop was calling on_benchmark_select() and update_task_list() which return gr.update() objects, but then passing those objects to functions expecting raw lists. This caused cache corruption and Gradio validation errors when switching between benchmarks with different task types (e.g., from MAEB(audio-text, lite) with Any2AnyRetrieval to MAEB(audio, lite) without it). Fix by calling the underlying cached functions directly: - _cache_on_benchmark_select() instead of on_benchmark_select() - _cache_update_task_list() instead of update_task_list() 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
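For readers unfamiliar with the pattern, a simplified illustration of why the raw cached value and the event-handler return value must not be mixed (the function names below are stand-ins, not the leaderboard's real ones):

```python
# Simplified stand-in for the pattern in the fix; not the leaderboard's code.
import gradio as gr


def _cache_update_task_list(benchmark: str) -> list[str]:
    # Returns plain data that other code can consume directly.
    return ["Task A", "Task B"]


def update_task_list(benchmark: str):
    # UI event handler: wraps the list in a gr.update(), which is only meaningful
    # to Gradio as a component update, not as a list of tasks.
    return gr.update(choices=_cache_update_task_list(benchmark))


# A warm-up/prerun loop that needs the data itself should call the cached
# function directly instead of the handler.
tasks = _cache_update_task_list("MAEB(audio, lite)")
```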
Leaderboard fixes: - Cancel pending filter events when benchmark changes to prevent race conditions with stale values - Make _update_description derive counts from benchmark tasks directly instead of filter selections to avoid validation errors Benchmark changes: - Remove AudioCapsMiniReranking from MAEB, MAEB(audio, lite), and MAEB(audio, extended) - Update task counts in descriptions (96→95, 19→18, 54→53) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Looks great!
- Use MAEB(audio, lite) and MAEB(audio-text, lite) benchmarks - Table 1: Classification, PairClassification, Reranking, Clustering - Table 2: Retrieval, ZeroshotClassification - Make table functions accept task_names and benchmark_name parameters 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add alias mapping for task types that lose digits during column name processing (e.g., Any2AnyRetrieval -> AnyAnyRetrieval). Also add more audio models to annotation list. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Marimo notebook for analyzing evaluation times across MAEB benchmarks. Loads model metadata and task results to compare eval times between large and small models for audio and audio-text benchmarks. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Great work, @isaac-chung! When we say "audio-text, lite" here, are we implying an extended version to the readers?
It's the most complete collection based on what we have run. We're missing results for an extended version.
Resolve conflicts in pyproject.toml and uv.lock, taking maeb's version for speechbrain dependency constraint. Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add MAEB(audio-text, extended) benchmark with 36 tasks: - All 30 tasks from lite version - Clotho A2T/T2A for audio captioning - Fleurs A2T/T2A (102 languages) - CommonVoice 21 A2T/T2A (82+ languages) - Refine MAEB(audio-text, lite) to 17 tasks: - Remove redundant A2T tasks that have T2A equivalents - Remove SpeechCommandsZeroshotv0.01 (keep only v0.02) - Keep 13 T2A retrieval + 4 zero-shot classification - Add MAEB(audio-text, extended) to benchmark selector Co-Authored-By: Claude Opus 4.5 <[email protected]>
Added.
New utility script that calculates total evaluation times for specified
benchmarks and models. Features:
- Takes --benchmarks and --models as required arguments
- Optional --results-dir for custom cache location
- Outputs formatted table with task coverage and times per benchmark
- Shows totals per model
Usage:

```bash
python scripts/calculate_eval_times.py \
  -b "MAEB(audio-text, lite)" "MAEB(audio-text, extended)" \
  -m "OpenMuQ/MuQ-MuLan-large" "laion/clap-htsat-unfused" \
  -r /path/to/results
```
Co-Authored-By: Claude Opus 4.5 <[email protected]>
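A minimal argparse skeleton matching the interface described above; the actual script in scripts/calculate_eval_times.py may differ in its details.

```python
# Hedged sketch of the CLI surface, not the script's actual implementation.
import argparse


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Total eval times per benchmark and model.")
    parser.add_argument("-b", "--benchmarks", nargs="+", required=True,
                        help='Benchmark names, e.g. "MAEB(audio-text, lite)"')
    parser.add_argument("-m", "--models", nargs="+", required=True,
                        help='Model names, e.g. "laion/clap-htsat-unfused"')
    parser.add_argument("-r", "--results-dir", default=None,
                        help="Optional custom results cache location")
    return parser.parse_args()
```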
Computes Spearman and Pearson correlations between MAEB lite and extended benchmark variants to validate that lite benchmarks preserve model rankings. Outputs correlation values and scatter plots (PNG and PDF). Co-Authored-By: Claude Opus 4.5 <[email protected]>
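The core of such a check is small; a sketch using scipy, assuming the per-model mean scores on the lite and extended variants are already aligned as two sequences:

```python
# Hedged sketch of the rank-agreement check, not the repo's script.
from scipy.stats import pearsonr, spearmanr


def ranking_agreement(lite_scores, extended_scores) -> dict[str, float]:
    """How well does the lite benchmark preserve the extended ranking?"""
    rho, rho_p = spearmanr(lite_scores, extended_scores)
    r, r_p = pearsonr(lite_scores, extended_scores)
    return {"spearman": rho, "spearman_p": rho_p, "pearson": r, "pearson_p": r_p}
```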



See the draft benchmarks. (For audio-text I actually use the full collection, no filtering) You'll also find the filtering notebook and the script to generate "Table 1".
@KennethEnevoldsen @AdnanElAssadi56 maybe another one for environmental or something?