Make birdset dataset handling more efficient #3863

AdnanElAssadi56 · 2026-01-05T09:13:01Z

If you add a model or a dataset, please add the corresponding checklist:

KennethEnevoldsen · 2026-01-05T09:26:57Z

@Samoed, can I ask you to take this one?

Samoed · 2026-01-05T10:31:36Z

mteb/tasks/audio/audio_multilabel_classification/eng/bird_set.py

+            lambda x: x[self.label_column_name] is not None and len(x[self.label_column_name]) > 0
+        )
+
+        # Only subsample splits that are larger than n_samples to avoid division by zero


Not sure why this is required

AdnanElAssadi56 · 2026-01-06

The label had the wrong type, and handling it was taking some time.
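For context, the guard in the diff above can be sketched in plain Python. This is a minimal illustration and not the PR's code: the actual task operates on Hugging Face dataset splits, and the names `filter_and_subsample`, `rows`, `label_column`, and `seed` are hypothetical.

```python
import random


def filter_and_subsample(rows, label_column, n_samples, seed=42):
    """Drop rows with missing or empty labels, then subsample only if needed."""
    # Keep rows whose label list exists and is non-empty (mirrors the lambda in the diff).
    kept = [
        row
        for row in rows
        if row[label_column] is not None and len(row[label_column]) > 0
    ]
    # Only subsample splits larger than n_samples; smaller splits pass through unchanged.
    if len(kept) > n_samples:
        rng = random.Random(seed)
        kept = rng.sample(kept, n_samples)
    return kept
```

A split that ends up with fewer than `n_samples` rows after filtering is returned as-is, which is the case the diff comment is guarding against.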

Samoed · 2026-01-07T15:59:22Z

Can you re-upload this dataset? You can do it with task.push_dataset_to_hub(f"mteb/{task.metadata.name}"). I tried to do this on Kaggle, but the casting used too much memory. This would help close #3499.
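A minimal sketch of the re-upload step described above. The helper name `reupload_task` and the `org` parameter are hypothetical, and calling `load_data()` before pushing is an assumption about what the task needs; only `push_dataset_to_hub` and `task.metadata.name` come from the comment itself.

```python
def reupload_task(task, org="mteb"):
    """Push a task's dataset to the Hub under <org>/<task name>."""
    # Repository name follows the pattern from the comment: mteb/{task.metadata.name}.
    repo_name = f"{org}/{task.metadata.name}"
    # Assumption: the data should be loaded (and transformed) before it is pushed.
    task.load_data()
    task.push_dataset_to_hub(repo_name)
    return repo_name
```

With mteb installed and Hub credentials configured, this would be called with the task object for this dataset. Since the casting step was memory-hungry on Kaggle, it is probably best run on a machine with more RAM.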

Samoed · 2026-01-07T17:32:06Z

By the way, do you know why we're using only the HSN subset?

AdnanElAssadi56 · 2026-01-09T06:08:07Z

@imadtyx I see that you integrated this first. Can you tell us why only the HSN subset is used?

Make birdset dataset handling more efficient

e8999b9

Samoed reviewed Jan 5, 2026
