
Conversation

@686f6c61 commented on Jan 9, 2026

Fixes #43116

Summary

This PR fixes multi-label classification bugs in run_classification.py and adds an optional confidence scores output, following community feedback.

Bug fixes

Fixed 4 bugs that broke multi-label classification with JSON datasets (a short sketch follows the list):

  1. Missing imports for SequenceFeature and expit
  2. Regression detection failed with an AttributeError on JSON datasets - fixed by adding a hasattr() check before accessing dtype
  3. Multi-label detection missed JSON-based datasets - fixed by adding an isinstance() check for SequenceFeature
  4. Predictions were missing the sigmoid activation - fixed by applying expit() before thresholding in both compute_metrics and do_predict
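
For context, a minimal sketch of what the four fixes boil down to. It is illustrative rather than the literal diff: I'm assuming the SequenceFeature import refers to datasets.Sequence, and label_feature/logits are stand-ins for the values handled in compute_metrics and do_predict.

import numpy as np
from datasets import Sequence, Value   # fix 1: feature type import (what the script calls SequenceFeature)
from scipy.special import expit        # fix 1: sigmoid applied before thresholding

# How a JSON list-of-ints label column is typed by the datasets library.
label_feature = Sequence(Value("int64"))

# Fix 2: guard the dtype access so list-typed label columns cannot raise AttributeError.
is_regression = hasattr(label_feature, "dtype") and label_feature.dtype in ("float32", "float64")

# Fix 3: detect multi-label datasets by feature type instead of by dtype.
is_multi_label = isinstance(label_feature, Sequence)

# Fix 4: pass logits through a sigmoid before thresholding; thresholding raw
# logits at 0.5 is what produced (near-)empty predictions.
logits = np.array([[2.1, -0.3], [-1.5, 0.8]])
binary_preds = (expit(logits) > 0.5).astype(int)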

New features

Based on feedback from @ziorufus, added configurable threshold and confidence scores output.

New parameters (see the sketch after the list):

  • --output_confidence_scores (bool, default: False) - Output JSON with confidence scores instead of binary predictions
  • --multi_label_threshold (float, default: 0.5) - Threshold for converting probabilities to binary predictions
  • --top_k_labels (int, optional) - Limit output to top K most confident labels
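
A sketch of how these could be declared, in the dataclass-plus-HfArgumentParser style the example scripts use (the class name MultiLabelOutputArguments is only for this sketch; in the script the fields would sit on the existing argument dataclass):

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MultiLabelOutputArguments:
    output_confidence_scores: bool = field(
        default=False,
        metadata={"help": "Write JSON with per-label confidence scores instead of binary predictions."},
    )
    multi_label_threshold: float = field(
        default=0.5,
        metadata={"help": "Probability threshold for converting sigmoid scores to binary predictions."},
    )
    top_k_labels: Optional[int] = field(
        default=None,
        metadata={"help": "If set, keep only the top K most confident labels per example."},
    )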

Output format follows the transformers Pipeline API convention:

Traditional mode (default):

index   prediction
0       ['positive', 'urgent']
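
A sketch of where each TSV row comes from once sigmoid and threshold have been applied (label_list and binary_preds are placeholder values, not the script's exact variable names):

label_list = ["positive", "urgent", "spam"]
binary_preds = [[1, 1, 0]]  # one example after sigmoid + threshold

for index, row in enumerate(binary_preds):
    item = [label_list[i] for i, flag in enumerate(row) if flag]
    print(f"{index}\t{item}")  # -> 0    ['positive', 'urgent']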

Confidence scores mode:

[
  {
    "index": 0,
    "predictions": [
      {"label": "positive", "score": 0.89},
      {"label": "urgent", "score": 0.67}
    ]
  }
]
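
A sketch of how the prediction step can assemble this structure from the sigmoid scores (label_list, logits, and the output file name are placeholders; the real script writes into --output_dir):

import json

import numpy as np
from scipy.special import expit

label_list = ["positive", "urgent", "spam"]
logits = np.array([[2.1, 0.7, -1.3]])  # one example, three labels
scores = expit(logits)

top_k = None  # --top_k_labels
records = []
for index, example_scores in enumerate(scores):
    # Sort labels by descending confidence, as the Pipeline API does.
    order = np.argsort(example_scores)[::-1]
    if top_k is not None:
        order = order[:top_k]
    records.append(
        {
            "index": index,
            "predictions": [{"label": label_list[i], "score": round(float(example_scores[i]), 4)} for i in order],
        }
    )

with open("predict_results.json", "w") as f:
    json.dump(records, f, indent=2)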

Backward compatibility

Default behavior unchanged. New features require explicit flags. No breaking changes.

Testing

Validated with:

  • Ruff linting and formatting (all checks passed)
  • Python syntax check
  • Logic validation with simulated multi-label data
  • Backward compatibility verification

Test scenarios:

  • Traditional TSV output with default threshold
  • JSON confidence scores output
  • Custom threshold values
  • Top-K filtering

Usage examples

Traditional mode:

python run_classification.py \
  --model_name_or_path bert-base-uncased \
  --test_file test.json \
  --do_predict \
  --output_dir ./output

Confidence scores:

python run_classification.py \
  --model_name_or_path bert-base-uncased \
  --test_file test.json \
  --do_predict \
  --output_confidence_scores \
  --output_dir ./output

Custom threshold:

python run_classification.py \
  --model_name_or_path bert-base-uncased \
  --test_file test.json \
  --do_predict \
  --multi_label_threshold 0.3 \
  --output_dir ./output

Top-K labels:

python run_classification.py \
  --model_name_or_path bert-base-uncased \
  --test_file test.json \
  --do_predict \
  --output_confidence_scores \
  --top_k_labels 3 \
  --output_dir ./output

Implementation notes

  • Follows Pipeline API format used by text-classification and zero-shot pipelines
  • Scores sorted descending by confidence
  • JSON output enables downstream processing and custom threshold application (see the sketch after this list)
  • Type hints and documentation complete
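
For instance, a downstream consumer can re-threshold the saved scores without rerunning prediction (a minimal sketch; predict_results.json is a placeholder name for the JSON file written above):

import json

THRESHOLD = 0.3  # stricter or looser than the 0.5 used at prediction time

with open("predict_results.json") as f:
    records = json.load(f)

for record in records:
    kept = [p["label"] for p in record["predictions"] if p["score"] >= THRESHOLD]
    print(record["index"], kept)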

Changes

  • examples/pytorch/text-classification/run_classification.py (+78, -20)
  • 3 new parameters added
  • 1 import added
  • 58 net lines added

Fixes four bugs that prevented multi-label classification from working
with JSON data files:

1. AttributeError when detecting regression (line 416)
2. AttributeError in regression type casting (line 432)
3. AttributeError when detecting multi-label (line 442)
4. Empty predictions due to missing sigmoid (lines 651, 718)

Changes:
- Add hasattr() checks before accessing dtype attribute
- Use isinstance() for proper multi-label detection
- Apply sigmoid before thresholding in predictions

All changes are backwards compatible and tested with single-label,
multi-label, and regression tasks.

Fixes huggingface#43116

Fix formatting issues detected by CircleCI check_code_quality.

Implement configurable threshold and confidence scores output following
transformers Pipeline API conventions:

- Add --output_confidence_scores flag (default: False for backward compatibility)
- Add --multi_label_threshold parameter (default: 0.5, configurable)
- Add --top_k_labels parameter to limit output to top K labels
- Output JSON format with {"label": str, "score": float} when enabled
- Maintain backward compatible TSV format when disabled

This addresses feedback from issue huggingface#43116 to provide more flexibility
for multi-label classification workflows.
@686f6c61 (Author) commented:

Updated this PR to include confidence scores output based on @ziorufus's feedback.

What changed

Added three new parameters for multi-label classification:

  • --output_confidence_scores - Output JSON with scores instead of binary 0/1 (default: False for backward compatibility)
  • --multi_label_threshold - Configurable threshold for binary predictions (default: 0.5, was hardcoded before)
  • --top_k_labels - Limit to top K most confident labels (optional)

Implementation

Following the transformers Pipeline API convention, the output format is:

[
  {
    "index": 0,
    "predictions": [
      {"label": "positive", "score": 0.89},
      {"label": "urgent", "score": 0.67}
    ]
  }
]

This matches text-classification and zero-shot pipelines for consistency.
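
For reference, this is the shape the text-classification pipeline already returns when asked for all scores (a sketch; the model name is only an example):

from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # example model
    top_k=None,  # return scores for every label, not just the best one
)
result = clf("This is urgent and positive")
# result contains {"label": ..., "score": ...} dicts sorted by descending score,
# the same shape this PR writes per example.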

Why I think this could be useful

@ziorufus suggested that outputting raw scores could give users more flexibility to:

  • Apply custom thresholds post-prediction
  • See model confidence per label
  • Integrate with downstream systems more easily

I implemented this approach, but I'm open to feedback and suggestions if there is a better way to handle it.

Default behavior unchanged - traditional TSV output with threshold=0.5. New features are opt-in.

Testing

All code quality checks passed:

  • Ruff linting
  • Ruff formatting
  • Logic validation
  • Backward compatibility verified


Development

Successfully merging this pull request may close these issues.

Multi-label classification always returns empty results in run_classification.py example script
