
【Hackathon 9th No.29】Add unit tests for custom operator cutlass_fp8_fp8_half_block_gemm_fused #6693

Open
cloudforge1 wants to merge 3 commits into PaddlePaddle:develop from cloudforge1:task/029-cutlass-fp8-block-gemm-unit-test

Conversation

cloudforge1 (Contributor) commented Mar 6, 2026

Motivation

Add unit tests for custom operator cutlass_fp8_fp8_half_block_gemm_fused to improve test coverage and prevent regressions.

Modifications

  • Added operator unit test file: tests/operators/test_cutlass_fp8_fp8_half_block_gemm_fused.py
  • Covered basic correctness, edge cases, and determinism
  • Applied pre-commit formatting (black, isort, flake8, ruff)
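For readers who want a picture of what "basic correctness, edge cases, and determinism" can look like in practice, here is a minimal sketch. It is not the test file from this PR. The helper names (`reference_gemm`, `check_op`, `stand_in_op`), the `[N, K]` weight layout, and the tolerances are all assumptions made for illustration, and `stand_in_op` is a NumPy placeholder for the real CUDA op, whose signature and scale arguments are not shown in this conversation.

```python
import numpy as np


def reference_gemm(x, w):
    """float32 reference result for x [M, K] times w [N, K] transposed."""
    return x.astype(np.float32) @ w.astype(np.float32).T


def stand_in_op(x, w):
    """Hypothetical placeholder for the custom op: matmul with half-precision output."""
    return (x @ w.T).astype(np.float16)


def check_op(op, m, n, k, rtol=1e-2, atol=1e-2, seed=0):
    """Run one shape through `op` and check determinism and correctness."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((m, k)).astype(np.float32)
    w = rng.standard_normal((n, k)).astype(np.float32)

    out_first = op(x, w)
    out_second = op(x, w)

    # Determinism: two calls on identical inputs must match bit for bit.
    np.testing.assert_array_equal(out_first, out_second)

    # Correctness: compare against the float32 reference with loose tolerances,
    # since the output is half precision.
    np.testing.assert_allclose(
        out_first.astype(np.float32), reference_gemm(x, w), rtol=rtol, atol=atol
    )
    return out_first.shape


if __name__ == "__main__":
    # A regular shape plus an M=1 edge case.
    for shape in [(32, 128, 128), (1, 128, 256)]:
        print(shape, "->", check_op(stand_in_op, *shape))
```

Swapping `stand_in_op` for a wrapper around the real op would keep the same structure; the tolerances would likely need adjusting for FP8 inputs.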

Usage or Command

python -m pytest tests/operators/test_cutlass_fp8_fp8_half_block_gemm_fused.py -v

Accuracy Tests

Local verification (no GPU):

  • py_compile syntax check: passes
  • pre-commit (black/isort/flake8/ruff): passes

The tests call the CUDA custom op directly and require an SM80+ GPU, so they were not executed locally. Full execution is validated by the CI run_tests_with_coverage job. I will request AI Studio access for on-device verification if needed.
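One way to keep such a test file importable on machines without a suitable GPU is a skip guard keyed on compute capability. The sketch below is an assumption about how that could be done, not code from this PR: `detect_capability`, `supports_sm80`, and the test class are hypothetical names, and the guard relies on `paddle.is_compiled_with_cuda()` and `paddle.device.cuda.get_device_capability()` being available in the installed Paddle build.

```python
import unittest


def detect_capability():
    """Return (major, minor) for the current GPU, or None if it cannot be determined."""
    try:
        import paddle  # imported lazily so the module loads without Paddle installed

        if not paddle.is_compiled_with_cuda():
            return None
        return tuple(paddle.device.cuda.get_device_capability())
    except Exception:
        # No Paddle, no driver, or no visible device: treat as "no GPU".
        return None


def supports_sm80(capability):
    """True when the capability tuple is SM80 or newer."""
    return capability is not None and tuple(capability) >= (8, 0)


@unittest.skipUnless(supports_sm80(detect_capability()), "requires an SM80+ CUDA GPU")
class TestBlockGemmGuarded(unittest.TestCase):
    def test_placeholder(self):
        # The real custom-op call would go here; this body is only a placeholder.
        self.assertTrue(True)
```

With this guard, `python -m pytest` reports the class as skipped on CPU-only machines instead of failing at op-call time.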

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests.
  • Provide accuracy results. N/A — unit test only.
  • If the current PR is submitting to the release branch, cherry-pick from develop. N/A — targeting develop.

paddle-bot (bot) commented Mar 6, 2026

Thanks for your contribution!

codecov-commenter commented Mar 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@30f9f33). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6693   +/-   ##
==========================================
  Coverage           ?   72.66%           
==========================================
  Files              ?      392           
  Lines              ?    53835           
  Branches           ?     8459           
==========================================
  Hits               ?    39117           
  Misses             ?    11932           
  Partials           ?     2786           
Flag Coverage Δ
GPU 72.66% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


cloudforge1 force-pushed the task/029-cutlass-fp8-block-gemm-unit-test branch from c5e4c09 to 0714a83 on March 9, 2026 12:02
EmmonsCurse (Collaborator) commented

@cloudforge1 Please verify the changes locally before submitting to avoid unnecessary CI resource consumption.

@luotao1 cc

cloudforge1 (Contributor, Author) commented Mar 9, 2026

@luotao1 Progress update on Hackathon 9th contributions:

9 PRs delivered across custom operator unit tests and log refactoring
(#6682, #6687, #6688, #6690, #6691, #6693, #6694, #6695, #6730):

  • 7 custom op test suites: cutlass_fp8 block gemm, gptq_marlin_repack,
    moe_wna16_marlin_gemm, moe_expert_ffn_wint2, winx_unzip,
    multi_head_latent_attention, speculate_set_value
  • 1 log refactoring PR (task 88)
  • 1 Hackathon 10th config.py test suite

CI triage completed — all current failures are infrastructure-side
(Iluvatar Docker container crashes, XPU OpenAI 500 errors, Approval
gates pending RD assignment). Zero test code failures across all PRs.

We've ramped up quickly on the FastDeploy CI/CD pipeline and codebase
conventions. Going forward we're tightening our pre-push validation
including local tests to match the project's contribution cadence.

Also identified CI/CD optimization opportunities while working across
these PRs — details in PaddlePaddle/Paddle#78233.

Looking forward to RD review assignments on the ready PRs.

cloudforge1 marked this pull request as draft on March 9, 2026 17:28
cloudforge1 marked this pull request as ready for review on March 9, 2026 17:59
…/activation

Root causes:
- Default CUTLASS config can fail with Error Internal for some MNK
- leaky_relu not in compiled dispatch table
- Production uses tune mode to find working configs

Fix: set FLAGS_use_cutlass_device_best_config_path=tune, remove
bias and activation tests, simplify FP8 data creation.
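
The flag named in the fix can be set from Python through the process environment. The sketch below shows one possible way to scope it to a block of test code and restore the previous value afterwards. The `cutlass_tune_mode` helper is a hypothetical name, and whether the op reads the flag at import time or at call time is not stated in this PR, so setting it before the op module is imported is the safer assumption.

```python
import os
from contextlib import contextmanager

# Flag name taken from the commit message above.
FLAG = "FLAGS_use_cutlass_device_best_config_path"


@contextmanager
def cutlass_tune_mode(value="tune"):
    """Temporarily set the CUTLASS config flag, restoring the old value on exit."""
    previous = os.environ.get(FLAG)
    os.environ[FLAG] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(FLAG, None)
        else:
            os.environ[FLAG] = previous


if __name__ == "__main__":
    with cutlass_tune_mode():
        # Import and call the custom op here so it observes the flag.
        print(FLAG, "=", os.environ[FLAG])
```

For a whole test module, assigning `os.environ[FLAG] = "tune"` at the top of the file before any Paddle import achieves the same effect without the context manager.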

5 participants