
[Question] Int8 Gemm's perf degraded in real models. #2351

@foreverlms

Description


I have encountered a problem: when I benchmark the

```cpp
void genericInt8GemmKernelLauncher(int8_t const* A, int8_t const* B, tk::QuantMode quantOption, float const* alphaCol,
    float const* alphaRow, T* C, int m, int n, int k, tkc::CutlassGemmConfig gemmConfig, char* workspace,
```

int8 cutlass kernel separately, the kernel clearly beats the fp16 GEMM kernel. But in a real model, the int8 GEMM kernel's performance degrades a lot. (I did use gemmprofilerplugin.)
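
For context, a minimal sketch of what such a standalone timing loop might look like (the harness below is an assumption, not the actual benchmark code; `launchGemm` is a hypothetical functor wrapping a `genericInt8GemmKernelLauncher` call with fixed shape and `gemmConfig`). Note that repeating the same launch back to back keeps A and B resident in L2, which a real model does not guarantee:

```cpp
#include <cuda_runtime.h>
#include <functional>

// Average time of one kernel launch in microseconds, measured with CUDA events.
float timeKernelUs(const std::function<void(cudaStream_t)>& launchGemm,
                   int warmup = 10, int iters = 100)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (int i = 0; i < warmup; ++i) launchGemm(stream); // warm caches and clocks

    cudaEventRecord(start, stream);
    for (int i = 0; i < iters; ++i) launchGemm(stream);  // same buffers every iteration
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    return ms * 1000.f / iters;
}
```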

Two pictures from Nsight, int8 GEMM (first: standalone kernel benchmark, second: in a real model):

[Image: int8 GEMM kernel in the standalone benchmark]

[Image: int8 GEMM kernel in a real model]

Device: A100 SXM-80GB

For m=16, n=6144, k=4096, the kernel time goes from 14 us to 24 us, almost doubled. The GEMM config is exactly the same, as Nsight shows.
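
A rough back-of-the-envelope check (my own estimate, not from the profile): at m=16 the ~25 MB int8 weight matrix dominates memory traffic, so the kernel should be bandwidth-bound, and the two timings imply quite different effective bandwidths:

```cpp
#include <cstdio>

int main()
{
    // Traffic estimate for m=16, n=6144, k=4096, counting only the dominant
    // int8 weight matrix B of n*k bytes (an assumption; A and C are tiny here).
    const double bytes = 6144.0 * 4096.0;       // ~25.2 MB
    const double usFast = 14.0, usSlow = 24.0;  // timings reported above
    printf("benchmark: %.2f TB/s\n", bytes / (usFast * 1e-6) / 1e12); // ~1.80 TB/s
    printf("in model:  %.2f TB/s\n", bytes / (usSlow * 1e-6) / 1e12); // ~1.05 TB/s
    // The A100-80GB HBM2e peak is ~2.0 TB/s and its L2 is 40 MB, so the faster
    // figure is plausible only near peak bandwidth or with substantial L2 reuse.
}
```

If that estimate is roughly right, the gap looks like warm L2 in the tight benchmark loop versus cold weights in the real model, rather than a different kernel configuration.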


Labels: Low Precision, question, triaged, waiting for feedback
