Closed
Labels: Low Precision, question, triaged, waiting for feedback
Description
I have encountered a problem when benchmarking the int8 GEMM kernel below: launched as a standalone kernel it performs as expected.
TensorRT-LLM/cpp/tensorrt_llm/kernels/cutlass_kernels/int8_gemm/int8_gemm_template.h, lines 62 to 63 in a65dba7:
void genericInt8GemmKernelLauncher(int8_t const* A, int8_t const* B, tk::QuantMode quantOption, float const* alphaCol,
    float const* alphaRow, T* C, int m, int n, int k, tkc::CutlassGemmConfig gemmConfig, char* workspace,
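By the separate-kernel benchmark I mean an isolated timing loop along these lines (a sketch with hypothetical names: timeKernelUs and runGemm are not TensorRT-LLM APIs; runGemm stands in for whatever invokes genericInt8GemmKernelLauncher with the profiled CutlassGemmConfig):

#include <cuda_runtime.h>
#include <functional>

// Times an opaque GEMM launch with CUDA events and returns the average
// microseconds per launch. Warm-up iterations exclude one-time costs.
float timeKernelUs(const std::function<void(cudaStream_t)>& runGemm, int iters = 100)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (int i = 0; i < 10; ++i)   // warm-up
        runGemm(stream);

    cudaEventRecord(start, stream);
    for (int i = 0; i < iters; ++i)
        runGemm(stream);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    return ms * 1000.f / iters;
}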
But when it comes to a real model, the int8 GEMM kernel's performance degrades a lot (I did use GemmProfilerPlugin).
Two pictures from Nsight (above: separate kernel benchmark, below: in the real model):
int8 in benchmark: (screenshot)
int8 in models: (screenshot)
Device: A100 SXM-80GB
For (m, n, k) = (16, 6144, 4096): 14 us -> 24 us, almost doubled. The config is exactly the same, as Nsight shows.
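A rough back-of-the-envelope check (a sketch, assuming the k x n int8 weight matrix dominates memory traffic and the output is fp16) puts the two timings in context:

#include <cstdio>

int main()
{
    const double m = 16, n = 6144, k = 4096;
    const double bytesA = m * k;        // ~64 KB int8 activations
    const double bytesB = k * n;        // ~25.2 MB int8 weights (dominant term)
    const double bytesC = m * n * 2;    // ~192 KB output, assuming T = half
    const double bytesTotal = bytesA + bytesB + bytesC;

    // Implied effective bandwidth in GB/s for the two measured times.
    const double benchGBs = bytesTotal / (14e-6) / 1e9;  // ~1800 GB/s at 14 us
    const double modelGBs = bytesTotal / (24e-6) / 1e9;  // ~1060 GB/s at 24 us

    // A100 SXM 80GB peak HBM bandwidth is roughly 2000 GB/s, so the standalone
    // run is near memory-bound peak while the in-model run is well below it.
    std::printf("benchmark: %.0f GB/s, in-model: %.0f GB/s\n", benchGBs, modelGBs);
    return 0;
}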