[benchmarks] Add options to print SW efficiency #5493
base: main
@etiotto @whitneywhtsang What do you think about this function? Would you use it?
@whitneywhtsang This is unrelated to grafana HW efficiency calculations, as we can probably just do it on the fly, don't need to change it in the repo. This PR is for local dev runs.
For me, I would only use it if hardware capability is provided automatically, and have NV SW efficiency as reference.
etiotto
left a comment
I like the idea but, as you mentioned in the description, my preference would be to encode the HW capabilities into the file, and detect the HW the benchmark is running on automatically.
Current output:
@etiotto @whitneywhtsang I added automatic knowledge about hardware capability, so now you can just call
Test run just in case: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/19506637222
Closes #5514
User provides hardware capability with a call like this:
python benchmarks/triton_kernels_benchmark/gemm_tensor_desc_benchmark.py --hw_gbps 1229 --hw_tflops 356 --brief
and we can print software efficiency and save it to the report as well.
If this functionality is popular and required, we could save hardware capability to the file and read it automatically, maybe with a call to scripts/capture-hw-details.sh before the script or during the benchmark. Then the user will not have to provide device properties.
We potentially could also just print one efficiency (max between compute and memory).
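For illustration only, here is a minimal sketch of how software efficiency could be derived from the two hardware peaks passed on the command line. This is not the PR's actual implementation: the function name `sw_efficiency` and the measured values are hypothetical, and it assumes efficiency is simply achieved throughput divided by the hardware peak, with the single reported number being the larger of the compute and memory ratios.

```python
def sw_efficiency(achieved_tflops, achieved_gbps, hw_tflops, hw_gbps):
    """Return (compute %, memory %, max %) relative to the hardware peaks.

    Hypothetical helper: assumes efficiency = achieved / peak.
    """
    if hw_tflops <= 0 or hw_gbps <= 0:
        raise ValueError("hardware peaks must be positive")
    compute_eff = 100.0 * achieved_tflops / hw_tflops
    memory_eff = 100.0 * achieved_gbps / hw_gbps
    # A single summary value: whichever resource the kernel uses best.
    return compute_eff, memory_eff, max(compute_eff, memory_eff)


# Peaks taken from the example command (--hw_tflops 356, --hw_gbps 1229);
# the achieved numbers are made up for the sake of the example.
compute_eff, memory_eff, best = sw_efficiency(
    achieved_tflops=178.0, achieved_gbps=122.9, hw_tflops=356, hw_gbps=1229
)
print(f"compute: {compute_eff:.1f}%  memory: {memory_eff:.1f}%  max: {best:.1f}%")
```

Taking the maximum reflects the idea that a kernel is limited by either compute or memory, so the higher ratio shows how close it gets to whichever limit applies.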
Example output: