Blocking gemm #110


mgates3 (Collaborator) commented Apr 28, 2025

[Based on PR #99.]

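Implements a basic blocked, templated gemm. This improves performance, particularly when A is transposed: in the single-threaded runs below, the s and d cases with transA = trans go from about 2 gflop/s to 19–38 gflop/s. It is still a long way from optimized libraries such as OpenBLAS, which remains roughly 2x to 6x faster in these runs.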
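
For readers unfamiliar with the technique, here is a minimal sketch of a blocked, templated gemm. It is not the code in this PR: the function name `blocked_gemm_sketch`, the block size `nb = 64`, and the loop order are assumptions chosen for illustration, and it covers only the column-major, notrans/notrans case.

```cpp
#include <algorithm>
#include <complex>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of BLAS++.
// Computes C = alpha*A*B + beta*C for column-major matrices, no transposes.
// A is m-by-k (lda >= m), B is k-by-n (ldb >= k), C is m-by-n (ldc >= m).
template <typename T>
void blocked_gemm_sketch(
    int64_t m, int64_t n, int64_t k,
    T alpha, const T* A, int64_t lda,
             const T* B, int64_t ldb,
    T beta,        T* C, int64_t ldc,
    int64_t nb = 64 )  // block size: an arbitrary illustrative choice
{
    // Scale C by beta once, up front. If beta is zero, overwrite C so that
    // NaN or Inf already in C does not propagate.
    for (int64_t j = 0; j < n; ++j)
        for (int64_t i = 0; i < m; ++i)
            C[ i + j*ldc ] = (beta == T(0)) ? T(0) : beta * C[ i + j*ldc ];

    // Loop over tiles so that each tile of A, B, and C is reused
    // while it is still in cache.
    for (int64_t jj = 0; jj < n; jj += nb) {
        int64_t jend = std::min( jj + nb, n );
        for (int64_t ll = 0; ll < k; ll += nb) {
            int64_t lend = std::min( ll + nb, k );
            for (int64_t ii = 0; ii < m; ii += nb) {
                int64_t iend = std::min( ii + nb, m );
                // Tile kernel: the innermost loop runs down a column,
                // which is unit stride in column-major storage.
                for (int64_t j = jj; j < jend; ++j) {
                    for (int64_t l = ll; l < lend; ++l) {
                        T alpha_Blj = alpha * B[ l + j*ldb ];
                        for (int64_t i = ii; i < iend; ++i)
                            C[ i + j*ldc ] += A[ i + l*lda ] * alpha_Blj;
                    }
                }
            }
        }
    }
}
```

A call such as `blocked_gemm_sketch<double>( m, n, k, alpha, A, lda, B, ldb, beta, C, ldc )` uses the default block size. A transposed operand would change only the indexing in the tile kernel (for example `A[ l + i*lda ]` for A transposed), and blocking is what keeps that strided access confined to a cache-sized tile.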

Compare performance after this change (id e29bac75):

blaspp/test> setenv OMP_NUM_THREADS 1
blaspp/test> ./tester --dim 1000,1024 --type s,d,c,z --transA n,t --transB n,t gemm
BLAS++ version 2024.10.26, id e29bac75,  OpenBLAS 0.3.29
input: ./tester --dim '1000,1024' --type 's,d,c,z' --transA 'n,t' --transB 'n,t' gemm

type  layout   transA   transB       m       n       k      alpha       beta     error   time (s)       gflop/s  ref time (s)   ref gflop/s  status
   s     col  notrans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  6.23e-09     0.0453        44.104        0.0215        93.127  pass
   s     col  notrans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  6.24e-09     0.0501        39.929        0.0204        97.962  pass
   s     col    trans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  6.24e-09     0.0527        37.946        0.0204        97.914  pass
   s     col    trans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  6.22e-09     0.0583        34.295        0.0204        97.924  pass

   d     col  notrans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  9.11e-18     0.0967        20.682        0.0413        48.417  pass
   d     col  notrans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  9.11e-18     0.0999        20.020        0.0411        48.668  pass
   d     col    trans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  9.11e-18      0.102        19.681        0.0407        49.162  pass
   d     col    trans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  9.10e-18      0.105        19.023        0.0405        49.416  pass

   c     col  notrans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  1.75e-09      0.631        12.680        0.0997        80.226  pass
   c     col  notrans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  1.75e-09      0.634        12.621        0.0996        80.287  pass
   c     col    trans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  1.75e-09      0.646        12.383         0.101        79.401  pass
   c     col    trans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  1.75e-09      0.646        12.378        0.0997        80.251  pass

   z     col  notrans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  2.81e-18      0.712        11.230         0.201        39.831  pass
   z     col  notrans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  2.81e-18      0.744        10.752         0.200        39.939  pass
   z     col    trans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  2.81e-18      0.651        12.291         0.201        39.810  pass
   z     col    trans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  2.81e-18      0.663        12.071         0.200        39.902  pass

with performance before this change (id 952c1da9):

blaspp/test> ./tester --dim 1000,1024 --type s,d,c,z --transA n,t --transB n,t gemm
BLAS++ version 2024.10.26, id 952c1da9,  OpenBLAS 0.3.29
input: ./tester --dim '1000,1024' --type 's,d,c,z' --transA 'n,t' --transB 'n,t' gemm

type  layout   transA   transB       m       n       k      alpha       beta     error   time (s)       gflop/s  ref time (s)   ref gflop/s  status
   s     col  notrans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  1.23e-08     0.0894        22.360        0.0209        95.841  pass
   s     col  notrans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  1.23e-08     0.0912        21.938        0.0204        97.843  pass
   s     col    trans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  1.05e-08      0.807         2.478        0.0205        97.347  pass
   s     col    trans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  1.05e-08      1.148         1.742        0.0206        97.120  pass

   d     col  notrans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  2.18e-17      0.170        11.785        0.0407        49.111  pass
   d     col  notrans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  2.17e-17      0.173        11.556        0.0406        49.311  pass
   d     col    trans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  1.94e-17      1.162         1.721        0.0407        49.151  pass
   d     col    trans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  1.94e-17      1.150         1.740        0.0405        49.358  pass

   c     col  notrans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  4.15e-09      0.714        11.211        0.0998        80.166  pass
   c     col  notrans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  4.15e-09      0.719        11.121        0.0997        80.278  pass
   c     col    trans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  3.72e-09      0.918         8.717        0.0999        80.045  pass
   c     col    trans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  3.72e-09      0.920         8.697        0.0996        80.329  pass

   z     col  notrans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  7.55e-18      0.730        10.966         0.204        39.144  pass
   z     col  notrans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  7.55e-18      0.765        10.462         0.200        39.936  pass
   z     col    trans  notrans    1000    1000    1000   3.1+1.4i   2.7+1.7i  6.74e-18      0.927         8.627         0.201        39.745  pass
   z     col    trans    trans    1000    1000    1000   3.1+1.4i   2.7+1.7i  6.72e-18      0.936         8.546         0.200        39.906  pass
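
The gflop/s columns above are consistent with the usual nominal gemm operation count: 2mnk flops for real types and 8mnk for complex types (a complex multiply counted as 6 flops and a complex add as 2). The sketch below reproduces that arithmetic. The helper name `gemm_gflops_sketch` is hypothetical, and the tester's exact formula may include lower-order terms that are not modeled here.

```cpp
#include <cstdint>

// Hypothetical helper, not part of BLAS++.
// Nominal gemm rate in gflop/s, assuming 2*m*n*k flops for real types
// and 8*m*n*k flops for complex types.
inline double gemm_gflops_sketch(
    int64_t m, int64_t n, int64_t k, double seconds, bool is_complex )
{
    double flops_per_multiply_add = is_complex ? 8.0 : 2.0;
    double flops = flops_per_multiply_add * double( m ) * double( n ) * double( k );
    return flops / seconds * 1e-9;
}
```

For example, the first s row after the change (m = n = k = 1000, 0.0453 s) gives about 44 gflop/s, matching the reported 44.104 to within the rounding of the printed time.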
