adityasz/matmul

Fast FP32 matmul in C++

A row-major implementation of sgemm.c that uses compile-time metaprogramming for a 60% reduction in lines of code!

The kernel is templated, and g++ 14 or newer does not spill registers when the kernel uses C-style arrays.
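To illustrate the idea, here is a minimal sketch of a templated micro-kernel that accumulates into a C-style array. It is not the code in this repository: the names `micro_kernel` and `matmul`, the default tile sizes `MR = 4` and `NR = 8`, and the requirement that the matrix dimensions be multiples of the tile sizes are all assumptions made for the example. Because the tile bounds are template parameters, the inner loops have compile-time trip counts, which gives the compiler a chance to unroll them and keep the accumulators in registers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical micro-kernel: C[MR x NR] += A[MR x k] * B[k x NR], all row-major.
// lda, ldb, ldc are the row strides (leading dimensions) of A, B and C.
template <std::size_t MR, std::size_t NR>
void micro_kernel(const float* a, const float* b, float* c, std::size_t k,
                  std::size_t lda, std::size_t ldb, std::size_t ldc) {
    // C-style accumulator array with compile-time bounds.
    float acc[MR][NR] = {};

    for (std::size_t p = 0; p < k; ++p) {
        for (std::size_t i = 0; i < MR; ++i) {
            const float a_ip = a[i * lda + p];
            for (std::size_t j = 0; j < NR; ++j) {
                acc[i][j] += a_ip * b[p * ldb + j];
            }
        }
    }

    // Write the accumulated tile back into C.
    for (std::size_t i = 0; i < MR; ++i) {
        for (std::size_t j = 0; j < NR; ++j) {
            c[i * ldc + j] += acc[i][j];
        }
    }
}

// Hypothetical driver: C (m x n) += A (m x k) * B (k x n), row-major.
// Assumes m is a multiple of MR and n is a multiple of NR (no edge handling).
template <std::size_t MR = 4, std::size_t NR = 8>
void matmul(const float* a, const float* b, float* c,
            std::size_t m, std::size_t n, std::size_t k) {
    assert(m % MR == 0 && n % NR == 0);
    for (std::size_t i = 0; i < m; i += MR) {
        for (std::size_t j = 0; j < n; j += NR) {
            micro_kernel<MR, NR>(a + i * k, b + j, c + i * n + j, k, k, n, n);
        }
    }
}
```

The caller is expected to zero-initialize `C` before the call, since the sketch accumulates into it. A real kernel would also add cache blocking, packing, and handling for edge tiles, all of which are omitted here.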

Gets ~97% of the speed of Intel MKL on certain shapes (single-threaded); see TODO.

TODO

  • Tune block sizes for Alder Lake
  • Parallelize with OpenMP
