fixup! New Xe DPAS MMA atoms, part 2
Avoid allocating buffer_C if not necessary (#461)
Enable int8_t MMA for mixed dtype (#460)
Create a helper for constructing tiled copies of default size (#454) (see the tiled-copy sketch after this list)
Header file inclusion (#458)
June release changelog (#451)
Refactor tests for Flash Attention Prefill (#446)
Separate output and accumulator type for Flash Attention Prefill Cac…
Separate output and accumulator type for Flash Attention Prefill (#443)
A16S8 gemm && tensor-wise quantization (
#441 )
Pull request merge
FP8 Grouped GEMM CollectiveMma (#351)
Implement zero data type int4_t and add cases (#440)
Check copy alignment for MMA and Epilogue (#438)
Add FP8 input support for Flash Attention Prefill (#419)
Support different scale/zero data types (int8, bf16, fp16) for mixed …
Fix deprecated FetchContent usage (#434)
Update PVC drivers (#391)
Add documentation for 2D copy (#386)
A(16-bit) x B(8-bit) GEMM (#416)
Add Paged Attention for Flash Attention Decode (#403)
Fix for U8 transpose (#392)
Move FP8 conversion to NumericArrayConverter (#424) (see the conversion sketch after this list)
Remove unused metadata ValueShape (#430)
Add data type conversion support in epilogue (#418)
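
For context on #424 and #418, here is a minimal host-side sketch of element-wise fragment conversion with cutlass::NumericArrayConverter. The FP8 (e4m3) source type, FP16 target type, and fragment width N are illustrative choices, not details taken from those PRs.

```cpp
#include <cstdio>

#include "cutlass/array.h"
#include "cutlass/numeric_conversion.h"
#include "cutlass/numeric_types.h"

int main() {
  constexpr int N = 8;  // fragment width; illustrative only

  // Source fragment of FP8 (e4m3) values.
  cutlass::Array<cutlass::float_e4m3_t, N> src;
  for (int i = 0; i < N; ++i) {
    src[i] = cutlass::float_e4m3_t(0.25f * float(i));
  }

  // NumericArrayConverter lifts the scalar conversion over the whole
  // fragment; FP8 -> FP16 here stands in for whatever pair a kernel uses.
  cutlass::NumericArrayConverter<cutlass::half_t, cutlass::float_e4m3_t, N> convert;
  cutlass::Array<cutlass::half_t, N> dst = convert(src);

  for (int i = 0; i < N; ++i) {
    printf("src[%d]=%f dst[%d]=%f\n", i, float(src[i]), i, float(dst[i]));
  }
  return 0;
}
```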
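
Similarly for #454, a sketch of what constructing a tiled copy by hand looks like in CuTe; the UniversalCopy atom and the 16x8 thread / 1x1 value layouts are placeholder choices to show the pieces a default-size helper would fill in, not the Xe-specific defaults from that PR.

```cpp
#include <cute/tensor.hpp>

int main() {
  using namespace cute;

  // Assemble a tiled copy from a copy atom plus thread and value layouts.
  // The atom and layouts below are placeholders, not the defaults from #454.
  auto tiled_copy = make_tiled_copy(
      Copy_Atom<UniversalCopy<float>, float>{},  // per-thread copy instruction
      Layout<Shape<_16, _8>>{},                  // 16x8 threads cover the tile
      Layout<Shape<_1, _1>>{});                  // one value per thread per step

  print(tiled_copy);  // dump the resulting thread/value tiling for inspection
  return 0;
}
```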