fixup! New Xe DPAS MMA atoms, part 2
Avoid allocating buffer_C if not necessary (#461)
Enable int8_t MMA for mixed dtype (#460)
Create a helper for constructing tiled copies of default size (#454) (see the tiled-copy sketch after this list)
Header file inclusion (#458)
June release changelog (#451)
Refactor tests for Flash Attention Prefill (#446)
Separate output and accumulator type for Flash Attention Prefill Cac…
Separate output and accumulator type for Flash Attention Prefill (#443)
A16S8 gemm && tensor-wise quantization (
#441 )
Pull request merge
FP8 Grouped GEMM CollectiveMma (#351)
Implement zero data type int4_t and add cases (#440)
Check copy alignment for MMA and Epilogue (#438)
Add FP8 input support for Flash Attention Prefill (#419)
Support different scale/zero data types (int8, bf16, fp16) for mixed …
Fix deprecated FetchContent usage (#434)
Update PVC drivers (#391)
Add documentation for 2D copy (#386)
A(16-bit) x B(8-bit) GEMM (#416)
Add Paged Attention for Flash Attention Decode (#403)
Fix for U8 transpose (#392)
Move FP8 conversion to NumericArrayConverter (#424) (see the conversion sketch after this list)
Remove unused metadata ValueShape (#430)
Add data type conversion support in epilogue (#418)
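
For context on #424 and #418, here is a minimal host-side sketch of element-wise fragment conversion with cutlass::NumericArrayConverter. The FP8 (e4m3) source type, FP16 target type, and fragment width N are illustrative choices, not details taken from those PRs.

```cpp
#include <cstdio>

#include "cutlass/array.h"
#include "cutlass/numeric_conversion.h"
#include "cutlass/numeric_types.h"

int main() {
  constexpr int N = 8;  // fragment width; illustrative only

  // Source fragment of FP8 (e4m3) values.
  cutlass::Array<cutlass::float_e4m3_t, N> src;
  for (int i = 0; i < N; ++i) {
    src[i] = cutlass::float_e4m3_t(0.25f * float(i));
  }

  // NumericArrayConverter lifts the scalar conversion over the whole
  // fragment; FP8 -> FP16 here stands in for whatever pair a kernel uses.
  cutlass::NumericArrayConverter<cutlass::half_t, cutlass::float_e4m3_t, N> convert;
  cutlass::Array<cutlass::half_t, N> dst = convert(src);

  for (int i = 0; i < N; ++i) {
    printf("src[%d]=%f dst[%d]=%f\n", i, float(src[i]), i, float(dst[i]));
  }
  return 0;
}
```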
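
Similarly for #454, a sketch of what constructing a tiled copy by hand looks like in CuTe; the UniversalCopy atom and the 16x8 thread / 1x1 value layouts are placeholder choices to show the pieces a default-size helper would fill in, not the Xe-specific defaults from that PR.

```cpp
#include <cute/tensor.hpp>

int main() {
  using namespace cute;

  // Assemble a tiled copy from a copy atom plus thread and value layouts.
  // The atom and layouts below are placeholders, not the defaults from #454.
  auto tiled_copy = make_tiled_copy(
      Copy_Atom<UniversalCopy<float>, float>{},  // per-thread copy instruction
      Layout<Shape<_16, _8>>{},                  // 16x8 threads cover the tile
      Layout<Shape<_1, _1>>{});                  // one value per thread per step

  print(tiled_copy);  // dump the resulting thread/value tiling for inspection
  return 0;
}
```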