
Activity

fixup! New Xe DPAS MMA atoms, part 2

petercad pushed 7 commits to petercad/reorder_atoms • 685dca8…4a6161e • 2 days ago

cri run pass

taozha2 created zt/cri_debug • dc571a9 • 3 days ago

R2R demo

petercad created petercad/reorder_atoms • 685dca8 • 10 days ago

avoid to allocate buffer_C if not necessary (#461)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • 49922fd…1a0980b • on Jul 15

enable int8_t mma for mixed dtype (#460)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • c177726…49922fd • on Jul 14

Create a helper for constructing tiled copies of default size (#454)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • f118cd0…c177726 • on Jul 10

header files inclusion (#458)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • 467a2bb…f118cd0 • on Jul 10

Deleted tag

mehdi-goli deleted refs/tags/v3.9-03 • on Jun 30

June release changelog (#451)

Pull request merge
mehdi-goli pushed 1 commit to sycl-develop • 3da91e1…467a2bb • on Jun 30

F8 scaling (#450)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • 725aab4…3da91e1 • on Jun 30

Refactor tests for Flash Attention Prefill (#446)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • 5377d14…725aab4 • on Jun 30

Separate output and accumulator type for Flash Attention Prefill Cac…

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • d5f1886…5377d14 • on Jun 30

Separate output and accumulator type for Flash Attention Prefill (#443)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • c316fb5…d5f1886 • on Jun 28

A16S8 gemm && tensor-wise quantization (#441)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • ef256ab…c316fb5 • on Jun 28

FP8 Grouped GEMM CollectiveMma (#351)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • ac786ea…ef256ab • on Jun 28

implement zero data type int4_t and add cases (#440)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • b458965…ac786ea • on Jun 25

Check copy alignment for MMA and Epilogue (#438)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • 576a533…b458965 • on Jun 25

New fp8 decode (#439)

Pull request merge
mehdi-goli pushed 1 commit to sycl-develop • 2e4b923…576a533 • on Jun 24

Adding Fp8 input support for flash attention prefill (#419)

Pull request merge
mehdi-goli pushed 1 commit to sycl-develop • 04e29a5…2e4b923 • on Jun 24

Add FP16 MMA (#368)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • 5941d8b…04e29a5 • on Jun 23

support different scale/zero data types (int8, bf16, fp16) for mixed …

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • 098e8a7…5941d8b • on Jun 20

Fix deprecated FetchContent usage (#434)

Pull request merge
muhammad-tanvir-1211 pushed 1 commit to sycl-develop • 0c04a1f…098e8a7 • on Jun 18

Update PVC drivers (#391)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • f431b2c…0c04a1f • on Jun 17

Add documentation for 2D copy (#386)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • 580fad8…f431b2c • on Jun 17

A(16bits)xB(8bits) GEMM (#416)

Pull request merge
t4c1 pushed 1 commit to sycl-develop • 41193b7…580fad8 • on Jun 17

Add Paged Attention for Flash Attention Decode (#403)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • c5e27e7…41193b7 • on Jun 14

Fix for U8 transpose (#392)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • 83a26ea…c5e27e7 • on Jun 13

Move FP8 conversion to NumericArrayConverter (#424)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • 5ee64b2…83a26ea • on Jun 13

Remove unused metadata ValueShape (#430)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • b70dba6…5ee64b2 • on Jun 13

Add data type conversion support in epilogue (#418)

Pull request merge
aacostadiaz pushed 1 commit to sycl-develop • 4318b48…b70dba6 • on Jun 13