@@ -98,12 +98,11 @@ types for source, destination, weights, and bias tensors:
9898| Source | Weights | Destination | Bias |
9999| :-----------------| :---------------------------------------| :---------------------------------| :----------------------------|
100100| f64 | f64 | f64 | f64, f32, f16, bf16, s8, u8 |
101- | f32 | f32 | f32 | f32, bf16, f16, u8, s8 |
101+ | f32 | f32, u8, s8, u4, s4 | f32 | f32, bf16, f16, u8, s8 |
102102| f16 | f16, u8, s8, u4, s4 | f16, u8, s8 | f32 |
103- | f16 | f16, u8, s8 | f32 | f32, f16 |
103+ | f16 | f16, u8, s8, u4, s4 | f32, f16 | f32, f16 |
104104| bf16 | bf16, u8, s8, u4, s4 | f32, bf16 | f32, bf16 |
105- | f32, bf16, f16 | u8, s8 | f32, bf16, f16 | f32, bf16, f16 |
106- | f32, bf16, f16 | u8, s8 | f32, bf16, f16 | f32, bf16, f16 |
105+ | f32, bf16, f16 | u8, s8, u4, s4 | f32, bf16, f16 | f32, bf16, f16 |
107106| bf16, f16 | f8_e5m2, f8_e4m3, f4_e2m1, f4_e3m0 | f32, f16, bf16 | f32, bf16, f16 |
108107| f8_e5m2, f8_e4m3 | f8_e5m2, f8_e4m3 | f32, f16, bf16, f8_e5m2, f8_e4m3 | f32, bf16, f16 |
109108| f4_e2m1, f4_e3m0 | f4_e2m1, f4_e3m0 | f32, f16, bf16, f4_e2m1, f4_e3m0 | f32, bf16, f16 |
@@ -146,8 +145,8 @@ The following attributes and post-ops are supported:
146145
147146| Type | Operation | Description | Restrictions |
148147| :----------| :---------------------------------------------------------------| :------------------------------------------------------------------------------| :------------------------------------|
149- | Attribute | [Scales](@ref dnnl::primitive_attr::set_scales_mask) | Scales the result by given scaling factor(s) | |
150- | Attribute | [Zero-points](@ref dnnl::primitive_attr::set_zero_points_mask) | Sets zero-point(s) for the corresponding tensors | `int8` computations only |
148+ | Attribute | [Scales](@ref dnnl::primitive_attr::set_scales_mask) | Scales the result by given scaling factor(s) | |
149+ | Attribute | [Zero-points](@ref dnnl::primitive_attr::set_zero_points_mask) | Sets zero-point(s) for the corresponding tensors | |
151150| Attribute | [Dropout](@ref dnnl::primitive_attr::set_dropout) | Applies pseudo-random dropout to destination buffer, also fills mask buffer | |
152151| Attribute | [Precomputed reductions](@ref dnnl::primitive_attr::set_precomputed_reductions) | Sets precomputed reductions for the corresponding tensors | Requires weight zero-points and full matrix mask |
153152| Post-op | [Eltwise](@ref dnnl::post_ops::append_eltwise) | Applies an @ref dnnl_api_eltwise operation to the result | |
@@ -156,9 +155,13 @@ The following attributes and post-ops are supported:
156155| Post-op | [ Prelu] (@ref dnnl::post_ops::append_prelu) | Applies an @ref dnnl_api_prelu operation to the result | |
157156
158157The following masks are supported by the primitive:
159- - 0, which applies one scale / zero point value to an entire tensor, and
160- - 2, which applies a scale value per column along the
161- `n` dimension for `DNNL_ARG_WEIGHTS`.
158+ - 0, which applies one scale / zero-point value to an entire tensor;
159+ - 1, which applies scale / zero-point values along the `k` dimension
160+ for `DNNL_ARG_WEIGHTS`. Values can be grouped along this dimension
161+ by specifying the scales / zero-points shapes for the attribute
162+ (see the code example @ref weights_decompression_matmul_cpp); and
163+ - 2, which applies scale / zero-point values per column along the
164+ `n` dimension for `DNNL_ARG_WEIGHTS`.
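The mask values listed above can be read as bit fields: bit `d` set means the scale / zero-point values vary along dimension `d` of the tensor. The sketch below illustrates that reading for a 2D `K x N` weights tensor (dimension 0 is `k`, dimension 1 is `n`). It does not call oneDNN; the helper names `quant_param_count` and `quant_param_index` are hypothetical, and modelling grouping as "a fixed number of consecutive elements share one value" is an assumption made for illustration rather than a statement of how the library stores grouped values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative helpers only; these are NOT part of the oneDNN API.
// Weights are treated as a 2D K x N tensor: dims[0] = K, dims[1] = N.

// Number of scale (or zero-point) values implied by a mask.
// Bit d of the mask set => values vary along dimension d.
// groups[d] is the assumed number of consecutive elements along
// dimension d that share one value (1 means no grouping).
inline std::size_t quant_param_count(int mask,
        const std::vector<std::size_t> &dims,
        const std::vector<std::size_t> &groups) {
    assert(dims.size() == groups.size());
    std::size_t count = 1;
    for (std::size_t d = 0; d < dims.size(); ++d) {
        if (mask & (1 << d)) {
            assert(groups[d] > 0 && dims[d] % groups[d] == 0);
            count *= dims[d] / groups[d];
        }
    }
    return count;
}

// Flat index of the value applied to weights element (k, n),
// laid out row-major over the masked dimensions only.
inline std::size_t quant_param_index(int mask, std::size_t k, std::size_t n,
        const std::vector<std::size_t> &dims,
        const std::vector<std::size_t> &groups) {
    assert(dims.size() == 2 && groups.size() == 2);
    const std::size_t pos[2] = {k, n};
    std::size_t idx = 0;
    for (std::size_t d = 0; d < 2; ++d) {
        if (mask & (1 << d)) {
            idx = idx * (dims[d] / groups[d]) + pos[d] / groups[d];
        }
    }
    return idx;
}
```

Under these assumptions, a `64 x 16` weights tensor needs one value for mask 0, 64 values for mask 1 (or 2 values when 32 consecutive `k` elements share a value), and 16 values for mask 2.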
162165
163166When scales and/or zero-points masks are specified, the user must
164167provide the corresponding scales and/or zero-points as additional