ggml_scale_bias #27
Conversation
Walkthrough

This update extends the tensor scaling operation across all backends by introducing a bias term, enabling affine transformations (x = s * a + b). New API functions for scale-with-bias are added. All CPU, GPU, and accelerator backend implementations and kernels are updated to support the new bias parameter. Tests are also updated to validate the new functionality.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant API
    participant Backend
    participant Kernel
    User->>API: ggml_scale_bias(ctx, a, s, b)
    API->>Backend: Prepare op_params = [s, b]
    Backend->>Kernel: Launch scale kernel with scale, bias
    Kernel->>Backend: Compute x = s * a + b
    Backend->>API: Return result tensor
    API->>User: Output tensor with affine transformation
```
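A minimal usage sketch of the flow above (the ggml_scale_bias call follows the signature in the diagram; the context size and graph plumbing are ordinary ggml boilerplate and purely illustrative):

```cpp
#include "ggml.h"

int main(void) {
    // small scratch context; the size is an arbitrary choice for this sketch
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);

    // y = 2*a + 1 as a single fused op; plain ggml_scale(ctx, a, 2.0f) remains the bias-free form
    struct ggml_tensor * y = ggml_scale_bias(ctx, a, 2.0f, 1.0f);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, y);

    ggml_free(ctx);
    return 0;
}
```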
@coderabbitai nitpicking this plz
@ngxson I'll conduct a detailed nitpicking review of this PR. Let me examine the changes first to understand what's being modified.
@ngxson I'll provide a detailed nitpicking review of this scale operation implementation across all backends. Here are the issues I found:

🐛 Critical Issues

1. Gradient Computation Bug

Location: ggml/src/ggml.c (ggml_compute_backward)

```c
case GGML_OP_SCALE: {
    if (src0_needs_grads) {
        float s;
        memcpy(&s, tensor->op_params, sizeof(float)); // Only extracts scale
        ggml_add_or_set(ctx, cgraph, isrc0, ggml_scale_impl(ctx, grad, s, 0.0, false));
    }
} break;
```

Issue: The gradient computation only extracts the scale parameter and completely ignores the bias value that ggml_scale_bias now stores in op_params[1].

Fix: Should be:

```c
float s, b;
memcpy(&s, (const float *) tensor->op_params + 0, sizeof(float));
memcpy(&b, (const float *) tensor->op_params + 1, sizeof(float));
```

🔧 Code Quality Issues

2. Inconsistent Parameter Extraction Patterns

Backends use different patterns for extracting parameters.

CUDA (good):

```c
float scale = ((const float *)(dst->op_params))[0];
float bias  = ((const float *)(dst->op_params))[1];
```

OpenCL (inconsistent):

```c
// Uses different variable names and spacing
float scale = *(float *)&dst->op_params[0];
float bias  = *(float *)&dst->op_params[sizeof(float)];
```

3. Missing Const Correctness

Location: Multiple backends

```c
// Should be const
const float scale = ((const float *)(dst->op_params))[0];
const float bias  = ((const float *)(dst->op_params))[1];
```
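A consistent pattern that would address both of the points above could look like this (a sketch, not code from this PR; the helper name is hypothetical, and memcpy is used to sidestep aliasing concerns when reading the packed floats):

```cpp
#include <string.h>
#include "ggml.h"

// hypothetical helper: read the packed [scale, bias] pair of a GGML_OP_SCALE node
static inline void ggml_get_scale_bias(const struct ggml_tensor * dst, float * scale, float * bias) {
    memcpy(scale, (const float *) dst->op_params + 0, sizeof(float)); // op_params[0] = scale
    memcpy(bias,  (const float *) dst->op_params + 1, sizeof(float)); // op_params[1] = bias
}
```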
4. Redundant Code in Vulkan Shader

Location: ggml/src/ggml-vulkan/vulkan-shaders/scale.comp

```c
const uint num_iter = 4;
[[unroll]] for (uint i = 0; i < num_iter; ++i) {
    if (idx >= p.ne) {
        continue; // This check is inside the loop
    }
    // ...
    idx += num_threads;
}
```

Issue: The bounds check is re-evaluated on every unrolled iteration even though, once idx exceeds p.ne, it stays out of range for the rest of the loop.

5. Missing Error Handling

Location: Multiple backends

```c
// No validation that op_params contains valid float values
// No check for NaN or infinite values
// No bounds checking on tensor dimensions
```

6. Suboptimal OpenCL Kernel

Location: ggml/src/ggml-opencl/kernels/scale.cl

```c
kernel void kernel_scale(
    global float4 * src0,   // Uses float4 but no vectorization benefits
    ulong offset0,
    global float4 * dst,
    ulong offsetd,
    float scale,
    float bias
) {
    // Simple element-wise operation doesn't benefit from float4
    dst[get_global_id(0)] = src0[get_global_id(0)] * scale + bias;
}
```

Issue: Using float4 adds little for a simple element-wise operation; the per-element multiply-add is the same either way, so the vector typing mostly adds complexity.

📚 Documentation Issues

7. Missing Function Documentation

Location: ggml/include/ggml.h
```c
GGML_API struct ggml_tensor * ggml_scale(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        float                 s);

// in-place, returns view(a)
GGML_API struct ggml_tensor * ggml_scale_inplace(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        float                 s);
```

Issue: Missing documentation for parameters and return values. The new scale-with-bias variants in particular should state what s and b mean and what is returned.
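A short comment block in the header would cover it; for instance (wording is illustrative, not taken from the actual header):

```cpp
// x = s*a + b  (out of place; the _inplace variant returns a view of a)
// a: input tensor (F32), s: multiplicative scale, b: additive bias
GGML_API struct ggml_tensor * ggml_scale_bias(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        float                 s,
        float                 b);
```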
8. Inconsistent Naming Convention

Location: Various files

```c
// Inconsistent naming across backends
scale_f32_cuda()   // CUDA
scale_f32_sycl()   // SYCL
kernel_scale()     // OpenCL
// Some use underscore, others don't
```

⚡ Performance Issues

9. Suboptimal CPU Vectorization

Location: ggml/src/ggml-cpu/vec.h

```c
inline static void ggml_vec_scale_f32(const int n, float * y, const float v) {
    // Long switch between different SIMD implementations
    // Could benefit from better branch prediction
}
```

Issue: The function has many conditional compilation branches that could impact performance.

10. Missing Bounds Checking

Location: Multiple backends

```c
// No validation that tensor dimensions match expected sizes
// No overflow checks when calculating offsets
```

🧪 Test Coverage Issues

11. Limited Edge Case Testing

Location: tests/test-backend-ops.cpp

```cpp
test_cases.emplace_back(new test_scale());
test_cases.emplace_back(new test_scale(GGML_TYPE_F32, {10, 10, 10, 10}, 2.0f, 1.0f));
```

Issue: Only the default case and a single (scale, bias) combination are exercised; other edge cases are missing.
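For example, a few additional cases along these lines would widen coverage (the argument order follows the test_scale constructor shown above; the values themselves are illustrative):

```cpp
// negative scale, pure bias (zero scale), and negative bias
test_cases.emplace_back(new test_scale(GGML_TYPE_F32, {10, 10, 10, 10}, -2.0f,  0.5f));
test_cases.emplace_back(new test_scale(GGML_TYPE_F32, {10, 10, 10, 10},  0.0f,  3.0f));
test_cases.emplace_back(new test_scale(GGML_TYPE_F32, {10, 10, 10, 10},  1.0f, -1.0f));
```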
🔄 Consistency Issues

12. Different Block Sizes

Location: Various backend implementations

```c
// CUDA
#define CUDA_SCALE_BLOCK_SIZE 256
// SYCL
#define SYCL_SCALE_BLOCK_SIZE 256
// But Vulkan uses a different approach
const uint num_threads = 128; // Vulkan
```

Issue: Inconsistent block sizes across backends without clear justification.

These issues range from critical bugs (gradient computation) to minor style inconsistencies. The gradient computation bug should be addressed immediately as it affects correctness of backpropagation.

🧠 Learnings used
Actionable comments posted: 5
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
- ggml/include/ggml.h (1 hunks)
- ggml/src/ggml-cann/ggml-cann.cpp (1 hunks)
- ggml/src/ggml-cpu/ops.cpp (2 hunks)
- ggml/src/ggml-cpu/vec.h (1 hunks)
- ggml/src/ggml-cuda/scale.cu (2 hunks)
- ggml/src/ggml-metal/ggml-metal.m (2 hunks)
- ggml/src/ggml-metal/ggml-metal.metal (1 hunks)
- ggml/src/ggml-opencl/ggml-opencl.cpp (2 hunks)
- ggml/src/ggml-opencl/kernels/scale.cl (1 hunks)
- ggml/src/ggml-sycl/ggml-sycl.cpp (4 hunks)
- ggml/src/ggml-vulkan/ggml-vulkan.cpp (1 hunks)
- ggml/src/ggml-vulkan/vulkan-shaders/scale.comp (1 hunks)
- ggml/src/ggml.c (4 hunks)
- tests/test-backend-ops.cpp (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
PR: ngxson/llama.cpp#14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
🧬 Code Graph Analysis (1)
tests/test-backend-ops.cpp (2)
- ggml/include/ggml.h (4)
  - ggml_new_tensor (743-747)
  - ggml_set_param (802-802)
  - ggml_set_name (795-795)
  - ggml_scale_bias (1301-1305)
- ggml/src/ggml.c (4)
  - ggml_new_tensor (1688-1694)
  - ggml_set_param (6797-6800)
  - ggml_set_name (1790-1797)
  - ggml_scale_bias (3101-3107)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: build-linux-cross / ubuntu-24-arm64-vulkan-cross
- GitHub Check: build-linux-cross / debian-13-loongarch64-vulkan-cross
- GitHub Check: windows-latest-cmake (cpu-x64 (static), x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmak...
- GitHub Check: build-linux-cross / ubuntu-24-ppc64el-cpu-cross
- GitHub Check: windows-msys2 (CLANG64, clang-x86_64, Release)
- GitHub Check: build-linux-cross / ubuntu-24-riscv64-vulkan-cross
- GitHub Check: build-cmake-pkg / linux
- GitHub Check: windows-latest-cmake (vulkan-x64, x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD...
- GitHub Check: build-linux-cross / ubuntu-24-riscv64-cpu-cross
- GitHub Check: build-linux-cross / debian-13-loongarch64-cpu-cross
- GitHub Check: windows-msys2 (UCRT64, ucrt-x86_64, Release)
- GitHub Check: build-linux-cross / ubuntu-24-ppc64el-vulkan-cross
- GitHub Check: windows-latest-cmake (llvm-arm64-opencl-adreno, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN...
- GitHub Check: windows-latest-cmake (llvm-arm64, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/ar...
- GitHub Check: windows-latest-cmake (openblas-x64, x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x6...
- GitHub Check: ubuntu-22-cmake-hip
- GitHub Check: server (UNDEFINED, RelWithDebInfo)
- GitHub Check: server (ADDRESS, RelWithDebInfo)
- GitHub Check: server-windows
- GitHub Check: server (Release)
🔇 Additional comments (29)
ggml/src/ggml-vulkan/vulkan-shaders/scale.comp (1)

21-21: LGTM! Clean implementation of affine transformation.
The modification correctly implements the scale-plus-bias operation (y = a*x + b) by adding the bias term param2 to the existing scale operation. The use of appropriate type casting and the fused multiply-add pattern aligns well with the broader PR objective of extending tensor scaling operations across all backends.

ggml/src/ggml-metal/ggml-metal.metal (2)

1017-1019: LGTM! Clean implementation of affine transformation.
The addition of the bias parameter correctly implements the affine transformation x = s * a + b. The kernel signature and computation are both properly updated.

1026-1028: LGTM! Consistent vectorized implementation.
The float4 vectorized version correctly applies the scalar bias to each component of the vector, maintaining consistency with the scalar kernel implementation.

ggml/src/ggml-opencl/kernels/scale.cl (2)

11-12: LGTM: Clean parameter addition for bias support.
The addition of the float bias parameter to the kernel signature is well-structured and follows OpenCL conventions. This change enables affine transformations and aligns with the broader backend consistency improvements.

16-16: LGTM: Correct affine transformation implementation.
The computation correctly implements the affine transformation x * scale + bias. The scalar bias value will be properly broadcast to all components of the float4 vector, which is the expected behavior for element-wise operations.

ggml/src/ggml-vulkan/ggml-vulkan.cpp (1)

7511-7511: LGTM! Successfully implements bias parameter for scale operations.
The change correctly forwards the bias parameter (op_params[1]) to the Vulkan compute shader, replacing the previous hardcoded zero. This aligns perfectly with the PR objective of extending scale operations to support affine transformations (x = s * a + b) and maintains consistency with similar updates across all other backends.

ggml/include/ggml.h (1)

1300-1312: LGTM! Well-structured API extension.
The new ggml_scale_bias and ggml_scale_bias_inplace functions are well-designed and follow the library's established patterns. The comment clearly explains the mathematical operation, parameter naming is intuitive, and both regular and in-place variants are provided consistently with other GGML operations.

ggml/src/ggml-opencl/ggml-opencl.cpp (1)

5605-5606: Kernel arg order update – double-check all back-ends.
bias is passed as the 6th kernel argument (index 5). Ensure the corresponding OpenCL kernel (kernel_scale) and all other back-end scale kernels have been updated to accept this extra parameter in the same position; otherwise you'll silently overwrite work-group sizes or run with an uninitialised argument.

ggml/src/ggml-metal/ggml-metal.m (2)

2258-2259: LGTM! Parameter extraction is correct.
The addition of bias parameter extraction is well-implemented and consistent with the existing scale parameter extraction pattern. The indexing (scale at [0], bias at [1]) follows the expected parameter layout for the new affine transformation functionality.

2276-2276: LGTM! Encoder parameter setting is correct.
The bias parameter is properly set as a byte buffer at the correct index (3) using the same pattern as the scale parameter. This maintains consistency with the Metal compute encoder API usage.

ggml/src/ggml-cpu/ops.cpp (1)

4674-4674: ggml_vec_mad1_f32 signature verified.
Confirmed that ggml_vec_mad1_f32 is defined in ggml/src/ggml-cpu/vec.h as `inline static void ggml_vec_mad1_f32(const int n, float *y, const float s, const float b) { … }`. It matches the call in ops.cpp, so no changes are needed.

tests/test-backend-ops.cpp (5)

2371-2371: LGTM! Bias member variable added correctly.
The addition of the bias member variable properly extends the test_scale struct to support affine transformations.

2374-2374: LGTM! Parameter representation updated correctly.
The vars() method now correctly includes all four parameters (type, ne, scale, bias) in the string representation for debugging and logging purposes.

2379-2381: LGTM! Constructor parameters updated appropriately.
The constructor correctly adds the bias parameter with a sensible default value of 0.0f, maintaining backward compatibility while enabling the new bias functionality.

2388-2388: LGTM! Graph building updated to use new API.
The build_graph method correctly calls ggml_scale_bias() instead of the previous ggml_scale(), properly passing both scale and bias parameters to support the extended functionality.

5049-5049: LGTM! New test case exercises scale with bias.
The addition of a test case with explicit scale (2.0f) and bias (1.0f) values properly validates the new scale-with-bias functionality, complementing the existing default test case.
ggml/src/ggml-cuda/scale.cu (6)

3-3: LGTM: Kernel signature correctly updated for bias support.
The addition of the bias parameter to the kernel signature is well-positioned and follows CUDA conventions.

10-10: LGTM: Affine transformation correctly implemented.
The computation dst[i] = scale * x[i] + bias properly implements the affine transformation and maintains backward compatibility when bias is zero.

13-13: LGTM: Host wrapper function signature updated consistently.
The function signature matches the kernel parameters, maintaining consistency across the implementation.

15-15: LGTM: Kernel launch updated with correct parameter passing.
The kernel launch correctly passes all parameters including the new bias parameter in the proper order.

30-30: LGTM: Function call updated with correct parameter passing.
The call to scale_f32_cuda correctly passes both scale and bias parameters, maintaining consistency with the updated function signature.

27-28: Verify SCALE operator populates both scale and bias in op_params.
Please confirm that the CPU-side implementation of the SCALE operator writes two floats into op_params — one for scale and one for bias — to match the GPU kernel's read of [0] and [1]. A missing entry could lead to out-of-bounds reads or undefined behavior.
- File: ggml/src/ggml.c, function ggml_scale (or wherever GGML_OP_SCALE is set). Check that it calls something like `ggml_set_op_params_f32(result, 0, scale); ggml_set_op_params_f32(result, 1, bias);`
- If it uses memcpy on result->op_params, ensure there are two copies for both values.

ggml/src/ggml-sycl/ggml-sycl.cpp (3)

1698-1708: LGTM: Correct implementation of affine transformation.
The kernel correctly implements the updated scale operation with bias support. The computation dst[i] = scale * x[i] + bias properly realizes the affine transformation (x = s * a + b) as intended.

1845-1855: LGTM: Consistent parameter threading.
The wrapper function correctly includes the bias parameter and passes it through to the kernel. The implementation maintains consistency with the kernel interface updates.

2321-2324: Verify parameter array bounds and safety.
The implementation correctly extracts scale and bias parameters and passes them to the SYCL function. However, consider verifying that:
- dst->op_params is not null before dereferencing
- the op_params array has at least 2 elements before accessing indices 0 and 1
- the parameter types match the expected float format

The core logic is correct and consistent with the bias extension.

ggml/src/ggml.c (4)

3068-3085: Well-implemented bias parameter addition to scale operation.
The addition of the bias parameter b to ggml_scale_impl correctly extends the operation from x = s * a to x = s * a + b, enabling affine transformations. The parameter storage change from a single float to a 2-element array is properly handled, and the function signature change is clean.

3087-3099: Excellent backward compatibility preservation.
The existing ggml_scale and ggml_scale_inplace functions correctly maintain their original behavior by passing 0.0 as the bias parameter to the updated implementation. This ensures no breaking changes for existing code.

3101-3115: New API functions follow consistent patterns.
The new ggml_scale_bias and ggml_scale_bias_inplace functions properly follow the existing naming conventions and parameter patterns in the codebase. The implementation correctly delegates to ggml_scale_impl with the appropriate inplace flag.

5875-5875: Backward pass correctly updated for new parameter format.
The scale operation backward pass properly maintains existing behavior by passing 0.0 as the bias parameter, ensuring gradients are computed correctly for the affine transformation.
```diff
@@ -5777,7 +5795,7 @@ static void ggml_compute_backward(
         } break;
         case GGML_OP_MEAN: {
             if (src0_needs_grads) {
-                ggml_add1_or_set(ctx, cgraph, isrc0, ggml_scale_impl(ctx, grad, 1.0f/src0->ne[0], false));
+                ggml_add1_or_set(ctx, cgraph, isrc0, ggml_scale_impl(ctx, grad, 1.0f/src0->ne[0], 0.0, false));
```
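For context, ggml_scale_impl now carries the bias as shown in this hunk, and per the comments above the new public wrappers presumably reduce to something like this (a reconstruction from the review, not the verbatim source):

```cpp
// out-of-place and in-place wrappers delegating to the extended ggml_scale_impl(ctx, a, s, b, inplace)
struct ggml_tensor * ggml_scale_bias(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        float                 s,
        float                 b) {
    return ggml_scale_impl(ctx, a, s, b, false);
}

struct ggml_tensor * ggml_scale_bias_inplace(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        float                 s,
        float                 b) {
    return ggml_scale_impl(ctx, a, s, b, true);
}
```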
💡 Verification agent
🧩 Analysis chain
Verify consistency across all backend implementations.
The backward pass correctly maintains existing behavior by passing 0.0
as the bias parameter. However, ensure that all backend implementations (CPU, CUDA, Metal, OpenCL, SYCL, Vulkan, CANN) have been consistently updated to handle the new 2-element parameter array instead of a single float.
🏁 Script executed:
```shell
#!/bin/bash
# Verify that backend implementations handle the new parameter format
echo "Checking backend implementations for scale operation parameter handling..."
# Search for scale operation implementations in backends
rg -A 10 -B 5 "GGML_OP_SCALE" ggml/src/ggml-*/
# Look for parameter extraction patterns that might need updating
rg -A 5 "op_params.*scale\|scale.*op_params" ggml/src/ggml-*/
```
Length of output: 26206
Review CPU & other backends for updated SCALE op parameter handling

Let's confirm that every frontend now reads both scale and bias from the 2-element op_params array:

- ggml/src/ggml-cpu/ggml-cpu.c → verify the ggml_compute_forward_scale(params, …) signature and body consume two floats
- ggml/src/ggml-cuda/ggml-cuda.cu → inspect ggml_cuda_op_scale for two-param extraction
- ggml/src/ggml-opencl/ggml-opencl.cpp → inspect ggml_cl_scale for two-param extraction
- ggml/src/ggml-sycl/ggml-sycl.cpp → inspect ggml_sycl_scale for two-param extraction
- ggml/src/ggml-metal/ggml-metal.m → inspect the SCALE case to read both op_params[0] and [1]
- ggml/src/ggml-vulkan/ggml-vulkan.cpp → ensure the ggml_vk_scale pipeline push includes both params
#!/bin/bash
# Verify SCALE op uses both scale & bias params in each backend
grep -n "ggml_compute_forward_scale" ggml/src/ggml-cpu/ggml-cpu.c
grep -n "ggml_cuda_op_scale" ggml/src/ggml-cuda/ggml-cuda.cu
grep -n "ggml_cl_scale" ggml/src/ggml-opencl/ggml-opencl.cpp
grep -n "ggml_sycl_scale" ggml/src/ggml-sycl/ggml-sycl.cpp
grep -n "case GGML_OP_SCALE" ggml/src/ggml-metal/ggml-metal.m
grep -n "ggml_vk_scale" ggml/src/ggml-vulkan/ggml-vulkan.cpp
Length of output: 1063
To confirm each backend correctly reads both scale
and bias
, let’s extract the SCALE implementations:
🏁 Script executed:
#!/bin/bash
# CPU: ggml_compute_forward_scale
sed -n '1770,1840p' ggml/src/ggml-cpu/ggml-cpu.c
# CUDA: ggml_cuda_op_scale
sed -n '2350,2450p' ggml/src/ggml-cuda/ggml-cuda.cu
# OpenCL: ggml_cl_scale
sed -n '5560,5660p' ggml/src/ggml-opencl/ggml-opencl.cpp
# SYCL: ggml_sycl_scale
sed -n '3500,3600p' ggml/src/ggml-sycl/ggml-sycl.cpp
# Metal: first GGML_OP_SCALE case
sed -n '1700,1800p' ggml/src/ggml-metal/ggml-metal.m
# Vulkan: ggml_vk_scale
sed -n '7480,7600p' ggml/src/ggml-vulkan/ggml-vulkan.cpp
# CANN: ggml_cann_scale
grep -R -n "ggml_cann_scale" -n ggml/src/ggml-cann
sed -n '1,200p' ggml/src/ggml-cann/ggml-cann.cpp
Length of output: 32299
To confirm each backend actually reads both scale
and bias
, let’s pull in their implementations:
🏁 Script executed:
#!/bin/bash
set -e
echo "CPU: locate compute_forward_scale definition"
grep -R -n "void ggml_compute_forward_scale" -n ggml/src/ggml-cpu/ggml-cpu.c || echo "not found"
echo
echo "CUDA: locate ggml_cuda_op_scale definition"
grep -R -n "void ggml_cuda_op_scale" -n ggml/src/ggml-cuda/ggml-cuda.cu || echo "not found"
echo
echo "SYCL: locate ggml_sycl_op_scale definition"
grep -R -n "void ggml_sycl_op_scale" -n ggml/src/ggml-sycl/ggml-sycl.cpp || echo "not found"
echo
echo "Metal: find kernel args for SCALE"
grep -R -n "kernel_scale" -n ggml/src/ggml-metal/ggml-metal.m || echo "not found"
grep -R -n "float scale" -n ggml/src/ggml-metal/ggml-metal.m || echo "not found"
echo
echo "CANN: inspect aclnn_ops implementation"
sed -n '340,430p' ggml/src/ggml-cann/aclnn_ops.cpp
Length of output: 4783
Ensure SCALE op consumes both scale & bias in every backend

We've added a second bias parameter in op_params, but only OpenCL, Metal, and Vulkan extract both elements. The following backends still need to be updated to read op_params[1] (bias) in addition to op_params[0] (scale):

- CPU — File: ggml/src/ggml-cpu/ggml-cpu.c, Function: ggml_compute_forward_scale — change from reading a single scale in params[0] to also fetching bias = params[1] and adding it after multiplication (a sketch follows below this list).
- CUDA — File: ggml/src/ggml-cuda/ggml-cuda.cu, Function: ggml_cuda_op_scale — extract both scale and bias from dst->op_params and incorporate bias in the kernel launch.
- SYCL — File: ggml/src/ggml-sycl/ggml-sycl.cpp, Function: ggml_sycl_op_scale — update to read bias alongside scale and pass both into the SYCL kernel.
- CANN — File: ggml/src/ggml-cann/aclnn_ops.cpp, Function: ggml_cann_scale — currently only copies one float (scale). Must memcpy a second float for bias and call the appropriate CANN op (e.g. Add after Muls) or use a single fused kernel that supports bias.

OpenCL, Metal, and Vulkan already handle [scale, bias] correctly.
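A minimal sketch of the CPU-side read described above (the helper name and per-row call shape are illustrative, not the actual ops.cpp structure):

```cpp
#include <string.h>
#include "ggml.h"

// read the packed floats and apply y = s*y + b to one contiguous row of nc elements
static void scale_row_with_bias(const struct ggml_tensor * dst, float * row, int nc) {
    float s, b;
    memcpy(&s, (const float *) dst->op_params + 0, sizeof(float)); // scale
    memcpy(&b, (const float *) dst->op_params + 1, sizeof(float)); // bias
    for (int i = 0; i < nc; ++i) {
        row[i] = row[i]*s + b;
    }
}
```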
🤖 Prompt for AI Agents
In ggml/src/ggml-cpu/ggml-cpu.c within the ggml_compute_forward_scale function,
update the code to read both scale and bias from params[0] and params[1], then
multiply the input by scale and add bias. In ggml/src/ggml-cuda/ggml-cuda.cu
inside ggml_cuda_op_scale, extract scale and bias from dst->op_params and modify
the CUDA kernel launch to apply both scale and bias. In
ggml/src/ggml-sycl/ggml-sycl.cpp in ggml_sycl_op_scale, similarly read both
scale and bias and pass them to the SYCL kernel to apply scaling and bias
addition. Finally, in ggml/src/ggml-cann/aclnn_ops.cpp in ggml_cann_scale,
update the memcpy to copy both scale and bias floats and adjust the CANN
operations to apply scale and then add bias, either by calling separate ops or
using a fused kernel that supports bias.
Summary by CodeRabbit

New Features
- Tensor scaling now supports an optional bias term, enabling affine transformations (x = s * a + b) through new scale-with-bias functions across all backends.

Tests
- Scale operation tests extended to cover the new bias parameter.