ggml_scale_bias #27


Closed · wants to merge 19 commits into from

Conversation

@ngxson ngxson (Owner) commented Jul 8, 2025

Summary by CodeRabbit

  • New Features

    • Introduced support for scaling tensors with an additional bias term, enabling affine transformations (x = s * a + b) across all supported backends including CPU, CUDA, Metal, OpenCL, SYCL, Vulkan, and CANN.
    • Added new API functions for scale-and-bias operations, both in-place and out-of-place.
  • Tests

    • Expanded and updated tests to cover scenarios involving both scaling and bias parameters.


coderabbitai bot commented Jul 8, 2025

Walkthrough

This update extends the tensor scaling operation across all backends by introducing a bias term, enabling affine transformations (x = s * a + b). New API functions for scale-with-bias are added. All CPU, GPU, and accelerator backend implementations and kernels are updated to support the new bias parameter. Tests are also updated to validate the new functionality.
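
For orientation, a minimal usage sketch of the new out-of-place call, assuming the ggml_scale_bias(ctx, a, s, b) signature described in this PR; the surrounding setup is ordinary ggml boilerplate and is illustrative only:

#include "ggml.h"

int main(void) {
    // illustrative context size; not part of this PR
    struct ggml_init_params params = {
        /* .mem_size   = */ 16*1024*1024,
        /* .mem_buffer = */ NULL,
        /* .no_alloc   = */ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);

    // y = 2.0f * a + 1.0f, matching the nonzero-bias test case added in tests/test-backend-ops.cpp
    struct ggml_tensor * y = ggml_scale_bias(ctx, a, 2.0f, 1.0f);

    // ... build a graph from y and compute it with a backend as usual ...
    (void) y;
    ggml_free(ctx);
    return 0;
}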

Changes

File(s) Change Summary
ggml/include/ggml.h Added ggml_scale_bias and ggml_scale_bias_inplace API declarations.
ggml/src/ggml.c Added bias support to scale operations, new API functions, updated backward computation accordingly.
ggml/src/ggml-cpu/ops.cpp Modified scale operation to apply both scale and bias using new fused multiply-add.
ggml/src/ggml-cpu/vec.h Added ggml_vec_mad1_f32 for in-place multiply-add with platform-specific optimizations.
ggml/src/ggml-cann/ggml-cann.cpp Scale op support now conditional: only supports bias==0.0f for now.
ggml/src/ggml-cuda/scale.cu Updated kernel and host code to support bias parameter in scaling.
ggml/src/ggml-metal/ggml-metal.m, ggml-metal.metal Metal backend and kernels updated to accept and use bias parameter.
ggml/src/ggml-opencl/ggml-opencl.cpp, kernels/scale.cl OpenCL backend and kernel updated to support and use bias parameter in scaling.
ggml/src/ggml-sycl/ggml-sycl.cpp SYCL backend and kernel updated for bias parameter in scaling.
ggml/src/ggml-vulkan/ggml-vulkan.cpp, vulkan-shaders/scale.comp Vulkan backend and shader updated to forward and use bias parameter.
tests/test-backend-ops.cpp Extended scale tests to include bias; added new test cases with nonzero bias.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant API
    participant Backend
    participant Kernel

    User->>API: ggml_scale_bias(ctx, a, s, b)
    API->>Backend: Prepare op_params = [s, b]
    Backend->>Kernel: Launch scale kernel with scale, bias
    Kernel->>Backend: Compute x = s * a + b
    Backend->>API: Return result tensor
    API->>User: Output tensor with affine transformation

Poem

A rabbit hopped with bias and scale,
Now tensors transform with a new tale.
Across CPU, GPU, and metal shine,
Affine magic in every line!
With tests that check each bunny leap,
The code grows strong, the math runs deep.
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd1703a and ebbad77.

📒 Files selected for processing (2)
  • ggml/src/ggml-cpu/ops.cpp (2 hunks)
  • ggml/src/ggml-cpu/vec.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • ggml/src/ggml-cpu/vec.h
  • ggml/src/ggml-cpu/ops.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: build-cmake-pkg / linux
  • GitHub Check: ubuntu-22-cmake-sycl-fp16
  • GitHub Check: ubuntu-22-cmake-hip
  • GitHub Check: ubuntu-22-cmake-sycl
  • GitHub Check: ubuntu-latest-cmake-cuda
  • GitHub Check: android-build
  • GitHub Check: ubuntu-22-cmake-musa
  • GitHub Check: ubuntu-latest-cmake-sanitizer (UNDEFINED, Debug)
  • GitHub Check: ubuntu-22-cmake-vulkan
  • GitHub Check: ubuntu-latest-cmake-sanitizer (ADDRESS, Debug)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (THREAD, Debug)
  • GitHub Check: ubuntu-cpu-cmake (x64, ubuntu-22.04)
  • GitHub Check: ubuntu-latest-llguidance
  • GitHub Check: ubuntu-cpu-cmake (arm64, ubuntu-22.04-arm)
  • GitHub Check: ubuntu-latest-cmake-rpc
  • GitHub Check: macOS-latest-cmake-arm64
  • GitHub Check: server-windows
  • GitHub Check: server (Release)
  • GitHub Check: server (UNDEFINED, RelWithDebInfo)
  • GitHub Check: server (ADDRESS, RelWithDebInfo)

@ngxson ngxson (Owner, Author) commented Jul 8, 2025

@coderabbitai nitpicking this plz


coderabbitai bot commented Jul 8, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@ngxson I'll conduct a detailed nitpicking review of this PR. Let me examine the changes first to understand what's being modified.


🏁 Script executed:

#!/bin/bash
# Get PR details and the changes
gh pr view 27 --json title,body,headRefName,baseRefName,changedFiles,additions,deletions

# Get the diff to understand what changes were made
git diff HEAD~1 --name-only

# Get detailed diff for all changed files
git diff HEAD~1 --stat

Length of output: 611


🏁 Script executed:

#!/bin/bash
# Let's check the repository structure and see what files exist
echo "Repository structure:"
find . -name "*.h" -o -name "*.c" -o -name "*.cpp" -o -name "*.cu" -o -name "*.cl" -o -name "*.comp" -o -name "*.m" -o -name "*.metal" | head -20

# Check if the files mentioned in the automated comment exist
echo -e "\n\nChecking files mentioned in the automated comment:"
files=(
    "ggml/include/ggml.h"
    "ggml/src/ggml-cann/ggml-cann.cpp"
    "ggml/src/ggml-cpu/ops.cpp"
    "ggml/src/ggml-cpu/vec.h"
    "ggml/src/ggml-cuda/scale.cu"
    "ggml/src/ggml-metal/ggml-metal.m"
    "ggml/src/ggml-metal/ggml-metal.metal"
    "ggml/src/ggml-opencl/ggml-opencl.cpp"
    "ggml/src/ggml-opencl/kernels/scale.cl"
    "ggml/src/ggml-sycl/ggml-sycl.cpp"
    "ggml/src/ggml-vulkan/ggml-vulkan.cpp"
    "ggml/src/ggml-vulkan/vulkan-shaders/scale.comp"
    "ggml/src/ggml.c"
    "tests/test-backend-ops.cpp"
)

for file in "${files[@]}"; do
    if [ -f "$file" ]; then
        echo "✓ $file exists"
    else
        echo "✗ $file not found"
    fi
done

Length of output: 3497


🏁 Script executed:

#!/bin/bash
# Let's examine the key files to understand what changes might be related to scale operations
echo "=== Examining ggml.h for scale-related declarations ==="
grep -n -A 3 -B 3 "scale" ggml/include/ggml.h

echo -e "\n=== Examining ggml.c for scale-related implementations ==="
grep -n -A 5 -B 5 "scale" ggml/src/ggml.c | head -30

echo -e "\n=== Examining CPU ops for scale operations ==="
grep -n -A 5 -B 5 "scale" ggml/src/ggml-cpu/ops.cpp | head -20

echo -e "\n=== Examining CUDA scale implementation ==="
cat ggml/src/ggml-cuda/scale.cu | head -50

Length of output: 10025


🏁 Script executed:

#!/bin/bash
# Let's examine the test file to understand what's being tested
echo "=== Examining test file for scale operations ==="
grep -n -A 10 -B 5 "scale" tests/test-backend-ops.cpp

echo -e "\n=== Let's look at the Metal implementation ==="
grep -n -A 10 -B 5 "scale" ggml/src/ggml-metal/ggml-metal.m | head -40

echo -e "\n=== Let's look at the OpenCL implementation ==="
grep -n -A 10 -B 5 "scale" ggml/src/ggml-opencl/ggml-opencl.cpp | head -30

echo -e "\n=== Let's examine the OpenCL kernel ==="
cat ggml/src/ggml-opencl/kernels/scale.cl

Length of output: 23856


🏁 Script executed:

#!/bin/bash
# Let's examine the SYCL and Vulkan implementations to understand consistency across backends
echo "=== SYCL implementation ==="
grep -n -A 15 -B 5 "scale" ggml/src/ggml-sycl/ggml-sycl.cpp | head -40

echo -e "\n=== Vulkan implementation ==="
grep -n -A 10 -B 5 "scale" ggml/src/ggml-vulkan/ggml-vulkan.cpp | head -30

echo -e "\n=== Vulkan shader ==="
cat ggml/src/ggml-vulkan/vulkan-shaders/scale.comp

echo -e "\n=== Let's check the Metal shader ==="
grep -n -A 15 -B 5 "scale" ggml/src/ggml-metal/ggml-metal.metal | head -40

Length of output: 5964


🏁 Script executed:

#!/bin/bash
# Let's examine the CPU vectorized implementation to check for consistency
echo "=== CPU vectorized implementation ==="
grep -n -A 20 -B 5 "ggml_vec_scale" ggml/src/ggml-cpu/vec.h

echo -e "\n=== Let's check the main ggml.c implementation of scale operations ==="
grep -n -A 25 -B 5 "ggml_scale_impl" ggml/src/ggml.c

echo -e "\n=== Let's check for any potential issues in parameter handling ==="
grep -n -A 10 -B 5 "op_params" ggml/src/ggml.c | grep -A 10 -B 5 "scale"

echo -e "\n=== Let's check the CANN implementation ==="
grep -n -A 10 -B 5 "scale" ggml/src/ggml-cann/ggml-cann.cpp

Length of output: 22445


@ngxson I'll provide a detailed nitpicking review of this scale operation implementation across all backends. Here are the issues I found:

🐛 Critical Issues

1. Gradient Computation Bug

Location: ggml.c:5874-5876

case GGML_OP_SCALE: {
    if (src0_needs_grads) {
        float s;
        memcpy(&s, tensor->op_params, sizeof(float));  // Only extracts scale
        ggml_add_or_set(ctx, cgraph, isrc0, ggml_scale_impl(ctx, grad, s, 0.0, false));
    }
} break;

Issue: The gradient computation only extracts the scale parameter but completely ignores the bias parameter. This means gradients will be incorrect when using ggml_scale_bias. The bias should also be considered in gradient computation.

Fix: Should be:

float s, b;
memcpy(&s, tensor->op_params, sizeof(float));
memcpy(&b, tensor->op_params + sizeof(float), sizeof(float));

🔧 Code Quality Issues

2. Inconsistent Parameter Extraction Patterns

Backends use different patterns for extracting parameters:

CUDA (good):

float scale = ((const float *)(dst->op_params))[0];
float bias  = ((const float *)(dst->op_params))[1];

OpenCL (inconsistent):

// Uses different variable names and spacing
float scale = *(float *)&dst->op_params[0];
float bias = *(float *)&dst->op_params[sizeof(float)];

3. Missing Const Correctness

Location: Multiple backends

// Should be const
const float scale = ((const float *)(dst->op_params))[0];
const float bias  = ((const float *)(dst->op_params))[1];

4. Redundant Code in Vulkan Shader

Location: vulkan-shaders/scale.comp

const uint num_iter = 4;
[[unroll]] for (uint i = 0; i < num_iter; ++i) {
    if (idx >= p.ne) {
        continue;  // This check is inside the loop
    }
    // ...
    idx += num_threads;
}

Issue: The bounds check idx >= p.ne is performed inside the unrolled loop, which is inefficient.

5. Missing Error Handling

Location: Multiple backends

// No validation that op_params contains valid float values
// No check for NaN or infinite values
// No bounds checking on tensor dimensions

6. Suboptimal OpenCL Kernel

Location: ggml-opencl/kernels/scale.cl

kernel void kernel_scale(
        global float4 * src0,  // Uses float4 but no vectorization benefits
        ulong offset0,
        global float4 * dst,
        ulong offsetd,
        float scale,
        float bias
) {
    // Simple element-wise operation doesn't benefit from float4
    dst[get_global_id(0)] = src0[get_global_id(0)] * scale + bias;
}

Issue: Using float4 without actual vectorization benefits adds complexity.

📚 Documentation Issues

7. Missing Function Documentation

Location: ggml.h:1289-1298

GGML_API struct ggml_tensor * ggml_scale(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        float                 s);

// in-place, returns view(a)
GGML_API struct ggml_tensor * ggml_scale_inplace(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        float                 s);

Issue: Missing documentation for parameters and return values. The scale_bias functions lack any documentation.

8. Inconsistent Naming Convention

Location: Various files

// Inconsistent naming across backends
scale_f32_cuda()      // CUDA
scale_f32_sycl()      // SYCL  
kernel_scale()        // OpenCL
// Some use underscore, others don't

Performance Issues

9. Suboptimal CPU Vectorization

Location: ggml-cpu/vec.h:419-476

inline static void ggml_vec_scale_f32(const int n, float * y, const float v) {
    // Long switch between different SIMD implementations
    // Could benefit from better branch prediction
}

Issue: The function has many conditional compilation branches that could impact performance.

10. Missing Bounds Checking

Location: Multiple backends

// No validation that tensor dimensions match expected sizes
// No overflow checks when calculating offsets

🧪 Test Coverage Issues

11. Limited Edge Case Testing

Location: tests/test-backend-ops.cpp:5048-5049

test_cases.emplace_back(new test_scale());
test_cases.emplace_back(new test_scale(GGML_TYPE_F32, {10, 10, 10, 10}, 2.0f, 1.0f));

Issue: Missing tests for the following (a sketch of a few such cases follows the list):

  • Zero scale values
  • Negative scale values
  • Very large/small values
  • NaN/infinity handling
  • Different tensor types (F16, etc.)
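
A sketch of a few such cases, assuming the same test_scale constructor signature (type, ne, scale, bias) used in the existing cases above; the values are illustrative:

// Hypothetical additions; exact values are illustrative.
test_cases.emplace_back(new test_scale(GGML_TYPE_F32, {10, 10, 10, 10},  0.0f,  1.0f)); // zero scale
test_cases.emplace_back(new test_scale(GGML_TYPE_F32, {10, 10, 10, 10}, -2.0f,  0.5f)); // negative scale
test_cases.emplace_back(new test_scale(GGML_TYPE_F32, {10, 10, 10, 10},  1e6f, -1e6f)); // large magnitudes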

🔄 Consistency Issues

12. Different Block Sizes

Location: Various backend implementations

// CUDA
#define CUDA_SCALE_BLOCK_SIZE 256

// SYCL  
#define SYCL_SCALE_BLOCK_SIZE 256

// But the OpenCL and Vulkan backends take a different approach
const uint num_threads = 128;  // Vulkan shader

Issue: Inconsistent block sizes across backends without clear justification.

These issues range from critical bugs (gradient computation) to minor style inconsistencies. The gradient computation bug should be addressed immediately as it affects correctness of backpropagation.


🧠 Learnings used
Learnt from: ngxson
PR: ngxson/llama.cpp#14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6efcd65 and 0e51a0a.

📒 Files selected for processing (14)
  • ggml/include/ggml.h (1 hunks)
  • ggml/src/ggml-cann/ggml-cann.cpp (1 hunks)
  • ggml/src/ggml-cpu/ops.cpp (2 hunks)
  • ggml/src/ggml-cpu/vec.h (1 hunks)
  • ggml/src/ggml-cuda/scale.cu (2 hunks)
  • ggml/src/ggml-metal/ggml-metal.m (2 hunks)
  • ggml/src/ggml-metal/ggml-metal.metal (1 hunks)
  • ggml/src/ggml-opencl/ggml-opencl.cpp (2 hunks)
  • ggml/src/ggml-opencl/kernels/scale.cl (1 hunks)
  • ggml/src/ggml-sycl/ggml-sycl.cpp (4 hunks)
  • ggml/src/ggml-vulkan/ggml-vulkan.cpp (1 hunks)
  • ggml/src/ggml-vulkan/vulkan-shaders/scale.comp (1 hunks)
  • ggml/src/ggml.c (4 hunks)
  • tests/test-backend-ops.cpp (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
PR: ngxson/llama.cpp#14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
🧬 Code Graph Analysis (1)
tests/test-backend-ops.cpp (2)
ggml/include/ggml.h (4)
  • ggml_new_tensor (743-747)
  • ggml_set_param (802-802)
  • ggml_set_name (795-795)
  • ggml_scale_bias (1301-1305)
ggml/src/ggml.c (4)
  • ggml_new_tensor (1688-1694)
  • ggml_set_param (6797-6800)
  • ggml_set_name (1790-1797)
  • ggml_scale_bias (3101-3107)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: build-linux-cross / ubuntu-24-arm64-vulkan-cross
  • GitHub Check: build-linux-cross / debian-13-loongarch64-vulkan-cross
  • GitHub Check: windows-latest-cmake (cpu-x64 (static), x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmak...
  • GitHub Check: build-linux-cross / ubuntu-24-ppc64el-cpu-cross
  • GitHub Check: windows-msys2 (CLANG64, clang-x86_64, Release)
  • GitHub Check: build-linux-cross / ubuntu-24-riscv64-vulkan-cross
  • GitHub Check: build-cmake-pkg / linux
  • GitHub Check: windows-latest-cmake (vulkan-x64, x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD...
  • GitHub Check: build-linux-cross / ubuntu-24-riscv64-cpu-cross
  • GitHub Check: build-linux-cross / debian-13-loongarch64-cpu-cross
  • GitHub Check: windows-msys2 (UCRT64, ucrt-x86_64, Release)
  • GitHub Check: build-linux-cross / ubuntu-24-ppc64el-vulkan-cross
  • GitHub Check: windows-latest-cmake (llvm-arm64-opencl-adreno, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN...
  • GitHub Check: windows-latest-cmake (llvm-arm64, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/ar...
  • GitHub Check: windows-latest-cmake (openblas-x64, x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x6...
  • GitHub Check: ubuntu-22-cmake-hip
  • GitHub Check: server (UNDEFINED, RelWithDebInfo)
  • GitHub Check: server (ADDRESS, RelWithDebInfo)
  • GitHub Check: server-windows
  • GitHub Check: server (Release)
🔇 Additional comments (29)
ggml/src/ggml-vulkan/vulkan-shaders/scale.comp (1)

21-21: LGTM! Clean implementation of affine transformation.

The modification correctly implements the scale-plus-bias operation (y = ax + b) by adding the bias term param2 to the existing scale operation. The use of appropriate type casting and the fused multiply-add pattern aligns well with the broader PR objective of extending tensor scaling operations across all backends.

ggml/src/ggml-metal/ggml-metal.metal (2)

1017-1019: LGTM! Clean implementation of affine transformation.

The addition of the bias parameter correctly implements the affine transformation x = s * a + b. The kernel signature and computation are both properly updated.


1026-1028: LGTM! Consistent vectorized implementation.

The float4 vectorized version correctly applies the scalar bias to each component of the vector, maintaining consistency with the scalar kernel implementation.

ggml/src/ggml-opencl/kernels/scale.cl (2)

11-12: LGTM: Clean parameter addition for bias support.

The addition of the float bias parameter to the kernel signature is well-structured and follows OpenCL conventions. This change enables affine transformations and aligns with the broader backend consistency improvements.


16-16: LGTM: Correct affine transformation implementation.

The computation correctly implements the affine transformation x * scale + bias. The scalar bias value will be properly broadcast to all components of the float4 vector, which is the expected behavior for element-wise operations.

ggml/src/ggml-vulkan/ggml-vulkan.cpp (1)

7511-7511: LGTM! Successfully implements bias parameter for scale operations.

The change correctly forwards the bias parameter (op_params[1]) to the Vulkan compute shader, replacing the previous hardcoded zero. This aligns perfectly with the PR objective of extending scale operations to support affine transformations (x = s * a + b) and maintains consistency with similar updates across all other backends.

ggml/include/ggml.h (1)

1300-1312: LGTM! Well-structured API extension.

The new ggml_scale_bias and ggml_scale_bias_inplace functions are well-designed and follow the library's established patterns. The comment clearly explains the mathematical operation, parameter naming is intuitive, and both regular and in-place variants are provided consistently with other GGML operations.
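
For reference, a sketch of how the new declarations presumably look, mirroring the ggml_scale pair quoted earlier in this thread (exact comments and placement in ggml.h may differ):

// x = s*a + b
GGML_API struct ggml_tensor * ggml_scale_bias(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        float                 s,
        float                 b);

// in-place, returns view(a)
GGML_API struct ggml_tensor * ggml_scale_bias_inplace(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        float                 s,
        float                 b);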

ggml/src/ggml-opencl/ggml-opencl.cpp (1)

5605-5606: Kernel arg order update – double-check all back-ends

bias is passed as the 6-th kernel argument (index 5).
Ensure the corresponding OpenCL kernel (kernel_scale) and all other back-end scale kernels have been updated to accept this extra parameter in the same position; otherwise you’ll silently overwrite work-group sizes or run with an uninitialised argument.
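
A hedged host-side sketch of what that argument ordering implies; variable names are assumptions and error handling is omitted, so this is not the verbatim ggml-opencl.cpp code:

// Assumed kernel argument order (matching kernel_scale): src0, offset0, dst, offsetd, scale, bias
float scale = ((const float *) dst->op_params)[0];
float bias  = ((const float *) dst->op_params)[1];
clSetKernelArg(kernel, 4, sizeof(float), &scale);
clSetKernelArg(kernel, 5, sizeof(float), &bias);   // the 6th argument (index 5) discussed here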

ggml/src/ggml-metal/ggml-metal.m (2)

2258-2259: LGTM! Parameter extraction is correct.

The addition of bias parameter extraction is well-implemented and consistent with the existing scale parameter extraction pattern. The indexing (scale at [0], bias at [1]) follows the expected parameter layout for the new affine transformation functionality.


2276-2276: LGTM! Encoder parameter setting is correct.

The bias parameter is properly set as a byte buffer at the correct index (3) using the same pattern as the scale parameter. This maintains consistency with the Metal compute encoder API usage.

ggml/src/ggml-cpu/ops.cpp (1)

4674-4674: ggml_vec_mad1_f32 signature verified

Confirmed that ggml_vec_mad1_f32 is defined in ggml/src/ggml-cpu/vec.h as:

inline static void ggml_vec_mad1_f32(const int n, float *y, const float s, const float b) { … }

It matches the call in ops.cpp, so no changes are needed.
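
For context, a plain scalar sketch of what that helper computes; the committed vec.h version additionally provides platform-specific SIMD paths, so this is illustrative only:

// y[i] = y[i]*s + b for i in [0, n)
inline static void ggml_vec_mad1_f32(const int n, float * y, const float s, const float b) {
    for (int i = 0; i < n; ++i) {
        y[i] = y[i]*s + b;
    }
}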

tests/test-backend-ops.cpp (5)

2371-2371: LGTM! Bias member variable added correctly.

The addition of the bias member variable properly extends the test_scale struct to support affine transformations.


2374-2374: LGTM! Parameter representation updated correctly.

The vars() method now correctly includes all four parameters (type, ne, scale, bias) in the string representation for debugging and logging purposes.


2379-2381: LGTM! Constructor parameters updated appropriately.

The constructor correctly adds the bias parameter with a sensible default value of 0.0f, maintaining backward compatibility while enabling the new bias functionality.


2388-2388: LGTM! Graph building updated to use new API.

The build_graph method correctly calls ggml_scale_bias() instead of the previous ggml_scale() function, properly passing both scale and bias parameters to support the extended functionality.


5049-5049: LGTM! New test case exercises scale with bias.

The addition of a test case with explicit scale (2.0f) and bias (1.0f) values properly validates the new scale-with-bias functionality, complementing the existing default test case.

ggml/src/ggml-cuda/scale.cu (6)

3-3: LGTM: Kernel signature correctly updated for bias support.

The addition of the bias parameter to the kernel signature is well-positioned and follows CUDA conventions.


10-10: LGTM: Affine transformation correctly implemented.

The computation dst[i] = scale * x[i] + bias properly implements the affine transformation and maintains backward compatibility when bias is zero.


13-13: LGTM: Host wrapper function signature updated consistently.

The function signature matches the kernel parameters, maintaining consistency across the implementation.


15-15: LGTM: Kernel launch updated with correct parameter passing.

The kernel launch correctly passes all parameters including the new bias parameter in the proper order.


30-30: LGTM: Function call updated with correct parameter passing.

The call to scale_f32_cuda correctly passes both scale and bias parameters, maintaining consistency with the updated function signature.


27-28: Verify SCALE operator populates both scale and bias in op_params

Please confirm that the CPU‐side implementation of the SCALE operator writes two floats into op_params—one for scale and one for bias—to match the GPU kernel’s read of [0] and [1]. A missing entry could lead to out-of-bounds reads or undefined behavior.

• File: ggml/src/ggml.c
Function: ggml_scale (or wherever GGML_OP_SCALE is set)
Check that it calls something like:

ggml_set_op_params_f32(result, 0, scale);
ggml_set_op_params_f32(result, 1, bias);

• If it uses memcpy on result->op_params, ensure there are two copies for both values.

ggml/src/ggml-sycl/ggml-sycl.cpp (3)

1698-1708: LGTM: Correct implementation of affine transformation

The kernel correctly implements the updated scale operation with bias support. The computation dst[i] = scale * x[i] + bias properly realizes the affine transformation (x = s * a + b) as intended.


1845-1855: LGTM: Consistent parameter threading

The wrapper function correctly includes the bias parameter and passes it through to the kernel. The implementation maintains consistency with the kernel interface updates.


2321-2324: Verify parameter array bounds and safety

The implementation correctly extracts scale and bias parameters and passes them to the SYCL function. However, consider verifying that:

  • dst->op_params is not null before dereferencing
  • The op_params array has at least 2 elements before accessing indices 0 and 1
  • The parameter types match the expected float format

The core logic is correct and consistent with the bias extension.

ggml/src/ggml.c (4)

3068-3085: Well-implemented bias parameter addition to scale operation.

The addition of the bias parameter b to ggml_scale_impl correctly extends the operation from x = s * a to x = s * a + b, enabling affine transformations. The parameter storage change from a single float to a 2-element array is properly handled, and the function signature change is clean.
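
A sketch of the parameter packing described here, assuming the internal ggml_set_op_params helper is used to copy both floats into op_params (the actual local names in ggml_scale_impl may differ):

// inside ggml_scale_impl(ctx, a, s, b, inplace):
float params[2] = { s, b };
ggml_set_op_params(result, params, sizeof(params));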


3087-3099: Excellent backward compatibility preservation.

The existing ggml_scale and ggml_scale_inplace functions correctly maintain their original behavior by passing 0.0 as the bias parameter to the updated implementation. This ensures no breaking changes for existing code.


3101-3115: New API functions follow consistent patterns.

The new ggml_scale_bias and ggml_scale_bias_inplace functions properly follow the existing naming conventions and parameter patterns in the codebase. The implementation correctly delegates to ggml_scale_impl with the appropriate inplace flag.
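
Based on the delegation described above, the out-of-place wrapper is presumably a thin forwarder along these lines (a sketch, not verbatim ggml.c code):

struct ggml_tensor * ggml_scale_bias(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        float                 s,
        float                 b) {
    return ggml_scale_impl(ctx, a, s, b, false);
}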


5875-5875: Backward pass correctly updated for new parameter format.

The scale operation backward pass properly maintains existing behavior by passing 0.0 as the bias parameter, ensuring gradients are computed correctly for the affine transformation.

@@ -5777,7 +5795,7 @@ static void ggml_compute_backward(
         } break;
         case GGML_OP_MEAN: {
             if (src0_needs_grads) {
-                ggml_add1_or_set(ctx, cgraph, isrc0, ggml_scale_impl(ctx, grad, 1.0f/src0->ne[0], false));
+                ggml_add1_or_set(ctx, cgraph, isrc0, ggml_scale_impl(ctx, grad, 1.0f/src0->ne[0], 0.0, false));


💡 Verification agent

🧩 Analysis chain

Verify consistency across all backend implementations.

The backward pass correctly maintains existing behavior by passing 0.0 as the bias parameter. However, ensure that all backend implementations (CPU, CUDA, Metal, OpenCL, SYCL, Vulkan, CANN) have been consistently updated to handle the new 2-element parameter array instead of a single float.


🏁 Script executed:

#!/bin/bash
# Verify that backend implementations handle the new parameter format
echo "Checking backend implementations for scale operation parameter handling..."

# Search for scale operation implementations in backends
rg -A 10 -B 5 "GGML_OP_SCALE" ggml/src/ggml-*/

# Look for parameter extraction patterns that might need updating
rg -A 5 "op_params.*scale\|scale.*op_params" ggml/src/ggml-*/

Length of output: 26206


Review CPU & other backends for updated SCALE op parameter handling

Let’s confirm that every backend now reads both scale and bias from the 2-element op_params array:

  • ggml/src/ggml-cpu/ggml-cpu.c → verify ggml_compute_forward_scale(params, …) signature and body consume two floats
  • ggml/src/ggml-cuda/ggml-cuda.cu → inspect ggml_cuda_op_scale for two-param extraction
  • ggml/src/ggml-opencl/ggml-opencl.cpp → inspect ggml_cl_scale for two-param extraction
  • ggml/src/ggml-sycl/ggml-sycl.cpp → inspect ggml_sycl_scale for two-param extraction
  • ggml/src/ggml-metal/ggml-metal.m → inspect SCALE case to read both op_params[0] and [1]
  • ggml/src/ggml-vulkan/ggml-vulkan.cpp → ensure ggml_vk_scale pipeline push includes both params

🏁 Script executed:

#!/bin/bash
# Verify SCALE op uses both scale & bias params in each backend

grep -n "ggml_compute_forward_scale" ggml/src/ggml-cpu/ggml-cpu.c
grep -n "ggml_cuda_op_scale"      ggml/src/ggml-cuda/ggml-cuda.cu
grep -n "ggml_cl_scale"           ggml/src/ggml-opencl/ggml-opencl.cpp
grep -n "ggml_sycl_scale"         ggml/src/ggml-sycl/ggml-sycl.cpp
grep -n "case GGML_OP_SCALE"      ggml/src/ggml-metal/ggml-metal.m
grep -n "ggml_vk_scale"           ggml/src/ggml-vulkan/ggml-vulkan.cpp

Length of output: 1063


To confirm each backend correctly reads both scale and bias, let’s extract the SCALE implementations:


🏁 Script executed:

#!/bin/bash
# CPU: ggml_compute_forward_scale
sed -n '1770,1840p' ggml/src/ggml-cpu/ggml-cpu.c

# CUDA: ggml_cuda_op_scale
sed -n '2350,2450p' ggml/src/ggml-cuda/ggml-cuda.cu

# OpenCL: ggml_cl_scale
sed -n '5560,5660p' ggml/src/ggml-opencl/ggml-opencl.cpp

# SYCL: ggml_sycl_scale
sed -n '3500,3600p' ggml/src/ggml-sycl/ggml-sycl.cpp

# Metal: first GGML_OP_SCALE case
sed -n '1700,1800p' ggml/src/ggml-metal/ggml-metal.m

# Vulkan: ggml_vk_scale
sed -n '7480,7600p' ggml/src/ggml-vulkan/ggml-vulkan.cpp

# CANN: ggml_cann_scale
grep -R -n "ggml_cann_scale" -n ggml/src/ggml-cann
sed -n '1,200p' ggml/src/ggml-cann/ggml-cann.cpp

Length of output: 32299


To confirm each backend actually reads both scale and bias, let’s pull in their implementations:


🏁 Script executed:

#!/bin/bash
set -e

echo "CPU: locate compute_forward_scale definition"
grep -R -n "void ggml_compute_forward_scale" -n ggml/src/ggml-cpu/ggml-cpu.c || echo "not found"

echo
echo "CUDA: locate ggml_cuda_op_scale definition"
grep -R -n "void ggml_cuda_op_scale" -n ggml/src/ggml-cuda/ggml-cuda.cu || echo "not found"

echo
echo "SYCL: locate ggml_sycl_op_scale definition"
grep -R -n "void ggml_sycl_op_scale" -n ggml/src/ggml-sycl/ggml-sycl.cpp || echo "not found"

echo
echo "Metal: find kernel args for SCALE"
grep -R -n "kernel_scale" -n ggml/src/ggml-metal/ggml-metal.m || echo "not found"
grep -R -n "float scale" -n ggml/src/ggml-metal/ggml-metal.m || echo "not found"

echo
echo "CANN: inspect aclnn_ops implementation"
sed -n '340,430p' ggml/src/ggml-cann/aclnn_ops.cpp

Length of output: 4783


Ensure SCALE op consumes both scale & bias in every backend

We’ve added a second bias parameter in op_params, but only OpenCL, Metal, and Vulkan extract both elements. The following backends still need to be updated to read op_params[1] (bias) in addition to op_params[0] (scale):

  • CPU
    File: ggml/src/ggml-cpu/ggml-cpu.c
    Function: ggml_compute_forward_scale
    – Change from reading a single scale in params[0] to also fetching bias = params[1] and adding it after multiplication.

  • CUDA
    File: ggml/src/ggml-cuda/ggml-cuda.cu
    Function: ggml_cuda_op_scale
    – Extract both scale and bias from dst->op_params and incorporate bias in the kernel launch.

  • SYCL
    File: ggml/src/ggml-sycl/ggml-sycl.cpp
    Function: ggml_sycl_op_scale
    – Update to read bias alongside scale and pass both into the SYCL kernel.

  • CANN
    File: ggml/src/ggml-cann/aclnn_ops.cpp
    Function: ggml_cann_scale
    – Currently only copies one float (scale). Must memcpy a second float for bias and call the appropriate CANN op (e.g. Add after Muls) or use a single fused kernel that supports bias.

OpenCL, Metal, and Vulkan already handle [scale, bias] correctly.

🤖 Prompt for AI Agents
In ggml/src/ggml-cpu/ggml-cpu.c within the ggml_compute_forward_scale function,
update the code to read both scale and bias from params[0] and params[1], then
multiply the input by scale and add bias. In ggml/src/ggml-cuda/ggml-cuda.cu
inside ggml_cuda_op_scale, extract scale and bias from dst->op_params and modify
the CUDA kernel launch to apply both scale and bias. In
ggml/src/ggml-sycl/ggml-sycl.cpp in ggml_sycl_op_scale, similarly read both
scale and bias and pass them to the SYCL kernel to apply scaling and bias
addition. Finally, in ggml/src/ggml-cann/aclnn_ops.cpp in ggml_cann_scale,
update the memcpy to copy both scale and bias floats and adjust the CANN
operations to apply scale and then add bias, either by calling separate ops or
using a fused kernel that supports bias.

@ngxson ngxson closed this Jul 10, 2025