
⚡️ Speed up function gradient_descent by 20,699% #36


Conversation


codeflash-ai bot commented on Jun 24, 2025

📄 20,699% (206.99x) speedup for gradient_descent in src/numpy_pandas/statistical_functions.py

⏱️ Runtime: 11.7 seconds → 56.1 milliseconds (best of 102 runs)

📝 Explanation and details

Here's a much faster version of your program. The main optimization is replacing all explicit Python loops over NumPy arrays with fast vectorized NumPy operations. This will drastically reduce runtime and memory overhead. All comments are preserved since the logic of the overall function is unchanged—only the internal implementation is made more efficient.

Key changes for performance:

  • Predictions: Instead of looping to compute the dot product row-by-row, we use X.dot(weights), which is fully vectorized and uses optimized BLAS routines.
  • Gradient: The double for-loop to accumulate gradient is replaced with the matrix multiplication X.T.dot(errors), then divided by the number of samples.
  • Weights Update: Vectorized subtraction.

This approach will be orders of magnitude faster for realistic data sizes, as almost all time is now spent in highly optimized NumPy C code rather than slow Python loops. The output will be numerically equivalent to your original function.
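
For reference, here is a minimal sketch of the vectorized loop described by the bullets above. The function name, defaults, and edge-case handling are assumptions for illustration; the actual gradient_descent in src/numpy_pandas/statistical_functions.py is not reproduced in this comment.

import numpy as np

def gradient_descent_vectorized(X, y, learning_rate=0.01, iterations=1000):
    # Illustrative sketch only: signature and defaults are assumed, not taken from the repo.
    m, n = X.shape
    weights = np.zeros(n)  # zero-initialized weights, as the tests below expect
    for _ in range(iterations):
        predictions = X.dot(weights)                  # vectorized dot product (BLAS) instead of a per-row loop
        errors = predictions - y                      # elementwise residuals
        gradient = X.T.dot(errors) / m                # replaces the double for-loop gradient accumulation
        weights = weights - learning_rate * gradient  # vectorized weight update
    return weights

Each iteration is then two matrix-vector products plus a vector update, which is why the measured runtimes drop from seconds to milliseconds on the larger inputs below.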

Correctness verification report:

Test                            Status
⚙️ Existing Unit Tests          🔘 None Found
🌀 Generated Regression Tests   37 Passed
⏪ Replay Tests                 🔘 None Found
🔎 Concolic Coverage Tests      🔘 None Found
📊 Tests Coverage               100.0%
🌀 Generated Regression Tests and Runtime
# imports
import numpy as np
import pytest  # used for our unit tests
from src.numpy_pandas.statistical_functions import gradient_descent

# unit tests

# ----------- Basic Test Cases -----------

def test_basic_single_feature_perfect_fit():
    # Single feature, perfect fit: y = 2x, should converge to weight 2
    X = np.array([[1], [2], [3], [4]])
    y = np.array([2, 4, 6, 8])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=500); w = codeflash_output # 2.82ms -> 2.27ms (24.6% faster)

def test_basic_two_features_perfect_fit():
    # Two features, perfect fit: y = 3*x1 + 5*x2
    X = np.array([[1, 2], [2, 1], [3, 0], [0, 3]])
    y = np.array([13, 11, 9, 15])
    codeflash_output = gradient_descent(X, y, learning_rate=0.05, iterations=1000); w = codeflash_output # 9.58ms -> 4.05ms (137% faster)

def test_basic_with_bias_feature():
    # Linear regression with bias (intercept) as a feature
    X = np.array([[1, 1], [1, 2], [1, 3], [1, 4]])  # first column = bias
    y = np.array([3, 5, 7, 9])  # y = 1*1 + 2*x
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=1000); w = codeflash_output # 9.57ms -> 3.96ms (142% faster)

def test_basic_non_integer_weights():
    # y = 0.5*x1 - 1.5*x2
    X = np.array([[2, 1], [4, 2], [6, 3], [8, 4]])
    y = np.array([0.5, 1.0, 1.5, 2.0])
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=2000); w = codeflash_output # 18.5ms -> 7.55ms (145% faster)
    # Since the system is underdetermined, check if prediction error is small
    pred = X @ w
    mse = np.mean((pred - y) ** 2)

# ----------- Edge Test Cases -----------

def test_edge_zero_iterations():
    # Should return zero weights if no iterations are run
    X = np.array([[1, 2], [3, 4]])
    y = np.array([1, 1])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=0); w = codeflash_output # 1.58μs -> 1.17μs (35.6% faster)

def test_edge_zero_learning_rate():
    # Should return zero weights if learning rate is zero (no update)
    X = np.array([[1, 2], [3, 4]])
    y = np.array([1, 1])
    codeflash_output = gradient_descent(X, y, learning_rate=0.0, iterations=100); w = codeflash_output # 610μs -> 405μs (50.5% faster)

def test_edge_single_sample():
    # Single sample, multiple features
    X = np.array([[2, 3, 4]])
    y = np.array([20])
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=500); w = codeflash_output # 2.85ms -> 2.00ms (42.6% faster)
    # Should fit exactly unless learning rate is too high
    pred = X @ w

def test_edge_single_feature_single_sample():
    # Single feature, single sample
    X = np.array([[5]])
    y = np.array([15])
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=500); w = codeflash_output # 1.42ms -> 2.23ms (36.4% slower)
    pred = X @ w

def test_edge_all_zero_features():
    # All features are zero; gradient should be zero, weights unchanged
    X = np.zeros((10, 3))
    y = np.arange(10)
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=100); w = codeflash_output # 2.42ms -> 342μs (605% faster)

def test_edge_all_zero_targets():
    # All targets are zero; weights should converge to zero
    X = np.random.rand(10, 3)
    y = np.zeros(10)
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=500); w = codeflash_output # 11.9ms -> 1.57ms (658% faster)

def test_edge_large_learning_rate_divergence():
    # Large learning rate may cause divergence; check for non-nan/inf
    X = np.array([[1, 2], [2, 3], [3, 4]])
    y = np.array([5, 8, 11])
    codeflash_output = gradient_descent(X, y, learning_rate=10, iterations=100); w = codeflash_output # 792μs -> 418μs (89.3% faster)

def test_edge_negative_learning_rate():
    # Negative learning rate should cause weights to diverge in the wrong direction
    X = np.array([[1], [2]])
    y = np.array([2, 4])
    codeflash_output = gradient_descent(X, y, learning_rate=-0.1, iterations=50); w = codeflash_output # 191μs -> 230μs (17.0% slower)



def test_large_scale_many_samples():
    # 1000 samples, 3 features
    np.random.seed(0)
    X = np.random.rand(1000, 3)
    true_w = np.array([1.5, -2.0, 0.5])
    y = X @ true_w + 0.1 * np.random.randn(1000)
    codeflash_output = gradient_descent(X, y, learning_rate=0.05, iterations=500); w = codeflash_output # 1.08s -> 2.58ms (41854% faster)

def test_large_scale_many_features():
    # 10 samples, 50 features
    np.random.seed(1)
    X = np.random.rand(10, 50)
    true_w = np.linspace(-1, 1, 50)
    y = X @ true_w
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=800); w = codeflash_output # 290ms -> 2.68ms (10739% faster)

def test_large_scale_noisy_targets():
    # 500 samples, 5 features, noisy targets
    np.random.seed(2)
    X = np.random.rand(500, 5)
    true_w = np.array([2, -1, 0.5, 0, 1])
    y = X @ true_w + np.random.normal(0, 0.2, 500)
    codeflash_output = gradient_descent(X, y, learning_rate=0.05, iterations=700); w = codeflash_output # 1.24s -> 3.27ms (37716% faster)

def test_large_scale_performance():
    # Check that function completes in reasonable time for large inputs
    import time
    np.random.seed(3)
    X = np.random.rand(900, 8)
    true_w = np.array([1, -1, 0.5, 2, 0, -0.5, 1.5, -2])
    y = X @ true_w + np.random.normal(0, 0.1, 900)
    start = time.time()
    codeflash_output = gradient_descent(X, y, learning_rate=0.03, iterations=300); w = codeflash_output # 1.51s -> 1.88ms (80579% faster)
    elapsed = time.time() - start
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

# imports
import numpy as np
import pytest  # used for our unit tests
from src.numpy_pandas.statistical_functions import gradient_descent

# unit tests

# -------------------------------
# Basic Test Cases
# -------------------------------

def test_single_feature_perfect_fit():
    # Single feature, data fits y = 2x exactly
    X = np.array([[1], [2], [3]])
    y = np.array([2, 4, 6])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=500); weights = codeflash_output # 2.38ms -> 2.26ms (5.17% faster)

def test_two_features_perfect_fit():
    # Two features, y = 1*x1 + 3*x2
    X = np.array([[1, 0], [0, 1], [1, 1]])
    y = np.array([1, 3, 4])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=500); weights = codeflash_output # 3.90ms -> 2.03ms (92.2% faster)

def test_bias_term():
    # Add a bias term (column of ones), y = 5 + 2*x
    X = np.array([[1, 1], [1, 2], [1, 3]])
    y = np.array([7, 9, 11])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=500); weights = codeflash_output # 3.90ms -> 2.03ms (92.2% faster)

def test_convergence_with_small_learning_rate():
    # Should still converge with small learning rate, but slower
    X = np.array([[1], [2], [3]])
    y = np.array([2, 4, 6])
    codeflash_output = gradient_descent(X, y, learning_rate=0.001, iterations=1000); weights = codeflash_output # 4.69ms -> 4.50ms (4.22% faster)

def test_negative_learning_rate():
    # Negative learning rate should diverge
    X = np.array([[1], [2]])
    y = np.array([2, 4])
    codeflash_output = gradient_descent(X, y, learning_rate=-0.1, iterations=10); weights = codeflash_output # 41.9μs -> 50.2μs (16.6% slower)

# -------------------------------
# Edge Test Cases
# -------------------------------

def test_zero_iterations():
    # No training, weights should remain zero
    X = np.array([[1, 2], [3, 4]])
    y = np.array([5, 6])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=0); weights = codeflash_output # 1.46μs -> 1.12μs (29.7% faster)

def test_zero_learning_rate():
    # No update, weights should remain zero
    X = np.array([[1, 2], [3, 4]])
    y = np.array([5, 6])
    codeflash_output = gradient_descent(X, y, learning_rate=0, iterations=10); weights = codeflash_output # 65.2μs -> 47.0μs (38.7% faster)

def test_empty_X_and_y():
    # No data, should return zero weights
    X = np.empty((0, 2))
    y = np.empty((0,))
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=10); weights = codeflash_output # 59.4μs -> 51.4μs (15.6% faster)

def test_X_with_zero_features():
    # X has shape (m, 0), should return empty weights
    X = np.empty((5, 0))
    y = np.array([1, 2, 3, 4, 5])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=10); weights = codeflash_output # 20.0μs -> 41.2μs (51.5% slower)


def test_nan_in_X_or_y():
    # If X or y contains nan, output should also contain nan
    X = np.array([[np.nan, 2], [3, 4]])
    y = np.array([1, 2])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=10); weights = codeflash_output # 61.2μs -> 40.5μs (51.1% faster)
    X = np.array([[1, 2], [3, 4]])
    y = np.array([1, np.nan])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=10); weights = codeflash_output # 60.4μs -> 40.9μs (47.8% faster)

def test_infinite_in_X_or_y():
    # If X or y contains inf, output should also contain inf or nan
    X = np.array([[np.inf, 2], [3, 4]])
    y = np.array([1, 2])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=10); weights = codeflash_output # 66.1μs -> 42.9μs (54.2% faster)
    X = np.array([[1, 2], [3, 4]])
    y = np.array([1, np.inf])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=10); weights = codeflash_output # 63.8μs -> 43.7μs (46.0% faster)

def test_single_data_point():
    # Single data point, should fit perfectly if not underdetermined
    X = np.array([[3]])
    y = np.array([9])
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=100); weights = codeflash_output # 283μs -> 453μs (37.4% slower)

def test_all_zero_features():
    # All features are zero, weights should remain zero
    X = np.zeros((10, 5))
    y = np.arange(10)
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=10); weights = codeflash_output # 389μs -> 39.5μs (884% faster)

def test_all_zero_targets():
    # All targets are zero, weights should converge to zero
    X = np.random.rand(10, 3)
    y = np.zeros(10)
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=100); weights = codeflash_output # 2.38ms -> 318μs (647% faster)

# -------------------------------
# Large Scale Test Cases
# -------------------------------

def test_large_number_of_samples():
    # Large m, small n
    np.random.seed(42)
    X = np.random.rand(1000, 2)
    true_w = np.array([2.0, -1.0])
    y = X @ true_w + 0.5
    codeflash_output = gradient_descent(X, y, learning_rate=0.05, iterations=200); weights = codeflash_output # 291ms -> 990μs (29332% faster)

def test_large_number_of_features():
    # Small m, large n
    np.random.seed(0)
    X = np.random.rand(10, 100)
    true_w = np.arange(1, 101) / 10.0
    y = X @ true_w
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=300); weights = codeflash_output # 216ms -> 1.06ms (20342% faster)
    diff = np.abs(weights - true_w)

def test_large_m_and_n():
    # Large m and n, check performance and rough accuracy
    np.random.seed(1)
    X = np.random.rand(500, 50)
    true_w = np.linspace(1, 2, 50)
    y = X @ true_w
    codeflash_output = gradient_descent(X, y, learning_rate=0.01, iterations=300); weights = codeflash_output # 5.20s -> 4.01ms (129503% faster)
    diff = np.abs(weights - true_w)

def test_large_scale_with_noise():
    # Large data with added noise, weights should be close to true weights
    np.random.seed(123)
    X = np.random.rand(1000, 5)
    true_w = np.array([1.5, -2.0, 0.7, 3.3, -1.1])
    y = X @ true_w + np.random.normal(0, 0.1, 1000)
    codeflash_output = gradient_descent(X, y, learning_rate=0.05, iterations=300); weights = codeflash_output # 1.06s -> 1.89ms (56147% faster)
    diff = np.abs(weights - true_w)

def test_large_scale_all_zeros():
    # Large X and y all zeros, weights should remain zero
    X = np.zeros((1000, 10))
    y = np.zeros(1000)
    codeflash_output = gradient_descent(X, y, learning_rate=0.1, iterations=100); weights = codeflash_output # 700ms -> 778μs (89851% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from src.numpy_pandas.statistical_functions import gradient_descent

To edit these changes, check out the codeflash/optimize-gradient_descent-mc9t89qc branch and push your changes.

codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) label on Jun 24, 2025
codeflash-ai bot requested a review from KRRT7 on Jun 24, 2025, 00:51
codeflash-ai bot deleted the codeflash/optimize-gradient_descent-mc9t89qc branch on Jun 27, 2025, 01:58