
Conversation


@pyranota commented Dec 4, 2025

No description provided.


cloudflare-workers-and-pages bot commented Dec 4, 2025

Deploying windmill with Cloudflare Pages

Latest commit: 0f38984
Status: ✅  Deploy successful!
Preview URL: https://63522cd1.windmill.pages.dev
Branch Preview URL: https://separate-table-for-concurren.windmill.pages.dev


@pyranota changed the title from feat: runnable settings to feat(internal): runnable settings on Dec 5, 2025

pyranota commented Dec 9, 2025

/updatesqlx

@windmill-internal-app

Starting sqlx update...

View workflow run

@windmill-internal-app

❌ SQLx update failed. Please check the workflow logs for details.


pyranota commented Dec 9, 2025

/updatesqlx

@windmill-internal-app

Starting sqlx update...

View workflow run

@windmill-internal-app

❌ SQLx update failed. Please check the workflow logs for details.

@pyranota marked this pull request as ready for review on December 10, 2025 at 10:26

claude bot commented Dec 10, 2025

Claude finished @pyranota's task in 4m 55s.


PR Review Complete: feat(internal): runnable settings

I've completed a comprehensive review of this PR, which refactors concurrency and debouncing settings into separate database tables. Here's my detailed analysis:


Overview

This PR introduces a significant architectural change by moving concurrency and debouncing settings from inline columns in the script and flow tables into dedicated normalized tables. This is a well-designed refactoring that improves data organization and reduces duplication.

Key Changes:

  • New tables: concurrency_settings, debouncing_settings, runnable_settings, job_settings
  • New module: windmill-common/src/runnable_settings/
  • Updated ~50 files with 1,252 additions and 922 deletions
  • Backward compatibility layer for gradual rollout

Critical Issues

1. Hash Collision Risk (High Priority)

Location: backend/windmill-common/src/runnable_settings/mod.rs:98

The TODO comment correctly identifies a critical issue with the hashing approach. When new optional fields are added to settings structs in the future, Rust's derived Hash implementation will produce different hashes for:

  • Existing settings (no field)
  • New settings with field: None

This breaks the deduplication logic and could cause settings to be incorrectly linked or duplicated.

Recommendation: Implement a custom Hash that excludes None values or serialize to normalized JSON before hashing.

2. Missing Data Migration

Location: Migration files

The migration creates new tables but doesn't include logic to:

  1. Populate concurrency_settings and debouncing_settings from existing script.concurrent_limit, script.concurrency_time_window_s, etc.
  2. Populate runnable_settings to link existing scripts/flows to their settings

Impact: After migration, all existing settings will be lost unless there's a separate migration script not included in this PR.

Question: Is there a follow-up migration or is this intentional (relying on the backward compatibility layer)?

3. Missing Input Validation

Location: backend/windmill-common/src/runnable_settings/settings.rs:19

The TODO explicitly states "Add validation logic" but validation is missing for:

  • Negative values for delays, limits, time windows
  • Inconsistent states (e.g., concurrent_limit without concurrency_time_window_s)
  • Zero values that could cause errors

Recommendation: Add validation methods and call them before database insertion.


Design Concerns

4. Index Redundancy

Location: backend/migrations/20251204094847_external_runnables_settings.up.sql:37

The index runnable_settings_idx (runnable_id, runnable_kind, settings_kind) may be redundant with the PRIMARY KEY (runnable_id, settings_kind). The additional runnable_kind column in the index might not provide benefits unless you frequently filter by runnable_kind alone.

5. Dynamic SQL Generation

Location: backend/windmill-common/src/runnable_settings/mod.rs:68

The use of string formatting to build SQL queries:

sqlx::query_as::<Postgres, Self>(&format!("SELECT {} FROM {} ...", ...))

Concerns:

  • Cannot be verified at compile-time by SQLx
  • Each settings type generates different query strings, preventing PostgreSQL query plan reuse
  • Less type-safe than using sqlx::query! macro

Recommendation: Consider explicit query methods per settings type or ensure these queries are in .sqlx/ metadata.

6. Race Condition in Cache

Location: backend/windmill-common/src/runnable_settings/mod.rs:115

Multiple workers can simultaneously:

  1. Miss the cache
  2. Try to INSERT the same settings
  3. Both cache the value

The ON CONFLICT DO NOTHING clause prevents database errors, but this could lead to unnecessary work and potential cache inconsistencies if serialization fails for one worker.

Assessment: Probably acceptable given the conflict clause, but worth documenting.


Code Quality Issues

7. Backward Compatibility Complexity

The PR maintains dual code paths using min_version_supports_runnable_settings_v0(). This adds complexity:

Concerns:

  • Two code paths to maintain and test
  • Careful rollout coordination required (workers must update before API server)
  • No clear timeline for removing legacy path
  • Potential version mismatch issues during deployment

Recommendation: Document the rollout strategy and deprecation timeline clearly.

8. SQLx Query File Changes

Many .sqlx/query-*.json files were modified. Key observations:

Good:

  • Queries properly use LEFT JOINs to fetch settings
  • COALESCE used correctly to fall back to inline columns during transition
  • Explicit column selection (not SELECT *)

Concerns:

  • Complex JOIN patterns in queries like bd4db59447c0658b... may impact performance
  • Consider query performance testing under load

9. Test Coverage

Location: backend/tests/*.rs

Tests were updated to accommodate the new structure, but:

  • No new tests specifically for runnable_settings module
  • No tests for the hash collision scenario
  • No tests for the backward compatibility layer switching

Recommendation: Add unit tests for the new module, especially around hashing and validation.


Positive Aspects

✅ Well-Structured Code

The new runnable_settings module is well-organized with:

  • Clear trait-based abstraction (RunnableSettings trait)
  • Separation of concerns (settings.rs for types, mod.rs for logic)
  • Good use of const generics for settings metadata

✅ Normalized Database Design

Moving settings to separate tables is the right architectural decision:

  • Reduces duplication across scripts/flows
  • Enables better querying and analytics
  • Improves maintainability

✅ Backward Compatibility

The gradual rollout approach with version checks is thoughtful:

  • Allows safe deployment across distributed systems
  • Falls back to legacy columns when needed
  • Minimizes risk during migration

✅ Proper SQLx Usage

  • Used prepared statements consistently
  • Type-safe enum mappings for settings_kind and runnable_kind
  • Appropriate use of transactions

Security Considerations

No Major Security Issues Found

The code properly uses:

  • Parameterized queries (no SQL injection risk)
  • Proper permission checks maintained
  • Transaction isolation where needed

Performance Considerations

Query Performance

The new LEFT JOIN patterns add complexity:

LEFT JOIN runnable_settings rs_concurrency ON ...
LEFT JOIN concurrency_settings cs ON ...
LEFT JOIN runnable_settings rs_debouncing ON ...
LEFT JOIN debouncing_settings ds ON ...

Recommendations:

  1. Monitor query performance after deployment
  2. Consider materialized views if performance degrades
  3. Test with realistic data volumes
  4. The runnable_settings_idx and job_settings_idx indexes should help, but verify with EXPLAIN ANALYZE

Cache Strategy

The use of make_static! macro for caching is appropriate:

  • Reduces database round-trips
  • Bounded cache sizes (1000 entries)
  • Per-worker caching reduces coordination overhead

Recommendations Summary

Must Fix Before Merge:

  1. Resolve hash collision issue - Either custom Hash implementation or documented migration strategy
  2. Add data migration - Script to populate new tables from existing data
  3. Add validation logic - Prevent invalid settings from being stored

Should Fix:

  1. Review index design for potential redundancy
  2. Add comprehensive unit tests for new module
  3. Document backward compatibility rollout plan

Nice to Have:

  1. Consider using sqlx::query! for type safety
  2. Add performance benchmarks for complex JOIN queries
  3. Document the race condition behavior in cache logic

Testing Instructions

To test this PR through the app interface:

Testing Concurrency Settings:
Navigate to any script in the workspace, open the script settings panel (gear icon), and locate the "Concurrency" section. Set a concurrent limit (e.g., "3") and a time window (e.g., "60" seconds), then save the script. Create a flow that calls this script multiple times in parallel using a "For Loop" module with parallel execution enabled. Run the flow and verify through the Runs page that no more than 3 instances of the script execute within any 60-second window, with excess jobs queued appropriately.

Testing Debouncing Settings:
Open a script's settings, navigate to the "Advanced" section, and configure debouncing with a delay (e.g., "10" seconds) and specify arguments to accumulate. Create a schedule that triggers this script rapidly (every 5 seconds) with varying inputs. Monitor the Runs page to verify that executions are debounced—the script should not run more frequently than the specified delay, and accumulated arguments should be collected and passed together when the debounced job finally executes.

Testing Settings Persistence:
After configuring concurrency or debouncing settings on scripts or flows, navigate away, then return to the settings page. Verify all settings are correctly persisted and displayed. Check that the settings apply correctly when running jobs both directly (via "Test" button) and through triggers (schedules, webhooks, or flows).

Testing Backward Compatibility:
If possible in a staging environment with mixed worker versions, verify that jobs continue to execute correctly with the appropriate concurrency and debouncing behavior regardless of whether workers are using the new runnable_settings tables or falling back to legacy inline columns.


Summary

This is a solid architectural improvement with good backward compatibility planning. The main concerns are around the hash collision risk, missing data migration, and lack of input validation. Once these critical issues are addressed, this will be a valuable refactoring that sets up Windmill for better extensibility of runnable settings in the future.

concurrency_key VARCHAR(255),
concurrent_limit INTEGER,
concurrency_time_window_s INTEGER
);

Missing Index on Foreign Key Lookups

The concurrency_settings and debouncing_settings tables are referenced via settings_hash in the runnable_settings and job_settings tables, but there's no index on the hash column (which serves as the primary key).

While PRIMARY KEY automatically creates an index, consider adding explicit indexes if you anticipate high-volume lookups, especially for the debouncing_settings table which has 5 columns that might be frequently accessed via LEFT JOINs.

PRIMARY KEY (job_id, settings_kind)
);

CREATE INDEX IF NOT EXISTS runnable_settings_idx ON runnable_settings (runnable_id, runnable_kind, settings_kind);

Index Design Consideration

The index runnable_settings_idx includes all three columns (runnable_id, runnable_kind, settings_kind), but this might be redundant with the PRIMARY KEY (runnable_id, settings_kind).

Consider whether you need both indexes, as:

  1. The PK already provides efficient lookups for (runnable_id, settings_kind)
  2. The additional index including runnable_kind might not provide significant benefits unless you frequently query by runnable_kind alone

This could be simplified or the index column order reconsidered based on actual query patterns.

Self: 'a,
{
async move {
sqlx::query_as::<Postgres, Self>(&format!(

Query Performance: Dynamic SQL with String Formatting

This function builds SQL queries using string formatting:

sqlx::query_as::<Postgres, Self>(&format!(
    "SELECT {} FROM {} ds ...",
    Self::INCLUDE_FIELDS.iter().join(","),
    Self::SETTINGS_NAME,
))

Concerns:

  1. SQL Injection Risk (Low): While the values come from constants, this pattern is generally discouraged
  2. Query Plan Caching: Each settings type generates a different query string, preventing PostgreSQL from reusing query plans
  3. SQLx Compile-Time Verification: These queries cannot be verified at compile time with sqlx::query!

Recommendation: Consider using explicit query methods for each settings type, or at minimum, ensure these queries are added to the .sqlx/ metadata for offline verification.

// If it is not in cache
// this means we don't know if it is in db
// in that case, we INSERT to be sure
sqlx::query(&format!(

Concurrency Issue: Race Condition in Cache + DB Insert

There's a potential race condition here:

RUNNABLE_SETTINGS.get_or_insert_async(hash, async {
    sqlx::query(&format!("INSERT INTO {} ... ON CONFLICT (hash) DO NOTHING"))
        .execute(db).await?;
    serde_json::to_value(&self).map_err(error::Error::from)
}).await?;

Problem: Multiple workers could:

  1. Both miss the cache
  2. Both try to INSERT simultaneously
  3. One succeeds, one hits ON CONFLICT
  4. But both cache the value

This is mostly harmless due to ON CONFLICT DO NOTHING, but could lead to:

  • Unnecessary duplicate serialization work
  • Inconsistent cache state if serialization fails for one worker

Recommendation: This is probably acceptable given the ON CONFLICT clause, but consider documenting this behavior or using a distributed lock for critical settings updates.

Comment on lines +19 to +116
Flow,
App,
}
// TODO: Add validation logic.

Missing Validation Logic

The TODO comment on line 19 states:

// TODO: Add validation logic.

This is important for the DebouncingSettings and ConcurrencySettings structs. Without validation:

  1. Invalid Values: Negative delays, limits, or time windows could be stored
  2. Inconsistent State: concurrent_limit without concurrency_time_window_s (or vice versa)
  3. Edge Cases: Zero values that might cause division errors or infinite loops

Recommendation: Add validation methods like:

impl DebouncingSettings {
    pub fn validate(&self) -> Result<(), String> {
        if let Some(delay) = self.debounce_delay_s {
            if delay < 0 { return Err("delay must be non-negative".into()); }
        }
        // ... more validation
        Ok(())
    }
}

Call these validators before inserting settings into the database.

Signed-off-by: pyranota <[email protected]>
@pyranota force-pushed the separate-table-for-concurrency-and-debouncing-settings branch from 396cce3 to a4eb405 on December 11, 2025 at 12:33
pyranota and others added 21 commits December 11, 2025 19:44
@pyranota

/ai review VERY carefully, think 4 times on every change


claude bot commented Dec 12, 2025

Claude finished @pyranota's task in 3m 41s.


Extremely Thorough PR Review Complete

Reviewing PR #7298: feat(internal): runnable settings

Overview:
This PR refactors concurrency and debouncing settings from inline columns in script and flow tables into separate normalized database tables. This is a significant architectural change affecting 45 files with ~1,400 additions and ~750 deletions. The refactoring improves data normalization and reduces duplication.


🔴 CRITICAL ISSUES - MUST FIX

1. Hash Collision Vulnerability - HIGHEST PRIORITY

Location: backend/windmill-common/src/runnable_settings/settings.rs:114-115

The DebouncingSettings and ConcurrencySettings structs use Rust's derived Hash trait with Option<T> fields:

#[derive(Debug, Clone, Serialize, Deserialize, Default, Hash, PartialEq, sqlx::FromRow, sqlx::Type)]
pub struct DebouncingSettings {
    pub debounce_key: Option<String>,
    pub debounce_delay_s: Option<i32>,
    pub max_total_debouncing_time: Option<i32>,
    pub max_total_debounces_amount: Option<i32>,
    pub debounce_args_to_accumulate: Option<Vec<String>>,
}

The Problem:
When you add a new optional field in the future (e.g., pub new_field: Option<String>), Rust's derived Hash implementation will treat these as different:

  • Existing data (no new_field): Hash = X
  • New data with new_field: None: Hash = Y (different!)

This breaks the deduplication logic in insert_cached() because:

  1. The same settings will generate different hashes before/after the schema change
  2. Duplicate settings will be inserted instead of being deduplicated
  3. References to settings will be incorrect

Evidence:
The code at lines 97-103 in mod.rs computes the hash:

let hash = {
    let mut h = DefaultHasher::new();
    self.hash(&mut h);  // ← Uses derived Hash
    Self::SETTINGS_NAME.hash(&mut h);
    h.finish() as i64
};

Impact: 🔴 CRITICAL

  • Data corruption: settings won't be correctly linked
  • Database bloat: duplicate settings rows
  • Concurrency/debouncing features may break
  • This WILL happen when you add fields in future versions

Solutions:

  1. Recommended: Implement custom Hash that serializes to normalized JSON (excluding null fields):

    impl Hash for DebouncingSettings {
        fn hash<H: Hasher>(&self, state: &mut H) {
            // Serialize to JSON (serde_json's default Map keeps keys sorted),
            // then hash only non-null fields so that a later-added Option
            // field set to None leaves existing hashes unchanged.
            if let Ok(serde_json::Value::Object(map)) = serde_json::to_value(self) {
                for (key, value) in map.iter().filter(|(_, v)| !v.is_null()) {
                    key.hash(state);
                    value.to_string().hash(state);
                }
            }
        }
    }
  2. Alternative: Document this as a breaking change and require migration on schema changes

  3. Workaround: Never add optional fields (only add required fields with defaults), but this is very restrictive


2. Missing Data Migration - CRITICAL

Location: backend/migrations/20251204094847_external_runnables_settings.up.sql

The migration creates new tables but does NOT migrate existing data:

CREATE TABLE IF NOT EXISTS concurrency_settings(...);
CREATE TABLE IF NOT EXISTS debouncing_settings(...);
CREATE TABLE IF NOT EXISTS runnable_settings(...);
ALTER TABLE script ADD COLUMN runnable_settings_handle BIGINT DEFAULT NULL;

The Problem:
After this migration runs:

  • All existing scripts with concurrent_limit, concurrency_key, debounce_delay_s, etc. are NOT migrated to the new tables
  • The runnable_settings_handle column is NULL for all existing scripts
  • The backward compatibility layer will use the old inline columns, but...
  • New scripts/updates will use the new tables, creating inconsistency

Impact: 🔴 CRITICAL

  • Data loss risk: Existing settings are not preserved in the new structure
  • Inconsistent behavior between old and new scripts
  • Makes rollback difficult

Questions:

  1. Is there a separate data migration script not included in this PR?
  2. Is the plan to rely on the backward compatibility layer indefinitely?
  3. What happens during a rollout when workers are mixed versions?

Recommended Solution:
Add a data migration step:

-- After creating tables, migrate existing data
INSERT INTO concurrency_settings (hash, concurrency_key, concurrent_limit, concurrency_time_window_s)
SELECT DISTINCT
    -- hash must match the Rust-side computation, so this step likely
    -- needs a one-off migration binary rather than pure SQL
    concurrency_key, concurrent_limit, concurrency_time_window_s
FROM script 
WHERE concurrent_limit IS NOT NULL OR concurrency_key IS NOT NULL
ON CONFLICT (hash) DO NOTHING;

-- Similar for debouncing_settings
-- Then populate runnable_settings and update script.runnable_settings_handle

3. Missing Input Validation - HIGH PRIORITY

Location: backend/windmill-common/src/runnable_settings/settings.rs:112

Line 112 has a TODO:

// TODO: Add validation logic.
#[derive(Debug, Clone, Serialize, Deserialize, Default, Hash, PartialEq, sqlx::FromRow, sqlx::Type)]
pub struct DebouncingSettings { ... }

The Problem:
No validation prevents invalid settings from being stored:

  • Negative delays: debounce_delay_s: Some(-100)
  • Negative limits: concurrent_limit: Some(-5)
  • Zero time windows: concurrency_time_window_s: Some(0) → division by zero?
  • Inconsistent states: concurrent_limit: Some(5) but concurrency_time_window_s: None
  • Overflow: max_total_debounces_amount: Some(i32::MAX)

Impact: 🟡 HIGH

  • Runtime errors when invalid values are used
  • Undefined behavior in concurrency/debouncing logic
  • Poor user experience (errors occur late, not on input)

Recommended Solution:

impl DebouncingSettings {
    pub fn validate(&self) -> Result<(), String> {
        if let Some(delay) = self.debounce_delay_s {
            if delay < 0 {
                return Err("debounce_delay_s must be non-negative".into());
            }
        }
        if let Some(time) = self.max_total_debouncing_time {
            if time < 0 {
                return Err("max_total_debouncing_time must be non-negative".into());
            }
        }
        if let Some(amount) = self.max_total_debounces_amount {
            if amount < 0 {
                return Err("max_total_debounces_amount must be non-negative".into());
            }
        }
        Ok(())
    }
}

impl ConcurrencySettings {
    pub fn validate(&self) -> Result<(), String> {
        match (self.concurrent_limit, self.concurrency_time_window_s) {
            (Some(limit), Some(window)) => {
                if limit <= 0 {
                    return Err("concurrent_limit must be positive".into());
                }
                if window <= 0 {
                    return Err("concurrency_time_window_s must be positive".into());
                }
            }
            (Some(_), None) | (None, Some(_)) => {
                return Err("concurrent_limit and concurrency_time_window_s must both be set or both be None".into());
            }
            _ => {}
        }
        Ok(())
    }
}

Call these in insert_cached() before database insertion.


4. Potential SELECT * Usage (Workers Backward Compatibility)

Location: Multiple .sqlx query files

While reviewing the queries, I noticed the PR avoids SELECT * in most places (good!). However, per the Rust best practices documentation:

CRITICAL - Never Use SELECT * in Worker-Executed Queries: For any query that can potentially be executed by workers, always explicitly list the specific columns you need instead of using SELECT *.

Verification Needed:
Check that ALL queries executed by workers explicitly list columns. This is especially critical for:

  • Queries in windmill-worker/src/
  • Queries in windmill-queue/src/jobs.rs that workers might execute
  • Any query involving the new runnable_settings, concurrency_settings, debouncing_settings tables

Why This Matters:
When workers run an older version (common during rolling deploys), they'll fail if the database schema has new columns they don't know about.

Status: ✅ The queries I reviewed appear to follow this practice, but a comprehensive audit is recommended.


🟡 MAJOR CONCERNS - SHOULD FIX

5. Database Schema Design: Potential Index Redundancy

Location: backend/migrations/20251204094847_external_runnables_settings.up.sql:17-21

CREATE TABLE IF NOT EXISTS runnable_settings(
    hash               BIGINT PRIMARY KEY,
    debouncing_settings BIGINT DEFAULT NULL,
    concurrency_settings BIGINT DEFAULT NULL
);

Missing Indexes:

  1. No index on runnable_settings.debouncing_settings or runnable_settings.concurrency_settings
  2. These are foreign key references to debouncing_settings(hash) and concurrency_settings(hash)
  3. Queries will JOIN on these columns (see .sqlx files), but without indexes, this will be slow

Recommendation:

CREATE INDEX idx_runnable_settings_debouncing ON runnable_settings(debouncing_settings) WHERE debouncing_settings IS NOT NULL;
CREATE INDEX idx_runnable_settings_concurrency ON runnable_settings(concurrency_settings) WHERE concurrency_settings IS NOT NULL;

The partial indexes (with WHERE) will keep them small since many rows have NULL.


6. Dynamic SQL Generation - Compile-Time Safety Concern

Location: backend/windmill-common/src/runnable_settings/mod.rs:68-75, 112-122

The code uses string formatting to build SQL queries:

let r = sqlx::query_as::<Postgres, Self>(&format!(
    "SELECT {} FROM {} WHERE hash = $1",
    Self::INCLUDE_FIELDS.iter().join(","),
    Self::SETTINGS_NAME,
))

Concerns:

  1. No compile-time verification: SQLx cannot verify these queries at compile time
  2. Query plan cache pollution: Each settings type generates a different query string, preventing PostgreSQL from reusing query plans
  3. Maintainability: Easy to introduce SQL syntax errors

Evidence:
The .sqlx/ directory contains 1,215 query files, but these dynamically generated queries are NOT in there.

Recommendation:
Use explicit query methods for each settings type:

impl DebouncingSettings {
    async fn get(hash: i64, db: &Pool<Postgres>) -> Result<Self, error::Error> {
        sqlx::query_as!(
            Self,
            "SELECT debounce_key, debounce_delay_s, max_total_debouncing_time,
                    max_total_debounces_amount, debounce_args_to_accumulate
             FROM debouncing_settings WHERE hash = $1",
            hash
        )
        .fetch_one(db)
        .await
        .map_err(error::Error::from)
    }
}

This enables compile-time verification and better query plan reuse.


7. Race Condition in Cache + DB Insert

Location: backend/windmill-common/src/runnable_settings/mod.rs:106-131

RUNNABLE_INDIVIDUAL_SETTINGS
    .get_or_insert_async(hash, async move {
        // Insert to DB
        self.bind_arguments(sqlx::query(&sql).bind(hash))
            .execute(db)
            .await?;
        
        // Cache the value
        serde_json::to_value(&self).map_err(error::Error::from)
    })
    .await?;

The Race:

  1. Worker A: Cache miss, starts INSERT
  2. Worker B: Cache miss, starts INSERT (simultaneously)
  3. Worker A: INSERT succeeds
  4. Worker B: INSERT hits ON CONFLICT (hash) DO NOTHING, succeeds silently
  5. Worker A: Serializes and caches value
  6. Worker B: Serializes and caches value
  7. Both workers did unnecessary serialization work

Impact: 🟡 MEDIUM

  • Not a correctness issue (thanks to ON CONFLICT DO NOTHING)
  • Performance waste: Duplicate serialization, duplicate DB round-trips
  • Could be problematic under high concurrency

Assessment:
This is probably acceptable given:

  • The ON CONFLICT clause prevents data corruption
  • The cache prevents subsequent duplicates
  • Settings insertions are relatively rare

Optional Improvement:
Document this behavior or use a distributed lock for critical paths.


8. Backward Compatibility Complexity

Location: Multiple files

The PR maintains dual code paths:

  • Old path: Uses inline columns (script.concurrent_limit, etc.)
  • New path: Uses normalized tables

Controlled by: min_version_supports_runnable_settings_v0() checks

Evidence:

// scripts.rs:850-876
let (guarded_concurrent_limit, ...) = 
    if min_version_supports_runnable_settings_v0().await {
        Default::default()  // Use new tables
    } else {
        (ns.concurrency_settings.concurrent_limit.clone(), ...)  // Use old columns
    };

Concerns:

  1. Two code paths to maintain: Double the testing surface
  2. Version coordination required: Workers must update before API server (or vice versa?)
  3. No deprecation timeline: When will the old path be removed?
  4. Rollout complexity: Mixed-version deployments are tricky

Recommendations:

  1. Document the rollout plan:

    • Step 1: Deploy migration (creates tables, adds columns)
    • Step 2: Deploy workers (they'll use old columns via backward compat)
    • Step 3: Deploy API server (starts using new tables)
    • Step 4: Wait for all workers to update
    • Step 5: Run data migration script
    • Step 6: Remove backward compatibility code
  2. Add metrics: Track which code path is being used

  3. Set deprecation date: E.g., "Remove old path in v1.600.0"


🟢 POSITIVE ASPECTS

✅ Well-Structured Code

  • The new runnable_settings module is cleanly organized
  • Good use of Rust traits (RunnableSettingsTrait)
  • Proper separation of concerns between mod.rs and settings.rs

✅ Normalized Database Design

  • Moving settings to separate tables is architecturally sound
  • Reduces duplication (same settings can be shared)
  • Easier to add new settings types in the future

✅ Backward Compatibility Attempt

  • The gradual rollout approach with version checks shows careful planning
  • Fallback to legacy columns ensures safe deployment
  • ON CONFLICT DO NOTHING prevents data corruption

✅ Proper SQLx Usage

  • Uses parameterized queries (no SQL injection risk)
  • Type-safe enum mappings
  • Appropriate use of transactions in critical paths

✅ Test Updates

  • All 24 test file changes properly update to use runnable_settings:: instead of jobs::
  • Tests compile (evidenced by the CI passing after fixes)

🟠 MINOR ISSUES & RECOMMENDATIONS

9. Missing Unit Tests

The PR updates existing tests but doesn't add new tests for:

  • RunnableSettingsTrait implementations
  • Hash collision scenarios
  • Backward compatibility switching logic
  • prefetch_cached() methods

Recommendation: Add tests in backend/windmill-common/src/runnable_settings/mod.rs:

#[cfg(test)]
mod tests {
    use super::*;
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // `compute_hash` is illustrative: any deterministic, content-based
    // hash over the settings struct works here. Note that DefaultHasher
    // is only stable within a single binary; persisted hashes need a
    // fixed algorithm.
    fn compute_hash<T: Hash>(settings: &T) -> u64 {
        let mut hasher = DefaultHasher::new();
        settings.hash(&mut hasher);
        hasher.finish()
    }

    #[test]
    fn test_hash_stability() {
        let settings1 = DebouncingSettings {
            debounce_delay_s: Some(10),
            ..Default::default()
        };
        let settings2 = DebouncingSettings {
            debounce_delay_s: Some(10),
            ..Default::default()
        };

        assert_eq!(compute_hash(&settings1), compute_hash(&settings2));
    }

    // Add tests for validation, cache behavior, etc.
}

10. Documentation Gaps

  • No README or module-level docs explaining the new architecture
  • No migration guide for developers
  • No explanation of the hash-based deduplication strategy

Recommendation: Add module docs:

//! # Runnable Settings Module
//! 
//! This module provides normalized storage for script/flow settings like
//! concurrency limits and debouncing configuration.
//! 
//! ## Architecture
//! 
//! Settings are deduplicated using content-based hashing:
//! - Same settings → same hash → single DB row
//! - Scripts/flows reference settings via hash
//! 
//! ## Migration Path
//! 
//! Version 1.592.0 introduces this new system. Older versions use inline
//! columns. The `min_version_supports_runnable_settings_v0()` function
//! determines which path to use.

11. Potential Performance Bottleneck

Location: backend/windmill-api/src/scripts.rs (multiple places)

The prefetch_cached() pattern is used in hot paths:

let script = not_found_if_none(script_o, "Script", path)?
    .prefetch_cached(&db)  // ← Additional DB queries
    .await?;

Analysis:

  • Every script fetch now does 1-3 additional queries (runnable_settings → debouncing_settings, concurrency_settings)
  • These are cached, so subsequent fetches are fast
  • But the first fetch per worker is slower

Recommendation:

  • Monitor query performance in production
  • Consider JOIN-based fetching instead of sequential queries:
    SELECT script.*, cs.*, ds.*
    FROM script
    LEFT JOIN runnable_settings rs ON script.runnable_settings_handle = rs.hash
    LEFT JOIN concurrency_settings cs ON rs.concurrency_settings = cs.hash
    LEFT JOIN debouncing_settings ds ON rs.debouncing_settings = ds.hash
    WHERE script.path = $1

12. .sqlx Query File Maintenance

Location: backend/.sqlx/ directory

The PR modifies/adds many .sqlx files. These are SQLx's compile-time verification metadata.

Observations:

  • ✅ Queries properly use LEFT JOINs to fetch settings
  • ✅ Explicit column selection (not SELECT *)
  • ✅ COALESCE used to fall back to inline columns during transition
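The COALESCE fallback noted above looks roughly like this in a query (column names assumed, not copied from the PR):

```sql
SELECT COALESCE(cs.concurrency_limit, script.concurrent_limit) AS concurrent_limit
FROM script
LEFT JOIN runnable_settings rs ON script.runnable_settings_handle = rs.hash
LEFT JOIN concurrency_settings cs ON rs.concurrency_settings = cs.hash
WHERE script.path = $1;
```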

Recommendation:
After deployment, run cargo sqlx prepare to regenerate all query metadata and ensure consistency.


📊 SUMMARY

Critical Issues (Must Fix Before Merge):

  1. 🔴 Hash collision vulnerability - Will break in future when fields are added
  2. 🔴 Missing data migration - Existing settings not migrated to new tables
  3. 🟡 Missing validation - Invalid values can be stored

Major Concerns (Should Fix):

  1. 🟡 Database indexes needed for foreign key references
  2. 🟡 Dynamic SQL reduces compile-time safety
  3. 🟡 Backward compatibility adds significant complexity

Strengths:

  • ✅ Well-architected normalization
  • ✅ Clean module structure
  • ✅ Thoughtful backward compatibility
  • ✅ Proper test updates

🎯 RECOMMENDATIONS

Before Merge:

  1. Fix the hash collision issue (critical) - Implement custom Hash or document breaking change policy
  2. Add data migration script (critical) - Migrate existing settings to new tables
  3. Add validation (high) - Prevent invalid settings from being stored
  4. Add database indexes - For foreign key columns in runnable_settings

After Merge:

  1. Add comprehensive unit tests for the new module
  2. Monitor query performance in production
  3. Document the rollout plan and deprecation timeline
  4. Run cargo sqlx prepare to update query metadata

🤔 QUESTIONS FOR THE AUTHOR

  1. Data Migration: Is there a separate script to migrate existing settings? If not, how will existing scripts' settings be preserved?

  2. Hash Collision: Have you considered the hash collision issue? What's the plan when optional fields are added in the future?

  3. Rollout Strategy: What's the deployment order? Workers first, or API server first? How long will the backward compatibility layer remain?

  4. Performance: Have you benchmarked the query performance impact of the additional JOINs?

  5. Validation: What's the plan for adding validation? Should it be done in this PR or a follow-up?


This is a substantial and generally well-executed refactoring. The main concerns are around data integrity (hash collisions, validation) and migration strategy. Once these are addressed, this will be a solid improvement to the codebase.
