feat(internal): runnable settings #7298
Conversation
Deploying windmill with

| | |
| --- | --- |
| Latest commit: | 0f38984 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://63522cd1.windmill.pages.dev |
| Branch Preview URL: | https://separate-table-for-concurren.windmill.pages.dev |
/updatesqlx

Starting sqlx update...

❌ SQLx update failed. Please check the workflow logs for details.
/updatesqlx

Starting sqlx update...

❌ SQLx update failed. Please check the workflow logs for details.
Claude finished @pyranota's task in 4m 55s — View job

## PR Review Complete: feat(internal): runnable settings

I've completed a comprehensive review of this PR, which refactors concurrency and debouncing settings into separate database tables. Here's my detailed analysis:

### Overview

This PR introduces a significant architectural change by moving concurrency and debouncing settings out of inline columns and into dedicated tables.

**Key Changes:**
### Critical Issues

#### 1. Hash Collision Risk (High Priority)

**Location:**

The TODO comment correctly identifies a critical issue with the hashing approach. When new optional fields are added to settings structs in the future, Rust's derived `Hash` implementation will produce different hashes for settings that were stored previously.
This breaks the deduplication logic and could cause settings to be incorrectly linked or duplicated.

**Recommendation:** Implement a custom `Hash` implementation that stays stable as new optional fields are added.

#### 2. Missing Data Migration

**Location:** Migration files

The migration creates new tables but doesn't include logic to:
**Impact:** After migration, all existing settings will be lost unless there's a separate migration script not included in this PR.

**Question:** Is there a follow-up migration, or is this intentional (relying on the backward compatibility layer)?

#### 3. Missing Input Validation

**Location:**

The TODO explicitly states "Add validation logic", but validation is missing for:
**Recommendation:** Add validation methods and call them before database insertion.

### Design Concerns

#### 4. Index Redundancy

**Location:** The index `runnable_settings_idx`

#### 5. Dynamic SQL Generation

**Location:** The use of string formatting to build SQL queries:

```rust
sqlx::query_as::<Postgres, Self>(&format!("SELECT {} FROM {} ...", ...))
```

**Concerns:**
**Recommendation:** Consider explicit query methods per settings type, or ensure these queries are in the `.sqlx/` offline metadata.

#### 6. Race Condition in Cache

**Location:**

Multiple workers can simultaneously:
**Assessment:** Probably acceptable given the conflict clause, but worth documenting.

### Code Quality Issues

#### 7. Backward Compatibility Complexity

The PR maintains dual code paths gated on version checks.

**Concerns:**
**Recommendation:** Document the rollout strategy and deprecation timeline clearly.

#### 8. SQLx Query File Changes

Many SQLx query files changed.

**Good:**
**Concerns:**
#### 9. Test Coverage

**Location:**

Tests were updated to accommodate the new structure, but:
**Recommendation:** Add unit tests for the new module, especially around hashing and validation.

### Positive Aspects

#### ✅ Well-Structured Code

The new
#### ✅ Normalized Database Design

Moving settings to separate tables is the right architectural decision:
#### ✅ Backward Compatibility

The gradual rollout approach with version checks is thoughtful:
#### ✅ Proper SQLx Usage
### Security Considerations

#### No Major Security Issues Found

The code properly uses:
### Performance Considerations

#### Query Performance

The new LEFT JOIN patterns add complexity:

```sql
LEFT JOIN runnable_settings rs_concurrency ON ...
LEFT JOIN concurrency_settings cs ON ...
LEFT JOIN runnable_settings rs_debouncing ON ...
LEFT JOIN debouncing_settings ds ON ...
```

**Recommendations:**
#### Cache Strategy

The use of
### Recommendations Summary

**Must Fix Before Merge:**

**Should Fix:**

**Nice to Have:**
### Testing Instructions

To test this PR through the app interface:

**Testing Concurrency Settings:**

**Testing Debouncing Settings:**

**Testing Settings Persistence:**

**Testing Backward Compatibility:**

### Summary

This is a solid architectural improvement with good backward compatibility planning. The main concerns are the hash collision risk, the missing data migration, and the lack of input validation. Once these critical issues are addressed, this will be a valuable refactoring that sets up Windmill for better extensibility of runnable settings in the future.
```sql
    concurrency_key VARCHAR(255),
    concurrent_limit INTEGER,
    concurrency_time_window_s INTEGER
);
```
Missing Index on Foreign Key Lookups

The `concurrency_settings` and `debouncing_settings` tables are referenced via `settings_hash` in the `runnable_settings` and `job_settings` tables, but there's no explicit index on the `hash` column beyond the primary key itself.

While a PRIMARY KEY automatically creates an index, consider adding explicit indexes if you anticipate high-volume lookups, especially for the `debouncing_settings` table, which has 5 columns that might be frequently accessed via LEFT JOINs.
```sql
    PRIMARY KEY (job_id, settings_kind)
);

CREATE INDEX IF NOT EXISTS runnable_settings_idx ON runnable_settings (runnable_id, runnable_kind, settings_kind);
```
Index Design Consideration
The index runnable_settings_idx includes all three columns (runnable_id, runnable_kind, settings_kind), but this might be redundant with the PRIMARY KEY (runnable_id, settings_kind).
Consider whether you need both indexes, as:
- The PK already provides efficient lookups for `(runnable_id, settings_kind)`
- The additional index including `runnable_kind` might not provide significant benefits unless you frequently query by `runnable_kind` alone
This could be simplified or the index column order reconsidered based on actual query patterns.
```rust
    Self: 'a,
{
    async move {
        sqlx::query_as::<Postgres, Self>(&format!(
```
Query Performance: Dynamic SQL with String Formatting
This function builds SQL queries using string formatting:
```rust
sqlx::query_as::<Postgres, Self>(&format!(
    "SELECT {} FROM {} ds ...",
    Self::INCLUDE_FIELDS.iter().join(","),
    Self::SETTINGS_NAME,
))
```

Concerns:
- SQL Injection Risk (Low): While the values come from constants, this pattern is generally discouraged
- Query Plan Caching: Each settings type generates a different query string, preventing PostgreSQL from reusing query plans
- SQLx Compile-Time Verification: These queries cannot be verified at compile time with `sqlx::query!`
Recommendation: Consider using explicit query methods for each settings type, or at minimum, ensure these queries are added to the .sqlx/ metadata for offline verification.
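For illustration only, a minimal sketch of such an explicit method for one settings type (the method name `fetch_by_hash` is hypothetical; the table and column names are taken from the migration in this PR, and `sqlx::query_as!` needs the query present in the `.sqlx/` offline metadata or a live `DATABASE_URL` at compile time):

```rust
use sqlx::{Pool, Postgres};

impl ConcurrencySettings {
    // Hypothetical explicit accessor: a fixed column list lets sqlx verify the
    // query at compile time and lets Postgres reuse one prepared statement.
    pub async fn fetch_by_hash(hash: i64, db: &Pool<Postgres>) -> sqlx::Result<Self> {
        sqlx::query_as!(
            ConcurrencySettings,
            "SELECT concurrency_key, concurrent_limit, concurrency_time_window_s
             FROM concurrency_settings
             WHERE hash = $1",
            hash
        )
        .fetch_one(db)
        .await
    }
}
```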
```rust
    // If it is not in cache
    // this means we don't know if it is in db
    // in that case, we INSERT to be sure
    sqlx::query(&format!(
```
Concurrency Issue: Race Condition in Cache + DB Insert
There's a potential race condition here:
```rust
RUNNABLE_SETTINGS.get_or_insert_async(hash, async {
    sqlx::query(&format!("INSERT INTO {} ... ON CONFLICT (hash) DO NOTHING"))
        .execute(db).await?;
    serde_json::to_value(&self).map_err(error::Error::from)
}).await?;
```

Problem: Multiple workers could:
- Both miss the cache
- Both try to INSERT simultaneously
- One succeeds, one hits ON CONFLICT
- But both cache the value
This is mostly harmless due to ON CONFLICT DO NOTHING, but could lead to:
- Unnecessary duplicate serialization work
- Inconsistent cache state if serialization fails for one worker
Recommendation: This is probably acceptable given the ON CONFLICT clause, but consider documenting this behavior or using a distributed lock for critical settings updates.
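If a lock ever becomes necessary, one option is a transaction-scoped Postgres advisory lock keyed by the settings hash. A minimal sketch, assuming the `concurrency_settings` columns from this PR's migration (the function name and error handling are illustrative, not the PR's API):

```rust
use sqlx::{Pool, Postgres};

// Hypothetical helper: serialize concurrent inserts for one settings hash.
// Column names follow the concurrency_settings table from the migration above.
async fn insert_concurrency_settings_locked(
    db: &Pool<Postgres>,
    hash: i64,
    settings: &ConcurrencySettings,
) -> Result<(), sqlx::Error> {
    let mut tx = db.begin().await?;
    // Only one worker at a time proceeds for a given hash; the lock is
    // released automatically when the transaction commits or rolls back.
    sqlx::query("SELECT pg_advisory_xact_lock($1)")
        .bind(hash)
        .execute(&mut *tx)
        .await?;
    sqlx::query(
        "INSERT INTO concurrency_settings
             (hash, concurrency_key, concurrent_limit, concurrency_time_window_s)
         VALUES ($1, $2, $3, $4)
         ON CONFLICT (hash) DO NOTHING",
    )
    .bind(hash)
    .bind(&settings.concurrency_key)
    .bind(settings.concurrent_limit)
    .bind(settings.concurrency_time_window_s)
    .execute(&mut *tx)
    .await?;
    tx.commit().await?;
    Ok(())
}
```

Given the idempotent `ON CONFLICT (hash) DO NOTHING`, the extra round trip is likely only worth it if the duplicate work ever shows up in practice; documenting the benign race may be the cheaper option.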
```rust
    Flow,
    App,
}
// TODO: Add validation logic.
```
Missing Validation Logic
The TODO comment on line 19 states:
```rust
// TODO: Add validation logic.
```

This is important for the DebouncingSettings and ConcurrencySettings structs. Without validation:
- Invalid Values: Negative delays, limits, or time windows could be stored
- Inconsistent State: `concurrent_limit` without `concurrency_time_window_s` (or vice versa)
- Edge Cases: Zero values that might cause division errors or infinite loops
Recommendation: Add validation methods like:
```rust
impl DebouncingSettings {
    pub fn validate(&self) -> Result<(), String> {
        if let Some(delay) = self.debounce_delay_s {
            if delay < 0 { return Err("delay must be positive".into()); }
        }
        // ... more validation
        Ok(())
    }
}
```

Call these validators before inserting settings into the database.
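As a quick sanity check of the sketched validator, a unit test along these lines could be added (the struct fields come from this PR; the test itself is illustrative):

```rust
#[cfg(test)]
mod validation_tests {
    use super::*;

    #[test]
    fn rejects_negative_debounce_delay() {
        // A negative delay should be rejected before anything reaches the database.
        let settings = DebouncingSettings {
            debounce_delay_s: Some(-5),
            ..Default::default()
        };
        assert!(settings.validate().is_err());
    }
}
```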
/ai review VERY carefully, think 4 times on every change |
Claude finished @pyranota's task in 3m 41s — View job

## Extremely Thorough PR Review Complete

Reviewing PR #7298: feat(internal): runnable settings

**Overview:**

### 🔴 CRITICAL ISSUES - MUST FIX

#### 1. Hash Collision Vulnerability - HIGHEST PRIORITY

**Location:** The settings struct derives `Hash`:

```rust
#[derive(Debug, Clone, Serialize, Deserialize, Default, Hash, PartialEq, sqlx::FromRow, sqlx::Type)]
pub struct DebouncingSettings {
pub debounce_key: Option<String>,
pub debounce_delay_s: Option<i32>,
pub max_total_debouncing_time: Option<i32>,
pub max_total_debounces_amount: Option<i32>,
pub debounce_args_to_accumulate: Option<Vec<String>>,
}
```

**The Problem:**
This breaks the deduplication logic.
**Evidence:**

```rust
let hash = {
let mut h = DefaultHasher::new();
self.hash(&mut h); // ← Uses derived Hash
Self::SETTINGS_NAME.hash(&mut h);
h.finish() as i64
};
```

**Impact:** 🔴 CRITICAL
**Solutions:**
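One possible direction, sketched below purely as an illustration (none of this is the PR's code): fold fields into the hasher one by one, tag each present field by a stable name, and skip fields that are `None`, so that a future `Option` field left unset does not change the hash of already-stored settings. Note that `DefaultHasher` does not guarantee a stable algorithm across Rust releases either, which also matters once hashes are persisted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stable-hash sketch: only fields that are Some(..) contribute,
// each prefixed by its name, so adding new optional fields later leaves
// existing hashes untouched as long as those fields stay None.
fn hash_opt_field<T: Hash, H: Hasher>(h: &mut H, tag: &str, value: &Option<T>) {
    if let Some(v) = value {
        tag.hash(h);
        v.hash(h);
    }
}

fn debouncing_settings_hash(s: &DebouncingSettings) -> i64 {
    let mut h = DefaultHasher::new();
    "debouncing_settings".hash(&mut h); // stands in for Self::SETTINGS_NAME in the existing code
    hash_opt_field(&mut h, "debounce_key", &s.debounce_key);
    hash_opt_field(&mut h, "debounce_delay_s", &s.debounce_delay_s);
    hash_opt_field(&mut h, "max_total_debouncing_time", &s.max_total_debouncing_time);
    hash_opt_field(&mut h, "max_total_debounces_amount", &s.max_total_debounces_amount);
    hash_opt_field(&mut h, "debounce_args_to_accumulate", &s.debounce_args_to_accumulate);
    h.finish() as i64
}
```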
#### 2. Missing Data Migration - CRITICAL

**Location:**

The migration creates new tables but does NOT migrate existing data:

```sql
CREATE TABLE IF NOT EXISTS concurrency_settings(...);
CREATE TABLE IF NOT EXISTS debouncing_settings(...);
CREATE TABLE IF NOT EXISTS runnable_settings(...);
ALTER TABLE script ADD COLUMN runnable_settings_handle BIGINT DEFAULT NULL;
```

**The Problem:**
**Impact:** 🔴 CRITICAL

**Questions:**
**Recommended Solution:**

```sql
-- After creating tables, migrate existing data
INSERT INTO concurrency_settings (hash, concurrency_key, concurrent_limit, concurrency_time_window_s)
SELECT DISTINCT
-- compute hash here
concurrency_key, concurrent_limit, concurrency_time_window_s
FROM script
WHERE concurrent_limit IS NOT NULL OR concurrency_key IS NOT NULL
ON CONFLICT (hash) DO NOTHING;
-- Similar for debouncing_settings
-- Then populate runnable_settings and update script.runnable_settings_handle
```

#### 3. Missing Input Validation - HIGH PRIORITY

**Location:** Line 112 has a TODO:

```rust
// TODO: Add validation logic.
#[derive(Debug, Clone, Serialize, Deserialize, Default, Hash, PartialEq, sqlx::FromRow, sqlx::Type)]
pub struct DebouncingSettings { ... }
```

**The Problem:**
**Impact:** 🟡 HIGH
**Recommended Solution:**

```rust
impl DebouncingSettings {
pub fn validate(&self) -> Result<(), String> {
if let Some(delay) = self.debounce_delay_s {
if delay < 0 {
return Err("debounce_delay_s must be non-negative".into());
}
}
if let Some(time) = self.max_total_debouncing_time {
if time < 0 {
return Err("max_total_debouncing_time must be non-negative".into());
}
}
if let Some(amount) = self.max_total_debounces_amount {
if amount < 0 {
return Err("max_total_debounces_amount must be non-negative".into());
}
}
Ok(())
}
}
impl ConcurrencySettings {
pub fn validate(&self) -> Result<(), String> {
match (self.concurrent_limit, self.concurrency_time_window_s) {
(Some(limit), Some(window)) => {
if limit <= 0 {
return Err("concurrent_limit must be positive".into());
}
if window <= 0 {
return Err("concurrency_time_window_s must be positive".into());
}
}
(Some(_), None) | (None, Some(_)) => {
return Err("concurrent_limit and concurrency_time_window_s must both be set or both be None".into());
}
_ => {}
}
Ok(())
}
}
```

Call these validators before inserting settings into the database.

#### 4. Potential SELECT * Usage (Workers Backward Compatibility)

**Location:** Multiple

While reviewing the queries, I noticed the PR generally avoids `SELECT *`.
**Verification Needed:**
**Why This Matters:**

**Status:** ✅ The queries I reviewed appear to follow this practice, but a comprehensive audit is recommended.

### 🟡 MAJOR CONCERNS - SHOULD FIX

#### 5. Database Schema Design: Potential Index Redundancy

**Location:**

```sql
CREATE TABLE IF NOT EXISTS runnable_settings(
hash BIGINT PRIMARY KEY,
debouncing_settings BIGINT DEFAULT NULL,
concurrency_settings BIGINT DEFAULT NULL
);
```

**Missing Indexes:**

**Recommendation:**

```sql
CREATE INDEX idx_runnable_settings_debouncing ON runnable_settings(debouncing_settings) WHERE debouncing_settings IS NOT NULL;
CREATE INDEX idx_runnable_settings_concurrency ON runnable_settings(concurrency_settings) WHERE concurrency_settings IS NOT NULL;
```

The partial indexes (with the `IS NOT NULL` condition) avoid indexing rows where the referenced setting is absent.

#### 6. Dynamic SQL Generation - Compile-Time Safety Concern

**Location:** The code uses string formatting to build SQL queries:

```rust
let r = sqlx::query_as::<Postgres, Self>(&format!(
"SELECT {} FROM {} WHERE hash = $1",
Self::INCLUDE_FIELDS.iter().join(","),
Self::SETTINGS_NAME,
))
```

**Concerns:**
**Evidence:**

**Recommendation:**

```rust
impl DebouncingSettings {
fn get(hash: i64, db: &Pool<Postgres>) -> impl Future<Output = Result<Self, Error>> {
async move {
sqlx::query_as!(
Self,
"SELECT debounce_key, debounce_delay_s, max_total_debouncing_time,
max_total_debounces_amount, debounce_args_to_accumulate
FROM debouncing_settings WHERE hash = $1",
hash
)
.fetch_one(db)
.await
.map_err(error::Error::from)
}
}
}
```

This enables compile-time verification and better query plan reuse.

#### 7. Race Condition in Cache + DB Insert

**Location:**

```rust
RUNNABLE_INDIVIDUAL_SETTINGS
.get_or_insert_async(hash, async move {
// Insert to DB
self.bind_arguments(sqlx::query(&sql).bind(hash))
.execute(db)
.await?;
// Cache the value
serde_json::to_value(&self).map_err(error::Error::from)
})
.await?;
```

**The Race:**
**Impact:** 🟡 MEDIUM

**Assessment:**
**Optional Improvement:**

#### 8. Backward Compatibility Complexity

**Location:** Multiple files

The PR maintains dual code paths:
**Controlled by:**

**Evidence:**

```rust
// scripts.rs:850-876
let (guarded_concurrent_limit, ...) =
if min_version_supports_runnable_settings_v0().await {
Default::default() // Use new tables
} else {
(ns.concurrency_settings.concurrent_limit.clone(), ...) // Use old columns
};
```

**Concerns:**
**Recommendations:**
### 🟢 POSITIVE ASPECTS

#### ✅ Well-Structured Code
#### ✅ Normalized Database Design

#### ✅ Backward Compatibility Attempt

#### ✅ Proper SQLx Usage

#### ✅ Test Updates
### 🟠 MINOR ISSUES & RECOMMENDATIONS

#### 9. Missing Unit Tests

The PR updates existing tests but doesn't add new tests for:
**Recommendation:** Add tests in

```rust
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_hash_stability() {
let settings1 = DebouncingSettings {
debounce_delay_s: Some(10),
..Default::default()
};
let settings2 = DebouncingSettings {
debounce_delay_s: Some(10),
..Default::default()
};
let hash1 = compute_hash(&settings1);
let hash2 = compute_hash(&settings2);
assert_eq!(hash1, hash2);
}
// Add tests for validation, cache behavior, etc.
}
```

#### 10. Documentation Gaps
**Recommendation:** Add module docs:

```rust
//! # Runnable Settings Module
//!
//! This module provides normalized storage for script/flow settings like
//! concurrency limits and debouncing configuration.
//!
//! ## Architecture
//!
//! Settings are deduplicated using content-based hashing:
//! - Same settings → same hash → single DB row
//! - Scripts/flows reference settings via hash
//!
//! ## Migration Path
//!
//! Version 1.592.0 introduces this new system. Older versions use inline
//! columns. The `min_version_supports_runnable_settings_v0()` function
//! determines which path to use.
```
#### 11. Potential Performance Bottleneck

**Location:** The `prefetch_cached` call:

```rust
let script = not_found_if_none(script_o, "Script", path)?
.prefetch_cached(&db) // ← Additional DB queries
.await?;
```

**Analysis:**
**Recommendation:**
12.
No description provided.