Upgrades to frequency-reducer: improved logic, multithreading, added cluster profiles, tests #4657
base: main
Conversation
…cluster profiles Signed-off-by: Jakub Guzik <[email protected]>
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmguzik

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@jmguzik: The following tests failed, say `/retest` to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle stale`. If this issue is safe to close now please do so with `/close`.

/lifecycle stale
Walkthrough

The PR introduces cluster-profile filtering, parallel processing infrastructure, and yearly cron generation logic to the frequency-reducer component. New configuration parsing, worker pool dispatch, and per-test filtering mechanisms are added alongside comprehensive unit tests.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~70 minutes
Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0): Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cmd/branchingconfigmanagers/frequency-reducer/main.go (1)
373-452: High-frequency crons incorrectly treated as invalid; iteration caps prevent throttling

The review comment is accurate. Callers on lines 290, 299, 308 handle errors by logging a warning and continuing without updating the cron schedule. Valid high-frequency expressions like `0 * * * *` (hourly) or `*/5 * * * *` (every 5 minutes) will exceed the iteration caps:
- Hourly over 1 year: 8,760 executions >> 400 cap
- Every 5 minutes over 1 month: ~8,928 executions >> 100 cap

This causes `isExecutedAtMostOncePerYear` and `isExecutedAtMostXTimesAMonth` to return errors, triggering the skip-throttling path. Tests currently cover only low-frequency expressions (`@daily`, `@weekly`, `@monthly`, `@yearly`), missing coverage for the problematic cases. Recommend increasing the caps or returning `(false, nil)` when a cap is exceeded so throttling still applies.
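For instance, a minimal sketch of the second option, assuming a hypothetical helper name and the robfig/cron/v3 parser (the tool's actual cron library and helper signatures may differ): an exceeded cap is reported as "not rare enough" rather than as an error.

```go
package main

import (
	"fmt"
	"time"

	"github.com/robfig/cron/v3"
)

// isExecutedAtMostXTimes is a hypothetical variant of the validation helpers:
// when the firing count exceeds maxRuns within the window, it returns
// (false, nil) instead of an error, so callers still throttle the schedule.
func isExecutedAtMostXTimes(spec string, window time.Duration, maxRuns int) (bool, error) {
	sched, err := cron.ParseStandard(spec) // handles "0 * * * *", "@daily", ...
	if err != nil {
		return false, err // only genuinely unparsable expressions are errors
	}
	start := time.Now()
	end := start.Add(window)
	runs := 0
	for t := sched.Next(start); !t.IsZero() && !t.After(end); t = sched.Next(t) {
		runs++
		if runs > maxRuns {
			return false, nil // over the cap: too frequent, not invalid
		}
	}
	return true, nil
}

func main() {
	ok, err := isExecutedAtMostXTimes("0 * * * *", 365*24*time.Hour, 400)
	fmt.Println(ok, err) // false <nil>: hourly exceeds the cap, so throttling still applies
}
```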
🧹 Nitpick comments (1)
cmd/branchingconfigmanagers/frequency-reducer/main.go (1)
232-267: `modified_tests` log field doesn't reflect actual modifications

`modifiedTests` is computed as the count of tests with non-nil `Cron` or `Interval` after processing:

```go
modifiedTests := 0
for _, test := range output.Configuration.Tests {
	if test.Cron != nil || test.Interval != nil {
		modifiedTests++
	}
}
```

This counts all scheduled tests, not the subset whose schedule was changed by this run, so the `"modified_tests"` field can be misleading for observability.

If you care about accurate reporting, consider tracking per-test before/after values (or a `changed` flag returned from `updateIntervalFieldsForMatchedSteps`) and incrementing only when a schedule actually changes.
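As an illustration only (the types and names below are hypothetical stand-ins, not the tool's actual structs), counting changes by comparing before/after schedules could look roughly like this:

```go
package main

import "fmt"

// schedule is a stand-in for a test's Cron/Interval pair.
type schedule struct {
	cron     *string
	interval *string
}

func ptrEqual(a, b *string) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// countModified reports how many tests had their schedule actually changed,
// rather than how many tests carry any schedule at all.
func countModified(before, after map[string]schedule) int {
	modified := 0
	for name, b := range before {
		a := after[name]
		if !ptrEqual(b.cron, a.cron) || !ptrEqual(b.interval, a.interval) {
			modified++
		}
	}
	return modified
}

func main() {
	daily, yearly := "@daily", "0 0 1 1 *"
	before := map[string]schedule{"e2e-aws": {cron: &daily}}
	after := map[string]schedule{"e2e-aws": {cron: &yearly}}
	fmt.Println(countModified(before, after)) // 1: the schedule really changed
}
```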
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
- cmd/branchingconfigmanagers/frequency-reducer/main.go (8 hunks)
- cmd/branchingconfigmanagers/frequency-reducer/main_test.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
- cmd/branchingconfigmanagers/frequency-reducer/main_test.go
- cmd/branchingconfigmanagers/frequency-reducer/main.go
🔇 Additional comments (2)
cmd/branchingconfigmanagers/frequency-reducer/main.go (2)
270-355: Cluster-profile filtering is only applied when a filter is provided

In `updateIntervalFieldsForMatchedSteps`, tests are skipped when `allowedClusterProfiles` is non-nil and `shouldProcessTest` returns false:

```go
if allowedClusterProfiles != nil && !shouldProcessTest(test, allowedClusterProfiles) {
	continue
}
```

When `allowedClusterProfiles` is nil, all eligible tests (org + name conditions) are processed. That matches `shouldProcessTest`'s contract and the tests but is worth confirming as intended, since passing an empty config file is treated as an error earlier and terminates the run.

If you ever need "explicitly allow none", you'd currently have to pass an empty (non-nil) map rather than omitting the flag.
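To make that distinction concrete, here is a tiny sketch (names are illustrative, not the actual `shouldProcessTest` signature) of how a nil filter differs from an empty, non-nil one under the described contract:

```go
package main

import "fmt"

// shouldProcess mirrors the described contract: a nil filter means
// "no filtering at all", while an empty non-nil map means "allow none".
func shouldProcess(profile string, allowed map[string]bool) bool {
	if allowed == nil {
		return true // flag omitted: every eligible test is processed
	}
	return allowed[profile] // empty map: nothing matches
}

func main() {
	fmt.Println(shouldProcess("aws", nil))                          // true
	fmt.Println(shouldProcess("aws", map[string]bool{}))            // false
	fmt.Println(shouldProcess("aws", map[string]bool{"aws": true})) // true
}
```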
469-479: Yearly cron generation logic is consistent with validation helpers
`generateYearlyCron` picks random `month ∈ [1,12]` and `day ∈ [1,28]`:

```go
month := rand.Intn(12) + 1
day := rand.Intn(28) + 1
// ...
return fmt.Sprintf("%d %d %d %d *", minute, hour, day, month)
```

This guarantees exactly one firing per year and avoids month-length edge cases, which aligns with `isExecutedAtMostOncePerYear`. No issues here; the tests should be stable as long as `math/rand`'s default global RNG remains in use.
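If deterministic tests ever matter, one option is an injectable RNG; the sketch below is a hypothetical variant of the helper, not the code in this PR:

```go
package main

import (
	"fmt"
	"math/rand"
)

// generateYearlyCronWith is a hypothetical, injectable variant of the
// described helper: day is capped at 28 to avoid month-length edge cases,
// and passing an explicit *rand.Rand makes tests reproducible.
func generateYearlyCronWith(r *rand.Rand) string {
	minute := r.Intn(60)
	hour := r.Intn(24)
	day := r.Intn(28) + 1   // 1-28: valid in every month
	month := r.Intn(12) + 1 // 1-12
	return fmt.Sprintf("%d %d %d %d *", minute, hour, day, month)
}

func main() {
	r := rand.New(rand.NewSource(42)) // fixed seed for deterministic output
	fmt.Println(generateYearlyCronWith(r))
}
```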
```go
name:        "invalid YAML",
fileContent: `cluster_profiles:
- aws
- gcp
invalid: yaml: content`,
expectError: true,
expected:    nil,
```
🧩 Analysis chain
“invalid YAML” test case is using syntactically valid YAML
The `fileContent` for the "invalid YAML" case:

```yaml
cluster_profiles:
- aws
- gcp
invalid: yaml: content
```

is valid YAML for gopkg.in/yaml.v2 (it's just an extra key whose value contains a colon). `loadClusterProfilesConfig` will likely parse this successfully and return a non-empty `ClusterProfiles`, so `expectError: true` will cause the test to fail.

If you intend to exercise the "unmarshal fails" path, use a truly malformed snippet (e.g. an unclosed sequence or clearly broken syntax), or adjust `expectError` to false and rename the case to something like "extra keys are ignored".
I'll verify this review comment by examining the test file and the function implementation.
---
🏁 Script executed:
```shell
#!/bin/bash
# Find and display the test file content around lines 721-727
find . -name "main_test.go" -path "*/frequency-reducer/*" -type f -exec cat {} \; | head -n 750 | tail -n 80
```

Length of output: 1778
🏁 Script executed:
```shell
#!/bin/bash
# Search for the loadClusterProfilesConfig function definition
rg -n "func.*loadClusterProfilesConfig" --type go
```

Length of output: 131
🏁 Script executed:
```shell
#!/bin/bash
# Search for the full implementation of loadClusterProfilesConfig
ast-grep --pattern $'func loadClusterProfilesConfig($_) $$$'
```

Length of output: 2525
🏁 Script executed:
```shell
#!/bin/bash
# Look for the complete test function definition
rg -n "func.*TestLoadClusterProfilesConfig" --type go -A 100
```

Length of output: 8783
Fix the "invalid YAML" test case—it uses syntactically valid YAML and will fail
The YAML snippet at lines 722–725 is syntactically valid for gopkg.in/yaml.v2. The line `invalid: yaml: content` is a valid key-value pair (the value is a string containing a colon); extra struct fields are silently ignored during unmarshaling. Since `cluster_profiles` is populated with two items, the function will succeed and return a non-empty map, but the test expects `expectError: true`, causing it to fail with "Expected error but got none" (line 751).
Either use truly malformed YAML (e.g., unclosed sequence or broken indentation), or rename the case to "extra keys are ignored" and set expectError: false.
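For reference, a standalone sketch of the first option: the struct name below only approximates the real config type, and the snippet uses an unclosed flow sequence, which gopkg.in/yaml.v2 does reject.

```go
package main

import (
	"fmt"

	yaml "gopkg.in/yaml.v2"
)

// clusterProfilesConfig approximates the config shape under test;
// the real struct lives in the frequency-reducer package.
type clusterProfilesConfig struct {
	ClusterProfiles []string `yaml:"cluster_profiles"`
}

func main() {
	// Unclosed flow sequence: yaml.v2 fails to unmarshal this.
	malformed := "cluster_profiles: [aws, gcp"
	var cfg clusterProfilesConfig
	if err := yaml.Unmarshal([]byte(malformed), &cfg); err != nil {
		fmt.Println("unmarshal failed as expected:", err)
	}
}
```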
🤖 Prompt for AI Agents
In cmd/branchingconfigmanagers/frequency-reducer/main_test.go around lines
721–727 the "invalid YAML" test is wrong: the YAML shown is syntactically valid
so unmarshalling succeeds and the test incorrectly expects an error. Fix by
either (A) making the YAML truly malformed (for example replace the snippet with
a clearly broken YAML like an unclosed sequence or bad indentation) so
unmarshalling fails and expectError stays true, or (B) rename the case to
reflect that extra keys are ignored and set expectError to false (and adjust
expected accordingly).
```go
jobsChan := make(chan configJob, len(jobs))
errorsChan := make(chan error, o.maxThreads)

var errors []error
var errorMutex sync.Mutex
var errorWg sync.WaitGroup
errorWg.Add(1)

go func() {
	defer errorWg.Done()
	for err := range errorsChan {
		errorMutex.Lock()
		errors = append(errors, err)
		errorMutex.Unlock()
	}
}()

var wg sync.WaitGroup
var processedCount int64
var processedMutex sync.Mutex

for i := 0; i < o.maxThreads; i++ {
	wg.Add(1)
	go func(workerID int) {
		defer wg.Done()
		workerProcessedCount := 0
		logrus.Infof("Worker %d started", workerID)

		for job := range jobsChan {
			if err := processConfiguration(job, ocpVersion, workerID, allowedClusterProfiles); err != nil {
				select {
				case errorsChan <- err:
				default:
					logrus.WithError(err).Errorf("Worker %d failed to process configuration, error channel full", workerID)
				}
			}

			workerProcessedCount++
			processedMutex.Lock()
			processedCount++
			currentProcessed := processedCount
			processedMutex.Unlock()

			if currentProcessed%100 == 0 || currentProcessed == int64(len(jobs)) {
				logrus.Infof("Progress: %d/%d configurations processed (%.1f%%)",
					currentProcessed, len(jobs), float64(currentProcessed)/float64(len(jobs))*100)
			}
		}

		logrus.Infof("Worker %d finished processing %d configurations", workerID, workerProcessedCount)
	}(i)
}

logrus.Infof("Processing %d configurations with %d threads", len(jobs), o.maxThreads)
for _, job := range jobs {
	jobsChan <- job
}
close(jobsChan)

wg.Wait()

close(errorsChan)
errorWg.Wait()

errorMutex.Lock()
finalErrors := make([]error, len(errors))
copy(finalErrors, errors)
errorMutex.Unlock()

successCount := len(jobs) - len(finalErrors)
logrus.Infof("Processing completed: %d successful, %d errors out of %d total configurations",
	successCount, len(finalErrors), len(jobs))

if len(finalErrors) > 0 {
	logrus.Errorf("Failed to process %d configurations", len(finalErrors))
	return utilerrors.NewAggregate(finalErrors)
}
```
🧩 Analysis chain
Bounded error channel can silently drop configuration errors
errorsChan is sized to o.maxThreads, and workers send errors with a non-blocking select:
```go
errorsChan := make(chan error, o.maxThreads)
// ...
select {
case errorsChan <- err:
default:
	logrus.WithError(err).Errorf("Worker %d failed to process configuration, error channel full", workerID)
}
```

If many configurations fail quickly (e.g., a systemic issue), the channel can fill and subsequent errors are dropped. The final aggregate then under-reports which configs failed, even though the process exits non-zero.

Consider either:
- buffering for all potential errors: `make(chan error, len(jobs))` and using a blocking send, or
- dropping aggregation and returning after the first failure, if that's acceptable.
This will make the aggregated error set and the “X errors out of Y” summary reliable.
---
🏁 Script executed:
```shell
#!/bin/bash
# Find and inspect the actual file and surrounding context
fd -type f -name "main.go" -path "*frequency-reducer*" | head -1 | xargs -I {} cat {} | head -300
```

Length of output: 170
🏁 Script executed:
```shell
#!/bin/bash
# Search for o.maxThreads initialization to understand typical values
rg -n "maxThreads\s*=" cmd/branchingconfigmanagers/frequency-reducer/ -A 2 -B 2
```

Length of output: 0
🏁 Script executed:
```shell
#!/bin/bash
# Verify error channel usage patterns
rg -n "errorsChan" cmd/branchingconfigmanagers/frequency-reducer/ -C 3
```

Length of output: 2293
🏁 Script executed:
```shell
#!/bin/bash
# Check how len(jobs) compares to maxThreads in typical usage
rg -n "len\(jobs\)" cmd/branchingconfigmanagers/frequency-reducer/ -C 2
```

Length of output: 2140
🏁 Script executed:
```shell
#!/bin/bash
# Look for the processConfigurationsInParallel function signature to understand parameters
rg -n "processConfigurationsInParallel|func.*Parallel" cmd/branchingconfigmanagers/frequency-reducer/ -A 5
```

Length of output: 1195
---
🏁 Script executed:
```shell
#!/bin/bash
# Find options struct and maxThreads field
rg -n "type options struct|maxThreads" cmd/branchingconfigmanagers/frequency-reducer/main.go -B 1 -A 1
Length of output: 953
🏁 Script executed:
```shell
#!/bin/bash
# Find where maxThreads is set (flags, defaults, etc)
rg -n "maxThreads" cmd/branchingconfigmanagers/frequency-reducer/main.go
```

Length of output: 453
🏁 Script executed:
```shell
#!/bin/bash
# Check if there's a command-line flag or default for maxThreads
rg -n "flagset|flag\.|MaxThreads|threads" cmd/branchingconfigmanagers/frequency-reducer/main.go -C 2
```

Length of output: 776
Bounded error channel silently drops errors when many configurations fail
errorsChan is sized to o.maxThreads (defaulting to runtime.NumCPU(), typically 4–128), but can receive errors from many more jobs. When the buffer fills, the non-blocking select defaults to logging only:
```go
errorsChan := make(chan error, o.maxThreads) // line 151
// ...
select {
case errorsChan <- err:
default:
	logrus.WithError(err).Errorf("Worker %d failed to process configuration, error channel full", workerID)
}
```

If dozens of configurations fail rapidly, the channel fills and subsequent errors bypass the aggregation slice entirely. The final summary (lines 219–221) then under-reports actual failures, even though the process correctly exits non-zero.
Fix: Either buffer for all potential errors (make(chan error, len(jobs)) with blocking send), or return on the first failure if that's acceptable.
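A minimal, self-contained sketch of the first option (error buffer sized to the job count plus blocking sends), using placeholder job and failure logic rather than the tool's real `processConfiguration`:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

func main() {
	jobs := []int{1, 2, 3, 4, 5}
	maxThreads := 2

	jobsChan := make(chan int, len(jobs))
	// Buffer for every possible error so blocking sends can never drop one.
	errorsChan := make(chan error, len(jobs))

	var wg sync.WaitGroup
	for i := 0; i < maxThreads; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobsChan {
				if job%2 == 0 { // placeholder for processConfiguration failing
					errorsChan <- fmt.Errorf("job %d failed", job) // blocking send, never dropped
				}
			}
		}()
	}

	for _, j := range jobs {
		jobsChan <- j
	}
	close(jobsChan)
	wg.Wait()
	close(errorsChan)

	var errs []error
	for err := range errorsChan {
		errs = append(errs, err)
	}
	fmt.Printf("%d successful, %d errors out of %d\n", len(jobs)-len(errs), len(errs), len(jobs))
	if len(errs) > 0 {
		fmt.Println(errors.Join(errs...)) // aggregate now reflects every failure
	}
}
```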
🤖 Prompt for AI Agents
In cmd/branchingconfigmanagers/frequency-reducer/main.go around lines 150 to
226, the bounded errorsChan (buffer o.maxThreads) silently drops errors when
many jobs fail; change errorsChan to be sized to len(jobs) (make(chan error,
len(jobs))) so it can hold all potential errors and remove the non-blocking
select in workers so they send errors with a normal blocking send (case -> plain
send) to ensure every error is captured, or alternatively implement an immediate
fail-fast by returning on the first worker error if that behavior is acceptable.
No description provided.