
Conversation

@jmguzik
Contributor

@jmguzik jmguzik commented Aug 21, 2025

No description provided.

@openshift-ci openshift-ci bot requested review from Prucek and hector-vido August 21, 2025 14:33
@openshift-ci
Contributor

openshift-ci bot commented Aug 21, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmguzik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 21, 2025
@openshift-ci
Contributor

openshift-ci bot commented Aug 21, 2025

@jmguzik: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/unit 8cf9a4b link true /test unit
ci/prow/images 8cf9a4b link true /test images
ci/prow/security 8cf9a4b link false /test security
ci/prow/breaking-changes 8cf9a4b link false /test breaking-changes

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 20, 2025
@coderabbitai

coderabbitai bot commented Nov 20, 2025

Walkthrough

The PR introduces cluster-profile filtering, parallel processing infrastructure, and yearly cron generation logic to the frequency-reducer component. New configuration parsing, worker pool dispatch, and per-test filtering mechanisms are added alongside comprehensive unit tests.

Changes

Cohort / File(s) Summary
Cluster-profile filtering and parallel processing infrastructure
cmd/branchingconfigmanagers/frequency-reducer/main.go
Introduces ClusterProfilesConfig struct for YAML-based cluster profile configuration. Adds loadClusterProfilesConfig for parsing profiles into an allowed-profiles map. Implements processConfigurationsInParallel for concurrent configuration processing with worker pool pattern. Adds configJob struct to encapsulate job parameters and processConfiguration worker function. Extends updateIntervalFieldsForMatchedSteps signature to accept allowedClusterProfiles parameter. Implements shouldProcessTest for per-test cluster profile filtering logic.
Cron expression generation and scheduling utilities
cmd/branchingconfigmanagers/frequency-reducer/main.go
Adds isExecutedAtMostOncePerYear helper to validate yearly execution patterns in cron expressions. Introduces generateYearlyCron function for creating yearly cron expressions. Integrates yearly cron logic into configuration processing paths with version-based conditional logic.
Comprehensive unit test coverage
cmd/branchingconfigmanagers/frequency-reducer/main_test.go
New test file providing extensive coverage for version extraction, scheduling logic verification, cron generation and validation, interval/cron field updates, options validation, command-line argument parsing, test processing determination, cluster profile configuration loading, and cluster-profile-filtered updates across multiple scenarios and edge cases.
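
As a rough, non-authoritative sketch of the configuration type and loader described above (field names, the cluster_profiles YAML key, and the error wording are inferred from this summary and the test snippets quoted later in this review, not taken from main.go):

```go
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v2"
)

// ClusterProfilesConfig mirrors the YAML file described above:
// a list of cluster profile names under the cluster_profiles key.
type ClusterProfilesConfig struct {
	ClusterProfiles []string `yaml:"cluster_profiles"`
}

// loadClusterProfilesConfig parses the file into an allowed-profiles set,
// as the walkthrough describes; the real implementation may differ in details.
func loadClusterProfilesConfig(path string) (map[string]bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("failed to read cluster profiles config %s: %w", path, err)
	}
	var cfg ClusterProfilesConfig
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("failed to parse cluster profiles config %s: %w", path, err)
	}
	allowed := make(map[string]bool, len(cfg.ClusterProfiles))
	for _, profile := range cfg.ClusterProfiles {
		allowed[profile] = true
	}
	return allowed, nil
}

func main() {
	allowed, err := loadClusterProfilesConfig(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(allowed)
}
```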

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

  • Concurrency patterns: processConfigurationsInParallel introduces worker pool logic; verify thread-safety of error aggregation and job dispatch mechanisms
  • Cron expression logic: generateYearlyCron and isExecutedAtMostOncePerYear manipulate scheduling expressions; validate correctness of generated cron patterns and edge cases
  • Cluster profile filtering: Cross-cutting concern affecting multiple functions; ensure shouldProcessTest filtering is consistently applied and doesn't introduce logical gaps
  • Function signature changes: updateIntervalFieldsForMatchedSteps now accepts allowedClusterProfiles; verify all call sites properly pass the parameter and filtering intent is preserved
  • Test coverage adequacy: Confirm new parallel processing paths and error scenarios in the test file provide sufficient coverage for production reliability

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cmd/branchingconfigmanagers/frequency-reducer/main.go (1)

373-452: High-frequency crons incorrectly treated as invalid; iteration caps prevent throttling

The review comment is accurate. Callers on lines 290, 299, 308 handle errors by logging a warning and continuing without updating the cron schedule. Valid high-frequency expressions like 0 * * * * (hourly) or */5 * * * * (every 5 minutes) will exceed the iteration caps:

  • Hourly over 1 year: 8,760 executions >> 400 cap
  • Every 5 minutes over 1 month: ~8,928 executions >> 100 cap

This causes isExecutedAtMostOncePerYear and isExecutedAtMostXTimesAMonth to return errors, triggering the skip-throttling path. Tests currently cover only low-frequency expressions (@daily, @Weekly, @monthly, @Yearly), missing coverage for the problematic cases.

Recommend increasing caps or returning (false, nil) on cap exceeded so throttling still applies.
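
A minimal sketch of that recommendation, assuming a robfig/cron-style parser (the actual helper in main.go may parse expressions and cap iterations differently): exceeding one firing per year returns (false, nil) rather than an error, so the caller still throttles.

```go
package main

import (
	"fmt"
	"time"

	"github.com/robfig/cron/v3"
)

// isExecutedAtMostOncePerYear reports whether the cron expression fires at
// most once in the next year. High-frequency expressions yield (false, nil)
// instead of an error, so throttling still applies to them.
func isExecutedAtMostOncePerYear(cronExpr string) (bool, error) {
	schedule, err := cron.ParseStandard(cronExpr)
	if err != nil {
		return false, fmt.Errorf("invalid cron expression %q: %w", cronExpr, err)
	}
	start := time.Now()
	end := start.AddDate(1, 0, 0)
	firings := 0
	for t := schedule.Next(start); !t.IsZero() && t.Before(end); t = schedule.Next(t) {
		firings++
		if firings > 1 {
			// More than one firing within a year: not yearly, but not an error.
			return false, nil
		}
	}
	return true, nil
}

func main() {
	for _, expr := range []string{"@yearly", "0 0 15 6 *", "0 * * * *"} {
		yearly, err := isExecutedAtMostOncePerYear(expr)
		fmt.Printf("%-12s yearly=%v err=%v\n", expr, yearly, err)
	}
}
```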

🧹 Nitpick comments (1)
cmd/branchingconfigmanagers/frequency-reducer/main.go (1)

232-267: modified_tests log field doesn’t reflect actual modifications

modifiedTests is computed as the count of tests with non‑nil Cron or Interval after processing:

modifiedTests := 0
for _, test := range output.Configuration.Tests {
    if test.Cron != nil || test.Interval != nil {
        modifiedTests++
    }
}

This counts all scheduled tests, not the subset whose schedule was changed by this run, so the "modified_tests" field can be misleading for observability.

If you care about accurate reporting, consider tracking per‑test before/after values (or a changed flag returned from updateIntervalFieldsForMatchedSteps) and incrementing only when a schedule actually changes.
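
One way to do that, as a hedged sketch (type and helper names are illustrative, assuming Cron and Interval are string pointers as in the snippet above): compare each test's schedule before and after processing and count only real changes.

```go
package main

import "fmt"

// testSchedule stands in for the schedule-relevant fields of a ci-operator test.
type testSchedule struct {
	Cron     *string
	Interval *string
}

// scheduleChanged reports whether processing actually altered the schedule;
// a "modified_tests" counter would be incremented only when this returns true.
func scheduleChanged(before, after testSchedule) bool {
	return !equalStringPtr(before.Cron, after.Cron) ||
		!equalStringPtr(before.Interval, after.Interval)
}

func equalStringPtr(a, b *string) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

func main() {
	daily, yearly := "@daily", "0 0 1 1 *"
	fmt.Println(scheduleChanged(testSchedule{Cron: &daily}, testSchedule{Cron: &yearly})) // true: cron changed
	fmt.Println(scheduleChanged(testSchedule{Cron: &daily}, testSchedule{Cron: &daily}))  // false: untouched
}
```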

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between bd9bd83 and 8cf9a4b.

📒 Files selected for processing (2)
  • cmd/branchingconfigmanagers/frequency-reducer/main.go (8 hunks)
  • cmd/branchingconfigmanagers/frequency-reducer/main_test.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • cmd/branchingconfigmanagers/frequency-reducer/main_test.go
  • cmd/branchingconfigmanagers/frequency-reducer/main.go
🔇 Additional comments (2)
cmd/branchingconfigmanagers/frequency-reducer/main.go (2)

270-355: Cluster-profile filtering is only applied when a filter is provided

In updateIntervalFieldsForMatchedSteps, tests are skipped when allowedClusterProfiles is non‑nil and shouldProcessTest returns false:

if allowedClusterProfiles != nil && !shouldProcessTest(test, allowedClusterProfiles) {
	continue
}

When allowedClusterProfiles is nil, all eligible tests (org + name conditions) are processed. That matches shouldProcessTest’s contract and the existing tests, but it is worth confirming this is intended, since an empty config file is treated as an error earlier and terminates the run.

If you ever need “explicitly allow none”, you’d currently have to pass an empty (non‑nil) map rather than omitting the flag.
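
A small runnable illustration of that nil-versus-empty-map distinction (function name and map type are assumptions mirroring the snippet above, not the actual shouldProcessTest):

```go
package main

import "fmt"

// shouldProcess mimics the described contract: a nil filter means "no
// filtering, process everything"; a non-nil empty map means "allow none".
func shouldProcess(profile string, allowed map[string]bool) bool {
	if allowed == nil {
		return true
	}
	return allowed[profile]
}

func main() {
	fmt.Println(shouldProcess("aws", nil))                          // true: flag omitted, no filter
	fmt.Println(shouldProcess("aws", map[string]bool{}))            // false: explicitly allow none
	fmt.Println(shouldProcess("aws", map[string]bool{"aws": true})) // true: profile is allowed
}
```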


469-479: Yearly cron generation logic is consistent with validation helpers

generateYearlyCron picks a random month ∈ [1,12] and a random day ∈ [1,28]:

month := rand.Intn(12) + 1
day := rand.Intn(28) + 1
// ...
return fmt.Sprintf("%d %d %d %d *", minute, hour, day, month)

This guarantees exactly one firing per year and avoids month-length edge cases, which aligns with isExecutedAtMostOncePerYear. No issues here; the tests should be stable as long as math/rand’s default global RNG remains in use.
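
For reference, a filled-in sketch of the generator consistent with the excerpt above; the quote elides how minute and hour are chosen, so the rand calls for those two fields are assumptions:

```go
package main

import (
	"fmt"
	"math/rand"
)

// generateYearlyCron returns a 5-field cron expression that fires exactly once
// per year, on a day that exists in every month (1-28).
func generateYearlyCron() string {
	minute := rand.Intn(60)    // assumption: random minute
	hour := rand.Intn(24)      // assumption: random hour
	day := rand.Intn(28) + 1   // day in [1,28], avoiding month-length edge cases
	month := rand.Intn(12) + 1 // month in [1,12]
	return fmt.Sprintf("%d %d %d %d *", minute, hour, day, month)
}

func main() {
	fmt.Println(generateYearlyCron()) // e.g. "37 14 9 6 *"
}
```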

Comment on lines +721 to +727
name: "invalid YAML",
fileContent: `cluster_profiles:
- aws
- gcp
invalid: yaml: content`,
expectError: true,
expected: nil,

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

“invalid YAML” test case is using syntactically valid YAML

The fileContent for the "invalid YAML" case:

cluster_profiles:
  - aws
  - gcp
invalid: yaml: content

is valid YAML for gopkg.in/yaml.v2 (it’s just an extra key whose value contains a colon). loadClusterProfilesConfig will likely parse this successfully and return a non-empty ClusterProfiles, so expectError: true will cause the test to fail.

If you intend to exercise the “unmarshal fails” path, use a truly malformed snippet (e.g. unclosed sequence or clearly broken syntax), or adjust expectError to false and rename the case to something like “extra keys are ignored”.


I'll verify this review comment by examining the test file and the function implementation.


---

🏁 Script executed:

```shell
#!/bin/bash
# Find and display the test file content around lines 721-727
find . -name "main_test.go" -path "*/frequency-reducer/*" -type f -exec cat {} \; | head -n 750 | tail -n 80
```

Length of output: 1778


🏁 Script executed:

#!/bin/bash
# Search for the loadClusterProfilesConfig function definition
rg -n "func.*loadClusterProfilesConfig" --type go

Length of output: 131


🏁 Script executed:

#!/bin/bash
# Search for the full implementation of loadClusterProfilesConfig
ast-grep --pattern $'func loadClusterProfilesConfig($_) $$$'

Length of output: 2525


🏁 Script executed:

#!/bin/bash
# Look for the complete test function definition
rg -n "func.*TestLoadClusterProfilesConfig" --type go -A 100

Length of output: 8783


Fix the "invalid YAML" test case—it uses syntactically valid YAML and will fail

The YAML snippet at lines 722–725 is syntactically valid for gopkg.in/yaml.v2. The line invalid: yaml: content is a valid key-value pair (the value is a string containing a colon); extra struct fields are silently ignored during unmarshaling. Since cluster_profiles is populated with two items, the function will succeed and return a non-empty map, but the test expects expectError: true, causing it to fail with "Expected error but got none" (line 751).

Either use truly malformed YAML (e.g., unclosed sequence or broken indentation), or rename the case to "extra keys are ignored" and set expectError: false.
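
As a concrete option for path (A), an unclosed flow sequence is rejected by gopkg.in/yaml.v2; a quick standalone check (hedged sketch, the struct here is illustrative):

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

func main() {
	var cfg struct {
		ClusterProfiles []string `yaml:"cluster_profiles"`
	}
	// Unclosed flow sequence: this genuinely fails to unmarshal.
	err := yaml.Unmarshal([]byte("cluster_profiles: [aws, gcp"), &cfg)
	fmt.Println(err) // non-nil parse error, so expectError: true would hold
}
```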

🤖 Prompt for AI Agents
In cmd/branchingconfigmanagers/frequency-reducer/main_test.go around lines
721–727 the "invalid YAML" test is wrong: the YAML shown is syntactically valid
so unmarshalling succeeds and the test incorrectly expects an error. Fix by
either (A) making the YAML truly malformed (for example replace the snippet with
a clearly broken YAML like an unclosed sequence or bad indentation) so
unmarshalling fails and expectError stays true, or (B) rename the case to
reflect that extra keys are ignored and set expectError to false (and adjust
expected accordingly).

Comment on lines +150 to +226
jobsChan := make(chan configJob, len(jobs))
errorsChan := make(chan error, o.maxThreads)

var errors []error
var errorMutex sync.Mutex
var errorWg sync.WaitGroup
errorWg.Add(1)

go func() {
	defer errorWg.Done()
	for err := range errorsChan {
		errorMutex.Lock()
		errors = append(errors, err)
		errorMutex.Unlock()
	}
}()

var wg sync.WaitGroup
var processedCount int64
var processedMutex sync.Mutex

for i := 0; i < o.maxThreads; i++ {
	wg.Add(1)
	go func(workerID int) {
		defer wg.Done()
		workerProcessedCount := 0
		logrus.Infof("Worker %d started", workerID)

		for job := range jobsChan {
			if err := processConfiguration(job, ocpVersion, workerID, allowedClusterProfiles); err != nil {
				select {
				case errorsChan <- err:
				default:
					logrus.WithError(err).Errorf("Worker %d failed to process configuration, error channel full", workerID)
				}
			}

			workerProcessedCount++
			processedMutex.Lock()
			processedCount++
			currentProcessed := processedCount
			processedMutex.Unlock()

			if currentProcessed%100 == 0 || currentProcessed == int64(len(jobs)) {
				logrus.Infof("Progress: %d/%d configurations processed (%.1f%%)",
					currentProcessed, len(jobs), float64(currentProcessed)/float64(len(jobs))*100)
			}
		}

		logrus.Infof("Worker %d finished processing %d configurations", workerID, workerProcessedCount)
	}(i)
}

logrus.Infof("Processing %d configurations with %d threads", len(jobs), o.maxThreads)
for _, job := range jobs {
	jobsChan <- job
}
close(jobsChan)

wg.Wait()

close(errorsChan)
errorWg.Wait()

errorMutex.Lock()
finalErrors := make([]error, len(errors))
copy(finalErrors, errors)
errorMutex.Unlock()

successCount := len(jobs) - len(finalErrors)
logrus.Infof("Processing completed: %d successful, %d errors out of %d total configurations",
	successCount, len(finalErrors), len(jobs))

if len(finalErrors) > 0 {
	logrus.Errorf("Failed to process %d configurations", len(finalErrors))
	return utilerrors.NewAggregate(finalErrors)
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Bounded error channel can silently drop configuration errors

errorsChan is sized to o.maxThreads, and workers send errors with a non-blocking select:

errorsChan := make(chan error, o.maxThreads)
// ...
select {
case errorsChan <- err:
default:
    logrus.WithError(err).Errorf("Worker %d failed to process configuration, error channel full", workerID)
}

If many configurations fail quickly (e.g., a systemic issue), the channel can fill and subsequent errors are dropped. The final aggregate then under-reports which configs failed, even though the process exits non‑zero.

Consider either:

  • Buffering for all potential errors: make(chan error, len(jobs)) and using a blocking send, or
  • Dropping aggregation and returning after the first failure, if that’s acceptable.

This will make the aggregated error set and the “X errors out of Y” summary reliable.



---

🏁 Script executed:

```shell
#!/bin/bash
# Find and inspect the actual file and surrounding context
fd -type f -name "main.go" -path "*frequency-reducer*" | head -1 | xargs -I {} cat {} | head -300
```

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Search for o.maxThreads initialization to understand typical values
rg -n "maxThreads\s*=" cmd/branchingconfigmanagers/frequency-reducer/ -A 2 -B 2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Verify error channel usage patterns
rg -n "errorsChan" cmd/branchingconfigmanagers/frequency-reducer/ -C 3

Length of output: 2293


🏁 Script executed:

#!/bin/bash
# Check how len(jobs) compares to maxThreads in typical usage
rg -n "len\(jobs\)" cmd/branchingconfigmanagers/frequency-reducer/ -C 2

Length of output: 2140


🏁 Script executed:

#!/bin/bash
# Look for the processConfigurationsInParallel function signature to understand parameters
rg -n "processConfigurationsInParallel|func.*Parallel" cmd/branchingconfigmanagers/frequency-reducer/ -A 5

Length of output: 1195



---

🏁 Script executed:

```shell
#!/bin/bash
# Find options struct and maxThreads field
rg -n "type options struct|maxThreads" cmd/branchingconfigmanagers/frequency-reducer/main.go -B 1 -A 1
```

Length of output: 953


🏁 Script executed:

#!/bin/bash
# Find where maxThreads is set (flags, defaults, etc)
rg -n "maxThreads" cmd/branchingconfigmanagers/frequency-reducer/main.go

Length of output: 453


🏁 Script executed:

#!/bin/bash
# Check if there's a command-line flag or default for maxThreads
rg -n "flagset|flag\.|MaxThreads|threads" cmd/branchingconfigmanagers/frequency-reducer/main.go -C 2

Length of output: 776


Bounded error channel silently drops errors when many configurations fail

errorsChan is sized to o.maxThreads (defaulting to runtime.NumCPU(), typically 4–128), but can receive errors from many more jobs. When the buffer fills, the non-blocking select defaults to logging only:

errorsChan := make(chan error, o.maxThreads)  // line 151
// ...
select {
case errorsChan <- err:
default:
    logrus.WithError(err).Errorf("Worker %d failed to process configuration, error channel full", workerID)
}

If dozens of configurations fail rapidly, the channel fills and subsequent errors bypass the aggregation slice entirely. The final summary (lines 219–221) then under-reports actual failures, even though the process correctly exits non-zero.

Fix: Either buffer for all potential errors (make(chan error, len(jobs)) with blocking send), or return on the first failure if that's acceptable.
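
A minimal, self-contained sketch of the first option (the job type and failure condition are placeholders, not the real configJob/processConfiguration): size the error channel for every job and use a plain blocking send, which also makes the separate collector goroutine and mutex unnecessary.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	jobs := []int{1, 2, 3, 4, 5} // placeholder for the configJob slice
	const maxThreads = 2

	jobsChan := make(chan int, len(jobs))
	errorsChan := make(chan error, len(jobs)) // capacity covers every possible error

	var wg sync.WaitGroup
	for i := 0; i < maxThreads; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobsChan {
				if job%2 == 0 { // stand-in for processConfiguration failing
					errorsChan <- fmt.Errorf("job %d failed", job) // buffered: never blocks, never drops
				}
			}
		}()
	}

	for _, j := range jobs {
		jobsChan <- j
	}
	close(jobsChan)
	wg.Wait()
	close(errorsChan)

	var errs []error
	for err := range errorsChan {
		errs = append(errs, err)
	}
	fmt.Printf("%d errors out of %d jobs\n", len(errs), len(jobs))
}
```

Because the buffer is sized to len(jobs), a send can never block or be dropped even if every job fails, so the aggregated error set and the “X errors out of Y” summary stay accurate.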

🤖 Prompt for AI Agents
In cmd/branchingconfigmanagers/frequency-reducer/main.go around lines 150 to
226, the bounded errorsChan (buffer o.maxThreads) silently drops errors when
many jobs fail; change errorsChan to be sized to len(jobs) (make(chan error,
len(jobs))) so it can hold all potential errors and remove the non-blocking
select in workers so they send errors with a normal blocking send (case -> plain
send) to ensure every error is captured, or alternatively implement an immediate
fail-fast by returning on the first worker error if that behavior is acceptable.
