Upgrades to frequency-reducer: improved logic, multithreading, added cluster profiles, tests #4657
base: main
Conversation
…cluster profiles Signed-off-by: Jakub Guzik <[email protected]>
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmguzik

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@jmguzik: The following tests failed, say `/retest` to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle stale`. If this issue is safe to close now please do so with `/close`.

/lifecycle stale
Walkthrough

The PR introduces cluster-profile filtering, parallel processing infrastructure, and yearly cron generation logic to the frequency-reducer component. New configuration parsing, worker pool dispatch, and per-test filtering mechanisms are added alongside comprehensive unit tests.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~70 minutes
Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0): Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cmd/branchingconfigmanagers/frequency-reducer/main.go (1)
373-452: High-frequency crons incorrectly treated as invalid; iteration caps prevent throttling

The review comment is accurate. Callers on lines 290, 299, 308 handle errors by logging a warning and continuing without updating the cron schedule. Valid high-frequency expressions like `0 * * * *` (hourly) or `*/5 * * * *` (every 5 minutes) will exceed the iteration caps:
- Hourly over 1 year: 8,760 executions >> 400 cap
- Every 5 minutes over 1 month: ~8,928 executions >> 100 cap

This causes `isExecutedAtMostOncePerYear` and `isExecutedAtMostXTimesAMonth` to return errors, triggering the skip-throttling path. Tests currently cover only low-frequency expressions (`@daily`, `@weekly`, `@monthly`, `@yearly`), missing coverage for the problematic cases. Recommend increasing the caps or returning `(false, nil)` when a cap is exceeded so throttling still applies.
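For instance, a minimal sketch of the second option, assuming a hypothetical helper name and the robfig/cron/v3 parser (the tool's actual cron library and helper signatures may differ): an exceeded cap is reported as "not rare enough" rather than as an error.

```go
package main

import (
	"fmt"
	"time"

	"github.com/robfig/cron/v3"
)

// isExecutedAtMostXTimes is a hypothetical variant of the validation helpers:
// when the firing count exceeds maxRuns within the window, it returns
// (false, nil) instead of an error, so callers still throttle the schedule.
func isExecutedAtMostXTimes(spec string, window time.Duration, maxRuns int) (bool, error) {
	sched, err := cron.ParseStandard(spec) // handles "0 * * * *", "@daily", ...
	if err != nil {
		return false, err // only genuinely unparsable expressions are errors
	}
	start := time.Now()
	end := start.Add(window)
	runs := 0
	for t := sched.Next(start); !t.IsZero() && !t.After(end); t = sched.Next(t) {
		runs++
		if runs > maxRuns {
			return false, nil // over the cap: too frequent, not invalid
		}
	}
	return true, nil
}

func main() {
	ok, err := isExecutedAtMostXTimes("0 * * * *", 365*24*time.Hour, 400)
	fmt.Println(ok, err) // false <nil>: hourly exceeds the cap, so throttling still applies
}
```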
🧹 Nitpick comments (1)
cmd/branchingconfigmanagers/frequency-reducer/main.go (1)
232-267: `modified_tests` log field doesn't reflect actual modifications

`modifiedTests` is computed as the count of tests with non-nil `Cron` or `Interval` after processing:

```go
modifiedTests := 0
for _, test := range output.Configuration.Tests {
	if test.Cron != nil || test.Interval != nil {
		modifiedTests++
	}
}
```

This counts all scheduled tests, not the subset whose schedule was changed by this run, so the `"modified_tests"` field can be misleading for observability.

If you care about accurate reporting, consider tracking per-test before/after values (or a `changed` flag returned from `updateIntervalFieldsForMatchedSteps`) and incrementing only when a schedule actually changes.
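As an illustration only (the types and names below are hypothetical stand-ins, not the tool's actual structs), counting changes by comparing before/after schedules could look roughly like this:

```go
package main

import "fmt"

// schedule is a stand-in for a test's Cron/Interval pair.
type schedule struct {
	cron     *string
	interval *string
}

func ptrEqual(a, b *string) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// countModified reports how many tests had their schedule actually changed,
// rather than how many tests carry any schedule at all.
func countModified(before, after map[string]schedule) int {
	modified := 0
	for name, b := range before {
		a := after[name]
		if !ptrEqual(b.cron, a.cron) || !ptrEqual(b.interval, a.interval) {
			modified++
		}
	}
	return modified
}

func main() {
	daily, yearly := "@daily", "0 0 1 1 *"
	before := map[string]schedule{"e2e-aws": {cron: &daily}}
	after := map[string]schedule{"e2e-aws": {cron: &yearly}}
	fmt.Println(countModified(before, after)) // 1: the schedule really changed
}
```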
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
- cmd/branchingconfigmanagers/frequency-reducer/main.go (8 hunks)
- cmd/branchingconfigmanagers/frequency-reducer/main_test.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
- cmd/branchingconfigmanagers/frequency-reducer/main_test.go
- cmd/branchingconfigmanagers/frequency-reducer/main.go
🔇 Additional comments (2)
cmd/branchingconfigmanagers/frequency-reducer/main.go (2)
270-355: Cluster-profile filtering is only applied when a filter is provided

In `updateIntervalFieldsForMatchedSteps`, tests are skipped when `allowedClusterProfiles` is non-nil and `shouldProcessTest` returns false:

```go
if allowedClusterProfiles != nil && !shouldProcessTest(test, allowedClusterProfiles) {
	continue
}
```

When `allowedClusterProfiles` is nil, all eligible tests (org + name conditions) are processed. That matches `shouldProcessTest`'s contract and the tests but is worth confirming as intended, since passing an empty config file is treated as an error earlier and terminates the run.

If you ever need "explicitly allow none", you'd currently have to pass an empty (non-nil) map rather than omitting the flag.
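To make that distinction concrete, here is a tiny sketch (names are illustrative, not the actual `shouldProcessTest` signature) of how a nil filter differs from an empty, non-nil one under the described contract:

```go
package main

import "fmt"

// shouldProcess mirrors the described contract: a nil filter means
// "no filtering at all", while an empty non-nil map means "allow none".
func shouldProcess(profile string, allowed map[string]bool) bool {
	if allowed == nil {
		return true // flag omitted: every eligible test is processed
	}
	return allowed[profile] // empty map: nothing matches
}

func main() {
	fmt.Println(shouldProcess("aws", nil))                          // true
	fmt.Println(shouldProcess("aws", map[string]bool{}))            // false
	fmt.Println(shouldProcess("aws", map[string]bool{"aws": true})) // true
}
```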
469-479: Yearly cron generation logic is consistent with validation helpers
`generateYearlyCron` picks random `month ∈ [1,12]` and `day ∈ [1,28]`:

```go
month := rand.Intn(12) + 1
day := rand.Intn(28) + 1
// ...
return fmt.Sprintf("%d %d %d %d *", minute, hour, day, month)
```

This guarantees exactly one firing per year and avoids month-length edge cases, which aligns with `isExecutedAtMostOncePerYear`. No issues here; the tests should be stable as long as `math/rand`'s default global RNG remains in use.
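If deterministic tests ever matter, one option is an injectable RNG; the sketch below is a hypothetical variant of the helper, not the code in this PR:

```go
package main

import (
	"fmt"
	"math/rand"
)

// generateYearlyCronWith is a hypothetical, injectable variant of the
// described helper: day is capped at 28 to avoid month-length edge cases,
// and passing an explicit *rand.Rand makes tests reproducible.
func generateYearlyCronWith(r *rand.Rand) string {
	minute := r.Intn(60)
	hour := r.Intn(24)
	day := r.Intn(28) + 1   // 1-28: valid in every month
	month := r.Intn(12) + 1 // 1-12
	return fmt.Sprintf("%d %d %d %d *", minute, hour, day, month)
}

func main() {
	r := rand.New(rand.NewSource(42)) // fixed seed for deterministic output
	fmt.Println(generateYearlyCronWith(r))
}
```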
```go
name:        "invalid YAML",
fileContent: `cluster_profiles:
- aws
- gcp
invalid: yaml: content`,
expectError: true,
expected:    nil,
```
🧩 Analysis chain
“invalid YAML” test case is using syntactically valid YAML
The `fileContent` for the "invalid YAML" case:

```yaml
cluster_profiles:
- aws
- gcp
invalid: yaml: content
```

is valid YAML for gopkg.in/yaml.v2 (it's just an extra key whose value contains a colon). `loadClusterProfilesConfig` will likely parse this successfully and return a non-empty `ClusterProfiles`, so `expectError: true` will cause the test to fail.

If you intend to exercise the "unmarshal fails" path, use a truly malformed snippet (e.g. an unclosed sequence or clearly broken syntax), or adjust `expectError` to false and rename the case to something like "extra keys are ignored".
I'll verify this review comment by examining the test file and the function implementation.
---
🏁 Script executed:
```shell
#!/bin/bash
# Find and display the test file content around lines 721-727
find . -name "main_test.go" -path "*/frequency-reducer/*" -type f -exec cat {} \; | head -n 750 | tail -n 80
```

Length of output: 1778
🏁 Script executed:
```shell
#!/bin/bash
# Search for the loadClusterProfilesConfig function definition
rg -n "func.*loadClusterProfilesConfig" --type go
```

Length of output: 131
🏁 Script executed:
```shell
#!/bin/bash
# Search for the full implementation of loadClusterProfilesConfig
ast-grep --pattern $'func loadClusterProfilesConfig($_) $$$'
```

Length of output: 2525
🏁 Script executed:
```shell
#!/bin/bash
# Look for the complete test function definition
rg -n "func.*TestLoadClusterProfilesConfig" --type go -A 100
```

Length of output: 8783
Fix the "invalid YAML" test case—it uses syntactically valid YAML and will fail
The YAML snippet at lines 722–725 is syntactically valid for gopkg.in/yaml.v2. The line `invalid: yaml: content` is a valid key-value pair (the value is a string containing a colon); extra struct fields are silently ignored during unmarshaling. Since `cluster_profiles` is populated with two items, the function will succeed and return a non-empty map, but the test expects `expectError: true`, causing it to fail with "Expected error but got none" (line 751).
Either use truly malformed YAML (e.g., unclosed sequence or broken indentation), or rename the case to "extra keys are ignored" and set expectError: false.
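For reference, a standalone sketch of the first option: the struct name below only approximates the real config type, and the snippet uses an unclosed flow sequence, which gopkg.in/yaml.v2 does reject.

```go
package main

import (
	"fmt"

	yaml "gopkg.in/yaml.v2"
)

// clusterProfilesConfig approximates the config shape under test;
// the real struct lives in the frequency-reducer package.
type clusterProfilesConfig struct {
	ClusterProfiles []string `yaml:"cluster_profiles"`
}

func main() {
	// Unclosed flow sequence: yaml.v2 fails to unmarshal this.
	malformed := "cluster_profiles: [aws, gcp"
	var cfg clusterProfilesConfig
	if err := yaml.Unmarshal([]byte(malformed), &cfg); err != nil {
		fmt.Println("unmarshal failed as expected:", err)
	}
}
```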
🤖 Prompt for AI Agents
In cmd/branchingconfigmanagers/frequency-reducer/main_test.go around lines
721–727 the "invalid YAML" test is wrong: the YAML shown is syntactically valid
so unmarshalling succeeds and the test incorrectly expects an error. Fix by
either (A) making the YAML truly malformed (for example replace the snippet with
a clearly broken YAML like an unclosed sequence or bad indentation) so
unmarshalling fails and expectError stays true, or (B) rename the case to
reflect that extra keys are ignored and set expectError to false (and adjust
expected accordingly).
```go
jobsChan := make(chan configJob, len(jobs))
errorsChan := make(chan error, o.maxThreads)

var errors []error
var errorMutex sync.Mutex
var errorWg sync.WaitGroup
errorWg.Add(1)

go func() {
	defer errorWg.Done()
	for err := range errorsChan {
		errorMutex.Lock()
		errors = append(errors, err)
		errorMutex.Unlock()
	}
}()

var wg sync.WaitGroup
var processedCount int64
var processedMutex sync.Mutex

for i := 0; i < o.maxThreads; i++ {
	wg.Add(1)
	go func(workerID int) {
		defer wg.Done()
		workerProcessedCount := 0
		logrus.Infof("Worker %d started", workerID)

		for job := range jobsChan {
			if err := processConfiguration(job, ocpVersion, workerID, allowedClusterProfiles); err != nil {
				select {
				case errorsChan <- err:
				default:
					logrus.WithError(err).Errorf("Worker %d failed to process configuration, error channel full", workerID)
				}
			}

			workerProcessedCount++
			processedMutex.Lock()
			processedCount++
			currentProcessed := processedCount
			processedMutex.Unlock()

			if currentProcessed%100 == 0 || currentProcessed == int64(len(jobs)) {
				logrus.Infof("Progress: %d/%d configurations processed (%.1f%%)",
					currentProcessed, len(jobs), float64(currentProcessed)/float64(len(jobs))*100)
			}
		}

		logrus.Infof("Worker %d finished processing %d configurations", workerID, workerProcessedCount)
	}(i)
}

logrus.Infof("Processing %d configurations with %d threads", len(jobs), o.maxThreads)
for _, job := range jobs {
	jobsChan <- job
}
close(jobsChan)

wg.Wait()

close(errorsChan)
errorWg.Wait()

errorMutex.Lock()
finalErrors := make([]error, len(errors))
copy(finalErrors, errors)
errorMutex.Unlock()

successCount := len(jobs) - len(finalErrors)
logrus.Infof("Processing completed: %d successful, %d errors out of %d total configurations",
	successCount, len(finalErrors), len(jobs))

if len(finalErrors) > 0 {
	logrus.Errorf("Failed to process %d configurations", len(finalErrors))
	return utilerrors.NewAggregate(finalErrors)
}
```
🧩 Analysis chain
Bounded error channel can silently drop configuration errors
errorsChan is sized to o.maxThreads, and workers send errors with a non-blocking select:
```go
errorsChan := make(chan error, o.maxThreads)
// ...
select {
case errorsChan <- err:
default:
	logrus.WithError(err).Errorf("Worker %d failed to process configuration, error channel full", workerID)
}
```

If many configurations fail quickly (e.g., a systemic issue), the channel can fill and subsequent errors are dropped. The final aggregate then under-reports which configs failed, even though the process exits non-zero.

Consider either:
- buffering for all potential errors: `make(chan error, len(jobs))` and using a blocking send, or
- dropping aggregation and returning after the first failure, if that's acceptable.
This will make the aggregated error set and the “X errors out of Y” summary reliable.
---
🏁 Script executed:
```shell
#!/bin/bash
# Find and inspect the actual file and surrounding context
fd -type f -name "main.go" -path "*frequency-reducer*" | head -1 | xargs -I {} cat {} | head -300
```

Length of output: 170
🏁 Script executed:
```shell
#!/bin/bash
# Search for o.maxThreads initialization to understand typical values
rg -n "maxThreads\s*=" cmd/branchingconfigmanagers/frequency-reducer/ -A 2 -B 2
```

Length of output: 0
🏁 Script executed:
```shell
#!/bin/bash
# Verify error channel usage patterns
rg -n "errorsChan" cmd/branchingconfigmanagers/frequency-reducer/ -C 3
```

Length of output: 2293
🏁 Script executed:
```shell
#!/bin/bash
# Check how len(jobs) compares to maxThreads in typical usage
rg -n "len\(jobs\)" cmd/branchingconfigmanagers/frequency-reducer/ -C 2
```

Length of output: 2140
🏁 Script executed:
```shell
#!/bin/bash
# Look for the processConfigurationsInParallel function signature to understand parameters
rg -n "processConfigurationsInParallel|func.*Parallel" cmd/branchingconfigmanagers/frequency-reducer/ -A 5
```

Length of output: 1195
---
🏁 Script executed:
```shell
#!/bin/bash
# Find options struct and maxThreads field
rg -n "type options struct|maxThreads" cmd/branchingconfigmanagers/frequency-reducer/main.go -B 1 -A 1
Length of output: 953
🏁 Script executed:
```shell
#!/bin/bash
# Find where maxThreads is set (flags, defaults, etc)
rg -n "maxThreads" cmd/branchingconfigmanagers/frequency-reducer/main.go
```

Length of output: 453
🏁 Script executed:
```shell
#!/bin/bash
# Check if there's a command-line flag or default for maxThreads
rg -n "flagset|flag\.|MaxThreads|threads" cmd/branchingconfigmanagers/frequency-reducer/main.go -C 2
```

Length of output: 776
Bounded error channel silently drops errors when many configurations fail
errorsChan is sized to o.maxThreads (defaulting to runtime.NumCPU(), typically 4–128), but can receive errors from many more jobs. When the buffer fills, the non-blocking select defaults to logging only:
```go
errorsChan := make(chan error, o.maxThreads) // line 151
// ...
select {
case errorsChan <- err:
default:
	logrus.WithError(err).Errorf("Worker %d failed to process configuration, error channel full", workerID)
}
```

If dozens of configurations fail rapidly, the channel fills and subsequent errors bypass the aggregation slice entirely. The final summary (lines 219–221) then under-reports actual failures, even though the process correctly exits non-zero.
Fix: Either buffer for all potential errors (make(chan error, len(jobs)) with blocking send), or return on the first failure if that's acceptable.
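A minimal, self-contained sketch of the first option (error buffer sized to the job count plus blocking sends), using placeholder job and failure logic rather than the tool's real `processConfiguration`:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

func main() {
	jobs := []int{1, 2, 3, 4, 5}
	maxThreads := 2

	jobsChan := make(chan int, len(jobs))
	// Buffer for every possible error so blocking sends can never drop one.
	errorsChan := make(chan error, len(jobs))

	var wg sync.WaitGroup
	for i := 0; i < maxThreads; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobsChan {
				if job%2 == 0 { // placeholder for processConfiguration failing
					errorsChan <- fmt.Errorf("job %d failed", job) // blocking send, never dropped
				}
			}
		}()
	}

	for _, j := range jobs {
		jobsChan <- j
	}
	close(jobsChan)
	wg.Wait()
	close(errorsChan)

	var errs []error
	for err := range errorsChan {
		errs = append(errs, err)
	}
	fmt.Printf("%d successful, %d errors out of %d\n", len(jobs)-len(errs), len(errs), len(jobs))
	if len(errs) > 0 {
		fmt.Println(errors.Join(errs...)) // aggregate now reflects every failure
	}
}
```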
🤖 Prompt for AI Agents
In cmd/branchingconfigmanagers/frequency-reducer/main.go around lines 150 to
226, the bounded errorsChan (buffer o.maxThreads) silently drops errors when
many jobs fail; change errorsChan to be sized to len(jobs) (make(chan error,
len(jobs))) so it can hold all potential errors and remove the non-blocking
select in workers so they send errors with a normal blocking send (case -> plain
send) to ensure every error is captured, or alternatively implement an immediate
fail-fast by returning on the first worker error if that behavior is acceptable.
No description provided.