Conversation

@Dyanngg Dyanngg commented Sep 5, 2025

Fixes #7309

Disclaimer
This PR and its description were mostly generated with Cursor using the claude-4-sonnet model. It took me six rounds of prompting to describe the problem, provide context, challenge its initial solution (which did not work), refine implementation details, and add unit and e2e tests. I have proofread and modified the generated code before pushing it for review. The fix was also tested in a local kind cluster to verify that concurrent Tier creation at the same priority is rejected.


Tier Priority Race Condition Fix

Problem Description

The original Tier validating admission webhook had a race condition when checking for overlapping priorities. The issue occurred because:

  1. Validating webhook is invoked for mytier1 with priority X → success
  2. Webhook returns success and releases any locks
  3. Validating webhook is invoked for mytier2 with same priority X → success (informer not yet updated)
  4. Both webhooks return success, K8s API server proceeds with creation
  5. mytier1 is created in etcd
  6. mytier2 is created with the same priority X
  7. Watchers are notified and the antrea-controller's informer is updated with both tiers

The key insight is that the webhook validation completes before the actual Kubernetes resource creation, creating a window where multiple tiers with the same priority can be validated simultaneously.

Solution: Priority Reservation Until Creation Completion

The solution implements a priority reservation system that reserves priorities during validation and only releases them when the Tier is actually created and detected by the informer.

Key Components

1. tierPriorityTracker

type tierPriorityTracker struct {
    mu sync.Mutex
    pendingPriorities map[int32]chan struct{}
    validationTimeout time.Duration
}
  • Purpose: Tracks priorities currently being validated/created
  • Thread-safe: Uses a mutex to protect concurrent access
  • Timeout: Prevents indefinite blocking (30-second default); see the constructor sketch below
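
For reference, a minimal constructor sketch consistent with the struct above, assuming the 30-second default noted in the bullets (the PR's actual initialization may differ in details):

// Sketch only: builds on the tierPriorityTracker struct shown above and the
// standard "sync" and "time" packages.
func newTierPriorityTracker() *tierPriorityTracker {
    return &tierPriorityTracker{
        pendingPriorities: make(map[int32]chan struct{}),
        validationTimeout: 30 * time.Second,
    }
}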

2. Priority Reservation Mechanism

func (t *tierPriorityTracker) reservePriority(priority int32) (func(), bool)
  • Reserve: Claims a priority for exclusive validation
  • Wait: If priority is already reserved, waits for completion
  • Release: Returns a function to release the reservation
  • Timeout: Fails if waiting too long for another operation
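
An illustrative sketch of this mechanism, built only on the struct fields shown earlier; the PR's actual implementation may differ in naming and bookkeeping (the review comments below suggest it also records the Tier name and creation time of each reservation):

// Sketch only: claim a priority, or wait for the in-flight reservation to be
// released, giving up after validationTimeout.
func (t *tierPriorityTracker) reservePriority(priority int32) (func(), bool) {
    deadline := time.After(t.validationTimeout)
    for {
        t.mu.Lock()
        waitCh, reserved := t.pendingPriorities[priority]
        if !reserved {
            // Claim the priority; the returned release function removes the
            // reservation and wakes up any waiters.
            done := make(chan struct{})
            t.pendingPriorities[priority] = done
            t.mu.Unlock()
            return func() {
                t.mu.Lock()
                delete(t.pendingPriorities, priority)
                close(done)
                t.mu.Unlock()
            }, true
        }
        t.mu.Unlock()
        // Another validation/creation holds this priority; wait for its release.
        select {
        case <-waitCh:
            // Released; loop around and try to claim it.
        case <-deadline:
            return nil, false
        }
    }
}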

3. Enhanced Validation Flow

func (t *tierValidator) createValidateWithTracker(curObj interface{}, userInfo authenticationv1.UserInfo, priorityTracker *tierPriorityTracker) ([]string, string, bool)
  • Serialized: Only one validation per priority at a time
  • Race-free: Informer check happens while priority is reserved
  • Clean: Automatically releases priority when done
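
A rough sketch of how the reservation brackets the overlap check inside this method; tierPriorityExistsInInformer is a hypothetical stand-in for the existing informer-based check, not a function in the PR:

// Sketch only, following the reservePriority signature from the previous section.
release, ok := priorityTracker.reservePriority(priority)
if !ok {
    return nil, fmt.Sprintf("timeout waiting for priority %d to become available", priority), false
}
// The overlap check runs while the priority is still reserved, so a concurrent
// request for the same priority cannot slip past validation.
if tierPriorityExistsInInformer(priority) {
    release()
    return nil, fmt.Sprintf("tier priority %d overlaps with an existing Tier", priority), false
}
// On success the reservation is intentionally not released here; it is held
// until the informer observes the created Tier (see "Informer Integration" below).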

How It Prevents the Race Condition

Before (Race Condition Possible)

Time    mytier1 (priority=100)           mytier2 (priority=100)
----    ----------------------           ----------------------
T1      Validation starts
T2      Check informer → no conflict
T3      Validation succeeds & returns    Validation starts
T4      K8s creates mytier1              Check informer → no conflict (!)
T5      Informer updated with mytier1    Validation succeeds & returns
T6                                       K8s creates mytier2 → DUPLICATE!

After (Race Condition Prevented)

Time    mytier1 (priority=100)           mytier2 (priority=100)
----    ----------------------           ----------------------
T1      Wait for priority 100 available
T2      Reserve priority 100 ✓
T3      Check informer → no conflict     Wait for priority 100 available
T4      Validation succeeds & returns    → BLOCKED (priority reserved)
T5      K8s creates mytier1              
T6      Informer detects mytier1         
T7      Release priority 100 ✓           Priority 100 now available
T8                                       Check informer → CONFLICT DETECTED!
T9                                       Validation FAILS → No duplicate

Implementation Details

1. Architecture

The priorityTracker is now a field of the resourceValidator base struct, but only initialized for tierValidator:

type resourceValidator struct {
    networkPolicyController *NetworkPolicyController
    // priorityTracker is only used by tierValidator to track priority reservations.
    // Other validators leave this as nil.
    priorityTracker *tierPriorityTracker
}

// Only tierValidator gets a priorityTracker
tv := tierValidator(resourceValidator{
    networkPolicyController: networkPolicyController,
    priorityTracker:         newTierPriorityTracker(),
})

2. Single Validation Method

The tierValidator now has a single createValidate method that uses its own priorityTracker:

func (t *tierValidator) createValidate(curObj interface{}, userInfo authenticationv1.UserInfo) ([]string, string, bool) {
    // Uses t.priorityTracker directly
    if !t.priorityTracker.waitForPriorityAvailable(priority) {
        return nil, fmt.Sprintf("timeout waiting for priority %d to become available", priority), false
    }
    // ... rest of validation logic
}
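
For context, the priority in the fragment above comes from the Tier object being validated; a sketch of that step, assuming the crd.antrea.io/v1beta1 Tier type used elsewhere in this PR:

// Sketch only: extract the priority before consulting the tracker.
tier, ok := curObj.(*crdv1beta1.Tier)
if !ok {
    return nil, "curObj is not a Tier", false
}
priority := tier.Spec.Priority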

3. Informer Integration

Priority reservations are released when the informer detects the actual Tier creation:

// In apiserver.go
c.networkPolicyController.SetupTierEventHandlersForValidator(v)

// Event handler finds the tierValidator and releases reservation when Tier is actually created
for _, tierVal := range npValidator.tierValidators {
    if tv, ok := tierVal.(*tierValidator); ok {
        n.tierInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
            AddFunc: func(obj interface{}) {
                if tier, ok := obj.(*secv1beta1.Tier); ok {
                    tv.OnTierCreate(tier) // Releases priority reservation directly
                }
            },
        })
    }
}
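
A sketch of what OnTierCreate does with the tracker; releasePriority is an assumed method name, inferred from the pendingPriorities map shown earlier (the actual PR keys the release on both priority and Tier name, per the review discussion below):

// Sketch only: once the Tier is visible in the informer cache, subsequent
// validations will see it, so the reservation can be dropped.
func (t *tierValidator) OnTierCreate(tier *crdv1beta1.Tier) {
    t.priorityTracker.releasePriority(tier.Spec.Priority, tier.Name)
}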

Benefits

  1. Race Condition Eliminated: Serialized validation prevents concurrent priority conflicts
  2. Minimal Performance Impact: Only affects concurrent requests for the same priority
  3. Timeout Protection: Prevents indefinite blocking with configurable timeout
  4. Backward Compatible: Existing code continues to work
  5. Clean Architecture: Separation of concerns with dedicated tracker component
  6. Observability: Detailed logging for debugging and monitoring

Configuration

Timeout Adjustment

The validation timeout can be adjusted if needed:

tracker := newTierPriorityTracker()
tracker.validationTimeout = 60 * time.Second // Increase to 60 seconds

Testing Scenarios

1. Concurrent Same Priority (Should Fail)

# Terminal 1
kubectl apply -f tier1-priority100.yaml &

# Terminal 2 (immediately)
kubectl apply -f tier2-priority100.yaml &

Expected: One succeeds, one fails with priority overlap error

2. Concurrent Different Priorities (Should Succeed)

# Terminal 1
kubectl apply -f tier1-priority100.yaml &

# Terminal 2 (immediately)
kubectl apply -f tier2-priority200.yaml &

Expected: Both succeed

3. Timeout Scenario

If a validation takes longer than 30 seconds (very unlikely), the waiting request will time out and fail gracefully.
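
For reference, a hedged Go sketch of how the concurrent-creation scenario can be exercised in an e2e test, along the lines of the test discussed in the review below; the k8sUtils.CreateTier/DeleteTier signatures are assumptions:

// Sketch only: attempt to create two Tiers with the same priority concurrently
// and assert that at most one creation succeeds.
var wg sync.WaitGroup
var err1, err2 error
wg.Add(2)
go func() { defer wg.Done(); _, err1 = k8sUtils.CreateTier("concurrent-tier-1", 100) }()
go func() { defer wg.Done(); _, err2 = k8sUtils.CreateTier("concurrent-tier-2", 100) }()
wg.Wait()
if err1 == nil && err2 == nil {
    t.Fatal("both Tiers were created with the same priority")
}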

Migration Notes

  • No Breaking Changes: Existing code continues to work
  • Automatic: The fix is automatically active once deployed
  • No Configuration Required: Works out of the box with sensible defaults
  • Performance: Negligible impact on normal operations

This implementation provides a robust solution to the Tier priority race condition while maintaining backward compatibility and system performance.

@Dyanngg Dyanngg requested review from antoninbas and tnqn September 5, 2025 20:40
@Dyanngg Dyanngg force-pushed the tier-priority-race-fix branch from 33bcf24 to 25eb037 Compare September 5, 2025 22:44
@Dyanngg Dyanngg changed the title [Draft] Fix concurrent Tier creation at the same priority Fix concurrent Tier creation at the same priority Sep 8, 2025
@Dyanngg Dyanngg added this to the Antrea v2.5 release milestone Sep 8, 2025
@Dyanngg Dyanngg added the area/network-policy/lifecycle Issues or PRs related to the network policy lifecycle. label Sep 8, 2025

@antoninbas antoninbas left a comment

started review, will continue later

Signed-off-by: Dyanngg <[email protected]>
@Dyanngg Dyanngg force-pushed the tier-priority-race-fix branch from 2934159 to 123fb5a Compare September 16, 2025 21:13
@Dyanngg Dyanngg requested a review from antoninbas September 17, 2025 18:32
@Dyanngg Dyanngg requested a review from antoninbas November 6, 2025 18:15

@antoninbas antoninbas left a comment

mostly small comments

if existing, exists := t.pendingPriorities[priority]; exists {
// Check if the existing reservation has timed out
if time.Since(existing.createdAt) > t.creationTimeout {
klog.V(2).InfoS("Tier priority reservation has timed out, allowing new reservation", "priority", priority, "tier", existing.tierName)

Is there a reason for logging this as V(2) when the other logs (which seem to be of equal importance) are logged with V(4)?

delete(t.pendingPriorities, priority)
klog.V(4).InfoS("Released priority reservation for Tier", "priority", priority, "tier", tierName)
} else {
klog.InfoS("Attempted to release priority for a different Tier", "priority", priority, "tier", tierName, "existingTier", existing.tierName)

I agree with this log, but can you think of an example situation where this could happen?

func (v *NetworkPolicyValidator) Run(stopCh <-chan struct{}) {
// Start cleanup routine for expired reservations from the tierValidator
for _, tv := range v.tierValidators {
go tv.startCleanupRoutine(stopCh)

you call Run in a goroutine in s.AddPostStartHook, so you expect it to be blocking, so I wouldn't call tv.startCleanupRoutine in yet another goroutine

BTW, would it make sense to just replace v.tierValidators with a single v.tierValidator?

Comment on lines +608 to +611
if t.priorityTracker == nil {
klog.ErrorS(nil, "Priority tracker is nil, cannot start cleanup routine")
return
}

I would remove that, it shouldn't be possible by design

case <-ticker.C:
t.priorityTracker.cleanupExpiredReservations()
case <-stopCh:
klog.InfoS("Stopping Tier priority tracker cleanup routine")

If you have a Stopping log, I think you should have a corresponding Starting log at the beginning of the function

// OnTierCreate should be called when a new Tier is detected by the informer.
// This releases the priority reservation for the created Tier.
func (t *tierValidator) OnTierCreate(tier *crdv1beta1.Tier) {
if tier != nil && t.priorityTracker != nil {

these nilness checks seem overly defensive here, I think there is no confusion that these values cannot possibly be nil

Comment on lines +1150 to +1151
// The priorityTracker should always be available for tierValidator
if t.priorityTracker == nil {

ditto


t.Run("CleanupExpiredReservations", func(t *testing.T) {
tracker := newTierPriorityTracker()
tracker.creationTimeout = 50 * time.Millisecond // Short timeout for testing

I'd recommend injecting a fake clock here, like we do in similar unit tests

tier2Name := "concurrent-tier-2"

// Use channels to coordinate simultaneous creation attempts
startCh := make(chan struct{})

I don't think this channel provides much value compared to just starting the 2 goroutines and letting them call CreateTier right away.

if err1 == nil && err2 == nil {
k8sUtils.DeleteTier(tier1Name)
k8sUtils.DeleteTier(tier2Name)
failOnError(fmt.Errorf("Tiers were both created with the same priority when applied simultaneously"), t)

This seems to be a convoluted use of failOnError, compared to just having t.Fail :)

@antoninbas antoninbas added the action/release-note Indicates a PR that should be included in release notes. label Nov 6, 2025
Successfully merging this pull request may close: Multiple Tiers with the same priority can be created.