Conversation

@Dyanngg Dyanngg commented Sep 5, 2025

Fixes #7309

Disclaimer
This PR and its description were mostly generated with Cursor using the claude-4-sonnet model. It took me six rounds of prompting to describe the problem, provide context, challenge its initial solution (which did not work), refine implementation details, and add unit and e2e tests. I have proofread and modified the generated code before pushing it for review. The fix was also tested in a local kind cluster to verify that concurrent Tier creation at the same priority is rejected.


Tier Priority Race Condition Fix

Problem Description

The original Tier validating admission webhook had a race condition when checking for overlapping priorities. The issue occurred because:

  1. Validating webhook is invoked for mytier1 with priority X → success
  2. Webhook returns success and releases any locks
  3. Validating webhook is invoked for mytier2 with same priority X → success (informer not yet updated)
  4. Both webhooks return success, K8s API server proceeds with creation
  5. mytier1 is created in etcd
  6. mytier2 is created with the same priority X
  7. Watchers are notified and the antrea-controller's informer is updated with both tiers

The key insight is that the webhook validation completes before the actual Kubernetes resource creation, creating a window where multiple tiers with the same priority can be validated simultaneously.

Solution: Priority Reservation Until Creation Completion

The solution implements a priority reservation system that reserves priorities during validation and only releases them when the Tier is actually created and detected by the informer.

Key Components

1. tierPriorityTracker

type tierPriorityTracker struct {
    mu sync.Mutex
    pendingPriorities map[int32]chan struct{}
    validationTimeout time.Duration
}
  • Purpose: Tracks priorities currently being validated/created
  • Thread-safe: Uses a mutex to protect concurrent access
  • Timeout: Prevents indefinite blocking (30-second default); see the constructor sketch below
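
For reference, a minimal constructor sketch consistent with the struct above, assuming the 30-second default noted in the bullets (the PR's actual initialization may differ in details):

// Sketch only: builds on the tierPriorityTracker struct shown above and the
// standard "sync" and "time" packages.
func newTierPriorityTracker() *tierPriorityTracker {
    return &tierPriorityTracker{
        pendingPriorities: make(map[int32]chan struct{}),
        validationTimeout: 30 * time.Second,
    }
}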

2. Priority Reservation Mechanism

func (t *tierPriorityTracker) reservePriority(priority int32) (func(), bool)
  • Reserve: Claims a priority for exclusive validation
  • Wait: If priority is already reserved, waits for completion
  • Release: Returns a function to release the reservation
  • Timeout: Fails if waiting too long for another operation
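
An illustrative sketch of this mechanism, built only on the struct fields shown earlier; the PR's actual implementation may differ in naming and bookkeeping (the review comments below suggest it also records the Tier name and creation time of each reservation):

// Sketch only: claim a priority, or wait for the in-flight reservation to be
// released, giving up after validationTimeout.
func (t *tierPriorityTracker) reservePriority(priority int32) (func(), bool) {
    deadline := time.After(t.validationTimeout)
    for {
        t.mu.Lock()
        waitCh, reserved := t.pendingPriorities[priority]
        if !reserved {
            // Claim the priority; the returned release function removes the
            // reservation and wakes up any waiters.
            done := make(chan struct{})
            t.pendingPriorities[priority] = done
            t.mu.Unlock()
            return func() {
                t.mu.Lock()
                delete(t.pendingPriorities, priority)
                close(done)
                t.mu.Unlock()
            }, true
        }
        t.mu.Unlock()
        // Another validation/creation holds this priority; wait for its release.
        select {
        case <-waitCh:
            // Released; loop around and try to claim it.
        case <-deadline:
            return nil, false
        }
    }
}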

3. Enhanced Validation Flow

func (t *tierValidator) createValidateWithTracker(curObj interface{}, userInfo authenticationv1.UserInfo, priorityTracker *tierPriorityTracker) ([]string, string, bool)
  • Serialized: Only one validation per priority at a time
  • Race-free: Informer check happens while priority is reserved
  • Clean: Automatically releases priority when done
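
A rough sketch of how the reservation brackets the overlap check inside this method; tierPriorityExistsInInformer is a hypothetical stand-in for the existing informer-based check, not a function in the PR:

// Sketch only, following the reservePriority signature from the previous section.
release, ok := priorityTracker.reservePriority(priority)
if !ok {
    return nil, fmt.Sprintf("timeout waiting for priority %d to become available", priority), false
}
// The overlap check runs while the priority is still reserved, so a concurrent
// request for the same priority cannot slip past validation.
if tierPriorityExistsInInformer(priority) {
    release()
    return nil, fmt.Sprintf("tier priority %d overlaps with an existing Tier", priority), false
}
// On success the reservation is intentionally not released here; it is held
// until the informer observes the created Tier (see "Informer Integration" below).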

How It Prevents the Race Condition

Before (Race Condition Possible)

Time    mytier1 (priority=100)           mytier2 (priority=100)
----    ----------------------           ----------------------
T1      Validation starts
T2      Check informer → no conflict
T3      Validation succeeds & returns    Validation starts
T4      K8s creates mytier1              Check informer → no conflict (!)
T5      Informer updated with mytier1    Validation succeeds & returns
T6                                       K8s creates mytier2 → DUPLICATE!

After (Race Condition Prevented)

Time    mytier1 (priority=100)           mytier2 (priority=100)
----    ----------------------           ----------------------
T1      Wait for priority 100 available
T2      Reserve priority 100 ✓
T3      Check informer → no conflict     Wait for priority 100 available
T4      Validation succeeds & returns    → BLOCKED (priority reserved)
T5      K8s creates mytier1              
T6      Informer detects mytier1         
T7      Release priority 100 ✓           Priority 100 now available
T8                                       Check informer → CONFLICT DETECTED!
T9                                       Validation FAILS → No duplicate

Implementation Details

1. Architecture

The priorityTracker is now a field of the resourceValidator base struct, but only initialized for tierValidator:

type resourceValidator struct {
    networkPolicyController *NetworkPolicyController
    // priorityTracker is only used by tierValidator to track priority reservations.
    // Other validators leave this as nil.
    priorityTracker *tierPriorityTracker
}

// Only tierValidator gets a priorityTracker
tv := tierValidator(resourceValidator{
    networkPolicyController: networkPolicyController,
    priorityTracker:         newTierPriorityTracker(),
})

2. Single Validation Method

The tierValidator now has a single createValidate method that uses its own priorityTracker:

func (t *tierValidator) createValidate(curObj interface{}, userInfo authenticationv1.UserInfo) ([]string, string, bool) {
    // Uses t.priorityTracker directly
    if !t.priorityTracker.waitForPriorityAvailable(priority) {
        return nil, fmt.Sprintf("timeout waiting for priority %d to become available", priority), false
    }
    // ... rest of validation logic
}
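
For context, the priority in the fragment above comes from the Tier object being validated; a sketch of that step, assuming the crd.antrea.io/v1beta1 Tier type used elsewhere in this PR:

// Sketch only: extract the priority before consulting the tracker.
tier, ok := curObj.(*crdv1beta1.Tier)
if !ok {
    return nil, "curObj is not a Tier", false
}
priority := tier.Spec.Priority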

3. Informer Integration

Priority reservations are released when the informer detects the actual Tier creation:

// In apiserver.go
c.networkPolicyController.SetupTierEventHandlersForValidator(v)

// Event handler finds the tierValidator and releases reservation when Tier is actually created
for _, tierVal := range npValidator.tierValidators {
    if tv, ok := tierVal.(*tierValidator); ok {
        n.tierInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
            AddFunc: func(obj interface{}) {
                if tier, ok := obj.(*secv1beta1.Tier); ok {
                    tv.OnTierCreate(tier) // Releases priority reservation directly
                }
            },
        })
    }
}
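
A sketch of what OnTierCreate does with the tracker; releasePriority is an assumed method name, inferred from the pendingPriorities map shown earlier (the actual PR keys the release on both priority and Tier name, per the review discussion below):

// Sketch only: once the Tier is visible in the informer cache, subsequent
// validations will see it, so the reservation can be dropped.
func (t *tierValidator) OnTierCreate(tier *crdv1beta1.Tier) {
    t.priorityTracker.releasePriority(tier.Spec.Priority, tier.Name)
}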

Benefits

  1. Race Condition Eliminated: Serialized validation prevents concurrent priority conflicts
  2. Minimal Performance Impact: Only affects concurrent requests for the same priority
  3. Timeout Protection: Prevents indefinite blocking with configurable timeout
  4. Backward Compatible: Existing code continues to work
  5. Clean Architecture: Separation of concerns with dedicated tracker component
  6. Observability: Detailed logging for debugging and monitoring

Configuration

Timeout Adjustment

The validation timeout can be adjusted if needed:

tracker := newTierPriorityTracker()
tracker.validationTimeout = 60 * time.Second // Increase to 60 seconds

Testing Scenarios

1. Concurrent Same Priority (Should Fail)

# Terminal 1
kubectl apply -f tier1-priority100.yaml &

# Terminal 2 (immediately)
kubectl apply -f tier2-priority100.yaml &

Expected: One succeeds, one fails with priority overlap error

2. Concurrent Different Priorities (Should Succeed)

# Terminal 1
kubectl apply -f tier1-priority100.yaml &

# Terminal 2 (immediately)
kubectl apply -f tier2-priority200.yaml &

Expected: Both succeed

3. Timeout Scenario

If a validation takes longer than 30 seconds (very unlikely), the waiting request will time out and fail gracefully.
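
For reference, a hedged Go sketch of how the concurrent-creation scenario can be exercised in an e2e test, along the lines of the test discussed in the review below; the k8sUtils.CreateTier/DeleteTier signatures are assumptions:

// Sketch only: attempt to create two Tiers with the same priority concurrently
// and assert that at most one creation succeeds.
var wg sync.WaitGroup
var err1, err2 error
wg.Add(2)
go func() { defer wg.Done(); _, err1 = k8sUtils.CreateTier("concurrent-tier-1", 100) }()
go func() { defer wg.Done(); _, err2 = k8sUtils.CreateTier("concurrent-tier-2", 100) }()
wg.Wait()
if err1 == nil && err2 == nil {
    t.Fatal("both Tiers were created with the same priority")
}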

Migration Notes

  • No Breaking Changes: Existing code continues to work
  • Automatic: The fix is automatically active once deployed
  • No Configuration Required: Works out of the box with sensible defaults
  • Performance: Negligible impact on normal operations

This implementation provides a robust solution to the Tier priority race condition while maintaining backward compatibility and system performance.

@Dyanngg Dyanngg requested review from antoninbas and tnqn September 5, 2025 20:40
@Dyanngg Dyanngg force-pushed the tier-priority-race-fix branch from 33bcf24 to 25eb037 Compare September 5, 2025 22:44
@Dyanngg Dyanngg changed the title [Draft] Fix concurrent Tier creation at the same priority Fix concurrent Tier creation at the same priority Sep 8, 2025
@Dyanngg Dyanngg added this to the Antrea v2.5 release milestone Sep 8, 2025
@Dyanngg Dyanngg added the area/network-policy/lifecycle Issues or PRs related to the network policy lifecycle. label Sep 8, 2025

@antoninbas antoninbas left a comment

started review, will continue later

Signed-off-by: Dyanngg <[email protected]>
@Dyanngg Dyanngg force-pushed the tier-priority-race-fix branch from 2934159 to 123fb5a Compare September 16, 2025 21:13
@Dyanngg Dyanngg requested a review from antoninbas September 17, 2025 18:32
@Dyanngg Dyanngg requested a review from antoninbas November 6, 2025 18:15

@antoninbas antoninbas left a comment

mostly small comments

if existing, exists := t.pendingPriorities[priority]; exists {
// Check if the existing reservation has timed out
if time.Since(existing.createdAt) > t.creationTimeout {
klog.V(2).InfoS("Tier priority reservation has timed out, allowing new reservation", "priority", priority, "tier", existing.tierName)

Is there a reason for logging this as V(2) when the other logs (which seem to be of equal importance) are logged with V(4)?

delete(t.pendingPriorities, priority)
klog.V(4).InfoS("Released priority reservation for Tier", "priority", priority, "tier", tierName)
} else {
klog.InfoS("Attempted to release priority for a different Tier", "priority", priority, "tier", tierName, "existingTier", existing.tierName)

I agree with this log, but can you think of an example situation where this could happen?

func (v *NetworkPolicyValidator) Run(stopCh <-chan struct{}) {
// Start cleanup routine for expired reservations from the tierValidator
for _, tv := range v.tierValidators {
go tv.startCleanupRoutine(stopCh)

you call Run in a goroutine in s.AddPostStartHook, so you expect it to be blocking, so I wouldn't call tv.startCleanupRoutine in yet another goroutine

BTW, would it make sense to just replace v.tierValidators with a single v.tierValidator?

Comment on lines +608 to +611
if t.priorityTracker == nil {
klog.ErrorS(nil, "Priority tracker is nil, cannot start cleanup routine")
return
}

I would remove that, it shouldn't be possible by design

case <-ticker.C:
t.priorityTracker.cleanupExpiredReservations()
case <-stopCh:
klog.InfoS("Stopping Tier priority tracker cleanup routine")

If you have a Stopping log, I think you should have a corresponding Starting log at the beginning of the function

// OnTierCreate should be called when a new Tier is detected by the informer.
// This releases the priority reservation for the created Tier.
func (t *tierValidator) OnTierCreate(tier *crdv1beta1.Tier) {
if tier != nil && t.priorityTracker != nil {

these nilness checks seem overly defensive here, I think there is no confusion that these values cannot possibly be nil

Comment on lines +1150 to +1151
// The priorityTracker should always be available for tierValidator
if t.priorityTracker == nil {

ditto


t.Run("CleanupExpiredReservations", func(t *testing.T) {
tracker := newTierPriorityTracker()
tracker.creationTimeout = 50 * time.Millisecond // Short timeout for testing

I'd recommend injecting a fake clock here, like we do in similar unit tests

tier2Name := "concurrent-tier-2"

// Use channels to coordinate simultaneous creation attempts
startCh := make(chan struct{})

I don't think this channel provides much value compared to just starting the 2 goroutines and letting them call CreateTier right away.

if err1 == nil && err2 == nil {
k8sUtils.DeleteTier(tier1Name)
k8sUtils.DeleteTier(tier2Name)
failOnError(fmt.Errorf("Tiers were both created with the same priority when applied simultaneously"), t)

This seems to be a convoluted use of failOnError, compared to just having t.Fail :)

@antoninbas antoninbas added the action/release-note Indicates a PR that should be included in release notes. label Nov 6, 2025
Successfully merging this pull request may close: Multiple Tiers with the same priority can be created.