Skip to content

Conversation

@modulitos
Copy link
Contributor

@modulitos modulitos commented Oct 21, 2024

Issue #, if available:
#174

Description of changes:
Enhances the implementation introduced in #236 so that we can fetch missing service accounts from the apiserver. Retains the existing guarantees that we won't fetch multiple service accounts concurrently, minimizing load on the apiserver.

This feature is still in shadow mode (off by default).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@modulitos modulitos requested a review from a team as a code owner October 21, 2024 02:57
@modulitos modulitos force-pushed the notifications-enhancement branch 2 times, most recently from 5af4ac8 to 5020bec Compare October 21, 2024 04:00
@modulitos modulitos force-pushed the notifications-enhancement branch 2 times, most recently from ed8a585 to 0178602 Compare November 14, 2024 22:26
}

go func() {
for req := range saFetchRequests {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would mean we are making only one request at a time to apiserver right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapped this in a goroutine - thanks for the catch 💯

defer cancel()

klog.V(5).Infof("fetching SA: %s", req.CacheKey())
saList, err := getter.ServiceAccounts(req.Namespace).List(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why List? why can't we use Get?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this to Get

@modulitos modulitos force-pushed the notifications-enhancement branch from a37069c to ccbb720 Compare November 18, 2024 19:45
@modulitos modulitos force-pushed the notifications-enhancement branch from ccbb720 to 1b19f6f Compare November 26, 2024 01:31
@modulitos modulitos force-pushed the notifications-enhancement branch from 2c19173 to 9a8d1ad Compare November 26, 2024 22:12
// Use the STS WebIdentity method if set
request := cache.Request{Namespace: pod.Namespace, Name: pod.Spec.ServiceAccountName, RequestNotification: true}
gracePeriodEnabled := m.saLookupGraceTime > 0
request := cache.Request{Namespace: pod.Namespace, Name: pod.Spec.ServiceAccountName, RequestNotification: gracePeriodEnabled}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this change to toggle RequestNotification only when the grace period is enabled.

Previously it was basically a no-op when the feature is disabled and RequestNotification is true, but now we're fetching from the API server when it's true. So we only want it set when the feature is enabled.

if !found {
notifier = make(chan struct{})
n.handlers[req.CacheKey()] = notifier
n.fetchRequests <- &req
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We control the APIServer request rate through the size of the channel but it has two downsides:

  1. The APIServer request is actually not rate limited. There is no request handling rate limiter in the channel consumption and it is possible that there could be > 100 requests (for different namespace/name) sent in the same second given the channel consumer initiates a new go routine to submit a request to APIServer
  2. If due to some reasons the channel consumer dies or the queue is full, the create function will hang until the channel has some capacity. It could unexpectedly delay pod creation for arbitrary time.

A better choice could be use a larger channel size to minimize the chance of channel write blocking, and implement a more robust channel consumer which limit the consumption rate. In case of extremely high volumes of requests queued in the channel and the API requests could not be sent in time, the result would be either be the cache is synced before grace period and pod is mutated, or cache is not synced and the pod is not mutated. But no prolonged delay to pod creation or excessive requests to the APIServer

@haoranleo
Copy link
Contributor

Moved the changes to #252

@haoranleo haoranleo closed this Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants