fix: Ensure consistent breaker state for unhealthy hosts with inflight requests #15811
Conversation
Hi @elijah-rou. Thanks for your PR. I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
elijah-rou force-pushed the branch from 096a5c8 to 188bf18 (commits: "fix: Ensure consistent breaker state for unhealthy hosts with inflight requests", "add x/exp", "capacity and inflight are uint64", "change min kpa for test").
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: elijah-rou. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
/ok-to-test
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
I've created a PR to bump serving to v1.24 - #15856
This default min of two is there to prevent a single point of failure when a single activator goes down or becomes unhealthy. We shouldn't change it. Note that when multiple activators front a Knative revision, the target pods are sharded.
This PR references the discussion in the CNCF knative-serving Slack channel, which can be found here: https://cloud-native.slack.com/archives/C04LMU0AX60/p1740420658746939
Issue
There has been a long-standing issue in Knative, with issues dating back to 2022, of the load balancer erroneously sending requests to pods that are already busy, even though their configured max concurrency should not permit another request to be sent to that pod. I believe I have found (at least one of) the causes.
In specific scenarios, particularly long-running requests, the breaker state for a podTracker is functionally reset if the revision backend temporarily registers the pod as "unhealthy". There is no explicit logic that dictates this behaviour; rather, it is a result of how throttler.go updates the state for its podTrackers (see serving/pkg/activator/net/throttler.go, line 335 at c09ff6c).
To summarise, the new state is determined by remaking the []podTracker list completely from scratch, using information retrieved from the respective revision backend. Only the healthy hosts from the revision backend are used to build this list, and therefore any unhealthy podTrackers are effectively removed. This is the case even if the "unhealthy" pod is currently busy with a request. If the pod becomes healthy again in time, the revision backend will report it, and the throttler will add a brand-new podTracker for the pod, discarding any previous state held by the old podTracker (i.e. setting inFlight to 0). The pod therefore becomes a valid target for the load-balancing policy, even though in reality it is still busy with a request.
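To make the failure mode concrete, here is a minimal sketch (not the actual throttler code; the podTracker fields and the rebuildTrackers helper are simplified assumptions) of how rebuilding the tracker list from only the currently healthy hosts erases in-flight state:

```go
package main

import "fmt"

// podTracker is a simplified stand-in for the throttler's per-pod state.
// In the real code the in-flight count lives in the tracker's breaker.
type podTracker struct {
	dest     string
	inFlight int
}

// rebuildTrackers mimics the current update path: the tracker list is
// remade from scratch using only the hosts the revision backend
// currently reports as healthy, so any prior per-pod state is dropped.
func rebuildTrackers(healthy []string) []*podTracker {
	trackers := make([]*podTracker, 0, len(healthy))
	for _, dest := range healthy {
		trackers = append(trackers, &podTracker{dest: dest}) // inFlight starts at 0
	}
	return trackers
}

func main() {
	// Pod A is busy with one long-running request.
	trackers := []*podTracker{{dest: "pod-a", inFlight: 1}}

	// Pod A briefly fails its health check and is dropped entirely...
	trackers = rebuildTrackers(nil)

	// ...then recovers; a brand-new tracker is created with inFlight=0,
	// so the load balancer sees the pod as idle even though the
	// original request is still running on it.
	trackers = rebuildTrackers([]string{"pod-a"})
	fmt.Println(trackers[0].dest, trackers[0].inFlight) // pod-a 0
}
```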
Proposed Changes
This PR seeks to address this issue fairly simply, by not relinquishing the state of a podTracker until its host is both unhealthy AND finished with in-flight requests. Technically, this amounts to the following (a rough sketch of these changes follows the list):

- Store the throttler's podTrackers in a `map[string]*podTracker` as opposed to a `[]*podTracker`. This allows us to easily manage viable podTrackers by referencing them with their `dest` at any point. It has the added benefit of not re-creating the map on every `assignSlice` call.
- Extend the `Breaker` interface with a new `Pending` method. This involves moving the current definition of `InFlight` to be the in-flight value associated with the semaphore (similarly to `Capacity`), and creating a new `Pending` method which references a new `pending` attribute (the old `inFlight` attribute) of a `Breaker`.
- Extend the `PodTracker` struct with a new attribute, `healthy`, which a load-balancing policy can use to skip unhealthy podTrackers.
- Mark podTrackers whose hosts are no longer reported healthy, but which still have `inFlight` requests (as determined by the current state of the respective podTracker's breaker semaphore in-flight count), as "unhealthy". If a podTracker is both unhealthy and has no in-flights, the tracker can be removed.
- Handle podTrackers that may now be `nil` (which can now be possible due to us removing the reference for the podTracker in `updateThrottlerState`).
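As a rough illustration of the retention rule above — a sketch under assumed names (`fakeBreaker`, `updateTrackers`), not the PR's actual code:

```go
package main

import "fmt"

// breaker is a simplified stand-in for the Breaker interface after this
// change: InFlight reports the semaphore's in-flight count, while the
// new Pending reports requests still waiting (the old inFlight field).
type breaker interface {
	InFlight() uint64
	Pending() uint64
}

type fakeBreaker uint64

func (f fakeBreaker) InFlight() uint64 { return uint64(f) }
func (f fakeBreaker) Pending() uint64  { return 0 }

// podTracker carries the proposed healthy flag so load-balancing
// policies can skip unhealthy trackers instead of them being deleted.
type podTracker struct {
	dest    string
	healthy bool
	b       breaker
}

// updateTrackers reconciles the tracker map against the dests the
// revision backend currently reports healthy, instead of rebuilding a
// slice from scratch.
func updateTrackers(trackers map[string]*podTracker, healthyDests map[string]struct{},
	newTracker func(dest string) *podTracker) {
	for dest := range healthyDests {
		if t, ok := trackers[dest]; ok {
			t.healthy = true // pod recovered: keep its existing state
		} else {
			trackers[dest] = newTracker(dest)
		}
	}
	for dest, t := range trackers {
		if _, ok := healthyDests[dest]; ok {
			continue
		}
		t.healthy = false
		// Only drop a tracker once it is BOTH unhealthy and idle.
		if t.b.InFlight() == 0 {
			delete(trackers, dest)
		}
	}
}

func main() {
	trackers := map[string]*podTracker{
		"pod-a": {dest: "pod-a", healthy: true, b: fakeBreaker(1)}, // busy
		"pod-b": {dest: "pod-b", healthy: true, b: fakeBreaker(0)}, // idle
	}
	// The backend stops reporting both pods as healthy: pod-b is
	// removed, but pod-a is kept (marked unhealthy) until its
	// in-flight request finishes.
	updateTrackers(trackers, map[string]struct{}{}, nil)
	fmt.Println(len(trackers), trackers["pod-a"].healthy) // 1 false
}
```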
There are a few more things to note in this PR that are tangential/important but not directly related to the functionality:

- Changed the `Capacity` and `InFlight` calls to return `uint64`, as indicated by a TODO comment.
- Added `golang.org/x/exp/maps`, in order to do map manipulation more easily. (I believe this can be removed when the project updates its Go version.)
- Since a podTracker can now be `nil`, the RandomChoice policy may now not succeed. I have updated it to continue to try until it randomly selects a podTracker that is not `nil` (see the sketch at the end of this description). I'm not married to this implementation; we can definitely do it another way. I am aware the unit test for this is currently broken, but I will await feedback before changing the actual implementation here.

The implementation of this functionality is open to debate. I am happy to refine this towards an implementation that the core maintainers are more amenable to.
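Finally, as an illustration of the RandomChoice retry described in the notes above — a sketch only; the `pickRandom` helper and its loop-forever behaviour on an all-`nil` slice are simplified assumptions, not the PR's actual policy code:

```go
package main

import (
	"fmt"
	"math/rand"
)

type podTracker struct{ dest string }

// pickRandom keeps sampling until it lands on a non-nil tracker. Note
// that this spins forever if every slot is nil, which is part of why
// this implementation is open to debate.
func pickRandom(trackers []*podTracker) *podTracker {
	for {
		if t := trackers[rand.Intn(len(trackers))]; t != nil {
			return t
		}
	}
}

func main() {
	trackers := []*podTracker{nil, {dest: "pod-a"}, nil}
	fmt.Println(pickRandom(trackers).dest) // always pod-a
}
```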