v1.0 InferencePool API Review #1173
base: release-0.5
…-sigs#1160) Bumps [golang.org/x/sync](https://github.com/golang/sync) from 0.15.0 to 0.16.0. - [Commits](golang/sync@v0.15.0...v0.16.0) --- updated-dependencies: - dependency-name: golang.org/x/sync dependency-version: 0.16.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This commit introduces a new pluggable framework for request queuing within the EPP Flow Control layer. This change establishes the core interfaces and initial implementations needed for sophisticated request management, prioritization, and fairness.

The key components of this framework are:

- **`framework.SafeQueue` Interface**: A new contract for concurrent-safe queue implementations. It defines a standard set of behaviors for adding, removing, peeking, and managing items, ensuring that all queue plugins are interchangeable.
- **Queue Plugin Implementations**:
  - **`listqueue`**: A simple, efficient FIFO queue based on `container/list`. Ideal for basic, fair queuing workloads.
  - **`maxminheap`**: A priority queue based on a max-min heap, allowing for O(1) access to both the highest and lowest priority items. This is suitable for advanced policies that require configurable ordering.
- **Plugin Registration**: A factory pattern (`queue.MustRegisterQueue`) allows new queue implementations to be discovered and registered at runtime, making the system extensible.
- **Comprehensive Testing**:
  - A new conformance test suite (`TestQueueConformance`) ensures that all registered queue plugins strictly adhere to the `SafeQueue` contract, covering lifecycle, ordering, edge cases, and concurrency.
  - A centralized benchmark suite (`BenchmarkQueues`) provides a fair, apples-to-apples performance comparison of all queue implementations across various workload patterns.
- **Core Type Refinements**: The `types` package has been updated to support this new framework, including a refined `QueueItemAccessor` interface and a new `QueueItemHandle` for opaque, safe item manipulation.

This framework decouples the core flow control logic from the specific queuing disciplines, enabling future work on advanced dispatch and displacement policies.
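As a rough illustration of the `listqueue` idea described in this commit message, here is a minimal sketch of a mutex-guarded FIFO queue built on `container/list`. The type `fifoQueue` and its method names are hypothetical and chosen for this example only; they are not the `SafeQueue` interface or the plugin code from the commit.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// fifoQueue is a hypothetical, simplified stand-in for a concurrency-safe
// FIFO queue. It is NOT the SafeQueue interface from the commit.
type fifoQueue struct {
	mu    sync.Mutex
	items *list.List
}

func newFIFOQueue() *fifoQueue {
	return &fifoQueue{items: list.New()}
}

// Add appends an item to the tail of the queue.
func (q *fifoQueue) Add(item string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items.PushBack(item)
}

// PeekHead returns the oldest item without removing it.
func (q *fifoQueue) PeekHead() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	front := q.items.Front()
	if front == nil {
		return "", false
	}
	return front.Value.(string), true
}

// RemoveHead removes and returns the oldest item.
func (q *fifoQueue) RemoveHead() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	front := q.items.Front()
	if front == nil {
		return "", false
	}
	q.items.Remove(front)
	return front.Value.(string), true
}

// Len reports the number of queued items.
func (q *fifoQueue) Len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.items.Len()
}

func main() {
	q := newFIFOQueue()
	q.Add("req-1")
	q.Add("req-2")
	head, _ := q.RemoveHead()
	fmt.Println(head, q.Len())
}
```

Every operation takes the same lock, which keeps the sketch simple at the cost of serializing readers and writers.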
…gs#1157) Signed-off-by: Nir Rozenbaum <[email protected]>
* Conformance: Fixes the EPP ConfigMap namespace
  Signed-off-by: Daneyon Hansen <[email protected]>
* Renames config file in rollout.md
  Signed-off-by: Daneyon Hansen <[email protected]>

---------

Signed-off-by: Daneyon Hansen <[email protected]>
This commit introduces the `IntraFlowDispatchPolicy` framework, the second major component of the new pluggable flow control system. This framework decouples the logic for selecting a request from within a single flow's queue (temporal scheduling) from the underlying queue data structure.

Key components include:

- `framework.IntraFlowDispatchPolicy`: The core interface that defines the contract for selecting an item from a flow's queue.
- `framework.FlowQueueAccessor`: A read-only interface that provides policies with safe access to queue state.
- `RequiredQueueCapabilities`: A mechanism for policies to declare their queue requirements (e.g., FIFO, priority-ordered), which are validated by the registry.
- A factory and registration system for discovering and instantiating policy plugins by name.
- A comprehensive conformance test suite to validate the contract for all policy plugins.
- A foundational `FCFS` (First-Come, First-Served) policy as the first reference implementation.

This work builds directly on the `SafeQueue` framework, enabling the development of sophisticated, policy-driven request prioritization and scheduling.
Hi @capri-xiyue. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/hold
✅ Deploy Preview for gateway-api-inference-extension ready!
5317094 to 4ffb5f6
/hold, this should not get merged
77371fe to 4ffb5f6
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: capri-xiyue The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Note: This PR is NOT intended to merge; it is entirely for the purpose of API review.
Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages. The list of commits with invalid commit messages:
Why is it pushed to the release-0.5 branch?
This is not meant to merge; it is just for API review. Since #1116 has already been merged, pointing this PR at main would show no difference.
// EndpointPickerConfig specifies the configuration needed by the proxy to discover and connect to the endpoint
// picker service that picks endpoints for the requests routed to this pool.
EndpointPickerConfig `json:",inline"`
As we discussed during today's community meeting, we need to decide on refining the EndpointPickerConfig type. We originally chose this API structure to support surfacing config for future extensions, and chose inlining to simplify the UI. From my understanding, we cannot change the inlining after the API goes GA, so we should either simplify the EPP config API surface or remove the inlining.
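To make the trade-off concrete, the sketch below compares the serialized shape of an inlined (embedded) config struct with a nested one, using `encoding/json`. All type and field names here (`ExtensionRef`, `InlinedSpec`, `NestedSpec`, `endpointPicker`) are hypothetical simplifications for illustration, not the actual API types under review.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, trimmed-down types for illustration only.
type ExtensionRef struct {
	Name string `json:"name"`
}

type EndpointPickerConfig struct {
	ExtensionRef *ExtensionRef `json:"extensionRef,omitempty"`
}

// InlinedSpec embeds the config, so its fields surface at the top level
// of the serialized object.
type InlinedSpec struct {
	Selector             map[string]string `json:"selector"`
	EndpointPickerConfig `json:",inline"`
}

// NestedSpec gives the config its own named field, so its fields sit one
// level deeper. The field name "endpointPicker" is an assumption.
type NestedSpec struct {
	Selector       map[string]string    `json:"selector"`
	EndpointPicker EndpointPickerConfig `json:"endpointPicker"`
}

// mustJSON marshals v and panics on error, which keeps the example short.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	cfg := EndpointPickerConfig{ExtensionRef: &ExtensionRef{Name: "epp"}}
	sel := map[string]string{"app": "vllm"}

	fmt.Println(mustJSON(InlinedSpec{Selector: sel, EndpointPickerConfig: cfg}))
	fmt.Println(mustJSON(NestedSpec{Selector: sel, EndpointPicker: cfg}))
}
```

Because the two shapes place `extensionRef` at different depths, moving from one to the other changes the wire format that clients already depend on, which is why the choice has to be settled before GA.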
If we are removing this struct, we may want to rename extensionRef to make it explicit that this is an EPP extension. Also, do we want to make extensionRef a list (and add a type enum, whose only possible value for now is EPP) to allow potential expansion to other pool-attached extensions?
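One possible shape for the list-plus-enum idea is sketched below. The names `ExtensionType`, `ExtensionReference`, `ExtensionRefs`, and `validateExtensionRefs` are hypothetical and not part of the API under review; in a real CRD the enum would more likely be enforced with a `+kubebuilder:validation:Enum` marker than with hand-written validation.

```go
package main

import "fmt"

// ExtensionType is a hypothetical enum of pool-attached extension kinds.
type ExtensionType string

// ExtensionTypeEPP is the only value the suggestion allows for now.
const ExtensionTypeEPP ExtensionType = "EPP"

// ExtensionReference is a hypothetical list element. It is not the
// existing extensionRef type.
type ExtensionReference struct {
	Type ExtensionType `json:"type"`
	Name string        `json:"name"`
}

// PoolSpec is a hypothetical spec fragment holding the list.
type PoolSpec struct {
	ExtensionRefs []ExtensionReference `json:"extensionRefs"`
}

// validateExtensionRefs rejects any entry whose type is not a known enum
// value, mimicking what an enum validation rule would enforce.
func validateExtensionRefs(spec PoolSpec) error {
	for i, ref := range spec.ExtensionRefs {
		if ref.Type != ExtensionTypeEPP {
			return fmt.Errorf("extensionRefs[%d]: unsupported type %q", i, ref.Type)
		}
	}
	return nil
}

func main() {
	spec := PoolSpec{ExtensionRefs: []ExtensionReference{
		{Type: ExtensionTypeEPP, Name: "my-epp"},
	}}
	fmt.Println(validateExtensionRefs(spec))
}
```

Adding a new extension kind later would then be an additive change: a new enum value rather than a new field.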
// that should be included in the InferencePool.
// In some cases, implementations may translate this field to a Service selector, so this matches the simple
// map used for Service selectors instead of the full Kubernetes LabelSelector type.
// If sepecified, it will be applied to match the model server pods in the same namespace as the InferencePool.
nit, typo on "sepecified"
// In some cases, implementations may translate this field to a Service selector, so this matches the simple
// map used for Service selectors instead of the full Kubernetes LabelSelector type.
// If sepecified, it will be applied to match the model server pods in the same namespace as the InferencePool.
// Cross namesoace selector is not supported.
nit, "namesoace"
// Cross namesoace selector is not supported.
//
// +kubebuilder:validation:Required
Selector map[LabelKey]LabelValue `json:"selector"`
For reference, there is a related issue, kubernetes/kubernetes#48528, where there seem to be some use cases for the full selector feature because the map falls short. I mention it because it will not be possible to evolve this later, but I imagine you have already discussed it.
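To illustrate where a plain map falls short, the sketch below contrasts equality-only map matching with a simplified set-based requirement. The `requirement` type is a hypothetical stand-in written for this example; it borrows the operator names `In`, `NotIn`, and `Exists` from Kubernetes label selectors but is not the Kubernetes `LabelSelector` type.

```go
package main

import "fmt"

// matchesMap implements equality-only matching: every selector key must
// be present on the object with exactly the same value.
func matchesMap(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// requirement is a hypothetical, simplified set-based expression.
type requirement struct {
	Key      string
	Operator string // "In", "NotIn", or "Exists"
	Values   []string
}

func contains(values []string, v string) bool {
	for _, candidate := range values {
		if candidate == v {
			return true
		}
	}
	return false
}

// matches evaluates one requirement against an object's labels.
func (r requirement) matches(labels map[string]string) bool {
	got, ok := labels[r.Key]
	switch r.Operator {
	case "Exists":
		return ok
	case "In":
		return ok && contains(r.Values, got)
	case "NotIn":
		// An absent key also satisfies NotIn.
		return !ok || !contains(r.Values, got)
	default:
		return false
	}
}

func main() {
	pod := map[string]string{"app": "vllm", "tier": "canary"}

	// A map can only express "app equals vllm".
	fmt.Println(matchesMap(map[string]string{"app": "vllm"}, pod))

	// A set-based requirement can express "tier is not canary",
	// which a plain map cannot.
	notCanary := requirement{Key: "tier", Operator: "NotIn", Values: []string{"canary"}}
	fmt.Println(notCanary.matches(pod))
}
```

Exclusions and multi-value matches are the kind of selection that needs expressions; a map can only AND together exact key/value pairs.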
What type of PR is this?
/kind api-change
What this PR does / why we need it:
This PR is a diff of /apis from alpha (main branch) to v1.0 (release-1.0 branch). The InferencePool spec has no changes except the group change from `inference.networking.x-k8s.io` to `inference.networking.k8s.io`.
Note: This PR is purely to facilitate review; it is not intended to merge.
To do the API review, please select the specific commit as shown in the screenshot below, so that you review only the API-related changes.

/assign @robscott