v1.0 InferencePool API Review #1173

Open: wants to merge 6 commits into base: release-0.5


@capri-xiyue (Contributor) commented Jul 16, 2025

What type of PR is this?
/kind api-change

What this PR does / why we need it:
This PR is a diff of /apis from alpha (main branch) to v1.0 (release-1.0 branch). The InferencePool spec has no changes other than the API group change from `inference.networking.x-k8s.io` to `inference.networking.k8s.io`.

Note: This PR is purely to facilitate review; it is not intended to be merged.

To do the API review, select the specific commit as shown in the screenshot below, so that you only review the API-related changes.
(Screenshot 2025-07-17 at 9:44:40 AM)

/assign @robscott

dependabot bot and others added 5 commits July 15, 2025 08:20
…-sigs#1160)

Bumps [golang.org/x/sync](https://github.com/golang/sync) from 0.15.0 to 0.16.0.
- [Commits](golang/sync@v0.15.0...v0.16.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sync
  dependency-version: 0.16.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This commit introduces a new pluggable framework for request queuing within the EPP Flow Control layer. This change establishes the core interfaces and initial implementations needed for sophisticated request management, prioritization, and fairness.

The key components of this framework are:

- **`framework.SafeQueue` Interface**: A new contract for concurrent-safe queue implementations. It defines a standard set of behaviors for adding, removing, peeking, and managing items, ensuring that all queue plugins are interchangeable.

- **Queue Plugin Implementations**:
  - **`listqueue`**: A simple, efficient FIFO queue based on `container/list`. Ideal for basic, fair queuing workloads.
  - **`maxminheap`**: A priority queue based on a max-min heap, allowing for O(1) access to both the highest and lowest priority items. This is suitable for advanced policies that require configurable ordering.

- **Plugin Registration**: A factory pattern (`queue.MustRegisterQueue`) allows new queue implementations to be discovered and registered at runtime, making the system extensible.

- **Comprehensive Testing**:
  - A new conformance test suite (`TestQueueConformance`) ensures that all registered queue plugins strictly adhere to the `SafeQueue` contract, covering lifecycle, ordering, edge cases, and concurrency.
  - A centralized benchmark suite (`BenchmarkQueues`) provides a fair, apples-to-apples performance comparison of all queue implementations across various workload patterns.

- **Core Type Refinements**: The `types` package has been updated to support this new framework, including a refined `QueueItemAccessor` interface and a new `QueueItemHandle` for opaque, safe item manipulation.

This framework decouples the core flow control logic from the specific queuing disciplines, enabling future work on advanced dispatch and displacement policies.
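The commit message above describes three cooperating pieces: a concurrency-safe queue contract, a `container/list`-backed FIFO plugin, and name-based plugin registration. The sketch below shows how those pieces could fit together. It is a simplified, hypothetical reconstruction: the `Item` type, the method set, `ErrQueueEmpty`, and the lowercase `mustRegisterQueue`/`newQueue` helpers are assumptions for illustration, not the framework's actual API (which uses `QueueItemAccessor` and `QueueItemHandle`).

```go
package main

import (
	"container/list"
	"errors"
	"fmt"
	"sync"
)

// Item is a hypothetical stand-in for the framework's queue item types.
type Item struct{ ID string }

// ErrQueueEmpty is returned by Remove/Peek on an empty queue (assumed behavior).
var ErrQueueEmpty = errors.New("queue is empty")

// SafeQueue is a simplified, hypothetical concurrency-safe queue contract.
type SafeQueue interface {
	Add(item *Item)
	Remove() (*Item, error) // removes and returns the head item
	Peek() (*Item, error)   // returns the head item without removing it
	Len() int
}

// listQueue is a FIFO queue backed by container/list and guarded by a mutex.
type listQueue struct {
	mu sync.Mutex
	l  *list.List
}

func newListQueue() SafeQueue { return &listQueue{l: list.New()} }

func (q *listQueue) Add(item *Item) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.l.PushBack(item) // new items join the tail
}

func (q *listQueue) Remove() (*Item, error) {
	q.mu.Lock()
	defer q.mu.Unlock()
	front := q.l.Front()
	if front == nil {
		return nil, ErrQueueEmpty
	}
	return q.l.Remove(front).(*Item), nil // the oldest item leaves first
}

func (q *listQueue) Peek() (*Item, error) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if front := q.l.Front(); front != nil {
		return front.Value.(*Item), nil
	}
	return nil, ErrQueueEmpty
}

func (q *listQueue) Len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.l.Len()
}

// registry maps plugin names to constructors (hypothetical).
var (
	registryMu sync.Mutex
	registry   = map[string]func() SafeQueue{}
)

// mustRegisterQueue panics on a duplicate name, mirroring the naming of queue.MustRegisterQueue.
func mustRegisterQueue(name string, ctor func() SafeQueue) {
	registryMu.Lock()
	defer registryMu.Unlock()
	if _, dup := registry[name]; dup {
		panic("queue already registered: " + name)
	}
	registry[name] = ctor
}

// newQueue instantiates a registered queue plugin by name.
func newQueue(name string) (SafeQueue, error) {
	registryMu.Lock()
	defer registryMu.Unlock()
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown queue plugin %q", name)
	}
	return ctor(), nil
}

func init() { mustRegisterQueue("ListQueue", newListQueue) }

func main() {
	q, _ := newQueue("ListQueue")
	q.Add(&Item{ID: "a"})
	q.Add(&Item{ID: "b"})
	head, _ := q.Remove()
	fmt.Println(head.ID, q.Len()) // prints "a 1"
}
```

Because callers only see the `SafeQueue` interface and obtain instances by name, a priority-ordered plugin (such as the max-min heap mentioned above) could be swapped in without touching the calling code.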
* Conformance: Fixes the EPP ConfigMap namespace

Signed-off-by: Daneyon Hansen <[email protected]>

* Renames config file in rollout.md

Signed-off-by: Daneyon Hansen <[email protected]>

---------

Signed-off-by: Daneyon Hansen <[email protected]>
This commit introduces the `IntraFlowDispatchPolicy` framework, the second major component of the new pluggable flow control system. This framework decouples the logic for selecting a request from within a single flow's queue (temporal scheduling) from the underlying queue data structure.

Key components include:
- `framework.IntraFlowDispatchPolicy`: The core interface that defines the contract for selecting an item from a flow's queue.
- `framework.FlowQueueAccessor`: A read-only interface that provides policies with safe access to queue state.
- `RequiredQueueCapabilities`: A mechanism for policies to declare their queue requirements (e.g., FIFO, priority-ordered), which are validated by the registry.
- A factory and registration system for discovering and instantiating policy plugins by name.
- A comprehensive conformance test suite to validate the contract for all policy plugins.
- A foundational `FCFS` (First-Come, First-Served) policy as the first reference implementation.

This work builds directly on the `SafeQueue` framework, enabling the development of sophisticated, policy-driven request prioritization and scheduling.
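To illustrate the split between a dispatch policy and the queue it reads from, here is a minimal sketch of an FCFS policy that declares a FIFO requirement and a capability check in the spirit of the registry validation described above. All names here (`QueueCapability`, `PeekHead`, `Capabilities`, `validate`, `sliceQueue`) and the convention of returning `(nil, nil)` for an empty queue are hypothetical simplifications, not the framework's real signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// Item is a hypothetical queued request.
type Item struct{ ID string }

// QueueCapability names a property a queue guarantees (e.g. FIFO ordering).
type QueueCapability string

const (
	CapabilityFIFO            QueueCapability = "FIFO"
	CapabilityPriorityOrdered QueueCapability = "PriorityOrdered"
)

// FlowQueueAccessor is a simplified, read-only view of one flow's queue.
type FlowQueueAccessor interface {
	Len() int
	PeekHead() (*Item, error)
	Capabilities() []QueueCapability
}

// IntraFlowDispatchPolicy picks the next item to dispatch from a single flow's queue.
type IntraFlowDispatchPolicy interface {
	Name() string
	SelectItem(q FlowQueueAccessor) (*Item, error)
	RequiredQueueCapabilities() []QueueCapability
}

// fcfs dispatches the oldest item, which is only correct if the queue preserves FIFO order.
type fcfs struct{}

func (fcfs) Name() string { return "FCFS" }

func (fcfs) RequiredQueueCapabilities() []QueueCapability {
	return []QueueCapability{CapabilityFIFO}
}

// SelectItem returns the head of the queue, or (nil, nil) when the queue is empty (assumed convention).
func (fcfs) SelectItem(q FlowQueueAccessor) (*Item, error) {
	if q == nil || q.Len() == 0 {
		return nil, nil
	}
	return q.PeekHead()
}

// validate checks that a queue offers every capability the policy requires.
func validate(p IntraFlowDispatchPolicy, q FlowQueueAccessor) error {
	have := map[QueueCapability]bool{}
	for _, c := range q.Capabilities() {
		have[c] = true
	}
	for _, need := range p.RequiredQueueCapabilities() {
		if !have[need] {
			return fmt.Errorf("policy %s requires queue capability %s", p.Name(), need)
		}
	}
	return nil
}

// sliceQueue is a toy FIFO accessor used only for this example.
type sliceQueue struct{ items []*Item }

func (s *sliceQueue) Len() int                        { return len(s.items) }
func (s *sliceQueue) Capabilities() []QueueCapability { return []QueueCapability{CapabilityFIFO} }
func (s *sliceQueue) PeekHead() (*Item, error) {
	if len(s.items) == 0 {
		return nil, errors.New("empty queue")
	}
	return s.items[0], nil
}

func main() {
	q := &sliceQueue{items: []*Item{{ID: "first"}, {ID: "second"}}}
	var p IntraFlowDispatchPolicy = fcfs{}
	if err := validate(p, q); err != nil {
		panic(err)
	}
	item, _ := p.SelectItem(q)
	fmt.Println(item.ID) // prints "first"
}
```

Declaring requirements on the policy side lets an incompatible pairing (for example, FCFS over a queue that does not preserve insertion order) be rejected at configuration time rather than producing silently wrong ordering.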
@k8s-ci-robot k8s-ci-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 16, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 16, 2025
@k8s-ci-robot (Contributor)

Hi @capri-xiyue. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 16, 2025
@capri-xiyue (Contributor Author)

/hold
Don't merge; this is just for API review.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 16, 2025
netlify bot commented Jul 16, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 6c02159
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/687811ca8fb1b50008233e1f
😎 Deploy Preview: https://deploy-preview-1173--gateway-api-inference-extension.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

@capri-xiyue capri-xiyue force-pushed the capri-xiyue/capri-xiyue-v1-api-review branch from 5317094 to 4ffb5f6 Compare July 16, 2025 20:36
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 16, 2025
@capri-xiyue capri-xiyue reopened this Jul 16, 2025
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 16, 2025
@capri-xiyue (Contributor Author)

/hold, this should not get merged

@capri-xiyue capri-xiyue force-pushed the capri-xiyue/capri-xiyue-v1-api-review branch from 77371fe to 4ffb5f6 Compare July 16, 2025 20:53
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 16, 2025
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: capri-xiyue
Once this PR has been reviewed and has the lgtm label, please assign danehans for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 16, 2025
@robscott (Member)

Note: This PR is NOT intended to merge, it is entirely for the purpose of API review.

/cc @aojea @danwinship @thockin

@capri-xiyue capri-xiyue changed the base branch from main to release-0.5 July 17, 2025 16:41
@k8s-ci-robot k8s-ci-robot added do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 17, 2025
@capri-xiyue capri-xiyue changed the base branch from release-0.5 to main July 17, 2025 16:42
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 17, 2025
@capri-xiyue capri-xiyue changed the base branch from main to release-1.0 July 17, 2025 16:42
@k8s-ci-robot k8s-ci-robot added do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 17, 2025
@capri-xiyue capri-xiyue changed the base branch from release-1.0 to release-0.5 July 17, 2025 16:43
@k8s-ci-robot (Contributor)

Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages.

The list of commits with invalid commit messages:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@nirrozenbaum (Contributor)

why is it pushed to release 0.5 branch?

@capri-xiyue (Contributor Author) commented Jul 17, 2025

> why is it pushed to release 0.5 branch?

This is not meant to be merged; it is just for API review. Since #1116 has been merged, pointing this PR at main would show no difference.

Comment on lines +70 to +72

```go
// EndpointPickerConfig specifies the configuration needed by the proxy to discover and connect to the endpoint
// picker service that picks endpoints for the requests routed to this pool.
EndpointPickerConfig `json:",inline"`
```
Contributor

As we discussed during today's community meeting, we need to decide whether to refine the EndpointPickerConfig type. We originally chose this API structure to leave room for surfacing config for future extensions, and inlined it to simplify the UI. My understanding is that we cannot change the inlining after the API goes GA, so we should either simplify the EPP config API surface or remove the inlining.

cc: @robscott @ahg-g @smarterclayton
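To make the inlining trade-off concrete: with `json:",inline"` on an embedded struct, the embedded fields are serialized at the same level as the rest of the spec, so removing the inlining after GA would move `extensionRef` under a new key and break existing manifests. The sketch below uses plain `encoding/json` and simplified, hypothetical types (`Extension`, `InlinedSpec`, `NamedSpec`, `marshal`); these are not the project's actual type definitions. Note that `encoding/json` does not interpret the `inline` option itself; it flattens the embedded struct because the tag supplies no field name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Extension is a simplified, hypothetical stand-in for the EPP reference type.
type Extension struct {
	Name string `json:"name"`
}

// EndpointPickerConfig wraps the reference, as in the snippet under review.
type EndpointPickerConfig struct {
	ExtensionRef *Extension `json:"extensionRef,omitempty"`
}

// InlinedSpec embeds the config; its fields are flattened into the spec object.
type InlinedSpec struct {
	Selector             map[string]string `json:"selector"`
	EndpointPickerConfig `json:",inline"`
}

// NamedSpec keeps the config under its own key instead of inlining it.
type NamedSpec struct {
	Selector             map[string]string    `json:"selector"`
	EndpointPickerConfig EndpointPickerConfig `json:"endpointPickerConfig"`
}

// marshal encodes v as JSON and panics on error (fine for a demo).
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	cfg := EndpointPickerConfig{ExtensionRef: &Extension{Name: "my-epp"}}
	sel := map[string]string{"app": "vllm"}
	fmt.Println(marshal(InlinedSpec{Selector: sel, EndpointPickerConfig: cfg}))
	// {"selector":{"app":"vllm"},"extensionRef":{"name":"my-epp"}}
	fmt.Println(marshal(NamedSpec{Selector: sel, EndpointPickerConfig: cfg}))
	// {"selector":{"app":"vllm"},"endpointPickerConfig":{"extensionRef":{"name":"my-epp"}}}
}
```

The two outputs differ only in where `extensionRef` sits, which is exactly the wire-format difference that becomes frozen once the API is GA.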

Contributor

If we are removing this struct, we may want to rename extensionRef to make it explicit that this is an EPP extension. Also, do we want to make extensionRef a list (and add a type enum, whose only possible value for now is EPP) to allow for future expansion to other pool-attached extensions?
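The list-with-type-enum alternative raised here could look roughly like the following. This is a hypothetical sketch of the proposal, not an adopted API: `ExtensionType`, `ExtensionTypeEPP`, `PoolExtension`, `PoolSpec`, the `extensions` field name, and the one-EPP-per-pool rule in `validate` are all assumptions. The `+kubebuilder:validation:Enum` marker is shown only as the usual way such an enum would be enforced in a generated CRD schema.

```go
package main

import "fmt"

// ExtensionType enumerates the kinds of extensions a pool could attach.
// Hypothetical: per the proposal, EPP would be the only allowed value for now.
//
// +kubebuilder:validation:Enum=EPP
type ExtensionType string

const ExtensionTypeEPP ExtensionType = "EPP"

// PoolExtension is a hypothetical list entry replacing the single extensionRef.
type PoolExtension struct {
	Type ExtensionType `json:"type"`
	Name string        `json:"name"`
}

// PoolSpec is a trimmed-down, hypothetical pool spec.
type PoolSpec struct {
	Extensions []PoolExtension `json:"extensions"`
}

// validate rejects unknown types and (as an assumed rule) more than one EPP entry.
func validate(s PoolSpec) error {
	epps := 0
	for i, e := range s.Extensions {
		switch e.Type {
		case ExtensionTypeEPP:
			epps++
		default:
			return fmt.Errorf("extensions[%d]: unsupported type %q", i, e.Type)
		}
	}
	if epps > 1 {
		return fmt.Errorf("at most one %s extension is allowed, got %d", ExtensionTypeEPP, epps)
	}
	return nil
}

// endpointPicker returns the EPP entry, if any.
func endpointPicker(s PoolSpec) (PoolExtension, bool) {
	for _, e := range s.Extensions {
		if e.Type == ExtensionTypeEPP {
			return e, true
		}
	}
	return PoolExtension{}, false
}

func main() {
	spec := PoolSpec{Extensions: []PoolExtension{{Type: ExtensionTypeEPP, Name: "my-epp"}}}
	if err := validate(spec); err != nil {
		panic(err)
	}
	epp, _ := endpointPicker(spec)
	fmt.Println(epp.Name) // prints "my-epp"
}
```

Starting with a list and a single-value enum keeps room to add new extension types later without another breaking change to the field's shape.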

```go
// that should be included in the InferencePool.
// In some cases, implementations may translate this field to a Service selector, so this matches the simple
// map used for Service selectors instead of the full Kubernetes LabelSelector type.
// If sepecified, it will be applied to match the model server pods in the same namespace as the InferencePool.
```

nit, typo on "sepecified"

```go
// In some cases, implementations may translate this field to a Service selector, so this matches the simple
// map used for Service selectors instead of the full Kubernetes LabelSelector type.
// If sepecified, it will be applied to match the model server pods in the same namespace as the InferencePool.
// Cross namesoace selector is not supported.
```

nit, "namesoace"

```go
// Cross namesoace selector is not supported.
//
// +kubebuilder:validation:Required
Selector map[LabelKey]LabelValue `json:"selector"`
```

For reference, here is a related issue with use cases for the full selector feature where the map falls short: kubernetes/kubernetes#48528. This won't be possible to evolve later, but I imagine you have already discussed it.
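The limitation being referenced: a map selector (the same shape as a Service's `spec.selector`) can only require exact equality on each key, with all pairs ANDed, whereas the full `LabelSelector` type also supports set-based `matchExpressions` such as `In`. The stdlib-only sketch below contrasts the two; `matchesMap`, `inRequirement`, and `matchesIn` are hypothetical helpers written for illustration, not the `k8s.io/apimachinery` implementation.

```go
package main

import "fmt"

// matchesMap mirrors map-style selector semantics: every key must be present
// with exactly the given value (all pairs ANDed).
func matchesMap(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// inRequirement is a simplified stand-in for a set-based term such as
// "key In (v1, v2)" in a full LabelSelector's matchExpressions.
type inRequirement struct {
	Key    string
	Values []string
}

// matchesIn reports whether the label for r.Key exists and is one of r.Values.
func matchesIn(r inRequirement, labels map[string]string) bool {
	got, ok := labels[r.Key]
	if !ok {
		return false
	}
	for _, v := range r.Values {
		if v == got {
			return true
		}
	}
	return false
}

func main() {
	podA := map[string]string{"model": "llama-a"}
	podB := map[string]string{"model": "llama-b"}

	// A map holds one value per key, so without adding a shared label to the
	// pods it can select only one of them.
	sel := map[string]string{"model": "llama-a"}
	fmt.Println(matchesMap(sel, podA), matchesMap(sel, podB)) // true false

	// A set-based requirement selects both with a single term.
	req := inRequirement{Key: "model", Values: []string{"llama-a", "llama-b"}}
	fmt.Println(matchesIn(req, podA), matchesIn(req, podB)) // true true
}
```

Because the field's type is fixed once the API is GA, expressing OR-style or negative matches later would require a new field rather than a change to `selector`.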

Labels
- cncf-cla: yes
- do-not-merge/hold
- do-not-merge/invalid-commit-message
- kind/api-change
- needs-ok-to-test
- size/XXL

8 participants