
v1.0 InferencePool API Review #1173


Open: wants to merge 6 commits into base: release-0.5

285 changes: 285 additions & 0 deletions config/crd/bases/inference.networking.k8s.io_inferencepools.yaml
@@ -0,0 +1,285 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.16.1
  name: inferencepools.inference.networking.k8s.io
spec:
  group: inference.networking.k8s.io
  names:
    kind: InferencePool
    listKind: InferencePoolList
    plural: inferencepools
    singular: inferencepool
  scope: Namespaced
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        description: InferencePool is the Schema for the InferencePools API.
        properties:
          apiVersion:
            description: |-
              APIVersion defines the versioned schema of this representation of an object.
              Servers should convert recognized schemas to the latest internal value, and
              may reject unrecognized values.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
            type: string
          kind:
            description: |-
              Kind is a string value representing the REST resource this object represents.
              Servers may infer this from the endpoint the client submits requests to.
              Cannot be updated.
              In CamelCase.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
            type: string
          metadata:
            type: object
          spec:
            description: InferencePoolSpec defines the desired state of InferencePool
            properties:
              extensionRef:
                description: Extension configures an endpoint picker as an extension
                  service.
                properties:
                  failureMode:
                    default: FailClose
                    description: |-
                      Configures how the gateway handles the case when the extension is not responsive.
                      Defaults to FailClose.
                    enum:
                    - FailOpen
                    - FailClose
                    type: string
                  group:
                    default: ""
                    description: |-
                      Group is the group of the referent.
                      The default value is "", representing the Core API group.
                    maxLength: 253
                    pattern: ^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
                    type: string
                  kind:
                    default: Service
                    description: |-
                      Kind is the Kubernetes resource kind of the referent. For example
                      "Service".

                      Defaults to "Service" when not specified.

                      ExternalName services can refer to CNAME DNS records that may live
                      outside of the cluster and as such are difficult to reason about in
                      terms of conformance. They also may not be safe to forward to (see
                      CVE-2021-25740 for more information). Implementations MUST NOT
                      support ExternalName Services.
                    maxLength: 63
                    minLength: 1
                    pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$
                    type: string
                  name:
                    description: Name is the name of the referent.
                    maxLength: 253
                    minLength: 1
                    type: string
                  portNumber:
                    description: |-
                      The port number on the service running the extension. When unspecified,
                      implementations SHOULD infer a default value of 9002 when the Kind is
                      Service.
                    format: int32
                    maximum: 65535
                    minimum: 1
                    type: integer
                required:
                - name
                type: object
              selector:
                additionalProperties:
                  description: |-
                    LabelValue is the value of a label. This is used for validation
                    of maps. This matches the Kubernetes label validation rules:
                    * must be 63 characters or less (can be empty),
                    * unless empty, must begin and end with an alphanumeric character ([a-z0-9A-Z]),
                    * could contain dashes (-), underscores (_), dots (.), and alphanumerics between.

                    Valid values include:

                    * MyValue
                    * my.name
                    * 123-my-value
                  maxLength: 63
                  minLength: 0
                  pattern: ^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$
                  type: string
                description: |-
                  Selector defines a map of labels used to select the model server pods
                  that should be included in the InferencePool.
                  In some cases, implementations may translate this field to a Service selector, so this matches the simple
                  map used for Service selectors instead of the full Kubernetes LabelSelector type.
                  If specified, it will be applied to match the model server pods in the same namespace as the InferencePool.
                  Cross-namespace selectors are not supported.
                type: object
              targetPortNumber:
                description: |-
                  TargetPortNumber defines the port number to access the selected model servers.
                  The number must be in the range 1 to 65535.
                format: int32
                maximum: 65535
                minimum: 1
                type: integer
            required:
            - extensionRef
            - selector
            - targetPortNumber
            type: object
          status:
            default:
              parent:
              - conditions:
                - lastTransitionTime: "1970-01-01T00:00:00Z"
                  message: Waiting for controller
                  reason: Pending
                  status: Unknown
                  type: Accepted
                parentRef:
                  kind: Status
                  name: default
            description: Status defines the observed state of InferencePool.
            properties:
              parent:
                description: |-
                  Parents is a list of parent resources (usually Gateways) that are
                  associated with the InferencePool, and the status of the InferencePool with respect to
                  each parent.

                  A maximum of 32 Gateways will be represented in this list. When the list contains
                  `kind: Status, name: default`, it indicates that the InferencePool is not
                  associated with any Gateway and a controller must perform the following:

                  - Remove the parent when setting the "Accepted" condition.
                  - Add the parent when the controller will no longer manage the InferencePool
                    and no other parents exist.
                items:
                  description: PoolStatus defines the observed state of InferencePool
                    from a Gateway.
                  properties:
                    conditions:
                      default:
                      - lastTransitionTime: "1970-01-01T00:00:00Z"
                        message: Waiting for controller
                        reason: Pending
                        status: Unknown
                        type: Accepted
                      description: |-
                        Conditions track the state of the InferencePool.

                        Known condition types are:

                        * "Accepted"
                        * "ResolvedRefs"
                      items:
                        description: Condition contains details for one aspect of
                          the current state of this API Resource.
                        properties:
                          lastTransitionTime:
                            description: |-
                              lastTransitionTime is the last time the condition transitioned from one status to another.
                              This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
                            format: date-time
                            type: string
                          message:
                            description: |-
                              message is a human readable message indicating details about the transition.
                              This may be an empty string.
                            maxLength: 32768
                            type: string
                          observedGeneration:
                            description: |-
                              observedGeneration represents the .metadata.generation that the condition was set based upon.
                              For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
                              with respect to the current state of the instance.
                            format: int64
                            minimum: 0
                            type: integer
                          reason:
                            description: |-
                              reason contains a programmatic identifier indicating the reason for the condition's last transition.
                              Producers of specific condition types may define expected values and meanings for this field,
                              and whether the values are considered a guaranteed API.
                              The value should be a CamelCase string.
                              This field may not be empty.
                            maxLength: 1024
                            minLength: 1
                            pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
                            type: string
                          status:
                            description: status of the condition, one of True, False,
                              Unknown.
                            enum:
                            - "True"
                            - "False"
                            - Unknown
                            type: string
                          type:
                            description: type of condition in CamelCase or in foo.example.com/CamelCase.
                            maxLength: 316
                            pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
                            type: string
                        required:
                        - lastTransitionTime
                        - message
                        - reason
                        - status
                        - type
                        type: object
                      maxItems: 8
                      type: array
                      x-kubernetes-list-map-keys:
                      - type
                      x-kubernetes-list-type: map
                    parentRef:
                      description: GatewayRef indicates the Gateway that observed
                        the state of the InferencePool.
                      properties:
                        group:
                          default: gateway.networking.k8s.io
                          description: Group is the group of the referent.
                          maxLength: 253
                          pattern: ^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
                          type: string
                        kind:
                          default: Gateway
                          description: Kind is the kind of the referent. For example "Gateway".
                          maxLength: 63
                          minLength: 1
                          pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$
                          type: string
                        name:
                          description: Name is the name of the referent.
                          maxLength: 253
                          minLength: 1
                          type: string
                        namespace:
                          description: |-
                            Namespace is the namespace of the referent. If not present,
                            the namespace of the referent is assumed to be the same as
                            the namespace of the referring object.
                          maxLength: 63
                          minLength: 1
                          pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$
                          type: string
                      required:
                      - name
                      type: object
                  required:
                  - parentRef
                  type: object
                maxItems: 32
                type: array
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}
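To make the spec's defaulting and validation rules concrete, here is a minimal Go sketch. It assumes hypothetical, simplified structs (`Extension`, `InferencePoolSpec`) rather than the project's generated API types, and it mirrors only rules stated in the CRD above: `kind` defaults to `Service`, `failureMode` defaults to `FailClose`, an unspecified `portNumber` is inferred as 9002 for a `Service` (an implementation SHOULD, not an API-server default), ports must fall in 1 to 65535, and selector values must match the CRD's label-value pattern. The names `my-model-server` and `my-epp` are placeholders.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Extension is a hypothetical, simplified mirror of spec.extensionRef.
// A zero PortNumber stands for "unspecified".
type Extension struct {
	Group       string
	Kind        string
	Name        string
	PortNumber  int32
	FailureMode string
}

// InferencePoolSpec is a hypothetical, simplified mirror of the v1 spec.
type InferencePoolSpec struct {
	Selector         map[string]string
	TargetPortNumber int32
	ExtensionRef     Extension
}

// labelValuePattern is copied verbatim from the CRD's selector value pattern.
var labelValuePattern = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// applyDefaults fills in the values the CRD documents for unspecified fields.
func applyDefaults(ext *Extension) {
	if ext.Kind == "" {
		ext.Kind = "Service"
	}
	if ext.FailureMode == "" {
		ext.FailureMode = "FailClose"
	}
	// Implementation-side inference (a SHOULD in the CRD), not an OpenAPI default.
	if ext.PortNumber == 0 && ext.Kind == "Service" {
		ext.PortNumber = 9002
	}
}

func validPort(p int32) bool { return p >= 1 && p <= 65535 }

// validate checks an already-defaulted spec against the CRD schema constraints.
func validate(spec InferencePoolSpec) error {
	ext := spec.ExtensionRef
	if ext.Name == "" || len(ext.Name) > 253 {
		return errors.New("extensionRef.name must be 1 to 253 characters")
	}
	if ext.FailureMode != "FailOpen" && ext.FailureMode != "FailClose" {
		return fmt.Errorf("extensionRef.failureMode %q is not FailOpen or FailClose", ext.FailureMode)
	}
	if ext.PortNumber != 0 && !validPort(ext.PortNumber) {
		return fmt.Errorf("extensionRef.portNumber %d is out of range", ext.PortNumber)
	}
	if !validPort(spec.TargetPortNumber) {
		return fmt.Errorf("targetPortNumber %d is out of range", spec.TargetPortNumber)
	}
	for k, v := range spec.Selector {
		if len(v) > 63 || !labelValuePattern.MatchString(v) {
			return fmt.Errorf("selector value %q for key %q is not a valid label value", v, k)
		}
	}
	return nil
}

func main() {
	spec := InferencePoolSpec{
		Selector:         map[string]string{"app": "my-model-server"},
		TargetPortNumber: 8000,
		ExtensionRef:     Extension{Name: "my-epp"},
	}
	applyDefaults(&spec.ExtensionRef)
	fmt.Println(spec.ExtensionRef.Kind, spec.ExtensionRef.PortNumber, spec.ExtensionRef.FailureMode, validate(spec))
	// prints: Service 9002 FailClose <nil>
}
```

In a cluster, the API server itself applies the OpenAPI `default:` values and enforces the schema on admission, so a check like this is mainly useful client-side or in unit tests.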
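The placeholder semantics of `status.parent` in the CRD above are easier to follow as code. The Go sketch below illustrates one way a controller could honor the documented rules; it assumes hypothetical `ParentRef`, `Condition`, and `PoolStatus` structs (with `lastTransitionTime` and `observedGeneration` omitted) and a hypothetical Gateway named `inference-gateway`. It drops the `kind: Status, name: default` entry as soon as it records a condition such as `Accepted` for a real Gateway, and restores that entry when the last Gateway parent is removed.

```go
package main

import "fmt"

// Hypothetical, simplified status types; timestamps and observedGeneration omitted.
type ParentRef struct {
	Group, Kind, Namespace, Name string
}

type Condition struct {
	Type, Status, Reason, Message string
}

type PoolStatus struct {
	ParentRef  ParentRef
	Conditions []Condition
}

// isPlaceholder reports whether p is the `kind: Status, name: default` entry.
func isPlaceholder(p PoolStatus) bool {
	return p.ParentRef.Kind == "Status" && p.ParentRef.Name == "default"
}

// placeholder returns the entry the CRD uses as the status default.
func placeholder() PoolStatus {
	return PoolStatus{
		ParentRef: ParentRef{Kind: "Status", Name: "default"},
		Conditions: []Condition{{
			Type: "Accepted", Status: "Unknown", Reason: "Pending", Message: "Waiting for controller",
		}},
	}
}

// upsertCondition replaces the condition with the same Type, or appends it.
// It copies the slice so the caller's data is never mutated.
func upsertCondition(conds []Condition, c Condition) []Condition {
	out := append([]Condition(nil), conds...)
	for i := range out {
		if out[i].Type == c.Type {
			out[i] = c
			return out
		}
	}
	return append(out, c)
}

// setCondition records c for Gateway gw and drops the placeholder entry.
func setCondition(parents []PoolStatus, gw ParentRef, c Condition) []PoolStatus {
	out := make([]PoolStatus, 0, len(parents)+1)
	found := false
	for _, p := range parents {
		if isPlaceholder(p) {
			continue // a real parent is now reporting
		}
		if p.ParentRef == gw {
			p.Conditions = upsertCondition(p.Conditions, c)
			found = true
		}
		out = append(out, p)
	}
	if !found {
		out = append(out, PoolStatus{ParentRef: gw, Conditions: []Condition{c}})
	}
	return out
}

// removeParent drops gw and restores the placeholder when no parents remain.
func removeParent(parents []PoolStatus, gw ParentRef) []PoolStatus {
	out := make([]PoolStatus, 0, len(parents))
	for _, p := range parents {
		if p.ParentRef != gw {
			out = append(out, p)
		}
	}
	if len(out) == 0 {
		return []PoolStatus{placeholder()}
	}
	return out
}

func main() {
	parents := []PoolStatus{placeholder()} // the defaulted status
	gw := ParentRef{Group: "gateway.networking.k8s.io", Kind: "Gateway", Namespace: "default", Name: "inference-gateway"}

	parents = setCondition(parents, gw, Condition{Type: "Accepted", Status: "True", Reason: "Accepted"})
	fmt.Println(len(parents), parents[0].ParentRef.Name) // prints: 1 inference-gateway

	parents = removeParent(parents, gw)
	fmt.Println(len(parents), isPlaceholder(parents[0])) // prints: 1 true
}
```

The 32-entry cap (`maxItems: 32`) is enforced by the API server on update, so a real controller would also need a policy for which parents to report once that limit is reached; the sketch does not handle that case.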
8 changes: 5 additions & 3 deletions config/manifests/inferencepool-resources.yaml
@@ -1,6 +1,8 @@
-# Note: If you change this file, please also change the file used for e2e tests!
-#
-# https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/test/testdata/inferencepool-e2e.yaml
+# Note: If you change this file, please also change:
+# - ./test/testdata/inferencepool-e2e.yaml
+# - ./conformance/resources/manifests/manifests.yaml
+# - ./site-src/guides/inferencepool-rollout.md
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
8 changes: 1 addition & 7 deletions conformance/resources/manifests/manifests.yaml
@@ -213,9 +213,6 @@ spec:
- "9003"
- "-configFile"
- "/config/conformance-plugins.yaml"
-env:
-- name: USE_STREAMING
-  value: "true"
ports:
- containerPort: 9002
- containerPort: 9003
@@ -310,9 +307,6 @@ spec:
- "9003"
- "-configFile"
- "/config/conformance-plugins.yaml"
-env:
-- name: USE_STREAMING
-  value: "true"
ports:
- containerPort: 9002
- containerPort: 9003
@@ -342,7 +336,7 @@ apiVersion: v1
kind: ConfigMap
metadata:
name: plugins-config
-namespace: default
+namespace: gateway-conformance-app-backend
data:
conformance-plugins.yaml: |
apiVersion: inference.networking.x-k8s.io/v1alpha1
2 changes: 1 addition & 1 deletion go.mod
@@ -18,7 +18,7 @@ require (
github.com/stretchr/testify v1.10.0
go.uber.org/multierr v1.11.0
go.uber.org/zap v1.27.0
-golang.org/x/sync v0.15.0
+golang.org/x/sync v0.16.0
google.golang.org/grpc v1.73.0
google.golang.org/protobuf v1.36.6
k8s.io/api v0.33.2
4 changes: 2 additions & 2 deletions go.sum
@@ -245,8 +245,8 @@ golang.org/x/oauth2 v0.30.0/go.mod h1:B++QgG3ZKulg6sRPGD/mqlHQs5rB3Ml9erfeDY7xKl
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
-golang.org/x/sync v0.15.0 h1:KWH3jNZsfyT6xfAfKiz6MRNmd46ByHDYaZ7KSkCtdW8=
-golang.org/x/sync v0.15.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
+golang.org/x/sync v0.16.0 h1:ycBJEhp9p4vXvUZNszeOq0kGTPghopOL8q0fq3vstxw=
+golang.org/x/sync v0.16.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
32 changes: 32 additions & 0 deletions pkg/epp/flowcontrol/framework/doc.go
@@ -0,0 +1,32 @@
/*
Copyright 2025 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// Package framework defines the core plugin interfaces for extending the `controller.FlowController`.
//
// It establishes the contracts that custom logic, such as queueing disciplines and dispatching policies, must adhere
// to. By building on these interfaces, the Flow Control system can be extended and customized without modifying the
// core controller logic.
//
// The primary contracts are:
//   - `SafeQueue`: An interface for concurrency-safe queue implementations.
//   - `IntraFlowDispatchPolicy`: An interface for policies that decide which item to select from within a single flow's
//     queue.
//   - `ItemComparator`: An interface vended by policies to make their internal item-ordering logic explicit and
//     available to other components.
//
// These components are linked by `QueueCapability`, which allows policies to declare their queue requirements (e.g.,
// FIFO or priority-based ordering).
package framework