-
Notifications
You must be signed in to change notification settings - Fork 208
BBR pluggable framework proposal #1964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,251 @@ | ||||||
| # Pluggable Body-Based Routing (BBR) Framework | ||||||
|
|
||||||
| Author(s): @davidbreitgand @srampal | ||||||
|
|
||||||
| ## Proposal Status | ||||||
|
|
||||||
| ***Draft*** | ||||||
|
|
||||||
| ## Summary | ||||||
|
|
||||||
| The Gateway API Inference Extension (v1.2.1) includes an initial implementation of Body-Based Routing (BBR). Currently, BBR provides a single capability: it extracts the model name from the request body and adds it to the `X-Gateway-Model-Name` header. This header is then used to route the request to the appropriate InferencePool and its associated Endpoint Picker Extension (EPP) instances. | ||||||
|
|
||||||
| The current BBR implementation is limited and lacks extensibility. Similar to the [pluggability introduced in the scheduling subsystem](../0845-scheduler-architecture-proposal/README.md), BBR should support custom extensions without requiring modifications to the GIE code base. | ||||||
|
|
||||||
| This proposal introduces a plugin architecture for BBR that allows developers to implement custom logic. Plugins could be organized into a chain or DAG for ordered and concurrent execution. | ||||||
|
|
||||||
| See [this document](https://docs.google.com/document/d/1So9uRjZrLUHf7Rjv13xy_ip3_5HSI1cn1stS3EsXLWg/edit?tab=t.0#heading=h.55jwocr94axs) for additional context amd reference. | ||||||
|
|
||||||
| ## Goals | ||||||
|
|
||||||
| The pluggable BBR Framework aims at addressing the following goals | ||||||
|
|
||||||
| - Avoid monolithic architecture | ||||||
| - Mimic pluggability and configurability of the scheduling subsystem without coupling between the two | ||||||
| - Enable organizing plugins into a topology for ordered and concurrent execution | ||||||
| - Avoid redundant recurrent body parsing across plugins in a topology for the sake of performance | ||||||
| - Limit changes to the BBR feature to avoid any changes in the rest of the code base | ||||||
| - Follow best practices and experience from the Scheduling subsystem | ||||||
| pluggability effort. For example, extending the system to support the above | ||||||
| should be through implementing well defined `Plugin` interfaces and registering | ||||||
| them in the BBR subsystem; any configuration would be done in the | ||||||
| same way (e.g., code and/or configuration file) | ||||||
| - Reuse common code from EPP, such as `TypedName`, wherever make sense, but avoid reusing specialized code with non-BBR functionality to avoid abuse | ||||||
| - Enable extensible collection and registration of metrics using lessons from the pluggable scheduling sub-system | ||||||
| - Provide reference plugin implementations. | ||||||
|
|
||||||
| ## Non-Goals | ||||||
|
|
||||||
| - Modify existing GIE abstractions | ||||||
| - Fully align plugins, registries, and factories across BBR and EPP | ||||||
| - Dynamically reconfigure plugins and plugin topologies at runtime | ||||||
|
|
||||||
| ## Proposal | ||||||
|
|
||||||
| ### Overview | ||||||
|
|
||||||
| There is an embedded `BBRPlugin` interface building on the `Plugin` interface adopted from EPP. This interface should be implemented by any BBR plugin. Each pluigin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name()` returns the string representing the plugins implementation. BBR is refactored to implement the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and concrete implementations created by the factories. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
nit: In EPP factories are a concrete implementation and not an interface. What is the reason for the factory abstraction?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Each concrete |
||||||
| In addition, a `PluginsChain` interface is defined to define an order of plugin executions. In the future, `PluginsChain` will be replaced by `PluginsDAG` to allow for more complex topological order and concurrency. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the first implementation it might be worthwhile to treat it as a single plugin solely for "body based routing" functionality. Chains and DAGs might be desireable once the BBR changes to support body based processing and not just routing. |
||||||
|
|
||||||
| `PluginsChain` only contains ordered `BBRPlugin` types registered in the `PluginRegistry`. `RequestPluginsChain` and `ResponsePluginsChain` are optionally configured for handling requests and responses respectively. If no configuration is provided, default `PluginsChain` instances will be configured automatically. | ||||||
|
|
||||||
| Depending on a `BBRPlugin` functionality and implementation, the plugin might require full or selective body parsing. To save the parsing overhead, if there is at least one `BBRPlugin` in the `PluginsChain` that requires full body parsing, the parsing is performed only once into a shared official appropriate `openai-go` struct (either `openai.CompletionNewParams` or `openai.ChatCompletionNewParams` depending on the request endpoint). This struct is shared for read-only to all plugins in the chain. Each `BBRplugin` receives the shared struct by value. If a plugin needs to mutate the body, in the initial implementation, it MUST work on its own copy, and the a mutated body is returned separately by each plugiin. | ||||||
|
|
||||||
| ### Suggested Components | ||||||
|
|
||||||
| The sketch of the proposed framework is shown in the figure below. | ||||||
| <img src="./images/pluggable-framework-architecture-sketch.png" alt="Components of the proposed framework" width="1000" /> | ||||||
|
|
||||||
| ### Suggested BBR Pluggable Framework Interfaces | ||||||
|
|
||||||
| ```go | ||||||
| // ------------------------------------ Defaults ------------------------------------------ | ||||||
| const DefaultPluginType = "MetadataExtractor" | ||||||
| const DefaultPluginImplementation = "simple-model-selector" | ||||||
|
|
||||||
| // BBRPlugin defines the interface for plugins in the BBR framework | ||||||
| type BBRPlugin interface { | ||||||
| plugins.Plugin | ||||||
|
|
||||||
| // RequiresFullParsing indicates whether full body parsing is required | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: consider using type assertion where presence of |
||||||
| // to facilitate efficient memory sharing across plugins in a chain. | ||||||
| RequiresFullParsing() bool | ||||||
|
|
||||||
| // Execute runs the plugin logic on the request body. | ||||||
| // A plugin's imnplementation logic CAN mutate the body of the message. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
| // A plugin's implementation MUST return a map of headers | ||||||
| // If no headers are set by the implementation, the map must be empty | ||||||
| // A value of a header in an extended implementation NEED NOT to be identical to the value of that same header as would be set | ||||||
| // in a default implementation. | ||||||
| // Example: in the body of a request model is set to "semantic-model-selector", | ||||||
| // which, say, stands for "select a best model for this request at minimal cost" | ||||||
| // A plugin implementation of "semantic-model-selector" sets X-Gateway-Model-Name to any valid | ||||||
| // model name from the inventory of the backend models and also mutates the body accordingly | ||||||
| // In contrast, | ||||||
davidbreitgand marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| Execute(requestBodyBytes []byte) ( | ||||||
| headers map[string]string, | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not clear from description if
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. also, note that HTTP headers are
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I intentionally allow only one value for every HTTP header. If plugins operate in a topology, then when headers are merged by a special helper, an error is raised if one plugin wants to "overwrite" a header set by another plugin in the topology. This might change in the future implementation, but since you also suggest to limit the first PR to a single plugin, this might be immaterial for this PR. Each concrete plugin implementation should know what headers it sets. Therefore, there is no map of headers passed as a parameter. |
||||||
| mutatedBodyBytes []byte, | ||||||
| err error, | ||||||
| ) | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: prefer single line if possible |
||||||
| } | ||||||
|
|
||||||
|
|
||||||
| // placeholder for BBRPlugin constructors | ||||||
| type PluginFactoryFunc func() bbrplugins.BBRPlugin //concrete constructors are assigned to this type | ||||||
davidbreitgand marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
|
||||||
| // PluginRegistry defines operations for managing plugin factories and plugin instances | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is a rather wide interface. Consider removing functions or splitting into separate and indepenent interfaces.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this pretty much follows the footsteps of the EPP implementation except for the registry interface, which I will drop. This is how it is done in EPP import (
"encoding/json"
)
// Factory is the definition of the factory functions that are used to instantiate plugins
// specified in a configuration.
type FactoryFunc func(name string, parameters json.RawMessage, handle Handle) (Plugin, error)
// Register is a static function that can be called to register plugin factory functions.
func Register(pluginType string, factory FactoryFunc) {
Registry[pluginType] = factory
}
// Registry is a mapping from plugin name to Factory function
var Registry map[string]FactoryFunc = map[string]FactoryFunc{} |
||||||
| type PluginRegistry interface { | ||||||
davidbreitgand marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| RegisterFactory(typeKey string, factory PluginFactoryFunc) error //constructors | ||||||
| RegisterPlugin(plugin bbrplugins.BBRPlugin) error //registers a plugin instance (the instance MUST be created via the factory first) | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Explain rationale for having both factories and plugin instance in the same interface/structure. |
||||||
| GetFactory(typeKey string) (PluginFactoryFunc, error) | ||||||
davidbreitgand marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| GetPlugin(typeKey string) (bbrplugins.BBRPlugin, error) | ||||||
| GetFactories() map[string]PluginFactoryFunc | ||||||
| GetPlugins() map[string]bbrplugins.BBRPlugin | ||||||
| ListPlugins() []string | ||||||
| ListFactories() []string | ||||||
| CreatePlugin(typeKey string) (bbrplugins.BBRPlugin, error) | ||||||
| ContainsFactory(typeKey string) bool | ||||||
| ContainsPlugin(typeKey string) bool | ||||||
| String() string //human readable string for logging | ||||||
| } | ||||||
|
|
||||||
| // PluginsChain is used to define a specific order of execution of the BBRPlugin instances stored in the registry | ||||||
davidbreitgand marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| // The BBRPlugin instances | ||||||
| type PluginsChain interface { | ||||||
davidbreitgand marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| AddPlugin(typeKey string, registry PluginRegistry) error //to be added to the chain the plugin should be registered in the registry first | ||||||
| AddPluginAtInd(typeKey string, i int, r PluginRegistry) error //only affects the instance of the plugin chain | ||||||
| GetPlugin(index int, registry PluginRegistry) (bbrplugins.BBRPlugin, error) //retrieves i-th plugin as defined in the chain from the registry | ||||||
| Length() int | ||||||
| ParseChatCompletion(data []byte) (openai.ChatCompletionNewParams, error) //parses the bytes slice into an appropriate openai-go struct | ||||||
| ParseCompletion(data []byte) (openai.CompletionNewParams, error) //likewise | ||||||
|
Comment on lines
+119
to
+120
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why needed (as opposed to just Run)? |
||||||
| GetSharedMemory(which string) interface{} //returns an appropriate shared open-ai struct dependent on whether which | ||||||
| //corresponds to Completion or ChatCompletion endpoint requested in the body | ||||||
| Run(bodyBytes []byte, registry PluginRegistry) ([]byte, map[string]string, error) //return potentially mutated body and all headers map safely merged | ||||||
| String() string | ||||||
| } | ||||||
| //NOTE: for simplicity, in the initial PR, PluginsChain instance will be defined request only | ||||||
| ``` | ||||||
|
|
||||||
| ### Defaults | ||||||
|
|
||||||
| ```go | ||||||
|
|
||||||
| const ( | ||||||
| //A deafult plugin implementation of this plugin type will always be configured for request plugins chain | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
nit: please close comment lines with |
||||||
| //Even though BBRPlugin type is not (yet) a K8s resource, it's logically akin to `kind` | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not a limitation we have in EPP. Perhaps delay it until a more formal k8s Plugin type is defined in some AI gateway work stream?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Full alignment with EPP was not a design goal. I agree that eventually this will be decided in AI GW WG. I believe it is likely that eventually plugins will be K8s resources. Or maybe not. But in this implementation this is just a naming convention. Is this harmful to this implementation to have rules for plugin naming? If you think it is limiting, I can drop this. |
||||||
| //MUST start wit an upper case letter, use CamelNotation, only aplhanumericals after the first letter | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
| PluginTypePattern = `^[A-Z][A-Za-z0-9]*$` | ||||||
| MaxPluginTypeLength = 63 | ||||||
| DefaultPluginType = "MetaDataExtractor" | ||||||
| // Even though BBRPlugin is not a K8s resource yet, let's make its naming compliant with K8s resource naming | ||||||
| // Allows: lowercase letters, digits, hyphens, dots. | ||||||
| // Must start and end with a lowercase alphanumeric character. | ||||||
| // Middle characters group can contain lowercase alphanumerics, hyphens, and dots | ||||||
| // Middle and rightmost groups are optional | ||||||
| PluginNamePattern = `^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$` | ||||||
| DefaultPluginName = "simple-model-extractor" | ||||||
| MaxPluginNameLength = 253 | ||||||
| //Well-known custom header set to a model name | ||||||
| ModelHeader = "X-Gateway-Model-Name" | ||||||
| ) | ||||||
| ``` | ||||||
|
|
||||||
| ### Current BBR reimplementation as BBRPlugin | ||||||
davidbreitgand marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
|
||||||
| ```go | ||||||
| / ------------------------------------ DEFAULT PLUGIN IMPLEMENTATION ---------------------------------------------- | ||||||
|
|
||||||
| type simpleModelExtractor struct { //implements the MetadataExtractor interface | ||||||
| typedName plugins.TypedName | ||||||
| requiresFullParsing bool | ||||||
| } | ||||||
|
|
||||||
| // defaultMetaDataExtractor implements the MetadataExtractor interface and extracts only the mmodel name AS-IS | ||||||
| type defaultMetaDataExtractor struct { | ||||||
| typedName plugins.TypedName | ||||||
| requiresFullParsing bool //this field will be used to determine whether shared struct should be created in this chain | ||||||
| } | ||||||
|
|
||||||
| // NewSimpleModelExtractor is a factory that constructs SimpleModelExtractor plugin | ||||||
| // A developer who wishes to create her own implementation, will implement the BBRPlugin interface and | ||||||
| // use Registry and PluginsChain to register and execute the plugin (together with other plugins in a chain) | ||||||
| func NewDefaultMetaDataExtractor() BBRPlugin { | ||||||
| return &defaultMetaDataExtractor{ | ||||||
| typedName: plugins.TypedName{ | ||||||
| Type: DefaultPluginType, | ||||||
| Name: "simple-model-extractor", | ||||||
| }, | ||||||
| requiresFullParsing: false, | ||||||
| } | ||||||
| } | ||||||
|
|
||||||
| func (s *defaultMetaDataExtractor) RequiresFullParsing() bool { | ||||||
| return s.requiresFullParsing | ||||||
| } | ||||||
|
|
||||||
| func (s *defaultMetaDataExtractor) TypedName() plugins.TypedName { | ||||||
| return s.typedName | ||||||
| } | ||||||
|
|
||||||
| // Execute extracts the "model" from the JSON request body and sets X-Gateway-Model-Name header. | ||||||
| // This implementation intentionally ignores metaDataKeys and does not mutate the body. | ||||||
| // It expects the request body to be a JSON object containing a "model" field. | ||||||
| // A nil for metaDataKeysToHeaders map SHOULD be specified by a caller for clarity | ||||||
| // The metaDataKeysToHeaders is explicitly ignored in this implementation | ||||||
| // This implementation is simply refactoring of the default BBR implementation to work with the pluggable framework | ||||||
| func (s *defaultMetaDataExtractor) Execute(requestBodyBytes []byte) ( | ||||||
| headers map[string]string, | ||||||
| mutatedBodyBytes []byte, | ||||||
| err error) { | ||||||
|
|
||||||
| type RequestBody struct { | ||||||
| Model string `json:"model"` | ||||||
| } | ||||||
|
|
||||||
| h := make(map[string]string) | ||||||
|
|
||||||
| var requestBody RequestBody | ||||||
|
|
||||||
| if err := json.Unmarshal(requestBodyBytes, &requestBody); err != nil { | ||||||
| // return original body on decode failure | ||||||
| return nil, requestBodyBytes, err | ||||||
| } | ||||||
|
|
||||||
| if requestBody.Model == "" { | ||||||
| return nil, requestBodyBytes, fmt.Errorf("missing required field: model") | ||||||
| } | ||||||
|
|
||||||
| // ModelHeader is a constant defined in ./pkg/bbr/plugins/interfaces | ||||||
| h[ModelHeader] = requestBody.Model | ||||||
|
|
||||||
| // Body is not mutated in this implementation hence returning original requestBodyBytes. This is intentional. | ||||||
| return h, requestBodyBytes, nil | ||||||
| } | ||||||
|
|
||||||
| func (s *defaultMetaDataExtractor) String() string { | ||||||
| return fmt.Sprintf(("BBRPlugin{%v/requiresFullParsing=%v}"), s.TypedName(), s.requiresFullParsing) | ||||||
| } | ||||||
| ``` | ||||||
|
|
||||||
| ### Implementation Phases | ||||||
|
|
||||||
| The pluggable framework will be implemented iteratively over several phases. | ||||||
|
|
||||||
| 1. Introduce `BBRPlugin` `MetadataExtractor`, interface, registry, plugins chain, sample plugin implementation (`SimpleModelExtraction`) and its factory. Plugin configuration will be implemented via environment variables set in helm chart | ||||||
| 1. Introduce a second plugin interface, `ModelSelector` and sample plugin implementation | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Q: for the initial task (various ways to do body based routing), a |
||||||
| 1. Introduce shared struct (shared among the plugins of a plugins chain) | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. that does what?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||||||
| 1. Introduce an interface for guardrail plugin, introduce simple reference implementation, experiment with plugins chains on request and response messages | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. suggets making this out of scope for the implementation focusing on model selection in BBR and not other features.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will exclude guardrails from the proposal for simplicity and crisper focus, even though I can imagine a use case when one first has to run a semantic model selection plugin and then a guardrail plugin (s), which is/are specifically tailored to the selected model. The scope will only be on model selection, either via metadata extraction as-is, or more complicated logic like in issue #1812, or semantic model selection. |
||||||
| 1. Refactor metrics as needed to work with the new pluggable framework | ||||||
| 1. Implement configuration via manifests similar to those in EPP | ||||||
| 1. Implement `PluginsDAG` to allow for more complex topological order and concurrency. | ||||||
| 1. Continously learn lessons from this implementation and scheduling framework to improve the implementation | ||||||
| 1. Aim at aligning and cross-polination with the [AI GW WG]("https://github.com/kubernetes-sigs/wg-ai-gateway"). | ||||||
|
|
||||||
| ## Open Questions | ||||||
|
|
||||||
| 1. More elaborate shared memory architecture for the best performance | ||||||
davidbreitgand marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| 1. TBA | ||||||
|
|
||||||
| ## Note | ||||||
|
|
||||||
| The proposed interfaces can slightly change from those implemented in the initial [PR 1981](https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1981) | ||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider grouping goals into sections (e.g., splitting goals into required and optional, generalizing into categories).
We might be able to support ordered and concurrent execution at a later stage, when a concrete requirement is defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Segregating to
immediateandextended