Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
251 changes: 251 additions & 0 deletions docs/proposals/1964-pluggable-bbr-framework/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,251 @@
# Pluggable Body-Based Routing (BBR) Framework

Author(s): @davidbreitgand @srampal

## Proposal Status

***Draft***

## Summary

The Gateway API Inference Extension (v1.2.1) includes an initial implementation of Body-Based Routing (BBR). Currently, BBR provides a single capability: it extracts the model name from the request body and adds it to the `X-Gateway-Model-Name` header. This header is then used to route the request to the appropriate InferencePool and its associated Endpoint Picker Extension (EPP) instances.

The current BBR implementation is limited and lacks extensibility. Similar to the [pluggability introduced in the scheduling subsystem](../0845-scheduler-architecture-proposal/README.md), BBR should support custom extensions without requiring modifications to the GIE code base.

This proposal introduces a plugin architecture for BBR that allows developers to implement custom logic. Plugins could be organized into a chain or DAG for ordered and concurrent execution.

See [this document](https://docs.google.com/document/d/1So9uRjZrLUHf7Rjv13xy_ip3_5HSI1cn1stS3EsXLWg/edit?tab=t.0#heading=h.55jwocr94axs) for additional context amd reference.

## Goals

The pluggable BBR Framework aims at addressing the following goals

- Avoid monolithic architecture
- Mimic pluggability and configurability of the scheduling subsystem without coupling between the two
- Enable organizing plugins into a topology for ordered and concurrent execution
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider grouping goals into sections (e.g., splitting goals into required and optional, generalizing into categories).
We might be able to support ordered and concurrent execution at a later stage, when a concrete requirement is defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Segregating to immediate and extended

- Avoid redundant recurrent body parsing across plugins in a topology for the sake of performance
- Limit changes to the BBR feature to avoid any changes in the rest of the code base
- Follow best practices and experience from the Scheduling subsystem
pluggability effort. For example, extending the system to support the above
should be through implementing well defined `Plugin` interfaces and registering
them in the BBR subsystem; any configuration would be done in the
same way (e.g., code and/or configuration file)
- Reuse common code from EPP, such as `TypedName`, wherever make sense, but avoid reusing specialized code with non-BBR functionality to avoid abuse
- Enable extensible collection and registration of metrics using lessons from the pluggable scheduling sub-system
- Provide reference plugin implementations.

## Non-Goals

- Modify existing GIE abstractions
- Fully align plugins, registries, and factories across BBR and EPP
- Dynamically reconfigure plugins and plugin topologies at runtime

## Proposal

### Overview

There is an embedded `BBRPlugin` interface building on the `Plugin` interface adopted from EPP. This interface should be implemented by any BBR plugin. Each pluigin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name()` returns the string representing the plugins implementation. BBR is refactored to implement the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and concrete implementations created by the factories.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
There is an embedded `BBRPlugin` interface building on the `Plugin` interface adopted from EPP. This interface should be implemented by any BBR plugin. Each pluigin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name()` returns the string representing the plugins implementation. BBR is refactored to implement the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and concrete implementations created by the factories.
There is an embedded `BBRPlugin` interface building on the `Plugin` interface adopted from EPP. This interface should be implemented by any BBR plugin. Each plugin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name()` returns the string representing the plugin's instance name. BBR is refactored to implement the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and concrete implementations created by the factories.

nit: In EPP factories are a concrete implementation and not an interface. What is the reason for the factory abstraction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each concrete BBRPlugin interface implementation is created by its factory. There is no special interface for factories. To simplify and better align with EPP, I will explicitly state that extending PluginRegistry itself in third-party extensions is not a design goal and, hence, I will remove the PluginRegistry interface and, by the same token, PluginsChain interface and only provide a concrete implementation for the registration mechanisms.

In addition, a `PluginsChain` interface is defined to define an order of plugin executions. In the future, `PluginsChain` will be replaced by `PluginsDAG` to allow for more complex topological order and concurrency.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first implementation it might be worthwhile to treat it as a single plugin solely for "body based routing" functionality. Chains and DAGs might be desireable once the BBR changes to support body based processing and not just routing.
In EPP, we had initially supported a single profile and multiple profiles (e.g., for P/D disagg) were supported by wrapping multiple scheduler instances behind a single instance implementation.


`PluginsChain` only contains ordered `BBRPlugin` types registered in the `PluginRegistry`. `RequestPluginsChain` and `ResponsePluginsChain` are optionally configured for handling requests and responses respectively. If no configuration is provided, default `PluginsChain` instances will be configured automatically.

Depending on a `BBRPlugin` functionality and implementation, the plugin might require full or selective body parsing. To save the parsing overhead, if there is at least one `BBRPlugin` in the `PluginsChain` that requires full body parsing, the parsing is performed only once into a shared official appropriate `openai-go` struct (either `openai.CompletionNewParams` or `openai.ChatCompletionNewParams` depending on the request endpoint). This struct is shared for read-only to all plugins in the chain. Each `BBRplugin` receives the shared struct by value. If a plugin needs to mutate the body, in the initial implementation, it MUST work on its own copy, and the a mutated body is returned separately by each plugiin.

### Suggested Components

The sketch of the proposed framework is shown in the figure below.
<img src="./images/pluggable-framework-architecture-sketch.png" alt="Components of the proposed framework" width="1000" />

### Suggested BBR Pluggable Framework Interfaces

```go
// ------------------------------------ Defaults ------------------------------------------
const DefaultPluginType = "MetadataExtractor"
const DefaultPluginImplementation = "simple-model-selector"

// BBRPlugin defines the interface for plugins in the BBR framework
type BBRPlugin interface {
plugins.Plugin

// RequiresFullParsing indicates whether full body parsing is required
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider using type assertion where presence of RequiresFullParsing() indicates true and it's lack indicates that full body parsing is not required.
The default implmentation would be RequiresFullParsing() { }

// to facilitate efficient memory sharing across plugins in a chain.
RequiresFullParsing() bool

// Execute runs the plugin logic on the request body.
// A plugin's imnplementation logic CAN mutate the body of the message.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// A plugin's imnplementation logic CAN mutate the body of the message.
// A plugin's implementation logic CAN mutate the body of the message.

// A plugin's implementation MUST return a map of headers
// If no headers are set by the implementation, the map must be empty
// A value of a header in an extended implementation NEED NOT to be identical to the value of that same header as would be set
// in a default implementation.
// Example: in the body of a request model is set to "semantic-model-selector",
// which, say, stands for "select a best model for this request at minimal cost"
// A plugin implementation of "semantic-model-selector" sets X-Gateway-Model-Name to any valid
// model name from the inventory of the backend models and also mutates the body accordingly
// In contrast,
Execute(requestBodyBytes []byte) (
headers map[string]string,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear from description if headers is in/out. How would the plugin get the input headers given If no headers are set by the implementation, the map must be empty

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, note that HTTP headers are map[string][]string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally allow only one value for every HTTP header. If plugins operate in a topology, then when headers are merged by a special helper, an error is raised if one plugin wants to "overwrite" a header set by another plugin in the topology. This might change in the future implementation, but since you also suggest to limit the first PR to a single plugin, this might be immaterial for this PR.

Each concrete plugin implementation should know what headers it sets. Therefore, there is no map of headers passed as a parameter.

mutatedBodyBytes []byte,
err error,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer single line if possible (headers map[string]string, mutatedBodyBytes []byte, err error).
also, avoid named returns unless it's required to disambiguate.

}


// placeholder for BBRPlugin constructors
type PluginFactoryFunc func() bbrplugins.BBRPlugin //concrete constructors are assigned to this type

// PluginRegistry defines operations for managing plugin factories and plugin instances
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a rather wide interface. Consider removing functions or splitting into separate and indepenent interfaces.

Copy link
Contributor Author

@davidbreitgand davidbreitgand Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this pretty much follows the footsteps of the EPP implementation except for the registry interface, which I will drop. This is how it is done in EPP
package plugins. It defines a similar broad interface.

import (
	"encoding/json"
)

// Factory is the definition of the factory functions that are used to instantiate plugins
// specified in a configuration.
type FactoryFunc func(name string, parameters json.RawMessage, handle Handle) (Plugin, error)

// Register is a static function that can be called to register plugin factory functions.
func Register(pluginType string, factory FactoryFunc) {
	Registry[pluginType] = factory
}

// Registry is a mapping from plugin name to Factory function
var Registry map[string]FactoryFunc = map[string]FactoryFunc{}

type PluginRegistry interface {
RegisterFactory(typeKey string, factory PluginFactoryFunc) error //constructors
RegisterPlugin(plugin bbrplugins.BBRPlugin) error //registers a plugin instance (the instance MUST be created via the factory first)
Copy link
Contributor

@elevran elevran Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain rationale for having both factories and plugin instance in the same interface/structure.
In rest of EPP we separate factories and instances into different collections.

GetFactory(typeKey string) (PluginFactoryFunc, error)
GetPlugin(typeKey string) (bbrplugins.BBRPlugin, error)
GetFactories() map[string]PluginFactoryFunc
GetPlugins() map[string]bbrplugins.BBRPlugin
ListPlugins() []string
ListFactories() []string
CreatePlugin(typeKey string) (bbrplugins.BBRPlugin, error)
ContainsFactory(typeKey string) bool
ContainsPlugin(typeKey string) bool
String() string //human readable string for logging
}

// PluginsChain is used to define a specific order of execution of the BBRPlugin instances stored in the registry
// The BBRPlugin instances
type PluginsChain interface {
AddPlugin(typeKey string, registry PluginRegistry) error //to be added to the chain the plugin should be registered in the registry first
AddPluginAtInd(typeKey string, i int, r PluginRegistry) error //only affects the instance of the plugin chain
GetPlugin(index int, registry PluginRegistry) (bbrplugins.BBRPlugin, error) //retrieves i-th plugin as defined in the chain from the registry
Length() int
ParseChatCompletion(data []byte) (openai.ChatCompletionNewParams, error) //parses the bytes slice into an appropriate openai-go struct
ParseCompletion(data []byte) (openai.CompletionNewParams, error) //likewise
Comment on lines +119 to +120
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why needed (as opposed to just Run)?

GetSharedMemory(which string) interface{} //returns an appropriate shared open-ai struct dependent on whether which
//corresponds to Completion or ChatCompletion endpoint requested in the body
Run(bodyBytes []byte, registry PluginRegistry) ([]byte, map[string]string, error) //return potentially mutated body and all headers map safely merged
String() string
}
//NOTE: for simplicity, in the initial PR, PluginsChain instance will be defined request only
```

### Defaults

```go

const (
//A deafult plugin implementation of this plugin type will always be configured for request plugins chain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//A deafult plugin implementation of this plugin type will always be configured for request plugins chain
// A default plugin implementation of this plugin type will always be configured for request plugins chain.

nit: please close comment lines with . when the next comment starts a new sentence (upper case...)

//Even though BBRPlugin type is not (yet) a K8s resource, it's logically akin to `kind`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a limitation we have in EPP. Perhaps delay it until a more formal k8s Plugin type is defined in some AI gateway work stream?

Copy link
Contributor Author

@davidbreitgand davidbreitgand Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full alignment with EPP was not a design goal. I agree that eventually this will be decided in AI GW WG. I believe it is likely that eventually plugins will be K8s resources. Or maybe not. But in this implementation this is just a naming convention. Is this harmful to this implementation to have rules for plugin naming? If you think it is limiting, I can drop this.

//MUST start wit an upper case letter, use CamelNotation, only aplhanumericals after the first letter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//MUST start wit an upper case letter, use CamelNotation, only aplhanumericals after the first letter
// MUST start with an upper case letter, use CamelNotation, only aplhanumericals after the first letter

PluginTypePattern = `^[A-Z][A-Za-z0-9]*$`
MaxPluginTypeLength = 63
DefaultPluginType = "MetaDataExtractor"
// Even though BBRPlugin is not a K8s resource yet, let's make its naming compliant with K8s resource naming
// Allows: lowercase letters, digits, hyphens, dots.
// Must start and end with a lowercase alphanumeric character.
// Middle characters group can contain lowercase alphanumerics, hyphens, and dots
// Middle and rightmost groups are optional
PluginNamePattern = `^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$`
DefaultPluginName = "simple-model-extractor"
MaxPluginNameLength = 253
//Well-known custom header set to a model name
ModelHeader = "X-Gateway-Model-Name"
)
```

### Current BBR reimplementation as BBRPlugin

```go
/ ------------------------------------ DEFAULT PLUGIN IMPLEMENTATION ----------------------------------------------

type simpleModelExtractor struct { //implements the MetadataExtractor interface
typedName plugins.TypedName
requiresFullParsing bool
}

// defaultMetaDataExtractor implements the MetadataExtractor interface and extracts only the mmodel name AS-IS
type defaultMetaDataExtractor struct {
typedName plugins.TypedName
requiresFullParsing bool //this field will be used to determine whether shared struct should be created in this chain
}

// NewSimpleModelExtractor is a factory that constructs SimpleModelExtractor plugin
// A developer who wishes to create her own implementation, will implement the BBRPlugin interface and
// use Registry and PluginsChain to register and execute the plugin (together with other plugins in a chain)
func NewDefaultMetaDataExtractor() BBRPlugin {
return &defaultMetaDataExtractor{
typedName: plugins.TypedName{
Type: DefaultPluginType,
Name: "simple-model-extractor",
},
requiresFullParsing: false,
}
}

func (s *defaultMetaDataExtractor) RequiresFullParsing() bool {
return s.requiresFullParsing
}

func (s *defaultMetaDataExtractor) TypedName() plugins.TypedName {
return s.typedName
}

// Execute extracts the "model" from the JSON request body and sets X-Gateway-Model-Name header.
// This implementation intentionally ignores metaDataKeys and does not mutate the body.
// It expects the request body to be a JSON object containing a "model" field.
// A nil for metaDataKeysToHeaders map SHOULD be specified by a caller for clarity
// The metaDataKeysToHeaders is explicitly ignored in this implementation
// This implementation is simply refactoring of the default BBR implementation to work with the pluggable framework
func (s *defaultMetaDataExtractor) Execute(requestBodyBytes []byte) (
headers map[string]string,
mutatedBodyBytes []byte,
err error) {

type RequestBody struct {
Model string `json:"model"`
}

h := make(map[string]string)

var requestBody RequestBody

if err := json.Unmarshal(requestBodyBytes, &requestBody); err != nil {
// return original body on decode failure
return nil, requestBodyBytes, err
}

if requestBody.Model == "" {
return nil, requestBodyBytes, fmt.Errorf("missing required field: model")
}

// ModelHeader is a constant defined in ./pkg/bbr/plugins/interfaces
h[ModelHeader] = requestBody.Model

// Body is not mutated in this implementation hence returning original requestBodyBytes. This is intentional.
return h, requestBodyBytes, nil
}

func (s *defaultMetaDataExtractor) String() string {
return fmt.Sprintf(("BBRPlugin{%v/requiresFullParsing=%v}"), s.TypedName(), s.requiresFullParsing)
}
```

### Implementation Phases

The pluggable framework will be implemented iteratively over several phases.

1. Introduce `BBRPlugin` `MetadataExtractor`, interface, registry, plugins chain, sample plugin implementation (`SimpleModelExtraction`) and its factory. Plugin configuration will be implemented via environment variables set in helm chart
1. Introduce a second plugin interface, `ModelSelector` and sample plugin implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: for the initial task (various ways to do body based routing), a ModelSelector interface would have been sufficient. Why do we need multiple interfaces and what's the delta between interface in (1) and in (2)?

1. Introduce shared struct (shared among the plugins of a plugins chain)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that does what?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ModelSelector is aimed at semantic model selection rather than simple metadata extraction and putting it on the headers.

1. Introduce an interface for guardrail plugin, introduce simple reference implementation, experiment with plugins chains on request and response messages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggets making this out of scope for the implementation focusing on model selection in BBR and not other features.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will exclude guardrails from the proposal for simplicity and crisper focus, even though I can imagine a use case when one first has to run a semantic model selection plugin and then a guardrail plugin (s), which is/are specifically tailored to the selected model.

The scope will only be on model selection, either via metadata extraction as-is, or more complicated logic like in issue #1812, or semantic model selection.

1. Refactor metrics as needed to work with the new pluggable framework
1. Implement configuration via manifests similar to those in EPP
1. Implement `PluginsDAG` to allow for more complex topological order and concurrency.
1. Continously learn lessons from this implementation and scheduling framework to improve the implementation
1. Aim at aligning and cross-polination with the [AI GW WG]("https://github.com/kubernetes-sigs/wg-ai-gateway").

## Open Questions

1. More elaborate shared memory architecture for the best performance
1. TBA

## Note

The proposed interfaces can slightly change from those implemented in the initial [PR 1981](https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1981)
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.