Commit 377259c

pluggable BBR framework proposal
- fixes typos and minor reformatting issues
- shortens description: omits the sequence diagram
- minor
- Fix base BBRPlugin interface
- fixes PluginsChain interface: adds the Run method
- Adds reimplementation of BBR as a BBRPlugin example
1 parent 5a65e9f commit 377259c

2 files changed: +212 -0 lines changed

# Pluggable Body-Based Routing (BBR) Framework

Author(s): @davidbreitgand @srampal

## Proposal Status

***Draft***

## Summary

The Gateway API Inference Extension (GIE, v1.2.1) includes an initial implementation of Body-Based Routing (BBR). Currently, BBR provides a single capability: it extracts the model name from the request body and adds it to the `X-Gateway-Model-Name` header. This header is then used to route the request to the appropriate InferencePool and its associated Endpoint Picker Extension (EPP) instances.

The current BBR implementation is limited and lacks extensibility. Similar to the [pluggability introduced in the scheduling subsystem](../0845-scheduler-architecture-proposal/README.md), BBR should support custom extensions without requiring modifications to the GIE code base.

This proposal introduces a plugin architecture for BBR that allows developers to implement custom logic. Plugins could be organized into a chain or DAG for ordered and concurrent execution.

See [this document](https://docs.google.com/document/d/1So9uRjZrLUHf7Rjv13xy_ip3_5HSI1cn1stS3EsXLWg/edit?tab=t.0#heading=h.55jwocr94axs) for additional context and reference.

## Goals

The pluggable BBR framework aims to achieve the following goals:

- Avoid a monolithic architecture
- Mimic the pluggability and configurability of the scheduling subsystem without coupling the two
- Enable organizing plugins into a topology for ordered and concurrent execution
- Avoid repeated body parsing across the plugins in a topology, for the sake of performance
- Confine changes to the BBR feature and avoid changes in the rest of the code base
- Follow best practices and experience from the scheduling subsystem pluggability effort. For example, extending the system to support the above should be done by implementing well-defined `Plugin` interfaces and registering them in the BBR subsystem; any configuration would be done in the same way (e.g., code and/or configuration file)
- Reuse common code from EPP, such as `TypedName`, wherever it makes sense, but avoid reusing specialized code with non-BBR functionality, to prevent misuse
- Enable extensible collection and registration of metrics, using lessons from the pluggable scheduling subsystem
- Provide reference plugin implementations

## Non-Goals

- Modify existing GIE abstractions
- Fully align plugins, registries, and factories across BBR and EPP
- Dynamically reconfigure plugins and plugin topologies at runtime

## Proposal

### Overview

The framework is built around a `BBRPlugin` interface that embeds the `Plugin` interface adopted from EPP. Every BBR plugin should implement this interface. Each plugin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` is a string naming the type of the plugin and `TypedName().Name` is a string naming the plugin's implementation. BBR is refactored to follow the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and the concrete instances created by those factories.
In addition, a `PluginsChain` interface defines the order in which plugins execute. In the future, `PluginsChain` will be replaced by `PluginsDAG` to allow for more complex topological ordering and concurrency.

A `PluginsChain` only contains an ordered list of `BBRPlugin` types registered in the `PluginRegistry`. `RequestPluginsChain` and `ResponsePluginsChain` are optionally configured for handling requests and responses, respectively. If no configuration is provided, default `PluginsChain` instances are configured automatically.

Depending on its functionality and implementation, a `BBRPlugin` might require full or selective body parsing. To save parsing overhead, if at least one `BBRPlugin` in the `PluginsChain` requires full body parsing, the body is parsed only once into the appropriate struct from the official `openai-go` library (either `openai.CompletionNewParams` or `openai.ChatCompletionNewParams`, depending on the request endpoint). This struct is shared read-only with all plugins in the chain, and each `BBRPlugin` receives it by value. In the initial implementation, a plugin that needs to mutate the body MUST work on its own copy, and each plugin returns its mutated body separately.

54+
### Suggested Components
55+
56+
The sketch of the proposed framework is shown in the figure below.
57+
<img src="./images/pluggable-framework-architecture-sketch.png" alt="Components of the proposed framework" width="1000" />
58+
59+
### Suggested BBR Pluggable Framework Interfaces
60+
61+
```go
// ------------------------------------ Defaults ------------------------------------------
const DefaultPluginType = "MetadataExtractor"
const DefaultPluginImplementation = "simple-model-selector"

// BBRPlugin defines the interface for plugins in the BBR framework.
// A plugin should never mutate the request body directly.
type BBRPlugin interface {
	plugins.Plugin

	// RequiresFullParsing indicates whether full body parsing is required
	// to facilitate efficient memory sharing across plugins in a chain.
	// RequiresFullParsing() bool

	// Execute runs the plugin logic on the request body and returns headers,
	// a potentially mutated body, and an error if any.
	Execute(
		requestBodyBytes []byte,
		metaDataKeys []string,
	) (
		headers map[string]string,
		mutatedBodyBytes []byte,
		err error,
	)
}

// PluginFactoryFunc is the type to which concrete BBRPlugin constructors are assigned.
type PluginFactoryFunc func() bbrplugins.BBRPlugin

// PluginRegistry defines operations for managing plugin factories and plugin instances.
type PluginRegistry interface {
	RegisterFactory(typeKey string, factory PluginFactoryFunc) error // registers a constructor
	RegisterPlugin(plugin bbrplugins.BBRPlugin) error                // registers a plugin instance (the instance MUST be created via the factory first)
	GetFactory(typeKey string) (PluginFactoryFunc, error)
	GetPlugin(typeKey string) (bbrplugins.BBRPlugin, error)
	GetFactories() map[string]PluginFactoryFunc
	GetPlugins() map[string]bbrplugins.BBRPlugin
	ListPlugins() []string
	ListFactories() []string
	CreatePlugin(typeKey string) (bbrplugins.BBRPlugin, error)
	ContainsFactory(typeKey string) bool
	ContainsPlugin(typeKey string) bool
	String() string // human-readable string for logging
}

// PluginsChain defines a specific order of execution of the BBRPlugin instances stored in the registry.
// The chain refers to plugins by type key; the instances themselves live in the registry.
type PluginsChain interface {
	AddPlugin(typeKey string, registry PluginRegistry) error                     // the plugin must be registered in the registry before it can be added to the chain
	AddPluginAtInd(typeKey string, i int, r PluginRegistry) error                // only affects this instance of the plugins chain
	GetPlugin(index int, registry PluginRegistry) (bbrplugins.BBRPlugin, error)  // retrieves the i-th plugin, as ordered in the chain, from the registry
	Length() int
	ParseChatCompletion(data []byte) (openai.ChatCompletionNewParams, error)     // parses the byte slice into the appropriate openai-go struct
	ParseCompletion(data []byte) (openai.CompletionNewParams, error)             // likewise
	// GetSharedMemory returns the shared openai-go struct; `which` selects between
	// the Completion and ChatCompletion endpoint requested in the body.
	GetSharedMemory(which string) interface{}
	// Run returns the potentially mutated body and a map of all headers, safely merged.
	Run(bodyBytes []byte, metaDataKeys []string, registry PluginRegistry) ([]byte, map[string]string, error)
	String() string
}

// NOTE: for simplicity, in the initial PR a PluginsChain instance will be defined for requests only.
```
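
To make the interplay between the registry and the chain more concrete, here is a minimal in-memory sketch. It is illustrative only and rests on several assumptions: it reduces `PluginRegistry` and `PluginsChain` to a handful of methods, omits `plugins.Plugin`, and the names `registry`, `chain`, `constHeader`, and `X-Example-A` are hypothetical. Header merging is last-writer-wins and each plugin receives its predecessor's body; both are just one possible reading of the interfaces above.

```go
package main

import (
	"errors"
	"fmt"
)

// bbrPlugin mirrors only the Execute method of the proposed BBRPlugin interface.
type bbrPlugin interface {
	Execute(body []byte, metaDataKeys []string) (map[string]string, []byte, error)
}

type factoryFunc func() bbrPlugin

// registry is a hypothetical in-memory reduction of PluginRegistry.
type registry struct {
	factories map[string]factoryFunc
	plugins   map[string]bbrPlugin
}

func newRegistry() *registry {
	return &registry{factories: map[string]factoryFunc{}, plugins: map[string]bbrPlugin{}}
}

// RegisterFactory stores a constructor under typeKey and rejects duplicates.
func (r *registry) RegisterFactory(typeKey string, f factoryFunc) error {
	if _, ok := r.factories[typeKey]; ok {
		return fmt.Errorf("factory %q already registered", typeKey)
	}
	r.factories[typeKey] = f
	return nil
}

// CreatePlugin builds an instance through its factory and stores it.
func (r *registry) CreatePlugin(typeKey string) (bbrPlugin, error) {
	f, ok := r.factories[typeKey]
	if !ok {
		return nil, fmt.Errorf("no factory for %q", typeKey)
	}
	p := f()
	r.plugins[typeKey] = p
	return p, nil
}

// chain holds type keys only; instances are looked up in the registry at run time.
type chain struct{ keys []string }

// AddPlugin appends typeKey to the chain if an instance exists in the registry.
func (c *chain) AddPlugin(typeKey string, r *registry) error {
	if _, ok := r.plugins[typeKey]; !ok {
		return errors.New("plugin must be registered before it is added to a chain")
	}
	c.keys = append(c.keys, typeKey)
	return nil
}

// Run executes the plugins in order and merges their headers.
func (c *chain) Run(body []byte, metaDataKeys []string, r *registry) ([]byte, map[string]string, error) {
	merged := map[string]string{}
	for _, k := range c.keys {
		h, out, err := r.plugins[k].Execute(body, metaDataKeys)
		if err != nil {
			return body, nil, fmt.Errorf("plugin %q: %w", k, err)
		}
		for name, v := range h {
			merged[name] = v // assumption: later plugins override earlier ones
		}
		if out != nil {
			body = out // assumption: each plugin sees its predecessor's body
		}
	}
	return body, merged, nil
}

// constHeader is a toy plugin that emits one fixed header and leaves the body alone.
type constHeader struct{ name, value string }

func (p constHeader) Execute(body []byte, _ []string) (map[string]string, []byte, error) {
	return map[string]string{p.name: p.value}, body, nil
}

func main() {
	r := newRegistry()
	_ = r.RegisterFactory("A", func() bbrPlugin { return constHeader{name: "X-Example-A", value: "1"} })
	_, _ = r.CreatePlugin("A")
	c := &chain{}
	fmt.Println(c.AddPlugin("A", r), c.AddPlugin("B", r)) // "B" was never created, so it is rejected
	fmt.Println(c.Run([]byte(`{}`), nil, r))
}
```

Keeping only type keys in the chain means the same registered instance can appear in several chains, at the cost of a registry lookup on every run.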

### Current BBR Reimplementation as a BBRPlugin

```go
// ------------------------------------ SAMPLE PLUGIN IMPLEMENTATION ----------------------------------------------
// Uses "encoding/json" and "fmt" from the standard library.

type simpleModelExtractor struct { // implements the MetadataExtractor interface
	typedName           plugins.TypedName
	requiresFullParsing bool
}

// NewSimpleModelExtractor is a factory that constructs a simpleModelExtractor plugin.
// A developer who wishes to create their own implementation implements the BBRPlugin interface and
// uses the registry and PluginsChain to register and execute the plugin (together with other plugins in a chain).
func NewSimpleModelExtractor() BBRPlugin {
	return &simpleModelExtractor{
		typedName: plugins.TypedName{
			Type: "MetadataExtractor",
			Name: "simple-model-extractor",
		},
		requiresFullParsing: false,
	}
}

func (s *simpleModelExtractor) RequiresFullParsing() bool {
	return s.requiresFullParsing
}

func (s *simpleModelExtractor) TypedName() plugins.TypedName {
	return s.typedName
}

// Execute extracts the "model" from the JSON request body and sets the X-Gateway-Model-Name header.
// This implementation intentionally ignores metaDataKeys and does not mutate the body.
// It expects the request body to be a JSON object containing a "model" field.
// This is simply a refactoring of the default BBR implementation to work with the pluggable framework.
func (s *simpleModelExtractor) Execute(
	requestBodyBytes []byte,
	metaDataKeys []string, // intentionally ignored in this plugin implementation
) (
	headers map[string]string,
	mutatedBodyBytes []byte,
	err error,
) {
	type RequestBody struct {
		Model string `json:"model"`
	}

	h := make(map[string]string)

	var requestBody RequestBody

	if err := json.Unmarshal(requestBodyBytes, &requestBody); err != nil {
		// return the original body on decode failure
		return nil, requestBodyBytes, err
	}

	if requestBody.Model == "" {
		return nil, requestBodyBytes, fmt.Errorf("missing required field: model")
	}

	// ModelHeader is a constant defined in interfaces
	h[ModelHeader] = requestBody.Model

	// The body is not mutated in this implementation. This is intentional.
	return h, requestBodyBytes, nil
}

func (s *simpleModelExtractor) String() string {
	return fmt.Sprintf("BBRPlugin{%v/requiresFullParsing=%v}", s.TypedName(), s.requiresFullParsing)
}
```
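
The sample above ignores `metaDataKeys`. As an illustration of how a developer might write a plugin that does use them, here is a hypothetical sketch that is not part of the proposal: `keyExtractor` copies each requested top-level string field into a header. The `X-Gateway-Meta-` prefix and the type name are invented for this example, and `plugins.TypedName` is omitted to keep the block self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// keyExtractor is a hypothetical plugin; it is not defined by the proposal.
type keyExtractor struct{}

// Execute copies every top-level field listed in metaDataKeys into a header
// named "X-Gateway-Meta-<key>" (an invented naming scheme). Absent fields, and
// fields that do not decode to a non-empty string, are skipped.
// The body is returned unmodified.
func (keyExtractor) Execute(body []byte, metaDataKeys []string) (map[string]string, []byte, error) {
	// Selective parsing: values stay as raw JSON until a key is requested.
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(body, &fields); err != nil {
		return nil, body, err
	}
	headers := make(map[string]string)
	for _, key := range metaDataKeys {
		raw, ok := fields[key]
		if !ok {
			continue
		}
		var s string
		if err := json.Unmarshal(raw, &s); err != nil || s == "" {
			continue
		}
		headers["X-Gateway-Meta-"+key] = s
	}
	return headers, body, nil
}

func main() {
	body := []byte(`{"model":"llama","user":"alice","stream":true}`)
	headers, _, err := keyExtractor{}.Execute(body, []string{"user", "stream", "absent"})
	fmt.Println(headers, err)
}
```

Decoding into `map[string]json.RawMessage` is one way to get the selective parsing mentioned in the overview: only the requested values are decoded, and large fields such as message arrays are left as raw bytes.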

### Implementation Phases

The pluggable framework will be implemented iteratively over several phases.

1. Introduce the `BBRPlugin` `MetadataExtractor` interface, registry, plugins chain, a sample plugin implementation (`SimpleModelExtractor`), and its factory. Plugin configuration will be implemented via environment variables set in the Helm chart
1. Introduce a second plugin interface, `ModelSelector`, and a sample plugin implementation
1. Introduce a shared struct (shared among the plugins of a plugins chain)
1. Introduce an interface for guardrail plugins, introduce a simple reference implementation, and experiment with plugins chains on request and response messages
1. Refactor metrics as needed to work with the new pluggable framework
1. Implement configuration via manifests similar to those in EPP
1. Implement `PluginsDAG` to allow for more complex topological ordering and concurrency
1. Continuously learn lessons from this implementation and the scheduling framework to improve the implementation
1. Aim for alignment and cross-pollination with the [AI GW WG](https://github.com/kubernetes-sigs/wg-ai-gateway)


## Open Questions

1. A more elaborate shared-memory architecture for the best performance
1. TBA