Skip to content

Commit af49267

Browse files
pluggable BBR framework proposal
fixes typos and minor reformatting issues shortens description: omits the sequence diagram minor
1 parent 5a65e9f commit af49267

File tree

2 files changed

+163
-0
lines changed

2 files changed

+163
-0
lines changed
Lines changed: 163 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,163 @@
1+
# Pluggable Body-Based Routing (BBR) Framework
2+
3+
Author(s): @davidbreitgand @srampal
4+
5+
## Proposal Status
6+
7+
***Draft***
8+
9+
## Summary
10+
11+
The Gateway API Inference Extension (v1.2.1) includes an initial implementation of Body-Based Routing (BBR). Currently, BBR provides a single capability: it extracts the model name from the request body and adds it to the `X-Gateway-Model-Name` header. This header is then used to route the request to the appropriate InferencePool and its associated Endpoint Picker Extension (EPP) instances.
12+
13+
The current BBR implementation is limited and lacks extensibility. Similar to the [pluggability introduced in the scheduling subsystem](../0845-scheduler-architecture-proposal/README.md), BBR should support custom extensions without requiring modifications to the GIE code base.
14+
15+
This proposal introduces a plugin architecture for BBR that allows developers to implement custom logic. Plugins could be organized into a chain or DAG for ordered and concurrent execution.
16+
17+
See [this document](https://docs.google.com/document/d/1So9uRjZrLUHf7Rjv13xy_ip3_5HSI1cn1stS3EsXLWg/edit?tab=t.0#heading=h.55jwocr94axs) for additional context amd reference.
18+
19+
## Goals
20+
21+
The pluggable BBR Framework aims at addressing the following goals
22+
23+
- Avoid monolithic architecture
24+
- Mimic pluggability and configurability of the scheduling subsystem without coupling between the two
25+
- Enable organizing plugins into a topology for ordered and concurrent execution
26+
- Avoid redundant recurrent body parsing across plugins in a topology for the sake of performance
27+
- Limit changes to the BBR feature to avoid any changes in the rest of the code base
28+
- Follow best practices and experience from the Scheduling subsystem
29+
pluggability effort. For example, extending the system to support the above
30+
should be through implementing well defined `Plugin` interfaces and registering
31+
them in the BBR subsystem; any configuration would be done in the
32+
same way (e.g., code and/or configuration file)
33+
- Reuse common code from EPP, such as `TypedName`, wherever make sense, but avoid reusing specialized code with non-BBR functionality to avoid abuse
34+
- Enable extensible collection and registration of metrics using lessons from the pluggable scheduling sub-system
35+
- Provide reference plugin implementations.
36+
37+
## Non-Goals
38+
39+
- Modify existing GIE abstractions
40+
- Fully align plugins, registries, and factories across BBR and EPP
41+
- Dynamically reconfigure plugins and plugin topologies at runtime
42+
43+
## Proposal
44+
45+
### Overview
46+
47+
There is an embedded `BBRPlugin` interface building on the `Plugin` interface adopted from EPP. This interface should be implemented by any BBR plugin. Each pluigin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name()` returns the string representing the plugins implementation. BBR is refactored to implement the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and concrete implementations created by the factories.
48+
In addition, a `PluginsChain` interface is defined to define an order of plugin executions. In the future, `PluginsChain` will be replaced by `PluginsDAG` to allow for more complex topological order and concurrency.
49+
50+
`PluginsChain` only contains ordered `BBRPlugin` types registered in the `PluginRegistry`. `RequestPluginsChain` and `ResponsePluginsChain` are optionally configured for handling requests and responses respectively. If no configuration is provided, default `PluginsChain` instances will be configured automatically.
51+
52+
Depending on a `BBRPlugin` functionality and implementation, the plugin might require full or selective body parsing. To save the parsing overhead, if there is at least one `BBRPlugin` in the `PluginsChain` that requires full body parsing, the parsing is performed only once into a shared official appropriate `openai-go` struct (either `openai.CompletionNewParams` or `openai.ChatCompletionNewParams` depending on the request endpoint). This struct is shared for read-only to all plugins in the chain. Each `BBRplugin` receives the shared struct by value. If a plugin needs to mutate the body, in the initial implementation, it MUST work on its own copy, and the a mutated body is returned separately by each plugiin.
53+
54+
### Suggested Components
55+
56+
The sketch of the proposed framework is shown in the figure below.
57+
<img src="./images/pluggable-framework-architecture-sketch.png" alt="Components of the proposed framework" width="1000" />
58+
59+
### Suggested BBR Pluggable Framework Interfaces
60+
61+
```go
62+
//Base BBRPlugin interface
63+
type BBRPlugin interface {
64+
plugins.Plugin //imported from "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/plugins"
65+
RequiresFullParsing() bool //specifies whether a full parsing of the body is required (to facilitate efficient memory sharing across plugins in a plugins chain)
66+
}
67+
68+
// MetadataExtractor is an example of embedded interface for extending BBR functionality.
69+
// Plugins implementing this interface extract metadata from an inbound OpenAI message by keys specified.
70+
// Plugins implementing the MetadataExtractor interface should be read-only.
71+
// One specific implementation of this interface is Model extraction from the body and returning
72+
// it in the headers map by the custom key X-Gateway-Model-Name.
73+
// The current BBR implementation will be rewritten accordingly to implement this interface.
74+
// NOTE: the chain runner will safely merge the headers map from all the plugins in the plugins chain
75+
type MetadataExtractor interface {
76+
BBRPlugin
77+
Extract(ctx context.Context,
78+
requestBodyBytes []byte,
79+
metaDataKeys []string,
80+
sharedMemory interface{}) (headers map[string]string, err error) //valid shared memory MUST be either openai.ChatCompletionNewParams or openai.CompletionNewParams struct
81+
}
82+
//NOTE: in the initial PR, sharedMemory struct will be omitted for simplicity
83+
84+
// ModelSelector is an example embedded interface for extending BBR fubctionality.
85+
// It defines an interface for selecting a model based on the body of the inbound request
86+
// model is the model selected
87+
// The model might be different from the original model requested.
88+
// In this case, the Model in the body is replaced by the chosen model name
89+
// The headers map contains are the HTTP headers added by Select(...) on the inbound request message before passing the message back to the gateway provider (e.g., Istio/Envoy)
90+
// X-Gateway-Model-Name is MUSTbe set always by any ModelSelector implementation
91+
// Plugins implementing the ModelSelector interface should always return mutatedBodyBytes even if the body is not mutated: in this case, the original bodyBytes MUST be returned
92+
//Typically, but not neceassarily an implementation of ModelSelector will require a full body parsing to get exposed to the full body (e.g., for semantic routing)
93+
94+
type ModelSelector interface {
95+
BBRPlugin
96+
Select(ctx context.Context, requestBodyBytes []byte, sharedMemory interface{}) (
97+
headers map[string]string,
98+
mutatedBodyBytes []byte, err error)
99+
}
100+
101+
// placeholder for BBRPlugin constructors
102+
type PluginFactoryFunc func() bbrplugins.BBRPlugin //concrete constructors are assigned to this type
103+
104+
// PluginRegistry defines operations for managing plugin factories and plugin instances
105+
type PluginRegistry interface {
106+
RegisterFactory(typeKey string, factory PluginFactoryFunc) error //constructors
107+
RegisterPlugin(plugin bbrplugins.BBRPlugin) error //registers a plugin instance (the instance MUST be created via the factory first)
108+
GetFactory(typeKey string) (PluginFactoryFunc, error)
109+
GetPlugin(typeKey string) (bbrplugins.BBRPlugin, error)
110+
GetFactories() map[string]PluginFactoryFunc
111+
GetPlugins() map[string]bbrplugins.BBRPlugin
112+
ListPlugins() []string
113+
ListFactories() []string
114+
CreatePlugin(typeKey string) (bbrplugins.BBRPlugin, error)
115+
ContainsFactory(typeKey string) bool
116+
ContainsPlugin(typeKey string) bool
117+
String() string //human readable string for logging
118+
}
119+
120+
// PluginsChain is used to define a specific order of execution of the BBRPlugin instances stored in the registry
121+
// The BBRPlugin instances
122+
type PluginsChain interface {
123+
AddPlugin(typeKey string, registry PluginRegistry) error //to be added to the chain the plugin should be registered in the registry first
124+
AddPluginAtInd(typeKey string, i int, r PluginRegistry) error //only affects the instance of the plugin chain
125+
GetPlugin(index int, registry PluginRegistry) (bbrplugins.BBRPlugin, error) //retrieves i-th plugin as defined in the chain from the registry
126+
Length() int
127+
ParseChatCompletion(data []byte) (openai.ChatCompletionNewParams, error) //parses the bytes slice into an appropriate openai-go struct
128+
ParseCompletion(data []byte) (openai.CompletionNewParams, error) //likewise
129+
GetSharedMemory(which string) interface{} //returns an appropriate shared open-ai struct dependent on whether which
130+
//corresponds to Completion or ChatCompletion endpoint requested in the body
131+
String() string
132+
}
133+
//NOTE: for simplicity, in the initial PR, PluginsChain instance will be defined for request for request only
134+
135+
// ------------------------- Supported plugin interfaces -----------------------------------------------------------
136+
// The key in the SuppotedInterfaces map is the BBRPlugin type and the value is a slice of supported implementations
137+
// For this BBRPlugin type.
138+
// Edit SupportedInterfaces map when new interfaces and/or implementations are added/removed
139+
var SupportedInterfaces = map[string][]string{
140+
"MetadataExtractor": {"simple-model-extractor"},
141+
"ModelSelector": {"semantic-model-selector-uniroute", "semantic-model-selector-knn"}, //future
142+
"GuardRail": {"obscenity-blocker", "pid-disclosure-blocker"}, //future
143+
}
144+
```
145+
146+
### Implementation Phases
147+
148+
The pluggable framework will be implemented iteratively over several phases.
149+
150+
1. Introduce `BBRPlugin` `MetadataExtractor`, interface, registry, plugins chain, sample plugin implementation (`SimpleModelExtraction`) and its factory. Plugin configuration will be implemented via command line flags
151+
1. Introduce a second plugin interface, `ModelSelector` and sample plugin implementation
152+
1. Introduce shared struct (shared among the plugins of a plugins chain)
153+
1. Introduce an interface for guardrail plugin, introduce simple reference implementation, experiment with plugins chains on request and response messages
154+
1. Refactor metrics as needed to work with the new pluggable framework
155+
1. Implement configuration via manifests similar to those in EPP
156+
1. Implement `PluginsDAG` to allow for more complex topological order and concurrency.
157+
1. Continously learn lessons from this implementation and scheduling framework to improve the implementation
158+
1. Aim at aligning and cross-polination with the [AI GW WG]("https://github.com/kubernetes-sigs/wg-ai-gateway").
159+
160+
## Open Questions
161+
162+
1. More elaborate shared memory architecture for the best performance
163+
1. TBA
31.1 KB
Loading

0 commit comments

Comments
 (0)