# Pluggable Body-Based Routing (BBR) Framework

Author(s): @davidbreitgand @srampal

## Proposal Status

***Draft***

## Summary

The Gateway API Inference Extension (GIE, v1.2.1) includes an initial implementation of Body-Based Routing (BBR). Currently, BBR provides a single capability: it extracts the model name from the request body and adds it to the `X-Gateway-Model-Name` header. This header is then used to route the request to the appropriate InferencePool and its associated Endpoint Picker Extension (EPP) instances.

The current BBR implementation is limited and lacks extensibility. Similar to the [pluggability introduced in the scheduling subsystem](../0845-scheduler-architecture-proposal/README.md), BBR should support custom extensions without requiring modifications to the GIE code base.

This proposal introduces a plugin architecture for BBR that allows developers to implement custom logic. Plugins can be organized into a chain or a DAG for ordered and concurrent execution.

See [this document](https://docs.google.com/document/d/1So9uRjZrLUHf7Rjv13xy_ip3_5HSI1cn1stS3EsXLWg/edit?tab=t.0#heading=h.55jwocr94axs) for additional context and reference.

## Goals

The pluggable BBR Framework aims to address the following goals:

- Avoid a monolithic architecture
- Mimic the pluggability and configurability of the scheduling subsystem without coupling the two
- Enable organizing plugins into a topology for ordered and concurrent execution
- Avoid redundant, repeated body parsing across the plugins in a topology, for the sake of performance
- Confine changes to the BBR feature, leaving the rest of the code base unchanged
- Follow the best practices and experience of the scheduling subsystem
  pluggability effort. For example, extending the system to support the above
  should be done by implementing well-defined `Plugin` interfaces and registering
  them in the BBR subsystem; any configuration would be done in the
  same way (e.g., code and/or configuration file)
- Reuse common code from EPP, such as `TypedName`, wherever it makes sense, but avoid reusing specialized code with non-BBR functionality to prevent misuse
- Enable extensible collection and registration of metrics, using lessons from the pluggable scheduling subsystem
- Provide reference plugin implementations

## Non-Goals

- Modify existing GIE abstractions
- Fully align plugins, registries, and factories across BBR and EPP
- Dynamically reconfigure plugins and plugin topologies at runtime

## Proposal

### Overview

A new `BBRPlugin` interface embeds the `Plugin` interface adopted from EPP, and every BBR plugin must implement it. Each plugin is identified by its `TypedName` (also adopted from EPP): `TypedName().Type` is the string naming the plugin's type, and `TypedName().Name` is the string naming the plugin's implementation. BBR is refactored to follow a registered-factory pattern: a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and the concrete plugin instances those factories create.
In addition, a `PluginsChain` interface defines the order in which plugins execute. In the future, `PluginsChain` will be replaced by `PluginsDAG` to allow for more complex topological ordering and concurrency.
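
To make the registered-factory pattern concrete, here is a minimal sketch of an in-memory registry. It is illustrative only: `TypedName` and `BBRPlugin` are trimmed-down local stand-ins for the EPP and BBR types, `inMemoryRegistry` and `extractor` are hypothetical names, instances are keyed by `TypedName().Type` to match the `typeKey` arguments of the proposed `PluginRegistry`, and rejecting an instance whose type has no registered factory is an assumption about how "created via the factory first" could be enforced.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// TypedName is a local stand-in for the EPP plugins.TypedName struct.
type TypedName struct {
	Type string
	Name string
}

// BBRPlugin is a hypothetical, trimmed-down stand-in for the proposed interface.
type BBRPlugin interface {
	TypedName() TypedName
}

// PluginFactoryFunc constructs a new plugin instance.
type PluginFactoryFunc func() BBRPlugin

// inMemoryRegistry is a hypothetical, minimal registry implementation.
type inMemoryRegistry struct {
	mu        sync.RWMutex
	factories map[string]PluginFactoryFunc
	plugins   map[string]BBRPlugin
}

func newInMemoryRegistry() *inMemoryRegistry {
	return &inMemoryRegistry{
		factories: map[string]PluginFactoryFunc{},
		plugins:   map[string]BBRPlugin{},
	}
}

// RegisterFactory stores a constructor under typeKey; duplicates are rejected.
func (r *inMemoryRegistry) RegisterFactory(typeKey string, f PluginFactoryFunc) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.factories[typeKey]; ok {
		return fmt.Errorf("factory %q already registered", typeKey)
	}
	r.factories[typeKey] = f
	return nil
}

// CreatePlugin invokes the factory registered under typeKey.
func (r *inMemoryRegistry) CreatePlugin(typeKey string) (BBRPlugin, error) {
	r.mu.RLock()
	f, ok := r.factories[typeKey]
	r.mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("no factory registered for %q", typeKey)
	}
	return f(), nil
}

// RegisterPlugin stores an instance keyed by its type, but only if a factory
// for that type exists (assumed enforcement of "create via factory first").
func (r *inMemoryRegistry) RegisterPlugin(p BBRPlugin) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	key := p.TypedName().Type
	if _, ok := r.factories[key]; !ok {
		return fmt.Errorf("plugin type %q has no registered factory", key)
	}
	r.plugins[key] = p
	return nil
}

// GetPlugin looks up a registered instance by type key.
func (r *inMemoryRegistry) GetPlugin(typeKey string) (BBRPlugin, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.plugins[typeKey]
	if !ok {
		return nil, fmt.Errorf("plugin %q not registered", typeKey)
	}
	return p, nil
}

// ListFactories returns the registered type keys in sorted order.
func (r *inMemoryRegistry) ListFactories() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	keys := make([]string, 0, len(r.factories))
	for k := range r.factories {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// extractor is a dummy plugin used only to exercise the registry.
type extractor struct{}

func (extractor) TypedName() TypedName {
	return TypedName{Type: "MetadataExtractor", Name: "simple-model-extractor"}
}
```

A caller registers a factory once at startup, creates an instance via `CreatePlugin`, and registers that instance so that a chain can later look it up by type key. The read/write mutex is a precaution in case registration and lookups ever run concurrently; the proposal does not specify the registry's concurrency model.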

`PluginsChain` only contains ordered `BBRPlugin` types registered in the `PluginRegistry`. `RequestPluginsChain` and `ResponsePluginsChain` can optionally be configured to handle requests and responses, respectively. If no configuration is provided, default `PluginsChain` instances are configured automatically.
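
The fallback to a default chain could look roughly like the following, assuming (per the first implementation phase) that the request chain is configured through an environment variable set in the Helm chart. The variable name `BBR_REQUEST_PLUGINS`, its comma-separated format, and both helper functions are hypothetical; only the default type `MetadataExtractor` comes from the proposed interfaces.

```go
package main

import (
	"os"
	"strings"
)

// DefaultPluginType mirrors the default declared in the proposed interfaces.
const DefaultPluginType = "MetadataExtractor"

// requestChainEnvVar is a hypothetical variable name used only for illustration.
const requestChainEnvVar = "BBR_REQUEST_PLUGINS"

// chainSpecFrom turns a comma-separated list of plugin type keys into an
// ordered chain specification. Blank entries are skipped; an empty
// specification falls back to the default single-plugin chain.
func chainSpecFrom(raw string) []string {
	var keys []string
	for _, k := range strings.Split(raw, ",") {
		if k = strings.TrimSpace(k); k != "" {
			keys = append(keys, k)
		}
	}
	if len(keys) == 0 {
		return []string{DefaultPluginType}
	}
	return keys
}

// requestChainSpec reads the (hypothetical) request chain setting from the environment.
func requestChainSpec() []string {
	return chainSpecFrom(os.Getenv(requestChainEnvVar))
}
```

With the variable unset, `requestChainSpec()` returns a single-entry chain holding `MetadataExtractor`, i.e., only the current model-extraction behavior; each returned key would then be passed to `PluginsChain.AddPlugin` in order.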

Depending on its functionality and implementation, a `BBRPlugin` might require full or selective body parsing. To avoid paying the parsing overhead repeatedly, if at least one `BBRPlugin` in the `PluginsChain` requires full body parsing, the body is parsed only once into a shared struct from the official `openai-go` SDK (either `openai.CompletionNewParams` or `openai.ChatCompletionNewParams`, depending on the request endpoint). This struct is shared read-only with all plugins in the chain, and each `BBRPlugin` receives it by value. In the initial implementation, a plugin that needs to mutate the body MUST work on its own copy, and each plugin returns its mutated body separately.

### Suggested Components

The sketch of the proposed framework is shown in the figure below.
<img src="./images/pluggable-framework-architecture-sketch.png" alt="Components of the proposed framework" width="1000" />

### Suggested BBR Pluggable Framework Interfaces

```go
// ------------------------------------ Defaults ------------------------------------------
const DefaultPluginType = "MetadataExtractor"
const DefaultPluginImplementation = "simple-model-extractor"

// BBRPlugin defines the interface that every plugin in the BBR framework implements.
// Plugins should never mutate the request body directly; they return a mutated copy instead.
type BBRPlugin interface {
	plugins.Plugin

	// RequiresFullParsing indicates whether full body parsing is required,
	// so that a single parsed struct can be shared efficiently across plugins in a chain.
	// RequiresFullParsing() bool

	// Execute runs the plugin logic on the request body and returns headers,
	// a potentially mutated copy of the body, and an error, if any.
	Execute(
		requestBodyBytes []byte,
		metaDataKeys []string,
	) (
		headers map[string]string,
		mutatedBodyBytes []byte,
		err error,
	)
}

// The registry and chain below refer to BBRPlugin through its package, as bbrplugins.BBRPlugin.

// PluginFactoryFunc is the type of BBRPlugin constructors; concrete constructors are assigned to it.
type PluginFactoryFunc func() bbrplugins.BBRPlugin

// PluginRegistry defines operations for managing plugin factories and plugin instances.
type PluginRegistry interface {
	// RegisterFactory registers a plugin constructor under typeKey.
	RegisterFactory(typeKey string, factory PluginFactoryFunc) error
	// RegisterPlugin registers a plugin instance; the instance MUST be created via a registered factory first.
	RegisterPlugin(plugin bbrplugins.BBRPlugin) error
	GetFactory(typeKey string) (PluginFactoryFunc, error)
	GetPlugin(typeKey string) (bbrplugins.BBRPlugin, error)
	GetFactories() map[string]PluginFactoryFunc
	GetPlugins() map[string]bbrplugins.BBRPlugin
	ListPlugins() []string
	ListFactories() []string
	CreatePlugin(typeKey string) (bbrplugins.BBRPlugin, error)
	ContainsFactory(typeKey string) bool
	ContainsPlugin(typeKey string) bool
	// String returns a human-readable description for logging.
	String() string
}

// PluginsChain defines a specific order of execution for the BBRPlugin instances stored in the registry.
type PluginsChain interface {
	// AddPlugin appends a plugin to the chain; the plugin must be registered in the registry first.
	AddPlugin(typeKey string, registry PluginRegistry) error
	// AddPluginAtInd inserts a plugin at index i; it only affects this chain instance.
	AddPluginAtInd(typeKey string, i int, registry PluginRegistry) error
	// GetPlugin retrieves the plugin at the given position in the chain from the registry.
	GetPlugin(index int, registry PluginRegistry) (bbrplugins.BBRPlugin, error)
	Length() int
	// ParseChatCompletion parses the byte slice into the matching openai-go struct.
	ParseChatCompletion(data []byte) (openai.ChatCompletionNewParams, error)
	// ParseCompletion does the same for the Completions endpoint.
	ParseCompletion(data []byte) (openai.CompletionNewParams, error)
	// GetSharedMemory returns the shared openai-go struct; which selects the
	// Completion or ChatCompletion endpoint requested in the body.
	GetSharedMemory(which string) interface{}
	// Run executes the chain and returns the potentially mutated body together with
	// the headers of all plugins, safely merged.
	Run(bodyBytes []byte, metaDataKeys []string, registry PluginRegistry) ([]byte, map[string]string, error)
	String() string
}

// NOTE: for simplicity, the initial PR defines a PluginsChain instance for requests only.
```

### Current BBR Reimplemented as a BBRPlugin

```go
// ------------------------------------ SAMPLE PLUGIN IMPLEMENTATION ----------------------------------------------

import (
	"encoding/json"
	"fmt"
)

// simpleModelExtractor implements the MetadataExtractor plugin interface.
type simpleModelExtractor struct {
	typedName           plugins.TypedName
	requiresFullParsing bool
}

// NewSimpleModelExtractor is a factory that constructs a simpleModelExtractor plugin.
// A developer who wishes to create their own implementation implements the BBRPlugin
// interface and uses the registry and PluginsChain to register and execute the plugin
// (together with other plugins in a chain).
func NewSimpleModelExtractor() BBRPlugin {
	return &simpleModelExtractor{
		typedName: plugins.TypedName{
			Type: "MetadataExtractor",
			Name: "simple-model-extractor",
		},
		requiresFullParsing: false,
	}
}

func (s *simpleModelExtractor) RequiresFullParsing() bool {
	return s.requiresFullParsing
}

func (s *simpleModelExtractor) TypedName() plugins.TypedName {
	return s.typedName
}

// Execute extracts the "model" field from the JSON request body and sets the
// X-Gateway-Model-Name header. It expects the body to be a JSON object containing
// a "model" field, intentionally ignores metaDataKeys, and does not mutate the body.
// It is thus simply the default BBR implementation refactored to fit the pluggable framework.
func (s *simpleModelExtractor) Execute(
	requestBodyBytes []byte,
	metaDataKeys []string, // intentionally ignored by this plugin
) (
	headers map[string]string,
	mutatedBodyBytes []byte,
	err error,
) {
	type RequestBody struct {
		Model string `json:"model"`
	}

	var requestBody RequestBody
	if err := json.Unmarshal(requestBodyBytes, &requestBody); err != nil {
		// Return the original body on decode failure.
		return nil, requestBodyBytes, err
	}

	if requestBody.Model == "" {
		return nil, requestBodyBytes, fmt.Errorf("missing required field: model")
	}

	// ModelHeader is the "X-Gateway-Model-Name" header constant, defined with the interfaces (not shown above).
	h := map[string]string{ModelHeader: requestBody.Model}

	// The body is intentionally not mutated.
	return h, requestBodyBytes, nil
}

func (s *simpleModelExtractor) String() string {
	return fmt.Sprintf("BBRPlugin{%v/requiresFullParsing=%v}", s.TypedName(), s.requiresFullParsing)
}
```

### Implementation Phases

The pluggable framework will be implemented iteratively, over several phases.

1. Introduce the `BBRPlugin` interface, the `MetadataExtractor` plugin interface, the registry, the plugins chain, and a sample plugin implementation (`simpleModelExtractor`) with its factory. Plugin configuration will be implemented via environment variables set in the Helm chart
1. Introduce a second plugin interface, `ModelSelector`, and a sample plugin implementation
1. Introduce a shared struct (shared among the plugins of a plugins chain)
1. Introduce an interface for guardrail plugins with a simple reference implementation, and experiment with plugins chains on request and response messages
1. Refactor metrics as needed to work with the new pluggable framework
1. Implement configuration via manifests similar to those in EPP
1. Implement `PluginsDAG` to allow for more complex topological ordering and concurrency
1. Continuously learn lessons from this implementation and from the scheduling framework to improve the implementation
1. Aim to align and cross-pollinate with the [AI GW WG](https://github.com/kubernetes-sigs/wg-ai-gateway)

## Open Questions

1. A more elaborate shared-memory architecture for best performance
1. TBA