|
| 1 | +# Pluggable Body-Based Routing (BBR) Framework |
| 2 | + |
| 3 | +Author(s): @davidbreitgand @srampal |
| 4 | + |
| 5 | +## Proposal Status |
| 6 | + |
| 7 | +***Draft*** |
| 8 | + |
| 9 | +## Summary |
| 10 | + |
| 11 | +The Gateway API Inference Extension (v1.2.1) includes an initial implementation of Body-Based Routing (BBR). Currently, BBR provides a single capability: it extracts the model name from the request body and adds it to the `X-Gateway-Model-Name` header. This header is then used to route the request to the appropriate InferencePool and its associated Endpoint Picker Extension (EPP) instances. |
| 12 | + |
| 13 | +The current BBR implementation is limited and lacks extensibility. Similar to the [pluggability introduced in the scheduling subsystem](../0845-scheduler-architecture-proposal/README.md), BBR should support custom extensions without requiring modifications to the GIE code base. |
| 14 | + |
| 15 | +This proposal introduces a plugin architecture for BBR that allows developers to implement custom logic. Plugins could be organized into a chain or DAG for ordered and concurrent execution. |
| 16 | + |
| 17 | +See [this document](https://docs.google.com/document/d/1So9uRjZrLUHf7Rjv13xy_ip3_5HSI1cn1stS3EsXLWg/edit?tab=t.0#heading=h.55jwocr94axs) for additional context and reference. |
| 18 | + |
| 19 | +## Goals |
| 20 | + |
| 21 | +The pluggable BBR framework aims to address the following goals: |
| 22 | + |
| 23 | +- Avoid monolithic architecture |
| 24 | +- Mimic pluggability and configurability of the scheduling subsystem without coupling between the two |
| 25 | +- Enable organizing plugins into a topology for ordered and concurrent execution |
| 26 | +- Avoid parsing the same body repeatedly across the plugins in a topology, for the sake of performance |
| 27 | +- Confine changes to the BBR feature so that the rest of the code base is not modified |
| 28 | +- Follow best practices and experience from the scheduling subsystem |
| 29 | + pluggability effort. For example, extending the system to support the above |
| 30 | + should be done by implementing well-defined `Plugin` interfaces and registering |
| 31 | + them in the BBR subsystem; any configuration would be done in the |
| 32 | + same way (e.g., code and/or configuration file) |
| 33 | +- Reuse common code from EPP, such as `TypedName`, wherever it makes sense, but avoid reusing specialized code with non-BBR functionality to prevent misuse |
| 34 | +- Enable extensible collection and registration of metrics using lessons from the pluggable scheduling subsystem |
| 35 | +- Provide reference plugin implementations |
| 36 | + |
| 37 | +## Non-Goals |
| 38 | + |
| 39 | +- Modify existing GIE abstractions |
| 40 | +- Fully align plugins, registries, and factories across BBR and EPP |
| 41 | +- Dynamically reconfigure plugins and plugin topologies at runtime |
| 42 | + |
| 43 | +## Proposal |
| 44 | + |
| 45 | +### Overview |
| 46 | + |
| 47 | +The `BBRPlugin` interface embeds the `Plugin` interface adopted from EPP and should be implemented by every BBR plugin. Each plugin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name` gives the string representing the plugin's implementation. BBR is refactored to implement the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and the concrete implementations created by those factories. |
| 48 | +In addition, a `PluginsChain` interface defines the order of plugin execution. In the future, `PluginsChain` will be replaced by `PluginsDAG` to allow for more complex topological ordering and concurrency. |
| 49 | + |
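The registered factory pattern described above can be illustrated with a minimal sketch. It is not the proposed implementation: `Plugin`, `TypeKey`, `FactoryFunc`, `Registry`, and `modelExtractor` are hypothetical stand-ins for `BBRPlugin`, `TypedName`, `PluginFactoryFunc`, and `PluginRegistry`, reduced to what fits in a single file, and the registry is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"sort"
)

// Plugin is a hypothetical stand-in for the proposed BBRPlugin interface.
// TypeKey is an assumed accessor; the proposal identifies plugins via TypedName.
type Plugin interface {
	TypeKey() string
}

// FactoryFunc mirrors the shape of the proposed PluginFactoryFunc.
type FactoryFunc func() Plugin

// Registry is a minimal map-backed registry of factories and instances.
type Registry struct {
	factories map[string]FactoryFunc
	plugins   map[string]Plugin
}

func NewRegistry() *Registry {
	return &Registry{factories: map[string]FactoryFunc{}, plugins: map[string]Plugin{}}
}

// RegisterFactory rejects nil factories and duplicate type keys.
func (r *Registry) RegisterFactory(typeKey string, f FactoryFunc) error {
	if f == nil {
		return fmt.Errorf("nil factory for %q", typeKey)
	}
	if _, ok := r.factories[typeKey]; ok {
		return fmt.Errorf("factory %q already registered", typeKey)
	}
	r.factories[typeKey] = f
	return nil
}

// CreatePlugin builds a new instance through the registered factory.
func (r *Registry) CreatePlugin(typeKey string) (Plugin, error) {
	f, ok := r.factories[typeKey]
	if !ok {
		return nil, fmt.Errorf("no factory registered for %q", typeKey)
	}
	return f(), nil
}

// RegisterPlugin stores an instance; its factory must already be registered.
func (r *Registry) RegisterPlugin(p Plugin) error {
	key := p.TypeKey()
	if _, ok := r.factories[key]; !ok {
		return fmt.Errorf("plugin %q has no registered factory", key)
	}
	r.plugins[key] = p
	return nil
}

// ListPlugins returns the registered plugin keys in sorted order.
func (r *Registry) ListPlugins() []string {
	keys := make([]string, 0, len(r.plugins))
	for k := range r.plugins {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// modelExtractor is a placeholder plugin used only to exercise the registry.
type modelExtractor struct{}

func (modelExtractor) TypeKey() string { return "simple-model-extractor" }

func main() {
	reg := NewRegistry()
	if err := reg.RegisterFactory("simple-model-extractor", func() Plugin { return modelExtractor{} }); err != nil {
		panic(err)
	}
	p, err := reg.CreatePlugin("simple-model-extractor")
	if err != nil {
		panic(err)
	}
	if err := reg.RegisterPlugin(p); err != nil {
		panic(err)
	}
	fmt.Println(reg.ListPlugins())
}
```

Keeping creation and registration as separate steps follows the proposal's note that an instance must be created via its factory before it is registered.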
| 50 | +`PluginsChain` only contains ordered `BBRPlugin` types registered in the `PluginRegistry`. `RequestPluginsChain` and `ResponsePluginsChain` are optionally configured for handling requests and responses respectively. If no configuration is provided, default `PluginsChain` instances will be configured automatically. |
| 51 | + |
| 52 | +Depending on its functionality and implementation, a `BBRPlugin` might require full or selective body parsing. To save parsing overhead, if at least one `BBRPlugin` in the `PluginsChain` requires full body parsing, the body is parsed only once into the appropriate struct from the official `openai-go` library (either `openai.CompletionNewParams` or `openai.ChatCompletionNewParams`, depending on the request endpoint). This struct is shared read-only with all plugins in the chain: each `BBRPlugin` receives it by value. In the initial implementation, a plugin that needs to mutate the body MUST work on its own copy, and each plugin returns its mutated body separately. |
| 53 | + |
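The parse-once rule from the paragraph above can be sketched as follows. This is an illustration under stated assumptions, not the proposed code: `chatParams` is a hypothetical stand-in for `openai.ChatCompletionNewParams` so that the example needs only the standard library, and `plugin`, `fixedPlugin`, and `parseOnce` are names invented for the sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatParams is a hypothetical stand-in for openai.ChatCompletionNewParams.
type chatParams struct {
	Model string `json:"model"`
}

// plugin mirrors only the RequiresFullParsing method of the proposed BBRPlugin.
type plugin interface {
	RequiresFullParsing() bool
}

// fixedPlugin reports a fixed answer; it exists only to drive the example.
type fixedPlugin struct{ full bool }

func (p fixedPlugin) RequiresFullParsing() bool { return p.full }

// parseOnce unmarshals body at most once, and only when some plugin in the
// chain asks for full parsing. ok is false when no plugin needed it.
func parseOnce(chain []plugin, body []byte) (parsed chatParams, ok bool, err error) {
	for _, p := range chain {
		if !p.RequiresFullParsing() {
			continue
		}
		if err := json.Unmarshal(body, &parsed); err != nil {
			return chatParams{}, false, fmt.Errorf("parsing request body: %w", err)
		}
		return parsed, true, nil
	}
	return chatParams{}, false, nil
}

func main() {
	body := []byte(`{"model":"llama-3","messages":[]}`)
	chain := []plugin{fixedPlugin{full: false}, fixedPlugin{full: true}}

	shared, ok, err := parseOnce(chain, body)
	if err != nil {
		panic(err)
	}
	if !ok {
		fmt.Println("no plugin required full parsing")
		return
	}

	// A plugin receives the struct by value, so a change to its copy of
	// this field does not affect the shared value.
	local := shared
	local.Model = "other"
	fmt.Println(shared.Model, local.Model)
}
```

Passing by value protects plain fields such as `Model` in this sketch. A struct that contains slices, maps, or pointers would still share that underlying data between copies, which is one reason a plugin that mutates the body is required to work on its own copy.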
| 54 | +### Suggested Components |
| 55 | + |
| 56 | +The sketch of the proposed framework is shown in the figure below. |
| 57 | +<img src="./images/pluggable-framework-architecture-sketch.png" alt="Components of the proposed framework" width="1000" /> |
| 58 | + |
| 59 | +### Suggested BBR Pluggable Framework Interfaces |
| 60 | + |
| 61 | +```go |
| 62 | +//Base BBRPlugin interface |
| 63 | +type BBRPlugin interface { |
| 64 | + plugins.Plugin //imported from "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/plugins" |
| 65 | + RequiresFullParsing() bool //specifies whether a full parsing of the body is required (to facilitate efficient memory sharing across plugins in a plugins chain) |
| 66 | +} |
| 67 | + |
| 68 | +// MetadataExtractor is an example of an embedded interface for extending BBR functionality. |
| 69 | +// Plugins implementing this interface extract metadata from an inbound OpenAI message by the specified keys. |
| 70 | +// Plugins implementing the MetadataExtractor interface should be read-only. |
| 71 | +// One specific implementation of this interface is Model extraction from the body and returning |
| 72 | +// it in the headers map by the custom key X-Gateway-Model-Name. |
| 73 | +// The current BBR implementation will be rewritten accordingly to implement this interface. |
| 74 | +// NOTE: the chain runner will safely merge the headers map from all the plugins in the plugins chain |
| 75 | +type MetadataExtractor interface { |
| 76 | + BBRPlugin |
| 77 | + Extract(ctx context.Context, |
| 78 | + requestBodyBytes []byte, |
| 79 | + metaDataKeys []string, |
| 80 | + sharedMemory interface{}) (headers map[string]string, err error) //valid shared memory MUST be either openai.ChatCompletionNewParams or openai.CompletionNewParams struct |
| 81 | +} |
| 82 | +//NOTE: in the initial PR, sharedMemory struct will be omitted for simplicity |
| 83 | + |
| 84 | +// ModelSelector is an example of an embedded interface for extending BBR functionality. |
| 85 | +// It defines an interface for selecting a model based on the body of the inbound request. |
| 86 | +// The selected model is returned in the X-Gateway-Model-Name header. |
| 87 | +// The model might be different from the original model requested. |
| 88 | +// In this case, the Model in the body is replaced by the chosen model name. |
| 89 | +// The headers map contains the HTTP headers added by Select(...) to the inbound request message before passing the message back to the gateway provider (e.g., Istio/Envoy). |
| 90 | +// X-Gateway-Model-Name MUST always be set by any ModelSelector implementation. |
| 91 | +// Plugins implementing the ModelSelector interface should always return mutatedBodyBytes even if the body is not mutated: in this case, the original bodyBytes MUST be returned. |
| 92 | +// Typically, but not necessarily, an implementation of ModelSelector will require full body parsing to get access to the full body (e.g., for semantic routing). |
| 93 | + |
| 94 | +type ModelSelector interface { |
| 95 | + BBRPlugin |
| 96 | + Select(ctx context.Context, requestBodyBytes []byte, sharedMemory interface{}) ( |
| 97 | + headers map[string]string, |
| 98 | + mutatedBodyBytes []byte, err error) |
| 99 | +} |
| 100 | + |
| 101 | +// placeholder for BBRPlugin constructors |
| 102 | +type PluginFactoryFunc func() bbrplugins.BBRPlugin //concrete constructors are assigned to this type |
| 103 | + |
| 104 | +// PluginRegistry defines operations for managing plugin factories and plugin instances |
| 105 | +type PluginRegistry interface { |
| 106 | + RegisterFactory(typeKey string, factory PluginFactoryFunc) error //constructors |
| 107 | + RegisterPlugin(plugin bbrplugins.BBRPlugin) error //registers a plugin instance (the instance MUST be created via the factory first) |
| 108 | + GetFactory(typeKey string) (PluginFactoryFunc, error) |
| 109 | + GetPlugin(typeKey string) (bbrplugins.BBRPlugin, error) |
| 110 | + GetFactories() map[string]PluginFactoryFunc |
| 111 | + GetPlugins() map[string]bbrplugins.BBRPlugin |
| 112 | + ListPlugins() []string |
| 113 | + ListFactories() []string |
| 114 | + CreatePlugin(typeKey string) (bbrplugins.BBRPlugin, error) |
| 115 | + ContainsFactory(typeKey string) bool |
| 116 | + ContainsPlugin(typeKey string) bool |
| 117 | + String() string //human readable string for logging |
| 118 | +} |
| 119 | + |
| 120 | +// PluginsChain is used to define a specific order of execution of the BBRPlugin instances stored in the registry |
| 121 | +// The BBRPlugin instances |
| 122 | +type PluginsChain interface { |
| 123 | + AddPlugin(typeKey string, registry PluginRegistry) error //to be added to the chain the plugin should be registered in the registry first |
| 124 | + AddPluginAtInd(typeKey string, i int, r PluginRegistry) error //only affects the instance of the plugin chain |
| 125 | + GetPlugin(index int, registry PluginRegistry) (bbrplugins.BBRPlugin, error) //retrieves i-th plugin as defined in the chain from the registry |
| 126 | + Length() int |
| 127 | + ParseChatCompletion(data []byte) (openai.ChatCompletionNewParams, error) //parses the bytes slice into an appropriate openai-go struct |
| 128 | + ParseCompletion(data []byte) (openai.CompletionNewParams, error) //likewise |
| 129 | + GetSharedMemory(which string) interface{} //returns the appropriate shared openai-go struct, depending on whether which |
| 130 | + //corresponds to the Completion or the ChatCompletion endpoint requested in the body |
| 131 | + String() string |
| 132 | +} |
| 133 | +//NOTE: for simplicity, in the initial PR, a PluginsChain instance will be defined for requests only |
| 134 | + |
| 135 | +// ------------------------- Supported plugin interfaces ----------------------------------------------------------- |
| 136 | +// The key in the SupportedInterfaces map is the BBRPlugin type and the value is a slice of supported implementations |
| 137 | +// for this BBRPlugin type. |
| 138 | +// Edit SupportedInterfaces map when new interfaces and/or implementations are added/removed |
| 139 | +var SupportedInterfaces = map[string][]string{ |
| 140 | + "MetadataExtractor": {"simple-model-extractor"}, |
| 141 | + "ModelSelector": {"semantic-model-selector-uniroute", "semantic-model-selector-knn"}, //future |
| 142 | + "GuardRail": {"obscenity-blocker", "pid-disclosure-blocker"}, //future |
| 143 | +} |
| 144 | +``` |
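To make the `MetadataExtractor` contract more concrete, here is a minimal sketch of a read-only extractor together with one possible way a chain runner could merge headers safely. The names `simpleModelExtractor` and `mergeHeaders` are hypothetical, the merge policy (conflicting values are an error) is an assumption since the proposal does not define what "safely merge" means, and `metaDataKeys` and `sharedMemory` are accepted only to match the proposed signature.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
)

const modelHeader = "X-Gateway-Model-Name"

// simpleModelExtractor sketches a read-only MetadataExtractor.
type simpleModelExtractor struct{}

// RequiresFullParsing is false: only the "model" field is decoded.
func (simpleModelExtractor) RequiresFullParsing() bool { return false }

// Extract decodes just the model name and returns it as a header.
// metaDataKeys and sharedMemory are unused in this sketch.
func (simpleModelExtractor) Extract(_ context.Context, requestBodyBytes []byte,
	_ []string, _ interface{}) (map[string]string, error) {
	var partial struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(requestBodyBytes, &partial); err != nil {
		return nil, fmt.Errorf("decoding request body: %w", err)
	}
	if partial.Model == "" {
		return nil, errors.New("request body has no model field")
	}
	return map[string]string{modelHeader: partial.Model}, nil
}

// mergeHeaders copies src into dst. As an assumed policy, two plugins setting
// different values for the same key is an error, and dst is left untouched.
func mergeHeaders(dst, src map[string]string) error {
	for k, v := range src {
		if old, ok := dst[k]; ok && old != v {
			return fmt.Errorf("conflicting values for header %q: %q vs %q", k, old, v)
		}
	}
	for k, v := range src {
		dst[k] = v
	}
	return nil
}

func main() {
	body := []byte(`{"model":"llama-3","prompt":"hi"}`)
	merged := map[string]string{}

	headers, err := simpleModelExtractor{}.Extract(context.Background(), body, nil, nil)
	if err != nil {
		panic(err)
	}
	if err := mergeHeaders(merged, headers); err != nil {
		panic(err)
	}
	fmt.Println(merged[modelHeader])
}
```

Other merge policies, such as last-writer-wins or per-plugin header namespaces, would fit the same interface; the choice is left open here.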
| 145 | + |
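The `ModelSelector` rules stated in the interface comments (always set `X-Gateway-Model-Name`, always return body bytes, return the original bytes when nothing changes) can be sketched with a deliberately trivial selector. `aliasSelector` and its alias table are hypothetical and stand in for real selection logic such as semantic routing; `sharedMemory` is ignored.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
)

const modelHeader = "X-Gateway-Model-Name"

// aliasSelector is a hypothetical ModelSelector that rewrites the requested
// model according to a static alias table.
type aliasSelector struct {
	aliases map[string]string
}

// RequiresFullParsing is false: the selector decodes its own private copy.
func (s aliasSelector) RequiresFullParsing() bool { return false }

// Select always sets X-Gateway-Model-Name. It returns the original bytes when
// no alias matches, and a re-encoded body with the new model otherwise.
func (s aliasSelector) Select(_ context.Context, requestBodyBytes []byte,
	_ interface{}) (map[string]string, []byte, error) {
	// Decode into the plugin's own map so the caller's bytes are never modified.
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(requestBodyBytes, &fields); err != nil {
		return nil, nil, fmt.Errorf("decoding request body: %w", err)
	}
	var requested string
	if err := json.Unmarshal(fields["model"], &requested); err != nil || requested == "" {
		return nil, nil, errors.New("request body has no usable model field")
	}

	chosen, ok := s.aliases[requested]
	if !ok {
		// Not mutated: hand back the original bytes unchanged.
		return map[string]string{modelHeader: requested}, requestBodyBytes, nil
	}

	raw, err := json.Marshal(chosen)
	if err != nil {
		return nil, nil, fmt.Errorf("encoding model name: %w", err)
	}
	fields["model"] = raw
	mutated, err := json.Marshal(fields)
	if err != nil {
		return nil, nil, fmt.Errorf("encoding request body: %w", err)
	}
	return map[string]string{modelHeader: chosen}, mutated, nil
}

func main() {
	sel := aliasSelector{aliases: map[string]string{"default": "llama-3"}}
	body := []byte(`{"model":"default","prompt":"hi"}`)

	headers, mutated, err := sel.Select(context.Background(), body, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(headers[modelHeader], string(mutated))
}
```

Re-encoding through a map does not preserve the original key order or whitespace of the body, which is acceptable for JSON but worth keeping in mind if byte-for-byte stability matters.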
| 146 | +### Implementation Phases |
| 147 | + |
| 148 | +The pluggable framework will be implemented iteratively over several phases. |
| 149 | + |
| 150 | +1. Introduce the `BBRPlugin` and `MetadataExtractor` interfaces, the registry, the plugins chain, a sample plugin implementation (`SimpleModelExtraction`), and its factory. Plugin configuration will be implemented via command-line flags |
| 151 | +1. Introduce a second plugin interface, `ModelSelector`, and a sample plugin implementation |
| 152 | +1. Introduce the shared struct (shared among the plugins of a plugins chain) |
| 153 | +1. Introduce an interface for guardrail plugins, introduce a simple reference implementation, and experiment with plugins chains on request and response messages |
| 154 | +1. Refactor metrics as needed to work with the new pluggable framework |
| 155 | +1. Implement configuration via manifests similar to those in EPP |
| 156 | +1. Implement `PluginsDAG` to allow for more complex topological ordering and concurrency |
| 157 | +1. Continuously learn lessons from this implementation and the scheduling framework to improve the implementation |
| 158 | +1. Aim for alignment and cross-pollination with the [AI GW WG](https://github.com/kubernetes-sigs/wg-ai-gateway) |
| 159 | + |
| 160 | +## Open Questions |
| 161 | + |
| 162 | +1. More elaborate shared memory architecture for the best performance |
| 163 | +1. TBA |