-
Notifications
You must be signed in to change notification settings - Fork 207
BBR pluggable framework proposal #1964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,251 @@ | ||||||
| # Pluggable Body-Based Routing (BBR) Framework | ||||||
|
|
||||||
| Author(s): @davidbreitgand @srampal | ||||||
|
|
||||||
| ## Proposal Status | ||||||
|
|
||||||
| ***Draft*** | ||||||
|
|
||||||
| ## Summary | ||||||
|
|
||||||
| The Gateway API Inference Extension (v1.2.1) includes an initial implementation of Body-Based Routing (BBR). Currently, BBR provides a single capability: it extracts the model name from the request body and adds it to the `X-Gateway-Model-Name` header. This header is then used to route the request to the appropriate InferencePool and its associated Endpoint Picker Extension (EPP) instances. | ||||||
|
|
||||||
| The current BBR implementation is limited and lacks extensibility. Similar to the [pluggability introduced in the scheduling subsystem](../0845-scheduler-architecture-proposal/README.md), BBR should support custom extensions without requiring modifications to the GIE code base. | ||||||
|
|
||||||
| This proposal introduces a plugin architecture for BBR that allows developers to implement custom logic. Plugins could be organized into a chain or DAG for ordered and concurrent execution. | ||||||
|
|
||||||
| See [this document](https://docs.google.com/document/d/1So9uRjZrLUHf7Rjv13xy_ip3_5HSI1cn1stS3EsXLWg/edit?tab=t.0#heading=h.55jwocr94axs) for additional context amd reference. | ||||||
|
|
||||||
| ## Goals | ||||||
|
|
||||||
| The pluggable BBR Framework aims at addressing the following goals | ||||||
|
|
||||||
| - Avoid monolithic architecture | ||||||
| - Mimic pluggability and configurability of the scheduling subsystem without coupling between the two | ||||||
| - Enable organizing plugins into a topology for ordered and concurrent execution | ||||||
| - Avoid redundant recurrent body parsing across plugins in a topology for the sake of performance | ||||||
| - Limit changes to the BBR feature to avoid any changes in the rest of the code base | ||||||
| - Follow best practices and experience from the Scheduling subsystem | ||||||
| pluggability effort. For example, extending the system to support the above | ||||||
| should be through implementing well defined `Plugin` interfaces and registering | ||||||
| them in the BBR subsystem; any configuration would be done in the | ||||||
| same way (e.g., code and/or configuration file) | ||||||
| - Reuse common code from EPP, such as `TypedName`, wherever make sense, but avoid reusing specialized code with non-BBR functionality to avoid abuse | ||||||
| - Enable extensible collection and registration of metrics using lessons from the pluggable scheduling sub-system | ||||||
| - Provide reference plugin implementations. | ||||||
|
|
||||||
| ## Non-Goals | ||||||
|
|
||||||
| - Modify existing GIE abstractions | ||||||
| - Fully align plugins, registries, and factories across BBR and EPP | ||||||
| - Dynamically reconfigure plugins and plugin topologies at runtime | ||||||
|
|
||||||
| ## Proposal | ||||||
|
|
||||||
| ### Overview | ||||||
|
|
||||||
| There is an embedded `BBRPlugin` interface building on the `Plugin` interface adopted from EPP. This interface should be implemented by any BBR plugin. Each pluigin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name()` returns the string representing the plugins implementation. BBR is refactored to implement the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and concrete implementations created by the factories. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
nit: In EPP factories are a concrete implementation and not an interface. What is the reason for the factory abstraction? |
||||||
| In addition, a `PluginsChain` interface is defined to define an order of plugin executions. In the future, `PluginsChain` will be replaced by `PluginsDAG` to allow for more complex topological order and concurrency. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the first implementation it might be worthwhile to treat it as a single plugin solely for "body based routing" functionality. Chains and DAGs might be desireable once the BBR changes to support body based processing and not just routing. |
||||||
|
|
||||||
| `PluginsChain` only contains ordered `BBRPlugin` types registered in the `PluginRegistry`. `RequestPluginsChain` and `ResponsePluginsChain` are optionally configured for handling requests and responses respectively. If no configuration is provided, default `PluginsChain` instances will be configured automatically. | ||||||
|
|
||||||
| Depending on a `BBRPlugin` functionality and implementation, the plugin might require full or selective body parsing. To save the parsing overhead, if there is at least one `BBRPlugin` in the `PluginsChain` that requires full body parsing, the parsing is performed only once into a shared official appropriate `openai-go` struct (either `openai.CompletionNewParams` or `openai.ChatCompletionNewParams` depending on the request endpoint). This struct is shared for read-only to all plugins in the chain. Each `BBRplugin` receives the shared struct by value. If a plugin needs to mutate the body, in the initial implementation, it MUST work on its own copy, and the a mutated body is returned separately by each plugiin. | ||||||
|
|
||||||
| ### Suggested Components | ||||||
|
|
||||||
| The sketch of the proposed framework is shown in the figure below. | ||||||
| <img src="./images/pluggable-framework-architecture-sketch.png" alt="Components of the proposed framework" width="1000" /> | ||||||
|
|
||||||
| ### Suggested BBR Pluggable Framework Interfaces | ||||||
|
|
||||||
| ```go | ||||||
| // ------------------------------------ Defaults ------------------------------------------ | ||||||
| const DefaultPluginType = "MetadataExtractor" | ||||||
| const DefaultPluginImplementation = "simple-model-selector" | ||||||
|
|
||||||
| // BBRPlugin defines the interface for plugins in the BBR framework | ||||||
| type BBRPlugin interface { | ||||||
| plugins.Plugin | ||||||
|
|
||||||
| // RequiresFullParsing indicates whether full body parsing is required | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: consider using type assertion where presence of |
||||||
| // to facilitate efficient memory sharing across plugins in a chain. | ||||||
| RequiresFullParsing() bool | ||||||
|
|
||||||
| // Execute runs the plugin logic on the request body. | ||||||
| // A plugin's imnplementation logic CAN mutate the body of the message. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
| // A plugin's implementation MUST return a map of headers | ||||||
| // If no headers are set by the implementation, the map must be empty | ||||||
| // A value of a header in an extended implementation NEED NOT to be identical to the value of that same header as would be set | ||||||
| // in a default implementation. | ||||||
| // Example: in the body of a request model is set to "semantic-model-selector", | ||||||
| // which, say, stands for "select a best model for this request at minimal cost" | ||||||
| // A plugin implementation of "semantic-model-selector" sets X-Gateway-Model-Name to any valid | ||||||
| // model name from the inventory of the backend models and also mutates the body accordingly | ||||||
| // In contrast, | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sentence cut off? |
||||||
| Execute(requestBodyBytes []byte) ( | ||||||
| headers map[string]string, | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not clear from description if
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. also, note that HTTP headers are |
||||||
| mutatedBodyBytes []byte, | ||||||
| err error, | ||||||
| ) | ||||||
| } | ||||||
|
|
||||||
|
|
||||||
| // placeholder for BBRPlugin constructors | ||||||
| type PluginFactoryFunc func() bbrplugins.BBRPlugin //concrete constructors are assigned to this type | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: please add empty space following |
||||||
|
|
||||||
| // PluginRegistry defines operations for managing plugin factories and plugin instances | ||||||
| type PluginRegistry interface { | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. would be worthwhile to explain need for interface and not concrete type. |
||||||
| RegisterFactory(typeKey string, factory PluginFactoryFunc) error //constructors | ||||||
| RegisterPlugin(plugin bbrplugins.BBRPlugin) error //registers a plugin instance (the instance MUST be created via the factory first) | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Explain rationale for having both factories and plugin instance in the same interface/structure. |
||||||
| GetFactory(typeKey string) (PluginFactoryFunc, error) | ||||||
| GetPlugin(typeKey string) (bbrplugins.BBRPlugin, error) | ||||||
| GetFactories() map[string]PluginFactoryFunc | ||||||
| GetPlugins() map[string]bbrplugins.BBRPlugin | ||||||
| ListPlugins() []string | ||||||
| ListFactories() []string | ||||||
| CreatePlugin(typeKey string) (bbrplugins.BBRPlugin, error) | ||||||
| ContainsFactory(typeKey string) bool | ||||||
| ContainsPlugin(typeKey string) bool | ||||||
| String() string //human readable string for logging | ||||||
| } | ||||||
|
|
||||||
| // PluginsChain is used to define a specific order of execution of the BBRPlugin instances stored in the registry | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggest moving this to future (once the need arises, will have concrete requirements for chain or DAG, parallel or sequential execution, etc.). |
||||||
| // The BBRPlugin instances | ||||||
| type PluginsChain interface { | ||||||
| AddPlugin(typeKey string, registry PluginRegistry) error //to be added to the chain the plugin should be registered in the registry first | ||||||
| AddPluginAtInd(typeKey string, i int, r PluginRegistry) error //only affects the instance of the plugin chain | ||||||
| GetPlugin(index int, registry PluginRegistry) (bbrplugins.BBRPlugin, error) //retrieves i-th plugin as defined in the chain from the registry | ||||||
| Length() int | ||||||
| ParseChatCompletion(data []byte) (openai.ChatCompletionNewParams, error) //parses the bytes slice into an appropriate openai-go struct | ||||||
| ParseCompletion(data []byte) (openai.CompletionNewParams, error) //likewise | ||||||
| GetSharedMemory(which string) interface{} //returns an appropriate shared open-ai struct dependent on whether which | ||||||
| //corresponds to Completion or ChatCompletion endpoint requested in the body | ||||||
| Run(bodyBytes []byte, registry PluginRegistry) ([]byte, map[string]string, error) //return potentially mutated body and all headers map safely merged | ||||||
| String() string | ||||||
| } | ||||||
| //NOTE: for simplicity, in the initial PR, PluginsChain instance will be defined request only | ||||||
| ``` | ||||||
|
|
||||||
| ### Defaults | ||||||
|
|
||||||
| ```go | ||||||
|
|
||||||
| const ( | ||||||
| //A deafult plugin implementation of this plugin type will always be configured for request plugins chain | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
nit: please close comment lines with |
||||||
| //Even though BBRPlugin type is not (yet) a K8s resource, it's logically akin to `kind` | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not a limitation we have in EPP. Perhaps delay it until a more formal k8s Plugin type is defined in some AI gateway work stream? |
||||||
| //MUST start wit an upper case letter, use CamelNotation, only aplhanumericals after the first letter | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
| PluginTypePattern = `^[A-Z][A-Za-z0-9]*$` | ||||||
| MaxPluginTypeLength = 63 | ||||||
| DefaultPluginType = "MetaDataExtractor" | ||||||
| // Even though BBRPlugin is not a K8s resource yet, let's make its naming compliant with K8s resource naming | ||||||
| // Allows: lowercase letters, digits, hyphens, dots. | ||||||
| // Must start and end with a lowercase alphanumeric character. | ||||||
| // Middle characters group can contain lowercase alphanumerics, hyphens, and dots | ||||||
| // Middle and rightmost groups are optional | ||||||
| PluginNamePattern = `^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$` | ||||||
| DefaultPluginName = "simple-model-extractor" | ||||||
| MaxPluginNameLength = 253 | ||||||
| //Well-known custom header set to a model name | ||||||
| ModelHeader = "X-Gateway-Model-Name" | ||||||
| ) | ||||||
| ``` | ||||||
|
|
||||||
| ### Current BBR reimplementation as BBRPlugin | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure we need to have (most of) the code duplicated. Suffice to say the current implementation can be made compliant with the proposed interface. |
||||||
|
|
||||||
| ```go | ||||||
| / ------------------------------------ DEFAULT PLUGIN IMPLEMENTATION ---------------------------------------------- | ||||||
|
|
||||||
| type simpleModelExtractor struct { //implements the MetadataExtractor interface | ||||||
| typedName plugins.TypedName | ||||||
| requiresFullParsing bool | ||||||
| } | ||||||
|
|
||||||
| // defaultMetaDataExtractor implements the MetadataExtractor interface and extracts only the mmodel name AS-IS | ||||||
| type defaultMetaDataExtractor struct { | ||||||
| typedName plugins.TypedName | ||||||
| requiresFullParsing bool //this field will be used to determine whether shared struct should be created in this chain | ||||||
| } | ||||||
|
|
||||||
| // NewSimpleModelExtractor is a factory that constructs SimpleModelExtractor plugin | ||||||
| // A developer who wishes to create her own implementation, will implement the BBRPlugin interface and | ||||||
| // use Registry and PluginsChain to register and execute the plugin (together with other plugins in a chain) | ||||||
| func NewDefaultMetaDataExtractor() BBRPlugin { | ||||||
| return &defaultMetaDataExtractor{ | ||||||
| typedName: plugins.TypedName{ | ||||||
| Type: DefaultPluginType, | ||||||
| Name: "simple-model-extractor", | ||||||
| }, | ||||||
| requiresFullParsing: false, | ||||||
| } | ||||||
| } | ||||||
|
|
||||||
| func (s *defaultMetaDataExtractor) RequiresFullParsing() bool { | ||||||
| return s.requiresFullParsing | ||||||
| } | ||||||
|
|
||||||
| func (s *defaultMetaDataExtractor) TypedName() plugins.TypedName { | ||||||
| return s.typedName | ||||||
| } | ||||||
|
|
||||||
| // Execute extracts the "model" from the JSON request body and sets X-Gateway-Model-Name header. | ||||||
| // This implementation intentionally ignores metaDataKeys and does not mutate the body. | ||||||
| // It expects the request body to be a JSON object containing a "model" field. | ||||||
| // A nil for metaDataKeysToHeaders map SHOULD be specified by a caller for clarity | ||||||
| // The metaDataKeysToHeaders is explicitly ignored in this implementation | ||||||
| // This implementation is simply refactoring of the default BBR implementation to work with the pluggable framework | ||||||
| func (s *defaultMetaDataExtractor) Execute(requestBodyBytes []byte) ( | ||||||
| headers map[string]string, | ||||||
| mutatedBodyBytes []byte, | ||||||
| err error) { | ||||||
|
|
||||||
| type RequestBody struct { | ||||||
| Model string `json:"model"` | ||||||
| } | ||||||
|
|
||||||
| h := make(map[string]string) | ||||||
|
|
||||||
| var requestBody RequestBody | ||||||
|
|
||||||
| if err := json.Unmarshal(requestBodyBytes, &requestBody); err != nil { | ||||||
| // return original body on decode failure | ||||||
| return nil, requestBodyBytes, err | ||||||
| } | ||||||
|
|
||||||
| if requestBody.Model == "" { | ||||||
| return nil, requestBodyBytes, fmt.Errorf("missing required field: model") | ||||||
| } | ||||||
|
|
||||||
| // ModelHeader is a constant defined in ./pkg/bbr/plugins/interfaces | ||||||
| h[ModelHeader] = requestBody.Model | ||||||
|
|
||||||
| // Body is not mutated in this implementation hence returning original requestBodyBytes. This is intentional. | ||||||
| return h, requestBodyBytes, nil | ||||||
| } | ||||||
|
|
||||||
| func (s *defaultMetaDataExtractor) String() string { | ||||||
| return fmt.Sprintf(("BBRPlugin{%v/requiresFullParsing=%v}"), s.TypedName(), s.requiresFullParsing) | ||||||
| } | ||||||
| ``` | ||||||
|
|
||||||
| ### Implementation Phases | ||||||
|
|
||||||
| The pluggable framework will be implemented iteratively over several phases. | ||||||
|
|
||||||
| 1. Introduce `BBRPlugin` `MetadataExtractor`, interface, registry, plugins chain, sample plugin implementation (`SimpleModelExtraction`) and its factory. Plugin configuration will be implemented via environment variables set in helm chart | ||||||
| 1. Introduce a second plugin interface, `ModelSelector` and sample plugin implementation | ||||||
| 1. Introduce shared struct (shared among the plugins of a plugins chain) | ||||||
| 1. Introduce an interface for guardrail plugin, introduce simple reference implementation, experiment with plugins chains on request and response messages | ||||||
| 1. Refactor metrics as needed to work with the new pluggable framework | ||||||
| 1. Implement configuration via manifests similar to those in EPP | ||||||
| 1. Implement `PluginsDAG` to allow for more complex topological order and concurrency. | ||||||
| 1. Continously learn lessons from this implementation and scheduling framework to improve the implementation | ||||||
| 1. Aim at aligning and cross-polination with the [AI GW WG]("https://github.com/kubernetes-sigs/wg-ai-gateway"). | ||||||
|
|
||||||
| ## Open Questions | ||||||
|
|
||||||
| 1. More elaborate shared memory architecture for the best performance | ||||||
| 1. TBA | ||||||
|
|
||||||
| ## Note | ||||||
|
|
||||||
| The proposed interfaces can slightly change from those implemented in the initial [PR 1981](https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1981) | ||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider grouping goals into sections (e.g., splitting goals into required and optional, generalizing into categories).
We might be able to support ordered and concurrent execution at a later stage, when a concrete requirement is defined.