diff --git a/examples/integrations/litellm_proxy/README.md b/examples/integrations/litellm_proxy/README.md index 5e6b907..3aa7954 100644 --- a/examples/integrations/litellm_proxy/README.md +++ b/examples/integrations/litellm_proxy/README.md @@ -8,7 +8,7 @@ Available hook files: - Basic replay-only hook: `context_compiler_precall_hook.py` - Preprocessor-enabled hook: `context_compiler_precall_hook_with_preprocessor.py` -### Requirements +## Requirements ```shell pip install "context-compiler[litellm_proxy]" @@ -24,7 +24,7 @@ For `context_compiler_precall_hook_with_preprocessor.py`: pip install "context-compiler[experimental]" ``` -### Quickstart (copy/paste) +## Quickstart (copy/paste) From the repo root: @@ -34,7 +34,10 @@ export OPENAI_API_KEY=... litellm --config examples/integrations/litellm_proxy/config.example.yaml ``` -### Run proxy +`config.example.yaml` includes both OpenAI and Ollama model definitions. +Use the Ollama model entry for local testing without API credentials. + +## Run proxy Typical startup command (environment-sensitive): @@ -42,17 +45,20 @@ Typical startup command (environment-sensitive): litellm --config config.example.yaml ``` -Hook behavior in this directory is smoke-validated. Proxy server startup with -`litellm --config ...` is environment-sensitive (callback import resolution) and -was not re-validated end-to-end as-is in the latest smoke pass with -`litellm==1.83.7`. +Hook behavior and proxy startup were re-validated end-to-end with +`litellm==1.88.2`. + +Validated behaviors: + +- passthrough: upstream model called normally +- update: compiler state injected before upstream model call +- clarify: request blocked before upstream model call and surfaced as HTTP 400 The proxy runs on `http://localhost:4000` by default. By default, `config.example.yaml` points to the basic replay-only hook. To use the preprocessor variant, switch the callback path in the config. -Run from the repo root, or set `PYTHONPATH` so `examples.integrations...` callback imports resolve. -### Make a request +## Make a request ```python from openai import OpenAI @@ -80,10 +86,10 @@ curl http://localhost:4000/v1/chat/completions \ }' ``` -### Behavior +## Behavior - User messages are replayed through Context Compiler before the model call. -- If result is `clarify`, the proxy returns clarification text and does not call the model. +- If result is `clarify`, the proxy does not call the model and LiteLLM surfaces the clarification as an HTTP 400 response. - If result is `passthrough`, the proxy forwards the request normally. - If result is `update`, the proxy injects compiler state as a system message and then calls the model. @@ -105,13 +111,12 @@ export PREPROCESSOR_PROMPT_PROFILE=default For heuristic-first usage, keep `PREPROCESSOR_PROMPT_PROFILE=default`. Use `llama` only for LLM-only preprocessing with Llama-family models. -### Note +## Note -- The callback path in `config.example.yaml` must be importable. - Run the proxy from the repo root or set `PYTHONPATH` accordingly. +- The callback path in `config.example.yaml` must be importable by LiteLLM. -### Troubleshooting +## Troubleshooting -- `ModuleNotFoundError` for callback path: run from repo root, or set `PYTHONPATH=`. +- Callback import failures: verify the callback path configured in `config.example.yaml` is importable in the current LiteLLM environment. - proxy starts but upstream calls fail: check `OPENAI_API_KEY` and upstream model/provider config in `config.example.yaml`. - preprocessor fallback issues: `PREPROCESSOR_MODEL` defaults to `MODEL`; set it explicitly only when using a separate fallback model. diff --git a/examples/integrations/litellm_proxy/config.example.yaml b/examples/integrations/litellm_proxy/config.example.yaml index 83b7145..3c970f6 100644 --- a/examples/integrations/litellm_proxy/config.example.yaml +++ b/examples/integrations/litellm_proxy/config.example.yaml @@ -6,9 +6,14 @@ model_list: model: openai/gpt-4o-mini api_key: os.environ/OPENAI_API_KEY + - model_name: llama3.1 + litellm_params: + model: ollama/llama3.1:8b + api_base: http://localhost:11434 + litellm_settings: callbacks: # Basic replay-only hook: - - examples.integrations.litellm_proxy.context_compiler_precall_hook.proxy_handler_instance + - context_compiler_precall_hook.proxy_handler_instance # Preprocessor-enabled replay hook (use this instead of the basic hook): - # - examples.integrations.litellm_proxy.context_compiler_precall_hook_with_preprocessor.proxy_handler_instance + # - context_compiler_precall_hook_with_preprocessor.proxy_handler_instance diff --git a/pyproject.toml b/pyproject.toml index 265b8a0..4f2e924 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "hatchling.build" [project] name = "context-compiler" -version = "0.7.6" +version = "0.7.7" description = "Deterministic conversational state engine for LLM applications." readme = "README.md" requires-python = ">=3.11" diff --git a/uv.lock b/uv.lock index ea948eb..ad4a45c 100644 --- a/uv.lock +++ b/uv.lock @@ -468,7 +468,7 @@ wheels = [ [[package]] name = "context-compiler" -version = "0.7.6" +version = "0.7.7" source = { editable = "." } [package.optional-dependencies]