rlippmann · rlippmann · May 30, 2026 · May 30, 2026 · May 30, 2026 · May 30, 2026
diff --git a/examples/integrations/litellm_proxy/README.md b/examples/integrations/litellm_proxy/README.md
@@ -8,7 +8,7 @@ Available hook files:
 - Basic replay-only hook: `context_compiler_precall_hook.py`
 - Preprocessor-enabled hook: `context_compiler_precall_hook_with_preprocessor.py`
 
-### Requirements
+## Requirements
 
 ```shell
 pip install "context-compiler[litellm_proxy]"
@@ -24,7 +24,7 @@ For `context_compiler_precall_hook_with_preprocessor.py`:
 pip install "context-compiler[experimental]"
 ```
 
-### Quickstart (copy/paste)
+## Quickstart (copy/paste)
 
 From the repo root:
 
@@ -34,25 +34,31 @@ export OPENAI_API_KEY=...
 litellm --config examples/integrations/litellm_proxy/config.example.yaml
 ```
 
-### Run proxy
+`config.example.yaml` includes both OpenAI and Ollama model definitions.
+Use the Ollama model entry for local testing without API credentials.
+
+## Run proxy
 
 Typical startup command (environment-sensitive):
 
 ```shell
 litellm --config config.example.yaml
 ```
 
-Hook behavior in this directory is smoke-validated. Proxy server startup with
-`litellm --config ...` is environment-sensitive (callback import resolution) and
-was not re-validated end-to-end as-is in the latest smoke pass with
-`litellm==1.83.7`.
+Hook behavior and proxy startup were re-validated end-to-end with
+`litellm==1.88.2`.
+
+Validated behaviors:
+
+- passthrough: upstream model called normally
+- update: compiler state injected before upstream model call
+- clarify: request blocked before upstream model call and surfaced as HTTP 400
 
 The proxy runs on `http://localhost:4000` by default.
 By default, `config.example.yaml` points to the basic replay-only hook.
 To use the preprocessor variant, switch the callback path in the config.
-Run from the repo root, or set `PYTHONPATH` so `examples.integrations...` callback imports resolve.
 
-### Make a request
+## Make a request
 
 ```python
 from openai import OpenAI
@@ -80,10 +86,10 @@ curl http://localhost:4000/v1/chat/completions \
   }'
 ```
 
-### Behavior
+## Behavior
 
 - User messages are replayed through Context Compiler before the model call.
-- If result is `clarify`, the proxy returns clarification text and does not call the model.
+- If result is `clarify`, the proxy does not call the model and LiteLLM surfaces the clarification as an HTTP 400 response.
 - If result is `passthrough`, the proxy forwards the request normally.
 - If result is `update`, the proxy injects compiler state as a system message and then calls the model.
 
@@ -105,13 +111,12 @@ export PREPROCESSOR_PROMPT_PROFILE=default
 For heuristic-first usage, keep `PREPROCESSOR_PROMPT_PROFILE=default`.
 Use `llama` only for LLM-only preprocessing with Llama-family models.
 
-### Note
+## Note
 
-- The callback path in `config.example.yaml` must be importable.
-  Run the proxy from the repo root or set `PYTHONPATH` accordingly.
+- The callback path in `config.example.yaml` must be importable by LiteLLM.
 
-### Troubleshooting
+## Troubleshooting
 
-- `ModuleNotFoundError` for callback path: run from repo root, or set `PYTHONPATH=<repo-root>`.
+- Callback import failures: verify the callback path configured in `config.example.yaml` is importable in the current LiteLLM environment.
 - proxy starts but upstream calls fail: check `OPENAI_API_KEY` and upstream model/provider config in `config.example.yaml`.
 - preprocessor fallback issues: `PREPROCESSOR_MODEL` defaults to `MODEL`; set it explicitly only when using a separate fallback model.
diff --git a/examples/integrations/litellm_proxy/config.example.yaml b/examples/integrations/litellm_proxy/config.example.yaml
@@ -6,9 +6,14 @@ model_list:
       model: openai/gpt-4o-mini
       api_key: os.environ/OPENAI_API_KEY
 
+  - model_name: llama3.1
+    litellm_params:
+      model: ollama/llama3.1:8b
+      api_base: http://localhost:11434
+
 litellm_settings:
   callbacks:
     # Basic replay-only hook:
-    - examples.integrations.litellm_proxy.context_compiler_precall_hook.proxy_handler_instance
+    - context_compiler_precall_hook.proxy_handler_instance
     # Preprocessor-enabled replay hook (use this instead of the basic hook):
-    # - examples.integrations.litellm_proxy.context_compiler_precall_hook_with_preprocessor.proxy_handler_instance
+    # - context_compiler_precall_hook_with_preprocessor.proxy_handler_instance
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "context-compiler"
-version = "0.7.6"
+version = "0.7.7"
 description = "Deterministic conversational state engine for LLM applications."
 readme = "README.md"
 requires-python = ">=3.11"

diff --git a/uv.lock b/uv.lock