Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 21 additions & 16 deletions examples/integrations/litellm_proxy/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ Available hook files:
- Basic replay-only hook: `context_compiler_precall_hook.py`
- Preprocessor-enabled hook: `context_compiler_precall_hook_with_preprocessor.py`

### Requirements
## Requirements

```shell
pip install "context-compiler[litellm_proxy]"
Expand All @@ -24,7 +24,7 @@ For `context_compiler_precall_hook_with_preprocessor.py`:
pip install "context-compiler[experimental]"
```

### Quickstart (copy/paste)
## Quickstart (copy/paste)

From the repo root:

Expand All @@ -34,25 +34,31 @@ export OPENAI_API_KEY=...
litellm --config examples/integrations/litellm_proxy/config.example.yaml
```

### Run proxy
`config.example.yaml` includes both OpenAI and Ollama model definitions.
Use the Ollama model entry for local testing without API credentials.

## Run proxy

Typical startup command (environment-sensitive):

```shell
litellm --config config.example.yaml
```

Hook behavior in this directory is smoke-validated. Proxy server startup with
`litellm --config ...` is environment-sensitive (callback import resolution) and
was not re-validated end-to-end as-is in the latest smoke pass with
`litellm==1.83.7`.
Hook behavior and proxy startup were re-validated end-to-end with
`litellm==1.88.2`.

Validated behaviors:

- passthrough: upstream model called normally
- update: compiler state injected before upstream model call
- clarify: request blocked before upstream model call and surfaced as HTTP 400

The proxy runs on `http://localhost:4000` by default.
By default, `config.example.yaml` points to the basic replay-only hook.
To use the preprocessor variant, switch the callback path in the config.
Run from the repo root, or set `PYTHONPATH` so `examples.integrations...` callback imports resolve.

### Make a request
## Make a request

```python
from openai import OpenAI
Expand Down Expand Up @@ -80,10 +86,10 @@ curl http://localhost:4000/v1/chat/completions \
}'
```

### Behavior
## Behavior

- User messages are replayed through Context Compiler before the model call.
- If result is `clarify`, the proxy returns clarification text and does not call the model.
- If result is `clarify`, the proxy does not call the model and LiteLLM surfaces the clarification as an HTTP 400 response.
- If result is `passthrough`, the proxy forwards the request normally.
- If result is `update`, the proxy injects compiler state as a system message and then calls the model.

Expand All @@ -105,13 +111,12 @@ export PREPROCESSOR_PROMPT_PROFILE=default
For heuristic-first usage, keep `PREPROCESSOR_PROMPT_PROFILE=default`.
Use `llama` only for LLM-only preprocessing with Llama-family models.

### Note
## Note

- The callback path in `config.example.yaml` must be importable.
Run the proxy from the repo root or set `PYTHONPATH` accordingly.
- The callback path in `config.example.yaml` must be importable by LiteLLM.

### Troubleshooting
## Troubleshooting

- `ModuleNotFoundError` for callback path: run from repo root, or set `PYTHONPATH=<repo-root>`.
- Callback import failures: verify the callback path configured in `config.example.yaml` is importable in the current LiteLLM environment.
- proxy starts but upstream calls fail: check `OPENAI_API_KEY` and upstream model/provider config in `config.example.yaml`.
- preprocessor fallback issues: `PREPROCESSOR_MODEL` defaults to `MODEL`; set it explicitly only when using a separate fallback model.
9 changes: 7 additions & 2 deletions examples/integrations/litellm_proxy/config.example.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,14 @@ model_list:
model: openai/gpt-4o-mini
api_key: os.environ/OPENAI_API_KEY

- model_name: llama3.1
litellm_params:
model: ollama/llama3.1:8b
api_base: http://localhost:11434

litellm_settings:
callbacks:
# Basic replay-only hook:
- examples.integrations.litellm_proxy.context_compiler_precall_hook.proxy_handler_instance
- context_compiler_precall_hook.proxy_handler_instance
# Preprocessor-enabled replay hook (use this instead of the basic hook):
# - examples.integrations.litellm_proxy.context_compiler_precall_hook_with_preprocessor.proxy_handler_instance
# - context_compiler_precall_hook_with_preprocessor.proxy_handler_instance
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "context-compiler"
version = "0.7.6"
version = "0.7.7"
description = "Deterministic conversational state engine for LLM applications."
readme = "README.md"
requires-python = ">=3.11"
Expand Down
2 changes: 1 addition & 1 deletion uv.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading