This package provides optional preprocessing helpers for app integrations with Context Compiler.
It is experimental and separate from the deterministic core engine in src/.
Model or tool-description translation can help in simple direct cases, but raw model output is not safe for state changes on its own. The app must validate outputs before applying them.
In MCP/tool-calling environments, over-eager tool calling on conversational or ambiguous input is a known failure mode. Conservative preprocessing and validation help reduce unintended mutation.
Recommended install for integrations using this package:
pip install "context-compiler[experimental]".
Integrations should import this package from the installed environment rather than using repo-relative preprocessor paths.
Compatibility note:
- Use
heuristic_preprocessor.pyandparse_preprocessor_output(...).
heuristic_preprocessor.py: conservative structural preprocessing pass.output_validation.py: shared normalization and validation checks.prompt_utils.py: state-aware prompt rendering helper.constants.py: shared protocol literals and directive validation patterns.prompts/default.txt: default runtime prompt.prompts/llama.txt: stricter prompt for Llama-family models in LLM-only mode.
Public validator entry point:
parse_preprocessor_output(raw_output: object, *, source_input: str | None = None) -> str | Nonevalidate_preprocessor_output(raw_output: object, *, source_input: str | None = None) -> dict
All preprocessor outputs (heuristic or LLM) must be validated with
parse_preprocessor_output(...) before being applied.
Classification contract:
directive: safe, validated canonical directive (outputis a directive string)no_directive: confident ordinary content (outputisnull)unknown: unsafe to rewrite (outputisnull)
unknown is reject/abstain behavior. Malformed, ambiguous, mixed-intent,
quoted/reported, unsupported, or unsafe outputs must not be rewritten.
Only validated directive output may be used as rewritten compiler input.
no_directive and unknown must fall back to original user input.
source_input is optional at the API level for backward compatibility.
For integration behavior, it is REQUIRED for LLM fallback validation calls:
pass source_input=<original user text> so source-aware reject rules can
block unsafe rewrites.
Engine-owned near-misses are reject cases (for example set premise to X,
change premise X) and must remain unknown (not rewritten).
Raw preprocessor/LLM outputs must not be passed directly to the compiler. Raw model output must never directly change state.
The preprocessor does not expand directive grammar. It may emit only validated canonical directives that the compiler accepts.
Conservative boundary policy:
- Only process the full message, not pieces of it (no directive mining from prose).
- Emit at most one canonical directive; otherwise abstain.
- No sentence splitting or hidden multi-line extraction.
- No mixed directive + task extraction.
- No markdown/code-block extraction.
- No broad natural-language semantic rewriting.
- Prefer false negatives over false positives.
- Quoted payload tokens inside an otherwise canonical directive (for example
use "docker") are preserved as-is; they are not silently unquoted.
If you need natural-language proposal behavior, use an explicit host workflow that shows suggestions first and asks for confirmation.
- Run
preprocess_heuristic(message). - If a heuristic candidate directive exists, validate it with
parse_preprocessor_output(...). - If no valid directive was produced, run LLM fallback preprocess.
- Validate fallback output with
parse_preprocessor_output(..., source_input=message). - If a valid directive is produced, pass it through a normal compiler input path.
For session-owned integrations, use
engine.step(...). For transcript-based integrations that receive full chat history each turn:- use
context_compiler.compile_transcript(...)for stateless evaluation - use
engine.apply_transcript(...)to update an existing engine Otherwise pass the original user input unchanged.
- use
Decision handling reminder:
passthrough: no directive was applied; handle as ordinary user input.clarify: mutation is blocked; surfaceprompt_to_userand do not treat state as updated.update: validated canonical directive applied; use updated state as the source of truth.
This section is architectural direction, not committed implementation.
Future preprocessing may evolve beyond direct natural-language to directive conversion.
- Policy directives and premise-like facts have different risk profiles.
- Premise-like facts (for example,
I am vegetarian) may be useful persistent context, but should not be auto-persisted without confirmation. - Likely direction:
- conservative directive preprocessing remains separate
- a possible suggestion layer is inspectable, non-mutating, and previewable
- user confirms before anything is saved
- Use
prompts/default.txtas the recommended default prompt. - Use
prompts/llama.txtonly for LLM-only preprocessing with Llama-family models. - Heuristic-first integrations should still keep
default.txtas the normal fallback prompt unless there is a model-specific reason not to.
prompt_utils.py exposes:
render_prompt(path: Path, state: State) -> str | None
Behavior:
- reads prompt text from
path - strips leading
#header lines and leading blank lines - replaces
<NULL_OR_VALUE>and<SET OF CURRENT POLICY ITEMS>using state - returns
Noneif prompt loading fails
- This package does not mutate compiler state directly.
- State changes still occur only through compiler parsing/replay paths.