Releases: deepset-ai/hayhooks
v1.18.0
This release adds first-class OpenTelemetry tracing support to Hayhooks, with end-to-end visibility across REST, OpenAI-compatible endpoints, and MCP operations.
Install tracing support with:
```
pip install "hayhooks[tracing]"
```

✨ OpenTelemetry Tracing Support
Hayhooks now emits structured tracing spans for key lifecycle and runtime actions, including:
- Pipeline deploy / prepare / commit / startup deploy / undeploy
- Pipeline run endpoint (`/<pipeline>/run`)
- OpenAI-compatible execution (`/chat/completions`, `/responses`) and file uploads
- MCP actions (`list_tools`, `call_tool`, and pipeline-as-tool execution)
Streaming responses are traced with stream-aware metadata, and failures are tagged consistently for easier diagnosis.
⚙️ Configuration and Bootstrap
Tracing uses standard OpenTelemetry configuration (OTEL_* environment variables), plus one Hayhooks-specific tuning option:
`HAYHOOKS_TRACING_EXCLUDED_SPANS` (default: `["send", "receive"]`) to reduce low-level ASGI span noise in streaming scenarios.
Hayhooks also attempts OTLP auto-bootstrap at startup when:
- `OTEL_EXPORTER_OTLP_ENDPOINT` or `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is set
- The protocol is supported via `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` / `OTEL_EXPORTER_OTLP_PROTOCOL`
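The bootstrap condition can be sketched as a small predicate. This is an illustration only: the supported protocol values and the `grpc` default are assumptions taken from common OpenTelemetry conventions, not the actual Hayhooks implementation.

```python
# Sketch of the auto-bootstrap check; SUPPORTED_PROTOCOLS and the "grpc"
# fallback are assumptions, not Hayhooks internals.
SUPPORTED_PROTOCOLS = {"grpc", "http/protobuf"}

def should_bootstrap_otlp(env: dict) -> bool:
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or env.get(
        "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"
    )
    if not endpoint:
        return False
    protocol = (
        env.get("OTEL_EXPORTER_OTLP_TRACES_PROTOCOL")
        or env.get("OTEL_EXPORTER_OTLP_PROTOCOL")
        or "grpc"  # assumed OpenTelemetry default
    )
    return protocol in SUPPORTED_PROTOCOLS
```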
📈 Log Correlation Improvements
When tracing is enabled, logs now include normalized `trace_id` and `span_id` context (alongside the existing `request_id`) to simplify correlation between logs and traces.
📚 Documentation
- Added a new tracing reference page
- Expanded environment-variable docs with OpenTelemetry guidance
- Added tracing notes in installation, logging, MCP docs, and README
What's Changed
- Add tracing support to Hayhooks by @mpangrazzi in #236
Full Changelog: v1.17.0...v1.18.0
v1.17.0
✨ Reasoning Content Support
This release adds first-class support for reasoning chunks streamed by modern reasoning-capable models (e.g. GPT-5 family such as gpt-5.4-mini and gpt-5, or Claude Opus 4.6 via compatible gateways). Reasoning output is forwarded to clients automatically — no pipeline wrapper changes required — and Open WebUI renders it as collapsible "Thinking" blocks out of the box.
Automatic Reasoning Streaming
Both the Chat Completions (/v1/chat/completions) and Responses API (/v1/responses) endpoints now handle StreamingChunk objects that carry a reasoning field:
- Chat Completions: reasoning tokens are emitted as `reasoning_content` on the message delta, following the DeepSeek convention — compatible with Open WebUI and other clients.
- Responses API: reasoning tokens are emitted as `response.reasoning_summary_text.delta` / `response.reasoning_summary_text.done` SSE events, producing `type: "reasoning"` output items with a `summary` array (matching the OpenAI spec).
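For illustration, a Chat Completions chunk carrying reasoning might look like the following. Only the `reasoning_content` field name comes from these notes; the surrounding layout follows the usual OpenAI chunk format.

```python
import json

# Illustrative chunk shape only: `reasoning_content` is the field named in
# these release notes; everything else follows the standard chunk layout.
chunk = {
    "object": "chat.completion.chunk",
    "choices": [
        {
            "index": 0,
            "delta": {"reasoning_content": "First, restate the question..."},
            "finish_reason": None,
        }
    ],
}
sse_line = "data: " + json.dumps(chunk)
```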
on_reasoning Callback
A new on_reasoning callback for streaming_generator / async_streaming_generator lets pipeline wrappers intercept reasoning chunks — similar to the existing on_tool_call_start / on_tool_call_end hooks:
```python
from typing import Any

from hayhooks import PipelineEvent, streaming_generator


def on_reasoning(
    text: str,
    extra: dict[str, Any] | None,
) -> PipelineEvent | str | None | list[PipelineEvent | str]:
    return text


def run_chat_completion(self, model, messages, body):
    return streaming_generator(
        pipeline=self.pipeline,
        pipeline_run_args={"messages": messages},
        on_reasoning=on_reasoning,
    )
```

/run Endpoint Reasoning Fallback
When a `StreamingChunk` has empty content but carries `reasoning`, the `/run` streaming endpoint now forwards the reasoning text instead of emitting an empty string.
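A minimal sketch of that fallback logic, using a stand-in dataclass for Haystack's `StreamingChunk` (field names here are assumptions):

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for Haystack's StreamingChunk; field names are assumptions.
@dataclass
class Chunk:
    content: str = ""
    reasoning: Optional[str] = None

def text_for_run_stream(chunk: Chunk) -> str:
    """Prefer content; fall back to reasoning instead of an empty string."""
    if chunk.content:
        return chunk.content
    return chunk.reasoning or ""
```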
📦 Dependency Updates
- Bumped `fastapi-openai-compat` from `>=1.1.0` to `>=1.2.0` to pick up reasoning content support in the OpenAI-compatible layer.
📚 Documentation
- New Reasoning Content sections in the OpenAI Compatibility and Open WebUI Integration pages.
- New Reasoning Content Callback section in the Pipeline Wrapper concept page.
- Updated Agent Deployment docs with a pointer to the new reasoning agent example.
🆕 New Example
`reasoning_agent` — a minimal Open WebUI-ready pipeline wrapper using `OpenAIResponsesChatGenerator` with `gpt-5.4-mini`, showing how reasoning summaries are streamed to the UI.
What's Changed
- Add reasoning chunks support (#235) by @mpangrazzi
Full Changelog: v1.16.0...v1.17.0
v1.16.0
✨ CLI & Logging Overhaul
This release brings a polished, branded look to the Hayhooks CLI and unifies all log output - including uvicorn, FastAPI, and application logs - through a single, color-coded Loguru pipeline. See PR for visual examples!
Branded CLI Theme
The entire CLI now uses a consistent color palette (#4A7AFF brand blue, semantic greens/reds/yellows) powered by a new Rich theme system. Panels have been replaced with lightweight prefixed messages (✔ / ✘ / !) for a cleaner, less noisy terminal experience. Typer help screens, tables, progress bars, and all status output follow the same visual language.
Unified Logging via Loguru
All stdlib loggers (uvicorn, uvicorn.error, uvicorn.access, fastapi) are now intercepted and routed through Loguru, giving you one consistent log format regardless of whether the message comes from the framework or application code. Key details:
- Colored log levels — each severity gets its own distinct color
- Request ID middleware — every HTTP/WebSocket request is tagged with a short unique ID (`x-request-id` header), threaded through all log lines via `loguru.contextualize`
- Pipeline execution logging — pipeline runs now log their name, parameters, and elapsed time automatically
- `HAYHOOKS_LOG_LEVEL` — new env var (replaces the legacy `LOG` alias, which still works as a fallback)
- `HAYHOOKS_LOG_FORMAT` — set to `verbose` to include `module:function:line` metadata in every log line
- `HAYHOOKS_INTERCEPTED_LOGGERS` — configure which stdlib loggers are intercepted (defaults to uvicorn + FastAPI; add `haystack` or others as needed)
- `log_config=None` passed to uvicorn — prevents double-formatted log output
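The request-ID threading described above can be approximated with stdlib `logging` and `contextvars`. Hayhooks itself uses Loguru's `contextualize`; the names below are illustrative only.

```python
import contextvars
import logging
import uuid

# Illustrative request-ID threading with stdlib logging + contextvars;
# Hayhooks itself routes this through Loguru's contextualize().
request_id_var = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()  # attach ID to every record
        return True

logger = logging.getLogger("hayhooks-demo")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(request_id)s | %(message)s"))
handler.addFilter(RequestIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Simulate one request: every log line inside it carries the same short ID.
token = request_id_var.set(uuid.uuid4().hex[:8])
logger.info("pipeline run started")
request_id_var.reset(token)
```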
Shared Color Palette
A new hayhooks.colors module defines the canonical palette used by both the CLI (Rich) and the server (Loguru) layers, ensuring visual consistency across the entire tool.
📚 Documentation
- Updated Environment Variables reference with the new `HAYHOOKS_LOG_LEVEL`, `HAYHOOKS_LOG_FORMAT`, and `HAYHOOKS_INTERCEPTED_LOGGERS` settings
- Expanded Logging reference with verbose vs. default format examples and intercepted-logger configuration
🔧 CI
- Switched to trusted publishing for PyPI releases (#233)
What's Changed
- CLI and logging overhaul by @mpangrazzi in #232
- build: switch to trusted publishing by @julian-risch in #233
Full Changelog: v1.15.0...v1.16.0
v1.15.0
✨ New Features
OpenAI Responses API Support
Hayhooks now supports the OpenAI Responses API (/v1/responses) alongside the existing Chat Completions API. Pipeline wrappers can implement run_response or run_response_async to handle Responses API requests — with full support for streaming (named SSE events), non-streaming, and async modes.
This makes Hayhooks compatible with clients that use the Responses API wire format, such as the OpenAI Codex CLI.
```python
class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        self.pipeline = Pipeline.loads(...)

    def run_response(self, model: str, input_items: list[dict], body: dict) -> str | Generator:
        question = get_last_user_input_text(input_items)
        result = self.pipeline.run({"prompt_builder": {"question": question}})
        return result["llm"]["replies"][0].text
```

See the examples:
- `responses_with_file_upload` — Responses API with file uploads
- `agent_codex` — Hybrid tool-calling with Codex CLI (client-side tools + server-side enrichment)
- `chat_completion_with_file_upload` — Chat Completions API with file uploads
Files API (/v1/files)
A new /v1/files endpoint lets clients upload files for use with the Responses API. Pipeline wrappers can override run_file_upload to store or process uploaded files. If no wrapper implements file handling, Hayhooks returns a stub FileObject with a warning — so the endpoint is always available.
Responses API Utilities
New public utility functions make it easy to work with Responses API input items inside pipeline wrappers:
| Utility | Description |
|---|---|
| `get_last_user_input_text(input_items)` | Extract the last user text from Responses API input items |
| `get_input_files(input_items)` | Extract all `input_file` content parts from input items |
| `chat_messages_from_openai_response(input_items)` | Convert Responses API input items to Haystack `ChatMessage` objects (including `function_call` / `function_call_output` round-trips) |
All three are exported from the top-level hayhooks package.
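To illustrate the input-item shape these helpers operate on, here is a hypothetical re-implementation of the first one. The real `get_last_user_input_text` lives in the `hayhooks` package and may differ in detail.

```python
# Hypothetical re-implementation, for illustrating the input-item shape;
# the real helper is exported from the hayhooks package.
def last_user_input_text(input_items: list) -> str:
    for item in reversed(input_items):
        if item.get("role") != "user":
            continue
        content = item.get("content")
        if isinstance(content, str):
            return content
        # Content may be a list of typed parts, e.g. {"type": "input_text", ...}
        for part in reversed(content or []):
            if part.get("type") == "input_text":
                return part.get("text", "")
    return ""
```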
🏗️ Internal Improvements
Refactored OpenAI Router
The OpenAI router has been refactored from a single create_openai_router call into four composable sub-routers (create_models_router, create_chat_completion_router, create_responses_router, create_files_router), powered by the upgraded fastapi-openai-compat >= 1.1.0 dependency. Shared dispatch logic is consolidated in a single _run_pipeline_method helper, reducing duplication between Chat Completions and Responses code paths.
Smarter Stream Handling
When a pipeline wrapper returns a plain str but the client requested stream=True, Hayhooks now automatically wraps the string in a single-chunk generator instead of failing. This means non-streaming wrappers work transparently with streaming clients.
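The wrapping behavior can be sketched as follows; this is illustrative, not the actual Hayhooks helper.

```python
from collections.abc import Iterator
from typing import Union

def ensure_stream(result: Union[str, Iterator]) -> Iterator:
    """Wrap a plain string in a one-chunk stream; pass real streams through."""
    if isinstance(result, str):
        def single_chunk() -> Iterator:
            yield result
        return single_chunk()
    return iter(result)
```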
📚 Documentation
- Added Development Best Practices guide — tips for local development, debugging, logging, and testing
- Added Production Best Practices guide — CORS lockdown, health checks, structured logging, secret management, and more
- Expanded OpenAI Compatibility docs with Responses API, Files API, and utility function reference
- New Codex CLI integration example (`agent_codex`) — demonstrates hybrid tool-calling where Codex owns client-side tools and Hayhooks enriches with server-side tools
- New file upload examples for both Chat Completions and Responses API
🔧 CI
- Pinned all GitHub Actions to specific commit SHAs for supply-chain security (#231)
What's Changed
- chore: pin GitHub Actions to specific commit SHAs by @julian-risch in #231
- Add OpenAI Responses API and Files API support by @mpangrazzi in #230
Full Changelog: v1.14.0...v1.15.0
v1.14.0
✨ New Features
Async Deploy & Undeploy
Runtime deploy and undeploy operations — via both REST API and MCP — now run asynchronously off the event loop using asyncio.to_thread. This means deploying or undeploying a pipeline no longer blocks other incoming requests.
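The pattern is essentially `asyncio.to_thread` offloading; a minimal self-contained sketch, with illustrative function names:

```python
import asyncio
import time

# Minimal sketch of the pattern: blocking deploy work runs in a worker thread
# via asyncio.to_thread, keeping the event loop free for other requests.
def blocking_deploy(name: str) -> str:
    time.sleep(0.05)  # stand-in for file I/O, module loading, wrapper setup()
    return f"{name}: deployed"

async def handle_deploy_request(name: str) -> str:
    return await asyncio.to_thread(blocking_deploy, name)

result = asyncio.run(handle_deploy_request("my_pipeline"))
```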
A new HAYHOOKS_DEPLOY_CONCURRENCY setting controls how these operations are synchronized:
- `serialized` (default): One deploy/undeploy at a time — safe and predictable.
- `parallel`: Allow concurrent deploy/undeploy for higher admin throughput.
```
# Default: one deploy at a time (safe)
export HAYHOOKS_DEPLOY_CONCURRENCY=serialized

# Advanced: concurrent deploys (use with caution)
export HAYHOOKS_DEPLOY_CONCURRENCY=parallel
```

Parallel Startup Deployment
When many pipelines are loaded from HAYHOOKS_PIPELINES_DIR at startup, deployment time can now be dramatically reduced. Hayhooks introduces a two-phase approach: pipelines are prepared in parallel (file I/O, module loading, wrapper setup()) using a bounded thread pool, then committed serially to the registry with a single OpenAPI schema rebuild at the end.
Two new environment variables control this behavior:
| Variable | Default | Description |
|---|---|---|
| `HAYHOOKS_STARTUP_DEPLOY_STRATEGY` | `parallel` | `parallel` or `sequential` |
| `HAYHOOKS_STARTUP_DEPLOY_WORKERS` | `4` | Max worker threads (1–32) |
```
# Parallel startup with 8 workers (recommended for many pipelines)
export HAYHOOKS_STARTUP_DEPLOY_STRATEGY=parallel
export HAYHOOKS_STARTUP_DEPLOY_WORKERS=8

# Fall back to sequential if needed
export HAYHOOKS_STARTUP_DEPLOY_STRATEGY=sequential
```

🏗️ Internal Improvements
Prepare / Commit Architecture
The deploy logic has been refactored into a clean two-phase pattern:
- Prepare — the expensive, thread-safe work (file I/O, YAML parsing, module loading, wrapper `setup()`) is isolated into `prepare_pipeline_yaml` and `prepare_pipeline_files`, which return a `PreparedPipeline` dataclass.
- Commit — the cheap, shared-state mutations (registry update, route addition) happen in `commit_prepared_pipeline`, which must run serially.
This separation is what enables both parallel startup and the async deploy/undeploy features.
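A toy version of the two-phase pattern, with illustrative stand-ins for the actual Hayhooks internals:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy prepare/commit split; names are illustrative, not Hayhooks internals.
def prepare(name: str) -> dict:
    # Expensive, thread-safe work (file I/O, module loading, setup()).
    return {"name": name, "routes": [f"/{name}/run"]}

registry = {}

def commit(prepared: dict) -> None:
    # Cheap shared-state mutation; must run serially.
    registry[prepared["name"]] = prepared

names = ["rag", "summarizer", "agent"]
with ThreadPoolExecutor(max_workers=4) as pool:
    prepared_all = list(pool.map(prepare, names))  # phase 1: parallel prepare
for p in prepared_all:
    commit(p)  # phase 2: serial commit, order preserved
# A single OpenAPI schema rebuild would follow here, once for the whole batch.
```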
Deferred OpenAPI Rebuild
During batch startup deployments, the OpenAPI schema is now rebuilt exactly once at the end instead of after every individual pipeline. This avoids redundant app.setup() calls and further reduces startup latency.
YAML Parse-Once Optimization
YAML source code is now parsed once via parse_yaml_pipeline() and shared with downstream helpers (get_inputs_outputs_from_yaml, get_streaming_components_from_yaml), eliminating redundant yaml.safe_load calls per pipeline.
log_elapsed Decorator
A new log_elapsed logging utility automatically measures and logs wall-clock time for decorated functions — used throughout the deploy pipeline for observability.
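A `log_elapsed`-style decorator can be sketched as follows; the real utility logs through Loguru rather than printing, so treat this as an approximation.

```python
import functools
import time

# Approximation of a log_elapsed-style decorator; the real one logs via Loguru.
def log_elapsed(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.3f}s")
    return wrapper

@log_elapsed
def deploy_pipeline(name: str) -> str:
    return f"{name} deployed"
```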
📚 Documentation
- Added Startup Deploy Performance section to Deployment Guidelines
- Added Runtime Deploy Concurrency section to Deployment Guidelines
- Documented new environment variables: `HAYHOOKS_DEPLOY_CONCURRENCY`, `HAYHOOKS_STARTUP_DEPLOY_STRATEGY`, `HAYHOOKS_STARTUP_DEPLOY_WORKERS` in the Environment Variables reference
🧪 Tests
- Added comprehensive test suite for async deploy/undeploy, parallel vs. sequential startup, deferred OpenAPI rebuild, prepare/commit pipeline workflow, and deploy concurrency policies
🔧 CI
- Switched to the official Hatch install GitHub Action (`pypa/hatch@install`) for faster and more reliable CI across all workflows
What's Changed
- Async / parallel deployment with prepare-commit architecture, refactoring and tests by @mpangrazzi
- ci: use Hatch install action by @anakin87 in #226
Full Changelog: v1.13.0...v1.14.0
v1.13.0
What's Changed
- Fix `on_tool_call_start` receiving `None` instead of `dict` for arguments by @mpangrazzi in #224
- Detect if 'inner' components support streaming (e.g. `CodeComponent`) by @mpangrazzi in #225
Full Changelog: v1.12.1...v1.13.0
v1.12.1
What's Changed
- No need to force-include public folder by @mpangrazzi in #223
Full Changelog: v1.12.0...v1.12.1
v1.12.0
✨ New Features
Chainlit Chat UI Integration
Hayhooks can now serve a built-in Chainlit-powered chat interface for your deployed pipelines - no frontend code required. Enable it with a single flag and get a fully-featured chat UI out of the box.
Key features:
- Streaming chat - Real-time token streaming via SSE
- Automatic model discovery - Pipelines are listed from `/v1/models`; auto-selects if only one is deployed
- Custom React elements - Pipelines can emit rich UI widgets (cards, charts, etc.) rendered from `.jsx` files
- Tool call visualization - Tool arguments and results are displayed in formatted steps
- Status updates & notifications - Progress indicators and toast-style messages during pipeline execution
- Fully configurable - Custom Chainlit app, mount path, request timeout, and more
```
# Start Hayhooks with the Chainlit UI enabled
hayhooks run --with-chainlit

# Or via environment variable
HAYHOOKS_CHAINLIT_ENABLED=true hayhooks run
```

Pipelines can emit rich events (status, tool results, custom elements) through callbacks:
```python
from hayhooks.chainlit_events import create_custom_element_event


class PipelineWrapper(BasePipelineWrapper):
    def on_tool_call_end(self, tool_name, arguments, result, error):
        return [
            create_custom_element_event(
                name="WeatherCard",
                props={"location": "Rome", "temperature": 22},
            )
        ]
```

See the Chainlit Integration docs and the Weather Agent example for full details.
OpenAI Chat Completion Compatibility Layer Update
The Hayhooks OpenAI compatibility layer for the Chat Completions API is now powered by `fastapi-openai-compat`.
📚 Documentation
- Added comprehensive Chainlit Integration documentation with architecture diagrams and configuration reference
- Updated README with Chainlit integration notes
- Fixed some broken documentation links
---
What's Changed
- Integration of `fastapi-openai-compat` for OpenAI Chat Completion compat layer by @mpangrazzi in #218
- Chainlit integration by @mpangrazzi in #212
- Update README and docs with a note about chainlit integration by @mpangrazzi in #220
- docs: fix docs link by @anakin87 in #221
- Refactoring `ui` -> `chainlit` by @mpangrazzi in #222
Full Changelog: v1.11.0...v1.12.0
v1.11.0
✨ New Features
Context Variable Propagation in Sync Streaming
The sync `streaming_generator` now propagates contextvars into the pipeline execution thread. This means caller-set context, such as tracing/span IDs, request-scoped state, or authentication tokens, is correctly available inside the pipeline thread during streaming execution.
The async streaming paths (asyncio.create_task, asyncio.to_thread) already handled this automatically and are unaffected.
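The underlying mechanism is `contextvars.copy_context()`; here is a minimal sketch of what the fix does, with illustrative names:

```python
import contextvars
import threading

# What the fix does, in miniature: snapshot the caller's context and run the
# pipeline thread inside it, so request-scoped values survive.
request_id = contextvars.ContextVar("request_id", default=None)
seen = {}

def pipeline_run():
    seen["request_id"] = request_id.get()  # reads the caller's value

request_id.set("req-123")
ctx = contextvars.copy_context()  # copy caller-set context
worker = threading.Thread(target=ctx.run, args=(pipeline_run,))
worker.start()
worker.join()
```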
🎨 Other Changes
- Use `deepset/haystack:stable` as the base Docker image instead of the previous default
What's Changed
- chore: use `deepset/haystack:stable` as base Docker image by @anakin87 in #216
- Add contextvars copying to sync streaming generator by @mpangrazzi in #217
Full Changelog: v1.10.0...v1.11.0
v1.10.0
✨ New Features
File Response Support
You can now build cleaner APIs that return images, PDFs, audio, or any binary content directly — no Base64 encoding or JSON wrapping needed. Just return a FastAPI Response object (e.g. FileResponse, StreamingResponse) from run_api and Hayhooks will serve it straight to the client with the correct Content-Type.
```python
from fastapi.responses import FileResponse
from hayhooks import BasePipelineWrapper


class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        pass

    def run_api(self, prompt: str) -> FileResponse:
        image_path = generate_image(prompt)
        return FileResponse(path=image_path, media_type="image/png", filename="result.png")
```

OpenAPI docs are also automatically updated to reflect the correct response type for these endpoints.
See the File Response Support docs and the Image Generation example for full details.
🏗️ Internal Improvements
Unified Pipeline Deployment Architecture
YAML pipeline handling has been refactored to use the same PipelineWrapper architecture as wrapper-based pipelines. A new internal YAMLPipelineWrapper class now wraps YAML pipelines so both deployment paths share the same code — simplifying the codebase, reducing duplication, and making future improvements easier to implement consistently.
📚 Documentation
- Added File Response Support feature documentation
- Updated the PipelineWrapper docs with file response and `response_class` sections
- Added an Image Generation example showing how to return images from `run_api`
🎨 Other Changes
- Removed the project triaging GitHub Actions workflow
What's Changed
- chore: remove project workflow for triaging by @julian-risch in #213
- Internals refactoring for unify pipeline deployment architecture by @mpangrazzi in #214
- Better handling of Response subclasses by @mpangrazzi in #215
Full Changelog: v1.9.0...v1.10.0