Releases: deepset-ai/hayhooks

v1.18.0

24 Apr 11:38
c0ca81a

This release adds first-class OpenTelemetry tracing support to Hayhooks, with end-to-end visibility across REST endpoints, the OpenAI-compatible APIs, and MCP operations.

Install tracing support with:

pip install "hayhooks[tracing]"

✨ OpenTelemetry Tracing Support

Hayhooks now emits structured tracing spans for key lifecycle and runtime actions, including:

  • Pipeline deploy / prepare / commit / startup deploy / undeploy
  • Pipeline run endpoint (/<pipeline>/run)
  • OpenAI-compatible execution (/chat/completions, /responses) and file uploads
  • MCP actions (list_tools, call_tool, and pipeline-as-tool execution)

Streaming responses are traced with stream-aware metadata, and failures are tagged consistently for easier diagnosis.

⚙️ Configuration and Bootstrap

Tracing uses standard OpenTelemetry configuration (OTEL_* environment variables), plus one Hayhooks-specific tuning option:

  • HAYHOOKS_TRACING_EXCLUDED_SPANS (default: ["send", "receive"]) to reduce low-level ASGI span noise in streaming scenarios.

Hayhooks also attempts OTLP auto-bootstrap at startup when:

  • OTEL_EXPORTER_OTLP_ENDPOINT or OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is set
  • Protocol is supported via OTEL_EXPORTER_OTLP_TRACES_PROTOCOL / OTEL_EXPORTER_OTLP_PROTOCOL
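
For illustration, a typical setup combining the variables above might look like the following; the endpoint, protocol, and the JSON-list value format shown for HAYHOOKS_TRACING_EXCLUDED_SPANS are assumptions, so check the tracing docs for the exact syntax:

# Point traces at a local OTLP collector (example endpoint and protocol)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_SERVICE_NAME=hayhooks

# Optional: tune which low-level ASGI spans are excluded (value format assumed)
export HAYHOOKS_TRACING_EXCLUDED_SPANS='["send", "receive"]'

hayhooks run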

📈 Log Correlation Improvements

When tracing is enabled, logs now include normalized trace_id and span_id context (alongside existing request_id) to simplify correlation between logs and traces.

What's Changed

Full Changelog: v1.17.0...v1.18.0

v1.17.0

16 Apr 10:12
9d8e8b3

✨ Reasoning Content Support

This release adds first-class support for reasoning chunks streamed by modern reasoning-capable models (e.g. GPT-5 family such as gpt-5.4-mini and gpt-5, or Claude Opus 4.6 via compatible gateways). Reasoning output is forwarded to clients automatically — no pipeline wrapper changes required — and Open WebUI renders it as collapsible "Thinking" blocks out of the box.

Automatic Reasoning Streaming

Both the Chat Completions (/v1/chat/completions) and Responses API (/v1/responses) endpoints now handle StreamingChunk objects that carry a reasoning field:

  • Chat Completions: reasoning tokens are emitted as reasoning_content on the message delta, following the DeepSeek convention — compatible with Open WebUI and other clients.
  • Responses API: reasoning tokens are emitted as response.reasoning_summary_text.delta / response.reasoning_summary_text.done SSE events, producing type: "reasoning" output items with a summary array (matching the OpenAI spec).
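
As a rough illustration of the Chat Completions shape described above (simplified; surrounding fields are omitted), a streamed delta carrying reasoning tokens might look like:

# Simplified sketch of a chat.completion.chunk delta carrying reasoning tokens;
# real chunks include additional fields (id, model, finish_reason, ...).
chunk = {
    "object": "chat.completion.chunk",
    "choices": [
        {
            "index": 0,
            "delta": {
                "reasoning_content": "Comparing the two options first...",  # DeepSeek-style field
                "content": None,  # answer tokens arrive here once reasoning is done
            },
        }
    ],
}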

on_reasoning Callback

A new on_reasoning callback for streaming_generator / async_streaming_generator lets pipeline wrappers intercept reasoning chunks — similar to the existing on_tool_call_start / on_tool_call_end hooks:

from typing import Any

from hayhooks import BasePipelineWrapper, PipelineEvent, streaming_generator


def on_reasoning(
    text: str,
    extra: dict[str, Any] | None,
) -> PipelineEvent | str | None | list[PipelineEvent | str]:
    # Forward the reasoning text to the client unchanged
    return text


class PipelineWrapper(BasePipelineWrapper):
    def run_chat_completion(self, model, messages, body):
        return streaming_generator(
            pipeline=self.pipeline,
            pipeline_run_args={"messages": messages},
            on_reasoning=on_reasoning,
        )

/run Endpoint Reasoning Fallback

When a StreamingChunk has empty content but carries reasoning, the /run streaming endpoint now forwards the reasoning text instead of emitting an empty string.

📦 Dependency Updates

  • Bumped fastapi-openai-compat from >=1.1.0 to >=1.2.0 to pick up reasoning content support in the OpenAI-compatible layer.

🆕 New Example

  • reasoning_agent — a minimal Open WebUI–ready pipeline wrapper using OpenAIResponsesChatGenerator with gpt-5.4-mini, showing how reasoning summaries are streamed to the UI.

What's Changed

Full Changelog: v1.16.0...v1.17.0

v1.16.0

31 Mar 07:23
4e0b2d3

✨ CLI & Logging Overhaul

This release brings a polished, branded look to the Hayhooks CLI and unifies all log output - including uvicorn, FastAPI, and application logs - through a single, color-coded Loguru pipeline. See the PR for visual examples!

Branded CLI Theme

The entire CLI now uses a consistent color palette (#4A7AFF brand blue, semantic greens/reds/yellows) powered by a new Rich theme system. Panels have been replaced with lightweight prefixed messages (✔ / ✘ / !) for a cleaner, less noisy terminal experience. Typer help screens, tables, progress bars, and all status output follow the same visual language.

Unified Logging via Loguru

All stdlib loggers (uvicorn, uvicorn.error, uvicorn.access, fastapi) are now intercepted and routed through Loguru, giving you one consistent log format regardless of whether the message comes from the framework or application code. Key details:

  • Colored log levels — each severity gets its own distinct color
  • Request ID middleware — every HTTP/WebSocket request is tagged with a short unique ID (x-request-id header), threaded through all log lines via loguru.contextualize
  • Pipeline execution logging — pipeline runs now log their name, parameters, and elapsed time automatically
  • HAYHOOKS_LOG_LEVEL — new env var (replaces the legacy LOG alias, which still works as a fallback)
  • HAYHOOKS_LOG_FORMAT — set to verbose to include module:function:line metadata in every log line
  • HAYHOOKS_INTERCEPTED_LOGGERS — configure which stdlib loggers are intercepted (defaults to uvicorn + FastAPI; add haystack or others as needed)
  • log_config=None passed to uvicorn — prevents double-formatted log output
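
For example, a verbose local-debugging setup could combine the new variables like this; the level name and the comma-separated format shown for HAYHOOKS_INTERCEPTED_LOGGERS are assumptions, so see the Environment Variables reference for the exact syntax:

# Debug-level, verbose logs with Haystack's loggers intercepted as well
export HAYHOOKS_LOG_LEVEL=DEBUG
export HAYHOOKS_LOG_FORMAT=verbose
export HAYHOOKS_INTERCEPTED_LOGGERS="uvicorn,uvicorn.error,uvicorn.access,fastapi,haystack"

hayhooks run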

Shared Color Palette

A new hayhooks.colors module defines the canonical palette used by both the CLI (Rich) and the server (Loguru) layers, ensuring visual consistency across the entire tool.

📚 Documentation

  • Updated Environment Variables reference with the new HAYHOOKS_LOG_LEVEL, HAYHOOKS_LOG_FORMAT, and HAYHOOKS_INTERCEPTED_LOGGERS settings
  • Expanded Logging reference with verbose vs. default format examples and intercepted-logger configuration

🔧 CI

  • Switched to trusted publishing for PyPI releases (#233)

What's Changed

Full Changelog: v1.15.0...v1.16.0

v1.15.0

25 Mar 15:08
e73c1db

✨ New Features

OpenAI Responses API Support

Hayhooks now supports the OpenAI Responses API (/v1/responses) alongside the existing Chat Completions API. Pipeline wrappers can implement run_response or run_response_async to handle Responses API requests — with full support for streaming (named SSE events), non-streaming, and async modes.

This makes Hayhooks compatible with clients that use the Responses API wire format, such as the OpenAI Codex CLI.

from collections.abc import Generator

from haystack import Pipeline
from hayhooks import BasePipelineWrapper, get_last_user_input_text


class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        self.pipeline = Pipeline.loads(...)

    def run_response(self, model: str, input_items: list[dict], body: dict) -> str | Generator:
        question = get_last_user_input_text(input_items)
        result = self.pipeline.run({"prompt_builder": {"question": question}})
        return result["llm"]["replies"][0].text

See the examples in the Hayhooks repository for complete Responses API wrappers.

Files API (/v1/files)

A new /v1/files endpoint lets clients upload files for use with the Responses API. Pipeline wrappers can override run_file_upload to store or process uploaded files. If no wrapper implements file handling, Hayhooks returns a stub FileObject with a warning — so the endpoint is always available.

Responses API Utilities

New public utility functions make it easy to work with Responses API input items inside pipeline wrappers:

  • get_last_user_input_text(input_items): extract the last user text from Responses API input items
  • get_input_files(input_items): extract all input_file content parts from input items
  • chat_messages_from_openai_response(input_items): convert Responses API input items to Haystack ChatMessage objects (including function_call / function_call_output round-trips)

All three are exported from the top-level hayhooks package.
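
A sketch of how these helpers can be combined inside run_response; the pipeline component name ("agent") and its socket names are placeholders, not part of the Hayhooks API:

from hayhooks import (
    BasePipelineWrapper,
    chat_messages_from_openai_response,
    get_input_files,
)


class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        self.pipeline = ...  # build or load your Haystack pipeline here

    def run_response(self, model: str, input_items: list[dict], body: dict) -> str:
        # Convert the Responses API conversation into Haystack ChatMessage objects
        messages = chat_messages_from_openai_response(input_items)

        # Collect any input_file content parts the client attached (e.g. to index them)
        files = get_input_files(input_items)

        # "agent" and its sockets stand in for your own components
        result = self.pipeline.run({"agent": {"messages": messages}})
        return result["agent"]["replies"][-1].text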

🏗️ Internal Improvements

Refactored OpenAI Router

The OpenAI router has been refactored from a single create_openai_router call into four composable sub-routers (create_models_router, create_chat_completion_router, create_responses_router, create_files_router), powered by the upgraded fastapi-openai-compat >= 1.1.0 dependency. Shared dispatch logic is consolidated in a single _run_pipeline_method helper, reducing duplication between Chat Completions and Responses code paths.

Smarter Stream Handling

When a pipeline wrapper returns a plain str but the client requested stream=True, Hayhooks now automatically wraps the string in a single-chunk generator instead of failing. This means non-streaming wrappers work transparently with streaming clients.
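
Conceptually, the fallback is just a one-chunk generator built around the returned string (an illustration, not the actual internal code):

from collections.abc import Generator


def as_single_chunk_stream(text: str) -> Generator[str, None, None]:
    # A plain string becomes a stream that yields exactly one chunk
    yield text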

📚 Documentation

  • Added Development Best Practices guide — tips for local development, debugging, logging, and testing
  • Added Production Best Practices guide — CORS lockdown, health checks, structured logging, secret management, and more
  • Expanded OpenAI Compatibility docs with Responses API, Files API, and utility function reference
  • New Codex CLI integration example (agent_codex) — demonstrates hybrid tool-calling where Codex owns client-side tools and Hayhooks enriches with server-side tools
  • New file upload examples for both Chat Completions and Responses API

🔧 CI

  • Pinned all GitHub Actions to specific commit SHAs for supply-chain security (#231)

What's Changed

Full Changelog: v1.14.0...v1.15.0

v1.14.0

03 Mar 15:46
b285240

✨ New Features

Async Deploy & Undeploy

Runtime deploy and undeploy operations — via both REST API and MCP — now run asynchronously off the event loop using asyncio.to_thread. This means deploying or undeploying a pipeline no longer blocks other incoming requests.
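
The mechanism is the standard asyncio.to_thread pattern; a minimal sketch follows, where do_blocking_deploy is a hypothetical stand-in for the synchronous deploy work:

import asyncio


def do_blocking_deploy(name: str) -> str:
    # Placeholder for the synchronous work: file I/O, module loading, wrapper setup()
    return f"deployed {name}"


async def deploy(name: str) -> str:
    # Off-load the blocking work so the event loop keeps serving other requests
    return await asyncio.to_thread(do_blocking_deploy, name)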

A new HAYHOOKS_DEPLOY_CONCURRENCY setting controls how these operations are synchronized:

  • serialized (default): One deploy/undeploy at a time — safe and predictable.
  • parallel: Allow concurrent deploy/undeploy for higher admin throughput.

# Default: one deploy at a time (safe)
export HAYHOOKS_DEPLOY_CONCURRENCY=serialized

# Advanced: concurrent deploys (use with caution)
export HAYHOOKS_DEPLOY_CONCURRENCY=parallel

Parallel Startup Deployment

When many pipelines are loaded from HAYHOOKS_PIPELINES_DIR at startup, deployment time can now be dramatically reduced. Hayhooks introduces a two-phase approach: pipelines are prepared in parallel (file I/O, module loading, wrapper setup()) using a bounded thread pool, then committed serially to the registry with a single OpenAPI schema rebuild at the end.

Two new environment variables control this behavior:

  • HAYHOOKS_STARTUP_DEPLOY_STRATEGY (default: parallel): choose parallel or sequential startup deployment
  • HAYHOOKS_STARTUP_DEPLOY_WORKERS (default: 4): maximum worker threads (1–32)

# Parallel startup with 8 workers (recommended for many pipelines)
export HAYHOOKS_STARTUP_DEPLOY_STRATEGY=parallel
export HAYHOOKS_STARTUP_DEPLOY_WORKERS=8

# Fall back to sequential if needed
export HAYHOOKS_STARTUP_DEPLOY_STRATEGY=sequential

🏗️ Internal Improvements

Prepare / Commit Architecture

The deploy logic has been refactored into a clean two-phase pattern:

  1. Prepare — the expensive, thread-safe work (file I/O, YAML parsing, module loading, wrapper setup()) is isolated into prepare_pipeline_yaml and prepare_pipeline_files, which return a PreparedPipeline dataclass.
  2. Commit — the cheap, shared-state mutations (registry update, route addition) happen in commit_prepared_pipeline, which must run serially.

This separation is what enables both parallel startup and the async deploy/undeploy features.
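
A self-contained sketch of the pattern, with stand-in prepare/commit helpers rather than the real prepare_pipeline_files / commit_prepared_pipeline internals:

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class PreparedPipeline:
    # Stand-in for the real dataclass: whatever the prepare phase produced
    name: str
    wrapper: object


def prepare(path: str) -> PreparedPipeline:
    # Expensive, thread-safe work: file I/O, YAML parsing, module loading, setup()
    return PreparedPipeline(name=path, wrapper=object())


def commit(prepared: PreparedPipeline) -> None:
    # Cheap, shared-state mutation: registry update, route addition (serial only)
    print(f"registered {prepared.name}")


def deploy_all(paths: list[str], workers: int = 4) -> None:
    # Phase 1: prepare everything in parallel with a bounded thread pool
    with ThreadPoolExecutor(max_workers=workers) as pool:
        prepared = list(pool.map(prepare, paths))

    # Phase 2: commit serially; the OpenAPI schema is rebuilt once afterwards
    for item in prepared:
        commit(item)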

Deferred OpenAPI Rebuild

During batch startup deployments, the OpenAPI schema is now rebuilt exactly once at the end instead of after every individual pipeline. This avoids redundant app.setup() calls and further reduces startup latency.

YAML Parse-Once Optimization

YAML source code is now parsed once via parse_yaml_pipeline() and shared with downstream helpers (get_inputs_outputs_from_yaml, get_streaming_components_from_yaml), eliminating redundant yaml.safe_load calls per pipeline.

log_elapsed Decorator

A new log_elapsed logging utility automatically measures and logs wall-clock time for decorated functions — used throughout the deploy pipeline for observability.
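
An illustrative re-implementation of the idea (not the actual Hayhooks utility):

from functools import wraps
from time import perf_counter

from loguru import logger


def log_elapsed(func):
    # Log wall-clock time for every call of the decorated function
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            logger.debug("{} took {:.3f}s", func.__name__, perf_counter() - start)

    return wrapper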

🧪 Tests

  • Added comprehensive test suite for async deploy/undeploy, parallel vs. sequential startup, deferred OpenAPI rebuild, prepare/commit pipeline workflow, and deploy concurrency policies

What's Changed

  • Async / parallel deployment with prepare-commit architecture, refactoring and tests by @mpangrazzi
  • ci: use Hatch install action by @anakin87 in #226

Full Changelog: v1.13.0...v1.14.0

v1.13.0

25 Feb 13:14
ca8de95

What's Changed

  • Fix on_tool_call_start receiving None instead of dict for arguments by @mpangrazzi in #224
  • Detect if 'inner' components support streaming (e.g. CodeComponent) by @mpangrazzi in #225

Full Changelog: v1.12.1...v1.13.0

v1.12.1

24 Feb 15:03
6efc816

What's Changed

Full Changelog: v1.12.0...v1.12.1

v1.12.0

24 Feb 14:28
94c2bda

✨ New Features

Chainlit Chat UI Integration

Hayhooks can now serve a built-in Chainlit-powered chat interface for your deployed pipelines - no frontend code required. Enable it with a single flag and get a fully-featured chat UI out of the box.

Key features:

  • Streaming chat - Real-time token streaming via SSE
  • Automatic model discovery - Pipelines are listed from /v1/models; if only one is deployed, it is selected automatically
  • Custom React elements - Pipelines can emit rich UI widgets (cards, charts, etc.) rendered from .jsx files
  • Tool call visualization - Tool arguments and results are displayed in formatted steps
  • Status updates & notifications - Progress indicators and toast-style messages during pipeline execution
  • Fully configurable - Custom Chainlit app, mount path, request timeout, and more

# Start Hayhooks with the Chainlit UI enabled
hayhooks run --with-chainlit

# Or via environment variable
HAYHOOKS_CHAINLIT_ENABLED=true hayhooks run

Pipelines can emit rich events (status, tool results, custom elements) through callbacks:

from hayhooks import BasePipelineWrapper
from hayhooks.chainlit_events import create_custom_element_event

class PipelineWrapper(BasePipelineWrapper):
    def on_tool_call_end(self, tool_name, arguments, result, error):
        return [
            create_custom_element_event(
                name="WeatherCard",
                props={"location": "Rome", "temperature": 22}
            )
        ]

See the Chainlit Integration docs and the Weather Agent example for full details.

OpenAI Chat Completion Compatibility Layer Update

The Hayhooks OpenAI compatibility layer for the Chat Completions API is now powered by fastapi-openai-compat.

📚 Documentation

  • Added comprehensive Chainlit Integration documentation with architecture diagrams and configuration reference
  • Updated README with Chainlit integration notes
  • Fixed some broken documentation links

What's Changed

Full Changelog: v1.11.0...v1.12.0

v1.11.0

13 Feb 16:17
a7cb36c

✨ New Features

Context Variable Propagation in Sync Streaming

The sync streaming_generator now propagates contextvars into the pipeline execution thread. This means caller-set context, such as tracing/span IDs, request-scoped state, or authentication tokens, is correctly available inside the pipeline thread during streaming execution.

The async streaming paths (asyncio.create_task, asyncio.to_thread) already handled this automatically and are unaffected.
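
The underlying mechanism is the standard-library contextvars module; a minimal, self-contained illustration of copying the caller's context into a worker thread (not the Hayhooks internals) looks like this:

import contextvars
import threading

request_id = contextvars.ContextVar("request_id", default="unset")


def run_pipeline() -> None:
    # With context propagation, the caller's value is visible in the worker thread
    print(f"request_id inside pipeline thread: {request_id.get()}")


request_id.set("req-123")

# Copy the current context and execute run_pipeline inside it on a separate thread,
# the same idea the sync streaming_generator now applies to its execution thread.
ctx = contextvars.copy_context()
worker = threading.Thread(target=ctx.run, args=(run_pipeline,))
worker.start()
worker.join()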

🎨 Other Changes

  • Use deepset/haystack:stable as the base Docker image instead of the previous default

What's Changed

  • chore: use deepset/haystack:stable as base Docker image by @anakin87 in #216
  • Add contextvars copying to sync streaming generator by @mpangrazzi in #217

Full Changelog: v1.10.0...v1.11.0

v1.10.0

09 Feb 15:39
1bcc019

✨ New Features

File Response Support

You can now build cleaner APIs that return images, PDFs, audio, or any binary content directly — no Base64 encoding or JSON wrapping needed. Just return a FastAPI Response object (e.g. FileResponse, StreamingResponse) from run_api and Hayhooks will serve it straight to the client with the correct Content-Type.

from fastapi.responses import FileResponse
from hayhooks import BasePipelineWrapper

class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        pass

    def run_api(self, prompt: str) -> FileResponse:
        image_path = generate_image(prompt)
        return FileResponse(path=image_path, media_type="image/png", filename="result.png")

OpenAPI docs are also automatically updated to reflect the correct response type for these endpoints.

See the File Response Support docs and the Image Generation example for full details.

🏗️ Internal Improvements

Unified Pipeline Deployment Architecture

YAML pipeline handling has been refactored to use the same PipelineWrapper architecture as wrapper-based pipelines. A new internal YAMLPipelineWrapper class now wraps YAML pipelines so both deployment paths share the same code — simplifying the codebase, reducing duplication, and making future improvements easier to implement consistently.

📚 Documentation

🎨 Other Changes

  • Removed the project triaging GitHub Actions workflow

What's Changed

Full Changelog: v1.9.0...v1.10.0