Releases: deepset-ai/hayhooks

v1.18.0

24 Apr 11:38
c0ca81a

This release adds first-class OpenTelemetry tracing support to Hayhooks, with end-to-end visibility across REST endpoints, the OpenAI-compatible APIs, and MCP operations.

Install tracing support with:

pip install "hayhooks[tracing]"

✨ OpenTelemetry Tracing Support

Hayhooks now emits structured tracing spans for key lifecycle and runtime actions, including:

  • Pipeline deploy / prepare / commit / startup deploy / undeploy
  • Pipeline run endpoint (/<pipeline>/run)
  • OpenAI-compatible execution (/chat/completions, /responses) and file uploads
  • MCP actions (list_tools, call_tool, and pipeline-as-tool execution)

Streaming responses are traced with stream-aware metadata, and failures are tagged consistently for easier diagnosis.

⚙️ Configuration and Bootstrap

Tracing uses standard OpenTelemetry configuration (OTEL_* environment variables), plus one Hayhooks-specific tuning option:

  • HAYHOOKS_TRACING_EXCLUDED_SPANS (default: ["send", "receive"]) to reduce low-level ASGI span noise in streaming scenarios.

Hayhooks also attempts OTLP auto-bootstrap at startup when:

  • OTEL_EXPORTER_OTLP_ENDPOINT or OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is set
  • Protocol is supported via OTEL_EXPORTER_OTLP_TRACES_PROTOCOL / OTEL_EXPORTER_OTLP_PROTOCOL
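
For illustration, a typical setup combining the variables above might look like the following; the endpoint, protocol, and the JSON-list value format shown for HAYHOOKS_TRACING_EXCLUDED_SPANS are assumptions, so check the tracing docs for the exact syntax:

# Point traces at a local OTLP collector (example endpoint and protocol)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_SERVICE_NAME=hayhooks

# Optional: tune which low-level ASGI spans are excluded (value format assumed)
export HAYHOOKS_TRACING_EXCLUDED_SPANS='["send", "receive"]'

hayhooks run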

📈 Log Correlation Improvements

When tracing is enabled, logs now include normalized trace_id and span_id context (alongside existing request_id) to simplify correlation between logs and traces.

What's Changed

Full Changelog: v1.17.0...v1.18.0

v1.17.0

16 Apr 10:12
9d8e8b3

✨ Reasoning Content Support

This release adds first-class support for reasoning chunks streamed by modern reasoning-capable models (e.g. GPT-5 family such as gpt-5.4-mini and gpt-5, or Claude Opus 4.6 via compatible gateways). Reasoning output is forwarded to clients automatically — no pipeline wrapper changes required — and Open WebUI renders it as collapsible "Thinking" blocks out of the box.

Automatic Reasoning Streaming

Both the Chat Completions (/v1/chat/completions) and Responses API (/v1/responses) endpoints now handle StreamingChunk objects that carry a reasoning field:

  • Chat Completions: reasoning tokens are emitted as reasoning_content on the message delta, following the DeepSeek convention — compatible with Open WebUI and other clients.
  • Responses API: reasoning tokens are emitted as response.reasoning_summary_text.delta / response.reasoning_summary_text.done SSE events, producing type: "reasoning" output items with a summary array (matching the OpenAI spec).
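
As a rough illustration of the Chat Completions shape described above (simplified; surrounding fields are omitted), a streamed delta carrying reasoning tokens might look like:

# Simplified sketch of a chat.completion.chunk delta carrying reasoning tokens;
# real chunks include additional fields (id, model, finish_reason, ...).
chunk = {
    "object": "chat.completion.chunk",
    "choices": [
        {
            "index": 0,
            "delta": {
                "reasoning_content": "Comparing the two options first...",  # DeepSeek-style field
                "content": None,  # answer tokens arrive here once reasoning is done
            },
        }
    ],
}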

on_reasoning Callback

A new on_reasoning callback for streaming_generator / async_streaming_generator lets pipeline wrappers intercept reasoning chunks — similar to the existing on_tool_call_start / on_tool_call_end hooks:

from typing import Any

from hayhooks import BasePipelineWrapper, PipelineEvent, streaming_generator


def on_reasoning(
    text: str,
    extra: dict[str, Any] | None,
) -> PipelineEvent | str | None | list[PipelineEvent | str]:
    # Forward the reasoning text to the client unchanged
    return text


class PipelineWrapper(BasePipelineWrapper):
    def run_chat_completion(self, model, messages, body):
        return streaming_generator(
            pipeline=self.pipeline,
            pipeline_run_args={"messages": messages},
            on_reasoning=on_reasoning,
        )

/run Endpoint Reasoning Fallback

When a StreamingChunk has empty content but carries reasoning, the /run streaming endpoint now forwards the reasoning text instead of emitting an empty string.

📦 Dependency Updates

  • Bumped fastapi-openai-compat from >=1.1.0 to >=1.2.0 to pick up reasoning content support in the OpenAI-compatible layer.

🆕 New Example

  • reasoning_agent — a minimal Open WebUI–ready pipeline wrapper using OpenAIResponsesChatGenerator with gpt-5.4-mini, showing how reasoning summaries are streamed to the UI.

What's Changed

Full Changelog: v1.16.0...v1.17.0

v1.16.0

31 Mar 07:23
4e0b2d3

✨ CLI & Logging Overhaul

This release brings a polished, branded look to the Hayhooks CLI and unifies all log output - including uvicorn, FastAPI, and application logs - through a single, color-coded Loguru pipeline. See the PR for visual examples!

Branded CLI Theme

The entire CLI now uses a consistent color palette (#4A7AFF brand blue, semantic greens/reds/yellows) powered by a new Rich theme system. Panels have been replaced with lightweight prefixed messages (✔ / ✘ / !) for a cleaner, less noisy terminal experience. Typer help screens, tables, progress bars, and all status output follow the same visual language.

Unified Logging via Loguru

All stdlib loggers (uvicorn, uvicorn.error, uvicorn.access, fastapi) are now intercepted and routed through Loguru, giving you one consistent log format regardless of whether the message comes from the framework or application code. Key details:

  • Colored log levels — each severity gets its own distinct color
  • Request ID middleware — every HTTP/WebSocket request is tagged with a short unique ID (x-request-id header), threaded through all log lines via loguru.contextualize
  • Pipeline execution logging — pipeline runs now log their name, parameters, and elapsed time automatically
  • HAYHOOKS_LOG_LEVEL — new env var (replaces the legacy LOG alias, which still works as a fallback)
  • HAYHOOKS_LOG_FORMAT — set to verbose to include module:function:line metadata in every log line
  • HAYHOOKS_INTERCEPTED_LOGGERS — configure which stdlib loggers are intercepted (defaults to uvicorn + FastAPI; add haystack or others as needed)
  • log_config=None passed to uvicorn — prevents double-formatted log output
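
For example, a verbose local-debugging setup could combine the new variables like this; the level name and the comma-separated format shown for HAYHOOKS_INTERCEPTED_LOGGERS are assumptions, so see the Environment Variables reference for the exact syntax:

# Debug-level, verbose logs with Haystack's loggers intercepted as well
export HAYHOOKS_LOG_LEVEL=DEBUG
export HAYHOOKS_LOG_FORMAT=verbose
export HAYHOOKS_INTERCEPTED_LOGGERS="uvicorn,uvicorn.error,uvicorn.access,fastapi,haystack"

hayhooks run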

Shared Color Palette

A new hayhooks.colors module defines the canonical palette used by both the CLI (Rich) and the server (Loguru) layers, ensuring visual consistency across the entire tool.

📚 Documentation

  • Updated Environment Variables reference with the new HAYHOOKS_LOG_LEVEL, HAYHOOKS_LOG_FORMAT, and HAYHOOKS_INTERCEPTED_LOGGERS settings
  • Expanded Logging reference with verbose vs. default format examples and intercepted-logger configuration

🔧 CI

  • Switched to trusted publishing for PyPI releases (#233)

What's Changed

Full Changelog: v1.15.0...v1.16.0

v1.15.0

25 Mar 15:08
e73c1db

✨ New Features

OpenAI Responses API Support

Hayhooks now supports the OpenAI Responses API (/v1/responses) alongside the existing Chat Completions API. Pipeline wrappers can implement run_response or run_response_async to handle Responses API requests — with full support for streaming (named SSE events), non-streaming, and async modes.

This makes Hayhooks compatible with clients that use the Responses API wire format, such as the OpenAI Codex CLI.

from collections.abc import Generator

from haystack import Pipeline
from hayhooks import BasePipelineWrapper, get_last_user_input_text


class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        self.pipeline = Pipeline.loads(...)

    def run_response(self, model: str, input_items: list[dict], body: dict) -> str | Generator:
        question = get_last_user_input_text(input_items)
        result = self.pipeline.run({"prompt_builder": {"question": question}})
        return result["llm"]["replies"][0].text

See the examples in the Hayhooks repository for complete Responses API wrappers.

Files API (/v1/files)

A new /v1/files endpoint lets clients upload files for use with the Responses API. Pipeline wrappers can override run_file_upload to store or process uploaded files. If no wrapper implements file handling, Hayhooks returns a stub FileObject with a warning — so the endpoint is always available.

Responses API Utilities

New public utility functions make it easy to work with Responses API input items inside pipeline wrappers:

  • get_last_user_input_text(input_items): extract the last user text from Responses API input items
  • get_input_files(input_items): extract all input_file content parts from input items
  • chat_messages_from_openai_response(input_items): convert Responses API input items to Haystack ChatMessage objects (including function_call / function_call_output round-trips)

All three are exported from the top-level hayhooks package.
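
A sketch of how these helpers can be combined inside run_response; the pipeline component name ("agent") and its socket names are placeholders, not part of the Hayhooks API:

from hayhooks import (
    BasePipelineWrapper,
    chat_messages_from_openai_response,
    get_input_files,
)


class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        self.pipeline = ...  # build or load your Haystack pipeline here

    def run_response(self, model: str, input_items: list[dict], body: dict) -> str:
        # Convert the Responses API conversation into Haystack ChatMessage objects
        messages = chat_messages_from_openai_response(input_items)

        # Collect any input_file content parts the client attached (e.g. to index them)
        files = get_input_files(input_items)

        # "agent" and its sockets stand in for your own components
        result = self.pipeline.run({"agent": {"messages": messages}})
        return result["agent"]["replies"][-1].text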

🏗️ Internal Improvements

Refactored OpenAI Router

The OpenAI router has been refactored from a single create_openai_router call into four composable sub-routers (create_models_router, create_chat_completion_router, create_responses_router, create_files_router), powered by the upgraded fastapi-openai-compat >= 1.1.0 dependency. Shared dispatch logic is consolidated in a single _run_pipeline_method helper, reducing duplication between Chat Completions and Responses code paths.

Smarter Stream Handling

When a pipeline wrapper returns a plain str but the client requested stream=True, Hayhooks now automatically wraps the string in a single-chunk generator instead of failing. This means non-streaming wrappers work transparently with streaming clients.
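
Conceptually, the fallback is just a one-chunk generator built around the returned string (an illustration, not the actual internal code):

from collections.abc import Generator


def as_single_chunk_stream(text: str) -> Generator[str, None, None]:
    # A plain string becomes a stream that yields exactly one chunk
    yield text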

📚 Documentation

  • Added Development Best Practices guide — tips for local development, debugging, logging, and testing
  • Added Production Best Practices guide — CORS lockdown, health checks, structured logging, secret management, and more
  • Expanded OpenAI Compatibility docs with Responses API, Files API, and utility function reference
  • New Codex CLI integration example (agent_codex) — demonstrates hybrid tool-calling where Codex owns client-side tools and Hayhooks enriches with server-side tools
  • New file upload examples for both Chat Completions and Responses API

🔧 CI

  • Pinned all GitHub Actions to specific commit SHAs for supply-chain security (#231)

What's Changed

Full Changelog: v1.14.0...v1.15.0

v1.14.0

03 Mar 15:46
b285240

✨ New Features

Async Deploy & Undeploy

Runtime deploy and undeploy operations — via both REST API and MCP — now run asynchronously off the event loop using asyncio.to_thread. This means deploying or undeploying a pipeline no longer blocks other incoming requests.
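
The mechanism is the standard asyncio.to_thread pattern; a minimal sketch follows, where do_blocking_deploy is a hypothetical stand-in for the synchronous deploy work:

import asyncio


def do_blocking_deploy(name: str) -> str:
    # Placeholder for the synchronous work: file I/O, module loading, wrapper setup()
    return f"deployed {name}"


async def deploy(name: str) -> str:
    # Off-load the blocking work so the event loop keeps serving other requests
    return await asyncio.to_thread(do_blocking_deploy, name)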

A new HAYHOOKS_DEPLOY_CONCURRENCY setting controls how these operations are synchronized:

  • serialized (default): One deploy/undeploy at a time — safe and predictable.
  • parallel: Allow concurrent deploy/undeploy for higher admin throughput.

# Default: one deploy at a time (safe)
export HAYHOOKS_DEPLOY_CONCURRENCY=serialized

# Advanced: concurrent deploys (use with caution)
export HAYHOOKS_DEPLOY_CONCURRENCY=parallel

Parallel Startup Deployment

When many pipelines are loaded from HAYHOOKS_PIPELINES_DIR at startup, deployment time can now be dramatically reduced. Hayhooks introduces a two-phase approach: pipelines are prepared in parallel (file I/O, module loading, wrapper setup()) using a bounded thread pool, then committed serially to the registry with a single OpenAPI schema rebuild at the end.

Two new environment variables control this behavior:

  • HAYHOOKS_STARTUP_DEPLOY_STRATEGY (default: parallel): choose parallel or sequential startup deployment
  • HAYHOOKS_STARTUP_DEPLOY_WORKERS (default: 4): maximum worker threads (1–32)

# Parallel startup with 8 workers (recommended for many pipelines)
export HAYHOOKS_STARTUP_DEPLOY_STRATEGY=parallel
export HAYHOOKS_STARTUP_DEPLOY_WORKERS=8

# Fall back to sequential if needed
export HAYHOOKS_STARTUP_DEPLOY_STRATEGY=sequential

🏗️ Internal Improvements

Prepare / Commit Architecture

The deploy logic has been refactored into a clean two-phase pattern:

  1. Prepare — the expensive, thread-safe work (file I/O, YAML parsing, module loading, wrapper setup()) is isolated into prepare_pipeline_yaml and prepare_pipeline_files, which return a PreparedPipeline dataclass.
  2. Commit — the cheap, shared-state mutations (registry update, route addition) happen in commit_prepared_pipeline, which must run serially.

This separation is what enables both parallel startup and the async deploy/undeploy features.
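
A self-contained sketch of the pattern, with stand-in prepare/commit helpers rather than the real prepare_pipeline_files / commit_prepared_pipeline internals:

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class PreparedPipeline:
    # Stand-in for the real dataclass: whatever the prepare phase produced
    name: str
    wrapper: object


def prepare(path: str) -> PreparedPipeline:
    # Expensive, thread-safe work: file I/O, YAML parsing, module loading, setup()
    return PreparedPipeline(name=path, wrapper=object())


def commit(prepared: PreparedPipeline) -> None:
    # Cheap, shared-state mutation: registry update, route addition (serial only)
    print(f"registered {prepared.name}")


def deploy_all(paths: list[str], workers: int = 4) -> None:
    # Phase 1: prepare everything in parallel with a bounded thread pool
    with ThreadPoolExecutor(max_workers=workers) as pool:
        prepared = list(pool.map(prepare, paths))

    # Phase 2: commit serially; the OpenAPI schema is rebuilt once afterwards
    for item in prepared:
        commit(item)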

Deferred OpenAPI Rebuild

During batch startup deployments, the OpenAPI schema is now rebuilt exactly once at the end instead of after every individual pipeline. This avoids redundant app.setup() calls and further reduces startup latency.

YAML Parse-Once Optimization

YAML source code is now parsed once via parse_yaml_pipeline() and shared with downstream helpers (get_inputs_outputs_from_yaml, get_streaming_components_from_yaml), eliminating redundant yaml.safe_load calls per pipeline.

log_elapsed Decorator

A new log_elapsed logging utility automatically measures and logs wall-clock time for decorated functions — used throughout the deploy pipeline for observability.
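
An illustrative re-implementation of the idea (not the actual Hayhooks utility):

from functools import wraps
from time import perf_counter

from loguru import logger


def log_elapsed(func):
    # Log wall-clock time for every call of the decorated function
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            logger.debug("{} took {:.3f}s", func.__name__, perf_counter() - start)

    return wrapper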

🧪 Tests

  • Added comprehensive test suite for async deploy/undeploy, parallel vs. sequential startup, deferred OpenAPI rebuild, prepare/commit pipeline workflow, and deploy concurrency policies

What's Changed

  • Async / parallel deployment with prepare-commit architecture, refactoring and tests by @mpangrazzi
  • ci: use Hatch install action by @anakin87 in #226

Full Changelog: v1.13.0...v1.14.0

v1.13.0

25 Feb 13:14
ca8de95

What's Changed

  • Fix on_tool_call_start receiving None instead of dict for arguments by @mpangrazzi in #224
  • Detect if 'inner' components support streaming (e.g. CodeComponent) by @mpangrazzi in #225

Full Changelog: v1.12.1...v1.13.0

v1.12.1

24 Feb 15:03
6efc816

What's Changed

Full Changelog: v1.12.0...v1.12.1

v1.12.0

24 Feb 14:28
94c2bda

✨ New Features

Chainlit Chat UI Integration

Hayhooks can now serve a built-in Chainlit-powered chat interface for your deployed pipelines - no frontend code required. Enable it with a single flag and get a fully-featured chat UI out of the box.

Key features:

  • Streaming chat - Real-time token streaming via SSE
  • Automatic model discovery - Pipelines are listed from /v1/models; if only one is deployed, it is selected automatically
  • Custom React elements - Pipelines can emit rich UI widgets (cards, charts, etc.) rendered from .jsx files
  • Tool call visualization - Tool arguments and results are displayed in formatted steps
  • Status updates & notifications - Progress indicators and toast-style messages during pipeline execution
  • Fully configurable - Custom Chainlit app, mount path, request timeout, and more

# Start Hayhooks with the Chainlit UI enabled
hayhooks run --with-chainlit

# Or via environment variable
HAYHOOKS_CHAINLIT_ENABLED=true hayhooks run

Pipelines can emit rich events (status, tool results, custom elements) through callbacks:

from hayhooks import BasePipelineWrapper
from hayhooks.chainlit_events import create_custom_element_event

class PipelineWrapper(BasePipelineWrapper):
    def on_tool_call_end(self, tool_name, arguments, result, error):
        return [
            create_custom_element_event(
                name="WeatherCard",
                props={"location": "Rome", "temperature": 22}
            )
        ]

See the Chainlit Integration docs and the Weather Agent example for full details.

OpenAI Chat Completion Compatibility Layer Update

The Hayhooks OpenAI compatibility layer for the Chat Completions API is now powered by fastapi-openai-compat.

📚 Documentation

  • Added comprehensive Chainlit Integration documentation with architecture diagrams and configuration reference
  • Updated README with Chainlit integration notes
  • Fixed some broken documentation links

What's Changed

Full Changelog: v1.11.0...v1.12.0

v1.11.0

13 Feb 16:17
a7cb36c

✨ New Features

Context Variable Propagation in Sync Streaming

The sync streaming_generator now propagates contextvars into the pipeline execution thread. This means caller-set context, such as tracing/span IDs, request-scoped state, or authentication tokens, is correctly available inside the pipeline thread during streaming execution.

The async streaming paths (asyncio.create_task, asyncio.to_thread) already handled this automatically and are unaffected.
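
The underlying mechanism is the standard-library contextvars module; a minimal, self-contained illustration of copying the caller's context into a worker thread (not the Hayhooks internals) looks like this:

import contextvars
import threading

request_id = contextvars.ContextVar("request_id", default="unset")


def run_pipeline() -> None:
    # With context propagation, the caller's value is visible in the worker thread
    print(f"request_id inside pipeline thread: {request_id.get()}")


request_id.set("req-123")

# Copy the current context and execute run_pipeline inside it on a separate thread,
# the same idea the sync streaming_generator now applies to its execution thread.
ctx = contextvars.copy_context()
worker = threading.Thread(target=ctx.run, args=(run_pipeline,))
worker.start()
worker.join()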

🎨 Other Changes

  • Use deepset/haystack:stable as the base Docker image instead of the previous default

What's Changed

  • chore: use deepset/haystack:stable as base Docker image by @anakin87 in #216
  • Add contextvars copying to sync streaming generator by @mpangrazzi in #217

Full Changelog: v1.10.0...v1.11.0

v1.10.0

09 Feb 15:39
1bcc019

✨ New Features

File Response Support

You can now build cleaner APIs that return images, PDFs, audio, or any binary content directly — no Base64 encoding or JSON wrapping needed. Just return a FastAPI Response object (e.g. FileResponse, StreamingResponse) from run_api and Hayhooks will serve it straight to the client with the correct Content-Type.

from fastapi.responses import FileResponse
from hayhooks import BasePipelineWrapper

class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        pass

    def run_api(self, prompt: str) -> FileResponse:
        image_path = generate_image(prompt)
        return FileResponse(path=image_path, media_type="image/png", filename="result.png")

OpenAPI docs are also automatically updated to reflect the correct response type for these endpoints.

See the File Response Support docs and the Image Generation example for full details.

🏗️ Internal Improvements

Unified Pipeline Deployment Architecture

YAML pipeline handling has been refactored to use the same PipelineWrapper architecture as wrapper-based pipelines. A new internal YAMLPipelineWrapper class now wraps YAML pipelines so both deployment paths share the same code — simplifying the codebase, reducing duplication, and making future improvements easier to implement consistently.

📚 Documentation

🎨 Other Changes

  • Removed the project triaging GitHub Actions workflow

What's Changed

Full Changelog: v1.9.0...v1.10.0