Fix Responses API tool calling for codex models#356

Open
dhruv0811 wants to merge 33 commits into main from codex-responses-api-support
Conversation


@dhruv0811 dhruv0811 commented Mar 4, 2026

Summary

Fixes Responses API tool calling for codex and GPT models in DatabricksOpenAI and ChatDatabricks.

clients.py — Truncate FMAPI response/input IDs to 64 chars (FMAPI returns 191-char IDs but rejects them on replay). Covers both non-streaming (_truncate_response_ids) and streaming (_truncate_input_ids) paths.
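A minimal sketch of the response-side truncation described above. The helper name `_truncate_response_ids` comes from the PR; the body here is illustrative, not the actual implementation:

```python
from types import SimpleNamespace

MAX_ID_LEN = 64  # FMAPI rejects replayed IDs longer than this

def _truncate_response_ids(response) -> None:
    """Clip long FMAPI IDs on a non-streaming response in place,
    so the SDK never replays a 191-char ID on the next turn."""
    if isinstance(getattr(response, "id", None), str):
        response.id = response.id[:MAX_ID_LEN]
    for item in getattr(response, "output", None) or []:
        if isinstance(getattr(item, "id", None), str):
            item.id = item.id[:MAX_ID_LEN]

# Example: a fake response carrying 191-char IDs like FMAPI returns
resp = SimpleNamespace(
    id="resp_" + "x" * 186,
    output=[SimpleNamespace(id="msg_" + "y" * 186)],
)
_truncate_response_ids(resp)
```

Truncating to a stable prefix works because FMAPI only checks the length on replay, not that the ID round-trips byte-for-byte.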

chat_models.py — Convert tools from Chat Completions to Responses API format in _prepare_inputs. Fix msg_/fc_ ID prefixes required by FMAPI in _convert_lc_messages_to_responses_api. Truncate all IDs to 64 chars (the LangChain sync path uses the standard SDK client, not DatabricksOpenAI, so it needs its own truncation). Strip output-only fields (status, namespace) from passthrough blocks — model_dump() on Pydantic response items includes these, but FMAPI rejects them as unknown input parameters.
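A hedged sketch of the tool-format conversion: Chat Completions nests the spec under a "function" key, while the Responses API flattens it to the top level. The helper name is illustrative, not the PR's actual code:

```python
def _chat_tool_to_responses_tool(tool: dict) -> dict:
    """Flatten a Chat Completions tool spec into the Responses API shape."""
    fn = tool.get("function", {})
    return {
        "type": "function",
        "name": fn.get("name"),
        "description": fn.get("description"),
        "parameters": fn.get("parameters"),
    }

# Chat Completions style: the spec lives under "function"
chat_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
}
responses_tool = _chat_tool_to_responses_tool(chat_tool)
```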

fmapi.py — Add discover_responses_models() and SKIP_RESPONSES_API for Responses API test discovery.

Test files — Add TestAgentToolCallingResponsesAPI (OpenAI) and TestLangGraphResponsesAPI (LangChain) with single-turn, multi-turn, and streaming tests.

Test plan

dhruv0811 and others added 30 commits March 2, 2026 13:42
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move import logging and log = logging.getLogger(__name__) to module level
- Remove inline import logging from retry functions and _discover_foundation_models
- Fix stale _XFAIL_MODELS references in docstrings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…onse-side fix

- Extract shared skip lists, discovery, retry helpers to test_utils/fmapi.py
- Use capabilities.function_calling API instead of probe-based detection
- Remove app-templates references from test docstrings
- Comment out response-side list content fix to test if only request-side + streaming is needed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, id handling

- Convert tools from Chat Completions to Responses API format in _prepare_inputs
- Strip null values from passthrough blocks (FMAPI rejects as unknown params)
- Deduplicate function_call items (content blocks vs tool_calls)
- Use fc_ prefix for function_call ids (FMAPI requires "fc" prefix)
- Avoid lc_msg.id which exceeds FMAPI's 64-char limit

Enables codex models (Responses API-only) to work with ChatDatabricks(use_responses_api=True)
- Split discovery into discover_chat_models + discover_responses_models
- Add TestAgentToolCallingResponsesAPI (OpenAI Agents SDK via responses)
- Add TestLangGraphResponsesAPI (ChatDatabricks use_responses_api=True)
- Remove codex from skip list, test them via Responses API path
- GPT models now tested on both Chat Completions AND Responses API
- Use DatabricksOpenAI in get_openai_client (was vanilla OpenAI via SDK)
- Set DatabricksResponses via __dict__ in both sync/async __init__
  (overrides parent's cached_property)
- Add _msg_id() to prefix message ids with msg_ and truncate to 64 chars
- Add databricks-openai as local uv source for langchain dev
…upport

# Conflicts:
#	integrations/langchain/tests/integration_tests/test_fmapi_tool_calling.py
#	integrations/openai/tests/integration_tests/test_fmapi_tool_calling.py
#	src/databricks_ai_bridge/test_utils/fmapi.py
…ixes

- clients.py: only id truncation + DatabricksResponses wiring (no Gemini)
- Remove pyproject.toml and utils.py changes (local dev only)
- Add gpt-5.4 to RESPONSES_ONLY_MODELS
Both APIs now use symmetric skip lists:
- SKIP_CHAT_COMPLETIONS: models excluded from Chat Completions tests
- SKIP_RESPONSES_API: models excluded from Responses API tests
- discover_chat_models(skip) / discover_responses_models(skip)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…I call

_truncate_response_ids only works for non-streaming responses (stream
objects lack .id/.output). On the next agent turn, the SDK replays
191-char IDs as input, which FMAPI rejects with string_above_max_length.

Add _truncate_input_ids to scan kwargs["input"] before super().create()
in both sync and async DatabricksResponses, covering the streaming gap.
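The wiring above could look roughly like this. The stub base class stands in for the SDK's Responses resource; only the override-before-`super().create()` pattern is from the PR, the rest is a sketch:

```python
MAX_ID_LEN = 64  # FMAPI's string_above_max_length limit on input IDs

def _truncate_input_ids(kwargs: dict) -> None:
    """Clip replayed item IDs in kwargs["input"] in place."""
    for item in kwargs.get("input") or []:
        if isinstance(item, dict) and isinstance(item.get("id"), str):
            item["id"] = item["id"][:MAX_ID_LEN]

class _SDKResponses:  # stand-in for the OpenAI SDK's Responses resource
    def create(self, **kwargs):
        return kwargs  # the real client would issue the HTTP request here

class DatabricksResponses(_SDKResponses):
    def create(self, **kwargs):
        # Runs on every call, streaming or not, so a 191-char ID that
        # slipped through the response side is still clipped on replay.
        _truncate_input_ids(kwargs)
        return super().create(**kwargs)

sent = DatabricksResponses().create(
    input=[{"type": "function_call", "id": "fc_" + "z" * 188,
            "name": "get_weather", "arguments": "{}"}],
    model="databricks-gpt",  # placeholder endpoint name
)
```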

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Live testing confirmed FMAPI does not reject null values (e.g.
namespace: null on function_call blocks). The strip-nulls
comprehension was defensive but unnecessary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- _msg_id: keep msg_ prefix (required) but drop [:59] truncation
  (FMAPI accepted 104-char msg ids in testing)
- Remove has_function_calls_in_content flag: FMAPI accepts duplicate
  function_call items without error

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dhruv0811 and others added 3 commits March 11, 2026 16:07
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sthrough blocks

model_dump() on Pydantic response items includes status (None or "completed") and
namespace (None) fields. FMAPI rejects these as unknown parameters on input items.
Fix at source with exclude_none=True and at replay with explicit status pop.
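The replay-side half of the fix might look like this hypothetical helper. `model_dump(exclude_none=True)` already drops the `None`-valued fields at the source, but a `status` that was set to `"completed"` survives None-filtering and needs an explicit pop:

```python
def _sanitize_passthrough_block(block: dict) -> dict:
    """Drop output-only fields FMAPI rejects as unknown input parameters."""
    # Mirrors model_dump(exclude_none=True): drop None-valued fields
    cleaned = {k: v for k, v in block.items() if v is not None}
    # status may be "completed" rather than None, so pop it explicitly
    cleaned.pop("status", None)
    return cleaned

# A dumped Pydantic response item as described above
dumped = {"type": "function_call", "name": "get_weather",
          "arguments": "{}", "status": "completed", "namespace": None}
replay = _sanitize_passthrough_block(dumped)
```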

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FMAPI returns 197-char response IDs but enforces 64-char max on input.
The OpenAI path has _truncate_input_ids in DatabricksResponses, but the
LangChain sync path uses the standard SDK client directly. Add truncation
in _convert_lc_messages_to_responses_api for msg_, fc_, and passthrough IDs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bbqiu bbqiu self-requested a review March 12, 2026 01:33