Fix Responses API tool calling for codex models#356

Open
dhruv0811 wants to merge 33 commits into main from codex-responses-api-support
Conversation


@dhruv0811 dhruv0811 commented Mar 4, 2026

Summary

Fixes Responses API tool calling for codex and GPT models in DatabricksOpenAI and ChatDatabricks.

clients.py — Truncate FMAPI response/input IDs to 64 chars (FMAPI returns 191-char IDs but rejects them on replay). Covers both non-streaming (_truncate_response_ids) and streaming (_truncate_input_ids) paths.
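A minimal sketch of the response-side truncation described above. The helper name `_truncate_response_ids` comes from the PR; the body here is illustrative, not the actual implementation:

```python
from types import SimpleNamespace

MAX_ID_LEN = 64  # FMAPI rejects replayed IDs longer than this

def _truncate_response_ids(response) -> None:
    """Clip long FMAPI IDs on a non-streaming response in place,
    so the SDK never replays a 191-char ID on the next turn."""
    if isinstance(getattr(response, "id", None), str):
        response.id = response.id[:MAX_ID_LEN]
    for item in getattr(response, "output", None) or []:
        if isinstance(getattr(item, "id", None), str):
            item.id = item.id[:MAX_ID_LEN]

# Example: a fake response carrying 191-char IDs like FMAPI returns
resp = SimpleNamespace(
    id="resp_" + "x" * 186,
    output=[SimpleNamespace(id="msg_" + "y" * 186)],
)
_truncate_response_ids(resp)
```

Truncating to a stable prefix works because FMAPI only checks the length on replay, not that the ID round-trips byte-for-byte.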

chat_models.py — Convert tools from Chat Completions to Responses API format in _prepare_inputs. Fix msg_/fc_ ID prefixes required by FMAPI in _convert_lc_messages_to_responses_api. Truncate all IDs to 64 chars (the LangChain sync path uses the standard SDK client, not DatabricksOpenAI, so it needs its own truncation). Strip output-only fields (status, namespace) from passthrough blocks — model_dump() on Pydantic response items includes these, but FMAPI rejects them as unknown input parameters.
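A hedged sketch of the tool-format conversion: Chat Completions nests the spec under a "function" key, while the Responses API flattens it to the top level. The helper name is illustrative, not the PR's actual code:

```python
def _chat_tool_to_responses_tool(tool: dict) -> dict:
    """Flatten a Chat Completions tool spec into the Responses API shape."""
    fn = tool.get("function", {})
    return {
        "type": "function",
        "name": fn.get("name"),
        "description": fn.get("description"),
        "parameters": fn.get("parameters"),
    }

# Chat Completions style: the spec lives under "function"
chat_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
}
responses_tool = _chat_tool_to_responses_tool(chat_tool)
```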

fmapi.py — Add discover_responses_models() and SKIP_RESPONSES_API for Responses API test discovery.

Test files — Add TestAgentToolCallingResponsesAPI (OpenAI) and TestLangGraphResponsesAPI (LangChain) with single-turn, multi-turn, and streaming tests.

Test plan

dhruv0811 and others added 30 commits March 2, 2026 13:42
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move import logging and log = logging.getLogger(__name__) to module level
- Remove inline import logging from retry functions and _discover_foundation_models
- Fix stale _XFAIL_MODELS references in docstrings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…onse-side fix

- Extract shared skip lists, discovery, retry helpers to test_utils/fmapi.py
- Use capabilities.function_calling API instead of probe-based detection
- Remove app-templates references from test docstrings
- Comment out response-side list content fix to test if only request-side + streaming is needed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, id handling

- Convert tools from Chat Completions to Responses API format in _prepare_inputs
- Strip null values from passthrough blocks (FMAPI rejects as unknown params)
- Deduplicate function_call items (content blocks vs tool_calls)
- Use fc_ prefix for function_call ids (FMAPI requires "fc" prefix)
- Avoid lc_msg.id which exceeds FMAPI's 64-char limit

Enables codex models (Responses API-only) to work with ChatDatabricks(use_responses_api=True)
- Split discovery into discover_chat_models + discover_responses_models
- Add TestAgentToolCallingResponsesAPI (OpenAI Agents SDK via responses)
- Add TestLangGraphResponsesAPI (ChatDatabricks use_responses_api=True)
- Remove codex from skip list, test them via Responses API path
- GPT models now tested on both Chat Completions AND Responses API
- Use DatabricksOpenAI in get_openai_client (was vanilla OpenAI via SDK)
- Set DatabricksResponses via __dict__ in both sync/async __init__
  (overrides parent's cached_property)
- Add _msg_id() to prefix message ids with msg_ and truncate to 64 chars
- Add databricks-openai as local uv source for langchain dev
…upport

# Conflicts:
#	integrations/langchain/tests/integration_tests/test_fmapi_tool_calling.py
#	integrations/openai/tests/integration_tests/test_fmapi_tool_calling.py
#	src/databricks_ai_bridge/test_utils/fmapi.py
…ixes

- clients.py: only id truncation + DatabricksResponses wiring (no Gemini)
- Remove pyproject.toml and utils.py changes (local dev only)
- Add gpt-5.4 to RESPONSES_ONLY_MODELS
Both APIs now use symmetric skip lists:
- SKIP_CHAT_COMPLETIONS: models excluded from Chat Completions tests
- SKIP_RESPONSES_API: models excluded from Responses API tests
- discover_chat_models(skip) / discover_responses_models(skip)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…I call

_truncate_response_ids only works for non-streaming responses (stream
objects lack .id/.output). On the next agent turn, the SDK replays
191-char IDs as input, which FMAPI rejects with string_above_max_length.

Add _truncate_input_ids to scan kwargs["input"] before super().create()
in both sync and async DatabricksResponses, covering the streaming gap.
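The wiring above could look roughly like this. The stub base class stands in for the SDK's Responses resource; only the override-before-`super().create()` pattern is from the PR, the rest is a sketch:

```python
MAX_ID_LEN = 64  # FMAPI's string_above_max_length limit on input IDs

def _truncate_input_ids(kwargs: dict) -> None:
    """Clip replayed item IDs in kwargs["input"] in place."""
    for item in kwargs.get("input") or []:
        if isinstance(item, dict) and isinstance(item.get("id"), str):
            item["id"] = item["id"][:MAX_ID_LEN]

class _SDKResponses:  # stand-in for the OpenAI SDK's Responses resource
    def create(self, **kwargs):
        return kwargs  # the real client would issue the HTTP request here

class DatabricksResponses(_SDKResponses):
    def create(self, **kwargs):
        # Runs on every call, streaming or not, so a 191-char ID that
        # slipped through the response side is still clipped on replay.
        _truncate_input_ids(kwargs)
        return super().create(**kwargs)

sent = DatabricksResponses().create(
    input=[{"type": "function_call", "id": "fc_" + "z" * 188,
            "name": "get_weather", "arguments": "{}"}],
    model="databricks-gpt",  # placeholder endpoint name
)
```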

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Live testing confirmed FMAPI does not reject null values (e.g.
namespace: null on function_call blocks). The strip-nulls
comprehension was defensive but unnecessary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- _msg_id: keep msg_ prefix (required) but drop [:59] truncation
  (FMAPI accepted 104-char msg ids in testing)
- Remove has_function_calls_in_content flag: FMAPI accepts duplicate
  function_call items without error

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dhruv0811 and others added 3 commits March 11, 2026 16:07
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sthrough blocks

model_dump() on Pydantic response items includes status (None or "completed") and
namespace (None) fields. FMAPI rejects these as unknown parameters on input items.
Fix at source with exclude_none=True and at replay with explicit status pop.
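The replay-side half of the fix might look like this hypothetical helper. `model_dump(exclude_none=True)` already drops the `None`-valued fields at the source, but a `status` that was set to `"completed"` survives None-filtering and needs an explicit pop:

```python
def _sanitize_passthrough_block(block: dict) -> dict:
    """Drop output-only fields FMAPI rejects as unknown input parameters."""
    # Mirrors model_dump(exclude_none=True): drop None-valued fields
    cleaned = {k: v for k, v in block.items() if v is not None}
    # status may be "completed" rather than None, so pop it explicitly
    cleaned.pop("status", None)
    return cleaned

# A dumped Pydantic response item as described above
dumped = {"type": "function_call", "name": "get_weather",
          "arguments": "{}", "status": "completed", "namespace": None}
replay = _sanitize_passthrough_block(dumped)
```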

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FMAPI returns 197-char response IDs but enforces 64-char max on input.
The OpenAI path has _truncate_input_ids in DatabricksResponses, but the
LangChain sync path uses the standard SDK client directly. Add truncation
in _convert_lc_messages_to_responses_api for msg_, fc_, and passthrough IDs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bbqiu bbqiu self-requested a review March 12, 2026 01:33