Fix Responses API tool calling for codex models #356
Open
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move import logging and log = logging.getLogger(__name__) to module level
- Remove inline import logging from retry functions and _discover_foundation_models
- Fix stale _XFAIL_MODELS references in docstrings
…onse-side fix
- Extract shared skip lists, discovery, and retry helpers to test_utils/fmapi.py
- Use the capabilities.function_calling API instead of probe-based detection
- Remove app-templates references from test docstrings
- Comment out the response-side list content fix to test whether only request-side + streaming is needed
… list
…, id handling
- Convert tools from Chat Completions to Responses API format in _prepare_inputs
- Strip null values from passthrough blocks (FMAPI rejects them as unknown params)
- Deduplicate function_call items (content blocks vs tool_calls)
- Use the fc_ prefix for function_call ids (FMAPI requires the "fc" prefix)
- Avoid lc_msg.id, which exceeds FMAPI's 64-char limit

Enables codex models (Responses API-only) to work with ChatDatabricks(use_responses_api=True)
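The tool-format conversion described above can be sketched as follows. This is an illustrative sketch, not the PR's actual _prepare_inputs code: the helper name convert_tool_to_responses_format and the sample tool spec are assumptions, but the shape change is real — Chat Completions nests the definition under a "function" key, while the Responses API expects name/description/parameters at the top level.

```python
def convert_tool_to_responses_format(tool: dict) -> dict:
    """Flatten a Chat Completions tool spec into the Responses API shape."""
    if tool.get("type") == "function" and "function" in tool:
        fn = tool["function"]
        return {
            "type": "function",
            "name": fn["name"],
            "description": fn.get("description", ""),
            "parameters": fn.get("parameters", {}),
        }
    return tool  # already in Responses API format


# Chat Completions-style tool spec (nested under "function")
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the weather for a city",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}]
converted = [convert_tool_to_responses_format(t) for t in tools]
# converted[0] has "name" at the top level and no nested "function" key
```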
- Split discovery into discover_chat_models + discover_responses_models
- Add TestAgentToolCallingResponsesAPI (OpenAI Agents SDK via responses)
- Add TestLangGraphResponsesAPI (ChatDatabricks use_responses_api=True)
- Remove codex models from the skip list; test them via the Responses API path
- GPT models are now tested on both Chat Completions AND the Responses API
- Use DatabricksOpenAI in get_openai_client (was vanilla OpenAI via the SDK)
- Set DatabricksResponses via __dict__ in both sync/async __init__ (overrides the parent's cached_property)
- Add _msg_id() to prefix message ids with msg_ and truncate to 64 chars
- Add databricks-openai as a local uv source for langchain dev
…upport

# Conflicts:
#	integrations/langchain/tests/integration_tests/test_fmapi_tool_calling.py
#	integrations/openai/tests/integration_tests/test_fmapi_tool_calling.py
#	src/databricks_ai_bridge/test_utils/fmapi.py
…ixes
- clients.py: only id truncation + DatabricksResponses wiring (no Gemini)
- Remove pyproject.toml and utils.py changes (local dev only)
- Add gpt-5.4 to RESPONSES_ONLY_MODELS
Both APIs now use symmetric skip lists:
- SKIP_CHAT_COMPLETIONS: models excluded from Chat Completions tests
- SKIP_RESPONSES_API: models excluded from Responses API tests
- discover_chat_models(skip) / discover_responses_models(skip)
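The symmetric discovery helpers might look roughly like this. A minimal sketch under assumptions: the model names in the skip sets are placeholders, and the real fmapi.py helpers query workspace endpoints rather than taking a catalog list, but the filtering symmetry is the point.

```python
# Placeholder skip sets — the real lists live in test_utils/fmapi.py
SKIP_CHAT_COMPLETIONS = {"codex-mini"}   # Responses-API-only models
SKIP_RESPONSES_API = {"legacy-llama"}    # Chat-Completions-only models


def _discover(all_models: list[str], skip: set[str]) -> list[str]:
    """Shared filter: everything in the catalog minus the skip set."""
    return sorted(m for m in all_models if m not in skip)


def discover_chat_models(all_models: list[str]) -> list[str]:
    return _discover(all_models, SKIP_CHAT_COMPLETIONS)


def discover_responses_models(all_models: list[str]) -> list[str]:
    return _discover(all_models, SKIP_RESPONSES_API)


catalog = ["gpt-x", "codex-mini", "legacy-llama"]
chat_models = discover_chat_models(catalog)          # excludes codex-mini
responses_models = discover_responses_models(catalog)  # excludes legacy-llama
```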
…I call

_truncate_response_ids only works for non-streaming responses (stream objects lack .id/.output). On the next agent turn, the SDK replays 191-char IDs as input, which FMAPI rejects with string_above_max_length. Add _truncate_input_ids to scan kwargs["input"] before super().create() in both sync and async DatabricksResponses, covering the streaming gap.
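A rough sketch of what scanning kwargs["input"] for over-long replayed ids could look like. This is not the PR's actual DatabricksResponses code — the function body and sample payload are illustrative — but it shows the idea: clamp ids on input items before the request is forwarded to super().create().

```python
MAX_ID_LEN = 64  # FMAPI's documented-by-error limit on input item ids


def _truncate_input_ids(kwargs: dict) -> dict:
    """Truncate over-long ids on replayed input items.

    FMAPI can return much longer response ids (~191 chars) but rejects
    any id over 64 chars when it comes back as input on the next turn,
    so clamp them in place before calling the underlying client.
    """
    items = kwargs.get("input")
    if isinstance(items, list):
        for item in items:
            if isinstance(item, dict) and isinstance(item.get("id"), str):
                item["id"] = item["id"][:MAX_ID_LEN]
    return kwargs


# Simulated replay: the SDK sends back a long id from a previous turn
kwargs = {"input": [{"type": "message", "id": "msg_" + "x" * 200}]}
_truncate_input_ids(kwargs)
# the id is now clamped to 64 chars
```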
Live testing confirmed that FMAPI does not reject null values (e.g. namespace: null on function_call blocks). The strip-nulls comprehension was defensive but unnecessary.
- _msg_id: keep the msg_ prefix (required) but drop the [:59] truncation (FMAPI accepted 104-char msg ids in testing)
- Remove the has_function_calls_in_content flag: FMAPI accepts duplicate function_call items without error
…sthrough blocks

model_dump() on Pydantic response items includes status (None or "completed") and namespace (None) fields. FMAPI rejects these as unknown parameters on input items. Fix at the source with exclude_none=True and at replay with an explicit status pop.
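The two-step cleanup described above can be sketched on a plain dict (the real code applies exclude_none=True to a Pydantic model_dump() call; the helper name and sample block here are illustrative):

```python
def sanitize_passthrough_block(block: dict) -> dict:
    """Clean a serialized response item before replaying it as input.

    Step 1 mimics model_dump(exclude_none=True): drop None-valued fields
    such as namespace. Step 2 pops "status" explicitly, since it can be
    "completed" (non-None) on outputs but is rejected as an unknown
    parameter on inputs.
    """
    cleaned = {k: v for k, v in block.items() if v is not None}
    cleaned.pop("status", None)
    return cleaned


# A function_call item as it might come back from model_dump()
block = {
    "type": "function_call",
    "name": "get_weather",
    "arguments": "{}",
    "status": "completed",   # output-only, invalid on input
    "namespace": None,       # None field, invalid on input
}
cleaned = sanitize_passthrough_block(block)
# only type/name/arguments survive
```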
FMAPI returns 197-char response IDs but enforces a 64-char max on input. The OpenAI path has _truncate_input_ids in DatabricksResponses, but the LangChain sync path uses the standard SDK client directly. Add truncation in _convert_lc_messages_to_responses_api for msg_, fc_, and passthrough IDs.
Summary
Fixes Responses API tool calling for codex and GPT models in DatabricksOpenAI and ChatDatabricks.

- clients.py — Truncate FMAPI response/input IDs to 64 chars (FMAPI returns 191-char IDs but rejects them on replay). Covers both the non-streaming (_truncate_response_ids) and streaming (_truncate_input_ids) paths.
- chat_models.py — Convert tools from Chat Completions to Responses API format in _prepare_inputs. Fix the msg_/fc_ ID prefixes required by FMAPI in _convert_lc_messages_to_responses_api. Truncate all IDs to 64 chars (the LangChain sync path uses the standard SDK client, not DatabricksOpenAI, so it needs its own truncation). Strip output-only fields (status, namespace) from passthrough blocks — model_dump() on Pydantic response items includes these, but FMAPI rejects them as unknown input parameters.
- fmapi.py — Add discover_responses_models() and SKIP_RESPONSES_API for Responses API test discovery.
- Test files — Add TestAgentToolCallingResponsesAPI (OpenAI) and TestLangGraphResponsesAPI (LangChain) with single-turn, multi-turn, and streaming tests.

Test plan