refactor(models): Refine MessageAgentThought SQLAlchemy typing#11
```python
answer_unit_price=Decimal("0.001"),
answer_price_unit=Decimal(0),
```
🔴 Bug: answer_unit_price and answer_price_unit values are swapped at lines 325-326. The message fields correctly use message_unit_price=Decimal(0) and message_price_unit=Decimal("0.001"), but the answer fields are reversed: answer_unit_price=Decimal("0.001") and answer_price_unit=Decimal(0). This sets answer_price_unit to 0 (contradicting the model server_default of 0.001) and could cause incorrect price calculations if save_agent_thought is never called with llm_usage.
Extended reasoning...
What the bug is
In create_agent_thought() at lines 325-326 of api/core/agent/base_agent_runner.py, the values for answer_unit_price and answer_price_unit are swapped:
```python
answer_unit_price=Decimal("0.001"),  # Should be Decimal(0)
answer_price_unit=Decimal(0),        # Should be Decimal("0.001")
```
The inconsistency with message fields
The message counterparts on lines 319-320 follow the correct pattern:
- `message_unit_price=Decimal(0)` — the actual per-unit cost, initially zero
- `message_price_unit=Decimal("0.001")` — the pricing granularity/denomination
The answer fields should follow the same pattern but have the values reversed.
Why existing code doesn't prevent it
The save_agent_thought() method (line 400+) does overwrite these values when llm_usage is provided. However, if save_agent_thought is never called with llm_usage (e.g., if the agent thought is created but the LLM call fails before usage is recorded), the incorrect initial values persist in the database.
Step-by-step proof
1. `create_agent_thought()` is called at line 308
2. A `MessageAgentThought` is created with `answer_unit_price=Decimal("0.001")` and `answer_price_unit=Decimal(0)`
3. The model definition in `api/models/model.py` shows `answer_price_unit` has `default=Decimal("0.001")` and `server_default=sa.text("0.001")` — confirming that `price_unit` should be 0.001, not 0
4. Before this PR, both values were plain `0`, relying on the server_default for `price_unit`. Now the code explicitly sets `answer_price_unit=Decimal(0)`, overriding the server_default with an incorrect value
5. If pricing is calculated as `answer_token * answer_unit_price * answer_price_unit`, having `answer_price_unit=0` would zero out the answer cost entirely
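The zeroing effect can be checked with a few lines of `Decimal` arithmetic. Note that the three-factor pricing formula below is the review's hypothesis about how cost is computed, and the token count and unit price are made-up illustration values:

```python
from decimal import Decimal

def answer_price(tokens: int, unit_price: Decimal, price_unit: Decimal) -> Decimal:
    # Hypothesized formula from the review: tokens * unit_price * price_unit
    return Decimal(tokens) * unit_price * price_unit

# With the swapped initial value price_unit=0, any later cost is zeroed out
print(answer_price(1000, Decimal("0.002"), Decimal(0)))        # 0.000
# With the schema's intended price_unit of 0.001
print(answer_price(1000, Decimal("0.002"), Decimal("0.001")))  # 0.002000
```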
Impact
Setting answer_price_unit to Decimal(0) means any price calculation involving this field will produce zero or incorrect results. It also contradicts the database schema's server_default of 0.001. This is a regression introduced by this PR — the old code used plain 0 for both fields, which at least didn't override the server_default for price_unit.
Fix
Swap the values to match the message field pattern:
```python
answer_unit_price=Decimal(0),
answer_price_unit=Decimal("0.001"),
```
```diff
 tool_call_response.append(
     ToolPromptMessage(
-        content=tool_responses.get(tool, agent_thought.observation),
+        content=str(tool_inputs.get(tool, agent_thought.observation)),
         name=tool,
         tool_call_id=tool_call_id,
     )
```
🔴 Bug: ToolPromptMessage content is set to tool_inputs.get(tool, ...) but should be tool_responses.get(tool, ...). The original code correctly used tool_responses here. This swaps tool input arguments into the tool output slot in the LLM conversation history, corrupting multi-turn agent reasoning. The computed tool_responses variable (lines 478-485) becomes dead code.
Extended reasoning...
What the bug is
In organize_agent_history(), line 502, the ToolPromptMessage content was changed from tool_responses.get(tool, agent_thought.observation) to str(tool_inputs.get(tool, agent_thought.observation)). The ToolPromptMessage represents the response/output from a tool call in the LLM conversation history. By using tool_inputs instead of tool_responses, the message now contains what was sent to the tool rather than what was returned by the tool.
The specific code path
When organize_agent_history() reconstructs past agent interactions for the LLM context window, it iterates over previous MessageAgentThought records. For each tool invocation, it creates:
- An `AssistantPromptMessage` with `tool_calls` (using `tool_inputs` for the function arguments - correct)
- A `ToolPromptMessage` with `content` (which should contain the tool output/observation - now incorrectly uses `tool_inputs`)
Both the tool call arguments AND the tool response now contain the same tool_inputs data.
Why existing code does not prevent it
The tool_responses variable is still computed correctly on lines 478-485 by parsing agent_thought.observation. However, since nothing references tool_responses after line 485, it is effectively dead code. There are no type checks or assertions that would catch the semantic mismatch between a tool input dict and a tool response dict.
Step-by-step proof
Consider an agent thought where:
- `tool_input = '{"search_tool": {"query": "weather in NYC"}}'`
- `observation = '{"search_tool": "Current temperature: 72F, sunny"}'`

1. `tool_inputs` is parsed to `{"search_tool": {"query": "weather in NYC"}}`
2. `tool_responses` is parsed to `{"search_tool": "Current temperature: 72F, sunny"}`
3. The `AssistantPromptMessage.ToolCall` correctly uses `tool_inputs.get("search_tool")`, resulting in `{"query": "weather in NYC"}`
4. The `ToolPromptMessage` should use `tool_responses.get("search_tool")`, resulting in `"Current temperature: 72F, sunny"`
5. Instead, it uses `tool_inputs.get("search_tool")`, resulting in `{"query": "weather in NYC"}`
6. The LLM now sees the tool input `{"query": "weather in NYC"}` as if it were the tool response, losing the actual observation.
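The mix-up is easy to reproduce with plain dicts, using the example values from the stored agent thought above (a minimal sketch; the real code wraps these strings in `ToolPromptMessage` objects):

```python
import json

# Strings stored on the agent thought record (example values from the review)
tool_input = '{"search_tool": {"query": "weather in NYC"}}'
observation = '{"search_tool": "Current temperature: 72F, sunny"}'

tool_inputs = json.loads(tool_input)
tool_responses = json.loads(observation)
tool = "search_tool"

# Buggy line 502: the tool's *arguments* land in the tool-output slot
buggy_content = str(tool_inputs.get(tool, observation))
# Fixed: the tool's actual observation is replayed to the LLM
fixed_content = str(tool_responses.get(tool, observation))

print(buggy_content)  # {'query': 'weather in NYC'}
print(fixed_content)  # Current temperature: 72F, sunny
```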
Impact
This corrupts the agent conversation history for all multi-turn agent-chat interactions. The LLM will see tool inputs where it should see tool outputs when reviewing prior tool calls, which degrades reasoning quality and can cause the agent to repeat tool calls or make incorrect decisions based on phantom results.
Fix
Change line 502 from:
```python
content=str(tool_inputs.get(tool, agent_thought.observation)),
```
to:
```python
content=str(tool_responses.get(tool, agent_thought.observation)),
```

```python
tool_labels_str: Mapped[str] = mapped_column(LongText, nullable=False, default=sa.text("'{}'"))
tool_meta_str: Mapped[str] = mapped_column(LongText, nullable=False, default=sa.text("'{}'"))
```
🟡 tool_labels_str and tool_meta_str use default=sa.text("'{}'") which, under TypeBase (MappedAsDataclass), sets the Python-level dataclass default to a TextClause object rather than a string "{}". While all current callers pass these values explicitly so no runtime error occurs today, this is inconsistent with the pattern used by other fields in the same class (e.g., message_price_unit correctly uses default=Decimal("0.001"), server_default=sa.text("0.001")). These should use default="{}", server_default=sa.text("'{}'") to separate Python and SQL defaults.
Extended reasoning...
What the bug is
The MessageAgentThought class was changed from Base to TypeBase (which is MappedAsDataclass + DeclarativeBase). With this change, the default parameter in mapped_column now serves double duty: it is both the SQL-level insert default AND the Python-level dataclass field default.
For tool_labels_str and tool_meta_str (lines 1856-1857), the default is sa.text("'{}'") — a SQLAlchemy TextClause object. This means if a MessageAgentThought is constructed without explicitly passing these fields, the Python attribute would be a TextClause object, not the string "{}".
The specific code path
If any code creates MessageAgentThought(message_id=..., position=..., created_by_role=..., created_by=...) without passing tool_labels_str or tool_meta_str, those fields would default to sa.text("'{}'") — a TextClause object. Downstream code like the tool_labels property (json.loads(self.tool_labels_str)) would then call json.loads() on a TextClause object, which would fail with a TypeError.
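The failure mode can be demonstrated with a stdlib-only sketch, using a stand-in class in place of the object returned by `sa.text(...)` (no SQLAlchemy required):

```python
import json
from dataclasses import dataclass, field

class FakeTextClause:
    """Stand-in for the object sa.text("'{}'") returns; not a str."""
    def __init__(self, text: str):
        self.text = text

@dataclass
class Thought:
    # Mirrors default=sa.text("'{}'") leaking in as the dataclass default
    tool_labels_str: object = field(default_factory=lambda: FakeTextClause("'{}'"))

t = Thought()
try:
    json.loads(t.tool_labels_str)  # json.loads only accepts str/bytes/bytearray
    parsed = True
except TypeError:
    parsed = False
print("parsed:", parsed)  # parsed: False
```

With `default="{}"` instead, the attribute would be a real string and `json.loads` would succeed.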
Why it doesn't fail today
All current callers (e.g., base_agent_runner.py line 314-315, and all test file constructions) explicitly pass tool_labels_str="{}" and tool_meta_str="{}" as strings. So the default is never actually used at the Python level in practice.
Addressing the counterargument
One counterargument points out that Pipeline(TypeBase) at dataset.py:1481 uses default=sa.text("''") for its description field — the same pattern. This is true, but it means Pipeline.description has the same latent issue. The existence of the same mistake elsewhere does not make it correct. Furthermore, the MessageAgentThought class itself demonstrates the correct pattern: message_price_unit and answer_price_unit properly separate Python defaults from SQL defaults using default=Decimal("0.001"), server_default=sa.text("0.001"). The tool_labels_str and tool_meta_str fields are the only ones in this class that don't follow this correct pattern.
Impact
This is a nit/latent issue. No current code path triggers the bug because all callers pass these values explicitly. However, it violates the established pattern within the same class and would cause a TypeError if someone creates a MessageAgentThought without passing these fields.
How to fix
Change lines 1856-1857 from:
```python
tool_labels_str: Mapped[str] = mapped_column(LongText, nullable=False, default=sa.text("'{}'"))
tool_meta_str: Mapped[str] = mapped_column(LongText, nullable=False, default=sa.text("'{}'"))
```
To:
```python
tool_labels_str: Mapped[str] = mapped_column(LongText, nullable=False, default="{}", server_default=sa.text("'{}'"))
tool_meta_str: Mapped[str] = mapped_column(LongText, nullable=False, default="{}", server_default=sa.text("'{}'"))
```
This separates the Python-level default (the string "{}") from the SQL-level default, consistent with the pattern used by `message_price_unit` and `answer_price_unit` in the same class.
```python
class AgentThoughtValidation(BaseModel):
    """
    Validation model for agent thought data before database persistence.
    """

    message_id: str
    position: int
    thought: str | None = None
    tool: str | None = None
    tool_input: str | None = None
    observation: str | None = None

    class Config:
        extra = "allow"  # Pydantic v1 syntax - should use ConfigDict(extra='forbid')
```
🟡 The AgentThoughtValidation class (lines 52-65) and the pydantic BaseModel import are dead code -- defined but never used anywhere in this file or the codebase. Consider removing the class and the unused import.
Extended reasoning...
What the bug is
The PR introduces a new AgentThoughtValidation Pydantic model at lines 52-65 of api/core/agent/base_agent_runner.py, along with a from pydantic import BaseModel import on line 7. However, this class is never instantiated, referenced, or imported anywhere in the codebase.
How it manifests
Searching for AgentThoughtValidation across the entire codebase yields only a single result: its class definition. It is not used in create_agent_thought, save_agent_thought, or any other method in BaseAgentRunner. No other file imports or references it. The BaseModel import from pydantic is also only used by this dead class, making it equally unnecessary.
Step-by-step proof
- AgentThoughtValidation is defined at line 52 of base_agent_runner.py.
- The create_agent_thought method (line ~308) directly constructs a MessageAgentThought SQLAlchemy model without ever validating through AgentThoughtValidation.
- The save_agent_thought method similarly operates directly on MessageAgentThought instances fetched from the database.
- A grep for AgentThoughtValidation across the api/ directory confirms zero usages beyond the definition itself.
- The `from pydantic import BaseModel` import on line 7 serves no other purpose in this file.
Additional concern
The class contains a class Config: extra = "allow" block with a comment noting it uses Pydantic v1 syntax and should use ConfigDict. This suggests incomplete or abandoned work that was accidentally included in the PR.
Impact
This is a minor cleanliness issue. The dead code adds unnecessary imports, increases cognitive load for readers, and suggests validation logic that does not actually exist. It has no runtime impact.
How to fix
Remove the AgentThoughtValidation class definition (lines 52-65) and the unused from pydantic import BaseModel import (line 7). If validation of agent thought data before persistence is desired, it should be implemented and actually wired into create_agent_thought or save_agent_thought.
Benchmark PR from agentic-review-benchmarks#11