feat(verifier): inject YouTube URLs as FileData for native video understanding #68
…rstanding
url_context fetches webpage HTML and cannot process video streams, so YouTube links passed to the verifier returned only page metadata at best.
Add inject_youtube_filedata before_model_callback to ai_verifier: it scans llm_request.contents for YouTube URLs and appends them as FileData parts, allowing Gemini to watch the videos natively. The original URLs are kept intact so url_context still retrieves title/description grounding metadata.
https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1
Code Review
This pull request introduces the ability for the ai_verifier agent to process YouTube video content by extracting URLs from the conversation and appending them as FileData to the LLM request. Feedback focuses on expanding the YouTube URL regex to include shorts and live streams, ensuring the LLM request structure remains valid by avoiding consecutive user messages, and adding a missing type hint for the llm_request parameter.
The failing trace used https://www.youtube.com/shorts/rS3wrXncnoQ but the initial regex only matched watch?v= and youtu.be/ patterns. https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1
…tions
The verifier was filling gaps in video content with Gemini's training knowledge (e.g. identifying an event as the 2024 Bluebird Movement, 青鳥行動, when the video never stated it) and hallucinating media citations not in the input URLs.
Add three hard rules to the instruction:
- No training knowledge: video reports must be based solely on what is visible/audible; unknown fields must be stated as "無法從影片判斷" ("cannot be determined from the video")
- Video title/hashtags are uploader metadata, not evidence of content
- No invented citations: Sources list must only contain input URLs
Langfuse trace: https://langfuse.cofacts.tw/project/cmm0emerr0001qi07eugd0760/sessions/3488376c-d9cd-4e08-ad2e-d18fc0e01361?peek=2744a1259f4c9caf825e5f353a15c8fc&timestamp=2026-05-23T18%3A11%3A44.843Z&observation=1361e5c2af5a5f51
https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1
c91f1e0 to 7c0b106
Previously a separate Content block was appended; now FileData parts are added directly into the parts array of the user message that contains the YouTube URL, keeping text and video content together in one message as the API intends. https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1
- Regex now covers /live/, /embed/, /v/ paths in addition to /watch?v=, /shorts/, and youtu.be
- Add Any type hint to llm_request parameter
https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1
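The URL coverage listed above can be sketched as a single pattern. This is only an illustration: the PR's actual `_YOUTUBE_URL_RE` is not shown in this conversation, so the pattern, the `find_video_ids` helper, and the assumption that video IDs are 11 characters long are all hypothetical.

```python
import re

# Hypothetical pattern, not the PR's actual _YOUTUBE_URL_RE.
# Covers /watch?v=, /shorts/, /live/, /embed/, /v/ and youtu.be links.
# Assumes video IDs are 11 characters from [A-Za-z0-9_-].
YOUTUBE_URL_RE = re.compile(
    r"https?://(?:(?:www\.|m\.)?youtube\.com/"
    r"(?:watch\?(?:[^\s#]*&)?v=|shorts/|live/|embed/|v/)"
    r"|youtu\.be/)"
    r"([A-Za-z0-9_-]{11})"
)


def find_video_ids(text: str) -> list[str]:
    """Return the video IDs of all YouTube URLs found in text, in order."""
    return [match.group(1) for match in YOUTUBE_URL_RE.finditer(text)]
```

Capturing only the ID keeps later deduplication independent of the URL form, so a `/shorts/` link and a `youtu.be` link to the same video compare equal.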
… FileData into investigator
Verifier: replace "title is not evidence" with two-layer reporting: uploader-provided metadata (title/description) and directly-observed content are reported as separate labelled sections so the writer can judge reliability rather than having the verifier suppress the title.
Investigator: apply inject_youtube_filedata before_model_callback so it can watch YouTube videos before searching, preventing hallucinations like wrong speaker identity inferred from training data alone.
Move _YOUTUBE_URL_RE and inject_youtube_filedata before ai_investigator so the forward reference is resolved.
Langfuse trace: https://langfuse.cofacts.tw/project/cmm0emerr0001qi07eugd0760/sessions/67958cd8-920d-473c-b58f-151cc8a3b88d?peek=c61ea0203e9f57c7c679c3ca85533e34&timestamp=2026-05-24T02%3A42%3A22.105Z&observation=ef7f8fda2de00798
https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1
… crashing writer turn
Any exception thrown by a tool (verifier, investigator, get_single_cofacts_article, etc.)
now returns a structured {"error": ..., "message": "[SYSTEM] ..."} dict to the writer
instead of propagating as an unhandled exception.
Root cause traced to a hallucinated Facebook URL passed to verifier — url_context's
server-side URL prefetch returned 400 INVALID_ARGUMENT, crashing the entire writer turn.
This fix makes the writer resilient to any tool failure, not just this one case.
Langfuse trace: https://langfuse.cofacts.tw/project/cmm0emerr0001qi07eugd0760/traces/c61ea0203e9f57c7c679c3ca85533e34?observation=4f5cfafe4e1386a9
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
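The error-to-dict behaviour described in this commit can be sketched as follows. The function signature, the `call_tool_safely` wrapper, and the exact message wording are assumptions for illustration; the real `handle_writer_tool_error` is registered through the framework's tool error callback, whose signature is not shown in this conversation.

```python
from typing import Any, Callable


def handle_writer_tool_error(
    tool_name: str, args: dict[str, Any], error: Exception
) -> dict[str, str]:
    """Turn a tool exception into a structured result the writer can read.

    Hypothetical signature; only the {"error", "message"} shape and the
    "[SYSTEM]" prefix come from the commit message.
    """
    return {
        "error": type(error).__name__,
        "message": (
            f"[SYSTEM] Tool '{tool_name}' failed: {error}. "
            "Continue without this result or retry with different input."
        ),
    }


def call_tool_safely(
    tool: Callable[..., Any], tool_name: str, **args: Any
) -> Any:
    """Illustrative wrapper: run a tool and convert any exception to a dict."""
    try:
        return tool(**args)
    except Exception as exc:  # deliberately broad: any tool failure is handled
        return handle_writer_tool_error(tool_name, args, exc)
```

Returning a dict rather than re-raising means the failure reaches the model as an ordinary tool result, so it can decide how to proceed within the same turn.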
…ploadDate
When a YouTube URL is present, inject_youtube_filedata already loads the video as file_data so Gemini can watch it. But Gemini was skipping the url_context call entirely, relying solely on video frames, and never fetching the page metadata (uploadDate, publishedAt). This caused the agent to misidentify which year's annual event was depicted, because upload date was the only signal that distinguished old footage reused in a new upload.
Changes:
- Step 1 now explicitly states url_context is MANDATORY even when file_data is in context, and explains the two sources return different information.
- "Two-layer reporting" expanded to "Three-layer reporting for YouTube URLs": first layer is url_context page metadata (uploadDate required), then the uploader-provided title/description, then observed video content.
…o-agnostic three-layer reporting
Step 1 was over-indexed on the YouTube/file_data special case, making it noisy for the majority of calls that don't involve video. Restructure so the general rule (url_context is mandatory for all URLs) leads, with a brief Note for the video case.
"Three-layer reporting for YouTube URLs" renamed to "When video content is loaded in context": same rule, but the trigger is the presence of video in context, not the URL platform. Applies automatically to any future video platform without prompt changes.
fix(verifier): require url_context for YouTube URLs to capture uploadDate
…-flash-lite
gemini-3.1-flash-lite-preview has been taken offline by Google. Update all 4 proofreader agents (kmt, dpp, tpp, tpptw) to use the recommended GA replacement model per Google's migration guidance.
…-3OHcs
fix: replace deprecated gemini-3.1-flash-lite-preview with gemini-3.1-flash-lite
@claude review
Claude finished @MrOrz's task in 2m 26s
Code Review
I've completed a thorough review of this PR. Overall, the implementation is solid and the YouTube video understanding feature is well-designed. I found a few areas that could be improved for robustness and clarity.
🔍 Findings
1. Missing Error Handling in
- Add try/except so YouTube injection failures don't crash the agent
- Move `seen` set outside the content loop to deduplicate across the entire request
- Replace `Any` with `LlmRequest` type annotation
- Clarify verifier instruction: url_context is mandatory even for video URLs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
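A minimal sketch of the injection callback after these fixes, assuming the shape described in this PR: `FileData` parts are appended to the same user message that contains the URL, one `seen` set spans the whole request, and any failure is logged rather than raised. The dataclasses below are simplified stand-ins for the SDK's request types, and the short regex is a placeholder, so none of this is the PR's actual code.

```python
import logging
import re
from dataclasses import dataclass, field
from typing import Optional

logger = logging.getLogger(__name__)

# Placeholder pattern for the sketch; the PR's regex covers more URL forms.
_URL_RE = re.compile(
    r"https?://(?:www\.)?(?:youtube\.com/(?:watch\?v=|shorts/)|youtu\.be/)[\w-]{11}"
)


# Stand-ins for the SDK types (hypothetical, reduced to the fields used here).
@dataclass
class FileData:
    file_uri: str


@dataclass
class Part:
    text: Optional[str] = None
    file_data: Optional[FileData] = None


@dataclass
class Content:
    role: str
    parts: list[Part] = field(default_factory=list)


@dataclass
class LlmRequest:
    contents: list[Content] = field(default_factory=list)


def inject_youtube_filedata(callback_context: object, llm_request: LlmRequest) -> None:
    """Append a FileData part for each new YouTube URL found in user messages."""
    try:
        seen: set[str] = set()  # shared across all messages in the request
        for content in llm_request.contents:
            if content.role != "user":
                continue
            new_parts = []
            for part in content.parts:
                for match in _URL_RE.finditer(part.text or ""):
                    url = match.group(0)
                    if url not in seen:
                        seen.add(url)
                        new_parts.append(Part(file_data=FileData(file_uri=url)))
            # Keep text and video together in the same user message.
            content.parts.extend(new_parts)
    except Exception:
        # Injection is best-effort: log and let the request proceed unchanged.
        logger.exception("YouTube FileData injection failed")
    return None
```

The original text parts are left untouched, so the URLs remain available to `url_context` for page metadata.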

Summary
Cofacts.ai often hallucinated or gave incomplete context for YouTube videos because it relied only on web page metadata. This PR enables native video understanding for YouTube URLs across the agent system.
YouTube Native Video Understanding
- Adds `inject_youtube_filedata` as a `before_model_callback` for both `ai_verifier` and `ai_investigator`. It scans for YouTube URLs in the user messages and appends them as `FileData` parts, allowing Gemini to watch the videos directly.
- `url_context` (and Google Search grounding) continues to provide page-level metadata (titles, upload dates, descriptions) alongside the video frames.
- Updates the `verifier` and `investigator` instructions to enforce a "No Training Knowledge" policy for media. Agents are now required to report only what is directly visible/audible to prevent hallucinating identities or event details not present in the video.

System Robustness & Upgrades
- Migrates the proofreader agents from `gemini-3.1-flash-lite-preview` to the stable `gemini-3.1-flash-lite`.
- Implements `handle_writer_tool_error` via `on_tool_error_callback` for the `ai_writer`. This prevents the entire session from crashing if a tool (like Google Search or url_context) fails, allowing the writer to react to the error message instead.

Architecture
Test Plan
https://langfuse.cofacts.tw/project/cmm0emerr0001qi07eugd0760/traces/3c3c0aa36707f8a1a05f2487a5fca052?timestamp=2026-05-30T07:29:47.923Z
- … `FileData` in LLM requests.
- `ai_verifier` reports both page metadata (via `url_context`) and observable video content.
- … `FileData` injection.
- `ai_writer` gracefully handles tool errors in a live session.

Generated by Gemini CLI