feat(verifier): inject YouTube URLs as FileData for native video understanding by MrOrz · Pull Request #68 · cofacts/ai

MrOrz · 2026-05-23T18:04:01Z

Summary

Cofacts.ai often hallucinated or provided incomplete context for YouTube videos because it relied only on web metadata. This PR enables native video understanding for YouTube URLs across the agent system.

YouTube Native Video Understanding

Automatic Injection: Added inject_youtube_filedata as a before_model_callback for both ai_verifier and ai_investigator. It scans for YouTube URLs in the user messages and appends them as FileData parts, allowing Gemini to watch the videos directly.
Complementary Context: Original URLs remain intact, so url_context (and Google Search grounding) continues to provide page-level metadata (titles, upload dates, descriptions) alongside the video frames.
Strict Instructions (Hard Rules): Updated instructions for verifier and investigator to enforce a "No Training Knowledge" policy for media. Agents are now required to report only what is directly visible/audible to prevent hallucinating identities or event details not present in the video.

System Robustness & Upgrades

Model Stability: Upgraded all proofreader agents from gemini-3.1-flash-lite-preview to the stable gemini-3.1-flash-lite.
Error Handling: Implemented handle_writer_tool_error via on_tool_error_callback for the ai_writer. This prevents the entire session from crashing if a tool (like Google Search or url_context) fails, allowing the writer to react to the error message instead.

Architecture

writer → investigator / verifier (input with youtube url)
              ↓ before_model_callback
              detect youtube urls in llm_request.contents
              append Content(role="user", parts=[FileData(file_uri=url)])
              ↓ model call
              url_context → youtube page HTML metadata (title, description)
              FileData    → Gemini watches video content directly

Test Plan

https://langfuse.cofacts.tw/project/cmm0emerr0001qi07eugd0760/traces/3c3c0aa36707f8a1a05f2487a5fca052?timestamp=2026-05-30T07:29:47.923Z

Verified that YouTube URLs are correctly injected as FileData in LLM requests.
Confirmed ai_verifier reports both page metadata (via url_context) and observable video content.
Confirmed that regular web URLs still work as expected without FileData injection.
Pending: verify that ai_writer gracefully handles tool errors in a live session.

Generated by Gemini CLI

…rstanding

url_context fetches webpage HTML and cannot process video streams, so YouTube links passed to the verifier returned only page metadata at best. Add inject_youtube_filedata before_model_callback to ai_verifier: it scans llm_request.contents for YouTube URLs and appends them as FileData parts, allowing Gemini to watch the videos natively. The original URLs are kept intact so url_context still retrieves title/description grounding metadata.

https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1

gemini-code-assist

Code Review

This pull request introduces the ability for the ai_verifier agent to process YouTube video content by extracting URLs from the conversation and appending them as FileData to the LLM request. Feedback focuses on expanding the YouTube URL regex to include shorts and live streams, ensuring the LLM request structure remains valid by avoiding consecutive user messages, and adding a missing type hint for the llm_request parameter.

The failing trace used https://www.youtube.com/shorts/rS3wrXncnoQ but the initial regex only matched watch?v= and youtu.be/ patterns. https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1

…tions

The verifier was filling gaps in video content with Gemini's training knowledge (e.g. identifying an event as 2024 青鳥行動 (the Bluebird Movement) when the video never stated it) and hallucinating media citations not in the input URLs. Add three hard rules to the instruction:
- No training knowledge: video reports must be based solely on what is visible/audible; unknown fields must be stated as "無法從影片判斷" ("cannot be determined from the video")
- Video title/hashtags are uploader metadata, not evidence of content
- No invented citations: Sources list must only contain input URLs

Langfuse trace: https://langfuse.cofacts.tw/project/cmm0emerr0001qi07eugd0760/sessions/3488376c-d9cd-4e08-ad2e-d18fc0e01361?peek=2744a1259f4c9caf825e5f353a15c8fc&timestamp=2026-05-23T18%3A11%3A44.843Z&observation=1361e5c2af5a5f51
https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1

Previously a separate Content block was appended; now FileData parts are added directly into the parts array of the user message that contains the YouTube URL, keeping text and video content together in one message as the API intends. https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1

- Regex now covers /live/, /embed/, /v/ paths in addition to /watch?v=, /shorts/, and youtu.be
- Add Any type hint to llm_request parameter

https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1
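The path coverage listed in this commit can be sketched as a single pattern. This is a hypothetical reconstruction, not the PR's actual `_YOUTUBE_URL_RE`: the names `YOUTUBE_URL_RE` and `find_youtube_urls`, the optional `m.` host prefix, and the assumption that video IDs are 11 characters long are all illustrative.

```python
import re

# Hypothetical pattern (an assumption, not the PR's _YOUTUBE_URL_RE).
# Covers /watch?v=, /shorts/, /live/, /embed/, /v/ and youtu.be short links.
YOUTUBE_URL_RE = re.compile(
    r"https?://(?:(?:www\.|m\.)?youtube\.com/"
    r"(?:watch\?(?:[^\s#]*&)?v=|shorts/|live/|embed/|v/)"
    r"|youtu\.be/)"
    r"[A-Za-z0-9_-]{11}"  # assumed 11-character video ID
)


def find_youtube_urls(text: str) -> list[str]:
    """Return YouTube URLs found in text, in order, without duplicates."""
    # All groups are non-capturing, so findall yields the full matches.
    return list(dict.fromkeys(YOUTUBE_URL_RE.findall(text)))
```

Because the match stops at the end of the video ID, trailing parameters such as a timestamp are dropped from the returned URL. Whether that is desirable for FileData injection is a design choice the sketch does not settle.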

… FileData into investigator

Verifier: replace "title is not evidence" with two-layer reporting. Uploader-provided metadata (title/description) and directly-observed content are reported as separate labelled sections so the writer can judge reliability rather than having the verifier suppress the title.

Investigator: apply inject_youtube_filedata before_model_callback so it can watch YouTube videos before searching, preventing hallucinations like wrong speaker identity inferred from training data alone.

Move _YOUTUBE_URL_RE and inject_youtube_filedata before ai_investigator so the forward reference is resolved.

Langfuse trace: https://langfuse.cofacts.tw/project/cmm0emerr0001qi07eugd0760/sessions/67958cd8-920d-473c-b58f-151cc8a3b88d?peek=c61ea0203e9f57c7c679c3ca85533e34&timestamp=2026-05-24T02%3A42%3A22.105Z&observation=ef7f8fda2de00798
https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1

… crashing writer turn

Any exception thrown by a tool (verifier, investigator, get_single_cofacts_article, etc.) now returns a structured {"error": ..., "message": "[SYSTEM] ..."} dict to the writer instead of propagating as an unhandled exception.

Root cause traced to a hallucinated Facebook URL passed to verifier: url_context's server-side URL prefetch returned 400 INVALID_ARGUMENT, crashing the entire writer turn. This fix makes the writer resilient to any tool failure, not just this one case.

Langfuse trace: https://langfuse.cofacts.tw/project/cmm0emerr0001qi07eugd0760/traces/c61ea0203e9f57c7c679c3ca85533e34?observation=4f5cfafe4e1386a9
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
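The structured error dict described in this commit can be sketched in plain Python. This is a minimal illustration, not the PR's code: the two-argument signature, the `call_tool_safely` wrapper, and the wording of the message are assumptions. The real ADK callback receives framework objects, and only the returned shape (`error` plus a `[SYSTEM]`-prefixed `message`) follows the commit message.

```python
from typing import Any, Callable


def handle_writer_tool_error(tool_name: str, error: Exception) -> dict[str, Any]:
    """Convert a tool exception into a structured result the writer can read.

    Hypothetical signature: only the returned dict shape follows the PR.
    """
    return {
        "error": type(error).__name__,
        "message": (
            f"[SYSTEM] Tool '{tool_name}' failed: {error}. "
            "Continue with the information already available."
        ),
    }


def call_tool_safely(tool_name: str, tool: Callable[..., Any], **kwargs: Any) -> Any:
    """Run a tool and return its result, or the structured error dict."""
    try:
        return tool(**kwargs)
    except Exception as exc:  # any tool failure becomes data, not a crash
        return handle_writer_tool_error(tool_name, exc)
```

Returning the failure as ordinary tool output lets the model read the message and choose a next step, which is the behavior the commit describes for the 400 INVALID_ARGUMENT case.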

…ploadDate

When a YouTube URL is present, inject_youtube_filedata already loads the video as file_data so Gemini can watch it. But Gemini was skipping the url_context call entirely, relying solely on video frames, and never fetching the page metadata (uploadDate, publishedAt). This caused the agent to misidentify which year's annual event was depicted, because upload date was the only signal that distinguished old footage reused in a new upload.

Changes:
- Step 1 now explicitly states url_context is MANDATORY even when file_data is in context, and explains the two sources return different information.
- "Two-layer reporting" expanded to "Three-layer reporting for YouTube URLs": first layer is url_context page metadata (uploadDate required), then the uploader-provided title/description, then observed video content.

…o-agnostic three-layer reporting

Step 1 was over-indexed on the YouTube/file_data special case, making it noisy for the majority of calls that don't involve video. Restructure so the general rule (url_context is mandatory for all URLs) leads, with a brief Note for the video case.

"Three-layer reporting for YouTube URLs" renamed to "When video content is loaded in context": same rule, but the trigger is the presence of video in context, not the URL platform. Applies automatically to any future video platform without prompt changes.

fix(verifier): require url_context for YouTube URLs to capture uploadDate

…-flash-lite

gemini-3.1-flash-lite-preview has been taken offline by Google. Update all 4 proofreader agents (kmt, dpp, tpp, tpptw) to use the recommended GA replacement model per Google's migration guidance.

…-3OHcs

fix: replace deprecated gemini-3.1-flash-lite-preview with gemini-3.1-flash-lite

MrOrz · 2026-05-31T03:15:55Z

@claude review

claude · 2026-05-31T03:16:07Z

Claude finished @MrOrz's task in 2m 26s

Code Review

I've completed a thorough review of this PR. Overall, the implementation is solid and the YouTube video understanding feature is well-designed. I found a few areas that could be improved for robustness and clarity.

🔍 Findings

1. Missing Error Handling in `inject_youtube_filedata` (High Priority)

File: adk/cofacts_ai/agent.py:216-242

The callback function lacks error handling. If any exception occurs during URL extraction or FileData construction, it could crash the entire agent invocation.

Issue: Operations like accessing part.text, regex matching, or constructing FileData could fail.

Recommendation: Wrap the function body in a try-except block:

def inject_youtube_filedata(
    callback_context: CallbackContext, llm_request: Any
) -> None:
    """Before-model callback for ai_investigator and ai_verifier.
    
    For each user message that contains YouTube URLs, appends FileData parts
    into the same parts array so Gemini can watch the videos inline.
    The original URLs are kept intact so url_context still fetches their
    title/description metadata.
    
    Ref: https://ai.google.dev/gemini-api/docs/video-understanding#youtube
    """
    try:
        for content in llm_request.contents:
            if content.role != "user" or not content.parts:
                continue
            youtube_urls = []
            for part in content.parts:
                if part.text:
                    youtube_urls.extend(_YOUTUBE_URL_RE.findall(part.text))
            seen = set()
            for url in youtube_urls:
                if url not in seen:
                    seen.add(url)
                    content.parts.append(
                        genai_types.Part(file_data=genai_types.FileData(file_uri=url))
                    )
    except Exception as e:
        # Log but don't crash - YouTube injection is an enhancement, not critical
        # In production, you might want to use proper logging here
        pass
    return None

2. Deduplication Scope Consideration (Medium Priority)

File: adk/cofacts_ai/agent.py:235-241

The current implementation deduplicates YouTube URLs within each content block separately. If the same URL appears in multiple user messages within a single request, it will be injected multiple times.

Current behavior:

for content in llm_request.contents:  # Each content gets its own `seen` set
    # ...
    seen = set()
    for url in youtube_urls:
        if url not in seen:
            seen.add(url)
            # ...

Question: Is this intentional?

If yes: Document why each content block should get separate copies
If no: Move seen = set() outside the outer loop:

seen = set()  # Deduplicate across entire request
for content in llm_request.contents:
    if content.role != "user" or not content.parts:
        continue
    # ... rest of logic

This could improve performance and reduce token usage if the same video appears multiple times.
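The request-wide variant can be sketched end to end with stand-in types, so it runs without the SDK. Everything here is illustrative: the `FileData`, `Part`, and `Content` dataclasses only mimic the fields used from `google.genai.types`, `URL_RE` is a simplified pattern rather than the PR's `_YOUTUBE_URL_RE`, and `inject_once` is a hypothetical name.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


# Stand-ins for the google.genai types (assumption: only these fields matter).
@dataclass
class FileData:
    file_uri: str


@dataclass
class Part:
    text: Optional[str] = None
    file_data: Optional[FileData] = None


@dataclass
class Content:
    role: str
    parts: list[Part] = field(default_factory=list)


# Simplified pattern for illustration only (not the PR's _YOUTUBE_URL_RE).
URL_RE = re.compile(r"https?://(?:www\.)?youtube\.com/watch\?v=[\w-]{11}")


def inject_once(contents: list[Content]) -> None:
    """Append one FileData part per distinct URL across all user messages."""
    seen: set[str] = set()  # shared across the whole request
    for content in contents:
        if content.role != "user" or not content.parts:
            continue
        # Snapshot the parts so appended FileData parts are not re-scanned.
        for part in list(content.parts):
            if not part.text:
                continue
            for url in URL_RE.findall(part.text):
                if url not in seen:
                    seen.add(url)
                    content.parts.append(Part(file_data=FileData(file_uri=url)))
```

One consequence of sharing `seen` across the request is that the video is attached only to the first user message that mentions it; later messages keep the URL as text but get no second FileData part.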

3. Type Annotation Completeness (Low Priority)

File: adk/cofacts_ai/agent.py:216-218

The function signature could be more explicit:

Current:

def inject_youtube_filedata(
    callback_context: CallbackContext, llm_request: Any
) -> None:

Suggested (if the ADK provides a type for the request object):

from google.adk.models import LlmRequest  # or appropriate type

def inject_youtube_filedata(
    callback_context: CallbackContext,
    llm_request: LlmRequest,
) -> None:

Using Any is acceptable if the ADK doesn't export the proper type, but worth checking the ADK documentation.

4. Instruction Clarity in Verifier (Low Priority)

File: adk/cofacts_ai/agent.py:307-311

The instruction says url_context is "MANDATORY" but could be clearer about whether this applies to ALL URLs or just certain types:

1. Call url_context for ALL provided URLs in one call (up to 20) — this is MANDATORY.
   url_context fetches the web PAGE metadata (title, publish date, description), which is
   always required regardless of what else is visible in this conversation.
   Note: for video URLs (e.g. YouTube), page metadata and video frames are complementary —
   url_context gives you the upload date; the video gives you observable content.

Suggestion: Make it explicit that ALL URLs need url_context:

1. Call url_context for ALL provided URLs (up to 20 per call) — this is MANDATORY, even for video URLs.
   url_context fetches web PAGE metadata (title, publish date, description) from the HTML.
   For video URLs like YouTube, page metadata and video frames are complementary:
   - url_context → upload date, uploader name, page title/description
   - FileData → observable video content (speech, visuals, on-screen text)

✅ What Looks Good

YouTube URL Regex - Comprehensive pattern covering all major YouTube URL formats after previous review feedback
Error Handling for Writer - The handle_writer_tool_error callback is well-designed and provides clear error messages
Model Upgrade - Moving from preview to stable model (gemini-3.1-flash-lite) is the right call
Three-Layer Reporting - The structured approach for video verification (page metadata → title/description → observable content) is excellent for preventing hallucination
No Training Knowledge Rule - Strong safeguard against false identification in videos
Architecture - Using before_model_callback is the right approach for this feature

📊 Overall Assessment

Strengths:

Solves a real problem (YouTube video hallucination)
Clean architectural approach
Comprehensive instructions for agents
Good separation of concerns

Risks:

Missing error handling could cause failures (should fix)
Potential redundant processing if same URL appears multiple times (worth investigating)

Recommendation: Add error handling (#1), clarify deduplication intent (#2), then this is good to merge.

📸 Test Plan Review

The test plan looks solid:

✅ FileData injection verified
✅ Verifier reports both metadata and video content
✅ Regular URLs still work
⏳ Writer tool error handling pending

Consider adding a test case for:

Multiple YouTube URLs in a single message
Same YouTube URL appearing multiple times
Malformed YouTube URLs that match the regex

- Add try/except so YouTube injection failures don't crash the agent
- Move `seen` set outside the content loop to deduplicate across the entire request
- Replace `Any` with `LlmRequest` type annotation
- Clarify verifier instruction: url_context is mandatory even for video URLs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

MrOrz · 2026-05-31T05:01:21Z

Verified

gemini-code-assist Bot reviewed May 23, 2026

fix(verifier): extend YouTube URL regex to include /shorts/ paths

15adc24

MrOrz force-pushed the claude/great-heisenberg-4eV14 branch from c91f1e0 to 7c0b106 Compare May 24, 2026 01:31

fix(verifier): extend YouTube regex and add type hint per review

9d2f4ad

docs(adk): add reference to youtube video understanding

b641e15

claude and others added 3 commits May 25, 2026 03:48

Merge pull request #69 from cofacts/claude/hopeful-wozniak-t1xwI

352563b

fix(verifier): require url_context for YouTube URLs to capture uploadDate

claude and others added 2 commits May 27, 2026 13:26

fix: replace deprecated gemini-3.1-flash-lite-preview with gemini-3.1…

934ed37

Merge pull request #71 from cofacts/claude/verifier-url-context-error…

6fafb1c

MrOrz marked this pull request as ready for review May 30, 2026 08:11

MrOrz self-assigned this May 30, 2026

MrOrz requested review from andyy0216 and nonumpa May 30, 2026 08:12

MrOrz requested review from bil4444 and yutin1987 May 30, 2026 08:12

bil4444 approved these changes May 31, 2026

MrOrz merged commit 6f264a2 into master May 31, 2026
1 check passed

MrOrz deleted the claude/great-heisenberg-4eV14 branch May 31, 2026 05:01

MrOrz merged 14 commits into master from claude/great-heisenberg-4eV14

3 participants
