
feat(verifier): inject YouTube URLs as FileData for native video understanding#68

Merged
MrOrz merged 14 commits into master from claude/great-heisenberg-4eV14
May 31, 2026

Conversation

@MrOrz
Member

@MrOrz MrOrz commented May 23, 2026

Summary

Cofacts.ai often hallucinated or gave incomplete context for YouTube videos because it relied only on web page metadata. This PR enables native video understanding for YouTube URLs across the agent system.

YouTube Native Video Understanding

  • Automatic Injection: Added inject_youtube_filedata as a before_model_callback for both ai_verifier and ai_investigator. It scans for YouTube URLs in the user messages and appends them as FileData parts, allowing Gemini to watch the videos directly.
  • Complementary Context: Original URLs remain intact, so url_context (and Google Search grounding) continues to provide page-level metadata (titles, upload dates, descriptions) alongside the video frames.
  • Strict Instructions (Hard Rules): Updated instructions for verifier and investigator to enforce a "No Training Knowledge" policy for media. Agents are now required to report only what is directly visible/audible to prevent hallucinating identities or event details not present in the video.

System Robustness & Upgrades

  • Model Stability: Upgraded all proofreader agents from gemini-3.1-flash-lite-preview to the stable gemini-3.1-flash-lite.
  • Error Handling: Implemented handle_writer_tool_error via on_tool_error_callback for the ai_writer. This prevents the entire session from crashing if a tool (like Google Search or url_context) fails, allowing the writer to react to the error message instead.

Architecture

writer → investigator / verifier (input with youtube url)
              ↓ before_model_callback
              detect youtube urls in llm_request.contents
              append Part(file_data=FileData(file_uri=url)) to that user message's parts
              ↓ model call
              url_context → youtube page HTML metadata (title, description)
              FileData    → Gemini watches video content directly

Test Plan

https://langfuse.cofacts.tw/project/cmm0emerr0001qi07eugd0760/traces/3c3c0aa36707f8a1a05f2487a5fca052?timestamp=2026-05-30T07:29:47.923Z

  • Verified that YouTube URLs are correctly injected as FileData in LLM requests. [image]
  • Confirmed ai_verifier reports both page metadata (via url_context) and observable video content. [image]
  • Confirmed that regular web URLs still work as expected without FileData injection.
  • Verify ai_writer gracefully handles tool errors in a live session.

Generated by Gemini CLI

…rstanding

url_context fetches webpage HTML and cannot process video streams, so
YouTube links passed to the verifier returned only page metadata at best.

Add inject_youtube_filedata before_model_callback to ai_verifier: it scans
llm_request.contents for YouTube URLs and appends them as FileData parts,
allowing Gemini to watch the videos natively. The original URLs are kept
intact so url_context still retrieves title/description grounding metadata.

https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces the ability for the ai_verifier agent to process YouTube video content by extracting URLs from the conversation and appending them as FileData to the LLM request. Feedback focuses on expanding the YouTube URL regex to include shorts and live streams, ensuring the LLM request structure remains valid by avoiding consecutive user messages, and adding a missing type hint for the llm_request parameter.

Comment thread adk/cofacts_ai/agent.py Outdated
Comment thread adk/cofacts_ai/agent.py Outdated
Comment thread adk/cofacts_ai/agent.py Outdated
…tions

The verifier was filling gaps in video content with Gemini's training
knowledge (e.g. identifying an event as 2024 青鳥行動 (the Bluebird Movement) when the video
never stated it) and hallucinating media citations not in the input URLs.

Add three hard rules to the instruction:
- No training knowledge: video reports must be based solely on what is
  visible/audible; unknown fields must be stated as "無法從影片判斷"
  ("cannot be determined from the video")
- Video title/hashtags are uploader metadata, not evidence of content
- No invented citations: Sources list must only contain input URLs

Langfuse trace: https://langfuse.cofacts.tw/project/cmm0emerr0001qi07eugd0760/sessions/3488376c-d9cd-4e08-ad2e-d18fc0e01361?peek=2744a1259f4c9caf825e5f353a15c8fc&timestamp=2026-05-23T18%3A11%3A44.843Z&observation=1361e5c2af5a5f51

https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1
@MrOrz MrOrz force-pushed the claude/great-heisenberg-4eV14 branch from c91f1e0 to 7c0b106 Compare May 24, 2026 01:31
Previously a separate Content block was appended; now FileData parts
are added directly into the parts array of the user message that
contains the YouTube URL, keeping text and video content together
in one message as the API intends.

https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1
- Regex now covers /live/, /embed/, /v/ paths in addition to
  /watch?v=, /shorts/, and youtu.be
- Add Any type hint to llm_request parameter

https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1
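
The path coverage listed in this commit can be sketched as a single pattern. The PR's actual `_YOUTUBE_URL_RE` is not quoted in this thread, so the pattern below, the name `YOUTUBE_URL_RE`, the helper `find_youtube_urls`, and the assumption of 11-character video IDs are all illustrative rather than the merged code.

```python
import re

# Hypothetical pattern (not the PR's _YOUTUBE_URL_RE): covers /watch?v=,
# /shorts/, /live/, /embed/, /v/ on youtube.com, plus youtu.be short links.
# Only non-capturing groups are used, so findall() returns whole URLs.
YOUTUBE_URL_RE = re.compile(
    r"https?://"
    r"(?:(?:www\.|m\.)?youtube\.com/"
    r"(?:watch\?(?:[^\s#]*?&)?v=|shorts/|live/|embed/|v/)"
    r"|youtu\.be/)"
    r"[A-Za-z0-9_-]{11}"  # assumption: video IDs are 11 URL-safe characters
)


def find_youtube_urls(text: str) -> list[str]:
    """Return the distinct YouTube URLs in text, in first-seen order."""
    return list(dict.fromkeys(YOUTUBE_URL_RE.findall(text)))
```

Each match stops at the end of the video ID, so trailing query parameters such as a timestamp are dropped from the extracted URL. Whether that is desirable for FileData injection is a design choice this sketch does not settle.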
… FileData into investigator

Verifier: replace "title is not evidence" with two-layer reporting —
uploader-provided metadata (title/description) and directly-observed
content are reported as separate labelled sections so the writer can
judge reliability rather than having the verifier suppress the title.

Investigator: apply inject_youtube_filedata before_model_callback so
it can watch YouTube videos before searching, preventing hallucinations
like wrong speaker identity inferred from training data alone.
Move _YOUTUBE_URL_RE and inject_youtube_filedata before ai_investigator
so the forward reference is resolved.

Langfuse trace: https://langfuse.cofacts.tw/project/cmm0emerr0001qi07eugd0760/sessions/67958cd8-920d-473c-b58f-151cc8a3b88d?peek=c61ea0203e9f57c7c679c3ca85533e34&timestamp=2026-05-24T02%3A42%3A22.105Z&observation=ef7f8fda2de00798

https://claude.ai/code/session_01XueFovJMCf3vhYa2wMexK1
… crashing writer turn

Any exception thrown by a tool (verifier, investigator, get_single_cofacts_article, etc.)
now returns a structured {"error": ..., "message": "[SYSTEM] ..."} dict to the writer
instead of propagating as an unhandled exception.

Root cause traced to a hallucinated Facebook URL passed to verifier — url_context's
server-side URL prefetch returned 400 INVALID_ARGUMENT, crashing the entire writer turn.
This fix makes the writer resilient to any tool failure, not just this one case.

Langfuse trace: https://langfuse.cofacts.tw/project/cmm0emerr0001qi07eugd0760/traces/c61ea0203e9f57c7c679c3ca85533e34?observation=4f5cfafe4e1386a9

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
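
A minimal sketch of the error-to-dict behaviour this commit describes. The PR's real `handle_writer_tool_error` is not shown in the thread; the four-argument signature below is an assumption about what an `on_tool_error_callback` receives, the parameters are typed as `Any` to keep the example free of ADK imports, and the message wording is invented for illustration.

```python
from typing import Any, Optional


def handle_writer_tool_error(
    tool: Any,                # assumed: the tool object that raised
    args: dict[str, Any],     # assumed: arguments the tool was called with
    tool_context: Any,        # assumed: ADK tool context (unused here)
    error: Exception,
) -> Optional[dict[str, str]]:
    """Turn a tool exception into a structured result the writer can read."""
    # Fall back to the class name if the tool object has no `name` attribute.
    tool_name = getattr(tool, "name", type(tool).__name__)
    return {
        "error": type(error).__name__,
        "message": (
            f"[SYSTEM] Tool '{tool_name}' failed: {error}. "
            "Do not retry with the same arguments; continue with the "
            "information already available."
        ),
    }
```

Returning a dict instead of re-raising means the writer sees the failure as an ordinary tool result, which matches the stated goal of keeping one bad URL from ending the whole turn.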
claude and others added 3 commits May 25, 2026 03:48
…ploadDate

When a YouTube URL is present, inject_youtube_filedata already loads the video
as file_data so Gemini can watch it. But Gemini was skipping the url_context
call entirely — relying solely on video frames — and never fetching the page
metadata (uploadDate, publishedAt). This caused the agent to misidentify which
year's annual event was depicted, because upload date was the only signal that
distinguished old footage reused in a new upload.

Changes:
- Step 1 now explicitly states url_context is MANDATORY even when file_data is
  in context, and explains the two sources return different information.
- "Two-layer reporting" expanded to "Three-layer reporting for YouTube URLs":
  first layer is url_context page metadata (uploadDate required), then the
  uploader-provided title/description, then observed video content.
…o-agnostic three-layer reporting

Step 1 was over-indexed on the YouTube/file_data special case, making it
noisy for the majority of calls that don't involve video. Restructure so
the general rule (url_context is mandatory for all URLs) leads, with a
brief Note for the video case.

"Three-layer reporting for YouTube URLs" renamed to "When video content is
loaded in context" — same rule, but the trigger is the presence of video
in context, not the URL platform. Applies automatically to any future
video platform without prompt changes.
fix(verifier): require url_context for YouTube URLs to capture uploadDate
claude and others added 2 commits May 27, 2026 13:26
…-flash-lite

gemini-3.1-flash-lite-preview has been taken offline by Google.
Update all 4 proofreader agents (kmt, dpp, tpp, tpptw) to use the
recommended GA replacement model per Google's migration guidance.
…-3OHcs

fix: replace deprecated gemini-3.1-flash-lite-preview with gemini-3.1-flash-lite
@MrOrz MrOrz marked this pull request as ready for review May 30, 2026 08:11
@MrOrz MrOrz self-assigned this May 30, 2026
@MrOrz MrOrz requested review from andyy0216 and nonumpa May 30, 2026 08:12
@MrOrz MrOrz requested review from bil4444 and yutin1987 May 30, 2026 08:12
@MrOrz
Member Author

MrOrz commented May 31, 2026

@claude review

@claude

claude Bot commented May 31, 2026

Claude finished @MrOrz's task in 2m 26s


Code Review

I've completed a thorough review of this PR. Overall, the implementation is solid and the YouTube video understanding feature is well-designed. I found a few areas that could be improved for robustness and clarity.


🔍 Findings

1. Missing Error Handling in inject_youtube_filedata (High Priority)

File: adk/cofacts_ai/agent.py:216-242

The callback function lacks error handling. If any exception occurs during URL extraction or FileData construction, it could crash the entire agent invocation.

Issue: Operations like accessing part.text, regex matching, or constructing FileData could fail.

Recommendation: Wrap the function body in a try-except block:

def inject_youtube_filedata(
    callback_context: CallbackContext, llm_request: Any
) -> None:
    """Before-model callback for ai_investigator and ai_verifier.
    
    For each user message that contains YouTube URLs, appends FileData parts
    into the same parts array so Gemini can watch the videos inline.
    The original URLs are kept intact so url_context still fetches their
    title/description metadata.
    
    Ref: https://ai.google.dev/gemini-api/docs/video-understanding#youtube
    """
    try:
        for content in llm_request.contents:
            if content.role != "user" or not content.parts:
                continue
            youtube_urls = []
            for part in content.parts:
                if part.text:
                    youtube_urls.extend(_YOUTUBE_URL_RE.findall(part.text))
            seen = set()
            for url in youtube_urls:
                if url not in seen:
                    seen.add(url)
                    content.parts.append(
                        genai_types.Part(file_data=genai_types.FileData(file_uri=url))
                    )
    except Exception:
        # Swallow the error rather than crash: YouTube injection is an enhancement, not critical.
        # In production, log the exception here instead of discarding it silently.
        pass
    return None



2. Deduplication Scope Consideration (Medium Priority)

File: adk/cofacts_ai/agent.py:235-241

The current implementation deduplicates YouTube URLs within each content block separately. If the same URL appears in multiple user messages within a single request, it will be injected multiple times.

Current behavior:

for content in llm_request.contents:  # Each content gets its own `seen` set
    # ...
    seen = set()
    for url in youtube_urls:
        if url not in seen:
            seen.add(url)
            # ...

Question: Is this intentional?

  • If yes: Document why each content block should get separate copies
  • If no: Move seen = set() outside the outer loop:
seen = set()  # Deduplicate across entire request
for content in llm_request.contents:
    if content.role != "user" or not content.parts:
        continue
    # ... rest of logic

This could improve performance and reduce token usage if the same video appears multiple times.
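
The request-wide variant can be exercised in isolation. The sketch below uses stand-in `Part` and `Content` dataclasses instead of the real `google.genai` types, a deliberately simplified youtu.be-only pattern, and a hypothetical function name, so it only illustrates where the `seen` set lives.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Part:
    """Stand-in for a request part: either text or a file reference."""
    text: Optional[str] = None
    file_uri: Optional[str] = None


@dataclass
class Content:
    """Stand-in for one message in the request."""
    role: str
    parts: list[Part] = field(default_factory=list)


# Simplified pattern for the example only (youtu.be short links).
_URL_RE = re.compile(r"https?://youtu\.be/[A-Za-z0-9_-]{11}")


def inject_youtube_once(contents: list[Content]) -> None:
    """Append one file part per distinct URL across the whole request."""
    seen: set[str] = set()  # shared by all messages, not reset per message
    for content in contents:
        if content.role != "user" or not content.parts:
            continue
        # Collect URLs first so the parts list is not mutated while iterating.
        urls: list[str] = []
        for part in content.parts:
            if part.text:
                urls.extend(_URL_RE.findall(part.text))
        for url in urls:
            if url not in seen:
                seen.add(url)
                content.parts.append(Part(file_uri=url))
```

With this placement, a URL repeated in a later user message gets no second file part; the video is attached only to the first message that mentions it.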


3. Type Annotation Completeness (Low Priority)

File: adk/cofacts_ai/agent.py:216-218

The function signature could be more explicit:

Current:

def inject_youtube_filedata(
    callback_context: CallbackContext, llm_request: Any
) -> None:

Suggested (if the ADK provides a type for the request object):

from google.genai.types import GenerateContentRequest  # or appropriate type

def inject_youtube_filedata(
    callback_context: CallbackContext, 
    llm_request: GenerateContentRequest
) -> None:

Using Any is acceptable if the ADK doesn't export the proper type, but worth checking the ADK documentation.


4. Instruction Clarity in Verifier (Low Priority)

File: adk/cofacts_ai/agent.py:307-311

The instruction says url_context is "MANDATORY" but could be clearer about whether this applies to ALL URLs or just certain types:

1. Call url_context for ALL provided URLs in one call (up to 20) — this is MANDATORY.
   url_context fetches the web PAGE metadata (title, publish date, description), which is
   always required regardless of what else is visible in this conversation.
   Note: for video URLs (e.g. YouTube), page metadata and video frames are complementary —
   url_context gives you the upload date; the video gives you observable content.

Suggestion: Make it explicit that ALL URLs need url_context:

1. Call url_context for ALL provided URLs (up to 20 per call) — this is MANDATORY, even for video URLs.
   url_context fetches web PAGE metadata (title, publish date, description) from the HTML.
   For video URLs like YouTube, page metadata and video frames are complementary:
   - url_context → upload date, uploader name, page title/description
   - FileData → observable video content (speech, visuals, on-screen text)

✅ What Looks Good

  1. YouTube URL Regex - Comprehensive pattern covering all major YouTube URL formats after previous review feedback
  2. Error Handling for Writer - The handle_writer_tool_error callback is well-designed and provides clear error messages
  3. Model Upgrade - Moving from preview to stable model (gemini-3.1-flash-lite) is the right call
  4. Three-Layer Reporting - The structured approach for video verification (page metadata → title/description → observable content) is excellent for preventing hallucination
  5. No Training Knowledge Rule - Strong safeguard against false identification in videos
  6. Architecture - Using before_model_callback is the right approach for this feature

📊 Overall Assessment

Strengths:

  • Solves a real problem (YouTube video hallucination)
  • Clean architectural approach
  • Comprehensive instructions for agents
  • Good separation of concerns

Risks:

  • Missing error handling could cause failures (should fix)
  • Potential redundant processing if same URL appears multiple times (worth investigating)

Recommendation: Add error handling (#1), clarify deduplication intent (#2), then this is good to merge.


📸 Test Plan Review

The test plan looks solid:

  • ✅ FileData injection verified
  • ✅ Verifier reports both metadata and video content
  • ✅ Regular URLs still work
  • ⏳ Writer tool error handling pending

Consider adding a test case for:

  • Multiple YouTube URLs in a single message
  • Same YouTube URL appearing multiple times
  • Malformed YouTube URLs that match the regex

- Add try/except so YouTube injection failures don't crash the agent
- Move `seen` set outside the content loop to deduplicate across the entire request
- Replace `Any` with `LlmRequest` type annotation
- Clarify verifier instruction: url_context is mandatory even for video URLs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@MrOrz
Member Author

MrOrz commented May 31, 2026

Verified
image

@MrOrz MrOrz merged commit 6f264a2 into master May 31, 2026
1 check passed
@MrOrz MrOrz deleted the claude/great-heisenberg-4eV14 branch May 31, 2026 05:01

3 participants