Commit a96731d
feat(tool): add arxiv HTML affiliation fetch for institutional claim verification
Enhance arxiv_search with optional fetch_affiliations config that scrapes the
ltx_authors section from arXiv HTML paper pages, providing authoritative
author+institution text (e.g. "1 Shanghai AI Laboratory 2 Abaka AI") that
the arXiv Atom API and feedparser do not expose.
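As a rough illustration of the scraping step, the sketch below pulls the visible text out of the element carrying the ltx_authors class, using only the standard library. It is not the code from this commit: extract_affiliations_text, the helper class, and the sample HTML (including the placeholder author names) are assumptions made for the example, and fetching the page is left out.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _AuthorsExtractor(HTMLParser):
    """Collects text nodes found inside the first-level ltx_authors element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # 0 means we are outside the ltx_authors element
        self.chunks = []  # text fragments collected while inside it

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif "ltx_authors" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_affiliations_text(html: str) -> str:
    """Hypothetical helper: return whitespace-normalized author/affiliation text."""
    parser = _AuthorsExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Made-up sample page; the author names are placeholders.
SAMPLE_HTML = (
    '<html><body><div class="ltx_authors">'
    '<span class="ltx_personname">Jane Doe<sup>1</sup><br/>John Roe<sup>2</sup></span>'
    '<span class="ltx_contact"><sup>1</sup>Shanghai AI Laboratory '
    '<sup>2</sup>Abaka AI</span>'
    '</div><p>Abstract text</p></body></html>'
)

affiliations = extract_affiliations_text(SAMPLE_HTML)
```

A streaming parser keeps the sketch dependency-free; a real implementation might prefer an HTML library with CSS selectors.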
Changes:
- arxiv_search.py: add ArxivConfig.fetch_affiliations field (default False),
_fetch_html_affiliations() method, and per-result HTML enrichment loop in
execute(); fix 429 error message (was generic "Search failed: HTTPError")
- agent_article_fact_checker.py: update TOOLS_DESCRIPTION, WORKFLOW_STEPS and
PER_CLAIM_VERIFICATION_PROMPT so agents treat affiliations_text as the
authoritative source for institutional/attribution claims
- Enable fetch_affiliations=True in all three entry points:
.claude/skills/dingo-verify/scripts/fact_check.py,
skills/dingo-verify/scripts/fact_check.py,
examples/agent/agent_article_fact_checking_example.py
Result: institutional claims previously UNVERIFIABLE (e.g. "OmniDocBench
released by Tsinghua/Alibaba/Shanghai AI Lab") now correctly judged FALSE
from paper affiliation data without requiring Tavily web search.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
1 parent 5729c44
5 files changed
Lines changed: 966 additions & 882 deletions
File tree
- .claude/skills/dingo-verify/scripts
- dingo/model/llm/agent
  - tools
- examples/agent
- skills/dingo-verify/scripts