Neuro-S/vericite

VeriCite(核引)

中文 · English

VeriCite(核引) is a lightweight Agent Skill for academic reference verification, citation style normalization, identifier extraction, confidence labeling, and audit-ready reporting.


中文

为什么需要 VeriCite?

学术参考文献整理看似只是“改格式”,实际常常包含更多风险:作者名是否正确、DOI 是否匹配、期刊卷期页是否可靠、预印本与正式发表版本是否混用、中文与英文文献是否遵循同一投稿规范。

VeriCite 的目标不是替代 Zotero、EndNote 或数据库,而是帮助 Agent 在处理参考文献时遵循一套更可靠的工作流:

  • 先保留原文,再做修改
  • 先核验证据,再统一格式
  • 先标注置信度,再给出结果
  • 无法确认时明确说明,而不是编造

它适合安装到 OpenClaw、Hermes、ChatGPT、Codex、Claude Code、TRAE 等能够读取 Skill 文件夹的 Agent 环境中。

VeriCite 能做什么?

  • 拆分用户粘贴的参考文献列表
  • 提取 DOI、PMID、PMCID、arXiv、ISBN、URL 等标识符
  • 指导 Agent 使用可访问来源核验参考文献信息
  • 将参考文献统一为 APA 7、Vancouver、AMA、IEEE、ACM、MLA、Chicago、GB/T 7714 等格式
  • 生成带状态、置信度、来源、修改说明和警告的审计报告
  • 在证据不足、来源冲突或重复时保留原始条目,并标记为 needs-review、unresolved、conflict 或 duplicate

设计原则

VeriCite 是一个增强版轻量 Skill,而不是完整软件产品。

80% Skill 指令与规则
15% references 模板与示例
5% Python 标准库辅助脚本

核心原则:

  1. 不捏造:不得虚构作者、题名、期刊名、卷期页、DOI、ISBN、访问日期或发表状态。
  2. 可追溯:每条修改都应有来源、置信度和审计说明。
  3. 轻量安全:默认脚本不联网;在线 helper 必须显式 --allow-network 才联网。脚本不安装依赖、不读取敏感目录、不修改原始输入文件。
  4. 证据优先:格式统一不等于事实核验。只有可访问来源确认过的信息才应标记为 verified。
  5. 权限诚实:不能声称访问过 CNKI、Web of Science、Scopus、Embase、机构图书馆或订阅数据库,除非用户提供了访问权限或可读内容。
  6. 冲突不合并:当 DOI、题名、作者顺序、年份、期刊、卷期页等核心字段冲突时,必须标记 conflict。

安装方式

将整个 vericite/ 文件夹复制或注册到你的 Agent Skill 目录中。核心入口文件是:

vericite/SKILL.md

如果你的 Agent 支持 OpenAI 风格的 Skill UI 元数据,也可以读取:

vericite/agents/openai.yaml

推荐发布或安装时保留完整目录结构,不要只复制 SKILL.md。

文件结构

vericite/
├── SKILL.md
├── LICENSE
├── README.md
├── CONTRIBUTING.md
├── SECURITY.md
├── pyproject.toml
├── .gitignore
├── agents/
│   └── openai.yaml
├── references/
│   ├── workflow.md
│   ├── source-policy.md
│   ├── style-rules.md
│   ├── audit-template.md
│   ├── examples.md
│   └── playbooks.md
├── samples/
│   ├── sample_refs.txt
│   ├── sample_audit.json
│   ├── edge_refs_multiline.txt
│   ├── edge_refs_identifiers.txt
│   ├── edge_audit_conflicts.json
│   ├── sample_online_identifiers.json
│   ├── sample_online_results.json
│   ├── edge_online_conflict.json
│   ├── edge_online_restricted_source.json
│   └── fixtures/
│       ├── crossref_doi_200.json
│       ├── crossref_doi_404.json
│       ├── crossref_agency_crossref.json
│       ├── crossref_agency_datacite.json
│       ├── pubmed_pmid_200.json
│       ├── pubmed_pmid_not_found.json
│       ├── ncbi_pmcid_200.json
│       ├── arxiv_200.xml
│       ├── arxiv_429.json
│       ├── datacite_doi_200.json
│       ├── datacite_doi_404.json
│       ├── openalex_title_zh_200.json
│       └── invalid_response.txt
└── scripts/
    ├── __init__.py
    ├── common.py
    ├── split_references.py
    ├── extract_identifiers.py
    ├── format_audit_table.py
    ├── verify_online.py
    ├── verify_chinese.py
    ├── chinese_api_enrichment.py
    ├── cnki_browser_agent.py
    ├── vericite_cli.py
    ├── chinese_validation_suite.py
    └── run_smoke_tests.py

快速开始

统一 CLI 入口

VeriCite 提供统一的命令行工具 vericite_cli.py,支持拆分、提取、核验和格式化四个子命令,同时处理中文和英文参考文献:

cd vericite/

python3 scripts/vericite_cli.py split --input samples/sample_refs.txt --output /tmp/refs.json
python3 scripts/vericite_cli.py extract --input /tmp/refs.json --output /tmp/identifiers.json
python3 scripts/vericite_cli.py verify --input samples/sample_refs.txt --style gbt7714 --output /tmp/verified.txt
python3 scripts/vericite_cli.py format --input samples/sample_refs.txt --style gbt7714 --output /tmp/formatted.txt

verify 命令会解析中英文参考文献的结构化元数据,生成 GB/T 7714 格式引文,并附带验证警告。format 命令仅做格式化,不附带验证信息。

端到端流程示例

以下是一个从原始参考文献文本到完整审计报告的完整工作流:

第一步:拆分参考文献列表

python3 scripts/split_references.py --input my_refs.txt --output /tmp/vericite_refs.json

输入示例(my_refs.txt):

[1]陈衍冲,毕志坤,张润之,等.远端桡动脉入路在介入治疗中的研究进展[J].中国循环杂志,2025,40(11):1134-1138.
[2]Smith J,Brown K,Johnson L,et al.Deep learning for medical imaging analysis[J].Nature Medicine,2023,29(5):123-145. DOI:10.1038/s41591-023-01234-5.

第二步:提取标识符

python3 scripts/extract_identifiers.py --input /tmp/vericite_refs.json --output /tmp/vericite_identifiers.json

输出包含每条文献的 DOI、PMID、arXiv 等标识符及中文文献标记。

第三步:联网核验(可选)

python3 scripts/verify_online.py --input /tmp/vericite_identifiers.json --output /tmp/vericite_online_results.json --allow-network

第四步:生成审计报告

python3 scripts/format_audit_table.py --input /tmp/vericite_online_results.json --output /tmp/vericite_audit_report.md

单步脚本使用

也可以单独运行各脚本:

python3 scripts/split_references.py --input samples/sample_refs.txt --output /tmp/refs.json
python3 scripts/extract_identifiers.py --input /tmp/refs.json --output /tmp/identifiers.json
python3 scripts/format_audit_table.py --input samples/sample_audit.json --output /tmp/audit_report.md

运行轻量回归测试:

python3 scripts/run_smoke_tests.py

也可以通过管道使用:

python3 scripts/split_references.py < samples/sample_refs.txt
python3 scripts/extract_identifiers.py < /tmp/refs.json
python3 scripts/format_audit_table.py < samples/sample_audit.json

这些脚本只做本地解析和格式整理,不执行联网核验。真正的文献信息核验应由宿主 Agent 使用其可用的浏览器、搜索、API、文件读取或连接器能力完成。

联网核验

VeriCite v0.9.4.0-Preview 使用三层模式:

  1. Offline Parsing:本地拆分参考文献、提取 DOI/PMID/PMCID/arXiv/ISBN/URL、生成审计表。
  2. Host-Agent Online Verification:当用户要求校验、核验、审核或投稿清理,且宿主 Agent 有浏览、搜索、API 或网页访问能力、用户也没有禁止联网时,默认继续联网核验。
  3. Optional Online Helper Script:在本地环境允许联网时,可用 scripts/verify_online.py 查询公开、适合程序化访问的来源。它是在线核验预览模块,不替代宿主 Agent 的浏览、搜索、人工判断或受限来源授权。

默认行为:如果任务是 verify、verify-and-format 或 journal cleanup,不应只停留在本地脚本结果。完成本地拆分和标识符提取后,应继续使用 Crossref、PubMed/NCBI、arXiv、DOI resolver、DataCite、出版商页面、期刊官网或可访问网页进行外部核验。只有用户明确要求 no-browse/offline/format-only,或当前环境没有网络能力时,才降级为离线解析,并标注 not externally verified / 未经外部联网核验。

可选联网辅助脚本默认 dry-run,不联网;只有显式加入 --allow-network 才会访问公共 API。NCBI_API_KEY 不是必需的,但在 NCBI/PubMed 上可获得更高请求额度;--contact-email 会传给 Crossref/NCBI 等支持礼貌联系参数的来源。

python3 scripts/verify_online.py --input /tmp/vericite_identifiers.json --output /tmp/vericite_online_results.json
python3 scripts/verify_online.py --input /tmp/vericite_identifiers.json --output /tmp/vericite_online_results.json --allow-network

完整流程示例:

python3 scripts/split_references.py --input samples/sample_refs.txt --output /tmp/vericite_refs.json
python3 scripts/extract_identifiers.py --input /tmp/vericite_refs.json --output /tmp/vericite_identifiers.json
python3 scripts/verify_online.py --input /tmp/vericite_identifiers.json --output /tmp/vericite_online_results.json --allow-network
python3 scripts/format_audit_table.py --input /tmp/vericite_online_results.json --output /tmp/vericite_audit_report.md

调试联网失败时可加入网络诊断:

python3 scripts/verify_online.py \
  --input /tmp/vericite_ids.json \
  --output /tmp/vericite_online.json \
  --allow-network \
  --sources crossref,pubmed,arxiv,doi,datacite,openalex \
  --contact-email you@example.com \
  --diagnose-network \
  --timeout 15 \
  --retries 2

--diagnose-network 只检查 API 端点可达性,不核验书目记录。HTTP 404 表示该源未找到对应路由或精确标识符,不是网络连通性问题;HTTP 429 表示 API 限流;DNS、TLS、连接失败或超时才提示可能与本地网络、DNS、代理、VPN、防火墙或地区连通性有关。中国大陆网络环境下可能需要用户自行配置 HTTP_PROXY / HTTPS_PROXY 或在可访问环境中运行;VeriCite 只报告诊断证据,不绕过网络限制或数据库权限。

| Failure type | Meaning | User action |
| --- | --- | --- |
| not-found | source did not find exact identifier | check DOI/PMID or try another source |
| dns-error | local DNS/network failure | check network/proxy/VPN |
| rate-limited | API throttled request | retry later / increase interval |
| access-denied | restricted/blocked endpoint | do not bypass; provide readable source |
| metadata-mismatch | source returned different core fields | mark conflict/needs-review |

中文文献核验:脚本不自动爬取 CNKI、万方、维普,也不绕过登录、验证码、付费墙或机构权限。它可用 OpenAlex、DOI/DataCite/Crossref、期刊官网或公开页面做候选发现或部分确认;受限来源必须标注 restricted-source。如果用户上传 CNKI/万方/维普导出、截图或 PDF,可作为 user-provided evidence 写入审计。公开来源不足时,状态应保持 needs-review、partial 或 unresolved。

ISBN / 图书:当前脚本只做 ISBN 提取与校验位检查,不默认调用 Open Library 或商业/受限目录。图书和章节核验应由宿主 Agent 使用出版社页面、国家图书馆、WorldCat、Library of Congress、Open Library 等可访问目录,或使用用户提供的版权页/馆藏导出作为证据。

Python 运行环境说明

  • 辅助脚本需要 Python 3.9+,且只使用 Python 标准库,不需要安装第三方依赖。
  • 在 macOS/Linux 上,推荐使用 python3;如果你的环境中 python 可用,也可以等价使用 python。
  • 在 Windows 上,通常可以使用 python 或 py -3。
  • 可先运行 python3 --version 或 python --version 检查本机是否已安装 Python。
  • 如果没有安装 Python,VeriCite 的核心 Skill 指令、工作流、样式规则和审计模板仍然可用;但 scripts/ 中的自动拆分、标识符提取和审计表生成等辅助功能将无法在本机运行,需由 Agent 手动完成或在具备 Python 的环境中运行。

TRAE / Codex / Agent 实战测试建议

  1. 将完整 vericite/ 文件夹放入 Agent 可读取的 Skill、工具或项目目录;不要只复制 SKILL.md。
  2. 确认 references/、scripts/、samples/ 和 SKILL.md 保持在同一 vericite/ 根目录下。
  3. 在 TRAE、Codex 或类似 Agent 中让 Agent 读取 vericite/SKILL.md,必要时再读取 references/workflow.md 和 references/playbooks.md。
  4. 先用 samples/sample_refs.txt 运行拆分、标识符提取和审计表生成的本地测试。
  5. 再粘贴一段真实参考文献文本,要求 Agent 输出格式化参考文献、审计摘要、审计表和未解决条目。
  6. 对核验任务,不要只运行本地脚本;本地解析后继续用可用网络工具查询 Crossref、PubMed、arXiv、DOI resolver、期刊官网或公开网页。
  7. 观察 Agent 是否保留原始参考文献,区分 verified、partial、needs-review、unresolved、conflict、duplicate、restricted-source、network-unavailable 和 not-checked,不捏造 DOI、卷期页、作者或期刊,不假装访问受限数据库,并输出审计表。

典型用户请求

你可以让安装了 VeriCite 的 Agent 处理类似任务:

请检查这些参考文献,补充可验证的 DOI,并按 APA 7 输出,同时给出审计报告。
请从这篇论文的参考文献部分提取条目,统一为 Vancouver 格式,标出无法核验的文献。
请把这些中英文混合参考文献整理为 GB/T 7714,并说明哪些字段没有来源支持。
请提取这组参考文献中的 DOI、PMID、PMCID、arXiv、ISBN 和 URL。

输出规范

对于“核验 + 格式统一”的任务,推荐输出:

  1. Reformatted references:按目标样式整理后的参考文献列表
  2. Audit summary:各状态数量与总体说明
  3. Audit table:ID、状态、置信度、来源、主要修改、警告
  4. Unresolved items:无法核验的条目和用户可补充的信息

推荐状态:

| Status | 含义 |
| --- | --- |
| verified | 权威或高可信来源确认核心信息 |
| corrected | 已依据来源或用户授权完成可追溯修正 |
| partial | 部分字段已确认,但仍有关键字段不确定 |
| needs-review | 匹配结果合理,但证据不足以自动确认 |
| unresolved | 缺少足够证据,不能可靠修改 |
| conflict | 多个可访问来源之间存在核心字段冲突 |
| duplicate | 重复或近似重复条目,需说明重复依据 |
| restricted-source | 来源需要登录、验证码、付费、机构权限或其他访问控制 |
| network-unavailable | 已尝试联网核验但网络、DNS、API、超时或限流失败 |
| not-checked | 未执行外部核验,通常是离线、no-browse 或未给脚本 --allow-network |

隐私与安全边界

  • 默认脚本不联网;verify_online.py 只有在显式 --allow-network 时才联网
  • 脚本不安装第三方依赖
  • 脚本不读取敏感系统目录
  • 脚本不修改原始输入文件
  • 受限数据库只能在用户授权或提供可读内容时使用
  • 受限数据库包括 CNKI、Wanfang、VIP、Web of Science、Scopus、Embase、IEEE Xplore、ACM Digital Library、机构代理和订阅全文平台
  • 无法核验的内容必须保留原文并标注状态
  • 引用格式转换不能被表述为事实核验
  • 数据库中查不到不能直接推断文献不存在,只能说明当前可访问范围内未核验

适合谁使用?

VeriCite 适合:

  • 需要整理投稿参考文献的研究者
  • 需要批量检查 DOI 或标识符的作者
  • 需要让 Agent 输出可追溯修改记录的编辑或助研
  • 想把参考文献处理流程封装成可复用 Skill 的个人开发者
  • 希望在轻量、透明、低权限前提下使用 Agent 处理学术文献的人

贡献与安全

参见仓库根目录下的 CONTRIBUTING.md 和 SECURITY.md。

许可证

本项目采用 MIT License。版权人为 Wang Junjie (NeuroS)。

发布阶段与扩展方向

当前版本为 v0.9.4.0-Preview,适合作为在线核验预览版 Skill 在受控 Agent 环境中安装和试用。建议继续保持 preview 阶段,优先收集真实参考文献处理场景中的审计、冲突、网络失败和受限来源反馈,再考虑稳定版发布。

未来扩展应继续保持轻量:

  • 增加更多期刊投稿格式清单
  • 增强 BibTeX、RIS、CSL JSON 的转换说明
  • 增加更多中英混合参考文献示例
  • 对接宿主 Agent 已授权的浏览、搜索或数据库连接器

English

Why VeriCite?

Reference cleanup is rarely just formatting. Real academic citation work often involves uncertain metadata, mismatched DOI values, preprint-versus-published-version ambiguity, incomplete journal details, and mixed-language style requirements.

VeriCite helps an Agent handle this work with a safer workflow:

  • Preserve the original text before editing
  • Verify evidence before normalizing style
  • Assign confidence before presenting corrections
  • Mark uncertainty clearly instead of inventing metadata

VeriCite is designed for OpenClaw, Hermes, ChatGPT, Codex, Claude Code, TRAE, and other Agent environments that can install or read a Skill folder.

What VeriCite Does

  • Split pasted bibliography text into reference entries
  • Extract DOI, PMID, PMCID, arXiv, ISBN, and URL values
  • Guide the host Agent through evidence-based reference verification
  • Normalize references to APA 7, Vancouver, AMA, IEEE, ACM, MLA, Chicago, GB/T 7714, and journal-specific styles
  • Produce audit reports with status, confidence, sources, changes, and warnings
  • Preserve uncertain entries and label them as needs-review, unresolved, or conflict
  • Detect repeated or near-identical entries and label them as duplicate with the matching basis

Design Philosophy

VeriCite is an enhanced lightweight Skill, not a full citation manager or standalone software product.

80% Skill instructions and rules
15% reference templates and examples
5% Python standard-library helper scripts

Core principles:

  1. No fabrication: never invent authors, titles, venues, volume/issue/page data, identifiers, access dates, or publication status.
  2. Traceability: every correction should have a source, confidence level, and audit note.
  3. Lightweight safety: scripts do not use the network by default; the online helper requires explicit --allow-network. Scripts do not install dependencies, read sensitive directories, or modify original input files.
  4. Evidence first: formatting is not verification. Only evidence-backed metadata should be labeled verified.
  5. Permission honesty: do not claim access to CNKI, Web of Science, Scopus, Embase, institutional catalogs, library proxies, or subscription databases unless the user provides access or readable content.
  6. No silent conflict merges: if core fields such as DOI, title, author order, year, journal, volume, issue, or pages disagree, label the item as conflict.
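
The "no silent conflict merges" principle can be illustrated with a minimal sketch. It is not taken from the VeriCite scripts: the function names, the record layout (one metadata dict per source), and the exact field keys are assumptions made for this example.

```python
# Hypothetical sketch: flag core fields on which accessible sources disagree.
# CORE_FIELDS mirrors some of the fields named in principle 6; the key names
# are assumptions.
CORE_FIELDS = ("doi", "title", "year", "journal", "volume", "issue", "pages")


def normalize(value):
    """Compare values case-insensitively and ignore whitespace differences."""
    return " ".join(str(value).casefold().split())


def find_conflicts(records):
    """records maps a source name to its metadata dict.

    Returns {field: {source: value}} for every core field where at least
    two sources report different (normalized) values.
    """
    conflicts = {}
    for field in CORE_FIELDS:
        seen = {
            source: record[field]
            for source, record in records.items()
            if record.get(field) not in (None, "")
        }
        if len({normalize(v) for v in seen.values()}) > 1:
            conflicts[field] = seen
    return conflicts
```

An entry with a non-empty result would be labeled conflict and reported with both values, rather than merged into a single "best" record.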

Installation

Copy or register the full vericite/ folder as an Agent Skill. The required entry point is:

vericite/SKILL.md

If your Agent supports OpenAI-style Skill metadata, it can also read:

vericite/agents/openai.yaml

For best results, keep the full folder structure instead of copying only SKILL.md.

Project Structure

vericite/
├── SKILL.md
├── LICENSE
├── README.md
├── CONTRIBUTING.md
├── SECURITY.md
├── pyproject.toml
├── .gitignore
├── agents/
│   └── openai.yaml
├── references/
│   ├── workflow.md
│   ├── source-policy.md
│   ├── style-rules.md
│   ├── audit-template.md
│   ├── examples.md
│   └── playbooks.md
├── samples/
│   ├── sample_refs.txt
│   ├── sample_audit.json
│   ├── edge_refs_multiline.txt
│   ├── edge_refs_identifiers.txt
│   ├── edge_audit_conflicts.json
│   ├── sample_online_identifiers.json
│   ├── sample_online_results.json
│   ├── edge_online_conflict.json
│   ├── edge_online_restricted_source.json
│   └── fixtures/
│       ├── crossref_doi_200.json
│       ├── crossref_doi_404.json
│       ├── crossref_agency_crossref.json
│       ├── crossref_agency_datacite.json
│       ├── pubmed_pmid_200.json
│       ├── pubmed_pmid_not_found.json
│       ├── ncbi_pmcid_200.json
│       ├── arxiv_200.xml
│       ├── arxiv_429.json
│       ├── datacite_doi_200.json
│       ├── datacite_doi_404.json
│       ├── openalex_title_zh_200.json
│       └── invalid_response.txt
└── scripts/
    ├── __init__.py
    ├── common.py
    ├── split_references.py
    ├── extract_identifiers.py
    ├── format_audit_table.py
    ├── verify_online.py
    ├── verify_chinese.py
    ├── chinese_api_enrichment.py
    ├── cnki_browser_agent.py
    ├── vericite_cli.py
    ├── chinese_validation_suite.py
    └── run_smoke_tests.py

Quick Start

Unified CLI

VeriCite provides a unified command-line tool, vericite_cli.py, with four subcommands (split, extract, verify, and format) that handle both Chinese and English references:

cd vericite/

python3 scripts/vericite_cli.py split --input samples/sample_refs.txt --output /tmp/refs.json
python3 scripts/vericite_cli.py extract --input /tmp/refs.json --output /tmp/identifiers.json
python3 scripts/vericite_cli.py verify --input samples/sample_refs.txt --style gbt7714 --output /tmp/verified.txt
python3 scripts/vericite_cli.py format --input samples/sample_refs.txt --style gbt7714 --output /tmp/formatted.txt

The verify command parses structured metadata from Chinese and English references, generates GB/T 7714 citations, and appends validation warnings. The format command produces formatted citations without validation notes.

End-to-End Workflow

The following example shows a complete workflow from raw reference text to a full audit report:

Step 1: Split the reference list

python3 scripts/split_references.py --input my_refs.txt --output /tmp/vericite_refs.json

Sample input (my_refs.txt):

[1]Chen YC,Bi ZK,Zhang RZ,et al.Research progress of distal radial artery access in interventional therapy[J].Chinese Circulation Journal,2025,40(11):1134-1138.
[2]Smith J,Brown K,Johnson L,et al.Deep learning for medical imaging analysis[J].Nature Medicine,2023,29(5):123-145. DOI:10.1038/s41591-023-01234-5.
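
For readers who want to see what the splitting step involves, here is a rough sketch. It is not the code in scripts/split_references.py; it assumes every entry begins on a new line with a "[n]" marker, and the function name and output keys are hypothetical.

```python
import re

# Assumption: entries start at the beginning of a line with "[1]", "[2]", ...
# Type markers such as "[J]" are not matched because they contain no digits.
ENTRY_START = re.compile(r"^\[(\d+)\]\s*", re.MULTILINE)


def split_numbered_references(text):
    """Return one {"id", "text"} dict per "[n]"-prefixed entry."""
    matches = list(ENTRY_START.finditer(text))
    entries = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse line wraps inside an entry into single spaces.
        body = " ".join(text[match.end():end].split())
        entries.append({"id": int(match.group(1)), "text": body})
    return entries
```

Unnumbered lists, author-year lists, and entries wrapped across lines without a marker need different heuristics, which this sketch does not attempt.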

Step 2: Extract identifiers

python3 scripts/extract_identifiers.py --input /tmp/vericite_refs.json --output /tmp/vericite_identifiers.json

The output includes DOI, PMID, arXiv, and other identifiers for each entry, plus a Chinese-literature flag.
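
The kind of pattern matching involved can be sketched as follows. The regular expressions, the function name, and the output keys are illustrative assumptions, not the patterns used by scripts/extract_identifiers.py, and they cover only DOI, PMID, and arXiv values.

```python
import re

# Illustrative patterns only; real-world identifiers have more variants.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
PMID_RE = re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE)
ARXIV_RE = re.compile(r"\barXiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def extract_identifiers(entry):
    """Collect identifiers found in one reference string."""
    # Sentence punctuation often trails a DOI; strip it from the match.
    dois = [doi.rstrip(".,;)") for doi in DOI_RE.findall(entry)]
    return {
        "doi": dois,
        "pmid": PMID_RE.findall(entry),
        "arxiv": ARXIV_RE.findall(entry),
        # Hypothetical flag name for entries containing CJK characters.
        "has_cjk": bool(CJK_RE.search(entry)),
    }
```

An extracted identifier is only a candidate: it still has to be checked against an accessible source before the entry can be labeled verified.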

Step 3: Online verification (optional)

python3 scripts/verify_online.py --input /tmp/vericite_identifiers.json --output /tmp/vericite_online_results.json --allow-network

Step 4: Generate the audit report

python3 scripts/format_audit_table.py --input /tmp/vericite_online_results.json --output /tmp/vericite_audit_report.md

Individual Scripts

You can also run each script separately:

python3 scripts/split_references.py --input samples/sample_refs.txt --output /tmp/refs.json
python3 scripts/extract_identifiers.py --input /tmp/refs.json --output /tmp/identifiers.json
python3 scripts/format_audit_table.py --input samples/sample_audit.json --output /tmp/audit_report.md

Run the lightweight regression tests:

python3 scripts/run_smoke_tests.py

Pipe-based usage is also supported:

python3 scripts/split_references.py < samples/sample_refs.txt
python3 scripts/extract_identifiers.py < /tmp/refs.json
python3 scripts/format_audit_table.py < samples/sample_audit.json

These three scripts only perform local parsing and formatting; they do not verify anything online. External verification should be performed by the host Agent using its available browsing, search, API, file-reading, or connector capabilities.

Online Verification

VeriCite v0.9.4.0-Preview uses three levels:

  1. Offline Parsing: split references, extract DOI/PMID/PMCID/arXiv/ISBN/URL values, and render audit tables locally.
  2. Host-Agent Online Verification: when the user asks to verify, check, audit, or clean up references for submission, and the host Agent has browsing, search, API, or web-access tools, online verification is the default unless the user prohibits network access.
  3. Optional Online Helper Script: when local network access is allowed, scripts/verify_online.py can query public programmable sources. It is an online-verification preview module, not a replacement for host-Agent browsing, search, human judgment, or authorized restricted-source access.

Default behavior: for verify, verify-and-format, or journal cleanup tasks, do not stop after local helper scripts. After splitting and identifier extraction, continue with Crossref, PubMed/NCBI, arXiv, DOI resolver, DataCite, publisher pages, journal pages, or accessible web pages. Fall back to offline parsing only when the user requests no-browse/offline/format-only mode or the environment has no network capability, and mark results not externally verified.

The optional helper script defaults to dry-run and does not use the network. It only reaches public APIs with --allow-network. NCBI_API_KEY is optional but can raise NCBI/PubMed request limits; --contact-email is passed to sources that support polite contact parameters.

python3 scripts/verify_online.py --input /tmp/vericite_identifiers.json --output /tmp/vericite_online_results.json
python3 scripts/verify_online.py --input /tmp/vericite_identifiers.json --output /tmp/vericite_online_results.json --allow-network
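
The opt-in behavior of the two commands above can be sketched with argparse. This is a simplified illustration rather than the parser in scripts/verify_online.py: only the --input, --output, and --allow-network flags come from this README, and the function names and returned dictionary are assumptions.

```python
import argparse


def build_parser():
    """Parser sketch: network access is off unless explicitly requested."""
    parser = argparse.ArgumentParser(description="Opt-in network flag sketch")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    # store_true defaults to False, so omitting the flag means dry-run.
    parser.add_argument("--allow-network", action="store_true",
                        help="permit requests to public APIs")
    return parser


def plan_run(argv):
    """Decide the run mode from command-line arguments without any I/O."""
    args = build_parser().parse_args(argv)
    mode = "online" if args.allow_network else "dry-run"
    return {"mode": mode, "input": args.input, "output": args.output}
```

Making the safe mode the default means a forgotten flag results in no network traffic, which matches the "lightweight safety" principle.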

Full workflow example:

python3 scripts/split_references.py --input samples/sample_refs.txt --output /tmp/vericite_refs.json
python3 scripts/extract_identifiers.py --input /tmp/vericite_refs.json --output /tmp/vericite_identifiers.json
python3 scripts/verify_online.py --input /tmp/vericite_identifiers.json --output /tmp/vericite_online_results.json --allow-network
python3 scripts/format_audit_table.py --input /tmp/vericite_online_results.json --output /tmp/vericite_audit_report.md

Use diagnostics when online verification fails:

python3 scripts/verify_online.py \
  --input /tmp/vericite_ids.json \
  --output /tmp/vericite_online.json \
  --allow-network \
  --sources crossref,pubmed,arxiv,doi,datacite,openalex \
  --contact-email you@example.com \
  --diagnose-network \
  --timeout 15 \
  --retries 2

--diagnose-network checks endpoint reachability only; it does not verify bibliographic records. HTTP 404 means the source did not find that route or exact identifier and is not a network connectivity problem. HTTP 429 means API rate limiting. DNS, TLS, connection failures, or timeouts may indicate local network, proxy, VPN, firewall, or regional reachability issues. In mainland China or other constrained environments, users may need to configure HTTP_PROXY / HTTPS_PROXY or run from a reachable network; VeriCite reports evidence and does not bypass network restrictions or database permissions.

| Failure type | Meaning | User action |
| --- | --- | --- |
| not-found | source did not find exact identifier | check DOI/PMID or try another source |
| dns-error | local DNS/network failure | check network/proxy/VPN |
| rate-limited | API throttled request | retry later / increase interval |
| access-denied | restricted/blocked endpoint | do not bypass; provide readable source |
| metadata-mismatch | source returned different core fields | mark conflict/needs-review |
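
As a rough illustration of how transport errors could map onto the failure types above, consider the sketch below. It uses only the standard library and is not the logic in scripts/verify_online.py; the function name, the choice of 401/403 for access-denied, and the "other" fallback label are assumptions. metadata-mismatch is not covered because it comes from comparing returned fields, not from a transport error.

```python
import socket
import urllib.error


def classify_failure(exc):
    """Map an exception raised by urllib.request.urlopen to a failure type."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 404:
            return "not-found"
        if exc.code == 429:
            return "rate-limited"
        if exc.code in (401, 403):  # assumption: treat these as access control
            return "access-denied"
        return "other"  # hypothetical catch-all, not a label from the table
    if isinstance(exc, urllib.error.URLError) and isinstance(
        exc.reason, socket.gaierror
    ):
        # Name resolution failed before any HTTP exchange took place.
        return "dns-error"
    return "other"
```

Keeping not-found separate from dns-error matters for the audit: the former says something about the identifier, the latter only about the local network.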

Chinese literature: CNKI, Wanfang, VIP, and similar databases are not scraped by the helper script, and login, CAPTCHA, paywall, or institutional controls must not be bypassed. OpenAlex, DOI/DataCite/Crossref, journal sites, and public pages can support candidate discovery or partial confirmation. Restricted sources must be marked restricted-source. User-uploaded CNKI/Wanfang/VIP exports, screenshots, or PDFs may be cited as user-provided evidence. When public sources are insufficient, keep the status needs-review, partial, or unresolved.

ISBN / books: the current scripts only extract ISBN values and check ISBN checksums. They do not call Open Library or commercial/restricted catalogs by default. Book and chapter verification should be handled by the host Agent through publisher pages, national libraries, WorldCat, Library of Congress, Open Library, other accessible catalogs, or user-provided copyright pages/catalog exports.
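
The checksum step uses the standard ISBN-10 and ISBN-13 check-digit rules, sketched below. The function names are hypothetical and the code is not copied from the VeriCite scripts.

```python
def _compact(isbn):
    """Drop hyphens and spaces, and uppercase a trailing ISBN-10 'x'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn):
    chars = _compact(isbn)
    if len(chars) != 10 or not chars.isascii():
        return False
    if not chars[:9].isdigit() or not (chars[9].isdigit() or chars[9] == "X"):
        return False
    # Weights run 10 down to 1; a final 'X' stands for the value 10.
    total = sum(
        (10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(chars)
    )
    return total % 11 == 0


def is_valid_isbn13(isbn):
    chars = _compact(isbn)
    if len(chars) != 13 or not chars.isascii() or not chars.isdigit():
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(chars))
    return total % 10 == 0
```

A passing checksum only shows that the number is well formed. It does not confirm that the ISBN belongs to the cited book, so catalog or publisher evidence is still needed.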

Python Runtime Notes

  • Helper scripts require Python 3.9+ and use only the Python standard library. No third-party packages are required.
  • On macOS/Linux, prefer python3; if your environment maps python to Python 3, python is also fine.
  • On Windows, use python or py -3.
  • Check your local environment with python3 --version or python --version.
  • If Python is not installed, VeriCite's core Skill instructions, workflows, style rules, and audit templates remain usable; however, the optional helper automation in scripts/ cannot run locally, so splitting, identifier extraction, and audit-table generation must be done manually by the Agent or run in another Python-enabled environment.

TRAE / Codex / Agent Smoke Test

  1. Put the full vericite/ folder where the Agent can read Skills, tools, or project resources; do not copy only SKILL.md.
  2. Keep references/, scripts/, samples/, and SKILL.md together under the same vericite/ root.
  3. In TRAE, Codex, or a similar Agent environment, ask the Agent to read vericite/SKILL.md; load references/workflow.md and references/playbooks.md when the task needs more detail.
  4. Use samples/sample_refs.txt once to test local splitting, identifier extraction, and audit table generation.
  5. Then paste a real reference list and ask for formatted references, an audit summary, an audit table, and unresolved items.
  6. For verification tasks, do not stop at local scripts; after parsing, continue with available network tools such as Crossref, PubMed, arXiv, DOI resolver, journal pages, or public web search.
  7. Check whether the Agent preserves original reference text, distinguishes verified, partial, needs-review, unresolved, conflict, duplicate, restricted-source, network-unavailable, and not-checked, avoids fabricating DOI/volume/pages/authors/journals, does not pretend to access restricted databases, and returns an audit table.

Example User Requests

Verify these references, add evidence-backed DOI values where possible, convert them to APA 7, and include an audit report.
Extract the reference section from this manuscript, normalize it to Vancouver style, and mark unresolved entries.
Format these mixed Chinese and English references according to GB/T 7714 and explain which fields are not source-backed.
Extract DOI, PMID, PMCID, arXiv, ISBN, and URL identifiers from this bibliography.

Output Contract

For verification plus formatting tasks, VeriCite recommends this output structure:

  1. Reformatted references: the final list in the requested style
  2. Audit summary: counts and overall findings
  3. Audit table: ID, status, confidence, source, key changes, warnings
  4. Unresolved items: entries that need more evidence from the user

Recommended statuses:

| Status | Meaning |
| --- | --- |
| verified | Core metadata is confirmed by an authoritative or high-confidence source |
| corrected | A traceable source-backed or user-authorized correction was applied |
| partial | Some fields are confirmed, but at least one important field remains uncertain |
| needs-review | The match is plausible but not source-backed enough for automatic correction |
| unresolved | There is not enough evidence to correct or verify the entry |
| conflict | Accessible sources disagree on core metadata |
| duplicate | Repeated or near-identical entries likely describe the same work |
| restricted-source | The source requires login, CAPTCHA, payment, institutional access, or another access control |
| network-unavailable | Online verification was attempted but failed due to network, DNS, API, timeout, or rate-limit issues |
| not-checked | External verification was not attempted, usually because of offline/no-browse mode or missing script-level --allow-network |
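
The "Audit summary" part of the output contract amounts to counting entries per status. A small sketch follows; the row layout (dicts with "id" and "status" keys) and the function name are assumptions and do not reflect the schema of samples/sample_audit.json.

```python
from collections import Counter

# Status labels as listed in the table above.
STATUSES = (
    "verified", "corrected", "partial", "needs-review", "unresolved",
    "conflict", "duplicate", "restricted-source", "network-unavailable",
    "not-checked",
)


def summarize_statuses(rows):
    """Count audit rows per status, rejecting labels outside the list."""
    unknown = sorted({row["status"] for row in rows} - set(STATUSES))
    if unknown:
        raise ValueError(f"unknown status labels: {unknown}")
    counts = Counter(row["status"] for row in rows)
    # Report statuses in table order and omit those with zero entries.
    return {status: counts[status] for status in STATUSES if counts[status]}
```

Rejecting unknown labels keeps a misspelled status from silently disappearing from the summary.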

Privacy and Safety

  • Scripts do not use the network by default; verify_online.py only goes online with explicit --allow-network
  • Scripts do not install third-party dependencies
  • Scripts do not read sensitive system directories
  • Scripts do not modify original input files
  • Restricted databases must only be used with user-provided access or readable content
  • Restricted databases include CNKI, Wanfang, VIP, Web of Science, Scopus, Embase, IEEE Xplore, ACM Digital Library, institutional proxies, and subscription full-text platforms
  • Unverified metadata must remain marked as uncertain
  • Style conversion must not be described as factual verification
  • A failed search does not prove that a reference does not exist; it only shows that the reference could not be verified within the sources that were accessible

Who Should Use It?

VeriCite is useful for:

  • Researchers preparing references for submission
  • Authors checking DOI and identifier consistency
  • Editors or assistants who need traceable citation cleanup
  • Independent developers building reusable Agent Skills
  • Anyone who wants lightweight, transparent, low-permission reference workflows

Contributing and Security

See CONTRIBUTING.md and SECURITY.md in the repository root.

License

This project is released under the MIT License. Copyright holder: Wang Junjie (NeuroS).

Release Stage and Extension Direction

Current version: v0.9.4.0-Preview. It is suitable as an online-verification preview Skill for controlled Agent environments. Keeping the project in preview is recommended while collecting real-world feedback on audits, conflicts, network failures, restricted sources, and journal submission workflows.

Future extensions should remain lightweight:

  • More journal submission style checklists
  • More guidance for BibTeX, RIS, and CSL JSON conversion
  • More mixed Chinese-English reference examples
  • Optional use of host-Agent browsing, search, or database connectors when already authorized
