# [P] How we automated data quality checks for LLM training data — 100+ metrics across rules, LLM, VLM, and agents

We've been working on evaluating data quality for LLM training pipelines — SFT datasets, RAG outputs, OCR-parsed documents, etc. The problem we kept running into: simple heuristics catch formatting issues but miss semantic problems, and LLM-as-a-Judge alone isn't great at things that need external verification (e.g. factual accuracy).

So we built a layered approach:

1. **Rule layer** — fast, deterministic checks for the obvious stuff (encoding issues, repetition, special characters, formatting). ~50 rules, runs in milliseconds.
2. **LLM layer** — any OpenAI-compatible model as evaluator, customizable prompts per use case.
3. **VLM layer** — for OCR/document parsing: render the parsed output back to an image, then visually compare against the original. Catches layout issues that text-level diffing misses.
4. **Agent layer** — the newest addition. Instead of a single prompt, an autonomous agent uses tools (ArXiv search, claims extraction) to fact-check content step by step. Useful when evaluation requires external knowledge.

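To make the rule layer concrete, here is a minimal sketch of what two deterministic checks could look like. The function names, thresholds, and registry shape are illustrative, not Dingo's actual API:

```python
from collections import Counter

def rule_encoding(text: str) -> bool:
    # Fail if the text contains the Unicode replacement character,
    # a common symptom of a botched encoding round-trip.
    return "\ufffd" not in text

def rule_repetition(text: str, n: int = 5, max_ratio: float = 0.2) -> bool:
    # Fail if one word n-gram accounts for more than max_ratio of all
    # n-grams: a cheap, millisecond-scale detector for degenerate repetition.
    words = text.split()
    if len(words) < n + 10:  # too short to judge repetition reliably
        return True
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    _, top_count = Counter(ngrams).most_common(1)[0]
    return top_count / len(ngrams) <= max_ratio

def run_rules(text: str) -> dict:
    # Run every rule; a production version would also collect failure reasons.
    rules = {"encoding": rule_encoding, "repetition": rule_repetition}
    return {name: check(text) for name, check in rules.items()}
```

Because these checks are pure string operations, they can screen millions of rows before any model-based layer runs.
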
For RAG specifically, we have metrics for faithfulness, context precision/recall, and answer relevancy — similar to RAGAS but integrated into the same framework.

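As a rough illustration of how the retrieval-side metrics are computed: both frameworks score recall as "what fraction of the reference claims are supported by the retrieved contexts" and precision as "what fraction of the retrieved contexts are actually relevant". The sketch below substitutes naive substring matching for the LLM judge, purely to stay self-contained:

```python
def context_recall(contexts, reference_claims,
                   supports=lambda ctx, claim: claim.lower() in ctx.lower()):
    # Fraction of reference claims supported by at least one retrieved context.
    # `supports` would be an LLM call in a real framework.
    if not reference_claims:
        return 1.0
    hits = sum(any(supports(ctx, claim) for ctx in contexts)
               for claim in reference_claims)
    return hits / len(reference_claims)

def context_precision(contexts, question,
                      relevant=lambda ctx, q: any(w in ctx.lower()
                                                  for w in q.lower().split())):
    # Fraction of retrieved contexts judged relevant to the question.
    if not contexts:
        return 0.0
    return sum(relevant(ctx, question) for ctx in contexts) / len(contexts)
```

Swapping the `supports`/`relevant` callables for model-backed judges is what turns this from a lexical heuristic into the LLM-graded metric.
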
The Agent layer was the most interesting part to build. The problem: when you need to verify factual claims in an article, a single LLM prompt can't reliably do it — the model just guesses based on its training data. So we built an agent (LangChain ReAct) that autonomously: (1) extracts verifiable claims from text, (2) categorizes them (institutional, temporal, statistical, attribution, ...), and (3) verifies each claim using the right tool — ArXiv search for academic claims, web search for news/product claims. It generates a structured report with evidence for each claim. We've tested it on academic articles, news, product reviews, and tech blogs — the adaptive tool selection per claim type was key to getting reliable results.

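The categorize-then-route step can be sketched like this. These are stubs standing in for the real ArXiv and web-search tool calls, and all names here are made up for illustration — the actual agent is a LangChain ReAct loop, not a static dispatch table:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    category: str  # e.g. "academic", "news", "statistical"

# Stub tools: the real pipeline wraps ArXiv search and web search here.
def verify_with_arxiv(claim):
    return {"claim": claim, "tool": "arxiv_search", "verdict": "needs_review"}

def verify_with_web(claim):
    return {"claim": claim, "tool": "web_search", "verdict": "needs_review"}

# Adaptive tool selection: each claim category routes to a different verifier.
TOOL_BY_CATEGORY = {
    "academic": verify_with_arxiv,
    "news": verify_with_web,
}

def fact_check(claims):
    # One structured report entry per claim, tagged with the tool that
    # produced it, so the final report carries per-claim evidence.
    return [TOOL_BY_CATEGORY.get(c.category, verify_with_web)(c.text)
            for c in claims]
```

The point of the ReAct framing is that the agent picks the tool (and can retry or chain tools) instead of following a fixed mapping like the one above.
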
The whole thing is plugin-based — adding a new evaluator is just a decorated class. Data model uses Pydantic `extra="allow"` so it adapts to whatever schema your dataset has. Multi-field evaluation lets you run different checks on different columns of the same dataset (e.g. check prompt quality and answer quality separately).

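The plugin pattern is roughly the following. This is a sketch of the idea, not Dingo's real registration API; plain dicts stand in for the Pydantic `extra="allow"` record type to keep the example dependency-free:

```python
EVALUATORS = {}

def register(name):
    # Class decorator: adding a new evaluator is just a decorated class
    # that the framework can later look up by name.
    def wrap(cls):
        EVALUATORS[name] = cls
        return cls
    return wrap

@register("sft_completeness")
class SftCompletenessEvaluator:
    # Declared fields let the framework validate input up front and let
    # multi-field evaluation target specific columns of a record.
    required_fields = ("prompt", "response")

    def evaluate(self, record):
        # Records are open-schema (extra="allow" in the Pydantic version),
        # so an evaluator only touches the fields it declares.
        return all(record.get(f) for f in self.required_fields)
```

Declaring `required_fields` on the class is also what lets a UI display, per evaluator, which columns it expects.
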
We also recently put up a hosted version so teams can run evaluations without setting up Python environments locally — useful for non-engineering stakeholders who want to inspect data quality: https://dingo.openxlab.org.cn/

The project is open source (Apache-2.0): https://github.com/MigoXLab/dingo

`pip install dingo-python`

Curious how others are handling data quality at scale — whether it's RAG evaluation (retrieval + generation), or using agents to automate fact-checking and content verification. Has anyone else tried agent-based approaches for quality assurance? What worked and what didn't?