Commit 4b7164f

docs: add zhihu post (MigoXLab#362)
1 parent dc17286 commit 4b7164f

3 files changed

Lines changed: 129 additions & 31 deletions


docs/posts/v2.1.0_hacker_news.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -25,4 +25,4 @@ Plugin architecture with decorator-based registration. Pydantic data model with
 SaaS: https://dingo.openxlab.org.cn/
 Install: `pip install dingo-python`
 
-GitHub: https://github.com/MigoXLab/dingo (650+ stars, Apache-2.0)
+GitHub: https://github.com/MigoXLab/dingo
```

docs/posts/v2.1.0_reddit.md

Lines changed: 14 additions & 30 deletions
````diff
@@ -1,40 +1,24 @@
-# Dingo v2.1.0 — Open-source AI data quality evaluation now available as SaaS
+# [P] How we automated data quality checks for LLM training data — 100+ metrics across rules, LLM, VLM, and agents
 
-We just released Dingo v2.1.0. Alongside the open-source SDK, Dingo is now available as a hosted SaaS platform at **https://dingo.openxlab.org.cn/** — no local setup required, evaluate your data quality directly in the browser.
+We've been working on evaluating data quality for LLM training pipelines — SFT datasets, RAG outputs, OCR-parsed documents, etc. The problem we kept running into: simple heuristics catch formatting issues but miss semantic problems, and LLM-as-a-Judge alone isn't great at things that need external verification (e.g. factual accuracy).
 
-## Dingo SaaS Platform
+So we built a layered approach:
 
-- Upload datasets (JSONL, CSV, HuggingFace) and run evaluations through a web UI
-- Configure evaluation metrics, manage experiments, and view detailed reports online
-- API key support for programmatic access — integrate Dingo evaluation into your CI/CD or data pipelines
-- Free to use during the public preview
+1. **Rule layer** — fast, deterministic checks for the obvious stuff (encoding issues, repetition, special characters, formatting). ~50 rules, runs in milliseconds.
+2. **LLM layer** — any OpenAI-compatible model as evaluator, customizable prompts per use case.
+3. **VLM layer** — for OCR/document parsing: render the parsed output back to an image, then visually compare against the original. Catches layout issues that text-level diffing misses.
+4. **Agent layer** — the newest addition. Instead of a single prompt, an autonomous agent uses tools (ArXiv search, claims extraction) to fact-check content step by step. Useful when evaluation requires external knowledge.
 
-👉 Try it now: https://dingo.openxlab.org.cn/
+For RAG specifically, we have metrics for faithfulness, context precision/recall, and answer relevancy — similar to RAGAS but integrated into the same framework.
 
-## Key updates in v2.1.0
+The Agent layer was the most interesting part to build. The problem: when you need to verify factual claims in an article, a single LLM prompt can't reliably do it — the model just guesses based on its training data. So we built an agent (LangChain ReAct) that autonomously (1) extracts verifiable claims from text, (2) categorizes them (institutional, temporal, statistical, attribution...), and (3) verifies each claim with the right tool — ArXiv search for academic claims, web search for news/product claims. It generates a structured report with evidence for each claim. We've tested it on academic articles, news, product reviews, and tech blogs — adaptive tool selection per claim type was key to getting reliable results.
 
-**Agent-as-a-Judge**
-Autonomous evaluation agents with tool use — the article fact-checking agent leverages ArXiv search and claims extraction to verify factual accuracy.
+The whole thing is plugin-based — adding a new evaluator is just a decorated class. The data model uses Pydantic `extra="allow"`, so it adapts to whatever schema your dataset has. Multi-field evaluation lets you run different checks on different columns of the same dataset (e.g. check prompt quality and answer quality separately).
 
-**VLMRenderJudge**
-Visual comparison metric for OCR quality: a VLM directly compares rendered document output against the original image to detect parsing errors.
+We also recently put up a hosted version so teams can run evaluations without setting up Python environments locally — useful for non-engineering stakeholders who want to inspect data quality: https://dingo.openxlab.org.cn/
 
-**RAG evaluation**
-Built-in metrics for end-to-end RAG quality: Faithfulness, Context Precision, Context Recall, Answer Relevancy, Context Relevancy.
+The project is open source (Apache-2.0): https://github.com/MigoXLab/dingo
 
-**Framework improvements**
-- Excel output with summary sheets
-- Evaluators declare `_required_fields` for input validation
-- Gradio UI auto-displays required fields per evaluator
+`pip install dingo-python`
 
-## Quick start
-
-```bash
-pip install dingo-python
-```
-
-- SaaS: https://dingo.openxlab.org.cn/
-- GitHub: https://github.com/MigoXLab/dingo (650+ stars, Apache-2.0)
-- PyPI: https://pypi.org/project/dingo-python/
-
-Feedback and PRs welcome.
+Curious how others are handling data quality at scale — whether it's RAG evaluation (retrieval + generation) or using agents to automate fact-checking and content verification. Has anyone else tried agent-based approaches for quality assurance? What worked and what didn't?
````

docs/posts/v2.1.0_zhihu.md

Lines changed: 114 additions & 0 deletions (new file)
# Dingo v2.1.0 Released: Agent-as-a-Judge, End-to-End RAG Evaluation, and a New SaaS Platform

## Background

In LLM applications, data quality directly determines model quality. Pre-training corpora, SFT instruction data, RAG retrieval results, and OCR document parsing all need systematic quality evaluation.

Dingo is an open-source AI data quality evaluation framework offering 70+ built-in metrics that cover text, RAG, OCR, and multimodal scenarios. v2.1.0 is a major release: it introduces autonomous Agent-based evaluation and launches the SaaS platform alongside it.
## SaaS Platform Launch

The Dingo SaaS platform is now open: **https://dingo.openxlab.org.cn/**

No local Python environment is required; the entire data quality evaluation workflow runs in the browser:

- Upload datasets (JSONL, CSV, HuggingFace)
- Configure combinations of evaluation metrics
- Inspect evaluation reports and individual data items
- Connect programmatically via API key and integrate into an existing data-processing pipeline

Free to use during the public preview.
## Core Technical Updates

### Four-Layer Evaluation Architecture

Dingo evaluates in four layers, Rule → LLM → VLM → Agent; v2.1.0 completes the Agent layer:

```
Rule (50+ rules)   Deterministic checks: fast, zero cost
LLM-as-a-Judge     Any OpenAI-compatible model as the evaluator
VLM-as-a-Judge     Vision models judge document/image quality
Agent-as-a-Judge   Autonomous agent with tool use   ← new in v2.1.0
```
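To make the layering concrete, here is a minimal sketch of a cost-ordered pipeline in which cheap deterministic rules run before any model-based judge. All names here are hypothetical illustrations, not Dingo's actual control flow:

```python
# Hypothetical sketch of a cost-ordered evaluation pipeline: cheap
# deterministic rules run first; expensive LLM/VLM/Agent judges would
# only see rows that pass the earlier, cheaper layers.
def rule_layer(row: dict) -> bool:
    # Deterministic check, e.g. reject empty rows or rows containing
    # a character repeated six or more times in a row.
    text = row.get("content", "")
    return bool(text) and not any(ch * 6 in text for ch in set(text))

def make_pipeline(layers):
    def evaluate(row: dict) -> str:
        for name, check in layers:
            if not check(row):
                return f"rejected by {name}"
        return "passed"
    return evaluate

evaluate = make_pipeline([
    ("rule", rule_layer),
    # ("llm", llm_judge), ("vlm", vlm_judge), ("agent", agent_judge)
    # would follow in the same pattern, each more expensive than the last.
])

print(evaluate({"content": "aaaaaaa"}))       # rejected by rule
print(evaluate({"content": "clean sample"}))  # passed
```

Because the rule layer is deterministic and runs in milliseconds, obviously bad rows never incur a model call.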
### Agent-as-a-Judge

v2.1.0 adds an article fact-checking agent (`AgentArticleFactChecker`), built on LangChain and equipped with these tools:

- **ArXiv Search**: searches academic papers to verify factual claims
- **Claims Extractor**: extracts verifiable factual claims from text

The agent plans its own verification path: extract claims → retrieve evidence → verify each claim → emit the evaluation result. Unlike fixed-prompt LLM evaluation, the agent can handle complex cases that require external knowledge to verify.
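The extract → verify loop can be illustrated with a toy dispatcher. The names below are hypothetical, and the real agent plans these steps dynamically through LangChain rather than running a fixed loop:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    category: str  # e.g. "academic", "temporal", "statistical", "attribution"

def select_tool(claim: Claim) -> str:
    # Adaptive tool selection per claim type: academic claims are checked
    # against ArXiv, everything else against web search.
    return "arxiv_search" if claim.category == "academic" else "web_search"

def fact_check(claims: list[Claim]) -> list[dict]:
    report = []
    for claim in claims:
        tool = select_tool(claim)
        # A real agent would call the tool here and attach the evidence
        # it finds; this sketch only records the planned verification.
        report.append({"claim": claim.text, "tool": tool})
    return report

claims = [
    Claim("Transformers were introduced in 2017.", "academic"),
    Claim("The product launched last month.", "temporal"),
]
print(fact_check(claims))
```

Routing each claim to a tool suited to its category is what lets the agent verify content that a single fixed prompt cannot.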
### VLMRenderJudge

Traditional OCR parsing-quality evaluation relies on text-level comparison, but many layout problems only show up visually. VLMRenderJudge therefore re-renders the parsed output as an image and compares it against the original document image, letting a VLM judge whether the parse has drifted.
### End-to-End RAG Evaluation

v2.1.0 ships a complete set of RAG evaluation metrics:

| Metric | What it evaluates |
|---|---|
| Faithfulness | Whether the answer stays faithful to the retrieved context |
| Context Precision | Ranking quality of relevant content in the retrieval results |
| Context Recall | Whether retrieval covers the key information the answer needs |
| Answer Relevancy | Relevance of the answer to the user's question |
| Context Relevancy | Relevance of the retrieved content to the question |
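For intuition, Faithfulness is commonly computed (e.g. in RAGAS) as the fraction of answer claims that are supported by the retrieved context. A toy version, not Dingo's implementation:

```python
def faithfulness(claim_verdicts: list[bool]) -> float:
    # Each verdict says whether one claim from the answer is supported
    # by the retrieved context; the score is the supported fraction,
    # so 1.0 means the answer is fully faithful.
    if not claim_verdicts:
        return 0.0
    return sum(claim_verdicts) / len(claim_verdicts)

# 3 of 4 claims supported by the context:
print(faithfulness([True, True, True, False]))  # 0.75
```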
### Framework Improvements

**Evaluator field declarations**: every evaluator now declares its required input fields via `_required_fields`, and the framework validates them before evaluation starts, avoiding runtime errors from missing fields.
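A minimal sketch of how such a declaration can drive pre-flight validation (an assumed base class for illustration, not Dingo's real API):

```python
# Hypothetical sketch: a framework validates each data row against the
# _required_fields list an evaluator declares, before evaluation runs.
class BaseEvaluator:
    _required_fields: list = []

    @classmethod
    def missing_fields(cls, row: dict) -> list:
        # Names of declared fields absent from this data row.
        return [f for f in cls._required_fields if f not in row]

class AnswerQualityEvaluator(BaseEvaluator):
    _required_fields = ["prompt", "content"]

print(AnswerQualityEvaluator.missing_fields({"prompt": "Q: ..."}))  # ['content']
```

Checking once, up front, lets the framework fail fast with a clear message instead of crashing mid-run on a missing key.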
**Multi-field evaluation**: different fields of the same dataset can be assigned different evaluators. In a QA dataset, for example, the `prompt` field can be checked with instruction-quality rules and the `content` field with answer-quality rules:

```python
"evaluator": [
    {"fields": {"content": "content"}, "evals": [{"name": "RuleColonEnd"}]},
    {"fields": {"content": "prompt"}, "evals": [{"name": "RuleColonEnd"}]}
]
```
**Plugin architecture**: evaluators register through a decorator, so adding a new rule only takes one class:

```python
@Model.rule_register('QUALITY_BAD', ['default'])
class RuleMyCheck(BaseRule):
    @classmethod
    def eval(cls, input_data: Data) -> EvalDetail:
        # evaluation logic
        ...
```
## Quick Start

```bash
pip install dingo-python
```

```python
from dingo.config import InputArgs
from dingo.exec import Executor

if __name__ == '__main__':
    input_data = {
        "input_path": "your_data.jsonl",
        "dataset": {"source": "local", "format": "jsonl"},
        "evaluator": [
            {"evals": [{"name": "RuleColonEnd"}, {"name": "RuleSpecialCharacter"}]}
        ]
    }
    input_args = InputArgs(**input_data)
    executor = Executor.exec_map["local"](input_args)
    result = executor.execute()
```
## Links

- SaaS platform: https://dingo.openxlab.org.cn/
- GitHub: https://github.com/MigoXLab/dingo
- PyPI: https://pypi.org/project/dingo-python/
