From 6d6affad7c2b40045de29af57bd56883445c5864 Mon Sep 17 00:00:00 2001
From: Piar <2741277534@qq.com>
Date: Sat, 14 Feb 2026 00:35:09 +0800
Subject: [PATCH] docs: add Google Colab tutorial links
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/en/notes/guide/agent/operator_assemble_line.md | 3 +++
 docs/en/notes/guide/agent/operator_qa.md            | 2 ++
 docs/en/notes/guide/agent/operator_write.md         | 2 ++
 docs/en/notes/guide/agent/pipeline_prompt.md        | 2 ++
 docs/en/notes/guide/agent/pipeline_rec&refine.md    | 4 ++++
 docs/zh/notes/guide/agent/operator_assemble_line.md | 3 +++
 docs/zh/notes/guide/agent/operator_qa.md            | 2 ++
 docs/zh/notes/guide/agent/operator_write.md         | 2 ++
 docs/zh/notes/guide/agent/pipeline_prompt.md        | 2 ++
 docs/zh/notes/guide/agent/pipeline_rec&refine.md    | 4 ++++
 10 files changed, 26 insertions(+)

diff --git a/docs/en/notes/guide/agent/operator_assemble_line.md b/docs/en/notes/guide/agent/operator_assemble_line.md
index af9bf2f45..27b84da06 100644
--- a/docs/en/notes/guide/agent/operator_assemble_line.md
+++ b/docs/en/notes/guide/agent/operator_assemble_line.md
@@ -111,6 +111,9 @@ After the script is executed, the console will print:
 - **[Execution]**: Execution status.
 
 #### 5. Practical Case: General Text Reasoning and Pseudo-Answer Generation
+
+You can follow the walkthrough below, or run the example in the [Google Colab](https://colab.research.google.com/drive/1W3Wb1sTyea1xDAGmVu3Tyn7fcvrsppAp?usp=sharing) notebook we provide:
+
 We have a `tests/test.jsonl` file, where each line contains a `"raw_content"` field.
 Our goal is: based on the general English text content of this field, first invoke the large language model to generate reasoning-based answers for the text content, then generate pseudo-answers by generating candidate answers in multiple rounds and selecting the optimal one through statistics, and finally output key fields such as the list of candidate answers, optimal pseudo-answer, corresponding reasoning processes, and typical correct reasoning examples. Therefore, we select the `ReasoningAnswerGenerator` and `ReasoningPseudoAnswerGenerator` operators to orchestrate the Pipeline.
 The following is a complete configuration example:
diff --git a/docs/en/notes/guide/agent/operator_qa.md b/docs/en/notes/guide/agent/operator_qa.md
index 7dd6fff9e..b16db2f7e 100644
--- a/docs/en/notes/guide/agent/operator_qa.md
+++ b/docs/en/notes/guide/agent/operator_qa.md
@@ -116,6 +116,8 @@ After the script is executed, the console behaves differently depending on the m
 
 #### 4. Practical Case: Find Operators for "Data Cleaning"
 
+You can follow the walkthrough below, or run the example in the [Google Colab](https://colab.research.google.com/drive/1maDKWp-3zEQNScmL_S7MHUdUC1xyCIcK?usp=sharing) notebook we provide:
+
 Suppose you need to clean data when developing a Pipeline and want to know if there are ready-made operators in the DataFlow library for processing.
 
 **Scenario Configuration**: We set it to one-time query mode and specify to save the results locally for viewing detailed parameters in the code later.
diff --git a/docs/en/notes/guide/agent/operator_write.md b/docs/en/notes/guide/agent/operator_write.md
index d772db39b..f304fea3a 100644
--- a/docs/en/notes/guide/agent/operator_write.md
+++ b/docs/en/notes/guide/agent/operator_write.md
@@ -152,6 +152,8 @@ During script execution, the following key information will be output:
 
 #### 4. Practical Case: Writing a Sentiment Analysis Operator
 
+You can follow the walkthrough below, or run the example in the [Google Colab](https://colab.research.google.com/drive/1oTkwMNwxMFGAe9rNtYCC47CQ9HxsA0uH?usp=sharing) notebook we provide:
+
 We have a log file `tests/test.jsonl` containing the field `"raw_content"`. We want to create an operator to perform sentiment analysis on the text content of this field.
 
 **Configuration Example:**
diff --git a/docs/en/notes/guide/agent/pipeline_prompt.md b/docs/en/notes/guide/agent/pipeline_prompt.md
index fa033c4b0..6c65d83af 100644
--- a/docs/en/notes/guide/agent/pipeline_prompt.md
+++ b/docs/en/notes/guide/agent/pipeline_prompt.md
@@ -126,6 +126,8 @@ After the script is executed, the console will print the generation process. You
 
 #### 4. Practical Case: Reuse the ReasoningQuestionFilter to Write a Filter Prompt for Financial Questions
 
+You can follow the walkthrough below, or run the example in the [Google Colab](https://colab.research.google.com/drive/1cU5Eg6tuc7WVDG33tU9Wplza52e54kts?usp=sharing) notebook we provide:
+
 Suppose we want to reuse the `ReasoningQuestionFilter` operator in the system and turn it into a filter for financial domain questions. Open the script and modify the configuration as follows:
 
 ```python
diff --git a/docs/en/notes/guide/agent/pipeline_rec&refine.md b/docs/en/notes/guide/agent/pipeline_rec&refine.md
index 2292c7428..1f3f8d26d 100644
--- a/docs/en/notes/guide/agent/pipeline_rec&refine.md
+++ b/docs/en/notes/guide/agent/pipeline_rec&refine.md
@@ -185,6 +185,8 @@ After the script is executed, the console will print the execution logs and the
 
 ##### 4. Practical Case: Pre-training Data Cleaning Pipeline
 
+You can follow the walkthrough below, or run the example in the [Google Colab](https://colab.research.google.com/drive/1MMJxRpfYi7Zd-jc_pyhvM1Y2WoQXOFcu?usp=sharing) notebook we provide:
+
 Suppose we have pre-training data `tests/test.jsonl` containing dirty data, and we want to clean it to obtain high-quality data. Open the script and modify the configuration as follows:
 
 **Scenario Configuration:**
@@ -300,6 +302,8 @@ python script/run_dfa_pipeline_refine.py
 
 ##### 3. Practical Case: Simplify the Pipeline
 
+You can follow the walkthrough below, or run the example in the [Google Colab](https://colab.research.google.com/drive/1MMJxRpfYi7Zd-jc_pyhvM1Y2WoQXOFcu?usp=sharing) notebook we provide:
+
 Suppose the Pipeline generated in the previous step is too complex and contains redundant "cleaning" operators, and we want to remove them to simplify the Pipeline.
 
 **Scenario Configuration:**
diff --git a/docs/zh/notes/guide/agent/operator_assemble_line.md b/docs/zh/notes/guide/agent/operator_assemble_line.md
index 018b4dbd3..c63847dcc 100644
--- a/docs/zh/notes/guide/agent/operator_assemble_line.md
+++ b/docs/zh/notes/guide/agent/operator_assemble_line.md
@@ -111,6 +111,9 @@ python script/run_dfa_op_assemble.py
 - **[Execution]**: Execution status.
 
 #### 5. Practical Case: General Text Reasoning and Pseudo-Answer Generation
+
+You can follow the walkthrough below, or run the example in the [Google Colab](https://colab.research.google.com/drive/1W3Wb1sTyea1xDAGmVu3Tyn7fcvrsppAp?usp=sharing) notebook we provide:
+
 We have a `tests/test.jsonl` file in which every line has a `"raw_content"` field. Our goal: based on the general English text in this field, first call a large language model to generate reasoning-style answers, then produce a pseudo-answer by generating candidate answers over multiple rounds and selecting the best one statistically, and finally output key fields such as the candidate-answer list, the best pseudo-answer, the corresponding reasoning process, and typical correct reasoning examples. We therefore choose the two operators `ReasoningAnswerGenerator` and `ReasoningPseudoAnswerGenerator` to orchestrate the Pipeline.
 
 The following is a complete configuration example:
diff --git a/docs/zh/notes/guide/agent/operator_qa.md b/docs/zh/notes/guide/agent/operator_qa.md
index 572677afe..06a310d25 100644
--- a/docs/zh/notes/guide/agent/operator_qa.md
+++ b/docs/zh/notes/guide/agent/operator_qa.md
@@ -118,6 +118,8 @@ python script/run_dfa_operator_qa.py
 
 #### 4. Practical Case: Finding Operators for "Data Cleaning"
 
+You can follow the walkthrough below, or run the example in the [Google Colab](https://colab.research.google.com/drive/1maDKWp-3zEQNScmL_S7MHUdUC1xyCIcK?usp=sharing) notebook we provide:
+
 Suppose that while developing a Pipeline you find data that needs cleaning, and you want to know whether the DataFlow library already has an operator that can handle it.
 
 **Scenario Configuration:** We set it to single-query mode and save the results locally so that the detailed parameters can be inspected in code later.
diff --git a/docs/zh/notes/guide/agent/operator_write.md b/docs/zh/notes/guide/agent/operator_write.md
index 62caab2be..270a48b43 100644
--- a/docs/zh/notes/guide/agent/operator_write.md
+++ b/docs/zh/notes/guide/agent/operator_write.md
@@ -132,6 +132,8 @@ python script/run_dfa_operator_write.py
 
 #### 4. Practical Case: Writing a Sentiment Analysis Operator
 
+You can follow the walkthrough below, or run the example in the [Google Colab](https://colab.research.google.com/drive/1oTkwMNwxMFGAe9rNtYCC47CQ9HxsA0uH?usp=sharing) notebook we provide:
+
 We have a log file `tests/test.jsonl` that contains the field `"raw_content"`. We want to create an operator that performs sentiment analysis on the text in this field.
 
 **Configuration Example:**
diff --git a/docs/zh/notes/guide/agent/pipeline_prompt.md b/docs/zh/notes/guide/agent/pipeline_prompt.md
index e3ff1bff1..ae6e062f1 100644
--- a/docs/zh/notes/guide/agent/pipeline_prompt.md
+++ b/docs/zh/notes/guide/agent/pipeline_prompt.md
@@ -126,6 +126,8 @@ python script/run_dfa_pipeline_prompt.py
 
 #### 4. Practical Case: Reusing the ReasoningQuestionFilter to Write a Filter Prompt for Financial Questions
 
+You can follow the walkthrough below, or run the example in the [Google Colab](https://colab.research.google.com/drive/1cU5Eg6tuc7WVDG33tU9Wplza52e54kts?usp=sharing) notebook we provide:
+
 Suppose we want to reuse the `ReasoningQuestionFilter` operator in the system and turn it into a filter for financial-domain questions. Open the script and modify the configuration as follows:
 
 ```python
diff --git a/docs/zh/notes/guide/agent/pipeline_rec&refine.md b/docs/zh/notes/guide/agent/pipeline_rec&refine.md
index f70f0d773..b3dcc5108 100644
--- a/docs/zh/notes/guide/agent/pipeline_rec&refine.md
+++ b/docs/zh/notes/guide/agent/pipeline_rec&refine.md
@@ -170,6 +170,8 @@ python script/run_dfa_pipeline_recommend.py
 
 ##### 4. Practical Case: Pre-training Data Cleaning Pipeline
 
+You can follow the walkthrough below, or run the example in the [Google Colab](https://colab.research.google.com/drive/1MMJxRpfYi7Zd-jc_pyhvM1Y2WoQXOFcu?usp=sharing) notebook we provide:
+
 Suppose we have pre-training data `tests/test.jsonl` that contains dirty records, and we want to clean it into a high-quality dataset. Open the script and modify the configuration as follows:
 
 **Scenario Configuration:**
@@ -285,6 +287,8 @@ python script/run_dfa_pipeline_refine.py
 
 ##### 3. Practical Case: Simplifying the Pipeline
 
+You can follow the walkthrough below, or run the example in the [Google Colab](https://colab.research.google.com/drive/1MMJxRpfYi7Zd-jc_pyhvM1Y2WoQXOFcu?usp=sharing) notebook we provide:
+
 Suppose the pipeline generated in the previous step is too complex and contains redundant "cleaning" operators, which we want to remove to simplify the Pipeline.
 
 **Scenario Configuration:**
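
The pseudo-answer step in the first practical case generates candidate answers over several rounds and then selects the best one "through statistics". That step can be read as a majority vote. The following is a minimal sketch under two assumptions: answers are plain strings, and "statistics" means simple frequency counting. `select_pseudo_answer` is a hypothetical helper for illustration only. It is not the implementation of DataFlow's `ReasoningPseudoAnswerGenerator`.

```python
from collections import Counter
from typing import Dict, List


def select_pseudo_answer(candidates: List[str]) -> Dict[str, object]:
    """Pick the most frequent candidate answer (majority vote).

    Hypothetical helper: illustrates "generate candidates in multiple rounds,
    then select the optimal one through statistics". It is NOT the DataFlow
    ReasoningPseudoAnswerGenerator implementation.
    """
    if not candidates:
        raise ValueError("need at least one candidate answer")
    # Light normalisation so that "42" and " 42" count as the same answer.
    normalised = [c.strip() for c in candidates]
    counts = Counter(normalised)
    # most_common(1) returns the highest count; ties go to the answer seen first.
    answer, votes = counts.most_common(1)[0]
    return {
        "candidates": normalised,          # list of candidate answers
        "pseudo_answer": answer,           # the selected pseudo-answer
        "votes": votes,                    # how many rounds agreed on it
        "agreement": votes / len(normalised),
    }


if __name__ == "__main__":
    # Five hypothetical sampling rounds for a single "raw_content" record.
    rounds = ["42", "41", "42", " 42", "40"]
    result = select_pseudo_answer(rounds)
    print(result["pseudo_answer"], result["votes"], result["agreement"])
```

Majority voting is only one possible statistic. A real operator might normalise answers more aggressively (for example by parsing numbers) or weight votes by model confidence. The `agreement` ratio is included as one way to flag records whose pseudo-answer is weakly supported.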