Merged
3 changes: 3 additions & 0 deletions docs/en/notes/guide/agent/operator_assemble_line.md
@@ -111,6 +111,9 @@ After the script is executed, the console will print:
- **[Execution]**: Execution status.

#### 5. Practical Case: General Text Reasoning and Pseudo-Answer Generation

You can follow the tutorial below, or run the program with the [Google Colab](https://colab.research.google.com/drive/1W3Wb1sTyea1xDAGmVu3Tyn7fcvrsppAp?usp=sharing) sample we provide.

We have a `tests/test.jsonl` file in which each line contains a `"raw_content"` field holding general English text. Our goal is to first invoke the large language model to generate reasoning-based answers for that text, then produce a pseudo-answer by generating candidate answers over multiple rounds and selecting the best one through statistics, and finally output key fields such as the list of candidate answers, the best pseudo-answer, the corresponding reasoning processes, and typical correct reasoning examples. We therefore select the `ReasoningAnswerGenerator` and `ReasoningPseudoAnswerGenerator` operators to orchestrate the Pipeline.
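
The "select the best candidate through statistics" step can be pictured as a majority vote over the candidate answers. The sketch below illustrates that idea under this assumption only: it is not the implementation of `ReasoningPseudoAnswerGenerator`, and the helper names (`load_raw_contents`, `select_pseudo_answer`) and output keys (`candidates`, `pseudo_answer`, `votes`) are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_raw_contents(path):
    """Read a JSONL file and return the "raw_content" value of every non-empty line."""
    texts = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            texts.append(json.loads(line)["raw_content"])
    return texts


def select_pseudo_answer(candidates):
    """Majority vote (an assumption): the most frequent candidate wins.

    Ties go to the candidate that appeared first, because Counter keeps
    insertion order and max() returns the first maximal key.
    """
    counts = Counter(candidates)
    best = max(counts, key=counts.get)
    return {
        "candidates": list(candidates),  # hypothetical output key
        "pseudo_answer": best,           # hypothetical output key
        "votes": counts[best],           # hypothetical output key
    }
```

In a real run the candidates would come from several model generations for the same `"raw_content"` entry; here any list of answer strings can be passed in to see how the vote behaves.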

The following is a complete configuration example:
2 changes: 2 additions & 0 deletions docs/en/notes/guide/agent/operator_qa.md
@@ -116,6 +116,8 @@ After the script is executed, the console behaves differently depending on the m

#### 4. Practical Case: Find Operators for "Data Cleaning"

You can follow the tutorial below, or run the program with the [Google Colab](https://colab.research.google.com/drive/1maDKWp-3zEQNScmL_S7MHUdUC1xyCIcK?usp=sharing) sample we provide.

Suppose you need to clean data while developing a Pipeline and want to know whether the DataFlow library already has ready-made operators for the job.

**Scenario Configuration**: We use one-time query mode and save the results locally so that the detailed parameters can be inspected in code later.
2 changes: 2 additions & 0 deletions docs/en/notes/guide/agent/operator_write.md
@@ -152,6 +152,8 @@ During script execution, the following key information will be output:

#### 4. Practical Case: Writing a Sentiment Analysis Operator

You can follow the tutorial below, or run the program with the [Google Colab](https://colab.research.google.com/drive/1oTkwMNwxMFGAe9rNtYCC47CQ9HxsA0uH?usp=sharing) sample we provide.

We have a log file `tests/test.jsonl` containing the field `"raw_content"`. We want to create an operator to perform sentiment analysis on the text content of this field.
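
To make the task concrete, here is a toy sketch of what "an operator that labels `"raw_content"` with a sentiment" could look like. It is an illustration only: the class `ToySentimentOperator`, its `run` method, the `sentiment` output key, and the tiny word lists are all hypothetical, and it does not use the DataFlow operator base class or reflect the code the agent actually generates.

```python
import json
import re

# Tiny illustrative lexicons (assumptions, not a real sentiment resource).
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "sad"}


class ToySentimentOperator:
    """Hypothetical stand-alone operator: reads one field, writes one label."""

    def __init__(self, input_key="raw_content", output_key="sentiment"):
        self.input_key = input_key
        self.output_key = output_key

    def label(self, text):
        # Count lexicon hits on lowercase word tokens.
        words = re.findall(r"[a-z']+", text.lower())
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"

    def run(self, jsonl_lines):
        # Parse each non-empty JSONL line and attach the sentiment label.
        rows = []
        for line in jsonl_lines:
            if not line.strip():
                continue
            row = json.loads(line)
            row[self.output_key] = self.label(row[self.input_key])
            rows.append(row)
        return rows
```

Passing the lines of a file shaped like `tests/test.jsonl` to `run` returns the parsed rows with an extra `sentiment` field; an LLM-backed operator would replace the `label` method.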

**Configuration Example:**
2 changes: 2 additions & 0 deletions docs/en/notes/guide/agent/pipeline_prompt.md
@@ -126,6 +126,8 @@ After the script is executed, the console will print the generation process. You

#### 4. Practical Case: Reuse the ReasoningQuestionFilter to Write a Filter Prompt for Financial Questions

You can follow the tutorial below, or run the program with the [Google Colab](https://colab.research.google.com/drive/1cU5Eg6tuc7WVDG33tU9Wplza52e54kts?usp=sharing) sample we provide.

Suppose we want to reuse the `ReasoningQuestionFilter` operator in the system and turn it into a filter for financial domain questions. Open the script and modify the configuration as follows:

```python
4 changes: 4 additions & 0 deletions docs/en/notes/guide/agent/pipeline_rec&refine.md
@@ -185,6 +185,8 @@ After the script is executed, the console will print the execution logs and the

##### 4. Practical Case: Pre-training Data Cleaning Pipeline

You can follow the tutorial below, or run the program with the [Google Colab](https://colab.research.google.com/drive/1MMJxRpfYi7Zd-jc_pyhvM1Y2WoQXOFcu?usp=sharing) sample we provide.

Suppose we have pre-training data `tests/test.jsonl` containing dirty data, and we want to clean it to obtain high-quality data. Open the script and modify the configuration as follows:

**Scenario Configuration:**
@@ -300,6 +302,8 @@ python script/run_dfa_pipeline_refine.py

##### 3. Practical Case: Simplify the Pipeline

You can follow the tutorial below, or run the program with the [Google Colab](https://colab.research.google.com/drive/1MMJxRpfYi7Zd-jc_pyhvM1Y2WoQXOFcu?usp=sharing) sample we provide.

Suppose the Pipeline generated in the previous step is too complex and contains redundant "cleaning" operators, and we want to remove them to simplify the Pipeline.

**Scenario Configuration:**
3 changes: 3 additions & 0 deletions docs/zh/notes/guide/agent/operator_assemble_line.md
@@ -111,6 +111,9 @@ python script/run_dfa_op_assemble.py
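- **[Execution]**: Execution status.

#### 5. Practical Case: General Text Reasoning and Pseudo-Answer Generation

You can follow the tutorial below, or run the [Google Colab](https://colab.research.google.com/drive/1W3Wb1sTyea1xDAGmVu3Tyn7fcvrsppAp?usp=sharing) sample we provide.

We have a `tests/test.jsonl` file in which every line has a `"raw_content"` field. Based on the general English text in that field, we want to first call the large language model to generate reasoning-based answers for the text, then generate a pseudo-answer by producing candidate answers over multiple rounds and selecting the best one through statistics, and finally output key fields such as the list of candidate answers, the best pseudo-answer, the corresponding reasoning processes, and typical correct reasoning examples. We therefore choose the two operators `ReasoningAnswerGenerator` and `ReasoningPseudoAnswerGenerator` to orchestrate the Pipeline.

The following is a complete configuration example: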
2 changes: 2 additions & 0 deletions docs/zh/notes/guide/agent/operator_qa.md
@@ -118,6 +118,8 @@ python script/run_dfa_operator_qa.py

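#### 4. Practical Case: Find Operators for "Data Cleaning"

You can follow the tutorial below, or run the [Google Colab](https://colab.research.google.com/drive/1maDKWp-3zEQNScmL_S7MHUdUC1xyCIcK?usp=sharing) sample we provide.

Suppose you run into data that needs cleaning while developing a Pipeline and want to know whether the DataFlow library has ready-made operators for it.

**Scenario Configuration:** We set it to one-time query mode and save the results locally so that the detailed parameters can be inspected in code later.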
2 changes: 2 additions & 0 deletions docs/zh/notes/guide/agent/operator_write.md
@@ -132,6 +132,8 @@ python script/run_dfa_operator_write.py

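#### 4. Practical Case: Writing a Sentiment Analysis Operator

You can follow the tutorial below, or run the [Google Colab](https://colab.research.google.com/drive/1oTkwMNwxMFGAe9rNtYCC47CQ9HxsA0uH?usp=sharing) sample we provide.

We have a log file `tests/test.jsonl` containing the field `"raw_content"`. We want to create an operator that performs sentiment analysis on the text content of this field.

**Configuration Example:**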
2 changes: 2 additions & 0 deletions docs/zh/notes/guide/agent/pipeline_prompt.md
@@ -126,6 +126,8 @@ python script/run_dfa_pipeline_prompt.py

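#### 4. Practical Case: Reuse the ReasoningQuestionFilter to Write a Filter Prompt for Financial Questions

You can follow the tutorial below, or run the [Google Colab](https://colab.research.google.com/drive/1cU5Eg6tuc7WVDG33tU9Wplza52e54kts?usp=sharing) sample we provide.

Suppose we want to reuse the `ReasoningQuestionFilter` operator in the system and turn it into a filter for financial-domain questions. Open the script and modify the configuration as follows: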

```python
4 changes: 4 additions & 0 deletions docs/zh/notes/guide/agent/pipeline_rec&refine.md
@@ -170,6 +170,8 @@ python script/run_dfa_pipeline_recommend.py

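##### 4. Practical Case: Pre-training Data Cleaning Pipeline

You can follow the tutorial below, or run the [Google Colab](https://colab.research.google.com/drive/1MMJxRpfYi7Zd-jc_pyhvM1Y2WoQXOFcu?usp=sharing) sample we provide.

Suppose we have pre-training data `tests/test.jsonl` that contains dirty data, and we want to clean it into a high-quality dataset. Open the script and modify the configuration as follows:

**Scenario Configuration:**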
@@ -285,6 +287,8 @@ python script/run_dfa_pipeline_refine.py

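##### 3. Practical Case: Simplify the Pipeline

You can follow the tutorial below, or run the [Google Colab](https://colab.research.google.com/drive/1MMJxRpfYi7Zd-jc_pyhvM1Y2WoQXOFcu?usp=sharing) sample we provide.

Suppose the pipeline generated in the previous step is too complex and contains redundant "cleaning" operators, and we want to remove them to simplify the Pipeline.

**Scenario Configuration:**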