7 changes: 6 additions & 1 deletion docs/.vuepress/notes/en/guide.ts
@@ -116,7 +116,12 @@ export const Guide: ThemeNote = defineNoteConfig({
prefix: 'agent',
items: [
"agent_for_data",
"DataFlow-AgentPipelineOrchestration"
"DataFlow-AgentPipelineOrchestration",
"operator_assemble_line",
"operator_qa",
"operator_write",
"pipeline_prompt",
"pipeline_rec&refine"
]
},
],
7 changes: 6 additions & 1 deletion docs/.vuepress/notes/zh/guide.ts
@@ -115,7 +115,12 @@ export const Guide: ThemeNote = defineNoteConfig({
prefix: 'agent',
items: [
"agent_for_data",
"DataFlow-AgentPipelineOrchestration"
"DataFlow-AgentPipelineOrchestration",
"operator_assemble_line",
"operator_qa",
"operator_write",
"pipeline_prompt",
"pipeline_rec&refine"
]
},
// {
108 changes: 108 additions & 0 deletions docs/en/notes/guide/agent/operator_assemble_line.md
@@ -0,0 +1,108 @@
---
title: Visualized Operator Assemble Line
createTime: 2026/02/05 22:11:00
permalink: /en/guide/agent/operator_assemble_line/
---

## 1. Overview

**Visualized Operator Assemble Line** is a "low-code/no-code" development tool provided by the DataFlow-Agent platform. It allows users to bypass complex Python coding or AI planning processes by directly browsing available operators in the system via a Graphical User Interface (GUI), manually configuring parameters, and assembling them into ordered data processing pipelines.

The core value of this feature lies in:

* **What You See Is What You Get**: Real-time viewing of operator parameter definitions and pipeline structures.
* **Automatic Linking**: The system automatically attempts to match the output of a previous operator with the input of the next, simplifying data flow configuration.
* **Code Generation and Execution**: Assembled logic is automatically converted into standard Python code and executed in the background.

## 2. Features

This functional module primarily consists of frontend interaction logic (`op_assemble_line.py`) and a backend execution workflow (`wf_df_op_usage.py`).

### 2.1 Dynamic Operator Loading and Introspection

The system automatically scans the `OPERATOR_REGISTRY` upon startup, loading all registered operators and categorizing them based on their module paths.

* **Automatic Parameter Parsing**: Using Python's `inspect` module, the system automatically extracts the method signatures of each operator class's `__init__` and `run` methods to generate the corresponding configuration boxes in the UI.
* **Prompt Template Support**: For operators that support Prompts, the UI automatically reads `ALLOWED_PROMPTS` and provides a dropdown selection box.
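As a rough illustration of this introspection step, the sketch below extracts parameter names and defaults from `__init__` and `run` via `inspect.signature`. The `DemoOperator` class and `extract_params` helper are hypothetical stand-ins, not DataFlow code:

```python
import inspect

class DemoOperator:
    """Stand-in for an operator loaded from OPERATOR_REGISTRY."""
    def __init__(self, prompt_template: str = "default", max_times: int = 3):
        self.prompt_template = prompt_template
        self.max_times = max_times

    def run(self, input_key: str = "raw_content", output_key: str = "result"):
        pass

def extract_params(cls, method_name):
    """Return {parameter: default} for a method, skipping 'self'."""
    sig = inspect.signature(getattr(cls, method_name))
    return {
        name: (p.default if p.default is not inspect.Parameter.empty else None)
        for name, p in sig.parameters.items()
        if name != "self"
    }

init_params = extract_params(DemoOperator, "__init__")
run_params = extract_params(DemoOperator, "run")
```

The resulting dictionaries are what a UI would render as editable configuration boxes.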

### 2.2 Intelligent Parameter Linking

During the UI orchestration process, the system features "automatic wiring" capabilities. It analyzes the input-output relationships between adjacent operators, automatically matches keys with similar names, and displays the data flow through visualized connections.
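A minimal sketch of such name-based wiring is shown below; the similarity heuristic and threshold are illustrative, not the platform's actual matching logic:

```python
from difflib import SequenceMatcher

def auto_wire(prev_outputs, next_inputs, threshold=0.6):
    """For each input key of the next operator, pick the most similar
    output key of the previous operator (a purely name-based heuristic)."""
    wiring = {}
    for in_key in next_inputs:
        best_key, best_score = None, threshold
        for out_key in prev_outputs:
            score = SequenceMatcher(None, in_key, out_key).ratio()
            if score > best_score:
                best_key, best_score = out_key, score
        if best_key:
            wiring[in_key] = best_key
    return wiring

# Exact name match wins; unrelated keys stay unwired.
links = auto_wire(["generated_cot", "raw_content"], ["generated_cot"])
```

In the UI, pairs like those in `links` would be drawn as connections between adjacent operators.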

## 3. User Guide

This feature provides two modes of use: the **Graphical Interface (Gradio UI)** and **Command-line Scripts**.

### 3.1 UI Operation

Ideal for interactive exploration and rapid verification.

1. **Environment Configuration**: Enter API-related information and the input JSONL file path at the top of the page.
2. **Orchestrate Pipeline**:
1. **Select Operator**: Choose an operator category and a specific operator from the left dropdown menu.
2. **Configure Parameters**: Enter parameters into the JSON edit box.
3. **Add Operator**: Click the "Add Operator to Pipeline" button and drag items in the list below to adjust the execution order.
3. **Run and Results**: Click "Run Pipeline" to view the generated code and a preview of the processed results in the execution result section.

### 3.2 Script Invocation and Explicit Configuration

For automated tasks or batch processing, the `run_dfa_op_assemble.py` script can be used. This method bypasses the UI and defines the operator sequence directly through code.

> **Note: Explicit Configuration Requirement**: Unlike the "Automatic Linking" in the UI, the script mode requires you to **explicitly configure** all parameters. You must ensure that the `output_key` of the previous operator strictly matches the `input_key` of the next; the script will not automatically correct parameter names for you.
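Because the script will not correct mismatched keys, a small pre-flight check can catch broken chains before anything runs. The sketch below assumes the `PIPELINE_STEPS` shape used in this section and an initial `raw_content` field in the input file; `validate_key_chain` is a hypothetical helper, not part of DataFlow:

```python
def validate_key_chain(steps, initial_fields=("raw_content",)):
    """Check that each step's input_key was produced by an earlier
    step's output_key* parameter (or exists in the initial data)."""
    produced = set(initial_fields)
    errors = []
    for step in steps:
        params = step["params"]
        in_key = params.get("input_key")
        if in_key and in_key not in produced:
            errors.append(f"{step['op_name']}: '{in_key}' is never produced")
        for name, value in params.items():
            if name.startswith("output_key"):
                produced.add(value)
    return errors

steps = [
    {"op_name": "ReasoningAnswerGenerator",
     "params": {"input_key": "raw_content", "output_key": "generated_cot"}},
    {"op_name": "ReasoningPseudoAnswerGenerator",
     "params": {"input_key": "generated_cot", "output_key_answer": "pseudo_answers"}},
]
errors = validate_key_chain(steps)  # empty list: the chain is consistent
```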

#### 1. Modify Configuration

Open `run_dfa_op_assemble.py` and modify the configuration area at the top of the file.

**Key Configuration Item**: **`PIPELINE_STEPS`**—a list defining the pipeline execution steps. Each element contains an `op_name` and `params`.

```python
# [Pipeline Definition]
PIPELINE_STEPS = [
    {
        "op_name": "ReasoningAnswerGenerator",
        "params": {
            # __init__ parameters (Note: unified into 'params' in wf_df_op_usage)
            "prompt_template": "dataflow.prompts.reasoning.math.MathAnswerGeneratorPrompt",
            # run parameters
            "input_key": "raw_content",
            "output_key": "generated_cot"
        }
    },
    {
        "op_name": "ReasoningPseudoAnswerGenerator",
        "params": {
            "max_times": 3,
            "input_key": "generated_cot",
            "output_key_answer": "pseudo_answers",
            "output_key_answer_value": "pseudo_answer_value",
            "output_key_solutions": "pseudo_solutions",
            "output_key_correct_solution_example": "pseudo_correct_solution_example"
        }
    }
]

```

**Other Required Configurations**:

* `CACHE_DIR`: **Must use an absolute path** to avoid path errors when the generated Python script executes in a subprocess.
* `INPUT_FILE`: The absolute path to the initial data file.

#### 2. Run Script

```bash
python run_dfa_op_assemble.py

```

#### 3. Output Results

After execution, the console will print:

* **[Generation]**: The path of the generated Python script (e.g., `pipeline_script_pipeline_001.py`).
* **[Code Preview]**: A preview of the first 20 lines of the generated code.
* **[Execution]**:
* `Status: success` indicates successful execution.
* `STDOUT`: Prints the standard output logs from the pipeline runtime.

112 changes: 112 additions & 0 deletions docs/en/notes/guide/agent/operator_qa.md
@@ -0,0 +1,112 @@
---
title: Operator QA
createTime: 2026/02/05 22:11:00
permalink: /en/guide/agent/operator_qa/
---

## 1. Overview

**Operator QA** is a built-in vertical domain expert assistant within the DataFlow-Agent platform. Its core mission is to help users quickly navigate the extensive DataFlow operator library to find required tools, understand their usage, and inspect underlying source code.

Unlike generic chatbots, Operator QA integrates **RAG (Retrieval-Augmented Generation)** technology. It is equipped with a complete operator index (FAISS) and a metadata knowledge base of the DataFlow project. When a user asks a question, the Agent autonomously decides whether to retrieve information from the knowledge base, which operators to inspect, and provides accurate technical details—including code snippets and parameter descriptions—back to the user.

## 2. Core Features

This module is driven by a frontend UI (`operator_qa.py`), an entry script (`run_dfa_operator_qa.py`), and a backend agent (`operator_qa_agent.py`). It possesses the following core capabilities:

### 2.1 Intelligent Retrieval and Recommendation

The Agent does more than simple keyword matching; it identifies user needs based on semantic understanding.

* **Semantic Search**: If a user describes a need like "I want to filter out missing values," the Agent uses vector retrieval to find relevant operators such as `ContentNullFilter`.
* **On-Demand Invocation**: Running in the `BaseAgent` graph mode (`use_agent=True`), the Agent decides from the conversation context whether to call the `search_operators` tool or respond directly.

### 2.2 Multi-turn Conversation

Utilizing the `AdvancedMessageHistory` module, the system maintains a complete session context.

* **Contextual Memory**: A user can ask, "Which operators can load data?" followed by "How do I fill in **its** parameters?" The Agent can recognize that "its" refers to the operator recommended in the previous turn.
* **State Persistence**: In both script interaction and UI modes, by reusing the same `state` and `graph` instances, the `messages` list accumulates across multiple turns, ensuring the LLM maintains a full memory.
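This accumulation behaviour can be pictured with a minimal stand-in for the history module; the class below is an illustrative sketch, not the `AdvancedMessageHistory` API:

```python
class MessageHistory:
    """Toy stand-in: one list accumulates across turns, so each new
    LLM call sees the full conversation so far."""
    def __init__(self):
        self.messages = []

    def add_turn(self, user_text, assistant_text):
        self.messages.append({"role": "user", "content": user_text})
        self.messages.append({"role": "assistant", "content": assistant_text})

    def clear(self):
        self.messages.clear()

history = MessageHistory()
history.add_turn("Which operators can load data?", "Storage-based loaders can.")
history.add_turn("How do I fill in its parameters?", "Pass the file path when constructing it.")
```

Because the second turn is appended to the same list, the model can resolve "its" against the operator mentioned in the first turn.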

### 2.3 Visualization and Interaction

* **Gradio UI**: Provides code previews, operator highlighting, and quick-question buttons.
* **Interaction**: Supports multi-turn Q&A, clearing history, and viewing history.

## 3. Architectural Components

### 3.1 OperatorQAAgent

* Inherits from `BaseAgent` and is configured in ReAct/Graph mode.
* Possesses Post-Tools permissions to call RAG services for data retrieval.
* Responsible for parsing natural language, planning database queries, and generating final natural language responses.

### 3.2 OperatorRAGService

* A service layer decoupled from the Agent.
* Manages the FAISS vector index and `ops.json` metadata.
* Provides underlying capabilities such as `search` (vector search), `get_operator_info` (fetch details), and `get_operator_source` (fetch source code).
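To make the retrieval idea concrete, here is a toy search over operator descriptions. It substitutes a bag-of-words cosine similarity for the real embedding model and FAISS index, so every name and score below is illustrative:

```python
import math

def embed(text):
    """Toy embedding: word-count vector (a real service would call an
    embedding model and store vectors in a FAISS index)."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

OPERATOR_DOCS = {
    "ContentNullFilter": "filter rows with missing or null values",
    "ReasoningAnswerGenerator": "generate chain of thought answers",
}

def search(query, top_k=1):
    q = embed(query)
    ranked = sorted(OPERATOR_DOCS,
                    key=lambda name: cosine(q, embed(OPERATOR_DOCS[name])),
                    reverse=True)
    return ranked[:top_k]

hits = search("filter out missing values")
```

Even this crude similarity ranks `ContentNullFilter` first for a "missing values" query, which is the behaviour the vector index provides at scale.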

## 4. User Guide

### 4.1 UI Operation

1. **Configure Model**: In the "Configuration" panel on the right, verify the API URL and Key, and select a model (defaults to `gpt-4o`).
2. **Initiate Inquiry**:
1. **Dialogue Box**: Type your question.
2. **Quick Buttons**: Click "Quick Question" buttons, such as "Which operator filters missing values?" to start instantly.
3. **View Results**:
1. **Chat Area**: Displays the Agent's response and citations.
2. **Right Panel**:
* `Related Operators`: Lists operator names retrieved by the Agent.
* `Code Snippets`: Displays Python source code if specific implementations are involved.

### 4.2 Script Invocation and Explicit Configuration

Beyond the UI, the system provides the `run_dfa_operator_qa.py` script, which supports running the Q&A service through explicit code configuration—ideal for development and debugging.

**Configuration Method:** Directly modify the constant configuration area at the top of the script without passing command-line arguments:

```python
# ===== Example config (edit here) =====
INTERACTIVE = False # True for multi-turn mode, False for single query
QUERY = "Which operator should I use to filter missing values?" # Question for single query

LANGUAGE = "en"
SESSION_ID = "demo_operator_qa"
CACHE_DIR = "dataflow_cache"
TOP_K = 5 # Number of retrieval results

CHAT_API_URL = os.getenv("DF_API_URL", "http://123.129.219.111:3000/v1/")
API_KEY = os.getenv("DF_API_KEY", "")
MODEL = os.getenv("DF_MODEL", "gpt-4o")

OUTPUT_JSON = "" # e.g., "cache_local/operator_qa_result.json"; empty string means no file saving

```

**Execution Modes:**

1. **Single Query Mode** (`INTERACTIVE = False`): Executes a single `QUERY`; results can be printed or saved as a JSON file.
2. **Interactive Mode** (`INTERACTIVE = True`): Starts a terminal dialogue loop supporting `exit` to quit, `clear` to reset context, and `history` to view session history.
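The command handling in interactive mode can be sketched as a small dispatcher; the function name and return values here are illustrative, not the script's actual API:

```python
def handle_command(text, history):
    """Dispatch a line of terminal input, mirroring the documented
    commands: exit, clear, history. Anything else is a query."""
    cmd = text.strip().lower()
    if cmd == "exit":
        return "quit"
    if cmd == "clear":
        history.clear()
        return "cleared"
    if cmd == "history":
        return "\n".join(history) or "(empty)"
    history.append(text)
    return "query"

log = []
r1 = handle_command("Which operator loads data?", log)  # a normal query
r2 = handle_command("history", log)                     # dumps the log
r3 = handle_command("clear", log)                       # resets context
r4 = handle_command("exit", log)                        # ends the loop
```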

**Core Logic:** The script demonstrates how to explicitly construct `DFRequest` and `MainState`, and manually build the execution graph:

```python
# 1. Explicitly construct the request
req = DFRequest(
language=LANGUAGE,
chat_api_url=CHAT_API_URL,
api_key=API_KEY,
model=MODEL,
target="" # Populated before each query
)

# 2. Initialize state and graph
state = MainState(request=req, messages=[])
graph = create_operator_qa_graph().build()

# 3. Execute
result = await run_single_query(state, graph, QUERY)

```
120 changes: 120 additions & 0 deletions docs/en/notes/guide/agent/operator_write.md
@@ -0,0 +1,120 @@
---
title: Operator Write
createTime: 2026/02/05 22:11:00
permalink: /en/guide/agent/operator_write/
---

## 1. Overview

**Operator Write** is the core productivity module of the DataFlow-Agent. It is not merely a tool for generating Python code based on user requirements but rather builds a closed-loop system for **generation, execution, and debugging**.

This workflow enables:

1. **Semantic Matching**: Understanding user intent (e.g., "filter missing values") and finding the best-matching base class within the existing operator library.
2. **Code Generation**: Writing executable operator code based on the base class and user data samples.
3. **Automatic Injection**: Automatically injecting LLM service capabilities into the operator if needed.
4. **Subprocess Execution**: Instantiating and running the generated operator in a controlled environment.
5. **Self-Healing**: Launching a Debugger to analyze stack traces if execution fails, automatically modifying the code, and retrying until success or the maximum retry limit is reached.
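The generate-execute-debug loop can be sketched as follows. `run_with_debug` and the toy fixer are illustrative stand-ins for the real `code_debugger`/`rewriter` agents, which analyze the stack trace with an LLM rather than doing a string replacement:

```python
def run_with_debug(code_str, max_rounds=3, fixer=None):
    """Execute generated code; on failure, hand the error trace to a
    fixer and retry, up to max_rounds attempts."""
    for attempt in range(1, max_rounds + 1):
        namespace = {}
        try:
            exec(code_str, namespace)
            return {"success": True, "attempts": attempt, "ns": namespace}
        except Exception as exc:
            error_trace = repr(exc)
            if fixer is None:
                break
            code_str = fixer(code_str, error_trace)
    return {"success": False, "attempts": attempt}

def toy_fixer(code, trace):
    """Toy 'rewriter': patch a known typo instead of calling an LLM."""
    return code.replace("reslt", "result")

broken = "result = 21 * 2\nanswer = reslt"  # NameError on line 2
outcome = run_with_debug(broken, fixer=toy_fixer)
```

The first attempt raises `NameError`, the fixer rewrites the code, and the second attempt succeeds, which is the shape of the self-healing cycle described above.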

## 2. Core Features

### 2.1 Intelligent Code Generation

* **Sample-Based Programming**: The Agent reads actual data samples (calling the pre-tool `local_tool_for_sample`) and the data Schema to ensure the generated code correctly handles real field names and data types.
* **Operator Reuse**: The system first searches the existing operator library (calling the pre-tool `match_operator`) and generates code that inherits from existing base classes rather than starting from scratch, keeping the output standardized and maintainable.

### 2.2 Automatic Debugging Loop

This is a system equipped with self-reflection capabilities.

* **Execution Monitoring**: At the `llm_instantiate` node, the system attempts to execute the generated code (`exec(code_str)`) and captures standard output and standard errors.
* **Error Diagnosis**: If an exception occurs, the `code_debugger` Agent analyzes the error stack (`error_trace`) and the current code to generate repair suggestions (`debug_reason`).
* **Auto-Rewrite**: The `rewriter` Agent regenerates the code based on the repair suggestions, automatically updates the file, and enters the next round of testing.

### 2.3 LLM Service Injection

For complex operators requiring Large Model calls (e.g., "generate summary based on content"), the `llm_append_serving` node automatically injects standard LLM call interfaces (`self.llm_serving`) into the operator code, empowering it with AI capabilities.
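A simplified picture of the injection step: append a hook that stores an LLM backend on the operator instance. The snippet and helper below are illustrative stand-ins for the `llm_append_serving` node, not its actual implementation:

```python
LLM_SERVING_SNIPPET = '''
    def set_llm_serving(self, serving):
        """Injected hook: gives the operator access to an LLM backend."""
        self.llm_serving = serving
'''

def append_serving(operator_code):
    """Append the llm_serving hook to a generated operator class,
    skipping classes that already reference llm_serving."""
    if "llm_serving" in operator_code:
        return operator_code
    return operator_code.rstrip() + "\n" + LLM_SERVING_SNIPPET

generated = (
    "class SummaryOperator:\n"
    "    def run(self, text):\n"
    "        return text[:10]"
)
patched = append_serving(generated)
```

After patching, the class gains `set_llm_serving`, so the surrounding workflow can hand it a configured LLM client before `run` is called.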

## 3. Workflow Architecture

This feature is orchestrated by `wf_pipeline_write.py`, forming a directed graph containing conditional loops.

1. **Match Node**: Retrieves reference operators.
2. **Write Node**: Writes the initial code.
3. **Append Serving Node**: Injects LLM capabilities.
4. **Instantiate Node**: Attempts to run the code.
5. **Debugger Node** (Conditional Trigger): Analyzes errors.
6. **Rewriter Node**: Fixes the code.

## 4. User Guide

This feature provides two modes of usage: **Graphical Interface (Gradio UI)** and **Command Line Script**.

### 4.1 UI Operation

The frontend page code is located in `operator_write.py`, offering a visualized interactive experience.

#### 1. Configure Inputs

Configure the following in the left panel of the page:

* **Target Description**: Describe in detail the function and purpose of the operator you want to create.
* Example: "Create an operator that performs sentiment analysis on text."
* **Operator Category**: The category the operator belongs to, used for matching similar operators as references. Defaults to `"Default"`; options include `"filter"`, `"mapper"`, `"aggregator"`, etc.
* **Test Data File**: Specify the `.jsonl` file path used for testing the generated operator. Defaults to the project's built-in `tests/test.jsonl`.
* **Debug Settings**:
* `Enable Debug Mode`: If checked, the system automatically attempts to fix the code if an error occurs.
* `Max Debug Rounds`: Set the maximum number of automatic repair attempts (default is 3).
* **Output Path**: Specify the save path for the generated code (optional).

#### 2. View Results

After clicking the **"Generate Operator"** button, the right panel displays detailed results:

* **Generated Code**: Final usable Python code, supporting syntax highlighting.
* **Matched Operators**: Displays the list of reference operators found by the system in the library (e.g., `"LangkitSampleEvaluator"`, `"LexicalDiversitySampleEvaluator"`, `"PresidioSampleEvaluator"`, `"PerspectiveSampleEvaluator"`, etc.).
* **Execution Result**: Shows `success: true/false` and specific log information `stdout`/`stderr`.
* **Debug Info**: If debugging was triggered, this displays the runtime captured `stdout`/`stderr` and the selected input field key (`input_key`).
* **Agent Results**: Detailed execution results for each Agent node.
* **Execution Log**: Complete execution log information, facilitating the troubleshooting of the Agent's thought process.

### 4.2 Script Invocation and Explicit Configuration

For developers or automated tasks, `run_dfa_operator_write.py` can be executed directly.

#### 1. Modify Configuration

Open `run_dfa_operator_write.py` and modify the parameters in the configuration area at the top of the file:

```python
CHAT_API_URL = os.getenv("DF_API_URL", "http://123.129.219.111:3000/v1/")
MODEL = os.getenv("DF_MODEL", "gpt-4o")
LANGUAGE = "en"

TARGET = "Create an operator that filters out missing values and keeps rows with non-empty fields."
CATEGORY = "Default" # Fallback category (if classifier misses)
OUTPUT_PATH = "" # e.g., "cache_local/my_operator.py"; empty string means no file saving
JSON_FILE = "" # Empty string uses project built-in tests/test.jsonl

NEED_DEBUG = False
MAX_DEBUG_ROUNDS = 3

```

#### 2. Run Script

```bash
python run_dfa_operator_write.py

```

#### 3. Output Results

The script will print key information to the console:

* `Matched ops`: The matched reference operators.
* `Code preview`: A preview fragment of the generated code.
* `Execution Result`:
* `Success: True` indicates code generation and execution passed.
* `Success: False` will print `stderr preview` for troubleshooting.
* `Debug Runtime Preview`: Displays the automatically selected `input_key` and runtime logs.