A "Moon Landing Project" ReAct engine, built in the gaps between four coding tests on a Saturday.
It mainly includes:
- Single ReAct agent execution engine
- Observability
- A simple visualization and analysis tool (to be improved later); implemented features are listed in todo.md.
Future plans:
- Improve observability
- Enhance visualization analysis tools
- Refine the schemes for detecting and handling circular thinking
- Improve test sets
- Enhance Evaluator & optimizer for self-improvement
- Improve distributed multi-agent architecture
- Enhance the various context management mechanisms
Run: `python ReActAgent.py`
Input/output references:
- ReAct Example
- ReAct with FS, webSearch, weather, datetime, python sandbox
- ReAct with observer and logger
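The observer/logger example above can be connected to the engine with a simple callback hook; this is a hedged sketch with hypothetical names (the real ReActAgent.py observer interface is not shown here).

```python
# Sketch of an observability hook (hypothetical interface, not the real one):
# an observer receives structured events from each phase of a ReAct step.

class Logger:
    """Collects (event, payload) pairs; a real observer might write JSON lines."""
    def __init__(self):
        self.events = []

    def emit(self, event, payload):
        self.events.append((event, payload))

def observed_step(observer, tool, args):
    observer.emit("action", {"tool": tool.__name__, "args": args})
    result = tool(**args)                     # run the tool
    observer.emit("observation", {"result": result})
    return result

def weather(city):
    return "Sunny"

log = Logger()
observed_step(log, weather, {"city": "Beijing"})
print([e for e, _ in log.events])  # ['action', 'observation']
```

Keeping events structured (rather than free-text log lines) is what makes the later visualization and analysis tooling feasible.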
Circular thinking manifests in two ways:
- Circular thinking based on LLM output patterns:
- Advantages: Natural, no additional training required
- Disadvantages: More cycles, time-consuming
- Circular thinking based on GIT-PLAN tools:
- Advantages: Fewer cycles, shorter time consumption
- Disadvantages: Dependent on additional training, uncontrollable number of cycles
For these two types of circular thinking, detection and handling schemes are proposed respectively:
- Detection scheme for circular thinking based on LLM output patterns:
- Detection method: Analyze circular references, repeated statements, repeated tool parameters, etc., in LLM output.
- Handling schemes:
- Ensure accurate feedback on the tool side. For example, a complex Python sandbox error that the model cannot understand may trap it in persistent error-correction attempts.
- Ensure accurate prompts on the model side. For example, the system prompt should keep the model from spending large numbers of tokens on persistent error correction, and instead activate its reasoning ability.
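The pattern-based detection method above can be sketched with a simple heuristic. This is an illustrative assumption, not the project's actual detector: flag a loop when an identical (tool, arguments) pair recurs within a sliding window of recent calls.

```python
# Sketch of pattern-based loop detection (assumed heuristic, not the project's exact code):
# flag circular thinking when the same (tool, arguments) pair repeats within a window.
from collections import Counter

def detect_repetition(calls, window=6, threshold=3):
    """calls: list of hashable (tool_name, args) tuples, oldest first.
    Returns True if any identical call occurs `threshold`+ times in the last `window` calls."""
    counts = Counter(calls[-window:])
    return any(n >= threshold for n in counts.values())

# Same sandbox call issued three times in a row -> likely a stuck error-correction loop.
stuck = [("python", ("code", "print(x)"))] * 3
varied = [("search", ("q", "moon")), ("weather", ("city", "Beijing")), ("python", ("code", "1+1"))]
print(detect_repetition(stuck), detect_repetition(varied))
```

Repeated statements in free-text output can be caught the same way by hashing normalized sentences instead of tool calls.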
- Detection scheme for circular thinking based on GIT-PLAN tools:
- Detection method: Analyze git diff and topology in GIT-PLAN tools output.
- Handling scheme: Adopt structured WBS for task decomposition and status marking to avoid circular dependencies.
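Topology analysis for the WBS handling scheme can be illustrated with standard graph cycle detection; the task names and the dependency-map shape below are assumptions, since the actual GIT-PLAN output format is not shown here.

```python
# Hedged sketch: detect circular dependencies in a WBS task graph via DFS coloring.
# deps maps each task to the tasks it depends on (format assumed for illustration).

def has_cycle(deps):
    """Return True if the dependency graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on current path / done
    color = {t: WHITE for t in deps}

    def visit(t):
        color[t] = GRAY
        for d in deps.get(t, []):
            c = color.get(d, WHITE)
            if c == GRAY:                 # back edge: d is on the current path
                return True
            if c == WHITE and visit(d):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in list(deps))

acyclic = {"plan": [], "design": ["plan"], "build": ["design"]}
cyclic = {"a": ["b"], "b": ["a"]}
print(has_cycle(acyclic), has_cycle(cyclic))  # False True
```

A plan that fails this check should be re-decomposed before execution rather than executed and detected mid-loop.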
Three evaluation indicators:
- Task completion quality:
- Definition: How well a task is completed, covering degree of completion, output quality, and efficiency.
- Indicators: Task completion rate, task quality, task efficiency, etc.
- Thinking quality:
- Definition: Whether the model maintains logical coherence while generating text, avoiding logical errors or inconsistencies.
- Indicator: Analysis of plan update status
- Data quality:
- Definition: Evaluation of the quality of observable data and factual data, including data completeness, accuracy, consistency, etc.
- Indicators: Missing value ratio, outlier ratio, data duplication ratio, etc.
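The listed data-quality indicators are straightforward to compute; a minimal sketch on plain Python lists (the thresholds and field shape are illustrative assumptions):

```python
# Illustrative computation of the data-quality indicators named above:
# missing value ratio, outlier ratio, and data duplication ratio.

def quality_report(values, low, high):
    """values may contain None for missing entries; [low, high] is the valid range."""
    n = len(values)
    missing = sum(v is None for v in values)
    present = [v for v in values if v is not None]
    outliers = sum(not (low <= v <= high) for v in present)
    duplicates = n - len(set(values))          # extra copies beyond the first
    return {
        "missing_ratio": missing / n,
        "outlier_ratio": outliers / n,
        "duplicate_ratio": duplicates / n,
    }

print(quality_report([1, 2, 2, None, 99], low=0, high=10))
# {'missing_ratio': 0.2, 'outlier_ratio': 0.2, 'duplicate_ratio': 0.2}
```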
Several dimensions related to token efficiency:
- Multi/single agent architecture
- Model selection
- Prompt engineering
- Tool design
- Task plan decomposition and update methods
- Context compression
- Context window size
At this stage, appropriate dimensions are selected for optimization according to specific scenarios.
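As one concrete example among these dimensions, context compression can be as simple as keeping the system prompt plus the newest turns that fit a token budget. This is a minimal sketch with an assumed message format; the whitespace-split token count stands in for a real tokenizer.

```python
# Simple context compression sketch: retain the system prompt and as many of the
# most recent turns as fit the budget. Token counting is a naive whitespace split
# (an assumption for illustration, not a real tokenizer).

def compress_context(messages, budget):
    """messages: list of {'role': ..., 'content': ...} dicts, oldest first."""
    def tokens(m):
        return len(m["content"].split())

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(tokens(m) for m in system)
    kept = []
    for m in reversed(rest):                  # walk newest-first
        if used + tokens(m) > budget:
            break
        kept.append(m)
        used += tokens(m)
    return system + list(reversed(kept))      # restore chronological order

msgs = [
    {"role": "system", "content": "You are a ReAct agent"},
    {"role": "user", "content": "old question about the moon"},
    {"role": "assistant", "content": "old answer"},
    {"role": "user", "content": "new question"},
]
print([m["content"] for m in compress_context(msgs, budget=10)])
```

Summarizing dropped turns instead of discarding them is the natural next step, at the cost of an extra model call.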
Prompts are divided into general and vertical.
Vertical prompts:
- Definition: Customized prompts for specific task scenarios.
- Advantages:
- Customization: prompts tailored to a specific task scenario meet its requirements better.
- Efficiency: Customized prompts can reduce the number of model inferences and improve efficiency.
- Disadvantages:
- Cost: customized prompts require scenario-specific SOPs, which raises the cost of accumulating experience.
- Disorder: on general problems, vertical-domain prompts can disorder the model's output and make its context inconsistent, leaving it more prone to thinking errors such as:
- Infinite loop errors
- Function call errors
- Mitigation through prompt design: termination tokens
- Mitigation through prompt design: hand-off tokens
- Detection of no tool calls:
- Definition: detecting that the model makes no tool call in a loop iteration, indicating that the task cannot be completed.
- Handling scheme: Terminate the ReAct loop and return the current model output.
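Putting the no-tool-call rule together with the execution loop gives a sketch like the following. All names are hypothetical, not the actual ReActAgent.py API; the model is a scripted stub that makes one tool call and then answers without one, which triggers termination.

```python
# Minimal ReAct-style loop with no-tool-call termination (hypothetical names,
# not the real ReActAgent.py API). A turn with no tool call ends the loop and
# its text is returned as the final output.

def run_react(llm, tools, question, max_steps=5):
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm(history)                      # model proposes the next turn
        if not step.get("tool_calls"):           # no tool call: terminate the loop
            return step["content"]
        for name, args in step["tool_calls"]:
            observation = tools[name](**args)    # execute each requested tool
            history.append(f"Observation: {observation}")
    return None                                  # step budget exhausted

# Scripted stub model: call the weather tool once, then answer with no tool call.
def stub_llm(history):
    if not any(h.startswith("Observation") for h in history):
        return {"content": "", "tool_calls": [("weather", {"city": "Beijing"})]}
    return {"content": "It is sunny in Beijing.", "tool_calls": []}

tools = {"weather": lambda city: "Sunny"}
print(run_react(stub_llm, tools, "Weather in Beijing?"))  # It is sunny in Beijing.
```

The `max_steps` budget is the backstop for the circular-thinking cases above: even if loop detection misses, the engine still terminates.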