Handcrafted ReAct Engine with DeepResearch

Background

A "moon landing" ReAct engine, put together in the gaps between four coding tests on a Saturday.

Overview

Mainly includes:

  1. A single-agent ReAct execution engine
  2. Observability
  3. A simple visualization/analysis tool (to be improved later); implemented features are tracked in todo.md.

Future plans:

  1. Improve observability
  2. Enhance visualization analysis tools
  3. Refine the schemes for detecting and handling circular thinking
  4. Improve the test sets
  5. Enhance the evaluator & optimizer for self-improvement
  6. Improve the distributed multi-agent architecture
  7. Enhance the various context-management mechanisms

How to Run

python ReActAgent.py

Input/output references: (example run screenshots omitted)

Extended Thoughts

Detection and Handling Schemes for Circular Thinking

Circular thinking manifests in two forms:

  1. Circular thinking based on LLM output patterns

    • Advantages: natural; no additional training required
    • Disadvantages: more cycles, so more time-consuming
  2. Circular thinking based on GIT-PLAN tools

    • Advantages: fewer cycles, so less time consumed
    • Disadvantages: depends on additional training; the number of cycles is hard to control

For these two types of circular thinking, detection and handling schemes are proposed respectively:

  1. Detection scheme for circular thinking based on LLM output patterns:

    • Detection method: analyze circular references, repeated statements, repeated tool parameters, etc., in the LLM output (see the sketch after this list).
    • Handling schemes:
      • Ensure accurate prompt returns on the tool side. For example, complex Python-sandbox errors that the model cannot understand may trap it in persistent error-correction attempts.
      • Ensure accurate prompts on the model side. For example, the system prompt should keep the model from spending large amounts of tokens on persistent error correction and instead activate its reasoning ability.
  2. Detection scheme for circular thinking based on GIT-PLAN tools:

    • Detection method: analyze the git diff and topology in the GIT-PLAN tool output (see the cycle check in the sketch below).
    • Handling scheme: adopt a structured WBS (work breakdown structure) for task decomposition and status marking, to avoid circular dependencies.
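
A minimal sketch of both detection ideas, assuming the agent records its tool calls as (tool_name, tool_args) pairs and that the GIT-PLAN/WBS state can be exported as a task-dependency mapping; all names here are illustrative, not this engine's actual API:

```python
import hashlib
from collections import Counter, defaultdict

def call_signature(tool_name: str, tool_args: dict) -> str:
    """Fingerprint one tool call so exact repeats can be counted."""
    payload = f"{tool_name}:{sorted(tool_args.items())}"
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def detect_output_loop(history: list[tuple[str, dict]], max_repeats: int = 3) -> bool:
    """Flag circular thinking when an identical (tool, args) call recurs too often."""
    counts = Counter(call_signature(name, args) for name, args in history)
    return any(n >= max_repeats for n in counts.values())

def has_cycle(dependencies: dict[str, list[str]]) -> bool:
    """DFS back-edge check: True if the WBS task graph has a circular dependency."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every task starts WHITE (unvisited)

    def visit(task: str) -> bool:
        color[task] = GRAY
        for dep in dependencies.get(task, []):
            if color[dep] == GRAY:  # back edge to an in-progress task: cycle
                return True
            if color[dep] == WHITE and visit(dep):
                return True
        color[task] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in list(dependencies))
```

Exact-repeat counting is deliberately crude: softer signals such as near-duplicate statements or circular references would need n-gram or embedding similarity over the thought text. The DFS back-edge check is the standard way to reject circular dependencies when the WBS is updated.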

Discussion on Quantitative Indicators of Thinking Quality

Three indicators:

  1. Task completion quality:

    • Definition: how well the task is completed, covering degree of completion, output quality, efficiency, etc.
    • Indicators: task completion rate, task quality, task efficiency, etc.
  2. Thinking quality:

    • Definition: whether the model maintains logical coherence while generating text, avoiding logical errors and inconsistencies.
    • Indicator: analysis of plan-update status.
  3. Data quality:

    • Definition: evaluation of the quality of observable and factual data, including completeness, accuracy, consistency, etc.
    • Indicators: missing-value ratio, outlier ratio, data-duplication ratio, etc. (see the sketch after this list).
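
As a sketch of the data-quality indicators above, assuming the observable data can be loaded into a pandas DataFrame; the 3-sigma outlier rule and the column handling are assumptions, not something this project prescribes:

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame) -> dict:
    """Compute the three data-quality ratios listed above."""
    n_cells, n_rows = df.size, len(df)
    # Missing-value ratio: share of empty cells across the whole frame.
    missing_ratio = df.isna().sum().sum() / n_cells if n_cells else 0.0
    # Data-duplication ratio: share of fully duplicated rows.
    duplicate_ratio = df.duplicated().sum() / n_rows if n_rows else 0.0
    # Outlier ratio: numeric cells more than 3 standard deviations from their column mean.
    numeric = df.select_dtypes("number")
    z = (numeric - numeric.mean()) / numeric.std(ddof=0)
    outlier_ratio = (z.abs() > 3).sum().sum() / numeric.size if numeric.size else 0.0
    return {
        "missing_ratio": float(missing_ratio),
        "duplicate_ratio": float(duplicate_ratio),
        "outlier_ratio": float(outlier_ratio),
    }
```

For example, `data_quality_report(pd.read_csv("observations.csv"))` (a hypothetical file) returns the three ratios as a dict that an evaluator can threshold against.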

Strategies for Balancing Token Efficiency and Task Quality

Several dimensions related to token efficiency:

  • Multi/single agent architecture
  • Model selection
  • Prompt engineering
  • Tool design
  • Task plan decomposition and update methods
  • Context compression
  • Context window size

At this stage, the appropriate dimensions are selected for optimization according to the specific scenario; context compression is sketched below as one example.
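
A minimal context-compression sketch that drops the oldest turns to fit a token budget; the message shape and the rough 4-characters-per-token estimate are assumptions, not this engine's actual implementation:

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def compress_context(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system prompt and as many recent turns as fit within `budget` tokens."""
    system, turns = messages[0], messages[1:]
    kept: list[dict] = []
    used = estimate_tokens(system["content"])
    # Walk backwards from the most recent turn so the oldest turns drop first.
    for msg in reversed(turns):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))
```

A real implementation would use the model's tokenizer and might summarize the dropped turns instead of discarding them, trading extra inference cost for retained information.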

Analysis of Behavioral Pattern Differences Across Task Types

Prompts are divided into general and vertical (domain-specific) prompts.

Vertical prompts:

  • Definition: prompts customized for a specific task scenario.
  • Advantages:
    • Customization: prompts tailored to a specific task scenario can better meet the task's requirements.
    • Efficiency: customized prompts can reduce the number of model inferences and improve efficiency.
  • Disadvantages:
    • Cost: customized prompts require SOP customization for each task scenario, which raises the cost of experiential learning.
    • Disorder: on general problems, vertical-domain prompts can disorder the model's output and make its context inconsistent, making the model more prone to thinking errors such as:
      • Infinite-loop errors
      • Function-call errors

End of ReAct Execution

  1. Prompt design: termination tokens
  2. Prompt design: hand-off tokens
  3. Detection of no tool calls:
    • Definition: detecting that the model issued no tool call in a loop iteration, indicating that the loop cannot make further progress.
    • Handling scheme: terminate the ReAct loop and return the current model output (see the sketch after this list).
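
A minimal sketch of how the three termination conditions could sit in the ReAct loop; `llm_step`, `run_tool`, and the token strings are placeholders for illustration, not this engine's actual interface:

```python
FINAL_TOKEN = "<final_answer>"  # assumed termination token from the prompt design
HANDOFF_TOKEN = "<hand_off>"    # assumed hand-off token from the prompt design

def react_loop(llm_step, run_tool, task: str, max_steps: int = 20) -> str:
    """Run ReAct until a termination token, a hand-off, or a step with no tool call."""
    context = [task]
    thought = ""
    for _ in range(max_steps):
        thought, tool_call = llm_step(context)   # (text, tool call or None)
        if FINAL_TOKEN in thought:               # 1. explicit termination token
            return thought.split(FINAL_TOKEN, 1)[1].strip()
        if HANDOFF_TOKEN in thought:             # 2. hand off to another agent
            return thought
        if tool_call is None:                    # 3. no tool call: loop cannot proceed
            return thought
        tool_name, tool_args = tool_call
        observation = run_tool(tool_name, tool_args)
        context.extend([thought, str(observation)])
    return thought  # step budget exhausted; return the latest output
```

Note that `max_steps` acts as a hard backstop against the uncontrolled cycle counts discussed earlier.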

About

Handcrafted ReAct agent engine | Supports file-system / web-page / weather-query tools plus a code sandbox, with observability and visualization tooling, focused on circular-thinking detection and task-quality optimization.
