
feat: add interactive wait tool for human-in-the-loop tasks #109

Open
chenning9821 wants to merge 2 commits into google-gemini:main from chenning9821:feat/interactive-wait

@chenning9821

Summary
This PR introduces a wait_for_user_input tool to enable human-in-the-loop interactions. This is specifically designed to handle scenarios like QR code logins (e.g., Xiaohongshu/LittleRedBook) or CAPTCHA challenges where the agent gets stuck in a loop without human intervention.

Key Changes

  • New Tool: Added wait_for_user_input in agent.py which pauses the agent execution and waits for manual user confirmation in the terminal.
  • Bug Fix: Fixed a critical issue in run_one_iteration where safety_acknowledgement was missing for custom tool responses, which previously caused 400 INVALID_ARGUMENT errors from the Gemini API.
  • Logic Update: Updated handle_action to route the new interactive tool correctly.
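
A pausing tool of the kind described above could look roughly like the following. This is a minimal sketch, not the PR's code: the `read_line` and `write` parameters and the shape of the returned dictionary are assumptions added here so the function can be exercised without a real terminal.

```python
from typing import Callable, Dict


def wait_for_user_input(
    prompt_text: str,
    read_line: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> Dict[str, str]:
    """Pause until the user confirms in the terminal, then report back.

    read_line and write are hypothetical injection points (not in the PR)
    that default to the built-in input and print.
    """
    write("=" * 40)
    write(f"AGENT REQUEST: {prompt_text}")
    write("=" * 40)
    # Blocks here until the user presses Enter.
    reply = read_line("Press Enter when done (or type a note): ")
    # Assumed return shape; the PR's actual payload is not shown on this page.
    return {"status": "user_confirmed", "user_note": reply.strip()}
```

Because the I/O functions are injectable, a test can pass a lambda in place of `input`. Note that this sketch prints `prompt_text` unchanged, which is the behavior the review on this page flags.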

Testing
Verified locally with a Xiaohongshu login scenario. The agent successfully paused for the QR scan, waited for my terminal input, and resumed execution seamlessly after confirmation.

@google-cla

google-cla bot commented Feb 14, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist

Summary of Changes

Hello @chenning9821, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the agent's capabilities by enabling human-in-the-loop interactions, allowing the agent to pause and await user input for external actions. This significantly improves the agent's robustness in scenarios requiring manual intervention, such as authentication flows, and includes a crucial fix for API communication when custom tools are used.

Highlights

  • New Interactive Tool: Introduced a wait_for_user_input tool that pauses agent execution and waits for manual user confirmation in the terminal, designed for human-in-the-loop tasks like QR code logins or CAPTCHA challenges.
  • API Bug Fix: Resolved a critical issue in run_one_iteration where safety_acknowledgement was missing for custom tool responses, preventing 400 INVALID_ARGUMENT errors from the Gemini API.
  • Tool Routing Update: Updated the handle_action method to correctly route and process calls to the new wait_for_user_input interactive tool.

Changelog
  • agent.py
    • Added a new wait_for_user_input function to facilitate human interaction by pausing execution until user confirmation.
    • Registered the wait_for_user_input function as a callable tool within the BrowserAgent's initialization.
    • Modified the handle_action method to include logic for invoking the wait_for_user_input tool.
    • Updated the run_one_iteration method to correctly include extra_fr_fields in function responses, addressing a Gemini API error.

Activity
  • The author, chenning9821, created this pull request to introduce a new feature for human-in-the-loop tasks.
  • The author has verified the changes locally using a Xiaohongshu login scenario, confirming the agent successfully paused, waited for input, and resumed execution.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a wait_for_user_input tool for human-in-the-loop interactions and includes a bug fix for safety acknowledgements. However, the new wait_for_user_input tool is vulnerable to terminal injection and social engineering due to printing unsanitized LLM-generated text. Furthermore, the bug fix for safety acknowledgements is incomplete, as it fails to pass the prepared response data to the final response object, which may lead to continued API errors. There are also minor suggestions for improving code consistency.

agent.py Outdated
prompt_text: The message to display to the user explaining what they need to do.
"""
print(f"\n" + "="*40)
print(f"🛑 AGENT REQUEST: {prompt_text}")

security-medium medium

The prompt_text is printed directly to the terminal without sanitization. Since this text is generated by the LLM, it can be influenced by untrusted content (Indirect Prompt Injection), allowing an attacker to manipulate the terminal display or craft deceptive messages. It is recommended to sanitize the input to remove control characters and ANSI escape sequences before printing.
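
One way to act on this recommendation is to strip escape sequences and control characters before printing. The following is a minimal sketch under stated assumptions: the helper name `sanitize_for_terminal` and the `max_len` cap are hypothetical, and the regular expressions cover common CSI and OSC sequences rather than every terminal control function.

```python
import re

# Common ANSI/VT escape sequences. Not exhaustive.
_ANSI_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"       # CSI: ESC [ params intermediates final
    r"|\x1b\].*?(?:\x07|\x1b\\)"      # OSC: ESC ] ... terminated by BEL or ESC \
    r"|\x1b[@-Z\\-_]"                 # remaining two-character escapes
)

# C0 and C1 control characters, keeping tab (\x09) and newline (\x0a).
_CONTROL_RE = re.compile(r"[\x00-\x08\x0b-\x1f\x7f-\x9f]")


def sanitize_for_terminal(text: str, max_len: int = 500) -> str:
    """Remove escape sequences and control characters, then cap the length.

    Hypothetical helper; the name and the max_len default are assumptions.
    """
    text = _ANSI_RE.sub("", text)      # drop whole escape sequences first
    text = _CONTROL_RE.sub("", text)   # then any stray control bytes (incl. \r)
    return text[:max_len]
```

The tool would then print `sanitize_for_terminal(prompt_text)` instead of the raw string. Sanitizing only addresses display manipulation; deceptive but plain-text messages would still need separate handling, such as clearly labeling the text as agent-generated.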

Comment on lines +373 to +374
response_data = fc_result.copy()
response_data.update(extra_fr_fields)

security-medium medium

The response_data dictionary, which correctly incorporates the safety_acknowledgement field, is prepared on these lines but is not actually used in the subsequent FunctionResponse constructor call on line 376. Instead, the original fc_result is passed. This results in the safety_acknowledgement being omitted for custom tools, which can lead to 400 INVALID_ARGUMENT errors from the Gemini API. To complete the fix, line 376 should be updated to use response_data.
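
The intended fix can be illustrated without the real SDK. In this sketch, `FunctionResponseStub` is a stand-in for the SDK's FunctionResponse type, `build_function_response` is a hypothetical helper, and the `"true"` value for safety_acknowledgement is an assumption; only the copy-and-update pattern comes from the diff quoted above.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class FunctionResponseStub:
    """Stand-in for the SDK's FunctionResponse type, for illustration only."""
    name: str
    response: Dict[str, Any]


def build_function_response(
    name: str,
    fc_result: Dict[str, Any],
    extra_fr_fields: Dict[str, Any],
) -> FunctionResponseStub:
    # Copy so the tool's original result is not mutated.
    response_data = fc_result.copy()
    response_data.update(extra_fr_fields)
    # The point of the review comment: pass the merged dict, not fc_result.
    return FunctionResponseStub(name=name, response=response_data)
```

Passing `fc_result` on the last line instead of `response_data` would silently drop the extra fields, which matches the omission the review describes.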

agent.py Outdated
return multiply_numbers(x=action.args["x"], y=action.args["y"])
elif action.name == wait_for_user_input.__name__:
return wait_for_user_input(prompt_text=action.args["prompt_text"])
# --------------------

medium

This comment appears to be a temporary separator and doesn't add any explanatory value. It should be removed to keep the code clean.
