feat: add interactive wait tool for human-in-the-loop tasks #109
chenning9821 wants to merge 2 commits into google-gemini:main
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Summary of Changes
Hello @chenning9821, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the agent's capabilities by enabling human-in-the-loop interactions, allowing the agent to pause and await user input for external actions. This significantly improves the agent's robustness in scenarios requiring manual intervention, such as authentication flows, and includes a crucial fix for API communication when custom tools are used.
Code Review
This pull request introduces a wait_for_user_input tool for human-in-the-loop interactions and includes a bug fix for safety acknowledgements. However, the new wait_for_user_input tool is vulnerable to terminal injection and social engineering due to printing unsanitized LLM-generated text. Furthermore, the bug fix for safety acknowledgements is incomplete, as it fails to pass the prepared response data to the final response object, which may lead to continued API errors. There are also minor suggestions for improving code consistency.
agent.py
prompt_text: The message to display to the user explaining what they need to do.
"""
print(f"\n" + "="*40)
print(f"🛑 AGENT REQUEST: {prompt_text}")
The prompt_text is printed directly to the terminal without sanitization. Since this text is generated by the LLM, it can be influenced by untrusted content (Indirect Prompt Injection), allowing an attacker to manipulate the terminal display or craft deceptive messages. It is recommended to sanitize the input to remove control characters and ANSI escape sequences before printing.
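One way to act on this recommendation is sketched below. It is a minimal, hypothetical helper (`sanitize_for_terminal` is not part of this PR) that assumes plain Python `re` is acceptable. It removes CSI, OSC, and other ESC-prefixed sequences, then drops any remaining C0/C1 control characters while keeping newlines and tabs.

```python
import re

# Escape sequences an LLM-controlled string could use to manipulate the terminal.
_ANSI_ESCAPE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"             # CSI sequences, e.g. colors "\x1b[31m"
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"  # OSC sequences, e.g. window title changes
    r"|\x1b[@-Z\\-_]"                      # other two-character ESC sequences
)


def sanitize_for_terminal(text: str) -> str:
    """Strip ANSI escape sequences and non-printable control characters."""
    text = _ANSI_ESCAPE.sub("", text)
    # Keep newlines and tabs; drop every other C0/C1 control character and DEL,
    # including any stray ESC left over from a malformed sequence.
    return "".join(
        ch for ch in text
        if ch in "\n\t" or not (ord(ch) < 0x20 or 0x7F <= ord(ch) < 0xA0)
    )
```

The tool would then print `sanitize_for_terminal(prompt_text)` instead of the raw string. This only prevents display manipulation; a deceptive but plain-text message would still get through, so it does not address the social-engineering half of the concern.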
response_data = fc_result.copy()
response_data.update(extra_fr_fields)
The response_data dictionary, which correctly incorporates the safety_acknowledgement field, is prepared on these lines but is not actually used in the subsequent FunctionResponse constructor call on line 376. Instead, the original fc_result is passed. This results in the safety_acknowledgement being omitted for custom tools, which can lead to 400 INVALID_ARGUMENT errors from the Gemini API. To complete the fix, line 376 should be updated to use response_data.
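A minimal sketch of the completed fix follows. It assumes the tool result and the extra fields are plain dicts. The `FunctionResponse` dataclass here is a hypothetical stand-in for the SDK type so that the snippet runs on its own, and `build_function_response` is an illustrative name, not a function in `agent.py`.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-in for the SDK's FunctionResponse, used only so the sketch runs.
@dataclass
class FunctionResponse:
    name: str
    response: dict[str, Any] = field(default_factory=dict)


def build_function_response(
    name: str, fc_result: dict[str, Any], extra_fr_fields: dict[str, Any]
) -> FunctionResponse:
    # Copy so the tool's original result dict is left untouched.
    response_data = dict(fc_result)
    response_data.update(extra_fr_fields)
    # Pass the merged dict (not fc_result) so safety_acknowledgement reaches the API.
    return FunctionResponse(name=name, response=response_data)
```

The only substantive change from the reviewed code is the constructor argument: `response=response_data` instead of `response=fc_result`.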
agent.py
return multiply_numbers(x=action.args["x"], y=action.args["y"])
elif action.name == wait_for_user_input.__name__:
return wait_for_user_input(prompt_text=action.args["prompt_text"])
# --------------------
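As the number of custom tools grows, the `elif` chain above could be replaced by a name-keyed registry. Below is a hypothetical sketch: `Action`, `CUSTOM_TOOLS`, `handle_custom_action`, and `wait_for_user_input_stub` are illustrative names, and the stub only stands in for the real blocking tool.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Action:  # hypothetical stand-in for the model's function call
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def multiply_numbers(x: float, y: float) -> dict[str, Any]:
    return {"result": x * y}


def wait_for_user_input_stub(prompt_text: str) -> dict[str, Any]:
    # Non-blocking placeholder for the real wait_for_user_input tool.
    return {"status": "acknowledged", "prompt_text": prompt_text}


# Register each custom tool once, keyed by its function name.
CUSTOM_TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    fn.__name__: fn for fn in (multiply_numbers, wait_for_user_input_stub)
}


def handle_custom_action(action: Action) -> dict[str, Any]:
    tool = CUSTOM_TOOLS.get(action.name)
    if tool is None:
        raise ValueError(f"Unsupported tool: {action.name}")
    # Forward the model-supplied arguments as keyword arguments.
    return tool(**action.args)
```

One trade-off: `**action.args` raises `TypeError` if the model sends an unexpected argument, whereas the explicit branches pick out only the keys they need.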
31a891b to 005c537
Summary
This PR introduces a `wait_for_user_input` tool to enable human-in-the-loop interactions. It is designed for scenarios such as QR code logins (e.g., Xiaohongshu/LittleRedBook) or CAPTCHA challenges, where the agent would otherwise get stuck in a loop without human intervention.
Key Changes
- Added `wait_for_user_input` in `agent.py`, which pauses agent execution and waits for manual user confirmation in the terminal.
- Fixed a bug in `run_one_iteration` where `safety_acknowledgement` was missing from custom tool responses, which previously caused `400 INVALID_ARGUMENT` errors from the Gemini API.
- Updated `handle_action` to route the new interactive tool correctly.
Testing
Verified locally with a Xiaohongshu login scenario. The agent successfully paused for the QR scan, waited for my terminal input, and resumed execution seamlessly after confirmation.
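For reviewers, here is a rough sketch of the pause-and-resume behavior described under Key Changes. It is not the PR's actual implementation. The injectable `input_fn`/`output_fn` parameters (added so the function can be tested without a terminal) and the shape of the returned dict are assumptions.

```python
from typing import Any, Callable


def wait_for_user_input(
    prompt_text: str,
    input_fn: Callable[[str], str] = input,    # assumption: injectable for testing
    output_fn: Callable[[str], None] = print,  # assumption: injectable for testing
) -> dict[str, Any]:
    """Pause until the human confirms a manual step (e.g. scanning a QR code)."""
    output_fn("\n" + "=" * 40)
    output_fn(f"AGENT REQUEST: {prompt_text}")
    output_fn("=" * 40)
    # Blocks the agent loop until the user presses Enter (optionally typing a note).
    reply = input_fn("Press Enter when done (or type a note for the agent): ")
    # Hypothetical result shape returned to the model as the tool's response.
    return {"status": "user_confirmed", "user_note": reply.strip()}
```

Returning a small dict rather than `None` gives the model an explicit signal that the manual step finished, plus any note the user typed.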