
Resources

Benchmark papers

m&m’s: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
- multi-modal
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
- ToolBench
Gorilla: Large Language Model Connected with Massive APIs
- code output
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
α-UMi: Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
WISSNYF: Tool Grounded LLM Agents for Black Box Setting
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Papers

GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
AI Agents That Matter

AutoGen

Group chat and nested chat examples
- https://microsoft.github.io/autogen/docs/tutorial/conversation-patterns/#two-agent-chat-and-chat-result
- https://microsoft.github.io/autogen/docs/notebooks/agentchat_nestedchat/
Custom LLM + function calling
Articles on AutoGen function calling
- https://medium.com/@sanjuvenky246/-3c15bfa077da
- https://medium.com/@coldstart_coder/autogen-essentials-function-integration-for-smarter-agents-7c3b4a0fdc12

Code & Data

Survey

LLM tool survey

Etc

OpenAI function calling
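A minimal, offline sketch of the function-calling round trip, assuming the Chat Completions `tools` format: the tool is described with a JSON Schema, the model replies with a tool call whose `arguments` field is a JSON string, and the application runs the function and sends back a `role: "tool"` message carrying the matching `tool_call_id`. The `get_weather` function, its canned result, and the `REGISTRY`/`dispatch_tool_call` names are hypothetical. The tool call is a hand-written dict mirroring the JSON shape; with the official SDK you would read attributes such as `tool_call.function.name` instead of dict keys.

```python
import json

# Hypothetical local tool; a real agent would call an actual service here.
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Return a canned forecast so the example runs offline."""
    return {"city": city, "temperature": 21, "unit": unit}

# Tool description in the shape passed via the `tools` request parameter:
# the function's arguments are described by a JSON Schema object.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

# Map tool names (as the model will emit them) to Python callables.
REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(tool_call: dict) -> dict:
    """Execute one model-requested tool call and build the reply message."""
    name = tool_call["function"]["name"]
    # The model sends arguments as a JSON-encoded string, not a dict.
    args = json.loads(tool_call["function"]["arguments"])
    result = REGISTRY[name](**args)
    # The reply must reference the id of the call it answers.
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# Hand-written stand-in for one entry of the assistant message's `tool_calls`.
example_call = {
    "id": "call_123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Seoul"}'},
}
print(dispatch_tool_call(example_call))
```

In a live loop, the returned message is appended to the conversation after the assistant message that requested the call, and the model is queried again to produce its final answer.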