Resources

Benchmark papers

  • m&m’s: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
    • multi-modal
  • AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
  • ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
    • toolbench
  • Gorilla: Large Language Model Connected with Massive APIs
    • code output
  • AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
  • CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
  • α-UMi: Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
  • AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
  • SwissNYF: Tool Grounded LLM Agents for Black Box Setting
  • τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Papers

  • GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
  • AI Agents That Matter

AutoGen

Code & Data

Survey

Etc