This repository contains the projects related to an agentic AI conference booking use case, demonstrating different architectures for service orchestration and evaluation.
This project is divided into three main components:
- mcp-server-conference-use-case/: This directory contains all the Node.js-based MCP (Model Context Protocol) servers required for the use case. This includes a service registry, mock servers for conference discovery and booking, and live server implementations that connect to real-world APIs. See the README.md inside this directory for detailed setup and configuration instructions.
- use-case-test-agentsdk/: This directory contains a Python-based test framework for running and evaluating the conference booking agent. It includes different agent architectures (static vs. dynamic service discovery) and supports various language models. See the README.md inside this directory for instructions on how to set up the environment and run the tests.
- use-case-test-a2a/: This directory contains an A2A (Agent-to-Agent) server that exposes the entire conference booking workflow as a single "skill". This allows other A2A-compatible agents to delegate the task to this specialized server.
The use-case-test-agentsdk/ directory demonstrates two different approaches for agent and service orchestration:
- Static service discovery: In this approach, the agent is initialized with a hardcoded list of all the MCP servers it needs to complete its task. This is simpler and more direct, but less flexible if services change.
- Dynamic service discovery: This is a more advanced approach where the agent is only given the address of a service registry. It is responsible for first querying the registry to discover the endpoints of the specific services (e.g., for flight booking, conference search) required to fulfill the user's request. This makes the agent more flexible when services change, because endpoints are resolved at run time rather than hardcoded.
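The difference between the two approaches comes down to what the agent is configured with at startup. The following is a minimal sketch, not the repository's actual configuration code: the ServerConfig class, the node command, and the script paths are hypothetical, and the server names are borrowed from the discovery walkthrough later in this document.

```python
from dataclasses import dataclass, field


@dataclass
class ServerConfig:
    """One MCP server entry: a name plus the command used to launch it."""
    name: str
    command: str
    args: list[str] = field(default_factory=list)


def static_servers() -> list[ServerConfig]:
    """Static approach: every server the agent needs is listed up front."""
    # Hypothetical commands and paths, for illustration only.
    return [
        ServerConfig("conference_discovery", "node", ["path/to/discovery.js"]),
        ServerConfig("booking-mock", "node", ["path/to/booking.js"]),
        ServerConfig("conference_mediation_helpers", "node", ["path/to/helpers.js"]),
    ]


def dynamic_servers() -> list[ServerConfig]:
    """Dynamic approach: the agent starts with the registry only and
    resolves the remaining servers by querying it at run time."""
    return [ServerConfig("RegistryServer", "node", ["path/to/registry.js"])]
```

In the static case a renamed or moved service means editing this list; in the dynamic case only the registry entry has to stay stable.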
You can run either architecture with the run_test.py script in the use-case-test-agentsdk/ directory by passing the --architecture flag (second for static, third for dynamic).
When you run a test using run_test.py, detailed logs and an evaluation summary are generated inside the use-case-test-agentsdk/logs/ directory. The logs for each test run are saved in a subdirectory corresponding to the architecture used:
- use-case-test-agentsdk/logs/second/: Contains logs for runs of the static (secondconference) architecture.
- use-case-test-agentsdk/logs/third/: Contains logs for runs of the dynamic (thirdconference) architecture.
Each of these directories will contain files like mcp.log (the full log), mcp_summary.log, and evaluation.log (the results of the evaluation script).
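If you want to check that a run produced its logs, a small helper along these lines may be useful. It is a sketch that assumes only the directory layout and the three file names mentioned above (the actual set of files may be larger); the function names are hypothetical and not part of the test framework.

```python
from pathlib import Path

LOG_ROOT = Path("use-case-test-agentsdk") / "logs"
# File names taken from the description above; a real run may write more.
LOG_FILES = ("mcp.log", "mcp_summary.log", "evaluation.log")
# Values accepted by the --architecture flag and what they stand for.
ARCHITECTURES = {"second": "static", "third": "dynamic"}


def expected_log_files(architecture: str) -> list[Path]:
    """Return the log paths a run of the given architecture should produce."""
    if architecture not in ARCHITECTURES:
        raise ValueError(f"unknown architecture: {architecture!r}")
    return [LOG_ROOT / architecture / name for name in LOG_FILES]


def missing_log_files(architecture: str, repo_root: Path = Path(".")) -> list[Path]:
    """Return the expected log paths that do not exist under repo_root."""
    return [p for p in expected_log_files(architecture) if not (repo_root / p).exists()]
```

Calling missing_log_files("third") from the repository root after a dynamic run should return an empty list if all three files were written.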
- Set up the MCP Servers: Begin by navigating to the mcp-server-conference-use-case/ directory and following the setup instructions in its README.md to install and build the necessary servers.
- Set up the Test Framework: Next, navigate to the use-case-test-agentsdk/ directory and follow the setup instructions in its README.md to configure the Python environment and API keys.
- Run the Tests: From the use-case-test-agentsdk/ directory, you can run the evaluation scripts as described in the documentation to test the different agent architectures.
The use-case-test-a2a directory provides an alternative way to run the conference agent, exposing it as a network service (or "skill").
- Set up the MCP Servers: Ensure the servers in mcp-server-conference-use-case/ are built.
- Set up the A2A Server: Navigate to the use-case-test-a2a/ directory and follow the setup instructions in its README.md.
- Run the Server: Start the A2A server. You can then interact with it using the provided test_client.py or another A2A-compatible client.
The dynamic architecture (architectures/thirdconference.py) uses a two-phase process. First, a specialized "discovery" agent finds the necessary tools. Second, a "main" agent uses those tools to solve the user's request. This appendix details the prompts used in each phase.
The first agent's only goal is to find and configure the right set of tools for the job. Its prompt is composed of two parts: a system prompt defining its role and a user prompt defining its task.
The system prompt is the core identity given to the agent. It tells the agent what it is, what its capabilities are, and what its final output must be.
"You are a discovery agent. Your job is to find ALL MCP servers "
"needed to fulfill a user's request. First, analyze the user's query "
"to determine what capabilities are needed (e.g., conference search, "
"flight booking, hotel booking, helper tools like geocoding). For each "
"capability, use the RegistryServer to search for a relevant server. "
"After finding all servers, get their addresses. Your final output "
"must be only a JSON list of server configurations. Each configuration "
"must contain 'name' and 'params' (with 'command' and 'args')."This is the first message sent to the agent, which contextualizes the user's original query and gives the agent a clear, actionable goal.
"The user wants to plan a conference trip. The query is: "
"'i want to go to the INTERNATIONAL SEMANTIC WEB CONFERENCE from vienna. "
"Book the flight and hotel for me, you dont need to get my permission for booking'. "
"Find all servers needed for this, including conference "
"discovery, booking (flights and hotels), and any mediation helpers. "
"Return the final list of server configurations as a JSON object."Here is a step-by-step breakdown of how the discovery process unfolds with the prompts above.
- Agent Receives the Goal: The DiscoveryAgent is activated with the User Prompt.
- Agent Formulates a Search Query: Based on its instructions and the task, the agent creates a broad search query.
  LOG: Agent decides to call the search tool with the query: "INTERNATIONAL SEMANTIC WEB CONFERENCE flight hotel booking"
- Agent Selects the Best Tool: The agent examines the tools on the RegistryServer and correctly chooses search_servers as the most efficient option to find what it needs.
- Agent Calls the Registry Server: The agent executes the tool call.
  TOOL_CALL - Calling mcp_RegistryServer_search_servers with its query.
- Registry Performs Fuzzy Search (Partial Success): The RegistryServer receives the query. In this specific run, the fuzzy search is only partially successful and returns just one of the three required servers (conference_mediation_helpers).
- Agent Adapts and Completes Its Task: Recognizing the initial search was incomplete, the agent adapts its strategy. The logs show it then proceeds with a more cautious, multi-turn approach:
  - First, it calls get_servers() to get a complete list of all available servers.
  - Then, in the following turns, it calls get_server_address for each of the other required servers (conference_discovery and booking-mock) one by one.
  - After successfully gathering all three server configurations, it assembles the final JSON list and outputs it, completing its task.
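The search-then-fall-back behavior in this run can be approximated in plain Python. This is only a deterministic sketch of the pattern: in the real system a language model decides which tools to call, and the list of required servers is inferred by the model rather than passed in. FakeRegistry is a hypothetical in-memory stand-in; its method names follow the tools mentioned above, but their signatures, return shapes, and the example script names are assumptions.

```python
class FakeRegistry:
    """In-memory stand-in for the RegistryServer (hypothetical signatures)."""

    def __init__(self, servers, search_hits):
        self._servers = servers          # name -> {"command": ..., "args": [...]}
        self._search_hits = search_hits  # names the fuzzy search happens to match
        self.calls = []                  # tool calls, recorded in order

    def search_servers(self, query):
        self.calls.append("search_servers")
        return {name: self._servers[name] for name in self._search_hits}

    def get_servers(self):
        self.calls.append("get_servers")
        return sorted(self._servers)

    def get_server_address(self, name):
        self.calls.append("get_server_address")
        return self._servers[name]


def discover(registry, query, required):
    """Try one broad search, then look up whatever it missed one by one."""
    configs = registry.search_servers(query)  # may be incomplete
    missing = [name for name in required if name not in configs]
    if missing:
        available = set(registry.get_servers())
        for name in missing:
            if name not in available:
                raise LookupError(f"server not registered: {name}")
            configs[name] = registry.get_server_address(name)
    # Final output: a list of {'name', 'params'} entries, as the prompt requires.
    return [{"name": name, "params": configs[name]} for name in required]


# Hypothetical registry contents; only the server names come from the logs.
SERVERS = {
    "conference_discovery": {"command": "node", "args": ["discovery.js"]},
    "booking-mock": {"command": "node", "args": ["booking.js"]},
    "conference_mediation_helpers": {"command": "node", "args": ["helpers.js"]},
}
registry = FakeRegistry(SERVERS, search_hits=["conference_mediation_helpers"])
result = discover(
    registry,
    "INTERNATIONAL SEMANTIC WEB CONFERENCE flight hotel booking",
    list(SERVERS),
)
```

With the search returning only conference_mediation_helpers, registry.calls ends up mirroring the sequence in the logs: one search, one get_servers call, then one get_server_address call per missing server.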
Once the discovery phase is complete, the list of servers is used to initialize a new, main agent. This agent is much simpler because it now has all the tools it needs.
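Because the discovery agent's output is model-generated text, it may be worth validating it before using it to initialize the main agent. The sketch below checks the shape the system prompt asks for (a JSON list whose entries carry 'name' and 'params' with 'command' and 'args'). The function name is hypothetical, and the repository may handle this step differently.

```python
import json


def parse_server_configs(raw: str) -> list[dict]:
    """Parse the discovery agent's final output and check its shape."""
    configs = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(configs, list):
        raise ValueError("expected a JSON list of server configurations")
    for i, cfg in enumerate(configs):
        if not isinstance(cfg, dict):
            raise ValueError(f"entry {i} is not an object")
        name, params = cfg.get("name"), cfg.get("params")
        if not isinstance(name, str) or not name:
            raise ValueError(f"entry {i} has no usable 'name'")
        if not isinstance(params, dict):
            raise ValueError(f"entry {i} has no 'params' object")
        if not isinstance(params.get("command"), str):
            raise ValueError(f"entry {i} has no 'command' string in 'params'")
        args = params.get("args")
        if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
            raise ValueError(f"entry {i} needs 'args' as a list of strings")
    return configs
```

Note that the user prompt asks for "a JSON object" while the system prompt asks for "a JSON list"; this sketch follows the system prompt and rejects anything that is not a list.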
This agent's instructions are focused purely on the user-facing task, as the tool discovery problem has already been solved.
"You are a helpful travel agent that can book flights and hotels for a conference."This main agent is then activated with the original user query and proceeds to solve the conference booking task using the dynamically discovered servers for conference search, booking, and geocoding.