Goal: Enable your local model to call external functions (tools) so it can retrieve real-time data, perform calculations, or interact with APIs — all running privately on your device.
Tool calling (also known as function calling) lets a language model request the execution of functions you define. Instead of guessing an answer, the model recognises when a tool would help and returns a structured request for your code to execute. Your application runs the function, sends the result back, and the model incorporates that information into its final response.
This pattern is essential for building agents that can:
- Look up live data (weather, stock prices, database queries)
- Perform precise calculations (maths, unit conversions)
- Take actions (send emails, create tickets, update records)
- Access private systems (internal APIs, file systems)
The tool-calling flow has four stages:
| Stage | What Happens |
|---|---|
| 1. Define tools | You describe available functions using JSON Schema — name, description, and parameters |
| 2. Model decides | The model receives your message plus the tool definitions. If a tool would help, it returns a tool_calls response instead of a text answer |
| 3. Execute locally | Your code parses the tool call, runs the function, and collects the result |
| 4. Final answer | You send the tool result back to the model, which produces its final response |
Key point: The model never executes code. It only requests that a tool be called. Your application decides whether to honour that request — this keeps you in full control.
Not every model supports tool calling. In the current Foundry Local catalogue, the following models have tool-calling capability:
| Model | Size | Tool Calling |
|---|---|---|
| qwen2.5-0.5b | 822 MB | ✅ |
| qwen2.5-1.5b | 1.8 GB | ✅ |
| qwen2.5-7b | 6.3 GB | ✅ |
| qwen2.5-14b | 11.3 GB | ✅ |
| qwen2.5-coder-0.5b | 822 MB | ✅ |
| qwen2.5-coder-1.5b | 1.8 GB | ✅ |
| qwen2.5-coder-7b | 6.3 GB | ✅ |
| qwen2.5-coder-14b | 11.3 GB | ✅ |
| phi-4-mini | 4.6 GB | ✅ |
| phi-3.5-mini | 2.6 GB | ❌ |
| phi-4 | 10.4 GB | ❌ |
Tip: For this lab we use qwen2.5-0.5b — it is small (822 MB download), fast, and has reliable tool-calling support.
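If you want the download out of the way before writing any code, you can pull the model with the Foundry Local CLI first (standard `foundry model` commands; the SDK samples will otherwise download it automatically on first use):

```bash
# List catalogue models and see which are already cached locally
foundry model list

# Download qwen2.5-0.5b (if needed) and start an interactive session
foundry model run qwen2.5-0.5b
```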
By the end of this lab you will be able to:
- Explain the tool-calling pattern and why it matters for AI agents
- Define tool schemas using the OpenAI function-calling format
- Handle the multi-turn tool-calling conversation flow
- Execute tool calls locally and return results to the model
- Choose the right model for tool-calling scenarios
| Requirement | Details |
|---|---|
| Foundry Local CLI | Installed and on your PATH (Part 1) |
| Foundry Local SDK | Python, JavaScript, or C# SDK installed (Part 2) |
| A tool-calling model | qwen2.5-0.5b (will be downloaded automatically) |
Before writing code, study this sequence diagram of the tool-calling flow (reproduced below as a plain-text sketch):
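```text
User                 Your application                    Local model
 │  question              │                                   │
 │───────────────────────►│  messages + tool definitions      │
 │                        │──────────────────────────────────►│
 │                        │          tool_calls request       │
 │                        │◄──────────────────────────────────│
 │                        │  run the tool(s) locally          │
 │                        │  append results as "tool" msgs    │
 │                        │──────────────────────────────────►│
 │     final answer       │◄───────── final response ─────────│
 │◄───────────────────────│                                   │
```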
Key observations:
- You define the tools upfront as JSON Schema objects
- The model's response contains `tool_calls` instead of regular content
- Each tool call has a unique `id` that you must reference when returning results
- The model sees all previous messages plus the tool results when generating the final answer
- Multiple tool calls can happen in a single response
Discussion: Why does the model return tool calls rather than executing functions directly? What security advantages does this provide?
Tools are defined using the standard OpenAI function-calling format. Each tool needs:
- `type`: Always `"function"`
- `function.name`: A descriptive function name (e.g. `get_weather`)
- `function.description`: A clear description; the model uses this to decide when to call the tool
- `function.parameters`: A JSON Schema object describing the expected arguments
```json
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get the current weather for a given city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "The city name, e.g. London"
        }
      },
      "required": ["city"]
    }
  }
}
```

Best practices for tool descriptions:
- Be specific: "Get the current weather for a given city" is better than "Get weather"
- Describe parameters clearly: the model reads these descriptions to fill in the right values
- Mark required vs optional parameters — this helps the model decide what to ask for (see the example below)
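For instance, here is one way to mark an optional parameter in a schema (the `units` field is hypothetical, not part of the lab's samples):

```json
{
  "type": "object",
  "properties": {
    "city": {"type": "string", "description": "The city name, e.g. London"},
    "units": {
      "type": "string",
      "enum": ["celsius", "fahrenheit"],
      "description": "Temperature units; optional, defaults to celsius"
    }
  },
  "required": ["city"]
}
```

Because `units` is absent from `required`, the model may omit it; a clear description plus an `enum` keeps small models from inventing values.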
Each language sample defines two tools (get_weather and get_population), sends a question that triggers tool use, executes the tool locally, and sends the result back for a final answer.
🐍 Python
Prerequisites:
```bash
cd python
python -m venv venv

# Windows (PowerShell):
venv\Scripts\Activate.ps1

# macOS / Linux:
source venv/bin/activate

pip install -r requirements.txt
```

Run:

```bash
python foundry-local-tool-calling.py
```

Expected output:
```text
Starting Foundry Local service...
User: What is the weather like in London?
Model requested 1 tool call(s):
  → get_weather({'city': 'London'})
Final response:
The current weather in London is 18°C and partly cloudy.
```
Code walkthrough (python/foundry-local-tool-calling.py):
```python
# Define tools as a list of function schemas
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "The city name"}
                },
                "required": ["city"]
            }
        }
    }
]

# Send with tools — the model may return tool_calls instead of content
response = client.chat.completions.create(
    model=model_id,
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

# Check if the model wants to call a tool
if response.choices[0].message.tool_calls:
    # Execute the tool and send the result back
    ...
```
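To make the ellipsis concrete, here is one way the execute-and-reply step could look. This is a sketch: `execute_tool` is a stand-in dispatcher, not necessarily the sample's exact code.

```python
import json

def execute_tool(name, arguments):
    # Stand-in dispatcher; replace with your real implementations
    if name == "get_weather":
        return json.dumps({"city": arguments.get("city"),
                           "temperature": "18°C", "condition": "Partly cloudy"})
    return json.dumps({"error": f"Unknown tool: {name}"})

assistant_message = response.choices[0].message
if assistant_message.tool_calls:
    # Keep the assistant's tool_calls message in the history
    messages.append(assistant_message)
    for tool_call in assistant_message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = execute_tool(tool_call.function.name, args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,  # must match the request id
            "content": result,
        })
    # Second round trip: the model now sees the tool results
    final = client.chat.completions.create(
        model=model_id, messages=messages, tools=tools
    )
    print(final.choices[0].message.content)
```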
🟨 JavaScript (Node.js)
Prerequisites:
```bash
cd javascript
npm install
```

Run:

```bash
node foundry-local-tool-calling.mjs
```

Expected output:
```text
Starting Foundry Local service...
User: What is the weather like in London?
Model requested 1 tool call(s):
  → get_weather({"city":"London"})
Final response:
The current weather in London is 18°C and partly cloudy.
```
Code walkthrough (javascript/foundry-local-tool-calling.mjs):
This example uses the native Foundry Local SDK's `ChatClient` rather than the OpenAI SDK, demonstrating the convenience `createChatClient()` method:
```javascript
// Get a ChatClient directly from the model object
const chatClient = model.createChatClient();

// Send with tools — ChatClient handles the OpenAI-compatible format
const response = await chatClient.completeChat(messages, tools);
const assistantMessage = response.choices[0].message;

// Check for tool calls
if (assistantMessage.tool_calls && assistantMessage.tool_calls.length > 0) {
  // Execute tools and send results back
  ...
}
```

🟦 C# (.NET)
Prerequisites:
```bash
cd csharp
dotnet restore
```

Run:

```bash
dotnet run toolcall
```

Expected output:
```text
Starting Foundry Local service...
Loading model: qwen2.5-0.5b...
User: What is the weather like in London?
Model requested 1 tool call(s):
  → get_weather({"city":"London"})
Final response:
The current weather in London is 18°C and partly cloudy.
```
Code walkthrough (csharp/ToolCalling.cs):
C# uses the `ChatTool.CreateFunctionTool` helper to define tools:
```csharp
ChatTool getWeatherTool = ChatTool.CreateFunctionTool(
    functionName: "get_weather",
    functionDescription: "Get the current weather for a given city",
    functionParameters: BinaryData.FromString("""
    {
      "type": "object",
      "properties": {
        "city": { "type": "string", "description": "The city name" }
      },
      "required": ["city"]
    }
    """));

var options = new ChatCompletionOptions();
options.Tools.Add(getWeatherTool);

// Check FinishReason to see if tools were called
if (completion.Value.FinishReason == ChatFinishReason.ToolCalls)
{
    // Execute tools and send results back
    ...
}
```

Understanding the message structure is critical. Here is the complete flow, showing the `messages` array at each stage.
Stage 1 — Initial request:
```json
[
  {"role": "system", "content": "You are a helpful assistant. Use the provided tools."},
  {"role": "user", "content": "What is the weather like in London?"}
]
```

Stage 2 — Model responds with `tool_calls` (not content):
```json
{
  "role": "assistant",
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"city\": \"London\"}"
      }
    }
  ]
}
```

Stage 3 — You add the assistant message AND the tool result:
```json
[
  {"role": "system", "content": "..."},
  {"role": "user", "content": "What is the weather like in London?"},
  {"role": "assistant", "tool_calls": [...]},
  {
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": "{\"city\": \"London\", \"temperature\": \"18°C\", \"condition\": \"Partly cloudy\"}"
  }
]
```

Stage 4 — Model produces the final answer using the tool result.
Important: The `tool_call_id` in the tool message must match the `id` from the tool call. This is how the model associates results with requests.
A model can request several tool calls in a single response. Try changing the user message to trigger multiple calls:
```python
# In Python — change the user message:
messages = [
    {"role": "system", "content": "You are a helpful assistant. Use the provided tools to answer questions."},
    {"role": "user", "content": "What is the weather and population of London?"},
]
```

```javascript
// In JavaScript — change the user message:
const messages = [
  { role: "system", content: "You are a helpful assistant. Use the provided tools to answer questions." },
  { role: "user", content: "What is the weather and population of London?" },
];
```

The model should return two `tool_calls` — one for `get_weather` and one for `get_population`. Your code already handles this because it loops through all tool calls.
Try it: Modify the user message and run the sample again. Does the model call both tools?
Extend one of the samples with a new tool. For example, add a get_time tool:
- Define the tool schema:
```json
{
  "type": "function",
  "function": {
    "name": "get_time",
    "description": "Get the current time in a given city's timezone",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "The city name, e.g. Tokyo"
        }
      },
      "required": ["city"]
    }
  }
}
```

- Add the execution logic:
```python
# Python
def execute_tool(name, arguments):
    if name == "get_time":
        city = arguments.get("city", "Unknown")
        # In a real app, use a timezone library
        return json.dumps({"city": city, "time": "14:30 GMT"})
    # ... existing tools ...
```

- Add the tool to the `tools` array and test with: "What time is it in Tokyo?"
Challenge: Add a tool that performs a calculation, such as `convert_temperature`, which converts between Celsius and Fahrenheit. Test it with: "Convert 100°F to Celsius."
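A possible implementation of that tool's local logic (names and units here are illustrative; the schema would follow the same pattern as `get_time`):

```python
import json

def convert_temperature(value, from_unit, to_unit):
    # Celsius <-> Fahrenheit; units are "C" or "F"
    if from_unit == "F" and to_unit == "C":
        result = (value - 32) * 5 / 9
    elif from_unit == "C" and to_unit == "F":
        result = value * 9 / 5 + 32
    else:
        result = value  # same unit: nothing to convert
    return json.dumps({"value": round(result, 1), "unit": to_unit})

# "Convert 100°F to Celsius" should produce roughly 37.8
print(convert_temperature(100, "F", "C"))  # {"value": 37.8, "unit": "C"}
```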
The JavaScript sample already uses the SDK's native ChatClient instead of the OpenAI SDK. This is a convenience feature that removes the need to construct an OpenAI client yourself:
```javascript
import { FoundryLocalManager } from "foundry-local-sdk";

// ChatClient is created directly from the model object
const model = await manager.catalog.getModel("qwen2.5-0.5b");
await model.load();
const chatClient = model.createChatClient();

// completeChat accepts tools as a second parameter
const response = await chatClient.completeChat(messages, tools);
```

Compare this with the Python approach, which uses the OpenAI SDK explicitly:
```python
client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)
response = client.chat.completions.create(model=model_id, messages=messages, tools=tools)
```

Both patterns are valid. The `ChatClient` is more convenient; the OpenAI SDK gives you access to the full range of OpenAI parameters.
Try it: Modify the JavaScript sample to use the OpenAI SDK instead of `ChatClient`. You will need `import OpenAI from "openai"` and must construct the client with the endpoint from `manager.urls[0]`.
The `tool_choice` parameter controls whether the model must use a tool or can choose freely:

| Value | Behaviour |
|---|---|
| `"auto"` | Model decides whether to call a tool (default) |
| `"none"` | Model will not call any tools, even if provided |
| `"required"` | Model must call at least one tool |
| `{"type": "function", "function": {"name": "get_weather"}}` | Model must call the specified tool |
Try each option in the Python sample:
```python
# Force the model to call get_weather
response = client.chat.completions.create(
    model=model_id,
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_weather"}},
)
```

Note: Not all `tool_choice` options may be supported by every model. If a model does not support `"required"`, it may ignore the setting and behave as `"auto"`.
| Problem | Solution |
|---|---|
| Model never calls tools | Ensure you are using a tool-calling model (e.g. qwen2.5-0.5b). Check the table above. |
| `tool_call_id` mismatch | Always use the `id` from the tool call response, not a hardcoded value |
| Model returns malformed JSON in `arguments` | Smaller models occasionally produce invalid JSON. Wrap `JSON.parse()` in a try/catch |
| Model calls a tool that does not exist | Add a default handler in your `execute_tool` function |
| Infinite tool-calling loop | Set a maximum number of rounds (e.g. 5) to prevent runaway loops |
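A sketch of the round cap from the last row, folded together with the try/catch advice (assumes the `execute_tool` dispatcher shown earlier):

```python
MAX_ROUNDS = 5  # hard limit to prevent runaway tool-calling loops

for _ in range(MAX_ROUNDS):
    response = client.chat.completions.create(
        model=model_id, messages=messages, tools=tools
    )
    message = response.choices[0].message
    if not message.tool_calls:
        print(message.content)  # final answer: stop looping
        break
    messages.append(message)
    for tool_call in message.tool_calls:
        try:
            args = json.loads(tool_call.function.arguments)
        except json.JSONDecodeError:
            args = {}  # smaller models occasionally emit invalid JSON
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": execute_tool(tool_call.function.name, args),
        })
else:
    print(f"Stopped after {MAX_ROUNDS} rounds without a final answer")
```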
- Tool calling lets models request function execution rather than guessing answers
- The model never executes code; your application decides what to run
- Tools are defined as JSON Schema objects following the OpenAI function-calling format
- The conversation uses a multi-turn pattern: user, then assistant (tool_calls), then tool (results), then assistant (final answer)
- Always use a model that supports tool calling (Qwen 2.5, Phi-4-mini)
- The SDK's `createChatClient()` provides a convenient way to make tool-calling requests without constructing an OpenAI client
Continue to Part 12: Building a Web UI for the Zava Creative Writer to add a browser-based front end to the multi-agent pipeline with real-time streaming.