@@ -70,6 +70,7 @@ httpx
isort
Jira
jsonlines
Kaggle('s)?
Langfuse
LangSmith
# libcudf isn't styled in the way that cuDF is https://docs.rapids.ai/api/libcudf/stable/
218 changes: 218 additions & 0 deletions examples/kaggle_mcp/README.md
@@ -0,0 +1,218 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Kaggle MCP Example

**Complexity:** Intermediate

This is a snapshot of the NVIDIA NeMo Agent Toolkit 1.7 `kaggle_mcp` example. It demonstrates how to use the Kaggle MCP server with NVIDIA NeMo Agent Toolkit to interact with Kaggle's datasets, notebooks, models, and competitions.

This example is intentionally hosted in the examples repository because it requires a Kaggle API key and depends on an external MCP server whose data, schema, availability, and responses are not controlled by the toolkit. It is useful as a reference integration, but the primary MCP end-to-end examples in the toolkit are self-contained client/server examples.

## Prerequisites

- Clone this repository and create the development environment described in the root [README](../../README.md).
- NeMo Agent Toolkit installed with MCP support.
- A Kaggle account and API token.

### Getting Your Kaggle Bearer Token

The Kaggle MCP server uses bearer token authentication. Obtain your Kaggle bearer token from [Kaggle Account Settings](https://www.kaggle.com/settings/account).

## Configuration

The `configs/config.yml` file uses the built-in `api_key` authentication provider with the `Bearer` scheme:

```yaml
authentication:
  kaggle:
    _type: api_key
    raw_key: ${KAGGLE_BEARER_TOKEN}
    auth_scheme: Bearer
```

### Environment Variables

Set the following environment variable:

```bash
export KAGGLE_BEARER_TOKEN="your_kaggle_api_key_here"
```
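
As a rough illustration of what the `raw_key` and `auth_scheme` settings amount to, the sketch below builds an `Authorization` header from the environment variable. It is not the toolkit's implementation; the helper name `build_auth_header` and its parameters are hypothetical and exist only for this example.

```python
import os


def build_auth_header(env_var: str = "KAGGLE_BEARER_TOKEN",
                      auth_scheme: str = "Bearer",
                      environ=None) -> dict[str, str]:
    """Return an Authorization header built from an environment variable.

    Hypothetical helper for illustration only; the toolkit's `api_key`
    provider performs its own resolution of `raw_key` and `auth_scheme`.
    """
    environ = os.environ if environ is None else environ
    token = environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"{env_var} is not set; export it before running the workflow")
    return {"Authorization": f"{auth_scheme} {token}"}


if __name__ == "__main__":
    # Pass an explicit mapping so the example does not depend on the real environment.
    print(build_auth_header(environ={"KAGGLE_BEARER_TOKEN": "example-token"}))
```

Checking for a missing variable up front can make a forgotten `export` easier to spot than an authentication error returned by the remote server.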

## Usage

Install the example from the root directory of this repository:

```bash
uv pip install -e examples/kaggle_mcp
```

Run the workflow with a query:

```bash
nat run --config_file examples/kaggle_mcp/configs/config.yml \
    --input "list the IMDB datasets"
```

### Per-User Mode (Multi-User Server)

For multi-user deployments where each user needs their own isolated workflow and MCP client instance, use the per-user configuration:

```bash
export KAGGLE_BEARER_TOKEN="your_kaggle_api_key_here"
nat serve --config_file examples/kaggle_mcp/configs/config-per-user.yml
```

Test requests with different users:

User Alice:
```bash
curl -X POST http://localhost:8000/generate \
    -H "Content-Type: application/json" \
    -H "Cookie: nat-session=user-alice" \
    -d '{"messages": [{"role": "user", "content": "Search for titanic datasets"}]}'
```

User Bob (has a separate MCP client instance):
```bash
curl -X POST http://localhost:8000/generate \
    -H "Content-Type: application/json" \
    -H "Cookie: nat-session=user-bob" \
    -d '{"messages": [{"role": "user", "content": "List the IMDB datasets"}]}'
```

Each user identified by their `nat-session` cookie gets their own workflow instance and MCP client.
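
The same two requests can be prepared from Python. The sketch below mirrors the `curl` commands above using only the standard library; the helper name `build_generate_request` is hypothetical, and the URL, cookie name, and payload shape are taken from the commands in this section. It only constructs the requests, so it runs without a server.

```python
import json
import urllib.request


def build_generate_request(session_id: str, prompt: str,
                           base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /generate request tagged with a per-user session cookie."""
    body = json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={
            "Content-Type": "application/json",
            # The session cookie is what distinguishes one user from another.
            "Cookie": f"nat-session={session_id}",
        },
        method="POST",
    )


alice = build_generate_request("user-alice", "Search for titanic datasets")
bob = build_generate_request("user-bob", "List the IMDB datasets")

if __name__ == "__main__":
    for req in (alice, bob):
        print(req.get_method(), req.full_url, req.get_header("Cookie"))
    # To actually send a request (requires `nat serve` to be running):
    # with urllib.request.urlopen(alice) as resp:
    #     print(resp.read().decode("utf-8"))
```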

## Configuration Details

### MCP Client Setup

The configuration connects to Kaggle's MCP server using:
- **Transport**: `streamable-http` (recommended for HTTP-based MCP servers)
- **URL**: `https://www.kaggle.com/mcp`
- **Authentication**: Bearer token via the built-in `api_key` authentication provider

## CLI Commands

You can use the following CLI commands to interact with the Kaggle MCP server. This is useful for prototyping and debugging.

### Discover Tools (No Authentication Required)

To list available tools from the Kaggle MCP server:

```bash
nat mcp client tool list --url https://www.kaggle.com/mcp
```

### Get Tool Schema (No Authentication Required)

To inspect the schema of a specific tool:

```bash
nat mcp client tool list --url https://www.kaggle.com/mcp --tool search_datasets
```

### Authenticated Tool Calls

The Kaggle MCP server requires bearer token authentication for some tool calls.

#### Using Environment Variable (Recommended)

```bash
# Set your Kaggle bearer token
export KAGGLE_BEARER_TOKEN="your_kaggle_api_key_here"

# Search for Titanic datasets
nat mcp client tool call search_datasets \
    --url https://www.kaggle.com/mcp \
    --bearer-token-env KAGGLE_BEARER_TOKEN \
    --json-args '{"request": {"search": "titanic"}}'
```

#### Using Direct Token

```bash
# Search for Titanic datasets with direct token (less secure)
nat mcp client tool call search_datasets \
    --url https://www.kaggle.com/mcp \
    --bearer-token "your_kaggle_api_key_here" \
    --json-args '{"request": {"search": "titanic"}}'
```

**Note**: The `--bearer-token-env` approach is more secure because the token is never passed as a command-line argument, so it does not appear in process lists or in the shell history entry for the tool call.

## Troubleshooting

### Agent Uses Wrong Parameter Names

**Problem**: The agent generates tool calls with incorrect parameter names, such as using `query` instead of `search` for `search_datasets`.

**Cause**: The default tool descriptions from Kaggle MCP are generic and don't specify parameter names, causing the LLM to infer incorrect names.

**Solution**: Check the tool schema and add tool overrides in your `config.yml` to provide explicit parameter guidance:

```bash
nat mcp client tool list --url https://www.kaggle.com/mcp --tool search_datasets
```

After getting the tool schema, add the following tool overrides to your `config.yml`:

```yaml
function_groups:
  kaggle_mcp_tools:
    tool_overrides:
      search_datasets:
        description: >
          Search for datasets on Kaggle. Use the 'search' parameter (not 'query')
          to search by keywords. Example: {"request": {"search": "titanic"}}
```
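
When debugging this issue, it can help to check a generated argument payload before it reaches the server. The following is a minimal sketch that assumes the argument shape shown in the override above (`{"request": {"search": ...}}`); the function name `check_search_args` is hypothetical, and the real tool schema may accept additional fields that this check ignores.

```python
def check_search_args(arguments: dict) -> list[str]:
    """Return a list of problems found in a `search_datasets` argument payload.

    Assumes the shape {"request": {"search": "<keywords>"}} taken from the
    example in this README; an empty list means no problems were detected.
    """
    request = arguments.get("request")
    if not isinstance(request, dict):
        return ["arguments must contain a 'request' object"]

    problems = []
    if "query" in request:
        # The most common mistake described above: wrong parameter name.
        problems.append("use 'search' instead of 'query'")
    search = request.get("search")
    if not isinstance(search, str) or not search.strip():
        problems.append("'request.search' must be a non-empty string")
    return problems


if __name__ == "__main__":
    print(check_search_args({"request": {"search": "titanic"}}))
    print(check_search_args({"request": {"query": "titanic"}}))
```

A check like this is only a debugging aid; the tool override remains the way to steer the agent toward the correct parameter name.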

### Permission Denied Errors

**Problem**: Tool calls fail with "Permission 'datasets.get' was denied" or similar errors.

**Cause**: Your Kaggle API token lacks the required permissions for certain operations.

**Solution**:
- Ensure you're using a valid Kaggle API key from https://www.kaggle.com/settings/account
- Some operations require dataset ownership or special permissions
- Use `search_datasets` for browsing (requires minimal permissions)
- Use `list_dataset_files` only for datasets you own or have access to

### CLI Tool Calls Work but Workflow Fails

**Problem**: `nat mcp client tool call` succeeds but `nat run` with a workflow fails with the same tool.

**Possible causes**:
1. **Parameter validation**: The CLI bypasses some validation that workflows enforce.
2. **Agent parameter inference**: The agent might use wrong parameter names (see "Agent Uses Wrong Parameter Names" above).

**Solution**: Use `--direct` mode to test the raw MCP server behavior, then add tool overrides to guide the agent.

## References

- [Kaggle MCP Documentation](https://www.kaggle.com/docs/mcp)
- [NeMo Agent Toolkit MCP Documentation](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/main/docs/source/build-workflows/mcp-client.md)

## Related Examples

For deterministic MCP release-validation examples owned end to end by the toolkit, use:

- `simple_calculator_fastmcp`
- `simple_calculator_fastmcp_protected`
- `simple_calculator_mcp`
- `simple_calculator_mcp_protected`
52 changes: 52 additions & 0 deletions examples/kaggle_mcp/configs/config-per-user.yml
@@ -0,0 +1,52 @@
# SPDX-FileCopyrightText: Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

general:
  per_user_workflow_timeout: 300
  per_user_workflow_cleanup_interval: 1800

llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0

function_groups:
  kaggle_mcp_tools:
    _type: per_user_mcp_client
    server:
      transport: streamable-http
      url: https://www.kaggle.com/mcp
      auth_provider: kaggle
    tool_overrides:
      search_datasets:
        description: >
          Search for datasets on Kaggle. Use the 'search' parameter to search by keywords.
          Returns a list of datasets with metadata including
          title, owner, download count, and URL. Example: {"request": {"search": "titanic"}}

authentication:
  kaggle:
    _type: api_key
    raw_key: ${KAGGLE_BEARER_TOKEN}
    auth_scheme: Bearer

workflow:
  _type: per_user_react_agent
  tool_names:
    - kaggle_mcp_tools
  llm_name: nim_llm
  verbose: true
  parse_agent_response_max_retries: 3
parse_agent_response_max_retries: 3
52 changes: 52 additions & 0 deletions examples/kaggle_mcp/configs/config.yml
@@ -0,0 +1,52 @@
# SPDX-FileCopyrightText: Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

llms:
  # Tell NeMo Agent Toolkit which LLM to use for the agent
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0
function_groups:
  kaggle_mcp_tools:
    _type: mcp_client
    server:
      transport: streamable-http
      url: https://www.kaggle.com/mcp
      auth_provider: kaggle
    tool_overrides:
      search_datasets:
        description: >
          Search for datasets on Kaggle. Use the 'search' parameter to search by keywords.
          Returns a list of datasets with metadata including
          title, owner, download count, and URL. Example: {"request": {"search": "titanic"}}

authentication:
  kaggle:
    _type: api_key
    raw_key: ${KAGGLE_BEARER_TOKEN}
    auth_scheme: Bearer

workflow:
  # Use an agent that 'reasons' and 'acts'
  _type: react_agent
  # Give it access to our kaggle MCP tools
  tool_names: [kaggle_mcp_tools]
  # Tell it which LLM to use
  llm_name: nim_llm
  # Make it verbose
  verbose: true
  # Retry up to 3 times
  parse_agent_response_max_retries: 3
41 changes: 41 additions & 0 deletions examples/kaggle_mcp/pyproject.toml
@@ -0,0 +1,41 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[build-system]
build-backend = "setuptools.build_meta"
requires = ["setuptools >= 64", "setuptools-scm>=8"]

[tool.setuptools_scm]
git_describe_command = "git describe --long --first-parent"
root = "../.."

[tool.setuptools]
packages = []

[tool.uv]
prerelease = "allow"

[project]
name = "nat_kaggle_mcp"
dynamic = ["version"]
requires-python = ">=3.11,<3.14"
description = "Kaggle MCP integration example with bearer token authentication"
dependencies = [
  "nvidia-nat[mcp,test]>=1.7.0a0,<1.8.0",
]
keywords = ["ai", "mcp", "protocol", "agents", "kaggle", "datasets"]
classifiers = ["Programming Language :: Python"]
authors = [{ name = "NVIDIA Corporation" }]
maintainers = [{ name = "NVIDIA Corporation" }]