Swarm agents that turn slow PyTorch into fast CUDA/Triton kernels, from any AI coding agent.
Installation · Tools · Resources · Prompts · Security · Development
Forge transforms PyTorch models into production-grade CUDA/Triton kernels through automated multi-agent optimization. Using 32 parallel AI agents with inference-time scaling, it achieves up to 14x faster inference than `torch.compile(mode='max-autotune-no-cudagraphs')` while maintaining 100% numerical correctness.
This MCP server connects any MCP-compatible AI coding agent to Forge. Your agent submits PyTorch code, Forge optimizes it with swarm agents on real datacenter GPUs, and returns the fastest kernel as a drop-in replacement.
- **Optimize existing kernels** - Submit PyTorch code, get back an optimized Triton/CUDA kernel benchmarked against `torch.compile(max-autotune)`
- **Generate new kernels** - Describe an operation (e.g. "fused LayerNorm + GELU + Dropout"), get a production-ready optimized kernel
- **32 parallel swarm agents** - Coder+Judge agent pairs compete to discover optimal kernels, exploring tensor core utilization, memory coalescing, shared memory tiling, and kernel fusion simultaneously
- **Real datacenter GPU benchmarking** - Every kernel is compiled, tested for correctness, and profiled on actual datacenter hardware
- **250k tokens/sec inference** - Results in minutes, not hours
- **Smart detection** - The agent automatically recognizes when your code would benefit from GPU optimization
- **One-click auth** - Browser-based OAuth sign-in. No API keys to manage.
All optimization and benchmarking runs on datacenter-grade hardware:
| GPU | Architecture |
|---|---|
| B200 | Blackwell |
| H200 | Hopper |
| H100 | Hopper |
| L40S | Ada Lovelace |
| A100 | Ampere |
| L4 | Ada Lovelace |
| A10 | Ampere |
| T4 | Turing |
| Client | Status |
|---|---|
| Claude Code | Fully supported |
| Claude Desktop | Fully supported |
| OpenCode | Fully supported |
| Cursor | Fully supported |
| Windsurf | Fully supported |
| VS Code + Copilot | Fully supported |
| Any MCP client | Fully supported via stdio |
**Claude Code**

macOS / Linux:

```bash
claude mcp add forge-mcp -- npx -y @rightnow/forge-mcp-server
```

Windows:

```bash
claude mcp add forge-mcp -- cmd /c npx -y @rightnow/forge-mcp-server
```

**Claude Desktop**

Add to your `claude_desktop_config.json`:

macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
```json
{
  "mcpServers": {
    "forge": {
      "command": "npx",
      "args": ["-y", "@rightnow/forge-mcp-server"]
    }
  }
}
```

Windows: `%APPDATA%\Claude\claude_desktop_config.json`

```json
{
  "mcpServers": {
    "forge": {
      "command": "cmd",
      "args": ["/c", "npx", "-y", "@rightnow/forge-mcp-server"]
    }
  }
}
```

**VS Code + Copilot**

Add to your `.vscode/mcp.json` (workspace) or user settings:
```json
{
  "servers": {
    "forge": {
      "command": "npx",
      "args": ["-y", "@rightnow/forge-mcp-server"]
    }
  }
}
```

Windows: Use `"command": "cmd"` with `"args": ["/c", "npx", "-y", "@rightnow/forge-mcp-server"]`.
**Cursor**

Add to your Cursor MCP settings (`~/.cursor/mcp.json`):
```json
{
  "mcpServers": {
    "forge": {
      "command": "npx",
      "args": ["-y", "@rightnow/forge-mcp-server"]
    }
  }
}
```

Windows: Use `"command": "cmd"` with `"args": ["/c", "npx", "-y", "@rightnow/forge-mcp-server"]`.
**Windsurf**

Add to your Windsurf MCP configuration:
```json
{
  "mcpServers": {
    "forge": {
      "command": "npx",
      "args": ["-y", "@rightnow/forge-mcp-server"]
    }
  }
}
```

Windows: Use `"command": "cmd"` with `"args": ["/c", "npx", "-y", "@rightnow/forge-mcp-server"]`.
**OpenCode**

Add to your `opencode.json`:
```json
{
  "mcp": {
    "forge": {
      "command": "npx",
      "args": ["-y", "@rightnow/forge-mcp-server"]
    }
  }
}
```

**`forge_auth`**

Authenticate with the Forge service. Opens your browser to sign in via the RightNow dashboard. Required before using any other tool.
- Inputs:
  - `force` (boolean, optional): Force re-authentication even if valid tokens exist
- Returns: Authentication status, email, plan type, and credit balance
**`forge_optimize`**

Submit PyTorch code for GPU kernel optimization. 32 swarm agents generate optimized Triton or CUDA kernels, evaluate them on real datacenter GPUs, and return the best result with speedup metrics.
The agent will automatically use this tool when it detects:
- PyTorch custom operations (`torch.autograd.Function`, custom `forward`/`backward`)
- Manual CUDA kernels that could be faster
- Performance-critical tensor operations (attention, convolution, normalization, softmax)
- Code with comments like `"slow"`, `"bottleneck"`, `"optimize"`
- `torch.compile()` targets or `triton.jit` kernels
- Any `nn.Module` with significant compute in `forward()`
- Matrix multiplication, reduction, or scan operations
- Custom loss functions with reduction operations
- Fused operation opportunities (e.g., LayerNorm + activation)
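For example, a module like the following hypothetical one matches several of these patterns (unfused normalization + activation + dropout, a "bottleneck" comment) and would be a natural `forge_optimize` candidate:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlowBlock(nn.Module):
    """Hypothetical example of code the agent would flag for Forge."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.dropout = nn.Dropout(0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # bottleneck: three separate kernels, three round trips through HBM
        x = self.norm(x)
        x = F.gelu(x)
        return self.dropout(x)
```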
- Inputs:
  - `pytorch_code` (string, required): Complete PyTorch code to optimize. Max 500 KB.
  - `kernel_name` (string, required): Short name for the kernel (e.g., `"flash_attention"`)
  - `output_format` (enum, optional): `"triton"` (default) or `"native_cuda"`
  - `target_speedup` (number, optional): Target speedup multiplier. Default `2.0`
  - `max_iterations` (number, optional): Max optimization iterations (1-100). Default `10`
  - `gpu` (enum, optional): Target GPU. Default `"H100"`. Options: `B200`, `H200`, `H100`, `L40S`, `A100`, `L4`, `A10`, `T4`
  - `user_prompt` (string, optional): Guidance for the optimizer (e.g., `"focus on memory bandwidth"`)
- Returns: Optimized kernel code, speedup metrics, latency comparison, iteration history
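As a sketch (field names are the inputs listed above; values are illustrative, not defaults), an agent optimizing a slow module might pass arguments like:

```python
# Illustrative forge_optimize arguments; "slow_block.py" is a hypothetical file
forge_optimize_args = {
    "pytorch_code": open("slow_block.py").read(),
    "kernel_name": "layernorm_gelu_dropout",
    "output_format": "triton",       # or "native_cuda"
    "target_speedup": 3.0,
    "max_iterations": 20,
    "gpu": "H100",
    "user_prompt": "focus on kernel fusion and memory bandwidth",
}
```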
**`forge_generate`**

Generate an optimized GPU kernel from scratch based on a natural-language specification. Forge creates a PyTorch baseline, then optimizes it into Triton or CUDA.
- Inputs:
  - `operation` (string, required): Operation name (e.g., `"fused_attention"`, `"softmax"`)
  - `description` (string, required): Detailed description of what the kernel should do
  - `input_shapes` (number[][], required): Input tensor shapes (e.g., `[[8, 512, 768]]`)
  - `output_shape` (number[], optional): Expected output shape
  - `dtype` (string, optional): Data type. Default `"float16"`
  - `output_format` (enum, optional): `"triton"` (default) or `"native_cuda"`
  - `target_speedup` (number, optional): Target speedup. Default `2.0`
  - `max_iterations` (number, optional): Max iterations (1-100). Default `10`
  - `gpu` (enum, optional): Target GPU. Default `"H100"`
  - `user_prompt` (string, optional): Additional guidance
- Returns: Generated kernel code, speedup metrics, iteration history
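As an illustration (values are examples only), a request for a fused causal attention kernel with Q/K/V inputs of shape `[8, 512, 768]` might pass:

```python
# Illustrative forge_generate arguments
forge_generate_args = {
    "operation": "fused_attention",
    "description": "Scaled dot-product attention with causal mask and fused softmax",
    "input_shapes": [[8, 512, 768], [8, 512, 768], [8, 512, 768]],  # Q, K, V
    "output_shape": [8, 512, 768],
    "dtype": "float16",
    "gpu": "H100",
}
```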
**`forge_credits`**

Check your current Forge credit balance.
- Inputs: None
- Returns: Credit balance, total purchased, total used, plan type
**`forge_status`**

Check the status of a running or completed optimization job.
- Inputs:
  - `session_id` (string, required): Session ID from `forge_optimize` or `forge_generate`
- Returns: Job status, current iteration, best speedup
**`forge_cancel`**

Cancel a running optimization job.
- Inputs:
  - `session_id` (string, required): Session ID of the job to cancel
- Returns: Cancellation confirmation
**`forge_sessions`**

List past optimization sessions with results.
- Inputs:
  - `limit` (number, optional): Number of sessions to return (1-100). Default `10`
  - `status` (enum, optional): Filter by status: `"all"`, `"completed"`, `"failed"`, `"running"`. Default `"all"`
- Returns: Table of sessions with task name, GPU, speedup, status, and date
| Tool | Read-only | Idempotent | Destructive |
|---|---|---|---|
| `forge_auth` | No | Yes | No |
| `forge_optimize` | No | No | No |
| `forge_generate` | No | No | No |
| `forge_credits` | Yes | Yes | No |
| `forge_status` | Yes | Yes | No |
| `forge_cancel` | No | No | Yes |
| `forge_sessions` | Yes | Yes | No |
| URI | Description |
|---|---|
| `forge://auth/status` | Current authentication state (authenticated, token expiry, has refresh token) |
| `forge://credits` | Credit balance, usage, and plan information |
Guided workflow for optimizing a GPU kernel. Instructs the agent to:
- Check credit balance
- Analyze the code for optimization targets
- Call `forge_optimize` with appropriate parameters
- Explain the results and suggest integration
Teaches the agent to scan a codebase for GPU optimization opportunities, ranked by expected impact:
| Priority | Pattern |
|---|---|
| HIGH | Custom autograd functions, attention mechanisms, fused operations |
| MEDIUM | Standard `nn.Module` compositions, normalization + activation fusion |
| LOW | Element-wise operations, simple reductions |
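For a hypothetical illustration of a HIGH-priority match, a hand-written autograd function like the following would rank at the top of a scan:

```python
import torch

class NaiveSoftmax(torch.autograd.Function):
    """Hypothetical HIGH-priority target: custom autograd, unfused math."""

    @staticmethod
    def forward(ctx, x: torch.Tensor) -> torch.Tensor:
        e = torch.exp(x - x.max(dim=-1, keepdim=True).values)
        y = e / e.sum(dim=-1, keepdim=True)
        ctx.save_for_backward(y)
        return y

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor) -> torch.Tensor:
        (y,) = ctx.saved_tensors
        # Row-wise softmax Jacobian product: y * (g - sum(g * y))
        return y * (grad_out - (grad_out * y).sum(dim=-1, keepdim=True))
```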
The MCP server sits between your coding agent and the Forge API:

```
┌──────────────┐     stdio      ┌──────────────────┐     HTTPS      ┌──────────────────┐
│   AI Agent   │ ──────────────>│    Forge MCP     │ ──────────────>│    Forge API     │
│  (Claude,    │                │      Server      │                │  (RightNow AI)   │
│   Cursor,    │<───────────────│                  │<───────────────│                  │
│   etc.)      │   MCP result   │  - OAuth + PKCE  │   SSE stream   │  - 32 swarm      │
└──────────────┘                │  - SSE streaming │                │    agents        │
                                │  - Token mgmt    │                │  - Real GPU      │
                                └──────────────────┘                │    benchmarking  │
                                                                    └──────────────────┘
```
- **Authenticate**: The agent calls `forge_auth`, which opens your browser. Sign in once; tokens are stored locally at `~/.forge/tokens.json` and auto-refresh.
- **Optimize**: The agent sends your PyTorch code via `forge_optimize`. The MCP server POSTs to the Forge API and streams SSE events in real time.
- **Benchmark**: 32 parallel Coder+Judge agents generate kernels, compile them, test correctness against the PyTorch reference, and profile performance on real datacenter GPUs.
- **Return**: The MCP server collects all results and returns the optimized code, speedup metrics, and iteration history. The output is a drop-in replacement for your original code.
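The integration step is then a small swap. A minimal sketch, assuming Forge returned a fused Triton-backed function and you saved it as `forge_kernels/layernorm_gelu_dropout.py` (the module path and `fused_block` name are hypothetical; the real name follows your `kernel_name`):

```python
import torch
from forge_kernels.layernorm_gelu_dropout import fused_block  # hypothetical

class FastBlock(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(dim))
        self.bias = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Same signature and numerics as the original forward,
        # but one fused kernel instead of three.
        return fused_block(x, self.weight, self.bias)
```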
Each optimization costs 1 credit. Credits are only charged for successful runs (speedup >= 1.1x). Failed runs and cancelled jobs are not charged.
No API keys needed. The server uses OAuth 2.0 with PKCE for secure browser-based authentication:

- Agent calls `forge_auth`
- Your default browser opens to `dashboard.rightnowai.co`
- Sign in or create an account
- Authorization completes automatically
- Tokens are stored locally at `~/.forge/tokens.json` (mode `0600`)
- Access tokens auto-refresh; you only sign in once
Forge uses a pay-as-you-go credit system. Each optimization or generation run costs 1 credit.
| Credits | Price | Per Credit |
|---|---|---|
| 1-9 | $15.00 each | $15.00 |
| 10+ | 25% off | $11.25 |
| 50 | $562.50 | $11.25 |
| Enterprise | Custom volume pricing | Contact us |
Free trial: optimize 1 kernel, no credit card required.
100% refund guarantee: if Forge doesn't beat torch.compile, you get your credit back.
Purchase credits at dashboard.rightnowai.co.
End-to-end latency on NVIDIA B200, Forge vs `torch.compile(mode='max-autotune-no-cudagraphs')`:
| Model | torch.compile | Forge | Speedup |
|---|---|---|---|
| Llama-3.1-8B | 42.3ms | 8.2ms | 5.16x |
| Qwen2.5-7B | 38.5ms | 9.1ms | 4.23x |
| Mistral-7B | 35.2ms | 10.4ms | 3.38x |
| Phi-3-mini | 18.7ms | 6.8ms | 2.75x |
| SDXL UNet | 89.4ms | 31.2ms | 2.87x |
| Whisper-large | 52.1ms | 19.8ms | 2.63x |
| BERT-large | 12.4ms | 5.1ms | 2.43x |
See the full benchmarks at rightnowai.co/forge.
- **No tokens in errors**: All error messages are sanitized through regex filters that strip JWTs, Bearer tokens, hex tokens, and credential parameters before reaching the agent
- **Local storage only**: Tokens are stored at `~/.forge/tokens.json` with file mode `0600` (owner read/write only)
- **Auto-refresh**: Access tokens expire in 1 hour and auto-refresh using the stored refresh token
- **PKCE flow**: OAuth uses Proof Key for Code Exchange (SHA-256), preventing authorization code interception (see the sketch after this list)
- **No secrets in config**: The MCP server requires zero environment variables or API keys
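For reference, the PKCE half of the flow reduces to a few lines. A minimal sketch of RFC 7636 (S256) in Python, not the server's actual TypeScript implementation:

```python
import base64
import hashlib
import secrets

# The client keeps a random verifier secret and sends only its SHA-256
# challenge with the authorization request.
verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
challenge = base64.urlsafe_b64encode(
    hashlib.sha256(verifier.encode("ascii")).digest()
).rstrip(b"=").decode()

# The verifier is revealed only in the follow-up token exchange, so an
# intercepted authorization code is useless on its own.
```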
- PyTorch code input is capped at 500 KB to prevent memory exhaustion
- User prompts are capped at 10 KB
- All string inputs have maximum length validation via Zod schemas
- Numeric inputs have min/max bounds (e.g., `max_iterations`: 1-100)
- All API communication uses HTTPS
- Non-SSE requests have a 30-second timeout to prevent hanging
- SSE streams have a 10-minute timeout with automatic cleanup
- Token refresh uses a mutex to prevent race conditions from concurrent requests (see the sketch below)
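As a language-agnostic sketch of that last point (the server itself is TypeScript, so this Python is purely illustrative):

```python
import asyncio
import time

class TokenManager:
    """Illustrative mutex-guarded refresh, not the real implementation."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self._token = ""
        self._expires_at = 0.0

    async def get_token(self) -> str:
        async with self._lock:
            # Re-check expiry inside the lock: a concurrent caller may
            # have refreshed the token while we were waiting.
            if time.time() >= self._expires_at:
                self._token, self._expires_at = await self._refresh()
            return self._token

    async def _refresh(self) -> tuple[str, float]:
        # Placeholder for the real refresh-token exchange over HTTPS.
        return "new-access-token", time.time() + 3600
```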
- **Network**: Only `dashboard.rightnowai.co` and `forge-api.rightnowai.co`
- **Filesystem**: Only reads/writes `~/.forge/tokens.json`
- **No codebase access**: The MCP server never reads your files. The agent passes code to it explicitly through tool parameters.
```bash
git clone https://github.com/RightNow-AI/forge-mcp-server.git
cd forge-mcp-server
npm install
npm run build
```

Development:

```bash
npm run dev
```

Type checking:

```bash
npm run typecheck
```

Testing with the MCP Inspector:

```bash
npx @modelcontextprotocol/inspector node dist/index.js
```

This opens a web UI where you can invoke each tool, inspect inputs/outputs, and debug the server interactively.
```
forge-mcp-server/
├── src/
│   ├── index.ts          # Entry point (McpServer + StdioServerTransport)
│   ├── server.ts         # Registers all tools, resources, prompts
│   ├── constants.ts      # URLs, client IDs, timeouts, limits
│   ├── types.ts          # TypeScript interfaces + type guards + sanitization
│   ├── auth/
│   │   ├── oauth-client.ts   # PKCE flow, token refresh, access token management
│   │   └── token-store.ts    # ~/.forge/tokens.json read/write/clear
│   ├── api/
│   │   ├── forge-client.ts   # HTTP client for all Forge API endpoints
│   │   └── sse-consumer.ts   # SSE stream parser via native fetch + ReadableStream
│   ├── tools/            # 7 MCP tools
│   ├── resources/        # 2 MCP resources
│   └── prompts/          # 2 MCP prompts
├── .github/workflows/
│   ├── ci.yml            # Typecheck + build on push/PR
│   └── release.yml       # npm publish on version tags
├── package.json
├── tsconfig.json
└── tsup.config.ts
```
Contributions are welcome. Please open an issue first to discuss what you'd like to change.
- Fork the repo
- Create a branch (`git checkout -b feature/my-feature`)
- Make your changes
- Run `npm run typecheck` and `npm run build`
- Commit and push
- Open a pull request
Part of the RightNow AI ecosystem. Member of the NVIDIA Inception Program.
