Troubleshooting Guide

This guide covers common issues when running AgentGate, organized by Symptom → Cause → Solution.

Webhook Failures
429 Rate Limit Errors
Decision Token Errors
Slack Bot Not Posting
Discord Bot Not Responding
Policies Not Matching
Database Migration Failures
Docker Networking Issues

Webhook Failures

Symptom: Webhook deliveries silently fail

Cause: The webhook URL is unreachable from the server, or the target returns a non-2xx status code.

Solution:

Check webhook delivery status via GET /api/webhooks — each webhook tracks delivery attempts.
Verify the URL is reachable from the server container (see Docker Networking).
Check that WEBHOOK_TIMEOUT_MS (default: 5000) is sufficient for your endpoint. Increase if your endpoint is slow:
```
WEBHOOK_TIMEOUT_MS=10000
```
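Failed deliveries are retried by a DB-based retry scanner (runs every 30s) with exponential backoff of 2^attempt × 1000ms: a retry becomes due ~2s after attempt 1 fails and ~4s after attempt 2 fails. Each delivery gets at most 3 attempts in total (1 initial + 2 retries), so the ~8s step that would follow attempt 3 is never reached. Because the scanner only runs every 30s, a due retry may wait up to one scan interval before it is sent. Check server logs for retry attempts.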
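As a rough worked example of the backoff formula above, the sketch below lists the delays that can occur within the 3-attempt limit. The function names are illustrative, and the worst-case figure assumes the scanner picks up a due retry on its next 30s pass; neither is taken from AgentGate's source.

```python
from typing import List

SCAN_INTERVAL_MS = 30_000  # retry scanner period stated in this guide
MAX_ATTEMPTS = 3           # 1 initial attempt + 2 retries


def nominal_backoff_ms(attempt: int) -> int:
    """Delay after the given failed attempt: 2^attempt x 1000 ms."""
    return (2 ** attempt) * 1000


def retry_delays_ms(max_attempts: int = MAX_ATTEMPTS) -> List[int]:
    """One backoff delay per retry; the last attempt is never followed by one."""
    return [nominal_backoff_ms(a) for a in range(1, max_attempts)]


def worst_case_pickup_ms(delay_ms: int, interval_ms: int = SCAN_INTERVAL_MS) -> int:
    """Assumption: a due retry waits at most one full scan interval."""
    return delay_ms + interval_ms


if __name__ == "__main__":
    for attempt, delay in enumerate(retry_delays_ms(), start=1):
        print(f"after attempt {attempt}: due in {delay} ms, "
              f"sent within about {worst_case_pickup_ms(delay)} ms")
```

If a receiver sees retries arriving tens of seconds apart rather than 2s and 4s apart, the scan interval is the likely explanation.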

Symptom: Webhook signature verification fails on receiver

Cause: The secret used when creating the webhook doesn't match what the receiver is using to verify X-AgentGate-Signature.

Solution:

The signature is HMAC-SHA256(secret, raw_body) — ensure you're hashing the raw request body, not a parsed/re-serialized version.
Re-create the webhook with a known secret if unsure.
If using WEBHOOK_ENCRYPTION_KEY, note that this encrypts secrets at rest in the database — it does not affect the HMAC signature sent to your endpoint.
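A receiver-side check might look like the following minimal sketch. It assumes the X-AgentGate-Signature header carries the hex-encoded digest with no prefix; this guide does not state the encoding, so confirm it against a real delivery. The function names are illustrative.

```python
import hashlib
import hmac
import json


def compute_signature(secret: str, raw_body: bytes) -> str:
    """HMAC-SHA256 over the raw body. Hex encoding is an assumption."""
    return hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Compare in constant time so the check does not leak partial matches."""
    expected = compute_signature(secret, raw_body)
    return hmac.compare_digest(expected.encode("utf-8"),
                               header_value.strip().encode("utf-8"))


if __name__ == "__main__":
    secret = "example-secret"  # hypothetical value
    raw = b'{"id": 1,  "status":"approved"}'
    signature = compute_signature(secret, raw)

    print(verify_signature(secret, raw, signature))

    # Parsing and re-serializing changes whitespace, so the digest no longer matches.
    reserialized = json.dumps(json.loads(raw)).encode("utf-8")
    print(verify_signature(secret, reserialized, signature))
```

In most web frameworks this means reading the request body as bytes before any JSON middleware touches it.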

Symptom: `Invalid webhook URL` error (400)

Cause: AgentGate validates webhook URLs against SSRF attacks. Private/internal IPs and non-HTTP(S) schemes are rejected.

Solution:

Use a publicly routable URL or a DNS name that resolves to a public IP.
In development, if you need to target localhost, you may need to use host.docker.internal to reach a service on the Docker host (e.g., http://host.docker.internal:4000), or the Docker service name to reach another container.
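To see why a URL might be rejected before sending it to the API, a local pre-check along these lines can help. This is only a sketch of the two rules named above (scheme and private/internal IPs); it is not AgentGate's validator, it does not resolve DNS names, and the function name is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_acceptable(url: str) -> bool:
    """Rough pre-check: HTTP(S) scheme and no literal private/internal IP."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname
    if not host:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch cannot tell where it resolves, so it only
        # rejects the obvious loopback name.
        return host.lower() != "localhost"
    # is_global is False for loopback, private, link-local and reserved ranges.
    return ip.is_global


if __name__ == "__main__":
    for candidate in ("https://hooks.example.com/agentgate",
                      "http://10.0.0.5/hook",
                      "ftp://example.com/hook"):
        print(candidate, looks_acceptable(candidate))
```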

429 Rate Limit Errors

Symptom: API returns `429 Too Many Requests`

Cause: The API key has exceeded its per-minute request limit.

Solution:

Check the rate limit headers in the response:
- X-RateLimit-Limit — max requests per minute
- X-RateLimit-Remaining — remaining in current window
- X-RateLimit-Reset — seconds until window resets
Wait for the reset window, then retry.
Increase the per-key rate limit via POST /api/keys (set rateLimit to a higher value, or null for unlimited).
The global default is controlled by RATE_LIMIT_RPM (default: 60). Adjust if needed:
```
RATE_LIMIT_RPM=120
```
To disable rate limiting entirely (not recommended for production):
```
RATE_LIMIT_ENABLED=false
```
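On the client side, the wait-then-retry step can be sketched as below. The `send` callable and its `(status, headers, body)` return shape are hypothetical stand-ins for whatever HTTP client you use; the only detail taken from this guide is that X-RateLimit-Reset holds the seconds until the window resets.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]


def seconds_to_wait(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read X-RateLimit-Reset; fall back to a default if missing or malformed.

    Note: a plain dict is case-sensitive, while real HTTP clients usually
    expose case-insensitive header mappings.
    """
    raw = headers.get("X-RateLimit-Reset")
    try:
        return max(0.0, float(raw))
    except (TypeError, ValueError):
        return default


def call_with_retry(send: Callable[[], Response],
                    max_tries: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call send(); on 429, sleep for the advertised reset time and try again."""
    status, body = 0, ""
    for attempt in range(max_tries):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_tries - 1:
            sleep(seconds_to_wait(headers))
    return status, body
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.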

Symptom: Rate limiting not working (no 429s despite high traffic)

Cause: Rate limiting may be disabled, or the memory backend may be in use. The memory backend resets on server restart and doesn't share state across instances.

Solution:

Verify RATE_LIMIT_ENABLED=true (default).
For multi-instance deployments, switch to Redis backend:
```
RATE_LIMIT_BACKEND=redis
REDIS_URL=redis://redis:6379
```
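The effect of per-instance state can be illustrated with a toy fixed-window limiter. This is not AgentGate's implementation; the class, its windowing scheme, and the round-robin load balancing are all assumptions chosen to show why two instances with in-memory counters let through roughly twice the configured limit.

```python
import time
from typing import Callable, Dict, List, Tuple


class MemoryRateLimiter:
    """Toy fixed-window limiter that keeps its counters in process memory."""

    def __init__(self, limit_per_minute: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit_per_minute
        self.clock = clock
        self.windows: Dict[str, Tuple[float, int]] = {}  # key -> (start, count)

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        start, count = self.windows.get(api_key, (now, 0))
        if now - start >= 60:
            start, count = now, 0  # a new one-minute window begins
        if count >= self.limit:
            return False
        self.windows[api_key] = (start, count + 1)
        return True


def allowed_across_instances(instances: List[MemoryRateLimiter],
                             requests: int,
                             api_key: str = "agk_demo") -> int:
    """Round-robin requests over instances and count how many are allowed."""
    return sum(instances[i % len(instances)].allow(api_key)
               for i in range(requests))


if __name__ == "__main__":
    one = [MemoryRateLimiter(60)]
    two = [MemoryRateLimiter(60), MemoryRateLimiter(60)]
    print(allowed_across_instances(one, 200))
    print(allowed_across_instances(two, 200))
```

A shared store such as Redis avoids this because every instance increments the same counter.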

Decision Token Errors

Symptom: `This decision link has expired`

Cause: Decision tokens have a configurable TTL. Default is 24 hours (DECISION_TOKEN_EXPIRY_HOURS=24).

Solution:

Request a new approval — the agent or user must create a new request.
Increase the token expiry if your approval workflows are long-running:
```
DECISION_TOKEN_EXPIRY_HOURS=72
```
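To estimate whether a given link is still usable under a particular setting, the expiry arithmetic can be sketched as follows. The function names are illustrative, and treating the exact expiry instant as already expired is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def token_expires_at(issued_at: datetime, expiry_hours: int = 24) -> datetime:
    """Expiry time for a token issued at `issued_at`."""
    return issued_at + timedelta(hours=expiry_hours)


def is_expired(issued_at: datetime, expiry_hours: int = 24,
               now: Optional[datetime] = None) -> bool:
    """True once `now` has reached the expiry time (boundary is an assumption)."""
    current = now or datetime.now(timezone.utc)
    return current >= token_expires_at(issued_at, expiry_hours)


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    thirty_hours_later = issued + timedelta(hours=30)
    print(is_expired(issued, 24, thirty_hours_later))
    print(is_expired(issued, 72, thirty_hours_later))
```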

Symptom: `This decision link is invalid or has been removed`

Cause: The token was already used, doesn't exist, or was cleaned up by the retention policy.

Solution:

Each decision token is single-use. Once a request is approved or denied, all sibling tokens are cross-invalidated.
Check if the request was already decided: GET /api/requests/:id.
Expired tokens are cleaned up after CLEANUP_RETENTION_DAYS (default: 30).

Symptom: `This request has already been approved/denied` (409)

Cause: Another approver (via dashboard, Slack, Discord, or another token) already decided this request.

Solution:

This is expected behavior — AgentGate uses atomic conditional updates to prevent double-decisions. Check the request status via GET /api/requests/:id to see who decided and when.
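The general technique of an atomic conditional update can be sketched with SQLite as below. The table and column names are hypothetical and AgentGate's actual schema and queries may differ; the point is that the `WHERE status = 'pending'` guard lets only the first decision change the row, and the losing caller can be answered with a 409.

```python
import sqlite3
from typing import Optional, Tuple


def decide(conn: sqlite3.Connection, request_id: str,
           decision: str, decided_by: str) -> bool:
    """Apply a decision only if the request is still pending."""
    cur = conn.execute(
        "UPDATE requests SET status = ?, decided_by = ? "
        "WHERE id = ? AND status = 'pending'",
        (decision, decided_by, request_id),
    )
    conn.commit()
    # rowcount is 1 for the winner and 0 for anyone who arrives afterwards.
    return cur.rowcount == 1


def demo() -> Tuple[bool, bool, Optional[tuple]]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE requests (id TEXT PRIMARY KEY, status TEXT, decided_by TEXT)"
    )
    conn.execute("INSERT INTO requests VALUES ('req_1', 'pending', NULL)")
    first = decide(conn, "req_1", "approved", "alice")
    second = decide(conn, "req_1", "denied", "bob")  # loses the race
    row = conn.execute(
        "SELECT status, decided_by FROM requests WHERE id = 'req_1'"
    ).fetchone()
    conn.close()
    return first, second, row


if __name__ == "__main__":
    print(demo())
```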

Slack Bot Not Posting

Symptom: Slack bot starts but doesn't post approval requests

Cause: Missing or incorrect channel configuration, or the bot lacks permissions.

Solution:

Ensure required environment variables are set:

```
SLACK_BOT_TOKEN=xoxb-...        # Bot User OAuth Token
SLACK_SIGNING_SECRET=...         # From Slack app settings
AGENTGATE_API_KEY=agk_...       # API key for server communication
```

Set a default channel: SLACK_DEFAULT_CHANNEL=C01234ABCDE (use channel ID, not name).
Verify the bot is invited to the channel (/invite @YourBot in Slack).
Check the bot has these OAuth scopes: chat:write, channels:read, groups:read.
Review bot logs — the bot prints configuration status on startup (✓/✗ for each setting).

Symptom: Slack bot crashes on startup with `Missing SLACK_BOT_TOKEN`

Cause: The SLACK_BOT_TOKEN environment variable is not set.

Solution:

Set the token from your Slack app's OAuth & Permissions page.
If using Docker secrets, use SLACK_BOT_TOKEN_FILE pointing to the secret file.

Symptom: Slack interactive buttons don't work

Cause: Slack can't reach the bot's HTTP endpoint for interactivity.

Solution:

The Slack bot runs on SLACK_BOT_PORT (default: 3001). Ensure this port is accessible from the internet (or use a tunnel like ngrok in development).
Set the Request URL in your Slack app's Interactivity settings to point to the bot.

Discord Bot Not Responding

Symptom: Discord bot starts but doesn't post or respond to buttons

Cause: Missing bot token, wrong channel ID, or insufficient bot permissions.

Solution:

Ensure required environment variables are set:

```
DISCORD_BOT_TOKEN=...              # Bot token from Discord Developer Portal
AGENTGATE_URL=http://localhost:3000 # AgentGate server URL
```

Set a default channel: DISCORD_CHANNEL_ID=123456789012345678.
Verify the bot has these permissions in the target channel:
- Send Messages
- Embed Links
- Use External Emojis
- Read Message History
Ensure the bot is added to your server with the correct OAuth2 scopes (bot, applications.commands).

Symptom: Discord bot crashes with `Missing DISCORD_BOT_TOKEN`

Cause: The DISCORD_BOT_TOKEN environment variable is not set.

Solution:

Get the token from Discord Developer Portal → Bot → Token.
If using Docker secrets, use DISCORD_BOT_TOKEN_FILE.

Symptom: Decision links in Discord messages don't work

Cause: DECISION_LINK_BASE_URL is not configured, so the bot can't generate clickable approve/deny URLs.

Solution:

Set the base URL to your AgentGate server's public address:
```
DECISION_LINK_BASE_URL=https://gate.example.com
```
To disable links entirely: DISCORD_INCLUDE_LINKS=false.

Policies Not Matching

Symptom: Requests always go to `pending` even though a matching policy exists

Cause: Policy priority ordering, disabled policies, or match criteria not aligning with the request.

Solution:

List policies via GET /api/policies and verify:
- The policy is enabled: true.
- The priority is correct — lower numbers = higher priority. The first matching policy wins.
- The match criteria actually match the request's action, params, or context.
Check match syntax:
- Exact match: { "action": "send_email" }
- Regex match: { "action": { "$regex": "^send_" } } — note that regex patterns are validated for ReDoS safety; overly complex patterns are rejected.
Review the audit trail (GET /api/requests/:id/audit) to see which policy was evaluated.
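The evaluation order described above can be sketched as follows to help reason about why a policy did or did not win. This is an illustration only: it assumes match keys refer to top-level request fields, that `$regex` uses search (not full-match) semantics, and that policies are plain dictionaries with `enabled`, `priority`, and `match` keys. The `name` field and function names are hypothetical.

```python
import re
from typing import Any, Dict, List, Optional

Policy = Dict[str, Any]


def field_matches(criterion: Any, value: Any) -> bool:
    """Exact match, or regex match when the criterion is {"$regex": "..."}."""
    if isinstance(criterion, dict) and "$regex" in criterion:
        return isinstance(value, str) and re.search(criterion["$regex"], value) is not None
    return criterion == value


def first_matching_policy(policies: List[Policy],
                          request: Dict[str, Any]) -> Optional[Policy]:
    """Skip disabled policies, sort by priority (lower first), return the first match."""
    enabled = [p for p in policies if p.get("enabled")]
    for policy in sorted(enabled, key=lambda p: p["priority"]):
        if all(field_matches(c, request.get(k)) for k, c in policy["match"].items()):
            return policy
    return None  # nothing matched, so the request would stay pending


if __name__ == "__main__":
    policies = [
        {"name": "sends", "enabled": True, "priority": 10,
         "match": {"action": {"$regex": "^send_"}}},
        {"name": "email", "enabled": True, "priority": 1,
         "match": {"action": "send_email"}},
    ]
    winner = first_matching_policy(policies, {"action": "send_email"})
    print(winner["name"] if winner else None)
```

Running a copy of your real policies and a failing request through a sketch like this often reveals a disabled policy or a higher-priority policy that matches first.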

Symptom: `Unsafe regex pattern` error when creating a policy

Cause: AgentGate uses safe-regex2 to reject patterns vulnerable to ReDoS (Regular Expression Denial of Service).

Solution:

Simplify the regex. Avoid nested quantifiers, where a repeated group itself contains a repetition, such as (a+)+ or (a*)*.
Use exact string matches when possible — they're faster and safer.

Database Migration Failures

Symptom: Server fails to start with database errors

Cause: Migrations haven't been run, or there's a schema mismatch.

Solution:

SQLite (default): Run migrations manually:

```
pnpm --filter @agentgate/server db:migrate
```

PostgreSQL: Ensure the database exists and is reachable:

```
DB_DIALECT=postgres
DATABASE_URL=postgresql://agentgate:agentgate@localhost:5432/agentgate
```

In Docker, migrations run automatically on server startup. Check server logs for migration errors:
```
docker-compose logs server | grep -i migrat
```

Symptom: `SQLITE_CANTOPEN` or permission errors

Cause: The SQLite database file path is not writable.

Solution:

Default path is ./data/agentgate.db. Ensure the data/ directory exists and is writable.
In Docker, the data directory is mounted as a volume. Verify volume permissions.

Symptom: PostgreSQL `connection refused`

Cause: PostgreSQL is not running or the connection string is wrong.

Solution:

Verify PostgreSQL is healthy:

```
docker-compose ps postgres
docker-compose logs postgres
```

Check DATABASE_URL format: postgresql://USER:PASSWORD@HOST:PORT/DBNAME.
In Docker Compose, use the service name as host: postgresql://agentgate:agentgate@postgres:5432/agentgate.
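A quick sanity check of a connection string against the format above might look like this sketch. The function name and the specific messages are illustrative; it only inspects the URL and does not attempt a connection.

```python
from typing import List
from urllib.parse import urlsplit


def check_database_url(url: str, in_docker: bool = False) -> List[str]:
    """Return a list of likely problems with a PostgreSQL connection URL."""
    problems: List[str] = []
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.username or parts.password is None:
        problems.append("missing USER or PASSWORD")
    if not parts.hostname:
        problems.append("missing HOST")
    try:
        if parts.port is None:
            problems.append("missing PORT")
    except ValueError:
        problems.append("PORT is not a valid number")
    if not parts.path.lstrip("/"):
        problems.append("missing DBNAME")
    if in_docker and parts.hostname in ("localhost", "127.0.0.1"):
        problems.append("inside a container, use the service name instead of localhost")
    return problems


if __name__ == "__main__":
    url = "postgresql://agentgate:agentgate@localhost:5432/agentgate"
    for problem in check_database_url(url, in_docker=True):
        print(problem)
```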

Docker Networking Issues

Symptom: Server can't connect to PostgreSQL or Redis

Cause: Services are on different Docker networks, or using localhost instead of Docker service names.

Solution:

Use Docker Compose service names for inter-container communication:

```
DATABASE_URL=postgresql://agentgate:agentgate@postgres:5432/agentgate
REDIS_URL=redis://redis:6379
```

All services must be on the same network. The default docker-compose.yml uses agentgate-internal for backend services and agentgate-public for exposed services.
Never use localhost inside containers to reach other containers — localhost refers to the container itself.

Symptom: Dashboard can't reach the API server

Cause: The dashboard (nginx) proxies API requests to the server container. If the server isn't running or isn't on the same network, requests fail.

Solution:

Verify both services are running: docker-compose ps.
The dashboard typically expects the API at http://server:3000 via Docker networking. Check the nginx config in the dashboard image.
If accessing the dashboard from outside Docker, ensure CORS_ALLOWED_ORIGINS includes the dashboard's public URL.

Symptom: Bot containers can't reach the AgentGate server

Cause: Bot services (Slack, Discord) need to communicate with the server over the Docker network.

Solution:

Set AGENTGATE_URL=http://server:3000 in the bot container environment.
Ensure bot services are on the agentgate-internal network (they should be if using the --profile bots flag).

Verify with:

```
docker-compose exec slack wget -qO- http://server:3000/health
```

General Debugging Tips

Check server health

```
curl http://localhost:3000/health
```

View structured logs

Set LOG_LEVEL=debug and LOG_FORMAT=json for detailed, parseable logs:

```
LOG_LEVEL=debug LOG_FORMAT=json pnpm --filter @agentgate/server dev
```

Validate configuration

The server validates all environment variables at startup using Zod schemas. If a required variable is missing or invalid, the error message will indicate which field failed validation.

File-based secrets

For Docker deployments, use _FILE-suffixed environment variables to read secrets from mounted files (e.g., ADMIN_API_KEY_FILE=/run/secrets/admin_api_key). If both the plain variable and its _FILE variant are set, the plain variable takes precedence.
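The precedence rule can be sketched as below. This is an illustration of the lookup order only, not the server's code; the function name is hypothetical, and stripping trailing whitespace from the file contents is an assumption.

```python
import os
from typing import Mapping, Optional


def read_secret(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Resolve NAME, falling back to the file named by NAME_FILE."""
    direct = env.get(name)
    if direct:
        return direct  # the explicit variable wins when both are set
    path = env.get(f"{name}_FILE")
    if path:
        with open(path, encoding="utf-8") as handle:
            # Secret files often end with a newline; trimming is an assumption.
            return handle.read().strip()
    return None


if __name__ == "__main__":
    value = read_secret("ADMIN_API_KEY")
    print("ADMIN_API_KEY is set" if value else "ADMIN_API_KEY is not set")
```

If a secret appears to be ignored, check whether a stale plain variable is shadowing the _FILE variant.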

Production checklist

Run the server with NODE_ENV=production to enable production validations. The server will warn if any of the following is true:

- ADMIN_API_KEY is not set
- JWT_SECRET is not set
- CORS_ALLOWED_ORIGINS is not configured
- WEBHOOK_ENCRYPTION_KEY is not set
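A pre-deployment check for the same four settings could be sketched as follows. It is a stand-alone illustration, not the server's Zod validation; the function name is hypothetical, and accepting a _FILE variant as "set" for every setting is an assumption.

```python
import os
from typing import List, Mapping

# The four settings this guide says trigger a production warning.
PRODUCTION_SETTINGS = (
    "ADMIN_API_KEY",
    "JWT_SECRET",
    "CORS_ALLOWED_ORIGINS",
    "WEBHOOK_ENCRYPTION_KEY",
)


def missing_production_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """List settings that would be flagged when NODE_ENV=production."""
    if env.get("NODE_ENV") != "production":
        return []
    return [
        name for name in PRODUCTION_SETTINGS
        # Assumption: a NAME_FILE variant counts as configured.
        if not env.get(name) and not env.get(f"{name}_FILE")
    ]


if __name__ == "__main__":
    for name in missing_production_settings():
        print(f"warning: {name} is not set")
```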
