Agent Safety Patterns: Preventing the "deleted production database" scenario #1251

@jingchang0623-crypto

Context

Today on Hacker News, a story about an AI agent deleting a production database hit 431 points and 583 comments. It's the extreme case of a broader problem: agents execute tasks without weighing the consequences.

This is particularly relevant for VoltAgent, since sub-agents and distributed routing mean multiple agents could touch production systems.

Safety patterns I've found useful

1. Dry-run by default

Every destructive operation should require an explicit --confirm flag. Without it, the agent should only generate the plan, not execute it.
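A minimal sketch of what I mean by dry-run by default. The `Plan`, `Outcome`, and `runOperation` names are made up for illustration and are not part of VoltAgent; the `confirm` option stands in for the --confirm flag.

```typescript
// Hypothetical types for illustration; not a VoltAgent API.
type Plan = { description: string; destructive: boolean };
type Outcome =
  | { status: "dry-run"; plan: Plan }
  | { status: "executed"; plan: Plan };

function runOperation(
  plan: Plan,
  execute: () => void,
  opts: { confirm?: boolean } = {},
): Outcome {
  // Destructive plans only run when the caller explicitly passes confirm: true.
  if (plan.destructive && opts.confirm !== true) {
    return { status: "dry-run", plan };
  }
  execute();
  return { status: "executed", plan };
}

// Usage: the agent proposes a plan; nothing runs until it is confirmed.
const executed: string[] = [];
const plan: Plan = { description: "DROP TABLE users", destructive: true };

const first = runOperation(plan, () => executed.push(plan.description));
const countAfterDryRun = executed.length;

const second = runOperation(plan, () => executed.push(plan.description), {
  confirm: true,
});
```

The point of returning the plan in the dry-run case is that a human (or another agent) can review exactly what would have run before confirming.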

2. Permission scoping per task

If a task is "analyze the codebase", the agent gets read-only access. If it's "create a PR", it gets write access to a specific branch only. Never grant wildcard permissions.
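Roughly how I model this, as a sketch rather than anything VoltAgent provides. The task names, the `Permission` shape, and the branch name `agent/feature-x` are all assumptions for the example.

```typescript
// Hypothetical permission model; names are illustrative, not a VoltAgent API.
type Permission =
  | { kind: "repo:read" }
  | { kind: "repo:write"; branch: string };

// Each task gets an explicit allow-list. There is no wildcard entry.
const TASK_SCOPES = new Map<string, Permission[]>([
  ["analyze-codebase", [{ kind: "repo:read" }]],
  [
    "create-pr",
    [{ kind: "repo:read" }, { kind: "repo:write", branch: "agent/feature-x" }],
  ],
]);

function isAllowed(task: string, requested: Permission): boolean {
  // Unknown tasks get an empty scope, so everything is denied by default.
  const granted = TASK_SCOPES.get(task) ?? [];
  return granted.some((p) => {
    if (p.kind === "repo:read" && requested.kind === "repo:read") return true;
    if (p.kind === "repo:write" && requested.kind === "repo:write") {
      // Exact branch match only; no prefix or glob matching.
      return p.branch === requested.branch;
    }
    return false;
  });
}

// Usage
const canAnalyzeWrite = isAllowed("analyze-codebase", {
  kind: "repo:write",
  branch: "main",
});
const canPrWriteOwnBranch = isAllowed("create-pr", {
  kind: "repo:write",
  branch: "agent/feature-x",
});
const canPrWriteMain = isAllowed("create-pr", {
  kind: "repo:write",
  branch: "main",
});
```

Deny-by-default matters more than the exact data model: a task that is missing from the table should get nothing, not everything.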

3. Human-in-the-loop for destructive operations

Any operation that does one of the following should require human approval before execution:

  • Deletes data
  • Modifies production infrastructure
  • Sends external communications
  • Spends money
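One way to enforce this is a gate that parks risky actions until a person signs off. This is a sketch with hypothetical names (`ApprovalGate`, `AgentAction`, the category strings), not a VoltAgent feature, and it assumes each action is tagged with a category up front.

```typescript
// Hypothetical sketch; none of these names come from VoltAgent.
type Category =
  | "delete-data"
  | "modify-production"
  | "external-communication"
  | "spend-money"
  | "read-only";

interface AgentAction {
  category: Category;
  summary: string;
  run: () => string;
}

type SubmitResult =
  | { status: "executed"; result: string }
  | { status: "pending"; id: number };

// The four risky categories; everything else runs directly.
const NEEDS_APPROVAL = new Set<Category>([
  "delete-data",
  "modify-production",
  "external-communication",
  "spend-money",
]);

class ApprovalGate {
  private pending = new Map<number, AgentAction>();
  private nextId = 1;

  // Called by the agent: risky actions are parked instead of executed.
  submit(action: AgentAction): SubmitResult {
    if (!NEEDS_APPROVAL.has(action.category)) {
      return { status: "executed", result: action.run() };
    }
    const id = this.nextId++;
    this.pending.set(id, action);
    return { status: "pending", id };
  }

  // Called by a human reviewer: runs the parked action exactly once.
  approve(id: number): string {
    const action = this.pending.get(id);
    if (action === undefined) throw new Error(`No pending action ${id}`);
    this.pending.delete(id);
    return action.run();
  }

  // Called by a human reviewer: discards the action without running it.
  reject(id: number): boolean {
    return this.pending.delete(id);
  }
}

// Usage
const gate = new ApprovalGate();
const readResult = gate.submit({
  category: "read-only",
  summary: "List tables",
  run: () => "3 tables",
});
const dropResult = gate.submit({
  category: "delete-data",
  summary: "Drop table users",
  run: () => "dropped",
});
```

In a real setup the pending queue would be persisted and surfaced in some review UI or chat channel; the in-memory map here just keeps the example small.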

4. Audit trail

Every agent action should be logged with: what was requested, what was executed, what changed, and what the agent's reasoning was.
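For reference, a sketch of the record shape I log. The field names map to the four items above; `AuditLog`, `agentId`, and the sample values are my own placeholders, not a VoltAgent API.

```typescript
// Hypothetical audit record; field names are assumptions for illustration.
interface AuditEntry {
  timestamp: string; // ISO 8601, set when the entry is recorded
  agentId: string;
  requested: string; // what was requested
  executed: string; // what was actually executed
  changes: string[]; // what changed
  reasoning: string; // the agent's stated reasoning
}

class AuditLog {
  private readonly entries: Readonly<AuditEntry>[] = [];

  // Append-only: entries are shallow-frozen and never updated or removed.
  record(entry: Omit<AuditEntry, "timestamp">): Readonly<AuditEntry> {
    const full = Object.freeze({
      ...entry,
      timestamp: new Date().toISOString(),
    });
    this.entries.push(full);
    return full;
  }

  // One JSON object per line, convenient for shipping to a log store.
  toJsonLines(): string {
    return this.entries.map((e) => JSON.stringify(e)).join("\n");
  }
}

// Usage
const auditLog = new AuditLog();
auditLog.record({
  agentId: "sub-agent-1",
  requested: "Clean up stale feature branches",
  executed: "git branch -D feature/old-login",
  changes: ["deleted local branch feature/old-login"],
  reasoning: "Branch was merged and had no commits in 90 days",
});
const auditLines = auditLog.toJsonLines().split("\n");
```

Logging the requested and the executed action as separate fields is deliberate: the interesting incidents are the ones where the two differ.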

Question for the community

How does VoltAgent handle safety boundaries for sub-agents? Is there a built-in way to restrict what a sub-agent can do based on its task scope?

I'm running 5+ agents in production daily and this is one of my biggest concerns.


Related write-up: AI should elevate thinking, not replace it
