Skip to content

Carlos-Projects/palisade-scanner

Palisade Scanner πŸ”

PyPI Python 3.11+ License: MIT CI HuggingFace Space

Try it live on HuggingFace Spaces β€” paste a URL. Detect whether it contains hidden instructions targeting AI agents.

Scan web content for prompt injection, hidden instructions, and adversarial content targeting AI agents.

AI agents browse the web, read documents, and consume external content. Adversaries hide instructions in invisible text, HTML metadata, encoded payloads, and zero-width characters β€” Palisade finds them all.


Risk examples

Scenario Risk level What Palisade finds
Clean marketing page βœ… Low No hidden text, no injection patterns, no exfiltration
Hidden CSS prompt injection πŸ”΄ High display:none text with role override instructions
Metadata exfiltration prompt 🚨 Critical HTML comment + JSON-LD + base64-encoded data theft payload

What makes Palisade unique

Capability Palisade Scanner Manual review Generic scrapers
Hidden text detection βœ… 20+ CSS/HTML techniques ❌ ❌
Injection pattern matching βœ… 100+ regexes, 5 categories ❌ ❌
LLM-as-judge classifier βœ… understands adversarial intent N/A ❌
Metadata analysis βœ… comments, JSON-LD, meta, data attrs ❌ ❌
Exfiltration detection βœ… URLs, eval(), fetch(), redirects ❌ ❌
MCPGuard policy generation βœ… auto-generate rules ❌ ❌
CI/CD mode βœ… --ci --threshold high ❌ ❌
Zero-width character detection βœ… ❌ ❌

Why

AI agents browse the web, read documents, and consume external content. Adversaries can hide instructions in:

  • Invisible text (white-on-white, zero font size, off-screen positioning)
  • HTML comments and metadata
  • Base64 encoded payloads
  • Zero-width character injections
  • Instructions disguised as product descriptions or reviews

This scanner finds them all and tells you what to do about it.

Quick Start

# Install
pip install palisade-scanner

# CLI: scan a URL
pis scan https://example.com
# or
palisade scan https://example.com

# Web UI: open the dashboard
pis web

# Docker
docker compose up
# β†’ http://localhost:8000

Usage

CLI

# Scan a URL
pis scan https://example.com

# Scan a local file
pis scan --file suspicious.html

# Scan pasted text
pis scan --paste "<!-- ignore instructions -->"

# JSON output
pis scan https://example.com --format json

# CI/CD mode (exit code reflects risk)
pis scan https://example.com --ci --threshold high

# Generate MCPGuard policy rules
pis policies https://evil-site.com

API

# Scan via REST API
curl "http://localhost:8000/api/scan?url=https://example.com"

# HTML report
curl "http://localhost:8000/api/scan/https://example.com"

How It Works

Detection Layers

Layer What It Detects
Hidden Text Detector 20+ CSS/HTML hiding techniques (display:none, visibility, opacity, color matching, off-screen, zero-width chars, HTML comments)
Injection Pattern Matcher 100+ regex patterns across 5 categories (jailbreak, role override, exfiltration, tool manipulation, impersonation)
Instruction Classifier LLM-as-judge that understands adversarial intent (requires API key)
Metadata Analyzer HTML comments, JSON-LD, meta tags, data attributes, <noscript>, <template>
Exfiltration Detector URLs, endpoints, eval() patterns, redirect attempts, fetch() calls

Scoring

Risk Score: 0-100

Weighted formula:
  base = 100
  - critical * 25
  - high * 10
  - medium * 3
  - low * 1

Categories: none (0-5) β†’ low (6-20) β†’ medium (21-50) β†’ high (51-80) β†’ critical (81-100)

Architecture

User (CLI / Web / API)
        β”‚
        β–Ό
PipelineOrchestrator
        β”‚
        β”œβ”€β”€ Loader (URL / File / Paste / PDF)
        β”‚
        β”œβ”€β”€ Detector Pipeline (parallel)
        β”‚   β”œβ”€β”€ HiddenTextDetector
        β”‚   β”œβ”€β”€ InjectionPatternMatcher
        β”‚   β”œβ”€β”€ MetadataAnalyzer
        β”‚   β”œβ”€β”€ ExfiltrationDetector
        β”‚   └── InstructionClassifier (LLM)
        β”‚
        β”œβ”€β”€ ScoringEngine
        β”‚
        └── Reporters
            β”œβ”€β”€ JSON / Markdown / Simple
            β”œβ”€β”€ Policy Generator (MCPGuard)
            └── Web UI (HTMX)

Project Structure

src/scanner/
β”œβ”€β”€ cli.py              # Typer CLI
β”œβ”€β”€ api.py              # FastAPI web app
β”œβ”€β”€ config.py           # Settings (env vars)
β”œβ”€β”€ domain/
β”‚   β”œβ”€β”€ models.py       # Pydantic models
β”‚   └── scoring.py      # Risk score engine
β”œβ”€β”€ loaders/
β”‚   β”œβ”€β”€ url.py          # HTTP URL fetcher
β”‚   β”œβ”€β”€ pdf.py          # PDF extractor
β”‚   └── paste.py        # Raw text
β”œβ”€β”€ detectors/
β”‚   β”œβ”€β”€ hidden_text.py       # CSS/HTML hiding
β”‚   β”œβ”€β”€ injection_patterns.py # 100+ regex patterns
β”‚   β”œβ”€β”€ instruction_classifier.py  # LLM-as-judge
β”‚   β”œβ”€β”€ metadata_analyzer.py # Comments/meta/tags
β”‚   └── exfiltration.py     # Data theft patterns
β”œβ”€β”€ pipeline/
β”‚   └── orchestrator.py # Scan pipeline
β”œβ”€β”€ reporters/          # JSON/MD/Simple output
β”œβ”€β”€ policies/           # MCPGuard rule generation
└── utils/              # DOM helpers

Integration

MCPGuard

Generate rules compatible with MCPGuard:

pis scan https://evil-site.com --format mcpguard > rules.yaml
mcpguard load-rules rules.yaml

CI/CD

# .github/workflows/check-urls.yml
- name: Scan for prompt injection
  run: |
    pis scan ${{ matrix.url }} --ci --threshold medium

Roadmap

  • v0.1 β€” Scanner core: CLI, 5 detectors, scoring, policy generation
  • v0.2 β€” Live Monitor: scheduled re-scans, webhook alerts, diff detection
  • v0.3 β€” Agent Validator: Browser Use agent tests pages in real time
  • v0.4 β€” Content Safety Proxy: reverse proxy that strips injections
  • v0.5 β€” Reputation Engine: web of trust for agent-safe URLs
  • v0.6 β€” Red Team Lab: adversarial page generator + benchmark suite
  • v0.7 β€” Certification Pipeline: verified AgentSafe badges

Ecosystem

Palisade Scanner is part of the Carlos-Projects security infrastructure for AI agents:

Palisade Scanner    β†’  Scan content before agents consume it.  ← you are here
MCPwn               β†’  Attack MCP servers before attackers do.
AgentGate           β†’  Control how agents access your website.
MCPscop             β†’  Centralize scanner results and security posture.
MCPGuard            β†’  Runtime security proxy for MCP/A2A protocols.
  • MCPwn β€” Offensive security testing for MCP servers
  • AgentGate β€” Policy-based firewall and honeypot middleware for AI agents
  • MCPscop β€” Unified security dashboard for MCP/A2A scanner results
  • MCPGuard β€” Runtime security proxy for MCP/A2A protocols

License

MIT

About

Scan web content for prompt injection, hidden instructions, and adversarial content targeting AI agents

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages