💀 JailbreakLLM: Automated Red Team Arsenal

Research-grade jailbreaking & hardening platform for Frontier LLMs.

⚠️ Disclaimer: This tool is for authorized security testing and educational purposes only. The authors are not responsible for misuse.

🛡️ Security Warning

DO NOT DEPLOY THIS APPLICATION TO A PUBLIC URL. This application exposes powerful LLM capabilities and API endpoints that can be used to generate harmful content or incur significant costs. It is intended for local use only or deployment within a strictly controlled, authenticated private network.

No built-in authentication: The API routes are unprotected by default.
Cost risk: Malicious actors could trigger expensive fine-tuning or batch inference jobs.
Safety risk: The tool is designed to bypass safety filters; public exposure allows anyone to generate harmful content.

🚨 The Problem

Frontier models (DeepSeek, Llama 3, GPT-4) claim to be "aligned." They aren't.

Most red-teaming tools use toy prompts ("DAN mode") that are easily patched. JailbreakLLM implements 39 advanced attack vectors from top academic papers (USENIX '25, NeurIPS '24) to expose the real cracks in your model's safety.

⚡ Key Features

🔴 39 Research-Backed Attack Vectors

We implement the most effective attacks from recent literature:

Knowledge Decomposition (KDA): Breaks harmful tasks into benign sub-steps (96% Success).
Dual Intention Escape: Hides harm in benign "engineering briefs."
Chaos Chain: Iterative de-obfuscation that breaks reasoning models.
System Policy Override: Fakes "admin mode" privileges.

👉 View Full Attack Arsenal

🔬 Advanced Model-Based Scoring

We use a StrongREJECT-aligned evaluation system. Our dedicated LLM judge detects partial leaks and assigns granular risk scores (0-100) based on research benchmarks, ensuring that "successful" jailbreaks actually contain harmful information, not just refusals or incoherent text.

👉 Read the Research Basis

🔄 Configurable Resampling Strategy

Single-shot testing misses 40% of vulnerabilities. We allow customizable attack volume (default 10, up to 50 attempts) with research-optimized parameters (temp=0.2, top_p=0.95) to catch stochastic failures while managing API costs.

👉 See Parameter Optimization Guide

🛡️ Auto-Hardening Pipeline

Don't just break it. Fix it.

Generate Dataset: Converts successful jailbreaks into synthetic refusal samples.
Fine-Tune: One-click export to LoRA/SFT pipelines (via Nebius Token Factory).
Verify: Re-test the hardened model to ensure the vulnerability is closed.

🛠️ Quick Start

1. Clone & Install

git clone https://github.com/demianarc/jailbreakllm.git
cd jailbreakllm
npm install

2. Configure Environment

Create a .env.local file:

# Required for inference, fine-tuning, and judging
NEBIUS_API_KEY=your_key_here
NEBIUS_BASE_URL=https://api.tokenfactory.nebius.com/v1/

# Optional: Basic Auth for public deployment (Recommended)
BASIC_AUTH_USER=admin
BASIC_AUTH_PASSWORD=your_secure_password

3. Run the Platform

npm run dev

Open http://localhost:3000 and navigate to the Red Team Arsenal.

📚 Methodology

Our approach is grounded in "System 2 Thinking" and "Tree of Thoughts" methodologies applied to security testing:

Deep Analysis: We don't just spam prompts; we analyze the model's cognitive architecture (Reasoning vs Chat).
Iterative Refinement: Attacks like Chaos Chain and Reason Step-by-Step force the model to iterate on its own output, bypassing "System 1" safety filters.
Comprehensive Coverage: From simple fuzzing to complex persona hijacking, we test the entire surface area.

🤝 Contributing

We welcome contributions! If you've found a new jailbreak vector:

Add the definition to src/components/workflow/red-team-arsenal.tsx.
Implement the prompt generator in src/lib/pipeline.ts.
Submit a PR!

Please read CONTRIBUTING.md for details.

📜 License

MIT License. Hack responsibly.

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
docs		docs
public		public
src		src
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
eslint.config.mjs		eslint.config.mjs
next.config.ts		next.config.ts
package-lock.json		package-lock.json
package.json		package.json
postcss.config.mjs		postcss.config.mjs
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

💀 JailbreakLLM: Automated Red Team Arsenal

🛡️ Security Warning

🚨 The Problem

⚡ Key Features

🔴 39 Research-Backed Attack Vectors

🔬 Advanced Model-Based Scoring

🔄 Configurable Resampling Strategy

🛡️ Auto-Hardening Pipeline

🛠️ Quick Start

1. Clone & Install

2. Configure Environment

3. Run the Platform

📚 Methodology

🤝 Contributing

📜 License

About

Uh oh!

Releases

Packages

Languages

License

demianarc/llmattack

Folders and files

Latest commit

History

Repository files navigation

💀 JailbreakLLM: Automated Red Team Arsenal

🛡️ Security Warning

🚨 The Problem

⚡ Key Features

🔴 39 Research-Backed Attack Vectors

🔬 Advanced Model-Based Scoring

🔄 Configurable Resampling Strategy

🛡️ Auto-Hardening Pipeline

🛠️ Quick Start

1. Clone & Install

2. Configure Environment

3. Run the Platform

📚 Methodology

🤝 Contributing

📜 License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages