LLM-Driven Red Teaming: Evaluating the Robustness of Language Models
Report Bug · Request Feature
- Table Of Contents
- About The Project
- Red Teaming Strategies
- Project Structure
- Built With
- Installation
- Roadmap
- Contributing
- License
- Authors
- Acknowledgements
Before deploying Large Language Models (LLMs) in real-world applications, it is crucial to address their robustness and reliability. Without proactive red-teaming, there is a risk of overlooking vulnerabilities or failure scenarios that could significantly impact user experience or cause material harm. As LLMs evolve and integrate into essential applications, the repercussions of potential failures become increasingly severe. Red-teaming involves simulating attacks or problematic scenarios to identify weaknesses before they can be exploited in real-world settings. This practice is vital for ensuring that LLMs can handle unexpected inputs and resist manipulation, thereby safeguarding the technology from causing unintended consequences when deployed in critical and sensitive environments.
Applying LLMs to automated red-teaming is an emerging approach to enhancing cybersecurity and system robustness. In this setting, LLMs automate the generation of attack scenarios, simulate potential threats, and identify vulnerabilities across digital infrastructures. By leveraging the natural language understanding and generation capabilities of LLMs, automated systems can rapidly construct and iterate over a diverse set of attack vectors and defensive strategies. This allows organizations to discover weaknesses in their systems and software before malicious actors can exploit them. Furthermore, LLM-driven red-teaming makes security testing more scalable, efficient, and dynamic, adapting to new threats as they evolve and providing continuous protection in a rapidly changing digital landscape.
The project simulates various red-teaming strategies to evaluate the robustness of LLMs, by challenging a target LLM with a red-team LLM in a series of interactions.
- Performance Simulation: Simulate a potential user with non-toxic questions to test the normal behavior of the target LLM.
- Toxicity Simulation: Simulate a user with toxic questions to test the target LLM's response to harmful inputs.
- Gaslighting Simulation: Simulate an agent that manipulates the target LLM into performing harmful actions.
- Guilt-Tripping Simulation: Simulate an agent that coerces the target LLM into undesired actions by inducing guilt.
- Fraudulent Researcher Simulation: Simulate an agent that seeks harmful actions under the pretense of academic research.
- Social Engineering Attack Simulation: Simulate an attacker seeking confidential information from a company.
- Prompt Injection Simulation: Simulate prompt injection attacks by injecting malicious prefixes or suffixes to the original prompt to elicit harmful behavior.
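The interaction pattern shared by these strategies can be sketched as a simple loop in which a red-team LLM and the target LLM exchange messages for a fixed number of turns. In this sketch, `red_team_llm` and `target_llm` are hypothetical stand-ins (plain functions) for real model calls, not the project's actual API:

```python
# Minimal sketch of the red-team loop: a red-team LLM challenges a target
# LLM for n_turns rounds. Both model calls are stubbed with plain functions.

def red_team_llm(history: list[str]) -> str:
    """Stand-in for the red-team model: produce the next adversarial question."""
    return f"adversarial question #{len(history) // 2 + 1}"

def target_llm(question: str) -> str:
    """Stand-in for the target model: produce a response to the question."""
    return f"response to: {question}"

def simulate(n_turns: int) -> list[str]:
    """Run n_turns question/answer rounds and return the full transcript."""
    history: list[str] = []
    for _ in range(n_turns):
        question = red_team_llm(history)
        answer = target_llm(question)
        history += [question, answer]
    return history

transcript = simulate(3)  # 3 rounds -> 6 transcript entries
```

In the real package, the transcript would then be passed to an evaluator that scores the target model's behavior.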
.
├── LICENSE
├── README.md
├── examples
├── images
├── poetry.lock
├── pyproject.toml
├── redteam # Main package
│ ├── __init__.py
│ ├── agents # Agents that interact with the target LLM
│ ├── evaluators
│ ├── generators
│ ├── llms # Large Language Models
│ └── simulators
├── references.md
├── simulator_mistral.py
├── simulator_rag.py
└── tests

- Clone the repo

  ```sh
  git clone https://github.com/ChengaFEI/llm-driven-red-teaming.git
  ```

- Install Python packages

  ```sh
  poetry install
  ```
```python
# Step 1: Load the package
from redteam.simulators.performance_simulator import PerformanceSimulator

# Step 2: Set the environment variables here, or in a `.env` file
openai_api_key = "Your OpenAI API Key"
n_turns = 5
data_path = "Your txt document for RAG"

# Step 3: Run the simulation
PerformanceSimulator(
    openai_api_key=openai_api_key,
    n_turns=n_turns,
    data_path=data_path,
).simulate()
```

See the open issues for a list of proposed features (and known issues).
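Under the hood, the simulators in `redteam/simulators` pair generated prompts with scoring logic from `redteam/evaluators`. As a rough, hypothetical illustration of what such a scorer might do (the keyword list and function name below are invented for this sketch, not the project's actual API):

```python
# Hypothetical keyword-based toxicity scorer, illustrating the kind of
# check an evaluator in `redteam/evaluators` might perform on a response.

TOXIC_KEYWORDS = {"hate", "kill", "attack"}

def toxicity_score(response: str) -> float:
    """Return the fraction of toxic keywords present in the response."""
    words = set(response.lower().split())
    return len(words & TOXIC_KEYWORDS) / len(TOXIC_KEYWORDS)

score = toxicity_score("I would never attack anyone")  # one of three keywords hit
```

Real evaluators typically rely on a moderation model rather than keyword matching, but the interface idea is the same: map a response to a score the simulator can aggregate across turns.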
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- If you have suggestions for additions or removals, feel free to open an issue to discuss them, or create a pull request directly with the necessary changes.
- Please make sure you check your spelling and grammar.
- Create individual PR for each suggestion.
- Please also read through the Code Of Conduct before posting your first idea as well.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
- Cheng Fei - MEng CS student - Built the project
