Skip to content

calvin-fei/llm-driven-red-teaming

Repository files navigation


Logo

LLM Red Teaming

LLM-Driven Red Teaming: Evaluating the Robustness of Language Models

Report Bug . Request Feature

Contributors Forks Stargazers Issues License

Table Of Contents

About The Project

red-teaming-diagram

LLMs Vulnerabilities

Before deploying Large Language Models (LLMs) in real-world applications, it is crucial to address their robustness and reliability. Without proactive red-teaming, there is a risk of overlooking vulnerabilities or failure scenarios that could significantly impact user experience or cause material harm. As LLMs evolve and integrate into essential applications, the repercussions of potential failures become increasingly severe. Red-teaming involves simulating attacks or problematic scenarios to identify weaknesses before they can be exploited in real-world settings. This practice is vital for ensuring that LLMs can handle unexpected inputs and resist manipulation, thereby safeguarding the technology from causing unintended consequences when deployed in critical and sensitive environments.

LLMs Red Teaming

Applying Large Language Models (LLMs) in automated red-teaming represents an innovative approach to enhancing cybersecurity and system robustness. In this application, LLMs are used to automate the generation of attack scenarios, simulate potential threats, and identify vulnerabilities across various digital infrastructures. By leveraging the natural language understanding and generation capabilities of LLMs, automated systems can rapidly construct and iterate over a diverse set of attack vectors and defensive strategies. This allows organizations to preemptively discover weaknesses in their systems and software before malicious actors can exploit them. Furthermore, the use of LLMs in automated red-teaming can provide more scalable, efficient, and dynamic security testing, adapting to new threats as they evolve and ensuring continuous protection in a rapidly changing digital landscape.

Red Teaming Strategies

The project simulates various red-teaming strategies to evaluate the robustness of LLMs, by challenging a target LLM with a red-team LLM in a series of interactions.

Performance Strategy

  • Performance Simulation: Simulate a potential user with non-toxic questions to test the normal behavior of the target LLM.
  • Toxicity Simulation: Simulate a user with toxic questions to test the target LLM's response to harmful inputs.

Manipulation Strategy

  • Gaslighting Simulation: Simulate an agent that manipulates the target LLM into performing harmful actions.
  • Guilt-Tripping Simulation: Simulate an agent that coerces the target LLM into undesired actions by inducing guilt.

Deception Strategy

  • Fraudulent Researcher Simulation: Simulate an agent that seeks harmful actions under the pretense of academic research.
  • Social Engineering Attack Simulation: Simulate an attacker seeking confidential information from a company.

Injection Strategy

  • Prompt Injection Simulation: Simulate prompt injection attacks by injecting malicious prefixes or suffixes to the original prompt to elicit harmful behavior.

Project Structure

.
├── LICENSE
├── README.md
├── examples
├── images
├── poetry.lock
├── pyproject.toml
├── redteam  # Main package
│   ├── __init__.py
│   ├── agents  # Agents that interact with the target LLM
│   ├── evaluators
│   ├── generators
│   ├── llms  # Large Language Models
│   └── simulators
├── references.md
├── simulator_mistral.py
├── simulator_rag.py
└── tests

Built With

Installation

  1. Clone the repo

    git clone https://github.com/ChengaFEI/llm-driven-red-teaming.git
  2. Install Python packages

    poetry install

Usage

# Step 1: Load the package
from redteam.simulators.performance_simulator import PerformanceSimulator

# Step 2: Set up environment variables or set it in `.env` file
openai_api_key = 'Your OpenAI API Key'
n_turns = 5
data_path = 'Your txt document for RAG'

# Step 3: Run the simulation
PerformanceSimulator(
    openai_api_key=openai_api_key,
    n_turns=n_turns,
    data_path = data_path,
).simulate()

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to be learn, inspire, and create. Any contributions you make are greatly appreciated.

  • If you have suggestions for adding or removing projects, feel free to open an issue to discuss it, or directly create a pull request after you edit the README.md file with necessary changes.
  • Please make sure you check your spelling and grammar.
  • Create individual PR for each suggestion.
  • Please also read through the Code Of Conduct before posting your first idea as well.

Creating A Pull Request

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Authors

  • Cheng Fei - MEng CS student - Cheng Fei - Built the project

Acknowledgements

About

Automated red teaming generator driven by LLM.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages