LLM-Driven Red Teaming: Evaluating the Robustness of Language Models
Report Bug · Request Feature
- Table Of Contents
- About The Project
- Red Teaming Strategies
- Project Structure
- Built With
- Installation
- Roadmap
- Contributing
- License
- Authors
- Acknowledgements
Before deploying Large Language Models (LLMs) in real-world applications, it is crucial to address their robustness and reliability. Without proactive red-teaming, there is a risk of overlooking vulnerabilities or failure scenarios that could significantly impact user experience or cause material harm. As LLMs evolve and integrate into essential applications, the repercussions of potential failures become increasingly severe. Red-teaming involves simulating attacks or problematic scenarios to identify weaknesses before they can be exploited in real-world settings. This practice is vital for ensuring that LLMs can handle unexpected inputs and resist manipulation, thereby safeguarding the technology from causing unintended consequences when deployed in critical and sensitive environments.
Applying LLMs to automated red-teaming is an emerging approach to enhancing cybersecurity and system robustness. In this setting, LLMs automate the generation of attack scenarios, simulate potential threats, and identify vulnerabilities across digital infrastructures. By leveraging the natural language understanding and generation capabilities of LLMs, automated systems can rapidly construct and iterate over a diverse set of attack vectors and defensive strategies. This allows organizations to discover weaknesses in their systems and software before malicious actors can exploit them. Furthermore, LLM-driven red-teaming makes security testing more scalable, efficient, and dynamic, adapting to new threats as they evolve and providing continuous protection in a rapidly changing digital landscape.
The project simulates various red-teaming strategies to evaluate the robustness of LLMs, by challenging a target LLM with a red-team LLM in a series of interactions.
- Performance Simulation: Simulate a potential user with non-toxic questions to test the normal behavior of the target LLM.
- Toxicity Simulation: Simulate a user with toxic questions to test the target LLM's response to harmful inputs.
- Gaslighting Simulation: Simulate an agent that manipulates the target LLM into performing harmful actions.
- Guilt-Tripping Simulation: Simulate an agent that coerces the target LLM into undesired actions by inducing guilt.
- Fraudulent Researcher Simulation: Simulate an agent that seeks harmful actions under the pretense of academic research.
- Social Engineering Attack Simulation: Simulate an attacker seeking confidential information from a company.
- Prompt Injection Simulation: Simulate prompt injection attacks by injecting malicious prefixes or suffixes to the original prompt to elicit harmful behavior.
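The interaction pattern shared by these strategies can be sketched as a simple loop in which a red-team LLM and the target LLM exchange messages for a fixed number of turns. In this sketch, `red_team_llm` and `target_llm` are hypothetical stand-ins (plain functions) for real model calls, not the project's actual API:

```python
# Minimal sketch of the red-team loop: a red-team LLM challenges a target
# LLM for n_turns rounds. Both model calls are stubbed with plain functions.

def red_team_llm(history: list[str]) -> str:
    """Stand-in for the red-team model: produce the next adversarial question."""
    return f"adversarial question #{len(history) // 2 + 1}"

def target_llm(question: str) -> str:
    """Stand-in for the target model: produce a response to the question."""
    return f"response to: {question}"

def simulate(n_turns: int) -> list[str]:
    """Run n_turns question/answer rounds and return the full transcript."""
    history: list[str] = []
    for _ in range(n_turns):
        question = red_team_llm(history)
        answer = target_llm(question)
        history += [question, answer]
    return history

transcript = simulate(3)  # 3 rounds -> 6 transcript entries
```

In the real package, the transcript would then be passed to an evaluator that scores the target model's behavior.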
.
├── LICENSE
├── README.md
├── examples
├── images
├── poetry.lock
├── pyproject.toml
├── redteam # Main package
│ ├── __init__.py
│ ├── agents # Agents that interact with the target LLM
│ ├── evaluators
│ ├── generators
│ ├── llms # Large Language Models
│ └── simulators
├── references.md
├── simulator_mistral.py
├── simulator_rag.py
└── tests

- Clone the repo

  ```sh
  git clone https://github.com/ChengaFEI/llm-driven-red-teaming.git
  ```

- Install Python packages

  ```sh
  poetry install
  ```
```python
# Step 1: Load the package
from redteam.simulators.performance_simulator import PerformanceSimulator

# Step 2: Set the environment variables here, or in a `.env` file
openai_api_key = "Your OpenAI API Key"
n_turns = 5
data_path = "Your txt document for RAG"

# Step 3: Run the simulation
PerformanceSimulator(
    openai_api_key=openai_api_key,
    n_turns=n_turns,
    data_path=data_path,
).simulate()
```

See the open issues for a list of proposed features (and known issues).
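Under the hood, the simulators in `redteam/simulators` pair generated prompts with scoring logic from `redteam/evaluators`. As a rough, hypothetical illustration of what such a scorer might do (the keyword list and function name below are invented for this sketch, not the project's actual API):

```python
# Hypothetical keyword-based toxicity scorer, illustrating the kind of
# check an evaluator in `redteam/evaluators` might perform on a response.

TOXIC_KEYWORDS = {"hate", "kill", "attack"}

def toxicity_score(response: str) -> float:
    """Return the fraction of toxic keywords present in the response."""
    words = set(response.lower().split())
    return len(words & TOXIC_KEYWORDS) / len(TOXIC_KEYWORDS)

score = toxicity_score("I would never attack anyone")  # one of three keywords hit
```

Real evaluators typically rely on a moderation model rather than keyword matching, but the interface idea is the same: map a response to a score the simulator can aggregate across turns.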
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- If you have suggestions for additions or removals, feel free to open an issue to discuss them, or create a pull request directly with the necessary changes.
- Please make sure you check your spelling and grammar.
- Create individual PR for each suggestion.
- Please also read through the Code Of Conduct before posting your first idea as well.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
- Cheng Fei - MEng CS student - Built the project
