    Repositories list

    • HarmfulSkillBench

      Public
      The Official Repository for Paper "HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?"
      Python
      MIT License
      1500 Updated Apr 20, 2026
    • Prompt_Injection_Assessment

      Public
      Python
      0000 Updated Apr 17, 2026
    • DE-CLIP

      Public
      The Official Repository for ACL 2026 Paper "DE-CLIP: Few-Shot Anomaly Detection via Difference-Guided Embedding Editing"
      0000 Updated Apr 17, 2026
    • This is the official repository of the ACL 2026 Findings paper: InferPilot: Autonomous Inference Attacks Against ML Services With LLM-Based Agents
      Python
      0000 Updated Apr 15, 2026
    • PeerCheck

      Public
      PeerCheck: dataset for evaluating LLM-generated academic reviews
      Apache License 2.0
      0100 Updated Apr 14, 2026
    • AP-Test

      Public
      Official repo for the ACL 2026 paper "Peering Behind the Shield: Guardrail Identification in Large Language Models"
      Python
      MIT License
      0100 Updated Apr 11, 2026
    • UnsafeMoE

      Public
      This repository is for the paper "Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs."
      Python
      MIT License
      0300 Updated Feb 10, 2026
    • JAIL-CON

      Public
      [NeurIPS'25] Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency (https://arxiv.org/abs/2510.21189)
      Python
      Creative Commons Attribution 4.0 International
      1210 Updated Jan 18, 2026
    • Python
      Apache License 2.0
      0500 Updated Jan 4, 2026
    • Python
      0200 Updated Dec 9, 2025
    • Official Website of JADES
      SCSS
      MIT License
      0000 Updated Sep 12, 2025
    • T-GPS

      Public
      Python
      Apache License 2.0
      0300 Updated Sep 7, 2025
    • JADES

      Public
      This is the public code repository of paper 'JADES: A Universal Framework for Jailbreak Assessment via Decompositional Scoring'
      0600 Updated Aug 27, 2025
    • GPTracker

      Public
      [S&P'25] GPTracker: A Large-Scale Measurement of Misused GPTs
      Python
      GNU General Public License v3.0
      11200 Updated Jul 25, 2025
    • SaferVLM

      Public
      Python
      2910 Updated Jul 19, 2025
    • Python
      88700 Updated Jun 8, 2025
    • [ACL2025] Official repository for "Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media"
      Python
      1800 Updated May 29, 2025
    • This is the public code repository for the paper 'Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations …
      Python
      11000 Updated May 21, 2025
    • HateBench

      Public
      [USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
      Apache License 2.0
      31400 Updated Mar 1, 2025
    • [Usenix Security 2025] Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications
      Python
      Apache License 2.0
      1500 Updated Jan 29, 2025
    • [Usenix Security 2025] On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
      Python
      Apache License 2.0
      0510 Updated Jan 29, 2025
    • Apache License 2.0
      0100 Updated Jan 28, 2025
    • ModSCAN

      Public
      An official public repository of the paper "ModSCAN: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities" (https://…
      Python
      MIT License
      1300 Updated Jan 8, 2025
    • ICL-MIA

      Public
      Python
      0510 Updated Dec 19, 2024
    • Python
      0900 Updated Dec 18, 2024
    • JavaScript
      MIT License
      0810 Updated Oct 30, 2024
    • ZeroFake

      Public
      Python
      21110 Updated Oct 30, 2024
    • homepage

      Public
      JavaScript
      MIT License
      0000 Updated Oct 14, 2024
    • MIT License
      0000 Updated Aug 28, 2024
    • ML-Doctor

      Public
      Code for ML Doctor
      Python
      MIT License
      0600 Updated Aug 14, 2024