
feat: add owasp llm04 data poisoning rule pack#18

Merged: ogulcanaydogan merged 1 commit into main from feat/v1.4.0-llm04-data-poisoning-rules on May 27, 2026


Conversation

@ogulcanaydogan
Owner

Summary

  • New rule file rules/owasp-llm04-data-poisoning.yaml adds 6 detection patterns for OWASP LLM04 (2025) Data Poisoning scenarios: adversarial example construction, backdoor trigger phrases, cross-session memory contamination, federated learning gradient poisoning, synthetic training data injection, and RLHF reward hacking.
  • Starts the v1.4.0 ROADMAP item to close partial OWASP LLM03/04/06/08/09 coverage; LLM04 previously had only 2 patterns in owasp-llm-top10.yaml (training manipulation, persistent rule injection) -- the new file is additive, no existing rules changed.
  • pkg/rules/loader_test.go updated to reflect 4 rule packs in rules/ (was 3); TestLoadDir rule-count assertion updated accordingly.
  • No changes to loader, detector, or proxy code -- new YAML is auto-discovered by LoadDir at startup.
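The auto-discovery described in the last bullet can be sketched as follows. This is a minimal illustration, not the project's actual LoadDir: the function names ruleFileNames and discoverRuleFiles are hypothetical, and the assumption that rule packs are selected by a .yaml or .yml extension is not confirmed by the PR.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// ruleFileNames keeps only names that look like YAML rule packs and returns
// them sorted. The extension filter is an assumption made for this sketch.
func ruleFileNames(names []string) []string {
	var out []string
	for _, n := range names {
		ext := strings.ToLower(filepath.Ext(n))
		if ext == ".yaml" || ext == ".yml" {
			out = append(out, n)
		}
	}
	sort.Strings(out)
	return out
}

// discoverRuleFiles lists the regular entries of dir and filters them with
// ruleFileNames. It is a hypothetical stand-in for directory-based loading.
func discoverRuleFiles(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, e := range entries {
		if !e.IsDir() {
			names = append(names, e.Name())
		}
	}
	return ruleFileNames(names), nil
}

func main() {
	// Build a throwaway rules directory so the example runs anywhere.
	dir, err := os.MkdirTemp("", "rules")
	if err != nil {
		fmt.Println("mkdtemp:", err)
		return
	}
	defer os.RemoveAll(dir)

	for _, n := range []string{
		"owasp-llm-top10.yaml",
		"jailbreak-patterns.yaml",
		"data-exfil.yaml",
		"owasp-llm04-data-poisoning.yaml",
		"README.md", // ignored: not a YAML file
	} {
		if err := os.WriteFile(filepath.Join(dir, n), []byte("name: demo\n"), 0o600); err != nil {
			fmt.Println("write:", err)
			return
		}
	}

	files, err := discoverRuleFiles(dir)
	if err != nil {
		fmt.Println("discover:", err)
		return
	}
	fmt.Println(len(files), "rule packs found")
}
```

With directory scanning of this kind, dropping a new YAML file into rules/ changes the loaded pack count without any Go change, which is why the count assertion in the test had to move from 3 to 4.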

Test plan

  • go test ./pkg/rules/... passes (all 4 rule files loaded, all patterns compile, count assertion updated)
  • go test ./... full suite passes with no regressions
  • CI Lint (Go) + Security Scan green (YAML-only addition, no Go code changes)
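The "all patterns compile" check in the test plan could look roughly like the sketch below. The rule struct and compileAll are hypothetical names, not the project's types; the sample pattern is the one shown in this PR's diff, and pairing it with the ID PIF-LLM04-003 is an assumption based on the commit message.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a hypothetical, trimmed-down stand-in for one YAML rule entry.
type rule struct {
	ID      string
	Pattern string
}

// sampleRules holds the adversarial-example pattern quoted in the diff.
// The ID assignment is an assumption for illustration.
var sampleRules = []rule{
	{
		ID:      "PIF-LLM04-003",
		Pattern: `(?i)(adversarial\s+example|carefully\s+crafted\s+(input|prompt)|gradient[\s-]based\s+attack|fast\s+gradient\s+sign\s+method|fgsm\s+attack|adversarial\s+perturbation|projected\s+gradient\s+descent\s+attack|carlini[\s-]wagner\s+attack)`,
	},
}

// compileAll compiles every pattern and fails fast on the first invalid one,
// naming the offending rule in the error.
func compileAll(rules []rule) (map[string]*regexp.Regexp, error) {
	compiled := make(map[string]*regexp.Regexp, len(rules))
	for _, r := range rules {
		re, err := regexp.Compile(r.Pattern)
		if err != nil {
			return nil, fmt.Errorf("rule %s: %w", r.ID, err)
		}
		compiled[r.ID] = re
	}
	return compiled, nil
}

func main() {
	compiled, err := compileAll(sampleRules)
	if err != nil {
		fmt.Println("compile error:", err)
		return
	}
	re := compiled["PIF-LLM04-003"]
	fmt.Println(re.MatchString("What is an adversarial example?")) // true
}
```

Go's regexp package uses RE2 syntax, so a pattern that relies on lookarounds or backreferences would fail at this step rather than at request time.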

New rule file rules/owasp-llm04-data-poisoning.yaml adds 6 patterns
for OWASP LLM04 (2025) Data Poisoning attack scenarios not covered by
the 2 existing LLM04 rules in owasp-llm-top10.yaml:

- PIF-LLM04-003: adversarial example construction (FGSM, C&W, PGD)
- PIF-LLM04-004: backdoor trigger phrase injection
- PIF-LLM04-005: cross-session memory contamination
- PIF-LLM04-006: federated learning gradient poisoning
- PIF-LLM04-007: synthetic training data injection to bias a model
- PIF-LLM04-008: RLHF reward hacking instruction

loader_test.go updated to expect 4 rule files (was 3).
Starts v1.4.0 OWASP partial-coverage closure per ROADMAP.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6356309fc


@@ -0,0 +1,69 @@
name: "OWASP LLM04 - Data Poisoning (Extended)"

P2: Wire the new rule pack into default loading paths

When users run the proxy with the default config, or run pif scan without an explicit --rules flag, this new pack is not loaded. config.yaml and pkg/config.Default() still enumerate only owasp-llm-top10.yaml, jailbreak-patterns.yaml, and data-exfil.yaml, and internal/cli/scan.go hard-codes the same three default files. As a result, these detections only appear in tests or in callers that happen to use LoadDir, not in the default product path. Either add this file to the configured and default rule path lists, or switch those paths to directory loading.
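One way to catch this kind of drift is a check that compares the hard-coded defaults against what is actually in rules/. The sketch below is an illustration only: missingFromDefaults is a hypothetical helper, and the two slices are filled in by hand from the file names mentioned in this PR rather than read from the real config or disk.

```go
package main

import (
	"fmt"
	"sort"
)

// missingFromDefaults returns the rule files present on disk that the
// default list does not mention, sorted for stable output.
func missingFromDefaults(defaults, onDisk []string) []string {
	known := make(map[string]struct{}, len(defaults))
	for _, d := range defaults {
		known[d] = struct{}{}
	}
	var missing []string
	for _, f := range onDisk {
		if _, ok := known[f]; !ok {
			missing = append(missing, f)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// The three defaults the review says are still enumerated.
	defaults := []string{
		"owasp-llm-top10.yaml",
		"jailbreak-patterns.yaml",
		"data-exfil.yaml",
	}
	// The four packs that exist in rules/ after this PR.
	onDisk := []string{
		"owasp-llm-top10.yaml",
		"jailbreak-patterns.yaml",
		"data-exfil.yaml",
		"owasp-llm04-data-poisoning.yaml",
	}
	fmt.Println(missingFromDefaults(defaults, onDisk))
	// prints: [owasp-llm04-data-poisoning.yaml]
}
```

A test built on a check like this would fail whenever a new pack lands in rules/ without a matching default entry, which is the gap the review describes. Switching the defaults to directory loading would make such a check unnecessary.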

description: "Detects requests to construct adversarial inputs designed to fool model outputs"
category: "prompt_injection"
severity: 3
pattern: '(?i)(adversarial\s+example|carefully\s+crafted\s+(input|prompt)|gradient[\s-]based\s+attack|fast\s+gradient\s+sign\s+method|fgsm\s+attack|adversarial\s+perturbation|projected\s+gradient\s+descent\s+attack|carlini[\s-]wagner\s+attack)'

P2: Require attack intent for adversarial-example matches

This pattern flags bare research terms such as adversarial example, FGSM attack, or Carlini-Wagner attack. A benign prompt like "What is an adversarial example?" will therefore be reported as a high-severity prompt injection, even though the rule description says the rule detects requests to construct adversarial inputs. In educational or ML-support contexts this creates avoidable false positives. Consider requiring an action verb such as create, generate, or craft before these terms.
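The suggestion can be sketched as a pattern that only fires when an action verb precedes the attack term within the same sentence. This is an illustrative variant, not the rule that was merged: the verb list (with "construct" added to the three the review names), the 40-character window, and the name intentPattern are all assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// intentPattern requires an action verb, then up to 40 characters that do
// not end the sentence, then one of the attack terms from the original rule.
// Verb list and window size are assumptions chosen for this sketch.
var intentPattern = regexp.MustCompile(
	`(?i)\b(?:create|generate|craft|construct)\b[^.?!\n]{0,40}?\b` +
		`(?:adversarial\s+example|adversarial\s+perturbation|fgsm\s+attack|` +
		`fast\s+gradient\s+sign\s+method|projected\s+gradient\s+descent\s+attack|` +
		`carlini[\s-]wagner\s+attack)`)

func main() {
	prompts := []string{
		"What is an adversarial example?",                     // question only
		"Generate an adversarial example for this classifier", // verb + term
	}
	for _, p := range prompts {
		fmt.Printf("%-55q %v\n", p, intentPattern.MatchString(p))
	}
	// The first prompt does not match; the second does.
}
```

A verb gate like this trades recall for precision: phrasings that use a verb outside the list are missed, and a benign sentence that happens to contain a listed verb near the term still matches, so the list would need tuning against real traffic.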

@ogulcanaydogan ogulcanaydogan merged commit dfaeb99 into main May 27, 2026
12 checks passed