feat: add owasp llm04 data poisoning rule pack by ogulcanaydogan · Pull Request #18 · ogulcanaydogan/Prompt-Injection-Firewall

ogulcanaydogan · 2026-05-27T11:48:38Z

Summary

New rule file rules/owasp-llm04-data-poisoning.yaml adds 6 detection patterns for OWASP LLM04 (2025) Data Poisoning scenarios: adversarial example construction, backdoor trigger phrases, cross-session memory contamination, federated learning gradient poisoning, synthetic training data injection, and RLHF reward hacking.
Starts the v1.4.0 ROADMAP item to close partial OWASP LLM03/04/06/08/09 coverage; LLM04 previously had only 2 patterns in owasp-llm-top10.yaml (training manipulation, persistent rule injection) -- the new file is additive, no existing rules changed.
pkg/rules/loader_test.go updated to reflect 4 rule packs in rules/ (was 3); TestLoadDir rule-count assertion updated accordingly.
No changes to loader, detector, or proxy code -- new YAML is auto-discovered by LoadDir at startup.

Test plan

go test ./pkg/rules/... passes (all 4 rule files loaded, all patterns compile, count assertion updated)
go test ./... full suite passes with no regressions
CI Lint (Go) + Security Scan green (YAML-only addition, no Go code changes)
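
The "all patterns compile" check in the test plan can be approximated with a small standalone sketch. The `rule` struct, the `compileAll` helper, and the deliberately broken `hypothetical-bad` entry are assumptions for illustration and are not taken from pkg/rules. The first pattern is an abbreviated form of the PIF-LLM04-003 pattern quoted in the review further down.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a hypothetical stand-in for one rule entry; the real
// struct in pkg/rules may have different fields.
type rule struct {
	ID      string
	Pattern string
}

// compileAll compiles every pattern with Go's regexp package and
// returns the compile error for each rule ID that failed.
func compileAll(rules []rule) map[string]error {
	failed := make(map[string]error)
	for _, r := range rules {
		if _, err := regexp.Compile(r.Pattern); err != nil {
			failed[r.ID] = err
		}
	}
	return failed
}

func main() {
	rules := []rule{
		// Abbreviated form of the adversarial-example pattern.
		{ID: "PIF-LLM04-003", Pattern: `(?i)(adversarial\s+example|fgsm\s+attack)`},
		// Deliberately invalid (unclosed group) to show a failure.
		{ID: "hypothetical-bad", Pattern: `(?i)(unclosed`},
	}
	for id, err := range compileAll(rules) {
		fmt.Printf("%s: %v\n", id, err)
	}
}
```

A check of this kind fails fast on syntax Go's regexp engine does not accept, such as lookaheads or backreferences.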

New rule file rules/owasp-llm04-data-poisoning.yaml adds 6 patterns for OWASP LLM04 (2025) Data Poisoning attack scenarios not covered by the 2 existing LLM04 rules in owasp-llm-top10.yaml:

- PIF-LLM04-003: adversarial example construction (FGSM, C&W, PGD)
- PIF-LLM04-004: backdoor trigger phrase injection
- PIF-LLM04-005: cross-session memory contamination
- PIF-LLM04-006: federated learning gradient poisoning
- PIF-LLM04-007: synthetic training data injection to bias a model
- PIF-LLM04-008: RLHF reward hacking instruction

loader_test.go updated to expect 4 rule files (was 3). Starts v1.4.0 OWASP partial-coverage closure per ROADMAP.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6356309fc

chatgpt-codex-connector · 2026-05-27T11:54:12Z

@@ -0,0 +1,69 @@
+name: "OWASP LLM04 - Data Poisoning (Extended)"


Wire the new rule pack into default loading paths

When users run the proxy with the default config, or run pif scan without an explicit --rules, this new pack is not loaded. config.yaml and pkg/config.Default() still enumerate only owasp-llm-top10.yaml, jailbreak-patterns.yaml, and data-exfil.yaml, and internal/cli/scan.go hard-codes the same three default files. These detections therefore appear only in tests or in callers that happen to use LoadDir, not in the default product path. Add this file to the configured/default rule path lists, or switch those paths to directory loading.

chatgpt-codex-connector · 2026-05-27T11:54:12Z

+    description: "Detects requests to construct adversarial inputs designed to fool model outputs"
+    category: "prompt_injection"
+    severity: 3
+    pattern: '(?i)(adversarial\s+example|carefully\s+crafted\s+(input|prompt)|gradient[\s-]based\s+attack|fast\s+gradient\s+sign\s+method|fgsm\s+attack|adversarial\s+perturbation|projected\s+gradient\s+descent\s+attack|carlini[\s-]wagner\s+attack)'


Require attack intent for adversarial-example matches

This pattern flags bare research terms such as adversarial example, FGSM attack, or Carlini-Wagner attack, so a benign prompt like “What is an adversarial example?” will be reported as a high-severity prompt injection even though the rule description says it detects requests to construct adversarial inputs. In educational or ML-support contexts this will create avoidable false positives; consider requiring action verbs such as create/generate/craft before these terms.
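
One way to require attack intent is sketched below, assuming an action verb must appear within a short window before the research term. The verb list, the 40-character window, and the name `intentPattern` are illustrative assumptions, not the pattern that shipped in the rule file. The term list is also shortened for readability.

```go
package main

import (
	"fmt"
	"regexp"
)

// intentPattern matches only when an action verb precedes the
// adversarial term within the same sentence (at most 40 characters
// between them). The verbs and the window size are assumptions.
var intentPattern = regexp.MustCompile(
	`(?i)\b(create|generate|craft|construct|build|write)\b` +
		`[^.?!\n]{0,40}?` +
		`\b(adversarial\s+(example|perturbation|input)s?|fgsm\s+attack|carlini[\s-]wagner\s+attack)`,
)

// flags reports whether the prompt would be reported by the sketch.
func flags(prompt string) bool {
	return intentPattern.MatchString(prompt)
}

func main() {
	prompts := []string{
		"What is an adversarial example?",
		"Generate an adversarial example that fools the classifier",
	}
	for _, p := range prompts {
		fmt.Printf("%-60q flagged=%v\n", p, flags(p))
	}
}
```

With this shape, definitional questions pass through while construction requests are still flagged. The cost is that phrasings that avoid the listed verbs are missed, so the verb list would need tuning against real traffic.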

chatgpt-codex-connector Bot reviewed May 27, 2026

ogulcanaydogan merged commit dfaeb99 into main May 27, 2026
12 checks passed

ogulcanaydogan merged 1 commit into main from feat/v1.4.0-llm04-data-poisoning-rules
