feat: add owasp llm04 data poisoning rule pack #18
New rule file rules/owasp-llm04-data-poisoning.yaml adds 6 patterns for OWASP LLM04 (2025) Data Poisoning attack scenarios not covered by the 2 existing LLM04 rules in owasp-llm-top10.yaml:
- PIF-LLM04-003: adversarial example construction (FGSM, C&W, PGD)
- PIF-LLM04-004: backdoor trigger phrase injection
- PIF-LLM04-005: cross-session memory contamination
- PIF-LLM04-006: federated learning gradient poisoning
- PIF-LLM04-007: synthetic training data injection to bias a model
- PIF-LLM04-008: RLHF reward hacking instruction

loader_test.go updated to expect 4 rule files (was 3). Starts v1.4.0 OWASP partial-coverage closure per ROADMAP.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f6356309fc
@@ -0,0 +1,69 @@
name: "OWASP LLM04 - Data Poisoning (Extended)"
Wire the new rule pack into default loading paths
When users run the proxy with the default config, or run pif scan without explicit --rules, this new pack is not loaded: config.yaml and pkg/config.Default() still enumerate only owasp-llm-top10.yaml, jailbreak-patterns.yaml, and data-exfil.yaml, and internal/cli/scan.go hard-codes the same three default files. As a result, these detections only appear in tests or in callers that happen to use LoadDir, not in the default product path. Either add this file to the configured and default rule path lists, or switch those paths to directory loading.
description: "Detects requests to construct adversarial inputs designed to fool model outputs"
category: "prompt_injection"
severity: 3
pattern: '(?i)(adversarial\s+example|carefully\s+crafted\s+(input|prompt)|gradient[\s-]based\s+attack|fast\s+gradient\s+sign\s+method|fgsm\s+attack|adversarial\s+perturbation|projected\s+gradient\s+descent\s+attack|carlini[\s-]wagner\s+attack)'
Require attack intent for adversarial-example matches
This pattern flags bare research terms such as adversarial example, FGSM attack, or Carlini-Wagner attack, so a benign prompt like “What is an adversarial example?” will be reported as a high-severity prompt injection even though the rule description says it detects requests to construct adversarial inputs. In educational or ML-support contexts this will create avoidable false positives; consider requiring action verbs such as create/generate/craft before these terms.
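To make the false positive concrete, the sketch below compares the pattern quoted in the diff with one possible tightened variant. intentPattern is a hypothetical illustration, not a proposed final rule: the verb list, the 60-character gap, and the subset of attack terms it keeps are all assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// originalPattern is copied from the rule quoted in the diff above.
var originalPattern = regexp.MustCompile(`(?i)(adversarial\s+example|carefully\s+crafted\s+(input|prompt)|gradient[\s-]based\s+attack|fast\s+gradient\s+sign\s+method|fgsm\s+attack|adversarial\s+perturbation|projected\s+gradient\s+descent\s+attack|carlini[\s-]wagner\s+attack)`)

// intentPattern is a hypothetical tightened variant: an action verb must
// appear earlier in the same sentence, at most 60 characters before the
// attack term. The verb list and the gap size are assumptions.
var intentPattern = regexp.MustCompile(`(?i)\b(create|generate|craft|construct|build|produce)\b[^.?!\n]{0,60}?\b(adversarial\s+(example|perturbation)s?|fast\s+gradient\s+sign\s+method|fgsm\s+attack|projected\s+gradient\s+descent\s+attack|carlini[\s-]wagner\s+attack)`)

func main() {
	prompts := []string{
		"What is an adversarial example?",
		"Generate an adversarial example that fools the classifier",
		"Craft an FGSM attack against this model",
	}
	// Print, for each prompt, whether each pattern would flag it.
	for _, p := range prompts {
		fmt.Printf("%q original=%t intent=%t\n",
			p, originalPattern.MatchString(p), intentPattern.MatchString(p))
	}
}
```

A verb requirement like this cuts the bare-term matches, but it is still a heuristic. A defensive question that happens to contain "generate" near an attack term would still match, so test any real rule against a corpus of benign ML prompts first.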
Summary
- rules/owasp-llm04-data-poisoning.yaml adds 6 detection patterns for OWASP LLM04 (2025) Data Poisoning scenarios: adversarial example construction, backdoor trigger phrases, cross-session memory contamination, federated learning gradient poisoning, synthetic training data injection, and RLHF reward hacking.
- owasp-llm-top10.yaml (training manipulation, persistent rule injection) -- the new file is additive, no existing rules changed.
- pkg/rules/loader_test.go updated to reflect 4 rule packs in rules/ (was 3); TestLoadDir rule-count assertion updated accordingly.
- LoadDir at startup.

Test plan
- go test ./pkg/rules/... passes (all 4 rule files loaded, all patterns compile, count assertion updated)
- go test ./... full suite passes with no regressions
- Lint (Go) + Security Scan green (YAML-only addition, no Go code changes)