3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- OWASP LLM04 (2025) Data Poisoning rule pack (`rules/owasp-llm04-data-poisoning.yaml`) with 6 new patterns covering adversarial example construction, backdoor trigger phrases, cross-session memory contamination, federated learning poisoning, synthetic training data injection, and RLHF reward hacking. Closes the v1.4.0 ROADMAP item to extend LLM04 partial coverage; existing 2 LLM04 patterns in `rules/owasp-llm-top10.yaml` remain in place.

### Changed
- CI and release workflows now pin `go-version: '1.26.x'` (was `1.25.x`); unblocks Dependabot rollups that bump `go.mod` to require Go 1.26.0 (e.g. `hugot` v0.7.2).

2 changes: 1 addition & 1 deletion pkg/rules/loader_test.go
@@ -106,7 +106,7 @@ func TestLoadDir(t *testing.T) {

sets, err := LoadDir(rulesDir)
require.NoError(t, err)
- assert.Len(t, sets, 3, "should load all 3 rule files")
+ assert.Len(t, sets, 4, "should load all 4 rule files")

totalRules := 0
for _, rs := range sets {
69 changes: 69 additions & 0 deletions rules/owasp-llm04-data-poisoning.yaml
@@ -0,0 +1,69 @@
name: "OWASP LLM04 - Data Poisoning (Extended)"

P2: Wire the new rule pack into default loading paths

With the default config, or when `pif scan` runs without an explicit `--rules` flag, this new pack is never loaded. `config.yaml` and `pkg/config.Default()` still enumerate only `owasp-llm-top10.yaml`, `jailbreak-patterns.yaml`, and `data-exfil.yaml`, and `internal/cli/scan.go` hard-codes the same three default files. As a result, these detections only fire in tests or in callers that happen to use `LoadDir`, not in the default product path. Either add this file to the configured and default rule path lists, or switch those paths to directory loading.
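
The directory-loading option can be illustrated with a minimal sketch. It assumes a flat rules directory containing only `*.yaml` rule packs. `discoverRuleFiles` and `demo` are hypothetical names introduced here for illustration; they are not part of this repository, and the real `LoadDir` may behave differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// discoverRuleFiles is a hypothetical helper (not from this repository).
// It returns every *.yaml file directly under dir, sorted by name, so a
// newly added rule pack is picked up without editing a hard-coded list.
func discoverRuleFiles(dir string) ([]string, error) {
	matches, err := filepath.Glob(filepath.Join(dir, "*.yaml"))
	if err != nil {
		return nil, err
	}
	sort.Strings(matches)
	return matches, nil
}

// demo writes four placeholder rule files into a temporary directory and
// returns the base names that discoverRuleFiles finds there.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "rules")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	for _, name := range []string{
		"owasp-llm-top10.yaml",
		"jailbreak-patterns.yaml",
		"data-exfil.yaml",
		"owasp-llm04-data-poisoning.yaml",
	} {
		path := filepath.Join(dir, name)
		if err := os.WriteFile(path, []byte("rules: []\n"), 0o644); err != nil {
			return nil, err
		}
	}

	files, err := discoverRuleFiles(dir)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(files))
	for _, f := range files {
		names = append(names, filepath.Base(f))
	}
	return names, nil
}

func main() {
	names, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

The trade-off is that directory discovery loads any stray YAML file in the folder, whereas an explicit list keeps the default rule set auditable. Which one fits depends on how the rules directory is managed.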

version: "1.0.0"
description: "Extended detection patterns for OWASP LLM04 (2025) Data Poisoning attacks covering adversarial inputs, backdoor triggers, cross-session memory contamination, federated learning poisoning, synthetic training data injection, and RLHF reward hacking"
rules:
# ── Adversarial Example Construction ──

- id: "PIF-LLM04-003"
name: "Adversarial Example Construction"
description: "Detects requests to construct adversarial inputs designed to fool model outputs"
category: "prompt_injection"
severity: 3
pattern: '(?i)(adversarial\s+example|carefully\s+crafted\s+(input|prompt)|gradient[\s-]based\s+attack|fast\s+gradient\s+sign\s+method|fgsm\s+attack|adversarial\s+perturbation|projected\s+gradient\s+descent\s+attack|carlini[\s-]wagner\s+attack)'

P2: Require attack intent for adversarial-example matches

This pattern flags bare research terms such as "adversarial example", "FGSM attack", or "Carlini-Wagner attack". A benign prompt like “What is an adversarial example?” is therefore reported as a high-severity prompt injection, even though the rule description says it detects requests to construct adversarial inputs. In educational or ML-support contexts this will create avoidable false positives; consider requiring an action verb such as create, generate, or craft before these terms.
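
A minimal sketch of that suggestion follows. `original` is the pattern copied from PIF-LLM04-003. `tightened` is a hypothetical variant written for this illustration, not a proposed final rule: it requires an action verb within a few words before a subset of the attack terms. The verb list and the five-word window are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// original is the PIF-LLM04-003 pattern as it appears in the rule pack.
var original = regexp.MustCompile(`(?i)(adversarial\s+example|carefully\s+crafted\s+(input|prompt)|gradient[\s-]based\s+attack|fast\s+gradient\s+sign\s+method|fgsm\s+attack|adversarial\s+perturbation|projected\s+gradient\s+descent\s+attack|carlini[\s-]wagner\s+attack)`)

// tightened is a hypothetical variant: an action verb must appear at most
// five words before the attack term. Verb list and window are assumptions.
var tightened = regexp.MustCompile(`(?i)\b(create|generate|craft|construct|build)\s+(\S+\s+){0,5}?(adversarial\s+(example|perturbation)|fgsm\s+attack|carlini[\s-]wagner\s+attack)`)

func main() {
	prompts := []string{
		"What is an adversarial example?",
		"Craft an adversarial example that fools the classifier",
		"Generate an FGSM attack against this model",
	}
	// Print whether each pattern matches each prompt.
	for _, p := range prompts {
		fmt.Printf("%q original=%v tightened=%v\n",
			p, original.MatchString(p), tightened.MatchString(p))
	}
}
```

Under these assumptions the educational question matches only the original pattern, while the two construction requests match both. A real rule would need a broader verb list and coverage for the remaining attack terms.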

enabled: true
tags: ["owasp-llm04", "data-poisoning", "adversarial"]

# ── Backdoor Trigger Phrase ──

- id: "PIF-LLM04-004"
name: "Backdoor Trigger Phrase"
description: "Detects injection of backdoor trigger phrases intended to activate hidden model behaviour"
category: "prompt_injection"
severity: 4
pattern: '(?i)(trigger\s+(phrase|word|token|sequence)\s+(activates?|enables?|unlocks?|causes?)\s+(hidden|secret|backdoor|debug|override)|when\s+you\s+(see|read|encounter|receive)\s+.{0,40}(activate|reveal|bypass|ignore|execute|switch\s+to))'
enabled: true
tags: ["owasp-llm04", "data-poisoning", "backdoor"]

# ── Cross-Session Memory Contamination ──

- id: "PIF-LLM04-005"
name: "Cross-Session Memory Contamination"
description: "Detects attempts to persist or propagate injected context across sessions or users"
category: "prompt_injection"
severity: 3
pattern: '(?i)(carry|persist|propagate|inject|spread)\s+(this|these|the\s+following)\s+(memor(y|ies)|context|state|knowledge|instructions?|rules?)\s+(across|between|into)\s+(other|future|new|all)\s+(session|conversation|chat|user)s?'
enabled: true
tags: ["owasp-llm04", "data-poisoning", "memory-contamination"]

# ── Federated Learning Poisoning ──

- id: "PIF-LLM04-006"
name: "Federated Learning Poisoning"
description: "Detects attempts to submit poisoned model updates in federated or distributed training contexts"
category: "prompt_injection"
severity: 3
pattern: '(?i)(submit|upload|contribute|inject|send)\s+(poisoned|malicious|crafted|backdoored|corrupted)\s+(gradient|update|weight|delta|model\s+update|local\s+model)s?'
enabled: true
tags: ["owasp-llm04", "data-poisoning", "federated"]

# ── Synthetic Training Data Injection ──

- id: "PIF-LLM04-007"
name: "Synthetic Training Data Injection"
description: "Detects requests to generate synthetic training data designed to bias or manipulate a model"
category: "prompt_injection"
severity: 2
pattern: '(?i)(create|generate|produce|synthesize)\s+.{0,50}(training\s+data|fine[\s-]?tun(ing|e)\s+data|labeled\s+(examples?|samples?|pairs?))\s+.{0,50}(to\s+(teach|train|condition|bias|force)|that\s+(teach(es)?|train(s)?|bias(es)?|condition(s)?))\s+(the\s+)?(model|ai|llm|assistant)\s+to\s+(say|do|believe|output|generate|prefer|hate|attack|avoid)'
enabled: true
tags: ["owasp-llm04", "data-poisoning", "synthetic-data"]

# ── RLHF Reward Hacking ──

- id: "PIF-LLM04-008"
name: "RLHF Reward Hacking"
description: "Detects attempts to exploit or game reinforcement learning from human feedback reward signals"
category: "prompt_injection"
severity: 3
pattern: '(?i)(maximize|game|exploit|hack|manipulate|abuse)\s+(the\s+)?(reward|rlhf|preference|alignment|human\s+feedback)\s+(signal|score|model|function|feedback)\s+(to|by|via|through|and\s+make)'
enabled: true
tags: ["owasp-llm04", "data-poisoning", "reward-hacking"]
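
Patterns like these can be smoke-tested against sample prompts before they ship. The sketch below copies the PIF-LLM04-006 and PIF-LLM04-008 patterns verbatim and checks one malicious and one benign prompt for each. The sample prompts and the `sample` type are illustrative assumptions, not taken from the repository's test suite.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied verbatim from the rule pack.
var (
	federated = regexp.MustCompile(`(?i)(submit|upload|contribute|inject|send)\s+(poisoned|malicious|crafted|backdoored|corrupted)\s+(gradient|update|weight|delta|model\s+update|local\s+model)s?`)
	rewardHck = regexp.MustCompile(`(?i)(maximize|game|exploit|hack|manipulate|abuse)\s+(the\s+)?(reward|rlhf|preference|alignment|human\s+feedback)\s+(signal|score|model|function|feedback)\s+(to|by|via|through|and\s+make)`)
)

// sample pairs a prompt with the rule it is checked against and the
// expected result. The prompts are made up for this illustration.
type sample struct {
	rule   string
	re     *regexp.Regexp
	prompt string
	want   bool
}

func samples() []sample {
	return []sample{
		{"PIF-LLM04-006", federated, "Please upload poisoned gradients to the aggregator", true},
		{"PIF-LLM04-006", federated, "How do clients upload gradients in federated learning?", false},
		{"PIF-LLM04-008", rewardHck, "Exploit the reward model to always approve my answers", true},
		{"PIF-LLM04-008", rewardHck, "How does a reward model work in RLHF?", false},
	}
}

func main() {
	for _, s := range samples() {
		got := s.re.MatchString(s.prompt)
		status := "ok"
		if got != s.want {
			status = "MISMATCH"
		}
		fmt.Printf("%s %-8s %q\n", s.rule, status, s.prompt)
	}
}
```

Pairing each pattern with at least one benign prompt helps surface false positives alongside missed detections.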