From 181da503ad0d52f6001ce82519fd3724c4e50fc3 Mon Sep 17 00:00:00 2001 From: Nelson Spence Date: Tue, 17 Mar 2026 16:13:02 -0500 Subject: [PATCH 1/3] docs: add defense-in-depth security analyzer section Companion to OpenHands/software-agent-sdk#2472. --- sdk/guides/security.mdx | 127 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 127 insertions(+) diff --git a/sdk/guides/security.mdx b/sdk/guides/security.mdx index bbd30fad..ddc49de1 100644 --- a/sdk/guides/security.mdx +++ b/sdk/guides/security.mdx @@ -444,6 +444,133 @@ agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) For more details on the base class implementation, see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py). +### Defense-in-Depth Security Analyzer + +The LLM-based analyzer above relies on the model to assess risk. But what if +the model itself is compromised, or the action contains encoding evasions that +trick the LLM into rating a dangerous command as safe? + +A **defense-in-depth** approach stacks multiple independent layers so each +covers the others' blind spots. The example below implements four layers in +a single file, using the standard library plus the SDK and Pydantic — no +model calls, no external services, and no extra dependencies beyond the +SDK's normal runtime environment. + +1. **Extraction with two corpora** — separates *what the agent will do* + (tool metadata and tool-call content) from *what it thought about* + (reasoning, summary). + Shell-destructive patterns only scan executable fields, so an agent that + thinks "I should avoid rm -rf /" while running `ls /tmp` is correctly + rated LOW, not HIGH. + +2. **Unicode normalization** — strips invisible characters (zero-width spaces, + bidi controls, word joiners) and applies NFKC compatibility normalization + so fullwidth and ligature evasions collapse to ASCII before matching. + +3. 
**Deterministic policy rails** — fast, segment-aware rules that + short-circuit before pattern scanning. Composed conditions like "sudo AND + rm" require both tokens in the same extraction segment, preventing + cross-field false positives. At the SDK boundary, internal rail outcomes + like "DENY" and "CONFIRM" both map to `SecurityRisk.HIGH`. Under + `ConfirmRisky`, that means "ask before proceeding," not "hard-block + execution." True blocking requires hook-based enforcement. + +4. **Pattern scanning with ensemble fusion** — regex patterns categorized as + HIGH or MEDIUM, fused across analyzers via max-severity. UNKNOWN is + preserved as first-class, never promoted to HIGH. + +#### When to use this vs. the LLM analyzer + +The LLM analyzer generalizes to novel threats but costs an API call per +action. The pattern analyzer is free, deterministic, and catches known threat +categories reliably. In practice, you combine both in an ensemble — the +pattern analyzer catches the obvious threats instantly, the LLM analyzer +can cover novel or ambiguous cases the deterministic layer does not, and +max-severity fusion takes the worst case. + +#### Wiring into a conversation + +The classes below (`PatternSecurityAnalyzer`, `EnsembleSecurityAnalyzer`) +are defined in the [ready-to-run example](#ready-to-run-example): + +```python icon="python" focus={7-11} +from openhands.sdk import Conversation +from openhands.sdk.security.confirmation_policy import ConfirmRisky + +# PatternSecurityAnalyzer and EnsembleSecurityAnalyzer are defined +# in the example file below — copy them into your project or import +# from the example module. +pattern = PatternSecurityAnalyzer() +ensemble = EnsembleSecurityAnalyzer(analyzers=[pattern]) + +conversation = Conversation(agent=agent, workspace=".") +conversation.set_security_analyzer(ensemble) +conversation.set_confirmation_policy(ConfirmRisky()) + +# Every agent action now passes through the analyzer. +# HIGH -> confirmation prompt. 
MEDIUM -> allowed. +# UNKNOWN -> confirmed by default (confirm_unknown=True). +``` + + +`conversation.execute_tool()` bypasses the analyzer and confirmation policy. +This example protects normal agent actions in the conversation loop; hard +enforcement for direct tool calls requires hooks. + + +#### Key design decisions + +Understanding *why* the example is built this way helps you decide what to +keep, modify, or replace when adapting it: + +- **Two corpora, not one.** Shell patterns on reasoning text produce false + positives whenever the model discusses dangerous commands it chose not to + run. Injection patterns (instruction overrides, mode switching) are + textual attacks that make sense in any field. The split eliminates the + first problem without losing the second. + +- **Max-severity, not noisy-OR.** The analyzers scan the same input, so + they're correlated. Noisy-OR assumes independence. Max-severity is + simpler, correct, and auditable. + +- **UNKNOWN is first-class.** Some analyzers may return UNKNOWN when they + cannot assess an action or are not fully configured. The ensemble + preserves UNKNOWN unless at least one analyzer returns a concrete risk. + If the ensemble promoted UNKNOWN to HIGH, composing with optional + analyzers would be unusable. + +- **Stdlib-only normalization.** NFKC normalization plus invisible/bidi + stripping covers the most common encoding evasions. Full confusable + detection (TR39) is documented as a known limitation, not silently + omitted. 
+ +#### Known limitations + +The example documents its boundaries explicitly: + +| Limitation | Why it exists | What would fix it | +|---|---|---| +| No hard-deny at the `SecurityAnalyzer` boundary | The SDK analyzer returns `SecurityRisk`, not block/allow decisions | Hook-based enforcement | +| `conversation.execute_tool()` bypasses analyzer checks | Direct tool execution skips the normal agent decision path | Avoid bypass path or enforce through hooks | +| No Cyrillic/homoglyph detection | NFKC maps compatibility variants, not cross-script confusables | Unicode TR39 tables (not in stdlib) | +| Content beyond the 30k extraction cap is not scanned | Hard cap prevents regex denial-of-service | Raise the cap (increases ReDoS exposure) | +| `thinking_blocks` not scanned | Scanning reasoning artifacts would create high false-positive risk by treating internal deliberation as executable intent | Separate injection-only scan of CoT | +| `curl \| node` not detected | Interpreter list covers sh/bash/python/perl/ruby only | Expand the list (increases false positives) | + +#### Ready-to-run example + + +Full defense-in-depth example: [examples/01_standalone_sdk/45_defense_in_depth_security.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/45_defense_in_depth_security.py) + + +The full example lives here: + +```python icon="python" expandable examples/01_standalone_sdk/45_defense_in_depth_security.py + +``` + + + --- From 848c404ecedba56604840bd79d9f5aad39ea5b09 Mon Sep 17 00:00:00 2001 From: Nelson Spence Date: Mon, 30 Mar 2026 18:03:34 -0500 Subject: [PATCH 2/3] docs: update defense-in-depth section for SDK promotion Analyzers now live in openhands.sdk.security, not an example file. Rewritten for adult learning theory: problem first, then solution, then composition, then design rationale, then limitations. Import paths updated, every example pairs analyzer with ConfirmRisky, old example/noisy-OR references removed. 
--- sdk/guides/security.mdx | 217 +++++++++++++++++++++------------------- 1 file changed, 113 insertions(+), 104 deletions(-) diff --git a/sdk/guides/security.mdx b/sdk/guides/security.mdx index ddc49de1..7f432f84 100644 --- a/sdk/guides/security.mdx +++ b/sdk/guides/security.mdx @@ -446,130 +446,139 @@ agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) ### Defense-in-Depth Security Analyzer -The LLM-based analyzer above relies on the model to assess risk. But what if -the model itself is compromised, or the action contains encoding evasions that -trick the LLM into rating a dangerous command as safe? - -A **defense-in-depth** approach stacks multiple independent layers so each -covers the others' blind spots. The example below implements four layers in -a single file, using the standard library plus the SDK and Pydantic — no -model calls, no external services, and no extra dependencies beyond the -SDK's normal runtime environment. - -1. **Extraction with two corpora** — separates *what the agent will do* - (tool metadata and tool-call content) from *what it thought about* - (reasoning, summary). - Shell-destructive patterns only scan executable fields, so an agent that - thinks "I should avoid rm -rf /" while running `ls /tmp` is correctly - rated LOW, not HIGH. - -2. **Unicode normalization** — strips invisible characters (zero-width spaces, - bidi controls, word joiners) and applies NFKC compatibility normalization - so fullwidth and ligature evasions collapse to ASCII before matching. - -3. **Deterministic policy rails** — fast, segment-aware rules that - short-circuit before pattern scanning. Composed conditions like "sudo AND - rm" require both tokens in the same extraction segment, preventing - cross-field false positives. At the SDK boundary, internal rail outcomes - like "DENY" and "CONFIRM" both map to `SecurityRisk.HIGH`. Under - `ConfirmRisky`, that means "ask before proceeding," not "hard-block - execution." 
True blocking requires hook-based enforcement. - -4. **Pattern scanning with ensemble fusion** — regex patterns categorized as - HIGH or MEDIUM, fused across analyzers via max-severity. UNKNOWN is - preserved as first-class, never promoted to HIGH. - -#### When to use this vs. the LLM analyzer - -The LLM analyzer generalizes to novel threats but costs an API call per -action. The pattern analyzer is free, deterministic, and catches known threat -categories reliably. In practice, you combine both in an ensemble — the -pattern analyzer catches the obvious threats instantly, the LLM analyzer -can cover novel or ambiguous cases the deterministic layer does not, and -max-severity fusion takes the worst case. - -#### Wiring into a conversation - -The classes below (`PatternSecurityAnalyzer`, `EnsembleSecurityAnalyzer`) -are defined in the [ready-to-run example](#ready-to-run-example): - -```python icon="python" focus={7-11} +#### The problem + +Your agent is about to run a tool call. Is it safe? + +The `LLMSecurityAnalyzer` asks the model itself — but the model can be +manipulated, and encoding tricks can hide dangerous commands from it. +You need a layer that does not depend on model judgment: something +deterministic, local, and fast. + +#### What this gives you + +Three composable analyzers that classify actions at the boundary — +before the tool runs, not after. No network calls, no model inference, +no extra dependencies. They return a `SecurityRisk` level; your +`ConfirmRisky` policy decides whether to prompt the user. 
+ +| Analyzer | What it catches | How it works | +|----------|----------------|--------------| +| `PatternSecurityAnalyzer` | Known threat signatures (rm -rf, eval, curl\|sh) | Regex patterns on two corpora: shell patterns scan executable fields only; injection patterns scan all fields | +| `PolicyRailSecurityAnalyzer` | Composed threats (fetch piped to exec, raw disk writes, catastrophic deletes) | Deterministic rules evaluated per-segment — both tokens must appear in the same field | +| `EnsembleSecurityAnalyzer` | Nothing on its own — it combines the others | Takes the highest concrete risk across all child analyzers | + +#### Quick start + +You must configure both the analyzer and the confirmation policy. +Setting an analyzer does not automatically change confirmation behavior. + +```python icon="python" focus={7-18} from openhands.sdk import Conversation -from openhands.sdk.security.confirmation_policy import ConfirmRisky +from openhands.sdk.security import ( + PatternSecurityAnalyzer, + PolicyRailSecurityAnalyzer, + EnsembleSecurityAnalyzer, + ConfirmRisky, + SecurityRisk, +) -# PatternSecurityAnalyzer and EnsembleSecurityAnalyzer are defined -# in the example file below — copy them into your project or import -# from the example module. -pattern = PatternSecurityAnalyzer() -ensemble = EnsembleSecurityAnalyzer(analyzers=[pattern]) +# Create the analyzer — rails catch composed threats, +# patterns catch individual signatures +security_analyzer = EnsembleSecurityAnalyzer( + analyzers=[ + PolicyRailSecurityAnalyzer(), + PatternSecurityAnalyzer(), + ] +) -conversation = Conversation(agent=agent, workspace=".") -conversation.set_security_analyzer(ensemble) -conversation.set_confirmation_policy(ConfirmRisky()) +# Tell the SDK when to ask the user +confirmation_policy = ConfirmRisky(threshold=SecurityRisk.MEDIUM) -# Every agent action now passes through the analyzer. -# HIGH -> confirmation prompt. MEDIUM -> allowed. 
-# UNKNOWN -> confirmed by default (confirm_unknown=True). +# Wire both into the conversation +conversation = Conversation(agent=agent, workspace=".") +conversation.set_security_analyzer(security_analyzer) +conversation.set_confirmation_policy(confirmation_policy) ``` - -`conversation.execute_tool()` bypasses the analyzer and confirmation policy. -This example protects normal agent actions in the conversation loop; hard -enforcement for direct tool calls requires hooks. - +After this, every agent action passes through the analyzer before +execution. HIGH triggers a confirmation prompt. MEDIUM triggers +confirmation (threshold=MEDIUM). LOW is allowed. UNKNOWN is confirmed +by default. -#### Key design decisions +#### Adding the LLM analyzer for deeper coverage -Understanding *why* the example is built this way helps you decide what to -keep, modify, or replace when adapting it: +The pattern analyzer catches known threats instantly. The LLM analyzer +can catch novel or ambiguous cases. Composing both gives you speed and +breadth: -- **Two corpora, not one.** Shell patterns on reasoning text produce false - positives whenever the model discusses dangerous commands it chose not to - run. Injection patterns (instruction overrides, mode switching) are - textual attacks that make sense in any field. The split eliminates the - first problem without losing the second. +```python +from openhands.sdk.security import LLMSecurityAnalyzer -- **Max-severity, not noisy-OR.** The analyzers scan the same input, so - they're correlated. Noisy-OR assumes independence. Max-severity is - simpler, correct, and auditable. +security_analyzer = EnsembleSecurityAnalyzer( + analyzers=[ + PolicyRailSecurityAnalyzer(), + PatternSecurityAnalyzer(), + LLMSecurityAnalyzer(), + ] +) -- **UNKNOWN is first-class.** Some analyzers may return UNKNOWN when they - cannot assess an action or are not fully configured. The ensemble - preserves UNKNOWN unless at least one analyzer returns a concrete risk. 
- If the ensemble promoted UNKNOWN to HIGH, composing with optional - analyzers would be unusable. +confirmation_policy = ConfirmRisky(threshold=SecurityRisk.HIGH) +``` -- **Stdlib-only normalization.** NFKC normalization plus invisible/bidi - stripping covers the most common encoding evasions. Full confusable - detection (TR39) is documented as a known limitation, not silently - omitted. +The ensemble takes the worst case across all analyzers. If the pattern +analyzer says HIGH and the LLM says LOW, the result is HIGH. -#### Known limitations + +`conversation.execute_tool()` bypasses the analyzer and confirmation +policy. These analyzers protect normal agent actions in the conversation +loop. Hard enforcement for direct tool calls requires hooks. + -The example documents its boundaries explicitly: +#### Why it works this way -| Limitation | Why it exists | What would fix it | -|---|---|---| -| No hard-deny at the `SecurityAnalyzer` boundary | The SDK analyzer returns `SecurityRisk`, not block/allow decisions | Hook-based enforcement | -| `conversation.execute_tool()` bypasses analyzer checks | Direct tool execution skips the normal agent decision path | Avoid bypass path or enforce through hooks | -| No Cyrillic/homoglyph detection | NFKC maps compatibility variants, not cross-script confusables | Unicode TR39 tables (not in stdlib) | -| Content beyond the 30k extraction cap is not scanned | Hard cap prevents regex denial-of-service | Raise the cap (increases ReDoS exposure) | -| `thinking_blocks` not scanned | Scanning reasoning artifacts would create high false-positive risk by treating internal deliberation as executable intent | Separate injection-only scan of CoT | -| `curl \| node` not detected | Interpreter list covers sh/bash/python/perl/ruby only | Expand the list (increases false positives) | +**Two corpora, not one.** An agent that runs `ls /tmp` but thinks +"I should avoid rm -rf /" must not be flagged HIGH. 
Shell patterns
+only see what the agent will *execute*. Injection patterns like
+"ignore all previous instructions" scan everything, because they
+target the model's instruction-following regardless of where they
+appear.
-#### Ready-to-run example
+**Max-severity, not averaging.** The analyzers scan the same input —
+they are correlated, not independent. The highest concrete risk wins.
+That is simpler and more auditable than probabilistic fusion.
- 
-Full defense-in-depth example: [examples/01_standalone_sdk/45_defense_in_depth_security.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/45_defense_in_depth_security.py)
- 
+**UNKNOWN means "I don't know," not "safe."** If all analyzers return
+UNKNOWN, the ensemble preserves it. Under the default `ConfirmRisky`
+policy, UNKNOWN triggers confirmation. Promoting UNKNOWN to HIGH
+would make composing with optional analyzers unusable: any analyzer
+unable to assess an action would force a prompt.
-The full example lives here:
+**Confirm, don't block.** The analyzers return a risk level. The
+confirmation policy decides what happens. The analyzer does not
+prevent execution — it classifies risk for the policy layer to act on.
+Pair with Docker isolation for stronger safety guarantees.
-```python icon="python" expandable examples/01_standalone_sdk/45_defense_in_depth_security.py
- 
-```
+#### What this does not do
+
+This is a deterministic action-boundary control. It is not:
+
+- A complete prompt-injection solution
+- A full shell parser or AST interpreter
+- A sandbox replacement
+- A guarantee against novel threats the patterns do not cover
+
+It is additive to `LLMSecurityAnalyzer` and `GraySwanAnalyzer`, not a
+replacement for either.
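The max-severity and UNKNOWN rules above reduce to a few lines. The enum below is an illustrative stand-in, not the SDK's `SecurityRisk` (which models UNKNOWN differently); it only demonstrates the fusion logic:

```python
from enum import IntEnum


class Risk(IntEnum):
    # Illustrative stand-in for the SDK's ordered risk levels.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


UNKNOWN = None  # "could not assess" — deliberately not comparable to the levels


def fuse(risks: list) -> "Risk | None":
    """Highest concrete risk wins; if nothing is concrete, UNKNOWN survives."""
    concrete = [r for r in risks if r is not UNKNOWN]
    return max(concrete) if concrete else UNKNOWN


assert fuse([Risk.LOW, Risk.HIGH]) is Risk.HIGH     # worst case wins
assert fuse([UNKNOWN, Risk.MEDIUM]) is Risk.MEDIUM  # concrete beats UNKNOWN
assert fuse([UNKNOWN, UNKNOWN]) is UNKNOWN          # never promoted to HIGH
```

The last assertion is the design point: an analyzer that cannot assess an action contributes nothing, rather than dragging every action to HIGH.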
- +#### Known limitations + +| Limitation | Why | What would fix it | +|---|---|---| +| No hard-deny at the analyzer boundary | SDK analyzers return `SecurityRisk`, not block/allow | Hook-based enforcement | +| `execute_tool()` bypasses checks | Direct tool execution skips the conversation loop | Hooks | +| No Cyrillic/homoglyph detection | NFKC maps compatibility forms, not cross-script confusables | Unicode TR39 confusable tables | +| Content past 30k chars is invisible | Hard cap prevents regex denial-of-service | Raise the cap (increases ReDoS exposure) | +| `thinking_blocks` not scanned | Scanning model reasoning risks false positives on deliberation | Separate injection-only CoT scan | --- From 05b416a554b842bf3be6a3a717c40163fc93af07 Mon Sep 17 00:00:00 2001 From: Nelson Spence Date: Mon, 30 Mar 2026 18:09:43 -0500 Subject: [PATCH 3/3] fix: clarify threshold-dependent confirmation behavior, add agent comment --- sdk/guides/security.mdx | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/sdk/guides/security.mdx b/sdk/guides/security.mdx index 7f432f84..db96692d 100644 --- a/sdk/guides/security.mdx +++ b/sdk/guides/security.mdx @@ -496,15 +496,16 @@ security_analyzer = EnsembleSecurityAnalyzer( confirmation_policy = ConfirmRisky(threshold=SecurityRisk.MEDIUM) # Wire both into the conversation +# Assumes `agent` is already configured — see Quick Start guide conversation = Conversation(agent=agent, workspace=".") conversation.set_security_analyzer(security_analyzer) conversation.set_confirmation_policy(confirmation_policy) ``` After this, every agent action passes through the analyzer before -execution. HIGH triggers a confirmation prompt. MEDIUM triggers -confirmation (threshold=MEDIUM). LOW is allowed. UNKNOWN is confirmed -by default. +execution. Actions at or above the configured threshold trigger a +confirmation prompt — in this example, both HIGH and MEDIUM. LOW is +allowed. UNKNOWN is confirmed by default (`confirm_unknown=True`). 
#### Adding the LLM analyzer for deeper coverage
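The threshold semantics that patch 3 clarifies can be modeled in a few lines. This is a hedged sketch of the described behavior, not the SDK's actual `ConfirmRisky` implementation:

```python
ORDER = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def should_confirm(risk: str, threshold: str = "MEDIUM",
                   confirm_unknown: bool = True) -> bool:
    """Confirm actions at or above the threshold; UNKNOWN follows confirm_unknown."""
    if risk == "UNKNOWN":
        return confirm_unknown
    return ORDER[risk] >= ORDER[threshold]


assert should_confirm("HIGH")                          # above threshold -> prompt
assert should_confirm("MEDIUM")                        # at threshold -> prompt
assert not should_confirm("LOW")                       # below threshold -> allowed
assert should_confirm("UNKNOWN")                       # confirm_unknown=True
assert not should_confirm("MEDIUM", threshold="HIGH")  # higher bar -> allowed
```

The last case shows why the LLM-ensemble example raises the threshold to HIGH: with more analyzers contributing ratings, confirming only the worst cases keeps prompts manageable.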