Commit 181da50

docs: add defense-in-depth security analyzer section
Companion to OpenHands/software-agent-sdk#2472.
1 parent dab64db commit 181da50

1 file changed

Lines changed: 127 additions & 0 deletions

sdk/guides/security.mdx

@@ -444,6 +444,133 @@ agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer)
For more details on the base class implementation, see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py).
</Tip>

### Defense-in-Depth Security Analyzer

The LLM-based analyzer above relies on the model to assess risk. But what if
the model itself is compromised, or the action contains encoding evasions that
trick the LLM into rating a dangerous command as safe?

A **defense-in-depth** approach stacks multiple independent layers so each
covers the others' blind spots. The example below implements four layers in
a single file, using only the standard library, the SDK, and Pydantic: no
model calls, no external services, and no dependencies beyond the SDK's
normal runtime environment.

1. **Extraction with two corpora**: separates *what the agent will do*
   (tool metadata and tool-call content) from *what it thought about*
   (reasoning, summary).
   Shell-destructive patterns scan only executable fields, so an agent that
   thinks "I should avoid rm -rf /" while running `ls /tmp` is correctly
   rated LOW, not HIGH.

2. **Unicode normalization**: strips invisible characters (zero-width spaces,
   bidi controls, word joiners) and applies NFKC compatibility normalization
   so fullwidth and ligature evasions collapse to ASCII before matching.

3. **Deterministic policy rails**: fast, segment-aware rules that
   short-circuit before pattern scanning. Composed conditions like "sudo AND
   rm" require both tokens in the same extraction segment, which prevents
   cross-field false positives. At the SDK boundary, internal rail outcomes
   like "DENY" and "CONFIRM" both map to `SecurityRisk.HIGH`. Under
   `ConfirmRisky`, that means "ask before proceeding," not "hard-block
   execution." True blocking requires hook-based enforcement.

4. **Pattern scanning with ensemble fusion**: regex patterns categorized as
   HIGH or MEDIUM, fused across analyzers via max-severity. UNKNOWN is
   preserved as first-class and is never promoted to HIGH.

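Layers 1 and 3 can be sketched together as follows. This is a minimal illustration, not the code from the example file: the field names (`tool_name`, `command`, `thought`, `summary`), the `Corpora` class, and the `sudo_rm_rail` helper are all hypothetical, and the action is modeled as a plain dict instead of an SDK event.

```python
import re
from dataclasses import dataclass, field

# Hypothetical field names; the real example extracts from SDK action events.
EXECUTABLE_FIELDS = ("tool_name", "command")
REASONING_FIELDS = ("thought", "summary")


@dataclass
class Corpora:
    """Two separate corpora: what the agent will do vs. what it thought about."""

    executable: list[str] = field(default_factory=list)
    reasoning: list[str] = field(default_factory=list)


def extract(action: dict[str, str]) -> Corpora:
    """Keep each non-empty field as its own segment in the matching corpus."""
    return Corpora(
        executable=[action[k] for k in EXECUTABLE_FIELDS if action.get(k)],
        reasoning=[action[k] for k in REASONING_FIELDS if action.get(k)],
    )


def sudo_rm_rail(corpora: Corpora) -> bool:
    """Fire only when `sudo` and `rm` are tokens of the same executable segment."""
    for segment in corpora.executable:
        tokens = set(re.findall(r"[A-Za-z0-9_./-]+", segment))
        if {"sudo", "rm"} <= tokens:
            return True
    return False
```

Because the rail never looks at `reasoning` and never joins segments, a thought that mentions `sudo rm` while the command is `ls /tmp` does not trigger it, and neither does `sudo` in one field combined with `rm` in another.
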
#### When to use this vs. the LLM analyzer

The LLM analyzer generalizes to novel threats but costs an API call per
action. The pattern analyzer makes no API calls, is deterministic, and
catches known threat categories reliably. In practice, you combine both in an
ensemble: the pattern analyzer catches the obvious threats instantly, the LLM
analyzer covers novel or ambiguous cases the deterministic layer misses, and
max-severity fusion takes the worst case.

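Max-severity fusion can be sketched in a few lines. The `Risk` enum below is a local stand-in for the SDK's `SecurityRisk`, and `fuse` is a hypothetical helper name; the sketch assumes each analyzer has already produced one risk level for the same action.

```python
from enum import Enum


class Risk(Enum):
    """Local stand-in for the SDK's SecurityRisk levels."""

    UNKNOWN = "UNKNOWN"
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


# UNKNOWN is deliberately absent: it has no position on the severity scale.
_SEVERITY = {Risk.LOW: 0, Risk.MEDIUM: 1, Risk.HIGH: 2}


def fuse(results: list[Risk]) -> Risk:
    """Return the worst concrete risk, or UNKNOWN if no analyzer gave one."""
    concrete = [r for r in results if r is not Risk.UNKNOWN]
    if not concrete:
        return Risk.UNKNOWN
    return max(concrete, key=_SEVERITY.__getitem__)
```

With this rule, `fuse([Risk.UNKNOWN, Risk.LOW])` yields `Risk.LOW`, so an unconfigured optional analyzer does not force a confirmation on its own, while a single HIGH from any analyzer still wins.
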
#### Wiring into a conversation

The classes below (`PatternSecurityAnalyzer`, `EnsembleSecurityAnalyzer`)
are defined in the [ready-to-run example](#ready-to-run-example):

```python icon="python" focus={7-11}
from openhands.sdk import Conversation
from openhands.sdk.security.confirmation_policy import ConfirmRisky

# PatternSecurityAnalyzer and EnsembleSecurityAnalyzer are defined
# in the example file below; copy them into your project or import
# from the example module.
pattern = PatternSecurityAnalyzer()
ensemble = EnsembleSecurityAnalyzer(analyzers=[pattern])

conversation = Conversation(agent=agent, workspace=".")
conversation.set_security_analyzer(ensemble)
conversation.set_confirmation_policy(ConfirmRisky())

# Every agent action now passes through the analyzer.
# HIGH -> confirmation prompt. MEDIUM -> allowed.
# UNKNOWN -> confirmed by default (confirm_unknown=True).
```

<Warning>
`conversation.execute_tool()` bypasses the analyzer and confirmation policy.
This example protects normal agent actions in the conversation loop; hard
enforcement for direct tool calls requires hooks.
</Warning>

#### Key design decisions

Understanding *why* the example is built this way helps you decide what to
keep, modify, or replace when adapting it:

- **Two corpora, not one.** Shell patterns on reasoning text produce false
  positives whenever the model discusses dangerous commands it chose not to
  run. Injection patterns (instruction overrides, mode switching) are
  textual attacks that make sense in any field. The split eliminates the
  first problem without losing coverage of the second.

- **Max-severity, not noisy-OR.** The analyzers scan the same input, so
  their outputs are correlated, whereas noisy-OR assumes independence.
  Max-severity needs no such assumption and is simpler to audit.

- **UNKNOWN is first-class.** Some analyzers may return UNKNOWN when they
  cannot assess an action or are not fully configured. The ensemble
  preserves UNKNOWN unless at least one analyzer returns a concrete risk.
  If the ensemble promoted UNKNOWN to HIGH, composing with optional
  analyzers would be unusable.

- **Stdlib-only normalization.** NFKC normalization plus invisible/bidi
  stripping covers the most common encoding evasions. Full confusable
  detection (TR39) is documented as a known limitation, not silently
  omitted.

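The stdlib-only normalization step might look like the sketch below. It assumes that dropping every character in Unicode category `Cf` (format characters, which include zero-width spaces, bidi controls, and word joiners) is an acceptable approximation of "invisible/bidi stripping"; the example file may use an explicit character set instead, and `normalize` is a hypothetical name.

```python
import unicodedata


def normalize(text: str) -> str:
    """Strip invisible format characters, then fold compatibility variants.

    Assumption: category "Cf" is used as the definition of "invisible".
    """
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # NFKC maps fullwidth letters and ligatures to their plain equivalents.
    return unicodedata.normalize("NFKC", visible)
```

After this step, a fullwidth `ｒｍ` or an `r`, zero-width space, `m` sequence both read as plain `rm`, so the same ASCII patterns apply to either form.
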
#### Known limitations

The example documents its boundaries explicitly:

| Limitation | Why it exists | What would fix it |
|---|---|---|
| No hard-deny at the `SecurityAnalyzer` boundary | The SDK analyzer returns `SecurityRisk`, not block/allow decisions | Hook-based enforcement |
| `conversation.execute_tool()` bypasses analyzer checks | Direct tool execution skips the normal agent decision path | Avoid bypass path or enforce through hooks |
| No Cyrillic/homoglyph detection | NFKC maps compatibility variants, not cross-script confusables | Unicode TR39 tables (not in stdlib) |
| Content beyond the 30k extraction cap is not scanned | Hard cap prevents regex denial-of-service | Raise the cap (increases ReDoS exposure) |
| `thinking_blocks` not scanned | Scanning reasoning artifacts would create high false-positive risk by treating internal deliberation as executable intent | Separate injection-only scan of CoT |
| `curl \| node` not detected | Interpreter list covers sh/bash/python/perl/ruby only | Expand the list (increases false positives) |

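Two of these limitations are easy to reproduce with the standard library alone. The sketch below is illustrative only: the `SUDO` pattern and both helper names are hypothetical, and it assumes the 30k cap is counted in characters, which the table does not specify.

```python
import re
import unicodedata

SUDO = re.compile(r"\bsudo\b")  # hypothetical pattern for illustration
EXTRACTION_CAP = 30_000  # assumption: the 30k cap is measured in characters


def matches_after_nfkc(text: str) -> bool:
    """Apply NFKC only, then look for the ASCII token `sudo`."""
    return SUDO.search(unicodedata.normalize("NFKC", text)) is not None


def matches_within_cap(text: str) -> bool:
    """Scan only the first EXTRACTION_CAP characters of the input."""
    return SUDO.search(text[:EXTRACTION_CAP]) is not None


# U+043E CYRILLIC SMALL LETTER O looks like Latin "o" but is a different
# code point, and NFKC leaves it unchanged.
cyrillic_lookalike = "sud\u043e rm -rf /tmp/x"
fullwidth_variant = "\uff53\uff55\uff44\uff4f rm -rf /tmp/x"
padded_payload = "a" * EXTRACTION_CAP + " sudo rm -rf /tmp/x"
```

Here `matches_after_nfkc(fullwidth_variant)` succeeds because fullwidth letters are compatibility variants, while `matches_after_nfkc(cyrillic_lookalike)` does not, and `matches_within_cap(padded_payload)` misses the command because it sits past the cap.
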
#### Ready-to-run example

<Note>
Full defense-in-depth example: [examples/01_standalone_sdk/45_defense_in_depth_security.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/45_defense_in_depth_security.py)
</Note>

The full example lives here:

```python icon="python" expandable examples/01_standalone_sdk/45_defense_in_depth_security.py
<code will be auto-synced from agent-sdk>
```

<RunExampleCode path_to_script="examples/01_standalone_sdk/45_defense_in_depth_security.py"/>

---
