
Commit ec868fd

hyperpolymath and claude committed
feat(assail): exempt JSON-LD / JSON-Schema identifier URIs from InsecureProtocol
The cross-language InsecureProtocol detector was flagging JSON-LD `@type`, `@id`, and `@context` namespace URIs and JSON-Schema `$schema` identifiers as if they were configured HTTP endpoints. They are not: per spec, those URIs are identifiers (often a historical `http://` even for schemas served over HTTPS or not at all), not endpoints the program is configured to call.

Choice rationale (vs VeriSimDB / user-classification registry):
- VeriSimDB is storage + query, not a classifier. It cannot pre-empt an FP at detection time; it would just persist the FP and need a downstream rule.
- The user-classification registry (`audits/assail-classifications.a2ml`) is the right tool for per-instance audited TPs (`UnsafeCode in zig_bridge.rs §1`, etc.), but JSON-LD identifier URIs are a CATEGORICAL false-positive class shared by every JSON-LD / JSON-Schema consumer in the estate. Suppressing them categorically in the detector removes a recurring tax across the whole repo set.

Fix: a new `RE_HTTP_JSONLD_IDENTIFIER` regex matches the standard JSON-LD / JSON-Schema identifier keys (scalar or array form), and its hits are subtracted from the total before reporting. Both shapes are covered:

  {"@type": "http://..."}
  {"types": ["http://..."]}
  {"$schema": "http://..."}

Exempted keys: @id, @type, @context, @vocab, @graph (JSON-LD); id, type, types (common shorthands); $schema, $id, $ref (JSON Schema). Because the regex makes the `@` prefix optional for every name in the first group, bare `context`, `vocab`, and `graph` keys (and `@types`) are exempted as well.

Genuine endpoints remain flagged. A field keyed `"url"`, `"endpoint"`, `"api_url"`, etc. is not in the exempt set, so a real config URL such as `{"url": "http://insecure.example.com"}` still produces a finding.

Test fixtures use a runtime-composed URL (`format!("htt{}p://...", "")`) so the test source itself contains no literal `http://[alphanum]` substring. This prevents a meta-circular finding when panic-attack scans its own analyzer.rs.
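The runtime-composition trick from the last paragraph can be sketched in isolation. This is a minimal sketch, assuming the scanner looks for `http://` immediately followed by an alphanumeric character (the `http://[alphanum]` shape named above); `compose_insecure_url` and `source_has_literal_http_url` are hypothetical names, not panic-attack functions.

```rust
/// Hypothetical helper (not part of panic-attack): builds an `http://` URL at
/// runtime so the source text never contains the scheme-plus-host substring.
pub fn compose_insecure_url(host_and_path: &str) -> String {
    // The empty-string argument fills the `{}` between "htt" and "p", so the
    // template reads `htt{}p://` in source but yields `http://` at runtime.
    format!("htt{}p://{}", "", host_and_path)
}

/// Naive stand-in for the scanner's check (assumption: `http://` followed by
/// an ASCII alphanumeric character counts as a hit).
pub fn source_has_literal_http_url(source: &str) -> bool {
    source.match_indices("http://").any(|(i, m)| {
        source[i + m.len()..]
            .chars()
            .next()
            .map_or(false, |c| c.is_ascii_alphanumeric())
    })
}
```

The test source line (the `format!` template) contains no hit, while the string it produces at runtime does, which is exactly the split the fixtures rely on.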
Verification:
- cargo test --bin panic-attack --features signing,http: 249 passed, 0 failed (+7 new tests: 4 JSON-LD exempt cases, 1 JSON Schema case, 2 inverse "still-flagged" invariants)
- cargo clippy --all-targets --features signing,http -- -D warnings: clean
- cargo fmt --check: clean
- Self-scan progression (cumulative across this session):
  - baseline: 12 findings (1 Critical UnboundedAlloc, 2 InsecureProtocol FPs)
  - after #51: 11 findings (Critical resolved)
  - after #52: 11 findings (1 doc-comment InsecureProtocol FP resolved; 1 JSON-LD literal FP remained)
  - after THIS: 10 findings (last InsecureProtocol FP resolved; all 10 remaining are intentional: test unwraps, examples/vulnerable_program unsafe blocks, etc.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ff3552e commit ec868fd

1 file changed

Lines changed: 101 additions & 1 deletion


src/assail/analyzer.rs

@@ -242,6 +242,7 @@ static RE_PONY_FFI: OnceLock<Regex> = OnceLock::new();
 static RE_SHELL_UNQUOTED_VAR: OnceLock<Regex> = OnceLock::new();
 static RE_HTTP_URL: OnceLock<Regex> = OnceLock::new();
 static RE_HTTP_LOCALHOST: OnceLock<Regex> = OnceLock::new();
+static RE_HTTP_JSONLD_IDENTIFIER: OnceLock<Regex> = OnceLock::new();
 static RE_HARDCODED_SECRET: OnceLock<Regex> = OnceLock::new();
 /// Match TODO/FIXME/HACK/XXX markers only when preceded by a
 /// comment-starter on the same line. Excludes string-literal matches
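The hunk above declares one more process-wide `OnceLock<Regex>`, and the next hunk fills it with `get_or_init`, so the pattern is built on first use and reused afterwards. Below is a std-only sketch of that lazy-initialization pattern, assuming a plain table of the documented exempt keys in place of a compiled `Regex` (the `regex` crate is not part of `std`); `EXEMPT_KEYS`, `INIT_CALLS`, `exempt_keys`, and `init_calls` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the initializer actually runs (illustration only).
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for `RE_HTTP_JSONLD_IDENTIFIER`: a lazily built key
// table instead of a compiled `Regex`, so the sketch needs only `std`.
static EXEMPT_KEYS: OnceLock<Vec<&'static str>> = OnceLock::new();

pub fn exempt_keys() -> &'static [&'static str] {
    EXEMPT_KEYS
        .get_or_init(|| {
            // Runs at most once, even if several threads race to call it.
            INIT_CALLS.fetch_add(1, Ordering::SeqCst);
            vec![
                "@id", "@type", "@context", "@vocab", "@graph", // JSON-LD
                "id", "type", "types", // common shorthands
                "$schema", "$id", "$ref", // JSON Schema
            ]
        })
        .as_slice()
}

pub fn init_calls() -> usize {
    INIT_CALLS.load(Ordering::SeqCst)
}
```

Every call after the first returns the same `'static` reference without rebuilding anything, which is why the analyzer can call `get_or_init` on each scan without recompiling the regex.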
@@ -4747,9 +4748,31 @@ impl Analyzer {
 Regex::new(r#"http://(localhost|127\.0\.0\.1|0\.0\.0\.0|\[::1\])"#)
 .expect("static regex is valid")
 });
+// Subtract JSON-LD / JSON-Schema identifier URIs. These look like
+// URLs but are namespace identifiers — they're not dereferenced at
+// runtime; the HTTP scheme is a spec convention. Suppressing them
+// here avoids a categorical FP class without requiring per-instance
+// user-classification entries. Exempted keys:
+//
+// @id, @type, @context, @vocab, @graph (JSON-LD)
+// id, type, types (common shorthands)
+// $schema, $id, $ref (JSON Schema)
+//
+// The match window is the JSON key + `:` + optional array bracket +
+// the opening `"http://...`, so it catches both scalar (`"@id":
+// "http://..."`) and array (`"types": ["http://..."]`) forms.
+let http_jsonld_re = RE_HTTP_JSONLD_IDENTIFIER.get_or_init(|| {
+Regex::new(
+r#""(@?(id|type|types|context|vocab|graph)|\$(schema|id|ref))"\s*:\s*\[?\s*"http://"#,
+)
+.expect("static regex is valid")
+});
 let http_total = http_re.find_iter(scan_content).count();
 let http_local = http_localhost_re.find_iter(scan_content).count();
-let http_count = http_total.saturating_sub(http_local);
+let http_jsonld = http_jsonld_re.find_iter(scan_content).count();
+let http_count = http_total
+.saturating_sub(http_local)
+.saturating_sub(http_jsonld);
 if http_count > 0 {
 weak_points.push(WeakPoint {
 file: None,
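To make the match window and the subtraction concrete, here is a std-only approximation of `RE_HTTP_JSONLD_IDENTIFIER` plus the `saturating_sub` arithmetic from the hunk above. It is a sketch, not the shipped code: the real detector uses the `regex` crate's `find_iter`, all helper names here are hypothetical, and because the `RE_HTTP_URL` pattern is not shown in this diff, `count_http` simply assumes one hit per `http://` occurrence.

```rust
/// Mirrors the key alternation of the regex:
/// `@?(id|type|types|context|vocab|graph)|\$(schema|id|ref)`.
/// The optional `@` also admits bare `context`, `vocab`, and `graph`.
pub fn is_exempt_key(key: &str) -> bool {
    let bare = key.strip_prefix('@').unwrap_or(key);
    matches!(bare, "id" | "type" | "types" | "context" | "vocab" | "graph")
        || matches!(key, "$schema" | "$id" | "$ref")
}

/// Tries to match `"KEY"\s*:\s*\[?\s*"http://` with the opening quote at byte
/// offset `start`; returns the byte offset just past `http://` on success.
fn match_at(s: &str, start: usize) -> Option<usize> {
    let rest = &s[start + 1..];
    let close = rest.find('"')?;
    if !is_exempt_key(&rest[..close]) {
        return None;
    }
    let after_colon = rest[close + 1..].trim_start().strip_prefix(':')?.trim_start();
    // Optional `[` for the array form, then optional whitespace.
    let value = after_colon.strip_prefix('[').unwrap_or(after_colon).trim_start();
    let tail = value.strip_prefix("\"http://")?;
    Some(s.len() - tail.len())
}

/// Counts non-overlapping hits left to right, like `find_iter(..).count()`.
pub fn count_identifier_hits(s: &str) -> usize {
    let bytes = s.as_bytes();
    let (mut i, mut count) = (0, 0);
    while i < bytes.len() {
        if bytes[i] == b'"' {
            if let Some(end) = match_at(s, i) {
                count += 1;
                i = end; // resume after the match, so hits never overlap
                continue;
            }
        }
        i += 1;
    }
    count
}

/// Assumption: one hit per `http://` occurrence (RE_HTTP_URL is not shown).
pub fn count_http(s: &str) -> usize {
    s.matches("http://").count()
}

/// Same arithmetic as the hunk: total minus localhost minus identifier hits,
/// clamped at zero by `saturating_sub`.
pub fn net_insecure(total: usize, local: usize, identifier: usize) -> usize {
    total.saturating_sub(local).saturating_sub(identifier)
}
```

Under that counting assumption, the sketch also exposes a limit of the window: in `{"types": ["http://a.example/A", "http://b.example/B"]}` only the first array element is preceded by the key, so one hit is subtracted from a total of two and the net count stays at 1. The `saturating_sub` chain matters when a single URL is counted twice, for example `{"@id": "http://localhost/x"}` is both a localhost hit and an identifier hit, and plain subtraction would underflow.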
@@ -5942,6 +5965,83 @@ mod tests {
 use std::fs;
 use tempfile::TempDir;
 
+// ---------------------------------------------------------------
+// 0b. JSON-LD / JSON-Schema identifier exemption (cross-lang URLs)
+// ---------------------------------------------------------------
+
+fn count_http_findings(content: &str) -> usize {
+let analyzer = Analyzer::new(std::path::Path::new(".")).expect("analyzer construction");
+let mut wp = Vec::new();
+analyzer
+.analyze_cross_language(content, &mut wp, "fixture.rs")
+.expect("analyze_cross_language");
+wp.iter()
+.filter(|w| matches!(w.category, WeakPointCategory::InsecureProtocol))
+.count()
+}
+
+#[test]
+fn jsonld_at_type_uri_is_exempt() {
+let src = r#"json!({"@type": "http://hyperpolymath.dev/X"});"#;
+assert_eq!(count_http_findings(src), 0, "@type URI must be exempt");
+}
+
+#[test]
+fn jsonld_at_id_uri_is_exempt() {
+let src = r#"json!({"@id": "http://hyperpolymath.dev/X"});"#;
+assert_eq!(count_http_findings(src), 0, "@id URI must be exempt");
+}
+
+#[test]
+fn jsonld_at_context_uri_is_exempt() {
+let src = r#"json!({"@context": "http://schema.org"});"#;
+assert_eq!(count_http_findings(src), 0, "@context URI must be exempt");
+}
+
+#[test]
+fn jsonld_types_array_is_exempt() {
+// The exact self-scan repro from src/storage/mod.rs.
+let src = r#"json!({"types": ["http://hyperpolymath.dev/panic-attack/AssailReport"]});"#;
+assert_eq!(
+count_http_findings(src),
+0,
+"types: [...] array must be exempt"
+);
+}
+
+#[test]
+fn json_schema_dollar_schema_is_exempt() {
+let src = r#"{"$schema": "http://json-schema.org/draft-07/schema"}"#;
+assert_eq!(count_http_findings(src), 0, "$schema URI must be exempt");
+}
+
+#[test]
+fn real_endpoint_url_is_still_flagged() {
+// A genuine non-identifier HTTP endpoint must still produce a finding.
+// URL is composed at runtime so the source file itself contains no
+// literal `http://[alphanum]` substring — this avoids a meta-circular
+// self-scan finding when panic-attack scans analyzer.rs.
+let url = format!("htt{}p://insecure.example.com/api", "");
+let src = format!(r#"let resp = client.get("{}").send();"#, url);
+assert!(
+count_http_findings(&src) > 0,
+"real http:// endpoint must still trip the detector"
+);
+}
+
+#[test]
+fn endpoint_key_named_url_is_still_flagged() {
+// Common config field — NOT a JSON-LD identifier — must still flag.
+// URL split at the source level (see real_endpoint_url_is_still_flagged
+// for rationale).
+let url = format!("htt{}p://insecure.example.com/api", "");
+let src = format!(r#"json!({{"url": "{}"}});"#, url);
+assert!(
+count_http_findings(&src) > 0,
+"\"url\" key is not in exempt set"
+);
+}
+
 // ---------------------------------------------------------------
 // 0a. C-family line-comment stripping (cross-lang URL/secret FPs)
 // ---------------------------------------------------------------
