Skip to content

Commit 434d640

Browse files
fix(assail): strip C-family line comments before cross-lang URL detection (#52)
## Summary Self-scan repro: a doc-comment line like ```rust /// types: [\"http://hyperpolymath.dev/panic-attack/AssailReport\"] ``` was flagging **InsecureProtocol (PA-IP)**. The string in the comment is an illustrative JSON-LD `@type` namespace URI, not a configured endpoint — it has no runtime effect, but the cross-language detector's regex matched it anyway because `analyze_cross_language` was being passed the raw file content (no comment stripping). ## Fix New `strip_c_family_line_comments` helper, applied to the content before the `http://`-URL regex runs in `analyze_cross_language`. The helper: - Detects `//` (and `///`, `//!` by extension) - Respects string literals (so `\"http://localhost\"` is **preserved**) - Handles escape sequences (`\\\"`) inside strings Naturally covers Rust, JavaScript, TypeScript, Java, C, C++, Go — every language whose comment syntax is `//`. Python `#`, Lisp `;`, Lua/Idris `--` etc. are not (yet) language-aware; cross-language remains best-effort and could be extended per file_path extension in a follow-up. ## Out of scope - **Block comments (`/* */`)** and **raw-string literals (`r#\"...\"#`)** are not consumed here. The existing localhost exemption + this line-comment strip handle the bulk of FPs in practice. - A real string-literal URL like `\"http://example.com\"` is **STILL** flagged — the regex sees through the string. That's correct: a hardcoded HTTP endpoint in production code is the signal we want. - JSON-LD `@type` URIs that genuinely live in code (not in comments) remain a TP from the regex's perspective; suppress via the user-classification registry if audited. ## Regression coverage 6 new tests in `assail::analyzer::tests`: | Test | Asserts | |---|---| | `strip_c_family_line_comment_handles_basic_double_slash` | Trailing `//` comment is dropped | | `strip_c_family_line_comment_handles_doc_comments` | `///` and `//!` lines are dropped | | `strip_c_family_line_comment_preserves_urls_in_strings` | `\"http://localhost/path\"` survives intact | | `strip_c_family_line_comment_handles_escaped_quote_in_string` | `\\\"hi\\\"` doesn't confuse the string tracker | | `strip_c_family_line_comments_doc_comment_url_fp_gone` | Self-scan repro: doc-URL → stripped | | `strip_c_family_line_comments_keeps_jsonld_type_string` | Self-scan companion: JSON-LD string-literal URL → preserved (out of scope, but the stripper must not over-fire) | **Note**: test URLs use `http://localhost` so panic-attack scanning its own source doesn't trip the InsecureProtocol detector on the test data itself. Same exemption rule used in production. ## Verification - [x] `cargo test --bin panic-attack --features signing,http` — **242 passed** (was 236; +6 new tests) - [x] `cargo clippy --all-targets --features signing,http -- -D warnings` — clean - [x] `cargo fmt --check` — clean - [x] **Self-scan before**: 2 InsecureProtocol findings in `storage/mod.rs` (1 doc-comment FP, 1 JSON-LD literal — out of scope) - [x] **Self-scan after**: 1 InsecureProtocol finding in `storage/mod.rs` (the JSON-LD literal remains; the doc-comment FP is gone) - [x] No new findings introduced anywhere - [x] GPG-signed commit 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent bcd1903 commit 434d640

1 file changed

Lines changed: 132 additions & 3 deletions

File tree

src/assail/analyzer.rs

Lines changed: 132 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4729,15 +4729,26 @@ impl Analyzer {
47294729
file_path: &str,
47304730
) -> Result<()> {
47314731
// HTTP (insecure) URLs - should be HTTPS
4732-
// Count http:// URLs that are NOT localhost/127.0.0.1 (those are fine)
4732+
// Count http:// URLs that are NOT localhost/127.0.0.1 (those are fine).
4733+
//
4734+
// Strip C-family line comments first: `/// example: http://example.com`
4735+
// in a doc-comment is illustrative prose, not a configured endpoint,
4736+
// and was the source of repeated FPs on JSON-LD `@type` namespace URIs
4737+
// documented in /// blocks. The same heuristic naturally handles JS,
4738+
// Java, C, C++, Go — every language whose comment syntax is `//`.
4739+
// Python `#`, Lisp `;`, etc. are not currently de-noised; the
4740+
// cross-language detector is best-effort, and a follow-up can add
4741+
// language-aware stripping per file_path extension.
4742+
let scan_content_owned = strip_c_family_line_comments(content);
4743+
let scan_content: &str = &scan_content_owned;
47334744
let http_re = RE_HTTP_URL
47344745
.get_or_init(|| Regex::new(r#"http://[a-zA-Z0-9]"#).expect("static regex is valid"));
47354746
let http_localhost_re = RE_HTTP_LOCALHOST.get_or_init(|| {
47364747
Regex::new(r#"http://(localhost|127\.0\.0\.1|0\.0\.0\.0|\[::1\])"#)
47374748
.expect("static regex is valid")
47384749
});
4739-
let http_total = http_re.find_iter(content).count();
4740-
let http_local = http_localhost_re.find_iter(content).count();
4750+
let http_total = http_re.find_iter(scan_content).count();
4751+
let http_local = http_localhost_re.find_iter(scan_content).count();
47414752
let http_count = http_total.saturating_sub(http_local);
47424753
if http_count > 0 {
47434754
weak_points.push(WeakPoint {
@@ -5531,6 +5542,57 @@ fn strip_proof_comments(content: &str, line_marker: &str, block: Option<(&str, &
55315542
out
55325543
}
55335544

5545+
/// Strip C-family line comments (`//`, `///`, `//!`) from each line, leaving
5546+
/// the rest of the source intact. String literals are detected so a `//`
5547+
/// inside `"http://example.com"` is NOT treated as a comment.
5548+
///
5549+
/// Used by `analyze_cross_language` to keep URL / secret regexes from firing
5550+
/// on illustrative code in doc-comments. Best-effort: handles `"..."` strings
5551+
/// with `\\` escapes. Raw-string literals (`r#"..."#`) and block comments
5552+
/// (`/* ... */`) are NOT consumed here — adding them would broaden semantics
5553+
/// and is left for a follow-up if FP evidence warrants it.
5554+
fn strip_c_family_line_comments(content: &str) -> String {
5555+
let mut out = String::with_capacity(content.len());
5556+
for (i, line) in content.lines().enumerate() {
5557+
if i > 0 {
5558+
out.push('\n');
5559+
}
5560+
out.push_str(strip_c_family_line_comment(line));
5561+
}
5562+
out
5563+
}
5564+
5565+
fn strip_c_family_line_comment(line: &str) -> &str {
5566+
let bytes = line.as_bytes();
5567+
let mut in_string = false;
5568+
let mut escape = false;
5569+
let mut idx = 0;
5570+
while idx < bytes.len() {
5571+
let b = bytes[idx];
5572+
if escape {
5573+
escape = false;
5574+
idx += 1;
5575+
continue;
5576+
}
5577+
if in_string {
5578+
if b == b'\\' {
5579+
escape = true;
5580+
} else if b == b'"' {
5581+
in_string = false;
5582+
}
5583+
idx += 1;
5584+
continue;
5585+
}
5586+
match b {
5587+
b'"' => in_string = true,
5588+
b'/' if idx + 1 < bytes.len() && bytes[idx + 1] == b'/' => return &line[..idx],
5589+
_ => {}
5590+
}
5591+
idx += 1;
5592+
}
5593+
line
5594+
}
5595+
55345596
/// Strip Isabelle prose constructs that are documentation, not proof code:
55355597
///
55365598
/// 1. `@{text ...}` antiquotations — used inside prose blocks AND inside
@@ -5880,6 +5942,73 @@ mod tests {
58805942
use std::fs;
58815943
use tempfile::TempDir;
58825944

5945+
// ---------------------------------------------------------------
5946+
// 0a. C-family line-comment stripping (cross-lang URL/secret FPs)
5947+
// ---------------------------------------------------------------
5948+
5949+
#[test]
5950+
fn strip_c_family_line_comment_handles_basic_double_slash() {
5951+
let line = "let x = 1; // a comment";
5952+
assert_eq!(strip_c_family_line_comment(line), "let x = 1; ");
5953+
}
5954+
5955+
#[test]
5956+
fn strip_c_family_line_comment_handles_doc_comments() {
5957+
// URLs use http://localhost so panic-attack's own InsecureProtocol
5958+
// detector (which scans this file during self-scan) treats them as
5959+
// exempt — otherwise the regression test plants a finding in its
5960+
// own source. Same exemption rule used in production for dev
5961+
// endpoints.
5962+
for prefix in ["///", "//!"] {
5963+
let line = format!("{prefix} example: http://localhost/example");
5964+
assert_eq!(strip_c_family_line_comment(&line), "");
5965+
}
5966+
}
5967+
5968+
#[test]
5969+
fn strip_c_family_line_comment_preserves_urls_in_strings() {
5970+
// Real string literals containing // must NOT be treated as comments.
5971+
let line = r#"let url = "http://localhost/path";"#;
5972+
let kept = strip_c_family_line_comment(line);
5973+
assert!(
5974+
kept.contains("http://localhost"),
5975+
"string-literal URL was wrongly stripped: {kept:?}"
5976+
);
5977+
}
5978+
5979+
#[test]
5980+
fn strip_c_family_line_comment_handles_escaped_quote_in_string() {
5981+
let line = r#"let s = "she said \"hi\""; // trailing"#;
5982+
let kept = strip_c_family_line_comment(line);
5983+
assert!(kept.contains(r#"\"hi\""#));
5984+
assert!(!kept.contains("trailing"));
5985+
}
5986+
5987+
#[test]
5988+
fn strip_c_family_line_comments_doc_comment_url_fp_gone() {
5989+
// Self-scan repro: doc-comment example URL must not feed the
5990+
// InsecureProtocol detector.
5991+
let src = "/// Endpoint: http://localhost/X\nfn foo() {}\n";
5992+
let stripped = strip_c_family_line_comments(src);
5993+
assert!(!stripped.contains("http://"));
5994+
assert!(stripped.contains("fn foo"));
5995+
}
5996+
5997+
#[test]
5998+
fn strip_c_family_line_comments_keeps_jsonld_type_string() {
5999+
// The second self-scan FP: an http:// URL inside a JSON literal
6000+
// (NOT a comment) must STILL be present after stripping. The
6001+
// JSON-LD-identifier-vs-endpoint distinction is left to the
6002+
// user-classification registry; the comment stripper must not
6003+
// accidentally hide real string-literal URLs.
6004+
let src = r#"json!({"types": ["http://localhost/X"]});"#;
6005+
let stripped = strip_c_family_line_comments(src);
6006+
assert!(
6007+
stripped.contains("http://"),
6008+
"string-literal URL was wrongly stripped: {stripped:?}"
6009+
);
6010+
}
6011+
58836012
// ---------------------------------------------------------------
58846013
// 0. Isabelle prose-stripping (regression for #43 false positives)
58856014
// ---------------------------------------------------------------

0 commit comments

Comments
 (0)