Commit 2e446e7

unclesp1d3r and claude authored
feat: Stringy v1.0 -- pipeline architecture, CLI hardening, and comprehensive testing (#141)
## Summary

This PR delivers the complete v1.0 implementation of Stringy, transforming it from a prototype into a production-ready binary string extraction tool. The work spans 56 commits across 132 files (+7,963 / -6,477 lines), organized into five major workstreams.

**610 tests pass** (23 platform-skipped), **zero clippy warnings**, all pre-commit hooks green.

## What Changed

### 1. Pipeline Architecture (New)

Introduced a modular pipeline orchestrator that replaces ad-hoc extraction logic:

- `Pipeline::new(config).run(&path)` -- single entry point for all analysis
- `PipelineConfig` with builder pattern for extraction, filtering, output, and debug options
- `FilterEngine` for tag inclusion/exclusion, encoding, min-length, and top-N filtering
- Score normalizer with 0-100 display scores and section weight / semantic boost / noise penalty breakdown
- `mmap-guard` integration for safe memory-mapped file I/O (replaces raw `memmap2`)
- Graceful fallback for unknown formats (raw byte scanning with Info diagnostic)
- Processing warnings via env-var injection for testable error paths

**Key files:** `src/pipeline/mod.rs`, `src/pipeline/config.rs`, `src/pipeline/filter.rs`, `src/pipeline/normalizer.rs`

### 2. CLI Hardening

Complete rewrite of the CLI layer with production-quality UX:

- **Typed exit codes**: 0=success, 2=config/validation, 3=not-found, 4=permission-denied, 1=other
- **Short flags**: `-j` (json), `-m` (min-len), `-t` (top), `-e` (enc)
- **Actionable error messages** with fix suggestions
- **NO_COLOR** env var support (no-color.org spec)
- **Graceful spinner** fallback (no `.expect()`, hidden when non-TTY or NO_COLOR)
- **stdin support** via `patharg::InputArg` with progress feedback
- `--notags` renamed to `--no-tags` (kebab-case consistency)
- `--summary`, `--debug`, `--raw` modes with proper conflict declarations
- Help text with EXAMPLES section and all 21 tag names listed

**Key files:** `src/main.rs`, `src/types/error.rs`

### 3. Core Data Model Improvements

- `FoundString::new()` + builder methods replace struct literals (forward-compatible)
- `StringContext` constructors for cleaner extraction code
- `SectionInfo` made `#[non_exhaustive]` with `with_*` builders
- `OutputMetadata` with analysis duration, top tag distribution, builder pattern
- `FilterContext` with builder methods for test ergonomics
- `ExtractionConfig::with_min_length()` builder
- Container parsers split into module directories (elf/, pe/, macho/) with separate test files

**Key files:** `src/types/constructors.rs`, `src/types/found_string.rs`, `src/types/mod.rs`

### 4. Output Enhancements

- **Plain text**: Full C0 control character sanitization (not just newlines)
- **TTY table**: Analysis duration with minute-level formatting, summary banner
- **JSON**: Debug-only fields (`section_weight`, `semantic_boost`, `noise_penalty`) via `skip_serializing_if`
- **YARA**: Long string skip behavior with deterministic comments
- `display_score` always present in non-raw mode; forced to 0 in raw mode

**Key files:** `src/output/table/plain.rs`, `src/output/table/tty.rs`, `src/output/json.rs`

### 5. Build & Infrastructure

- **Zig cross-compilation** replaces Docker for test fixtures (ELF, PE, Mach-O)
- **Release profile** optimization: `strip = true`, `codegen-units = 1`, `lto = "thin"` (release) / `"fat"` (dist)
- **mmap-guard** dependency for safe memory-mapped I/O
- **cargo-dist** version bump to 0.31.0
- **GOTCHAS.md** added as living documentation for edge cases
- CI workflow updates (action versions, scorecard)

## Test Plan

- [x] 610 tests pass across 25 test files (597 previously + 13 new short-flag tests)
- [x] `cargo clippy --all-targets -- -D warnings` clean
- [x] `cargo fmt --check` clean
- [x] All pre-commit hooks pass (fmt, clippy, cargo-check, mdformat, cargo-audit)
- [x] Integration test coverage for all CLI flags, exit codes, and conflict combinations
- [x] Score determinism verified across consecutive runs
- [x] Warning emission tested via debug-build env var injection
- [x] Unknown/empty binary fallback paths tested
- [x] Short flag equivalence verified (-j = --json, -m = --min-len, etc.)
- [x] NO_COLOR env var tested
- [x] stdin pipe tested with ELF binary and empty input
- [x] Permission denied exit code 4 tested (Unix)

### New Test Files in This PR

| File | Tests | Coverage Area |
|------|-------|--------------|
| `integration_flows_1_5.rs` | 16 | Quick analysis, filtering, top-N, JSON, YARA |
| `integration_flows_6_7.rs` | 11 | Plain text output, summary mode |
| `integration_flow8_errors.rs` | 11 | Argument conflicts, validation failures |
| `integration_flow8_diagnostics.rs` | 13 | Warnings, fallback, score determinism, debug |
| `integration_cli_errors.rs` | 10 | Exit codes, error messages |
| `integration_cli_coverage.rs` | 18 | Encoding, stdin, combined filters, permissions |
| `integration_cli_short_flags.rs` | 13 | Short flags, NO_COLOR, OR logic, edge cases |
| `integration_flow1_minlen.rs` | 2 | Min-length default and explicit filter |
| `integration_flows_5_yara.rs` | 1 | YARA long string deterministic skip |

## Breaking Changes

- `--notags` renamed to `--no-tags` (CLI flag)
- `FoundString` struct fields are no longer public for direct construction (use `FoundString::new()` + builders)
- `SectionInfo` is `#[non_exhaustive]` (external crate consumers must use constructors)
- Exit codes changed: file-not-found is now 3 (was 1), permission-denied is now 4 (was 1)

## Risk Assessment

**Risk Level: Medium**

| Factor | Assessment |
|--------|-----------|
| Size | Large (132 files), but most changes are additive |
| Breaking changes | CLI flag rename + exit code changes -- documented above |
| Test coverage | Strong -- 610 tests covering all new paths |
| Dependencies | +mmap-guard (thin wrapper), +patharg (stdin handling) |
| Security | `#![forbid(unsafe_code)]` maintained, no new unsafe |

## Reviewer Notes

- The pipeline module (`src/pipeline/`) is entirely new -- start review there
- Container parser splits (elf.rs -> elf/mod.rs + elf/tests.rs) are mechanical moves with no logic changes
- Documentation changes are extensive but mostly rewrites for accuracy
- GOTCHAS.md is a new file worth reading for context on edge cases

---

Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
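
Reviewer sketch for the typed exit codes listed under CLI Hardening. This is a minimal illustration, not the code in `src/types/error.rs`: the `CliExit` enum and the `exit_for_io_error` function are hypothetical names, and routing `InvalidInput` to the config/validation code is an assumption. Only the numeric values (0, 1, 2, 3, 4) come from the PR description.

```rust
use std::io;

/// Hypothetical exit-code enum; the numeric values mirror the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CliExit {
    Success = 0,
    Other = 1,
    ConfigOrValidation = 2,
    NotFound = 3,
    PermissionDenied = 4,
}

/// Classify an I/O error kind into an exit code.
/// Assumption: `InvalidInput` is treated as a config/validation failure.
fn exit_for_io_error(kind: io::ErrorKind) -> CliExit {
    match kind {
        io::ErrorKind::NotFound => CliExit::NotFound,
        io::ErrorKind::PermissionDenied => CliExit::PermissionDenied,
        io::ErrorKind::InvalidInput => CliExit::ConfigOrValidation,
        // Everything else falls back to the generic failure code.
        _ => CliExit::Other,
    }
}
```

A caller could finish with `std::process::exit(exit_for_io_error(err.kind()) as i32)`, which lets shell scripts tell a missing file (3) from a permission problem (4), where both previously returned 1.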
1 parent 99dc2e4 commit 2e446e7
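
The plain-text output change described in the commit message (full C0 control character sanitization, not just newlines) can be pictured with the sketch below. The function name `sanitize_c0` and the choice to render control characters as visible escapes are assumptions for illustration; the PR does not state what the replacement looks like in `src/output/table/plain.rs`.

```rust
/// Hypothetical helper: make every C0 control character (U+0000..=U+001F)
/// visible so one extracted string always occupies one output line.
/// Assumption: controls are rendered as escapes rather than dropped.
fn sanitize_c0(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining C0 controls (for example ESC, 0x1b) become \xNN.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\x{:02x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}
```

Escaping rather than stripping keeps the output line-oriented while preserving which control bytes were present, which matters when the extracted string contains terminal escape sequences.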

129 files changed, +8251 -6488 lines changed

.gitattributes

Lines changed: 4 additions & 0 deletions
@@ -78,4 +78,8 @@ LICENSE text
  *.so binary
  *.dylib binary

+ # Test fixture binaries
+ tests/fixtures/test_empty.bin binary
+ tests/fixtures/test_unknown.bin binary
+
  .env text=utf-8 eol=lf

.github/copilot-instructions.md

Lines changed: 3 additions & 0 deletions
@@ -67,8 +67,11 @@ Container parsers assign weights (1.0-10.0) to sections based on string likeliho
  Strings are grouped by `(text, encoding)` tuple in a `HashMap<(String, Encoding), Vec<StringOccurrence>>`:

  - **Preserve all occurrences**: Each occurrence captures offset, RVA, section, source, tags, score, confidence
+
  - **Tag merging**: Union all tags via `HashSet`, then sort
+
  - **Combined scoring formula**:
+
  ```text
  base_score = max(occurrence.original_score)
  occurrence_bonus = 5 * (count - 1)

.github/workflows/audit.yml

Lines changed: 2 additions & 2 deletions
@@ -22,6 +22,6 @@ jobs:
  contents: read
  issues: write
  steps:
- - uses: actions/checkout@v6
- - uses: actions-rust-lang/audit@v1
+ - uses: actions/checkout@v6.0.2
+ - uses: actions-rust-lang/audit@v1.2.7
  name: Audit Rust Dependencies

.github/workflows/ci.yml

Lines changed: 26 additions & 11 deletions
@@ -24,17 +24,23 @@ jobs:
  quality:
  runs-on: ubuntu-latest
  steps:
- - uses: actions/checkout@v6
+ - uses: actions/checkout@v6.0.2
  - uses: dtolnay/rust-toolchain@efa25f7f19611383d5b0ccf2d1c8914531636bf9
  with:
  components: rustfmt, clippy
  toolchain: 1.93.0
- - uses: jdx/mise-action@v3
+ - uses: jdx/mise-action@v3.6.3
  with:
  install: true
  cache: true
  github_token: ${{ secrets.GITHUB_TOKEN }}

+ - name: Ensure rustfmt and clippy are installed
+ run: rustup component add rustfmt clippy
+
+ - name: Cache Rust dependencies
+ uses: Swatinem/rust-cache@v2.8.2
+
  - name: Rustfmt Check
  run: cargo fmt --all -- --check

@@ -52,28 +58,31 @@ jobs:
  - toolchain: stable minus 3 releases
  - toolchain: stable minus 4 releases
  steps:
- - uses: actions/checkout@v6
+ - uses: actions/checkout@v6.0.2
  - uses: dtolnay/rust-toolchain@efa25f7f19611383d5b0ccf2d1c8914531636bf9
  with:
  components: rustfmt, clippy
  toolchain: ${{ matrix.toolchain }}

  - name: Cache Rust dependencies
- uses: Swatinem/rust-cache@v2
+ uses: Swatinem/rust-cache@v2.8.2

  - name: Check MSRV compliance
  run: cargo check --all-features

  test:
  runs-on: ubuntu-latest
  steps:
- - uses: actions/checkout@v6
- - uses: jdx/mise-action@v3
+ - uses: actions/checkout@v6.0.2
+ - uses: jdx/mise-action@v3.6.3
  with:
  install: true
  cache: true
  github_token: ${{ secrets.GITHUB_TOKEN }}

+ - name: Generate test fixtures
+ run: just gen-fixtures
+
  - name: Run tests (all features)
  run: just test-ci

@@ -93,13 +102,16 @@ jobs:

  runs-on: ${{ matrix.os }}
  steps:
- - uses: actions/checkout@v6
- - uses: jdx/mise-action@v3
+ - uses: actions/checkout@v6.0.2
+ - uses: jdx/mise-action@v3.6.3
  with:
  install: true
  cache: true
  github_token: ${{ secrets.GITHUB_TOKEN }}

+ - name: Generate test fixtures
+ run: just gen-fixtures
+
  - name: Run tests (all features)
  run: just test-ci
  - name: Build release
@@ -109,18 +121,21 @@ jobs:
  runs-on: ubuntu-latest
  needs: [test, test-cross-platform]
  steps:
- - uses: actions/checkout@v6
- - uses: jdx/mise-action@v3
+ - uses: actions/checkout@v6.0.2
+ - uses: jdx/mise-action@v3.6.3
  with:
  install: true
  cache: true
  github_token: ${{ secrets.GITHUB_TOKEN }}

+ - name: Generate test fixtures
+ run: just gen-fixtures
+
  - name: Generate coverage
  run: just coverage

  - name: Upload to Codecov
- uses: codecov/codecov-action@v5
+ uses: codecov/codecov-action@v5.5.2
  with:
  files: lcov.info
  fail_ci_if_error: false

.github/workflows/codeql.yml

Lines changed: 2 additions & 2 deletions
@@ -19,9 +19,9 @@ jobs:
  name: CodeQL Analyze
  runs-on: ubuntu-22.04
  steps:
- - uses: actions/checkout@v6
+ - uses: actions/checkout@v6.0.2

- - uses: jdx/mise-action@v3
+ - uses: jdx/mise-action@v3.6.3
  with:
  install: true
  cache: true

.github/workflows/copilot-setup-steps.yml

Lines changed: 2 additions & 2 deletions
@@ -22,9 +22,9 @@ jobs:
  contents: read

  steps:
- - uses: actions/checkout@v6
+ - uses: actions/checkout@v6.0.2

- - uses: jdx/mise-action@v3
+ - uses: jdx/mise-action@v3.6.3
  with:
  install: true
  cache: true

.github/workflows/docs.yml

Lines changed: 3 additions & 3 deletions
@@ -25,9 +25,9 @@ jobs:
  runs-on: ubuntu-latest
  steps:
  - name: Checkout
- uses: actions/checkout@v6
+ uses: actions/checkout@v6.0.2

- - uses: jdx/mise-action@v3
+ - uses: jdx/mise-action@v3.6.3
  with:
  install: true
  cache: true
@@ -59,4 +59,4 @@ jobs:
  steps:
  - name: Deploy to GitHub Pages
  id: deployment
- uses: actions/deploy-pages@v4
+ uses: actions/deploy-pages@v4.0.5

.github/workflows/release.yml

Lines changed: 14 additions & 14 deletions
@@ -64,9 +64,9 @@ jobs:
  # we specify bash to get pipefail; it guards against the `curl` command
  # failing. otherwise `sh` won't catch that `curl` returned non-0
  shell: bash
- run: "curl --proto '=https' --tlsv1.2 -LsSf https://github.com/axodotdev/cargo-dist/releases/download/v0.30.3/cargo-dist-installer.sh | sh"
+ run: "curl --proto '=https' --tlsv1.2 -LsSf https://github.com/axodotdev/cargo-dist/releases/download/v0.31.0/cargo-dist-installer.sh | sh"
  - name: Cache dist
- uses: actions/upload-artifact@v6
+ uses: actions/upload-artifact@v7
  with:
  name: cargo-dist-cache
  path: ~/.cargo/bin/dist
@@ -82,7 +82,7 @@ jobs:
  cat plan-dist-manifest.json
  echo "manifest=$(jq -c "." plan-dist-manifest.json)" >> "$GITHUB_OUTPUT"
  - name: "Upload dist-manifest.json"
- uses: actions/upload-artifact@v6
+ uses: actions/upload-artifact@v7
  with:
  name: artifacts-plan-dist-manifest
  path: plan-dist-manifest.json
@@ -135,7 +135,7 @@ jobs:
  run: ${{ matrix.install_dist.run }}
  # Get the dist-manifest
  - name: Fetch local artifacts
- uses: actions/download-artifact@v7
+ uses: actions/download-artifact@v8
  with:
  pattern: artifacts-*
  path: target/distrib/
@@ -151,7 +151,7 @@ jobs:
  dist build ${{ needs.plan.outputs.tag-flag }} --print=linkage --output-format=json ${{ matrix.dist_args }} > dist-manifest.json
  echo "dist ran successfully"
  - name: Attest
- uses: actions/attest-build-provenance@v3
+ uses: actions/attest-build-provenance@v4
  with:
  subject-path: "target/distrib/*${{ join(matrix.targets, ', ') }}*"
  - id: cargo-dist
@@ -168,7 +168,7 @@ jobs:

  cp dist-manifest.json "$BUILD_MANIFEST_NAME"
  - name: "Upload artifacts"
- uses: actions/upload-artifact@v6
+ uses: actions/upload-artifact@v7
  with:
  name: artifacts-build-local-${{ join(matrix.targets, '_') }}
  path: |
@@ -190,7 +190,7 @@ jobs:
  persist-credentials: false
  submodules: recursive
  - name: Install cached dist
- uses: actions/download-artifact@v7
+ uses: actions/download-artifact@v8
  with:
  name: cargo-dist-cache
  path: ~/.cargo/bin/
@@ -202,7 +202,7 @@ jobs:
  shell: bash
  # Get all the local artifacts for the global tasks to use (for e.g. checksums)
  - name: Fetch local artifacts
- uses: actions/download-artifact@v7
+ uses: actions/download-artifact@v8
  with:
  pattern: artifacts-*
  path: target/distrib/
@@ -233,7 +233,7 @@ jobs:
  find . -name '*.cdx.xml' | tee -a "$GITHUB_OUTPUT"
  echo "EOF" >> "$GITHUB_OUTPUT"
  - name: "Upload artifacts"
- uses: actions/upload-artifact@v6
+ uses: actions/upload-artifact@v7
  with:
  name: artifacts-build-global
  path: |
@@ -259,14 +259,14 @@ jobs:
  persist-credentials: false
  submodules: recursive
  - name: Install cached dist
- uses: actions/download-artifact@v7
+ uses: actions/download-artifact@v8
  with:
  name: cargo-dist-cache
  path: ~/.cargo/bin/
  - run: chmod +x ~/.cargo/bin/dist
  # Fetch artifacts from scratch-storage
  - name: Fetch artifacts
- uses: actions/download-artifact@v7
+ uses: actions/download-artifact@v8
  with:
  pattern: artifacts-*
  path: target/distrib/
@@ -279,14 +279,14 @@ jobs:
  cat dist-manifest.json
  echo "manifest=$(jq -c "." dist-manifest.json)" >> "$GITHUB_OUTPUT"
  - name: "Upload dist-manifest.json"
- uses: actions/upload-artifact@v6
+ uses: actions/upload-artifact@v7
  with:
  # Overwrite the previous copy
  name: artifacts-dist-manifest
  path: dist-manifest.json
  # Create a GitHub Release while uploading all files to it
  - name: "Download GitHub Artifacts"
- uses: actions/download-artifact@v7
+ uses: actions/download-artifact@v8
  with:
  pattern: artifacts-*
  path: artifacts
@@ -326,7 +326,7 @@ jobs:
  token: ${{ secrets.HOMEBREW_TAP_TOKEN }}
  # So we have access to the formula
  - name: Fetch homebrew formulae
- uses: actions/download-artifact@v7
+ uses: actions/download-artifact@v8
  with:
  pattern: artifacts-*
  path: Formula/

.github/workflows/scorecard.yml

Lines changed: 2 additions & 2 deletions
@@ -26,7 +26,7 @@ jobs:

  steps:
  - name: "Checkout code"
- uses: actions/checkout@v6
+ uses: actions/checkout@v6.0.2
  with:
  persist-credentials: false

@@ -38,7 +38,7 @@ jobs:
  publish_results: true

  - name: "Upload artifact"
- uses: actions/upload-artifact@v6
+ uses: actions/upload-artifact@v7
  with:
  name: SARIF file
  path: results.sarif

.github/workflows/security.yml

Lines changed: 2 additions & 2 deletions
@@ -24,9 +24,9 @@ jobs:
  audit:
  runs-on: ubuntu-latest
  steps:
- - uses: actions/checkout@v6
+ - uses: actions/checkout@v6.0.2

- - uses: jdx/mise-action@v3
+ - uses: jdx/mise-action@v3.6.3
  with:
  install: true
  cache: true
