Commit a477c8e

jmc-wanderclaude committed
feat: hybrid multi-modal PII detection with NER, evaluation harness, and v0.3.0 release
Add detect-then-replace architecture: regex patterns, field heuristics, and optional transformer-based NER run in parallel, producing PIIDetection spans that are merged (union strategy, highest confidence wins overlaps) before tokenization. The system learns over time — fields with repeated PII detections get auto-promoted, and user feedback (false positives/negatives) adjusts confidence.

New components:
- pii_detection: PIIDetection model, DetectionStrategy protocol, RegexDetectionStrategy, FieldHeuristicStrategy, NERDetectionStrategy, CompositeDetectionPipeline
- pii_evaluation: span-level IoU matching, per-entity precision/recall/F1, micro/macro averaging
- event_handler: external event ingestion for WOS integration

New CLI commands:
- pii-ingest: download and ingest the ai4privacy/pii-masking-200k evaluation dataset
- pii-evaluate: run detection strategies against labeled data, print F1 table

Config additions: detection_mode (regex_only | hybrid | ner_only), ner_model, ner_device, ner_confidence_threshold, ner_max_text_length in PIIConfig.

Optional ML dependencies: `pip install apprentice-ai[ml]` adds transformers, torch, datasets. The NER strategy lazy-loads and gracefully disables itself when these deps are missing.

Updated docs: README with PII protection section, GitHub Pages with privacy feature cards, example config with PII options, version bump to 0.3.0.

2486 tests pass, 19 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 52c0041 commit a477c8e

23 files changed

Lines changed: 3210 additions & 61 deletions

README.md

Lines changed: 79 additions & 3 deletions
@@ -86,6 +86,76 @@ Router ────────────────────────
2. **Reinforcement** — Both models process each request. The evaluator scores local vs. remote output. A rolling window tracks correlation. When sustained correlation exceeds the threshold, the system promotes to Phase 3.

3. **Steady State** — The local model handles most traffic. An adaptive sampler periodically sends requests to both models to verify quality hasn't degraded. If it has, the system automatically regresses to Phase 2.

## PII Protection

Apprentice includes a built-in PII detection and tokenization middleware that scrubs sensitive data before it reaches models, training stores, or audit logs. The system uses a hybrid multi-modal approach that combines fast regex patterns with optional NER model inference.

### Detection Modes

| Mode | Strategies | Latency | Dependencies |
|------|-----------|---------|--------------|
| `regex_only` (default) | Regex patterns + field heuristics | ~0.1 ms | None |
| `hybrid` | Regex + field heuristics + NER model | ~50 ms | `pip install apprentice-ai[ml]` |
| `ner_only` | NER model only | ~50 ms | `pip install apprentice-ai[ml]` |

### What It Detects

**Regex**: Emails, phone numbers, SSNs, credit cards, IP addresses, API keys, dates of birth
**Field Heuristics**: Sensitive field names (email, phone, ssn, password, etc.) + learned patterns from prior detections
**NER Model**: Person names, locations, organizations, miscellaneous entities — unstructured PII that regex can't catch

### How It Works

```
Input data (may contain PII)
    ├─ RegexDetectionStrategy    [confidence=1.0]
    ├─ FieldHeuristicStrategy    [confidence=0.9]
    └─ NERDetectionStrategy      [confidence=varies]
          ↓
Merge + deduplicate (union, highest confidence wins overlaps)
          ↓
Replace spans with opaque tokens → model sees __PII_EMAIL_a1b2c3__
          ↓
Post-process: restore tokens → original PII for end user
```

The system learns over time — fields that repeatedly contain PII get auto-flagged, and user feedback (false positives/negatives) adjusts confidence.
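The overlap rule can be sketched in a few lines. This is a minimal illustration, not the real implementation: the `PIIDetection` fields here are a guess at the minimal shape the `pii_detection` component would need.

```python
from dataclasses import dataclass

@dataclass
class PIIDetection:
    # Hypothetical minimal span shape; the real model carries more fields.
    start: int
    end: int
    entity_type: str
    confidence: float

def merge_detections(detections):
    """Union of all strategies' spans; when two spans overlap,
    the higher-confidence one wins and the other is dropped."""
    ranked = sorted(detections, key=lambda d: -d.confidence)
    kept = []
    for det in ranked:
        overlaps = any(det.start < k.end and k.start < det.end for k in kept)
        if not overlaps:
            kept.append(det)
    return sorted(kept, key=lambda d: d.start)

spans = [
    PIIDetection(10, 30, "EMAIL", 1.0),    # regex strategy
    PIIDetection(12, 28, "PERSON", 0.6),   # NER span overlapping the email
    PIIDetection(40, 50, "PERSON", 0.85),  # disjoint NER span
]
merged = merge_detections(spans)
# The overlapping low-confidence span is dropped; two spans remain.
```

Sorting by confidence first means a regex hit (confidence 1.0) always beats a weaker NER span covering the same characters, while disjoint spans from every strategy survive.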

### Configuration

```yaml
pii:
  enabled: true
  detection_mode: hybrid        # regex_only | hybrid | ner_only
  ner_model: dslim/bert-base-NER
  ner_device: cpu               # cpu | cuda
  ner_confidence_threshold: 0.7
  sensitive_fields:
    - email
    - phone
    - ssn
    - password
```

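With PII enabled, the detect-then-replace round trip behaves roughly like this sketch, shown with a single email regex. Function names and the token format here are illustrative, not the middleware's actual API.

```python
import hashlib
import re

# Single-pattern detector (emails only) to keep the sketch small.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def tokenize(text):
    """Replace each detected email with an opaque, stable token."""
    mapping = {}
    def repl(match):
        pii = match.group(0)
        token = "__PII_EMAIL_" + hashlib.sha1(pii.encode()).hexdigest()[:6] + "__"
        mapping[token] = pii
        return token
    return EMAIL_RE.sub(repl, text), mapping

def restore(text, mapping):
    """Swap tokens back to the original PII for the end user."""
    for token, pii in mapping.items():
        text = text.replace(token, pii)
    return text

scrubbed, mapping = tokenize("Contact alice@example.com for details")
# The model only ever sees the opaque token, never the raw address.
```

Hashing the original value keeps the token stable across repeated occurrences, so the model can still tell that two mentions refer to the same entity.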
### Evaluation

Apprentice includes a built-in evaluation harness for measuring PII detection quality against labeled datasets:

```bash
# Ingest the ai4privacy/pii-masking-200k dataset
apprentice pii-ingest --dataset ai4privacy/pii-masking-200k --limit 1000

# Evaluate regex baseline
apprentice pii-evaluate --mode regex_only

# Evaluate hybrid (regex + NER)
apprentice pii-evaluate --mode hybrid
```
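The span-level IoU matching and precision/recall/F1 the harness reports might look roughly like this simplified sketch, which ignores entity types and micro/macro averaging:

```python
def span_iou(a, b):
    """Intersection-over-union of two (start, end) character spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def prf1(predicted, gold, iou_threshold=0.5):
    """Greedy matching: a prediction is a true positive when it
    overlaps some still-unmatched gold span with IoU >= threshold."""
    unmatched = list(gold)
    tp = 0
    for p in predicted:
        hit = next((g for g in unmatched if span_iou(p, g) >= iou_threshold), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f1 = prf1(predicted=[(0, 5), (10, 20)], gold=[(0, 5), (30, 40)])
# One of two predictions matches one of two gold spans,
# so precision, recall, and F1 all come out to 0.5.
```

IoU-based matching gives partial credit for near-miss boundaries, which matters because NER models often clip or extend a span by a token or two.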
## CLI

| Command | Purpose |
@@ -94,6 +164,9 @@ Router ────────────────────────
| `apprentice serve <config>` | Start HTTP server with REST API |
| `apprentice status <config>` | Show phase, confidence, budget for each task |
| `apprentice report <config>` | Generate summary report with metrics |
| `apprentice ingest <file>` | Bulk ingest training data from file |
| `apprentice pii-ingest` | Download and ingest PII evaluation dataset |
| `apprentice pii-evaluate` | Evaluate PII detection against labeled data |

### HTTP Server Endpoints

@@ -156,7 +229,7 @@ Each task gets its own phase progression, confidence window, and evaluator. A si
## Architecture

- 25 components organized in two layers — 18 leaf implementations with zero cross-dependencies, wired together by 7 integration compositions:
+ 28 components organized in two layers — 21 leaf implementations with zero cross-dependencies, wired together by 7 integration compositions:

### Leaf Components
@@ -180,6 +253,9 @@ Each task gets its own phase progression, confidence window, and evaluator. A si
| `cli` | Command-line interface and HTTP server |
| `audit_log` | Structured event logging (JSONL) |
| `report_generator` | Reports, metrics, and observability |
| `pii_tokenizer` | PII detection middleware with learned patterns |
| `pii_detection` | Multi-strategy PII detection (regex, NER, heuristic) |
| `pii_evaluation` | Span-level PII detection evaluation harness |

### Integration Compositions

@@ -199,14 +275,14 @@ Each task gets its own phase progression, confidence window, and evaluator. A si
git clone https://github.com/jmcentire/apprentice.git
cd apprentice
make dev         # Install with dev + lint dependencies
- make test        # Run all 2,390 tests
+ make test        # Run all 2,486 tests
make test-quick  # Stop on first failure
make lint        # Run ruff linter
make lint-fix    # Auto-fix lint issues
make clean       # Remove build artifacts
```

- Requires Python 3.12+. Core dependencies: `pydantic`, `pyyaml`, `httpx`.
+ Requires Python 3.12+. Core dependencies: `pydantic`, `pyyaml`, `httpx`. Optional: `pip install apprentice-ai[ml]` for NER-based PII detection (adds `transformers`, `torch`, `datasets`).
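The "lazy-loads and gracefully disables" behavior described in the commit message can be sketched as a small helper. The helper name and wiring are assumptions; only the `transformers.pipeline` call and the model name mirror the documented defaults.

```python
import importlib

def load_optional(module_name, factory):
    """Build a strategy from an optional dependency; return None when
    the dependency is missing so the composite pipeline can simply
    skip it instead of crashing at import time."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return factory(module)

def build_ner_strategy(model_name="dslim/bert-base-NER"):
    # Only materializes when `pip install apprentice-ai[ml]` pulled in
    # transformers; the model name mirrors the documented default.
    return load_optional(
        "transformers",
        lambda m: m.pipeline("ner", model=model_name,
                             aggregation_strategy="simple"),
    )
```

Because the import happens inside the helper, installing the base package alone pays zero import cost for `torch` and `transformers`, and NER silently drops out of the strategy list.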

## Built With

docs/index.html

Lines changed: 27 additions & 5 deletions
@@ -623,6 +623,21 @@ <h3>Adaptive Sampling</h3>
<h3>Multi-Provider</h3>
<p>Anthropic, OpenAI, or any API as the remote teacher. Ollama, vLLM, or llama.cpp as the local student. Mix and match per task.</p>
</div>
<div class="feature-card">
  <span class="feature-icon">&#x1f512;</span>
  <h3>PII Protection</h3>
  <p>Hybrid multi-modal PII detection: fast regex, field heuristics, and optional NER model inference. Scrubs sensitive data before it reaches models or logs. Learns over time.</p>
</div>
<div class="feature-card">
  <span class="feature-icon">&#x1f9e0;</span>
  <h3>NER Integration</h3>
  <p>Optional transformer-based named entity recognition catches person names, addresses, and organizations that regex can't. Lazy-loaded &mdash; zero overhead when disabled.</p>
</div>
<div class="feature-card">
  <span class="feature-icon">&#x1f4dd;</span>
  <h3>Feedback Loop</h3>
  <p>Human and AI feedback drives continuous improvement. False positive/negative reports adjust detection confidence. The system gets smarter with every correction.</p>
</div>
</div>
</div>
</section>
@@ -631,11 +646,11 @@ <h3>Multi-Provider</h3>
<div class="container">
<div class="arch-grid">
<div>
- <h2>25 components.<br>Zero cross-dependencies.</h2>
+ <h2>28 components.<br>Zero cross-dependencies.</h2>
<p>
- 18 leaf implementations wired together by 7 integration compositions.
+ 21 leaf implementations wired together by 7 integration compositions.
Each component was contract-tested independently before integration.
- 2,390 tests verify the system end-to-end.
+ 2,486 tests verify the system end-to-end.
</p>
<p>
Built with <a href="https://jmcentire.github.io/pact/">Pact</a> &mdash;
@@ -659,7 +674,9 @@ <h2>25 components.<br>Zero cross-dependencies.</h2>
<div class="step"><span class="arrow">&#9656;</span> <span><span class="label">training_data_store</span> <span class="dim">// collect examples</span></span></div>
<div class="step"><span class="arrow">&#9656;</span> <span><span class="label">fine_tuning_orchestrator</span> <span class="dim">// LoRA, OpenAI, HF</span></span></div>
<div class="step"><span class="arrow">&#9656;</span> <span><span class="label">budget_manager</span> <span class="dim">// spend tracking</span></span></div>
- <div class="step"><span class="arrow">&#9656;</span> <span><span class="warn">2,390 tests pass</span> <span class="dim">// verified</span></span></div>
+ <div class="step"><span class="arrow">&#9656;</span> <span><span class="label">pii_detection</span> <span class="dim">// regex + NER + heuristics</span></span></div>
+ <div class="step"><span class="arrow">&#9656;</span> <span><span class="label">pii_tokenizer</span> <span class="dim">// detect → replace → restore</span></span></div>
+ <div class="step"><span class="arrow">&#9656;</span> <span><span class="warn">2,486 tests pass</span> <span class="dim">// verified</span></span></div>
</div>
</div>
</div>
@@ -732,6 +749,7 @@ <h2>HTTP API &amp; CLI</h2>
<h2>Up and running in 60 seconds</h2>
<p>
Python 3.12+, three dependencies. <code>pip install apprentice-ai</code> and go.
For NER-based PII detection: <code>pip install apprentice-ai[ml]</code>.
</p>
<p>
Define your tasks in YAML &mdash; prompt templates, evaluators, confidence thresholds.
@@ -758,7 +776,11 @@ <h2>Up and running in 60 seconds</h2>
<span class="code-comment"># As a CLI</span>
<span class="code-cmd">apprentice serve</span> config.yaml
<span class="code-cmd">apprentice status</span> config.yaml
- <span class="code-cmd">apprentice report</span> config.yaml</code></pre>
+ <span class="code-cmd">apprentice report</span> config.yaml
+
+ <span class="code-comment"># PII evaluation (optional: pip install apprentice-ai[ml])</span>
+ <span class="code-cmd">apprentice pii-ingest</span> <span class="code-flag">--limit</span> 1000
+ <span class="code-cmd">apprentice pii-evaluate</span> <span class="code-flag">--mode</span> hybrid</code></pre>
</div>
</div>
</div>

examples/apprentice.yaml

Lines changed: 27 additions & 0 deletions
@@ -90,3 +90,30 @@ audit:
training_data:
  storage_dir: .apprentice/training_data/
  max_examples_per_task: 50000

# ── PII Protection ──
# Scrubs sensitive data before it reaches models, training stores, or audit logs.
# detection_mode: regex_only (default, no extra deps) | hybrid | ner_only
pii:
  enabled: true
  detection_mode: regex_only  # regex_only | hybrid | ner_only
  sensitive_fields:
    - email
    - phone
    - ssn
    - password
    - credit_card
    - api_key

  # ── NER model settings (uncomment for hybrid or ner_only mode) ──
  # Requires: pip install apprentice-ai[ml]
  # ner_model: "dslim/bert-base-NER"  # HuggingFace model ID
  # ner_device: cpu                   # cpu | cuda
  # ner_confidence_threshold: 0.7     # discard NER detections below this
  # ner_max_text_length: 10000        # skip NER on very long strings

# ── Feedback ──
# Human and AI feedback collection for continuous improvement.
feedback:
  enabled: true
  storage_dir: .apprentice/feedback/

pyproject.toml

Lines changed: 6 additions & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
[project]
name = "apprentice-ai"
- version = "0.2.0"
+ version = "0.3.0"
description = "Adaptive model distillation framework — progressively replace expensive API calls with fine-tuned local models through coaching, evaluation, and phased rollout"
readme = "README.md"
license = {text = "MIT"}
@@ -29,6 +29,11 @@ gke = [
    "google-cloud-storage>=2.10",
]
wos = []
+ ml = [
+     "transformers>=4.35.0",
+     "torch>=2.0.0",
+     "datasets>=2.14.0",
+ ]
lint = [
    "ruff>=0.4.0",
]

src/apprentice/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
)
from apprentice.factory import build_from_config

- __version__ = "0.1.0"
+ __version__ = "0.3.0"
__all__ = [
    "Apprentice",
    "build_from_config",

src/apprentice/apprentice_class.py

Lines changed: 49 additions & 1 deletion
@@ -490,6 +490,19 @@ async def run(self, task_name: str, input_data: Dict[str, Any]) -> TaskResponse:
        start_time = time.time()
        start_time_utc = datetime.now(timezone.utc).isoformat()

        # Apply middleware pre-processing (PII tokenization)
        middleware_ctx = None
        pipeline = getattr(self, '_middleware_pipeline', None)
        if pipeline:
            from apprentice.middleware import MiddlewareContext as MwCtx
            middleware_ctx = MwCtx(
                request_id=request_id,
                task_name=task_name,
                input_data=input_data,
            )
            middleware_ctx = pipeline.execute_pre(middleware_ctx)
            input_data = middleware_ctx.input_data  # Now tokenized

        # Initialize RunContext
        run_ctx = RunContext(
            request_id=request_id,
@@ -635,6 +648,16 @@ async def run(self, task_name: str, input_data: Dict[str, Any]) -> TaskResponse:
        # Log to audit
        await self._audit_log.log(run_ctx)

        # Apply middleware post-processing (PII detokenization)
        if pipeline and middleware_ctx:
            from apprentice.middleware import MiddlewareResponse as MwResp
            mw_response = MwResp(
                output_data=output,
                middleware_state=middleware_ctx.middleware_state,
            )
            mw_response = pipeline.execute_post(middleware_ctx, mw_response)
            output = mw_response.output_data  # Detokenized for user

        # Build and return TaskResponse
        return TaskResponse(
            task_name=task_name,
@@ -680,6 +703,19 @@ async def _run_via_router(self, task_name: str, input_data: Dict[str, Any]) -> T
        request_id = str(uuid.uuid4())
        start_time = time.time()

        # Apply middleware pre-processing (PII tokenization)
        middleware_ctx = None
        pipeline = getattr(self, '_middleware_pipeline', None)
        if pipeline:
            from apprentice.middleware import MiddlewareContext as MwCtx
            middleware_ctx = MwCtx(
                request_id=request_id,
                task_name=task_name,
                input_data=input_data,
            )
            middleware_ctx = pipeline.execute_pre(middleware_ctx)
            input_data = middleware_ctx.input_data  # Now tokenized

        # Render prompt template from task registry if available
        prompt = str(input_data)
        try:
@@ -714,9 +750,21 @@ async def _run_via_router(self, task_name: str, input_data: Dict[str, Any]) -> T
        status = RunStatus.degraded if result.is_degraded else RunStatus.success

        output = {"content": result.response.content}

        # Apply middleware post-processing (PII detokenization)
        if pipeline and middleware_ctx:
            from apprentice.middleware import MiddlewareResponse as MwResp
            mw_response = MwResp(
                output_data=output,
                middleware_state=middleware_ctx.middleware_state,
            )
            mw_response = pipeline.execute_post(middleware_ctx, mw_response)
            output = mw_response.output_data  # Detokenized for user

        return TaskResponse(
            task_name=task_name,
-           output={"content": result.response.content},
+           output=output,
            source=source,
            status=status,
            request_id=request_id,
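The hooks in this file assume a small pre/post pipeline contract. A toy sketch of that contract, where the context and response field names mirror the diff but the pipeline and demo middleware are illustrative, not the real `apprentice.middleware` API:

```python
from dataclasses import dataclass, field

@dataclass
class MiddlewareContext:
    request_id: str
    task_name: str
    input_data: dict
    middleware_state: dict = field(default_factory=dict)

@dataclass
class MiddlewareResponse:
    output_data: dict
    middleware_state: dict = field(default_factory=dict)

class MiddlewarePipeline:
    def __init__(self, middlewares):
        self.middlewares = middlewares

    def execute_pre(self, ctx):
        for mw in self.middlewares:
            ctx = mw.pre(ctx)
        return ctx

    def execute_post(self, ctx, response):
        # Reverse order so the outermost middleware sees the output last.
        for mw in reversed(self.middlewares):
            response = mw.post(ctx, response)
        return response

class TokenSwapMiddleware:
    """Toy stand-in for the PII tokenizer: swap a literal email for a
    token on the way in, restore it on the way out."""
    def pre(self, ctx):
        ctx.middleware_state["__PII_EMAIL_demo__"] = "bob@example.com"
        ctx.input_data = {k: v.replace("bob@example.com", "__PII_EMAIL_demo__")
                          for k, v in ctx.input_data.items()}
        return ctx

    def post(self, ctx, response):
        for token, pii in ctx.middleware_state.items():
            response.output_data = {k: v.replace(token, pii)
                                    for k, v in response.output_data.items()}
        return response
```

Carrying the token map in `middleware_state` is what lets `execute_post` restore the original PII without the model, training store, or audit log ever seeing it.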
