Commit a477c8e

jmc-wanderclaude committed
feat: hybrid multi-modal PII detection with NER, evaluation harness, and v0.3.0 release
Add detect-then-replace architecture: regex patterns, field heuristics, and optional transformer-based NER run in parallel, producing PIIDetection spans that are merged (union strategy, highest confidence wins overlaps) before tokenization. The system learns over time — fields with repeated PII detections get auto-promoted, and user feedback (false positives/negatives) adjusts confidence.

New components:
- pii_detection: PIIDetection model, DetectionStrategy protocol, RegexDetectionStrategy, FieldHeuristicStrategy, NERDetectionStrategy, CompositeDetectionPipeline
- pii_evaluation: span-level IoU matching, per-entity precision/recall/F1, micro/macro averaging
- event_handler: external event ingestion for WOS integration

New CLI commands:
- pii-ingest: download and ingest the ai4privacy/pii-masking-200k evaluation dataset
- pii-evaluate: run detection strategies against labeled data, print F1 table

Config additions: detection_mode (regex_only | hybrid | ner_only), ner_model, ner_device, ner_confidence_threshold, ner_max_text_length in PIIConfig.

Optional ML dependencies: `pip install apprentice-ai[ml]` adds transformers, torch, datasets. The NER strategy lazy-loads and gracefully disables itself when these deps are missing.

Updated docs: README with PII protection section, GitHub Pages with privacy feature cards, example config with PII options, version bump to 0.3.0.

2486 tests pass, 19 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 52c0041 commit a477c8e

23 files changed

Lines changed: 3210 additions & 61 deletions

README.md

Lines changed: 79 additions & 3 deletions
@@ -86,6 +86,76 @@ Router ────────────────────────
2. **Reinforcement** — Both models process each request. The evaluator scores local vs. remote output. A rolling window tracks correlation. When sustained correlation exceeds the threshold, the system promotes to Phase 3.

3. **Steady State** — The local model handles most traffic. An adaptive sampler periodically sends requests to both models to verify quality hasn't degraded. If it has, the system automatically regresses to Phase 2.

## PII Protection

Apprentice includes a built-in PII detection and tokenization middleware that scrubs sensitive data before it reaches models, training stores, or audit logs. The system uses a hybrid multi-modal approach that combines fast regex patterns with optional NER model inference.

### Detection Modes

| Mode | Strategies | Latency | Dependencies |
|------|-----------|---------|--------------|
| `regex_only` (default) | Regex patterns + field heuristics | ~0.1 ms | None |
| `hybrid` | Regex + field heuristics + NER model | ~50 ms | `pip install apprentice-ai[ml]` |
| `ner_only` | NER model only | ~50 ms | `pip install apprentice-ai[ml]` |

### What It Detects

**Regex**: Emails, phone numbers, SSNs, credit cards, IP addresses, API keys, dates of birth
**Field Heuristics**: Sensitive field names (email, phone, ssn, password, etc.) + learned patterns from prior detections
**NER Model**: Person names, locations, organizations, miscellaneous entities — unstructured PII that regex can't catch

### How It Works

```
Input data (may contain PII)
    ├─ RegexDetectionStrategy    [confidence=1.0]
    ├─ FieldHeuristicStrategy    [confidence=0.9]
    └─ NERDetectionStrategy      [confidence=varies]
          ↓
Merge + deduplicate (union, highest confidence wins overlaps)
          ↓
Replace spans with opaque tokens → model sees __PII_EMAIL_a1b2c3__
          ↓
Post-process: restore tokens → original PII for end user
```

The system learns over time — fields that repeatedly contain PII get auto-flagged, and user feedback (false positives/negatives) adjusts confidence.
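The overlap rule can be sketched in a few lines. This is a minimal illustration, not the real implementation: the `PIIDetection` fields here are a guess at the minimal shape the `pii_detection` component would need.

```python
from dataclasses import dataclass

@dataclass
class PIIDetection:
    # Hypothetical minimal span shape; the real model carries more fields.
    start: int
    end: int
    entity_type: str
    confidence: float

def merge_detections(detections):
    """Union of all strategies' spans; when two spans overlap,
    the higher-confidence one wins and the other is dropped."""
    ranked = sorted(detections, key=lambda d: -d.confidence)
    kept = []
    for det in ranked:
        overlaps = any(det.start < k.end and k.start < det.end for k in kept)
        if not overlaps:
            kept.append(det)
    return sorted(kept, key=lambda d: d.start)

spans = [
    PIIDetection(10, 30, "EMAIL", 1.0),    # regex strategy
    PIIDetection(12, 28, "PERSON", 0.6),   # NER span overlapping the email
    PIIDetection(40, 50, "PERSON", 0.85),  # disjoint NER span
]
merged = merge_detections(spans)
# The overlapping low-confidence span is dropped; two spans remain.
```

Sorting by confidence first means a regex hit (confidence 1.0) always beats a weaker NER span covering the same characters, while disjoint spans from every strategy survive.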

### Configuration

```yaml
pii:
  enabled: true
  detection_mode: hybrid        # regex_only | hybrid | ner_only
  ner_model: dslim/bert-base-NER
  ner_device: cpu               # cpu | cuda
  ner_confidence_threshold: 0.7
  sensitive_fields:
    - email
    - phone
    - ssn
    - password
```

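With PII enabled, the detect-then-replace round trip behaves roughly like this sketch, shown with a single email regex. Function names and the token format here are illustrative, not the middleware's actual API.

```python
import hashlib
import re

# Single-pattern detector (emails only) to keep the sketch small.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def tokenize(text):
    """Replace each detected email with an opaque, stable token."""
    mapping = {}
    def repl(match):
        pii = match.group(0)
        token = "__PII_EMAIL_" + hashlib.sha1(pii.encode()).hexdigest()[:6] + "__"
        mapping[token] = pii
        return token
    return EMAIL_RE.sub(repl, text), mapping

def restore(text, mapping):
    """Swap tokens back to the original PII for the end user."""
    for token, pii in mapping.items():
        text = text.replace(token, pii)
    return text

scrubbed, mapping = tokenize("Contact alice@example.com for details")
# The model only ever sees the opaque token, never the raw address.
```

Hashing the original value keeps the token stable across repeated occurrences, so the model can still tell that two mentions refer to the same entity.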
### Evaluation

Apprentice includes a built-in evaluation harness for measuring PII detection quality against labeled datasets:

```bash
# Ingest the ai4privacy/pii-masking-200k dataset
apprentice pii-ingest --dataset ai4privacy/pii-masking-200k --limit 1000

# Evaluate regex baseline
apprentice pii-evaluate --mode regex_only

# Evaluate hybrid (regex + NER)
apprentice pii-evaluate --mode hybrid
```
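The span-level IoU matching and precision/recall/F1 the harness reports might look roughly like this simplified sketch, which ignores entity types and micro/macro averaging:

```python
def span_iou(a, b):
    """Intersection-over-union of two (start, end) character spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def prf1(predicted, gold, iou_threshold=0.5):
    """Greedy matching: a prediction is a true positive when it
    overlaps some still-unmatched gold span with IoU >= threshold."""
    unmatched = list(gold)
    tp = 0
    for p in predicted:
        hit = next((g for g in unmatched if span_iou(p, g) >= iou_threshold), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f1 = prf1(predicted=[(0, 5), (10, 20)], gold=[(0, 5), (30, 40)])
# One of two predictions matches one of two gold spans,
# so precision, recall, and F1 all come out to 0.5.
```

IoU-based matching gives partial credit for near-miss boundaries, which matters because NER models often clip or extend a span by a token or two.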
## CLI

| Command | Purpose |
@@ -94,6 +164,9 @@ Router ────────────────────────
| `apprentice serve <config>` | Start HTTP server with REST API |
| `apprentice status <config>` | Show phase, confidence, budget for each task |
| `apprentice report <config>` | Generate summary report with metrics |
| `apprentice ingest <file>` | Bulk ingest training data from file |
| `apprentice pii-ingest` | Download and ingest PII evaluation dataset |
| `apprentice pii-evaluate` | Evaluate PII detection against labeled data |

### HTTP Server Endpoints

@@ -156,7 +229,7 @@ Each task gets its own phase progression, confidence window, and evaluator. A si
## Architecture

- 25 components organized in two layers — 18 leaf implementations with zero cross-dependencies, wired together by 7 integration compositions:
+ 28 components organized in two layers — 21 leaf implementations with zero cross-dependencies, wired together by 7 integration compositions:

### Leaf Components
@@ -180,6 +253,9 @@ Each task gets its own phase progression, confidence window, and evaluator. A si
| `cli` | Command-line interface and HTTP server |
| `audit_log` | Structured event logging (JSONL) |
| `report_generator` | Reports, metrics, and observability |
| `pii_tokenizer` | PII detection middleware with learned patterns |
| `pii_detection` | Multi-strategy PII detection (regex, NER, heuristic) |
| `pii_evaluation` | Span-level PII detection evaluation harness |

### Integration Compositions

@@ -199,14 +275,14 @@ Each task gets its own phase progression, confidence window, and evaluator. A si
git clone https://github.com/jmcentire/apprentice.git
cd apprentice
make dev         # Install with dev + lint dependencies
- make test        # Run all 2,390 tests
+ make test        # Run all 2,486 tests
make test-quick  # Stop on first failure
make lint        # Run ruff linter
make lint-fix    # Auto-fix lint issues
make clean       # Remove build artifacts
```

- Requires Python 3.12+. Core dependencies: `pydantic`, `pyyaml`, `httpx`.
+ Requires Python 3.12+. Core dependencies: `pydantic`, `pyyaml`, `httpx`. Optional: `pip install apprentice-ai[ml]` for NER-based PII detection (adds `transformers`, `torch`, `datasets`).
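The "lazy-loads and gracefully disables" behavior described in the commit message can be sketched as a small helper. The helper name and wiring are assumptions; only the `transformers.pipeline` call and the model name mirror the documented defaults.

```python
import importlib

def load_optional(module_name, factory):
    """Build a strategy from an optional dependency; return None when
    the dependency is missing so the composite pipeline can simply
    skip it instead of crashing at import time."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return factory(module)

def build_ner_strategy(model_name="dslim/bert-base-NER"):
    # Only materializes when `pip install apprentice-ai[ml]` pulled in
    # transformers; the model name mirrors the documented default.
    return load_optional(
        "transformers",
        lambda m: m.pipeline("ner", model=model_name,
                             aggregation_strategy="simple"),
    )
```

Because the import happens inside the helper, installing the base package alone pays zero import cost for `torch` and `transformers`, and NER silently drops out of the strategy list.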

## Built With

docs/index.html

Lines changed: 27 additions & 5 deletions
@@ -623,6 +623,21 @@ <h3>Adaptive Sampling</h3>
<h3>Multi-Provider</h3>
<p>Anthropic, OpenAI, or any API as the remote teacher. Ollama, vLLM, or llama.cpp as the local student. Mix and match per task.</p>
</div>
<div class="feature-card">
  <span class="feature-icon">&#x1f512;</span>
  <h3>PII Protection</h3>
  <p>Hybrid multi-modal PII detection: fast regex, field heuristics, and optional NER model inference. Scrubs sensitive data before it reaches models or logs. Learns over time.</p>
</div>
<div class="feature-card">
  <span class="feature-icon">&#x1f9e0;</span>
  <h3>NER Integration</h3>
  <p>Optional transformer-based named entity recognition catches person names, addresses, and organizations that regex can't. Lazy-loaded &mdash; zero overhead when disabled.</p>
</div>
<div class="feature-card">
  <span class="feature-icon">&#x1f4dd;</span>
  <h3>Feedback Loop</h3>
  <p>Human and AI feedback drives continuous improvement. False positive/negative reports adjust detection confidence. The system gets smarter with every correction.</p>
</div>
</div>
</div>
</section>
@@ -631,11 +646,11 @@ <h3>Multi-Provider</h3>
<div class="container">
<div class="arch-grid">
<div>
- <h2>25 components.<br>Zero cross-dependencies.</h2>
+ <h2>28 components.<br>Zero cross-dependencies.</h2>
<p>
- 18 leaf implementations wired together by 7 integration compositions.
+ 21 leaf implementations wired together by 7 integration compositions.
Each component was contract-tested independently before integration.
- 2,390 tests verify the system end-to-end.
+ 2,486 tests verify the system end-to-end.
</p>
<p>
Built with <a href="https://jmcentire.github.io/pact/">Pact</a> &mdash;
@@ -659,7 +674,9 @@ <h2>25 components.<br>Zero cross-dependencies.</h2>
<div class="step"><span class="arrow">&#9656;</span> <span><span class="label">training_data_store</span> <span class="dim">// collect examples</span></span></div>
<div class="step"><span class="arrow">&#9656;</span> <span><span class="label">fine_tuning_orchestrator</span> <span class="dim">// LoRA, OpenAI, HF</span></span></div>
<div class="step"><span class="arrow">&#9656;</span> <span><span class="label">budget_manager</span> <span class="dim">// spend tracking</span></span></div>
- <div class="step"><span class="arrow">&#9656;</span> <span><span class="warn">2,390 tests pass</span> <span class="dim">// verified</span></span></div>
+ <div class="step"><span class="arrow">&#9656;</span> <span><span class="label">pii_detection</span> <span class="dim">// regex + NER + heuristics</span></span></div>
+ <div class="step"><span class="arrow">&#9656;</span> <span><span class="label">pii_tokenizer</span> <span class="dim">// detect → replace → restore</span></span></div>
+ <div class="step"><span class="arrow">&#9656;</span> <span><span class="warn">2,486 tests pass</span> <span class="dim">// verified</span></span></div>
</div>
</div>
</div>
@@ -732,6 +749,7 @@ <h2>HTTP API &amp; CLI</h2>
<h2>Up and running in 60 seconds</h2>
<p>
Python 3.12+, three dependencies. <code>pip install apprentice-ai</code> and go.
For NER-based PII detection: <code>pip install apprentice-ai[ml]</code>.
</p>
<p>
Define your tasks in YAML &mdash; prompt templates, evaluators, confidence thresholds.
@@ -758,7 +776,11 @@ <h2>Up and running in 60 seconds</h2>
<span class="code-comment"># As a CLI</span>
<span class="code-cmd">apprentice serve</span> config.yaml
<span class="code-cmd">apprentice status</span> config.yaml
- <span class="code-cmd">apprentice report</span> config.yaml</code></pre>
+ <span class="code-cmd">apprentice report</span> config.yaml
+
+ <span class="code-comment"># PII evaluation (optional: pip install apprentice-ai[ml])</span>
+ <span class="code-cmd">apprentice pii-ingest</span> <span class="code-flag">--limit</span> 1000
+ <span class="code-cmd">apprentice pii-evaluate</span> <span class="code-flag">--mode</span> hybrid</code></pre>
</div>
</div>
</div>

examples/apprentice.yaml

Lines changed: 27 additions & 0 deletions
@@ -90,3 +90,30 @@ audit:
training_data:
  storage_dir: .apprentice/training_data/
  max_examples_per_task: 50000

# ── PII Protection ──
# Scrubs sensitive data before it reaches models, training stores, or audit logs.
# detection_mode: regex_only (default, no extra deps) | hybrid | ner_only
pii:
  enabled: true
  detection_mode: regex_only  # regex_only | hybrid | ner_only
  sensitive_fields:
    - email
    - phone
    - ssn
    - password
    - credit_card
    - api_key

  # ── NER model settings (uncomment for hybrid or ner_only mode) ──
  # Requires: pip install apprentice-ai[ml]
  # ner_model: "dslim/bert-base-NER"  # HuggingFace model ID
  # ner_device: cpu                   # cpu | cuda
  # ner_confidence_threshold: 0.7     # discard NER detections below this
  # ner_max_text_length: 10000        # skip NER on very long strings

# ── Feedback ──
# Human and AI feedback collection for continuous improvement.
feedback:
  enabled: true
  storage_dir: .apprentice/feedback/

pyproject.toml

Lines changed: 6 additions & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
[project]
name = "apprentice-ai"
- version = "0.2.0"
+ version = "0.3.0"
description = "Adaptive model distillation framework — progressively replace expensive API calls with fine-tuned local models through coaching, evaluation, and phased rollout"
readme = "README.md"
license = {text = "MIT"}
@@ -29,6 +29,11 @@ gke = [
    "google-cloud-storage>=2.10",
]
wos = []
+ ml = [
+     "transformers>=4.35.0",
+     "torch>=2.0.0",
+     "datasets>=2.14.0",
+ ]
lint = [
    "ruff>=0.4.0",
]

src/apprentice/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
)
from apprentice.factory import build_from_config

- __version__ = "0.1.0"
+ __version__ = "0.3.0"
__all__ = [
    "Apprentice",
    "build_from_config",

src/apprentice/apprentice_class.py

Lines changed: 49 additions & 1 deletion
@@ -490,6 +490,19 @@ async def run(self, task_name: str, input_data: Dict[str, Any]) -> TaskResponse:
        start_time = time.time()
        start_time_utc = datetime.now(timezone.utc).isoformat()

        # Apply middleware pre-processing (PII tokenization)
        middleware_ctx = None
        pipeline = getattr(self, '_middleware_pipeline', None)
        if pipeline:
            from apprentice.middleware import MiddlewareContext as MwCtx
            middleware_ctx = MwCtx(
                request_id=request_id,
                task_name=task_name,
                input_data=input_data,
            )
            middleware_ctx = pipeline.execute_pre(middleware_ctx)
            input_data = middleware_ctx.input_data  # Now tokenized

        # Initialize RunContext
        run_ctx = RunContext(
            request_id=request_id,
@@ -635,6 +648,16 @@ async def run(self, task_name: str, input_data: Dict[str, Any]) -> TaskResponse:
        # Log to audit
        await self._audit_log.log(run_ctx)

        # Apply middleware post-processing (PII detokenization)
        if pipeline and middleware_ctx:
            from apprentice.middleware import MiddlewareResponse as MwResp
            mw_response = MwResp(
                output_data=output,
                middleware_state=middleware_ctx.middleware_state,
            )
            mw_response = pipeline.execute_post(middleware_ctx, mw_response)
            output = mw_response.output_data  # Detokenized for user

        # Build and return TaskResponse
        return TaskResponse(
            task_name=task_name,
@@ -680,6 +703,19 @@ async def _run_via_router(self, task_name: str, input_data: Dict[str, Any]) -> T
        request_id = str(uuid.uuid4())
        start_time = time.time()

        # Apply middleware pre-processing (PII tokenization)
        middleware_ctx = None
        pipeline = getattr(self, '_middleware_pipeline', None)
        if pipeline:
            from apprentice.middleware import MiddlewareContext as MwCtx
            middleware_ctx = MwCtx(
                request_id=request_id,
                task_name=task_name,
                input_data=input_data,
            )
            middleware_ctx = pipeline.execute_pre(middleware_ctx)
            input_data = middleware_ctx.input_data  # Now tokenized

        # Render prompt template from task registry if available
        prompt = str(input_data)
        try:
@@ -714,9 +750,21 @@ async def _run_via_router(self, task_name: str, input_data: Dict[str, Any]) -> T
        status = RunStatus.degraded if result.is_degraded else RunStatus.success

        output = {"content": result.response.content}

        # Apply middleware post-processing (PII detokenization)
        if pipeline and middleware_ctx:
            from apprentice.middleware import MiddlewareResponse as MwResp
            mw_response = MwResp(
                output_data=output,
                middleware_state=middleware_ctx.middleware_state,
            )
            mw_response = pipeline.execute_post(middleware_ctx, mw_response)
            output = mw_response.output_data  # Detokenized for user

        return TaskResponse(
            task_name=task_name,
-           output={"content": result.response.content},
+           output=output,
            source=source,
            status=status,
            request_id=request_id,
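The hooks in this file assume a small pre/post pipeline contract. A toy sketch of that contract, where the context and response field names mirror the diff but the pipeline and demo middleware are illustrative, not the real `apprentice.middleware` API:

```python
from dataclasses import dataclass, field

@dataclass
class MiddlewareContext:
    request_id: str
    task_name: str
    input_data: dict
    middleware_state: dict = field(default_factory=dict)

@dataclass
class MiddlewareResponse:
    output_data: dict
    middleware_state: dict = field(default_factory=dict)

class MiddlewarePipeline:
    def __init__(self, middlewares):
        self.middlewares = middlewares

    def execute_pre(self, ctx):
        for mw in self.middlewares:
            ctx = mw.pre(ctx)
        return ctx

    def execute_post(self, ctx, response):
        # Reverse order so the outermost middleware sees the output last.
        for mw in reversed(self.middlewares):
            response = mw.post(ctx, response)
        return response

class TokenSwapMiddleware:
    """Toy stand-in for the PII tokenizer: swap a literal email for a
    token on the way in, restore it on the way out."""
    def pre(self, ctx):
        ctx.middleware_state["__PII_EMAIL_demo__"] = "bob@example.com"
        ctx.input_data = {k: v.replace("bob@example.com", "__PII_EMAIL_demo__")
                          for k, v in ctx.input_data.items()}
        return ctx

    def post(self, ctx, response):
        for token, pii in ctx.middleware_state.items():
            response.output_data = {k: v.replace(token, pii)
                                    for k, v in response.output_data.items()}
        return response
```

Carrying the token map in `middleware_state` is what lets `execute_post` restore the original PII without the model, training store, or audit log ever seeing it.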
