Merged
12 changes: 12 additions & 0 deletions .claude/commands/ferpa-audit.md
@@ -0,0 +1,12 @@
---
description: Run FERPA read-time audit (Layer A static + Layer B DB) and produce docs/ferpa-audit-<date>.md
---

Follow `.claude/skills/ferpa-audit/SKILL.md` exactly.

1. Ensure `ferpa-config.yaml` at the repository root is current.
2. Run `./scripts/ferpa-audit.sh` from the repository root (or Layer A then Layer B per the skill if the shell script is unavailable).
3. Open the generated `docs/ferpa-audit-<YYYY-MM-DD>.md` and confirm that the Critical/Warning/Note counts match the executive summary.
4. Every finding in any narrative you add MUST cite `.claude/skills/ferpa-audit/references/regulatory-citations.md`.

v1 scope: read-time detection only — do not expand into lineage, retention, breach response, or CI gating unless the user explicitly asks.
110 changes: 110 additions & 0 deletions .claude/skills/ferpa-audit/SKILL.md
@@ -0,0 +1,110 @@
---
name: ferpa-audit
description: Run Layer A (static) + Layer B (Postgres) read-time FERPA leak detection and write docs/ferpa-audit-<date>.md. Invoke with /ferpa-audit or when auditing student-data flows for CIO/legal review.
---

# FERPA audit skill (v1 — read-time detection)

## Purpose

This skill produces a **single markdown report** institutions can share with a **CIO, compliance lead, or legal counsel** without an engineer in the room. It answers: *“Where could student education records or personally identifiable information leak at read time — in code, configuration, logs, vendor calls, or the database?”*

v1 is **detection and documentation only**: no CI gating, no retention/breach playbooks, no data-lineage graphs (see issue #107 for lineage).

## Invocation

- **Slash command:** `/ferpa-audit` (repo command: `.claude/commands/ferpa-audit.md`).
- **CLI (authoritative for Layer A+B merge):** from repository root, after configuring `ferpa-config.yaml` and (for Layer B) database env vars per `operations/db_config.py`:

```bash
./scripts/ferpa-audit.sh
```

- **Layer A only (fast):**

```bash
cd codebenders-dashboard && npx tsx ../.claude/skills/ferpa-audit/scripts/static-audit.ts --repo-root .. --out /tmp/ferpa-static.json
```

- **Merge Layer B (expects static JSON from Layer A):**

```bash
./venv/bin/python .claude/skills/ferpa-audit/scripts/db-audit.py --repo-root . --static-json /tmp/ferpa-static.json
```

Default report path: `docs/ferpa-audit-<YYYY-MM-DD>.md` (UTC date).
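
The default path can be derived as in the sketch below. `default_report_path` is a hypothetical helper name used for illustration; `db-audit.py` may compute the path differently.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def default_report_path(repo_root: Path, now: Optional[datetime] = None) -> Path:
    """Return docs/ferpa-audit-<YYYY-MM-DD>.md under repo_root, dated in UTC."""
    # Normalize to UTC so a late-evening run in a western time zone
    # is stamped with the UTC calendar date, not the local one.
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return repo_root / "docs" / f"ferpa-audit-{moment:%Y-%m-%d}.md"
```

Using the UTC date keeps report names consistent across machines in different time zones.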

## Regulatory knowledge (load-bearing)

**Every finding MUST cite** a section from `references/regulatory-citations.md` in this skill folder. Do not invent citations. If nothing fits, lower severity to **Note** and still pick the closest hook, or omit the finding.

**Why this matters:** Generic “PII scanners” flag strings; FERPA audits explain **whether education records or PII could be disclosed** without a valid basis (e.g. consent, school-official/legitimate educational interest, permitted vendor relationship, statistical de-identification). The citation anchors the narrative for procurement and legal review.
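
The citation rule can be checked mechanically. The sketch below is an assumption about how that might look, not the skill's actual code: it treats every bolded `§99.x` string in `references/regulatory-citations.md` as a valid anchor and assumes each finding is a dict with a hypothetical `citation` key.

```python
import re

# Matches bolded anchors such as "**§99.31(a)(1)(i) ..." in the citations file.
ANCHOR_RE = re.compile(r"\*\*(§99\.\d+(?:\([0-9a-zA-Z]+\))*)")


def load_anchors(citations_markdown: str) -> set[str]:
    """Collect every section anchor that appears in bold in the citations file."""
    return set(ANCHOR_RE.findall(citations_markdown))


def uncited_findings(findings: list[dict], anchors: set[str]) -> list[dict]:
    """Return findings whose 'citation' is missing or is not a known anchor."""
    return [f for f in findings if f.get("citation") not in anchors]
```

Any finding returned by `uncited_findings` would then need a real anchor, a downgrade to **Note** with the closest hook, or removal.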

### Severity rubric

| Level | Meaning | Typical examples |
|-------|---------|------------------|
| **Critical** | Plausible **uncontrolled disclosure** of identifiers or row-level education records to the wrong party, or bypass of documented safeguards. | Arbitrary SQL execution without FERPA column controls; student-level payloads to non-allowlisted external hosts when hardening is off; logging full query results client-side. |
| **Warning** | **Material gap** in access control, vendor boundary, or consistency with documented policy — exploitable or high residual risk but may depend on deployment flags or network posture. | API routes returning cohort/student analytics without role checks; optional external data API path; LLM receives result rows without institutional review of caps/redaction. |
| **Note** | **Transparency / policy alignment** issues, small-N disclosure risk, or technical debt that should be tracked with FERPA framing. | AI transparency inventory drift; schema metadata sent to vendors; planned audit-log coverage not yet implemented. |

Downgrade **Critical → Warning** when the issue is **fully mitigated by documented production configuration** (e.g. `FORCE_DIRECT_DB=true`) but the **unsafe path still exists in code**; frame the finding as “residual risk if misconfigured.”
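
The Critical → Warning rule above can be expressed as a small pure function. This is a minimal sketch; the `Finding` fields `mitigated_by_config` and `unsafe_path_in_code` are hypothetical names chosen for illustration and are not taken from the scanner's JSON schema.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Severity(Enum):
    CRITICAL = "Critical"
    WARNING = "Warning"
    NOTE = "Note"


@dataclass(frozen=True)
class Finding:
    title: str
    severity: Severity
    mitigated_by_config: bool = False  # e.g. production sets FORCE_DIRECT_DB=true
    unsafe_path_in_code: bool = True   # the risky code path is still present


def apply_mitigation_rule(finding: Finding) -> Finding:
    """Move Critical to Warning when documented config fully mitigates the issue."""
    if (
        finding.severity is Severity.CRITICAL
        and finding.mitigated_by_config
        and finding.unsafe_path_in_code
    ):
        return replace(
            finding,
            severity=Severity.WARNING,
            title=f"{finding.title} (residual risk if misconfigured)",
        )
    # Anything else keeps its original severity.
    return finding
```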

## Layers

### Layer A — Static codebase audit (`scripts/static-audit.ts`)

Inputs: repository tree, `ferpa-config.yaml`. Output: JSON findings file consumed by Layer B.

**Checks (v1):**

1. **SELECT / response-shape exclusions** — Regex over SQL-like literals and template strings for columns in `select_exclusions` / `sensitive_demographics`; flag unless `// FERPA-OK:` appears on the same line (convention).
2. **Policy vs enforcement** — Compare `/api/analyze` FERPA column guard (`lib/sql-inspector.ts`) to other execution paths (notably `/api/execute-sql`). A documented exclusion that is **prompt-only** or **only on one route** is a **Critical** or **Warning** finding.
3. **Console / client log leakage** — Flag `console.log` / `console.debug` / `console.info` (client bundles) that log query plans, results, or other large objects that may contain student rows.
4. **External fetch / third-party hosts** — Detect `schools.syntex-ai.com` and other non-allowlisted hosts; relate each hit to the `FORCE_DIRECT_DB` / `buildExternalAnalysisReadyUrl` behavior (#126).
5. **RBAC coverage** — Routes listed in `ferpa-config.yaml` under `rbac.student_data_routes` must reference the configured role header (default `x-user-role`).
6. **LLM prompt / vendor disclosure** — Ensure `ferpaExcluded` in `app/api/analyze/route.ts` is the single source of truth for model-directed SQL exclusions; flag any other prompt that contradicts it.
7. **AI transparency cross-check** — Every OpenAI / Vercel AI SDK call site under `app/api/**` must be reflected in `content/ai-transparency.ts` (issue #108).
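
To illustrate check 1, here is a simplified sketch of the same-line `// FERPA-OK:` suppression convention. It is written in Python for brevity, although Layer A itself is TypeScript, and it scans every line rather than only SQL-like literals. `scan_source` and its parameters are hypothetical names.

```python
import re

SUPPRESS_MARKER = "// FERPA-OK:"


def scan_source(text: str, excluded_columns: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, column) pairs for excluded columns not marked FERPA-OK."""
    if not excluded_columns:
        return []
    # Word-boundary, case-insensitive match so Student_GUID and student_guid both hit.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(c) for c in excluded_columns) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SUPPRESS_MARKER in line:
            continue  # same-line suppression convention
        for match in pattern.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

A real scanner would likely restrict matches to string and template literals, which is where an AST pass helps.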

**Microsoft Presidio:** Optional enhancement. v1 does not require Presidio to pass acceptance; regex + AST + project config carry FERPA semantics. To add Presidio, install `presidio-analyzer` in the project venv and extend Layer A with a subprocess helper (documented in runbook).

### Layer B — Live Postgres audit (`scripts/db-audit.py`)

Inputs: read-only DB connection (`DB_*` env vars or `operations/db_config.py` defaults), `ferpa-config.yaml`, static JSON. Output: final markdown report.

**Checks (v1):**

1. **Schema snapshot** — Tables/columns in the configured schema(s), timestamped in the appendix.
2. **RLS** — Flag tables holding PII-class columns (per config taxonomy) with **no** row-level security policy (informational **Warning** in v1 — many institutional dashboards rely on app-layer RBAC instead).
3. **Small-N / subgroup disclosure** — For configured dimensions on the predictions table, flag cells where `COUNT(*) < subpopulation_minimum_n` (§99.35 statistical-disclosure framing).
4. **Audit-log path** — If `audit_log.table_name` is unset or table missing, emit a **Note** referencing planned institutional audit coverage (#67), not a fake pass.
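
The small-N check (item 3) boils down to a `GROUP BY ... HAVING COUNT(*) < n` query. The sketch below uses an in-memory SQLite database so it runs anywhere. Layer B targets Postgres, where the placeholder syntax depends on the driver, and the `predictions` table layout and `program` dimension here are made-up sample data.

```python
import sqlite3


def small_cells(conn, table: str, dimension: str, minimum_n: int):
    """Return (value, count) rows for subgroup cells smaller than minimum_n."""
    # Identifiers are assumed to come from trusted config; they are interpolated
    # because SQL placeholders cannot bind table or column names.
    query = (
        f'SELECT "{dimension}", COUNT(*) AS n FROM "{table}" '
        f'GROUP BY "{dimension}" HAVING COUNT(*) < ? ORDER BY n'
    )
    return conn.execute(query, (minimum_n,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE predictions (student_id INTEGER, program TEXT)")
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?)",
    [(i, "Nursing") for i in range(12)] + [(100 + i, "Welding") for i in range(3)],
)
print(small_cells(conn, "predictions", "program", 10))  # [('Welding', 3)]
```

Each returned cell would become one finding, since a published count of 3 could let a reader single out individual students.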

### Layer C — Report

`db-audit.py` merges executive summary, findings grouped by severity, and appendix (skill path, config hash, DB snapshot time).

**Writing style:** Plain English “what we saw,” “why it matters under FERPA,” “what to do next.” Avoid library names without one-line explanations.
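
A minimal sketch of the report merge, assuming each finding is a dict with hypothetical keys `severity`, `title`, `saw`, `why`, `next`, and `citation`. The appendix is omitted, and the real `db-audit.py` layout may differ.

```python
from collections import Counter

SEVERITY_ORDER = ["Critical", "Warning", "Note"]


def render_report(findings: list[dict], report_date: str) -> str:
    """Render an executive summary plus findings grouped by severity."""
    counts = Counter(item["severity"] for item in findings)
    lines = [f"# FERPA audit {report_date}", "", "## Executive summary", ""]
    lines += [f"- {level}: {counts.get(level, 0)}" for level in SEVERITY_ORDER]
    for level in SEVERITY_ORDER:
        group = [item for item in findings if item["severity"] == level]
        if not group:
            continue  # skip empty severity sections
        lines += ["", f"## {level}"]
        for item in group:
            lines += [
                "",
                f"### {item['title']}",
                f"- What we saw: {item['saw']}",
                f"- Why it matters under FERPA ({item['citation']}): {item['why']}",
                f"- What to do next: {item['next']}",
            ]
    return "\n".join(lines) + "\n"
```

Because the summary counts and the grouped sections are computed from the same list, they cannot drift apart.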

## Known regression targets (main branch)

When tuning scanners, these MUST appear **without manual hints**:

1. **Student_GUID / execute-sql** — Runtime guard exists on `/api/analyze` (#127) but **not** on arbitrary SQL execution — policy vs enforcement gap.
2. **schools.syntex-ai.com** — External analysis-ready API path unless `FORCE_DIRECT_DB` / direct DB mode (#126).
3. **Client console logging** — e.g. query page logging plans/results.
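
One way to keep these targets from regressing is a check over the Layer A findings JSON. The sketch below assumes each finding has hypothetical `title` and `evidence` string fields and matches on simple substrings; the real schema and matching rules may differ.

```python
# Substrings that should appear in at least one finding per regression target.
REGRESSION_TARGETS = {
    "execute-sql guard gap": ("execute-sql",),
    "external analysis host": ("schools.syntex-ai.com",),
    "client console logging": ("console.log", "console.debug", "console.info"),
}


def missing_targets(findings: list[dict]) -> list[str]:
    """Return the names of regression targets that no finding mentions."""
    haystacks = [
        f"{item.get('title', '')} {item.get('evidence', '')}".lower()
        for item in findings
    ]
    return [
        name
        for name, needles in REGRESSION_TARGETS.items()
        if not any(n in h for h in haystacks for n in needles)
    ]
```

An empty result means all three targets were detected without manual hints.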

## Out of scope (v1)

- Lineage (#107), retention schedules, breach playbooks, write-time prevention / CI enforcement.

## Files

| File | Role |
|------|------|
| `SKILL.md` | This document |
| `references/regulatory-citations.md` | §99.x citation anchors |
| `scripts/static-audit.ts` | Layer A |
| `scripts/db-audit.py` | Layer B + report merge |
| `ferpa-config.yaml` (repository root) | Project exclusions, allowlists, thresholds |
| `../../../docs/ferpa-audit-runbook.md` | Human runbook |
| `../../../scripts/ferpa-audit.sh` | One-shot runner |
81 changes: 81 additions & 0 deletions .claude/skills/ferpa-audit/references/regulatory-citations.md
@@ -0,0 +1,81 @@
# FERPA regulatory hooks (34 CFR Part 99)

Use this file as the **only** authoritative list of section citations for FERPA-audit findings. Each finding in `docs/ferpa-audit-<date>.md` must reference **at least one** anchor below. Prefer the narrowest hook that matches the risk.

Citations refer to the **Family Educational Rights and Privacy Act** regulations at 34 CFR Part 99 (commonly cited as “FERPA” in higher-education practice). This is a plain-English index for audit narratives; it is not legal advice.

---

## §99.3 — Definitions

| Anchor | What it covers | Typical audit use |
|--------|----------------|-------------------|
| **§99.3 — “Personally identifiable information” (PII)** | Information that, alone or in combination, would let a reasonable person identify a student with reasonable certainty — including direct identifiers and many indirect linkages. | Flagging direct identifiers (e.g. institution-issued student IDs/GUIDs), linkable keys, or combinations that re-identify individuals in outputs, logs, or vendor payloads. |
| **§99.3 — “Education records”** | Records directly related to a student and maintained by an educational agency or institution (with stated exceptions). | Explaining why student academic/demographic datasets, predictions, and course rows are not “just analytics data” — they are protected education records unless an exception clearly applies. |
| **§99.3 — “Directory information”** | Limited categories an institution may disclose without consent if public notice and opt-out requirements are met. | Warning when “directory information” arguments are misapplied to non-directory fields (grades, risk flags, detailed demographics, etc.). |

---

## §99.7 — Policy and rights awareness

| Anchor | What it covers | Typical audit use |
|--------|----------------|-------------------|
| **§99.7 — annual notification / rights awareness** | Institution must inform parents/eligible students of rights under FERPA. | **Note**-level reminders when new technical surfaces (AI, external APIs) change how records are processed, so policy notices and transparency artifacts stay aligned with reality. |

---

## §99.30 — Basis for disclosure

| Anchor | What it covers | Typical audit use |
|--------|----------------|-------------------|
| **§99.30 — general rule on consent** | Disclosure of PII from education records generally requires prior written consent unless a specific exception applies. | Framing **why** a new outbound data path (vendor API, third-party host) is sensitive even if “no SSN is sent.” |

---

## §99.31 — Conditions for disclosure

| Anchor | What it covers | Typical audit use |
|--------|----------------|-------------------|
| **§99.31(a)(6) — studies exception (statutory)** | Permits disclosure to organizations conducting studies for, or on behalf of, the institution under defined conditions. | Rare in routine dashboard audits; use only when the system is actually operating under this exception. |
| **§99.31(a)(1)(i) — “school officials” / legitimate educational interest** | Institutions may disclose to school officials with legitimate educational interest in the information. | **Primary hook** for internal analytics: staff may access student data only when their role and task justify it — motivates **RBAC**, access logging, and least-privilege API design. |
| **§99.31(a)(1)(i)(B) — contractors / “school officials” vendors** | Vendors performing institutional services may receive disclosures only under direct control and consistent use/re-disclosure rules. | **Primary hook** for **cloud LLM APIs**, hosted analytics, or third-party data hosts: the institution remains responsible for whether the disclosure is permitted and properly constrained. |

---

## §99.32 — Recordkeeping and transparency to parents/students

| Anchor | What it covers | Typical audit use |
|--------|----------------|-------------------|
| **§99.32 — record of requests and disclosures** | Institutions must maintain a record of certain disclosures (with defined exceptions). | **Note**/**Warning** when read paths touch sensitive tables but no auditable trail exists (future linkage to institutional audit-log requirements). |

---

## §99.33 — Limits on redisclosure

| Anchor | What it covers | Typical audit use |
|--------|----------------|-------------------|
| **§99.33 — redisclosure rules** | Third parties receiving education records generally may not redisclose except under specific circumstances. | Explaining risk when data is sent to vendors, partner hosts, or embedded in client-side logs that leave the institution’s control. |

---

## §99.35 — Disclosure for research / statistical purposes

| Anchor | What it covers | Typical audit use |
|--------|----------------|-------------------|
| **§99.35 — de-identified / statistical disclosures** | Additional conditions when disclosing for research or statistical purposes. | Supporting findings on **small-cell suppression**, aggregation, and k-anonymity-style thresholds so subgroup statistics cannot identify individuals. |

---

## §99.37 — Directory information

| Anchor | What it covers | Typical audit use |
|--------|----------------|-------------------|
| **§99.37 — directory information disclosures** | Conditions under which directory information may be released without consent. | Use when reviewing whether a field is truly directory information before treating it as low sensitivity. |

---

## How to cite in a finding

1. Pick **one** primary anchor (e.g. `§99.31(a)(1)(i) — legitimate educational interest`).
2. Add a **short** plain-English “why” tying the technical fact to that hook (vendor disclosure, missing access control, identifier in export, etc.).
3. Do **not** stack unrelated sections; add a second citation only when two distinct legal bases are genuinely implicated.
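
The three steps above could be enforced by a small formatter. This is an illustrative sketch only: `format_citation` and the rendered wording are hypothetical and are not part of the audit scripts.

```python
from typing import Optional


def format_citation(primary: str, why: str, secondary: Optional[str] = None) -> str:
    """Render one primary anchor, a short 'why', and at most one secondary anchor."""
    if not primary.startswith("§99."):
        raise ValueError("primary citation must be a 34 CFR Part 99 anchor")
    text = f"**Citation:** {primary}. {why.strip()}"
    if secondary:
        # Only for a second, genuinely distinct legal basis.
        text += f" Also implicated: {secondary}."
    return text
```

Accepting a single optional `secondary` argument, rather than a list, makes stacking unrelated sections impossible by construction.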