
Commit 64b2782

Nick Vaccarello authored and committed
docs(next-steps): update splits CLI snippet, outputs; mark splits task done
1 parent 45424f4 commit 64b2782

3 files changed (+27 / -5 lines)


medical_diagnosis_model/NEXT_STEPS.md

Lines changed: 4 additions & 4 deletions
@@ -29,8 +29,8 @@ Context: macOS 14, Python 3.11 venv ./venv, repo root PythonNeuralNet
 Constraints: non-interactive commands; add unit tests; idempotent CLI
 Acceptance:
 - backend/data/splitter.py with patient/time split; no-leakage tests
-- CLI: python -m backend.tools.split --input data/clean/ --out data/splits/
-- Outputs include class distribution report (per split)
+- CLI: PYTHONPATH=. python3 -m medical_diagnosis_model.backend.tools.split --input medical_diagnosis_model/data/v02/cases_v02.jsonl --out medical_diagnosis_model/data/splits/v02 --strategy patient_time
+- Outputs: {train,val,test}.jsonl + summary.json (per-split distributions, class weights)
 ```
 
 ```text
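The acceptance criteria in this hunk call for no-leakage tests. A minimal sketch of such a check, assuming each JSONL row carries a patient identifier under a hypothetical `patient_id` key (the real key name is not shown in this commit): it loads `{train,val,test}.jsonl` from a split directory and reports any patient that appears in more than one split. `read_jsonl`, `find_patient_overlap`, and `check_split_dir` are illustrative names, not functions from the repo.

```python
import json
from itertools import combinations
from pathlib import Path


def read_jsonl(path):
    """Read one JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def find_patient_overlap(splits, patient_key="patient_id"):
    """Return {(split_a, split_b): shared_ids} for every pair of splits that share a patient.

    patient_key is an assumed field name; adjust it to the dataset's schema.
    """
    ids = {name: {row[patient_key] for row in rows} for name, rows in splits.items()}
    overlaps = {}
    for a, b in combinations(sorted(ids), 2):
        shared = ids[a] & ids[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


def check_split_dir(split_dir, patient_key="patient_id"):
    """Load train/val/test.jsonl from split_dir and assert that no patient leaks across splits."""
    split_dir = Path(split_dir)
    splits = {n: read_jsonl(split_dir / f"{n}.jsonl") for n in ("train", "val", "test")}
    overlaps = find_patient_overlap(splits, patient_key)
    assert not overlaps, f"patient leakage between splits: {overlaps}"
```

For example, a test could call `check_split_dir("medical_diagnosis_model/data/splits/v02")` after running the CLI; it raises `AssertionError` naming the overlapping split pairs and IDs.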
@@ -123,7 +123,7 @@ Acceptance:
 - `patient_time_split(rows, patient_key, time_key, label_key, ratios, seed)` (group by patient; sort by time; hold out last k% patients/time windows)
 - `compute_class_weights(rows, label_key) -> dict[label, weight]` (inverse frequency)
 - `report_distribution(rows, label_key) -> dict[label, count]`
-- Write outputs to `medical_diagnosis_model/data/splits/v02/{train,val,test}.jsonl` and `class_weights.json` + `distribution.json` for each split.
+- Write outputs to `medical_diagnosis_model/data/splits/v02/{train,val,test}.jsonl` and a single `summary.json` (counts, per-split distributions, class weights).
 - CLI:
 - Add `medical_diagnosis_model/backend/tools/split.py` with args:
 - `--input <jsonl>` (default: `data/v02/cases_v02.jsonl`)
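The splitter API listed in this hunk could look roughly like the following sketch. It is an illustration under assumptions, not the repo's `backend/data/splitter.py`: patients are ordered by their latest timestamp (assumed to be ISO-8601 strings, which sort chronologically), the earliest patients go to train and the most recent to test, `seed` only breaks ties between equal timestamps, and label stratification is not implemented (`label_key` is accepted only to mirror the documented signature). `build_summary` and `write_outputs` are hypothetical helpers, and the `summary.json` field names are placeholders.

```python
import json
import random
from collections import Counter, defaultdict
from pathlib import Path


def patient_time_split(rows, patient_key, time_key, label_key, ratios=(0.7, 0.15, 0.15), seed=0):
    """Split rows into train/val/test so that no patient appears in more than one split.

    Patients are ordered by their latest timestamp; the earliest fraction goes to
    train, the next to val, and the remainder (most recent) to test.
    label_key is unused: this sketch does not stratify by label.
    """
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[row[patient_key]].append(row)

    patients = list(by_patient)
    random.Random(seed).shuffle(patients)  # seeded order for patients with equal timestamps
    patients.sort(key=lambda p: max(r[time_key] for r in by_patient[p]))  # stable sort

    n = len(patients)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],  # ratios[2] is implied by the remainder
    }
    return {
        name: sorted((r for p in pats for r in by_patient[p]), key=lambda r: r[time_key])
        for name, pats in groups.items()
    }


def compute_class_weights(rows, label_key):
    """Inverse-frequency weights, scaled so a perfectly balanced set gets 1.0 per class."""
    counts = Counter(r[label_key] for r in rows)
    total, k = sum(counts.values()), len(counts)
    return {label: total / (k * c) for label, c in counts.items()}


def report_distribution(rows, label_key):
    """Count rows per label."""
    return dict(Counter(r[label_key] for r in rows))


def build_summary(splits, label_key):
    """Assemble a summary.json payload (hypothetical field names)."""
    return {
        "counts": {name: len(rows) for name, rows in splits.items()},
        "distributions": {name: report_distribution(rows, label_key) for name, rows in splits.items()},
        "class_weights": compute_class_weights(splits["train"], label_key),
    }


def write_outputs(splits, out_dir, label_key):
    """Write {train,val,test}.jsonl and summary.json, overwriting previous runs."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        with open(out / f"{name}.jsonl", "w", encoding="utf-8") as fh:
            for row in rows:
                fh.write(json.dumps(row) + "\n")
    summary = build_summary(splits, label_key)
    (out / "summary.json").write_text(json.dumps(summary, indent=2), encoding="utf-8")
```

In this sketch class weights come from the train split only, so val/test label counts cannot influence training; that is a design choice of the sketch, not something the source states. Overwriting every output file on each run keeps repeated invocations consistent with the "idempotent CLI" constraint.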
@@ -598,7 +598,7 @@ medical_diagnosis_model/
 2. Regenerate training data v0.2 (balanced counts or class weights; explicit negative GU for respiratory and vice‑versa; URI patterns; mild/early/atypical). Retrain + recalibrate; update DATA_CARD.
 3. Write `docs/label_policy.md`; wire gold labels (confirmed vs presumptive) into dataset.
 4. Build PHI‑safe ingestion CLI: de‑identify, normalize units, audit logs → `data/clean/`.
-5. Implement patient‑ and time‑based splits with stratification; add class weighting.
+5. [x] Implement patient‑ and time‑based splits with stratification; add class weighting.
 6. Add training toggles (Adam, L2, dropout) and fixed seeds via `configs/training.yaml`.
 7. Add metrics module (AUROC/AUPRC/F1/Confusion) and reliability diagram + ECE.
 8. Expand rules: Centor + CURB‑65; add “need more info” if entropy/confidence threshold.
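Item 8's "need more info" fallback can be sketched as an abstention rule on the model's output distribution: if the distribution is too flat (high entropy) or the top probability is too low, the system declines to name a primary diagnosis. This assumes the model returns a `{label: probability}` dict; `entropy`, `primary_or_need_more_info`, and both threshold defaults are hypothetical placeholders, not values from this repo.

```python
import math


def entropy(probs):
    """Shannon entropy in nats of a {label: probability} dict; zero-probability entries are skipped."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def primary_or_need_more_info(probs, max_entropy=1.0, min_confidence=0.5):
    """Return the top label, or "need more info" when either uncertainty threshold is crossed.

    The default thresholds are illustrative only and would need calibration.
    """
    label, confidence = max(probs.items(), key=lambda kv: kv[1])
    if entropy(probs) > max_entropy or confidence < min_confidence:
        return "need more info"
    return label
```

Checking both entropy and top-class confidence catches two different failure modes: a flat distribution over many classes, and a close two-way tie that still has low entropy.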
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+{
+  "timestamp": "2025-08-27T22:31:42.018054Z",
+  "config": {
+    "with_api": true,
+    "with_export": true,
+    "with_rate": true,
+    "with_adaptive": true
+  },
+  "statuses": {
+    "data": true,
+    "tests": true,
+    "api": true,
+    "export": true,
+    "rate": true,
+    "adaptive": true
+  }
+}
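This new file records a run timestamp, which optional components were enabled (`config`), and what appear to be per-component pass/fail flags (`statuses`); its path is not shown in this view. A minimal sketch of consuming a report with this shape, for example to fail a CI step when any status is not `true`. `load_report` and `failed_components` are hypothetical names, not part of the repo.

```python
import json


def load_report(path):
    """Load a status report JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def failed_components(report):
    """Return the sorted names of components whose status is not exactly True."""
    statuses = report.get("statuses", {})
    return sorted(name for name, ok in statuses.items() if ok is not True)
```

A CI step could then run something like `sys.exit(1 if failed_components(load_report(path)) else 0)`.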

medical_diagnosis_model/tests/test_api_phase1.py

Lines changed: 6 additions & 1 deletion
@@ -1,4 +1,5 @@
 import json
+import os
 from pathlib import Path
 from fastapi.testclient import TestClient
 
@@ -19,7 +20,11 @@ def test_phase1_expected_primaries():
         name = case["name"]
         symptoms = case["symptoms"]
         expected = case.get("expected_primary")
-        resp = client.post("/api/v2/diagnose", json={"data": symptoms})
+        headers = {}
+        api_key = os.environ.get("MDM_API_KEY")
+        if api_key:
+            headers["X-API-Key"] = api_key
+        resp = client.post("/api/v2/diagnose", json={"data": symptoms}, headers=headers)
         assert resp.status_code == 200
         data = resp.json()
         primary = (data.get("primary_diagnosis") or {}).get("name")
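The test change sends `X-API-Key` only when `MDM_API_KEY` is set (and non-empty), so the same test can run with or without an API key configured. If other API tests need the same behavior, the logic could be factored into a small helper like the following sketch; `auth_headers` is a hypothetical name, not part of the repo.

```python
import os


def auth_headers(env_var="MDM_API_KEY"):
    """Build request headers, adding X-API-Key only when the env var is set and non-empty."""
    api_key = os.environ.get(env_var)
    return {"X-API-Key": api_key} if api_key else {}
```

Usage in the test would then be `client.post("/api/v2/diagnose", json={"data": symptoms}, headers=auth_headers())`, which behaves the same as the inline version in the diff.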
