
Commit 3155c2c

Nick Vaccarello authored and committed
fix(train): use gradient descent in v2 updates; docs: note sign convention in Appendix A
1 parent c50af4a commit 3155c2c

File tree

2 files changed: +17 −8 lines


foundational_brain/BEHIND_THE_SCENES.md

Lines changed: 11 additions & 2 deletions
@@ -186,23 +186,29 @@ This document lives in `foundational_brain/BEHIND_THE_SCENES.md` and explains th
## Appendix A (v0.2 pipeline specifics)

### A1. JSONL → vector mapping

- Each record has a free‑text symptom map (Name → Severity 0–10) and a `label_name`.
- We map names to fixed symptom IDs and build x = [presence; severity], where presence_i = 1 if severity_i > 0 else 0 and the 0–10 severities are scaled so severity_i ∈ [0, 1].
- The label is mapped to a class index and then to a one‑hot y.
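
The mapping above can be sketched as follows; the symptom vocabulary, class names, and record shape are illustrative assumptions, not the project's actual data.

```python
# Hypothetical vocabularies; the real pipeline defines its own fixed IDs.
SYMPTOMS = ["cough", "fever", "dysuria", "frequency"]
CLASSES = ["respiratory_infection", "uti"]

def record_to_xy(record):
    """Map {'symptoms': {name: severity 0-10}, 'label_name': str}
    to x = [presence; severity] and a one-hot y."""
    sev = [record["symptoms"].get(name, 0) / 10.0 for name in SYMPTOMS]
    presence = [1.0 if s > 0 else 0.0 for s in sev]
    x = presence + sev                      # concatenation: [presence; severity]
    y = [0.0] * len(CLASSES)
    y[CLASSES.index(record["label_name"])] = 1.0
    return x, y

rec = {"symptoms": {"cough": 7, "fever": 4}, "label_name": "respiratory_infection"}
x, y = record_to_xy(rec)
```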

### A2. Class balance and explicit negatives

- Balanced per‑class counts (or class weights) suppress prior skew.
- Explicit negatives encode “absence of key symptoms” (e.g., dysuria=0, frequency=0 in respiratory cases), teaching the model to use strong negative evidence.
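
One common way to realize "class weights" is inverse frequency; this is a sketch with made-up labels, not the project's weighting scheme.

```python
from collections import Counter

# Toy label list; rare classes receive proportionally larger weights.
labels = ["uti", "uti", "uti", "respiratory_infection"]
counts = Counter(labels)
n, k = len(labels), len(counts)
weights = {c: n / (k * counts[c]) for c in counts}  # inverse-frequency weights
```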

### A3. Training objective (softmax + cross‑entropy)

- Same equations as the main text; we optimize the NLL with SGD.
- A validation split is used for early stopping; the best epoch is selected by lowest validation loss.

Implementation note: v2 applies gradient descent (subtracting gradients) in both layers; the foundational demo used MSE with a toy additive update.
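
A minimal sketch of the objective and the sign convention the note describes: for softmax + cross‑entropy, the gradient of the NLL with respect to the logits is (p − y), and gradient descent subtracts it. The logits here are invented for illustration.

```python
import math

def softmax(z):
    m = max(z)                       # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def nll(z, y):
    p = softmax(z)
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

z = [2.0, 0.0, -1.0]                 # illustrative logits
y = [1.0, 0.0, 0.0]                  # one-hot target
p = softmax(z)
grad = [pi - yi for pi, yi in zip(p, y)]            # dNLL/dz = p - y
lr = 0.5
z_new = [zi - lr * gi for zi, gi in zip(z, grad)]   # descent: subtract the gradient
```

One step in the negative-gradient direction lowers the NLL, which is exactly the behavior the `+=` → `-=` fix restores.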

### A4. Probability calibration (temperature scaling)

- Pick T\* on the validation set by minimizing NLL(softmax(z/T)).
- At inference, ŷ = softmax(z/T\*). This improves reliability (confidence ≈ accuracy).
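
The T\* selection can be sketched as a simple grid search; the validation pairs (logits, label) below are fabricated, overconfident examples, not the pipeline's data.

```python
import math

def nll_at_T(val, T):
    """Mean NLL of softmax(z / T) over (logits, label) pairs."""
    total = 0.0
    for z, label in val:
        m = max(z)
        e = [math.exp((v - m) / T) for v in z]
        total -= math.log(e[label] / sum(e))
    return total / len(val)

# Made-up validation set with overconfident logits (one label is wrong).
val = [([4.0, 0.0], 0), ([3.5, 0.0], 1), ([5.0, 1.0], 0)]
grid = [0.5 + 0.1 * i for i in range(56)]     # T in [0.5, 6.0]
T_star = min(grid, key=lambda T: nll_at_T(val, T))
```

Because the toy logits are overconfident, the selected temperature is greater than 1, i.e. the calibrated probabilities are softened.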

### A5. Expected Information Gain (EIG) in adaptive questioning

- Start from the current posterior P(d) (after clinical rules).
- For a candidate symptom s, approximate P(yes|d) from disease–symptom frequencies.
- P(yes) = Σ_d P(d) P(yes|d); P(no) = 1 − P(yes).
- Hypothetical posteriors come from a Bayes update: P(d|yes) ∝ P(d) P(yes|d), and analogously for “no”.
- EIG(s) = H(P(d)) − [P(yes) H(P(d|yes)) + P(no) H(P(d|no))]. We ask the s with the highest EIG.
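
The steps above can be sketched on a two-disease toy posterior; the numbers are invented, not clinical frequencies.

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def eig(posterior, p_yes_given_d):
    """EIG(s) = H(P) - [P(yes) H(P(d|yes)) + P(no) H(P(d|no))]."""
    p_yes = sum(pd * py for pd, py in zip(posterior, p_yes_given_d))
    p_no = 1.0 - p_yes
    post_yes = [pd * py / p_yes for pd, py in zip(posterior, p_yes_given_d)]
    post_no = [pd * (1 - py) / p_no for pd, py in zip(posterior, p_yes_given_d)]
    return entropy(posterior) - (p_yes * entropy(post_yes) + p_no * entropy(post_no))

posterior = [0.5, 0.5]                      # two candidate diseases, undecided
informative = eig(posterior, [0.9, 0.1])    # answer strongly separates diseases
uninformative = eig(posterior, [0.5, 0.5])  # answer says nothing
```

A symptom whose answer splits the diseases yields positive EIG; one with identical likelihoods under every disease yields zero, so it is never asked.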

### A6. Evidence‑aware stop and triage

- We stop only if (a) the top‑1 probability ≥ threshold and (b) minimal supporting evidence exists (e.g., at least one GU key symptom for UTI).
- The first question is selected from a small triage set (respiratory vs. GU vs. GI discriminators) to reduce early ambiguity.
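
A minimal sketch of the two-part stop rule; the threshold value and the per-disease "key evidence" sets are placeholders, not the project's configuration.

```python
# Hypothetical key-evidence map: diseases requiring at least one
# observed key symptom before we accept a confident prediction.
KEY_EVIDENCE = {"uti": {"dysuria", "frequency"}}

def should_stop(top_disease, top_prob, observed_positive, threshold=0.85):
    confident = top_prob >= threshold                 # condition (a)
    keys = KEY_EVIDENCE.get(top_disease, set())
    supported = not keys or bool(keys & observed_positive)  # condition (b)
    return confident and supported
```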

### A7. Quick metrics

- The confusion matrix and ECE bins provide a fast snapshot of class separation and calibration.
- ECE = Σ_bins (n_bin/N) |mean_conf − mean_acc|.
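
The ECE formula above can be sketched directly with equal-width confidence bins; the (confidence, correct) pairs in the test are invented.

```python
def ece(preds, n_bins=10):
    """Expected Calibration Error over (confidence, correct 0/1) pairs."""
    bins = [[] for _ in range(n_bins)]
    for conf, correct in preds:
        idx = min(int(conf * n_bins), n_bins - 1)   # conf == 1.0 -> last bin
        bins[idx].append((conf, correct))
    total = len(preds)
    out = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        mean_acc = sum(a for _, a in b) / len(b)
        out += (len(b) / total) * abs(mean_conf - mean_acc)
    return out
```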

### A8. Where to look

- Data gen: `medical_diagnosis_model/data/generate_v02.py`
- Train from JSONL: `medical_diagnosis_model/versions/v2/medical_neural_network_v2.py`
- Pipeline + metrics: `medical_diagnosis_model/tools/train_pipeline.py`

medical_diagnosis_model/versions/v2/medical_neural_network_v2.py

Lines changed: 6 additions & 6 deletions
@@ -307,17 +307,17 @@ def _backward_softmax_ce(self, network, inputs, hidden_outputs, probs, expected_
             neuron['delta'] = error * d
             hidden_deltas.append(neuron['delta'])

-        # Update weights for output layer
+        # Update weights for output layer (gradient descent)
         for i, neuron in enumerate(output_layer):
             for j in range(len(hidden_outputs)):
-                neuron['weights'][j] += self.learning_rate * neuron['delta'] * hidden_outputs[j]
-            neuron['weights'][-1] += self.learning_rate * neuron['delta']
+                neuron['weights'][j] -= self.learning_rate * neuron['delta'] * hidden_outputs[j]
+            neuron['weights'][-1] -= self.learning_rate * neuron['delta']

-        # Update weights for hidden layer
+        # Update weights for hidden layer (gradient descent)
         for j, neuron in enumerate(hidden_layer):
             for k in range(len(inputs)):
-                neuron['weights'][k] += self.learning_rate * neuron['delta'] * inputs[k]
-            neuron['weights'][-1] += self.learning_rate * neuron['delta']
+                neuron['weights'][k] -= self.learning_rate * neuron['delta'] * inputs[k]
+            neuron['weights'][-1] -= self.learning_rate * neuron['delta']

     def _train_softmax_cross_entropy(self, network, train_set, val_set, verbose=True):
         best_val_nll = float('inf')
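
The sign flip can be sanity-checked numerically on a toy one-weight squared-error model (unrelated to the real network, purely illustrative): with delta defined as dLoss/d(parameter), subtracting lr·delta decreases the loss, while the old additive update increases it.

```python
def loss(w, x, y):
    """Squared error of a one-weight linear model."""
    return 0.5 * (w * x - y) ** 2

w, x, y, lr = 0.0, 1.0, 1.0, 0.1
grad = (w * x - y) * x          # dLoss/dw
w_desc = w - lr * grad          # gradient descent (the fix)
w_asc = w + lr * grad           # the old buggy additive sign
```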
