<!DOCTYPE html>
<html lang="en"><head>
<script src="slides_files/libs/clipboard/clipboard.min.js"></script>
<script src="slides_files/libs/quarto-html/tabby.min.js"></script>
<script src="slides_files/libs/quarto-html/popper.min.js"></script>
<script src="slides_files/libs/quarto-html/tippy.umd.min.js"></script>
<link href="slides_files/libs/quarto-html/tippy.css" rel="stylesheet">
<link href="slides_files/libs/quarto-html/light-border.css" rel="stylesheet">
<link href="slides_files/libs/quarto-html/quarto-html.min.css" rel="stylesheet" data-mode="light">
<link href="slides_files/libs/quarto-html/quarto-syntax-highlighting.css" rel="stylesheet" id="quarto-text-highlighting-styles"><meta charset="utf-8">
<meta name="generator" content="quarto-1.3.361">
<meta name="author" content="Elena Boiko, Jacqueline Razo (Advisor: Dr. Cohen)">
<meta name="dcterms.date" content="2025-04-29">
<title>Diagnosing Diseases Using kNN</title>
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="slides_files/libs/revealjs/dist/reset.css">
<link rel="stylesheet" href="slides_files/libs/revealjs/dist/reveal.css">
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
div.columns{display: flex; gap: min(4vw, 1.5em);}
div.column{flex: auto; overflow-x: auto;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
ul.task-list li input[type="checkbox"] {
width: 0.8em;
margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */
vertical-align: middle;
}
/* CSS for citations */
div.csl-bib-body { }
div.csl-entry {
clear: both;
}
.hanging-indent div.csl-entry {
margin-left:2em;
text-indent:-2em;
}
div.csl-left-margin {
min-width:2em;
float:left;
}
div.csl-right-inline {
margin-left:2em;
padding-left:1em;
}
div.csl-indent {
margin-left: 2em;
} </style>
<link rel="stylesheet" href="slides_files/libs/revealjs/dist/theme/quarto.css">
<link href="slides_files/libs/revealjs/plugin/quarto-line-highlight/line-highlight.css" rel="stylesheet">
<link href="slides_files/libs/revealjs/plugin/reveal-menu/menu.css" rel="stylesheet">
<link href="slides_files/libs/revealjs/plugin/reveal-menu/quarto-menu.css" rel="stylesheet">
<link href="slides_files/libs/revealjs/plugin/quarto-support/footer.css" rel="stylesheet">
<style type="text/css">
.callout {
margin-top: 1em;
margin-bottom: 1em;
border-radius: .25rem;
}
.callout.callout-style-simple {
padding: 0em 0.5em;
border-left: solid #acacac .3rem;
border-right: solid 1px silver;
border-top: solid 1px silver;
border-bottom: solid 1px silver;
display: flex;
}
.callout.callout-style-default {
border-left: solid #acacac .3rem;
border-right: solid 1px silver;
border-top: solid 1px silver;
border-bottom: solid 1px silver;
}
.callout .callout-body-container {
flex-grow: 1;
}
.callout.callout-style-simple .callout-body {
font-size: 1rem;
font-weight: 400;
}
.callout.callout-style-default .callout-body {
font-size: 0.9rem;
font-weight: 400;
}
.callout.callout-titled.callout-style-simple .callout-body {
margin-top: 0.2em;
}
.callout:not(.callout-titled) .callout-body {
display: flex;
}
.callout:not(.no-icon).callout-titled.callout-style-simple .callout-content {
padding-left: 1.6em;
}
.callout.callout-titled .callout-header {
padding-top: 0.2em;
margin-bottom: -0.2em;
}
.callout.callout-titled .callout-title p {
margin-top: 0.5em;
margin-bottom: 0.5em;
}
.callout.callout-titled.callout-style-simple .callout-content p {
margin-top: 0;
}
.callout.callout-titled.callout-style-default .callout-content p {
margin-top: 0.7em;
}
.callout.callout-style-simple div.callout-title {
border-bottom: none;
font-size: .9rem;
font-weight: 600;
opacity: 75%;
}
.callout.callout-style-default div.callout-title {
border-bottom: none;
font-weight: 600;
opacity: 85%;
font-size: 0.9rem;
padding-left: 0.5em;
padding-right: 0.5em;
}
.callout.callout-style-default div.callout-content {
padding-left: 0.5em;
padding-right: 0.5em;
}
.callout.callout-style-simple .callout-icon::before {
height: 1rem;
width: 1rem;
display: inline-block;
content: "";
background-repeat: no-repeat;
background-size: 1rem 1rem;
}
.callout.callout-style-default .callout-icon::before {
height: 0.9rem;
width: 0.9rem;
display: inline-block;
content: "";
background-repeat: no-repeat;
background-size: 0.9rem 0.9rem;
}
.callout-title {
display: flex
}
.callout-icon::before {
margin-top: 1rem;
padding-right: .5rem;
}
.callout.no-icon::before {
display: none !important;
}
.callout.callout-titled .callout-body > .callout-content > :last-child {
margin-bottom: 0.5rem;
}
.callout.callout-titled .callout-icon::before {
margin-top: .5rem;
padding-right: .5rem;
}
.callout:not(.callout-titled) .callout-icon::before {
margin-top: 1rem;
padding-right: .5rem;
}
/* Callout Types */
div.callout-note {
border-left-color: #4582ec !important;
}
div.callout-note .callout-icon::before {
background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAAAXNSR0IArs4c6QAAAERlWElmTU0AKgAAAAgAAYdpAAQAAAABAAAAGgAAAAAAA6ABAAMAAAABAAEAAKACAAQAAAABAAAAIKADAAQAAAABAAAAIAAAAACshmLzAAAEU0lEQVRYCcVXTWhcVRQ+586kSUMMxkyaElstCto2SIhitS5Ek8xUKV2poatCcVHtUlFQk8mbaaziwpWgglJwVaquitBOfhQXFlqlzSJpFSpIYyXNjBNiTCck7x2/8/LeNDOZxDuEkgOXe++553zfefee+/OYLOXFk3+1LLrRdiO81yNqZ6K9cG0P3MeFaMIQjXssE8Z1JzLO9ls20MBZX7oG8w9GxB0goaPrW5aNMp1yOZIa7Wv6o2ykpLtmAPs/vrG14Z+6d4jpbSKuhdcSyq9wGMPXjonwmESXrriLzFGOdDBLB8Y6MNYBu0dRokSygMA/mrun8MGFN3behm6VVAwg4WR3i6FvYK1T7MHo9BK7ydH+1uurECoouk5MPRyVSBrBHMYwVobG2aOXM07sWrn5qgB60rc6mcwIDJtQrnrEr44kmy+UO9r0u9O5/YbkS9juQckLed3DyW2XV/qWBBB3ptvI8EUY3I9p/67OW+g967TNr3Sotn3IuVlfMLVnsBwH4fsnebJvyGm5GeIUA3jljERmrv49SizPYuq+z7c2H/jlGC+Ghhupn/hcapqmcudB9jwJ/3jvnvu6vu5lVzF1fXyZuZZ7U8nRmVzytvT+H3kilYvH09mLWrQdwFSsFEsxFVs5fK7A0g8gMZjbif4ACpKbjv7gNGaD8bUrlk8x+KRflttr22JEMRUbTUwwDQScyzPgedQHZT0xnx7ujw2jfVfExwYHwOsDTjLdJ2ebmeQIlJ7neo41s/DrsL3kl+W2lWvAga0tR3zueGr6GL78M3ifH0rGXrBC2aAR8uYcIA5gwV8zIE8onoh8u0Fca/ciF7j1uOzEnqcIm59sEXoGc0+z6+H45V1CvAvHcD7THztu669cnp+L0okAeIc6zjbM/24LgGM1gZk7jnRu1aQWoU9sfUOuhrmtaPIO3YY1KLLWZaEO5TKUbMY5zx8W9UJ6elpLwKXbsaZ4EFl7B4bMtDv0iRipKoDQT2sNQI9b1utXFdYisi+wzZ/ri/1m7QfDgEuvgUUEIJPq3DhX/5DWNqIXDOweC2wvIR90Oq3lDpdMIgD2r0dXvGdsEW5H6x6HLRJYU7C69VefO1x8Gde1ZFSJLfWS1jbCnhtOPxmpfv2LXOA2Xk2tvnwKKPFuZ/oRmwBwqRQDcKNeVQkYcOjtWVBuM/JuYw5b6isojIkYxyYAFn5K7ZBF10fea52y8QltAg6jnMqNHFBmGkQ1j+U43HMi2xMar1Nv0zGsf1s8nUsmUtPOOrbFIR8bHFDMB5zL13Gmr/kGlCkUzedTzzmzsaJXhYawnA3UmARpiYj5ooJZiUoxFRtK3X6pgNPv+IZVPcnwbOl6f+aBaO1CNvPW9n9LmCp01nuSaTRF2YxHqZ8DYQT6WsXT+RD6eUztwYLZ8rM+rcPxamv1VQzFUkzFXvkiVrySGQgJNvXHJAxiU3/NwiC03rSf05VBaPtu/Z7/B8Yn/w7eguloAAAAAElFTkSuQmCC');
}
div.callout-note.callout-style-default .callout-title {
background-color: #dae6fb
}
div.callout-important {
border-left-color: #d9534f !important;
}
div.callout-important .callout-icon::before {
background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAAAXNSR0IArs4c6QAAAERlWElmTU0AKgAAAAgAAYdpAAQAAAABAAAAGgAAAAAAA6ABAAMAAAABAAEAAKACAAQAAAABAAAAIKADAAQAAAABAAAAIAAAAACshmLzAAAEKklEQVRYCcVXTWhcVRS+575MJym48A+hSRFr00ySRQhURRfd2HYjk2SSTokuBCkU2o0LoSKKraKIBTcuFCoidGFD08nkBzdREbpQ1EDNIv8qSGMFUboImMSZd4/f9zJv8ibJMC8xJQfO3HPPPef7zrvvvnvviIkpC9nsw0UttFunbUhpFzFtarSd6WJkStVMw5xyVqYTvkwfzuf/5FgtkVoB0729j1rjXwThS7Vio+Mo6DNnvLfahoZ+i/o32lULuJ3NNiz7q6+pyAUkJaFF6JwaM2lUJlV0MlnQn5aTRbEu0SEqHUa0A4AdiGuB1kFXRfVyg5d87+Dg4DL6m2TLAub60ilj7A1Ec4odSAc8X95sHh7+ZRPCFo6Fnp7HfU/fBng/hi10CjCnWnJjsxvDNxWw0NfV6Rv5GgP3I3jGWXumdTD/3cbEOP2ZbOZp69yniG3FQ9z1jD7bnBu9Fc2tKGC2q+uAJOQHBDRiZX1x36o7fWBs7J9ownbtO+n0/qWkvW7UPIfc37WgT6ZGR++EOJyeQDSb9UB+DZ1G6DdLDzyS+b/kBCYGsYgJbSQHuThGKRcw5xdeQf8YdNHsc6ePXrlSYMBuSIAFTGAtQo+VuALo4BX83N190NWZWbynBjhOHsmNfFWLeL6v+ynsA58zDvvAC8j5PkbOcXCMg2PZFk3q8MjI7WAG/Dp9AwP7jdGBOOQkAvlFUB+irtm16I1Zw9YBcpGTGXYmk3kQIC/Cds55l+iMI3jqhjAuaoe+am2Jw5GT3Nbz3CkE12NavmzN5+erJW7046n/CH1RO/RVa8lBLozXk9uqykkGAyRXLWlLv5jyp4RFsG5vGVzpDLnIjTWgnRy2Rr+tDKvRc7Y8AyZq10jj8DqXdnIRNtFZb+t/ZRtXcDiVnzpqx8mPcDWxgARUqx0W1QB9MeUZiNrV4qP+Ehc+BpNgATsTX8ozYKL2NtFYAHc84fG7ndxUPr+AR/iQSns7uSUufAymwDOb2+NjK27lEFocm/EE2WpyIy/Hi66MWuMKJn8RvxIcj87IM5Vh9663ziW36kR0HNenXuxmfaD8JC7tfKbrhFr7LiZCrMjrzTeGx+PmkosrkNzW94ObzwocJ7A1HokLolY+AvkTiD/q1H0cN48c5EL8Crkttsa/AXQVDmutfyku0E7jShx49XqV3MFK8IryDhYVbj7Sj2P2eBxwcXoe8T8idsKKPRcnZw1b+slFTubwUwhktrfnAt7J++jwQtLZcm3sr9LQrjRzz6cfMv9aLvgmnAGvpoaGLxM4mAEaLV7iAzQ3oU0IvD5x9ix3yF2RAAuYAOO2f7PEFWCXZ4C9Pb2UsgDeVnFSpbFK7/IWu7TPTvBqzbGdCHOJQSxiEjt6IyZmxQyEJHv6xyQsYk//moVFsN2zP6fRImjfq7/n/wFDguUQFNEwugAAAABJRU5ErkJggg==');
}
div.callout-important.callout-style-default .callout-title {
background-color: #f7dddc
}
div.callout-warning {
border-left-color: #f0ad4e !important;
}
div.callout-warning .callout-icon::before {
background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAAAXNSR0IArs4c6QAAAERlWElmTU0AKgAAAAgAAYdpAAQAAAABAAAAGgAAAAAAA6ABAAMAAAABAAEAAKACAAQAAAABAAAAIKADAAQAAAABAAAAIAAAAACshmLzAAAETklEQVRYCeVWW2gcVRg+58yaTUnizqbipZeX4uWhBEniBaoUX1Ioze52t7sRq6APio9V9MEaoWlVsFasRq0gltaAPuxms8lu0gcviE/FFOstVbSIxgcv6SU7EZqmdc7v9+9mJtNks51NTUH84ed889/PP+cmxP+d5FIbMJmNbpREu4WUkiTtCicKny0l1pIKmBzovF2S+hIJHX8iEu3hZJ5lNZGqyRrGSIQpq15AzF28jgpeY6yk6GVdrfFqdrD6Iw+QlB8g0YS2g7dyQmXM/IDhBhT0UCiRf59lfqmmDvzRt6kByV/m4JjtzuaujMUM2c5Z2d6JdKrRb3K2q6mA+oYVz8JnDdKPmmNthzkAk/lN63sYPgevrguc72aZX/L9C6x09GYyxBgCX4NlvyGUHOKELlm5rXeR1kchuChJt4SSwyddZRXgvwMGvYo4QSlk3/zkHD8UHxwVJA6zjZZqP8v8kK8OWLnIZtLyCAJagYC4rTGW/9Pqj92N/c+LUaAj27movwbi19tk/whRCIE7Q9vyI6yvRpftAKVTdUjOW40X3h5OXsKCdmFcx0xlLJoSuQngnrJe7Kcjm4OMq9FlC7CMmScQANuNvjfP3PjGXDBaUQmbp296S5L4DrpbrHN1T87ZVEZVCzg1FF0Ft+dKrlLukI+/c9ENo+TvlTDbYFvuKPtQ9+l052rXrgKoWkDAFnvh0wTOmYn8R5f4k/jN/fZiCM1tQx9jQQ4ANhqG4hiL0qIFTGViG9DKB7GYzgubnpofgYRwO+DFjh0Zin2m4b/97EDkXkc+f6xYAPX0KK2I/7fUQuwzuwo/L3AkcjugPNixC8cHf0FyPjWlItmLxWw4Ou9YsQCr5fijMGoD/zpdRy95HRysyXA74MWOnscpO4j2y3HAVisw85hX5+AFBRSHt4ShfLFkIMXTqyKFc46xdzQM6XbAi702a7sy04J0+feReMFKp5q9esYLCqAZYw/k14E/xcLLsFElaornTuJB0svMuJINy8xkIYuL+xPAlWRceH6+HX7THJ0djLUom46zREu7tTkxwmf/FdOZ/sh6Q8qvEAiHpm4PJ4a/doJe0gH1t+aHRgCzOvBvJedEK5OFE5jpm4AGP2a8Dxe3gGJ/pAutug9Gp6he92CsSsWBaEcxGx0FHytmIpuqGkOpldqNYQK8cSoXvd+xLxXADw0kf6UkJNFtdo5MOgaLjiQOQHcn+A6h5NuL2s0qsC2LOM75PcF3yr5STuBSAcGG+meA14K/CI21HcS4LBT6tv0QAh8Dr5l93AhZzG5ZJ4VxAqdZUEl9z7WJ4aN+svMvwHHL21UKTd1mqvChH7/Za5xzXBBKrUcB0TQ+Ulgkfbi/H/YT5EptrGzsEK7tR1B7ln9BBwckYfMiuSqklSznIuoIIOM42MQO+QnduCoFCI0bpkzjCjddHPN/F+2Yu+sd9bKNpVwHhbS3LluK/0zgfwD0xYI5dXuzlQAAAABJRU5ErkJggg==');
}
div.callout-warning.callout-style-default .callout-title {
background-color: #fcefdc
}
div.callout-tip {
border-left-color: #02b875 !important;
}
div.callout-tip .callout-icon::before {
background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAAAXNSR0IArs4c6QAAAERlWElmTU0AKgAAAAgAAYdpAAQAAAABAAAAGgAAAAAAA6ABAAMAAAABAAEAAKACAAQAAAABAAAAIKADAAQAAAABAAAAIAAAAACshmLzAAADr0lEQVRYCe1XTWgTQRj9ZjZV8a9SPIkKgj8I1bMHsUWrqYLVg4Ue6v9BwZOxSYsIerFao7UiUryIqJcqgtpimhbBXoSCVxUFe9CTiogUrUp2Pt+3aUI2u5vdNh4dmMzOzHvvezuz8xNFM0mjnbXaNu1MvFWRXkXEyE6aYOYJpdW4IXuA4r0fo8qqSMDBU0v1HJUgVieAXxzCsdE/YJTdFcVIZQNMyhruOMJKXYFoLfIfIvVIMWdsrd+Rpd86ZmyzzjJmLStqRn0v8lzkb4rVIXvnpScOJuAn2ACC65FkPzEdEy4TPWRLJ2h7z4cArXzzaOdKlbOvKKX25Wl00jSnrwVxAg3o4dRxhO13RBSdNvH0xSARv3adTXbBdTf64IWO2vH0LT+cv4GR1DJt+DUItaQogeBX/chhbTBxEiZ6gftlDNXTrvT7co4ub5A6gp9HIcHvzTa46OS5fBeP87Qm0fQkr4FsYgVQ7Qg+ZayaDg9jhg1GkWj8RG6lkeSacrrHgDaxdoBiZPg+NXV/KifMuB6//JmYH4CntVEHy/keA6x4h4CU5oFy8GzrBS18cLJMXcljAKB6INjWsRcuZBWVaS3GDrqB7rdapVIeA+isQ57Eev9eCqzqOa81CY05VLd6SamW2wA2H3SiTbnbSxmzfp7WtKZkqy4mdyAlGx7ennghYf8voqp9cLSgKdqNfa6RdRsAAkPwRuJZNbpByn+RrJi1RXTwdi8RQF6ymDwGMAtZ6TVE+4uoKh+MYkcLsT0Hk8eAienbiGdjJHZTpmNjlbFJNKDVAp2fJlYju6IreQxQ08UJDNYdoLSl6AadO+fFuCQqVMB1NJwPm69T04Wv5WhfcWyfXQB+wXRs1pt+nCknRa0LVzSA/2B+a9+zQJadb7IyyV24YAxKp2Jqs3emZTuNnKxsah+uabKbMk7CbTgJx/zIgQYErIeTKRQ9yD9wxVof5YolPHqaWo7TD6tJlh7jQnK5z2n3+fGdggIOx2kaa2YI9QWarc5Ce1ipNWMKeSG4DysFF52KBmTNMmn5HqCFkwy34rDg05gDwgH3bBi+sgFhN/e8QvRn8kbamCOhgrZ9GJhFDgfcMHzFb6BAtjKpFhzTjwv1KCVuxHvCbsSiEz4CANnj84cwHdFXAbAOJ4LTSAawGWFn5tDhLMYz6nWeU2wJfIhmIJBefcd/A5FWQWGgrWzyORZ3Q6HuV+Jf0Bj+BTX69fm1zWgK7By1YTXchFDORywnfQ7GpzOo6S+qECrsx2ifVQAAAABJRU5ErkJggg==');
}
div.callout-tip.callout-style-default .callout-title {
background-color: #ccf1e3
}
div.callout-caution {
border-left-color: #fd7e14 !important;
}
div.callout-caution .callout-icon::before {
background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAAAXNSR0IArs4c6QAAAERlWElmTU0AKgAAAAgAAYdpAAQAAAABAAAAGgAAAAAAA6ABAAMAAAABAAEAAKACAAQAAAABAAAAIKADAAQAAAABAAAAIAAAAACshmLzAAACV0lEQVRYCdVWzWoUQRCuqp2ICBLJXgITZL1EfQDBW/bkzUMUD7klD+ATSHBEfAIfQO+iXsWDxJsHL96EHAwhgzlkg8nBg25XWb0zIb0zs9muYYWkoKeru+vn664fBqElyZNuyh167NXJ8Ut8McjbmEraKHkd7uAnAFku+VWdb3reSmRV8PKSLfZ0Gjn3a6Xlcq9YGb6tADjn+lUfTXtVmaZ1KwBIvFI11rRXlWlatwIAAv2asaa9mlB9wwygiDX26qaw1yYPzFXg2N1GgG0FMF8Oj+VIx7E/03lHx8UhvYyNZLN7BwSPgekXXLribw7w5/c8EF+DBK5idvDVYtEEwMeYefjjLAdEyQ3M9nfOkgnPTEkYU+sxMq0BxNR6jExrAI31H1rzvLEfRIdgcv1XEdj6QTQAS2wtstEALLG1yEZ3QhH6oDX7ExBSFEkFINXH98NTrme5IOaaA7kIfiu2L8A3qhH9zRbukdCqdsA98TdElyeMe5BI8Rs2xHRIsoTSSVFfCFCWGPn9XHb4cdobRIWABNf0add9jakDjQJpJ1bTXOJXnnRXHRf+dNL1ZV1MBRCXhMbaHqGI1JkKIL7+i8uffuP6wVQAzO7+qVEbF6NbS0LJureYcWXUUhH66nLR5rYmva+2tjRFtojkM2aD76HEGAD3tPtKM309FJg5j/K682ywcWJ3PASCcycH/22u+Bh7Aa0ehM2Fu4z0SAE81HF9RkB21c5bEn4Dzw+/qNOyXr3DCTQDMBOdhi4nAgiFDGCinIa2owCEChUwD8qzd03PG+qdW/4fDzjUMcE1ZpIAAAAASUVORK5CYII=');
}
div.callout-caution.callout-style-default .callout-title {
background-color: #ffe5d0
}
</style>
<style type="text/css">
.reveal div.sourceCode {
margin: 0;
overflow: auto;
}
.reveal div.hanging-indent {
margin-left: 1em;
text-indent: -1em;
}
.reveal .slide:not(.center) {
height: 100%;
overflow-y: auto;
}
.reveal .slide.scrollable {
overflow-y: auto;
}
.reveal .footnotes {
height: 100%;
overflow-y: auto;
}
.reveal .slide .absolute {
position: absolute;
display: block;
}
.reveal .footnotes ol {
counter-reset: ol;
list-style-type: none;
margin-left: 0;
}
.reveal .footnotes ol li:before {
counter-increment: ol;
content: counter(ol) ". ";
}
.reveal .footnotes ol li > p:first-child {
display: inline-block;
}
.reveal .slide ul,
.reveal .slide ol {
margin-bottom: 0.5em;
}
.reveal .slide ul li,
.reveal .slide ol li {
margin-top: 0.4em;
margin-bottom: 0.2em;
}
.reveal .slide ul[role="tablist"] li {
margin-bottom: 0;
}
.reveal .slide ul li > *:first-child,
.reveal .slide ol li > *:first-child {
margin-block-start: 0;
}
.reveal .slide ul li > *:last-child,
.reveal .slide ol li > *:last-child {
margin-block-end: 0;
}
.reveal .slide .columns:nth-child(3) {
margin-block-start: 0.8em;
}
.reveal blockquote {
box-shadow: none;
}
.reveal .tippy-content>* {
margin-top: 0.2em;
margin-bottom: 0.7em;
}
.reveal .tippy-content>*:last-child {
margin-bottom: 0.2em;
}
.reveal .slide > img.stretch.quarto-figure-center,
.reveal .slide > img.r-stretch.quarto-figure-center {
display: block;
margin-left: auto;
margin-right: auto;
}
.reveal .slide > img.stretch.quarto-figure-left,
.reveal .slide > img.r-stretch.quarto-figure-left {
display: block;
margin-left: 0;
margin-right: auto;
}
.reveal .slide > img.stretch.quarto-figure-right,
.reveal .slide > img.r-stretch.quarto-figure-right {
display: block;
margin-left: auto;
margin-right: 0;
}
</style>
</head>
<body class="quarto-light">
<div class="reveal">
<div class="slides">
<section id="title-slide" data-background-position="right" data-background-size="contain" class="quarto-title-block center">
<h1 class="title">Diagnosing Diseases Using kNN</h1>
<p class="subtitle">An Application of kNN to Diagnose Diabetes</p>
<div class="quarto-title-authors">
<div class="quarto-title-author">
<div class="quarto-title-author-name">
Elena Boiko, Jacqueline Razo (Advisor: Dr. Cohen)
</div>
</div>
</div>
<p class="date">2025-04-29</p>
</section>
<section id="introduction" class="slide level2 center">
<h2>Introduction</h2>
<p>In healthcare, kNN has shown promise in predicting chronic diseases like <strong>diabetes</strong> <span class="citation" data-cites="suriya2023type">(<a href="#/references" role="doc-biblioref" onclick="">Suriya and Muthu 2023</a>)</span> and <strong>hypertension</strong> <span class="citation" data-cites="khateeb2017efficient">(<a href="#/references" role="doc-biblioref" onclick="">Khateeb and Usman 2017</a>)</span>.</p>
<p>In this project, we focus on how kNN can be applied and optimized to predict <strong>diabetes</strong>, a critical and growing public health issue.</p>
<aside class="notes">
<p>kNN is a well-known algorithm that’s already been applied in medical research, including disease prediction for conditions like hypertension and diabetes. In this project, we explored how kNN behaves with health data and how we can optimize it to improve predictive accuracy for diabetes.</p>
<style type="text/css">
span.MJX_Assistive_MathML {
position:absolute!important;
clip: rect(1px, 1px, 1px, 1px);
padding: 1px 0 0 0!important;
border: 0!important;
height: 1px!important;
width: 1px!important;
overflow: hidden!important;
display:block!important;
}</style></aside>
</section>
<section id="why-this-matters" class="slide level2 center">
<h2>Why This Matters</h2>
<ul>
<li class="fragment">Diabetes affects millions worldwide. Early detection can improve outcomes.</li>
<li class="fragment">Machine learning, especially interpretable models like <strong>kNN</strong>, can support diagnosis.</li>
<li class="fragment">Our project explores:
<ul>
<li class="fragment">How different <strong>k values</strong>, <strong>distance metrics</strong>, and <strong>preprocessing techniques</strong> affect kNN’s performance.</li>
<li class="fragment">Whether kNN is competitive with other models for this task.</li>
</ul>
</li>
</ul>
<aside class="notes">
<p>Diabetes affects over 37 million people in the U.S., and many don’t know they have it. Early detection is critical to avoid complications. Machine learning can help — especially models like kNN that are easy to understand and implement. We tested how different settings, such as the number of neighbors, distance metrics, and preprocessing techniques impact performance. We also compared kNN with models like decision trees and random forests to see how it holds up.</p>
</aside>
</section>
<section id="why-we-chose-knn-for-this-project" class="slide level2">
<h2>Why We Chose kNN for This Project</h2>
<p><span class="fragment">Well-suited for medical datasets with small to medium size</span></p>
<p><span class="fragment">Easy to interpret — great for health professionals</span></p>
<p><span class="fragment">Flexible with minimal assumptions</span></p>
<p><span class="fragment">Can impute missing data and detect patterns</span></p>
<aside class="notes">
<p>We chose kNN because it’s simple, interpretable, and works well on structured health data like surveys. Unlike more complex models, kNN doesn’t make strong assumptions — it just compares similar cases. That makes it more transparent, which is valuable in healthcare, where decision-making needs to be explainable.</p>
</aside>
</section>
<section id="our-approach" class="slide level2 center">
<h2>Our Approach</h2>
<ul>
<li>Clean and preprocess real-world survey data.</li>
<li>Train kNN models with various configurations.</li>
<li>Evaluate performance and compare with tree-based models.</li>
</ul>
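<p>The workflow above can be sketched with scikit-learn (assumed here; synthetic data stands in for the CDC dataset, so this is an illustrative sketch rather than the project’s actual code):</p>

```python
# Minimal sketch of the approach: preprocess, train kNN with several k values,
# and compare accuracy. Synthetic data stands in for the CDC BRFSS dataset.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

results = {}
for k in (5, 15, 25):
    # Scale first: kNN distances are distorted by features on different scales
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    model.fit(X_tr, y_tr)
    results[k] = accuracy_score(y_te, model.predict(X_te))

print(results)
```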
<aside class="notes">
<p>Our project followed a structured approach. We used the CDC’s diabetes health indicators dataset, which includes over 250,000 survey responses. After cleaning and preparing the data, we trained multiple versions of kNN by changing key parameters like the number of neighbors and the distance metric. We then compared the best kNN’s performance with decision trees and random forests to see how it performed in a real-world healthcare prediction task.</p>
</aside>
</section>
<section id="method-knn-overview" class="slide level2 center middle">
<h2>Method: kNN Overview</h2>
<p><span class="fragment">k-Nearest Neighbors (kNN) is a <strong>non-parametric, instance-based</strong> learning algorithm</span></p>
<p><span class="fragment">It is a <strong>lazy learner</strong> — no explicit training phase is required</span></p>
<p><span class="fragment">Instead, it classifies new data based on similarity to existing labeled points <span class="citation" data-cites="zhang2016introduction">(<a href="#/references" role="doc-biblioref" onclick="">Zhang 2016</a>)</span></span></p>
</section>
<section class="slide level2">
<h3 id="classification-process"><span class="fragment">Classification Process:</span></h3>
<p><span class="fragment"><strong>1. Distance Calculation:</strong><br>
Measures similarity using metrics like <strong>Euclidean</strong> or <strong>Manhattan</strong> distance</span></p>
<p><span class="fragment"><strong>2. Neighbor Selection:</strong><br>
Hyperparameter <strong>k</strong> defines how many nearby points to consider</span></p>
<p><span class="fragment"><strong>3. Majority Voting:</strong><br>
The most frequent class among the <strong>k nearest neighbors</strong> determines the prediction</span></p>
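<p>The three steps above can be sketched in a few lines of Python (a minimal illustration, not the project’s code; the function and variable names are hypothetical):</p>

```python
import math
from collections import Counter

def knn_predict(train, labels, query, k=5):
    # 1. Distance calculation: Euclidean distance from the query to each point
    dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(x, query)))
             for x in train]
    # 2. Neighbor selection: indices of the k closest training points
    nearest = sorted(range(len(train)), key=lambda i: dists[i])[:k]
    # 3. Majority voting: most frequent label among those neighbors
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

train = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = [0, 0, 0, 1, 1, 1]
print(knn_predict(train, labels, (2, 2), k=3))  # 0: all 3 neighbors are class 0
```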
<aside class="notes">
<p>So, let’s talk about how kNN actually works. kNN is known as a lazy learner - it doesn’t train a model in advance. Instead, it stores all the data and makes predictions based on similarity. When a new data point comes in, kNN finds the closest examples from the training set - based on distance - and predicts the most common label among them. This makes it intuitive and highly adaptable, which is why it’s useful in clinical applications where transparency matters.</p>
<p>The process starts by calculating the distance - we used both Euclidean and Manhattan distances in our tests. Then, the model looks at the closest k data points — and this k is something we tune to get better results. Finally, it uses majority voting: whichever class appears most often among the neighbors becomes the prediction.</p>
</aside>
</section>
<section id="distance-calculation" class="slide level2 smaller">
<h2>Distance Calculation:</h2>
<p>kNN identifies the nearest neighbors by calculating distances between points.</p>
<div class="fragment">
<p><strong>Euclidean distance:</strong> <span class="citation" data-cites="theerthagiri2022diagnosis">(<a href="#/references" role="doc-biblioref" onclick="">Theerthagiri, Ruby, and Vidya 2022</a>)</span> <span class="math display">\[
d = \sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}
\]</span></p>
</div>
<div class="fragment">
<p><strong>Manhattan distance:</strong> <span class="citation" data-cites="aggarwal2015data">(<a href="#/references" role="doc-biblioref" onclick="">Aggarwal et al. 2015</a>)</span> <span class="math display">\[
d = |X_2 - X_1| + |Y_2 - Y_1|
\]</span></p>
</div>
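<p>As a quick numeric check of the two formulas (a minimal Python sketch; the helper names are illustrative, not from the project code):</p>

```python
import math

def euclidean(p, q):
    # d = sqrt((x2 - x1)^2 + (y2 - y1)^2), written for any number of features
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # d = |x2 - x1| + |y2 - y1|, written for any number of features
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

<p>The same pair of points can be ranked differently under the two metrics, which is why the choice of metric is one of the settings worth tuning.</p>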
<div style="text-align: center; margin-top: 1em;">
<div class="cell">
<div class="cell-output-display">
<p><img data-src="slides_files/figure-revealjs/unnamed-chunk-1-1.png" width="960"></p>
</div>
</div>
</div>
<aside class="notes">
<p>To find “closeness,” kNN uses distance metrics. We tested two types: The most common is Euclidean distance, which measures straight-line distance, and Manhattan distance, which works like a city grid. Since kNN relies on distance, the choice of metric — along with scaling — can significantly affect results. This diagram shows both — we tested both in our project to compare results.</p>
</aside>
</section>
<section id="classification-process-1" class="slide level2 center middle">
<h2>Classification Process</h2>
<div class="columns">
<div class="column" style="width:50%;">
<p><span class="fragment">The red square represents a data point to be classified. The algorithm selects the 5 nearest neighbors within the green circle—3 hearts and 2 circles. Based on the majority vote, the red square is classified as a heart.</span></p>
</div><div class="column" style="width:40%;">
<div class="quarto-figure quarto-figure-center">
<figure>
<p><img data-src="images/kNN_picture.png"></p>
<figcaption>Figure 1. kNN with k=5</figcaption>
</figure>
</div>
</div>
</div>
<aside class="notes">
<p>Here’s a simple example of how kNN makes a prediction. The red square is a new point. It looks at the 5 nearest neighbors — in this case, 3 hearts and 2 circles. Since hearts are the majority, the red square is predicted as a heart. This simple majority voting process makes kNN easy to understand and explain - a big advantage in medical settings.</p>
</aside>
</section>
<section id="strengths-and-weaknesses-of-knn" class="slide level2 smaller">
<h2>Strengths and Weaknesses of kNN</h2>
<div class="columns">
<div class="column" style="width:50%;">
<h3 id="strengths">Strengths</h3>
<ul>
<li><strong>Simple, intuitive, and non-parametric</strong> — no assumptions about data distribution<br>
</li>
<li><strong>No training phase</strong> — the algorithm learns during prediction<br>
</li>
<li><strong>Performs well</strong> on small to medium datasets, especially when features are well-scaled<br>
</li>
<li><strong>Easy to understand and implement</strong> — ideal for baseline models or educational use</li>
</ul>
</div><div class="column" style="width:50%;">
<h3 id="weaknesses-of-knn">Weaknesses of kNN</h3>
<ul>
<li><strong>Slow prediction time</strong> on large datasets due to distance calculations <span class="citation" data-cites="deng2016efficient">(<a href="#/references" role="doc-biblioref" onclick="">Deng et al. 2016</a>)</span></li>
<li><strong>Sensitive to feature scaling and distance metric choice</strong> <span class="citation" data-cites="uddin2022comparative">(<a href="#/references" role="doc-biblioref" onclick="">Uddin et al. 2022</a>)</span></li>
<li><strong>Choosing the right ‘k’</strong> is critical — too low or high can reduce performance<br>
</li>
<li><strong>Affected by irrelevant or correlated features</strong>, which may distort neighbor similarity</li>
</ul>
</div>
</div>
<aside class="notes">
<p>Here’s a quick summary of what makes kNN useful, and what challenges come with it. It’s simple, doesn’t need training, and works well with smaller datasets when features are properly scaled. That’s why it’s often used for quick testing or in educational settings. But it can struggle with large datasets or noisy features. It’s also sensitive to scaling, and choosing the right k value is really important. That’s why preprocessing and tuning are so important - especially in healthcare, where accuracy and fairness matter.</p>
<style type="text/css">
span.MJX_Assistive_MathML {
position:absolute!important;
clip: rect(1px, 1px, 1px, 1px);
padding: 1px 0 0 0!important;
border: 0!important;
height: 1px!important;
width: 1px!important;
overflow: hidden!important;
display:block!important;
}</style></aside>
</section>
<section id="analysis-and-results" class="slide level2">
<h2>Analysis and Results</h2>
<h3 class="smaller" id="data-source-and-collection">Data Source and Collection</h3>
<p><strong>Data Source:</strong> <a href="https://archive.ics.uci.edu/dataset/891/cdc+diabetes+health+indicators">CDC Diabetes Health Indicators</a></p>
<p>Collected via the CDC’s Behavioral Risk Factor Surveillance System (BRFSS)</p>
<p>Dataset contains 253,680 survey responses</p>
<p>Covers 21 features: demographics, lifestyle, healthcare, and health history</p>
<p><strong>Target:</strong> Diabetes_binary</p>
<p>(0 = No diabetes, 1 = Diabetes/Prediabetes)</p>
<aside class="notes">
<p>For this project, we used real survey data from the CDC’s BRFSS program. The dataset includes over 250,000 adult responses and covers a wide range of features like age, BMI, physical activity, and general health. Our target was a binary variable indicating diabetes or prediabetes. The dataset’s size and feature variety made it an ideal test case for evaluating how kNN behaves in a real-world health context.</p>
<style type="text/css">
span.MJX_Assistive_MathML {
position:absolute!important;
clip: rect(1px, 1px, 1px, 1px);
padding: 1px 0 0 0!important;
border: 0!important;
height: 1px!important;
width: 1px!important;
overflow: hidden!important;
display:block!important;
}</style></aside>
</section>
<section id="data-challenges" class="slide level2 smaller">
<h2>Data Challenges</h2>
<p><strong>Data Quality</strong><br>
<span class="fragment">No missing values</span><br>
<span class="fragment">24,206 duplicate rows detected</span></p>
<p><strong>Outliers & Scaling Sensitivity</strong><br>
<span class="fragment">BMI, MentHlth, PhysHlth had extreme values</span><br>
<span class="fragment">kNN is highly sensitive to scale</span></p>
<p><strong>Feature Relationships</strong><br>
<span class="fragment">No strong multicollinearity (all pairwise r &lt; 0.5)</span><br>
<span class="fragment">All features retained for now</span></p>
<p><strong>Early Insight</strong><br>
<span class="fragment">Higher BMI in diabetic cases, but overlapping range</span><br>
<span class="fragment">Used as a predictor along with other features</span></p>
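<p>The scaling sensitivity noted above is easy to see numerically. The sketch below uses made-up values (feature names only mimic the dataset) to show how an unscaled, wide-range feature like BMI dominates the Euclidean distances that kNN relies on:</p>

```python
import numpy as np

# Two hypothetical respondents: (BMI, HighBP flag, PhysActivity flag).
# BMI spans tens of units in raw data, while the flags are only 0/1.
a = np.array([40.0, 1.0, 0.0])
b = np.array([25.0, 0.0, 1.0])

# Unscaled Euclidean distance: the BMI gap of 15 swamps the binary features.
d_raw = np.linalg.norm(a - b)

# After z-scoring each feature (illustrative means/stds), all features
# contribute on a comparable scale.
mean = np.array([28.0, 0.4, 0.7])
std = np.array([6.6, 0.5, 0.45])
d_scaled = np.linalg.norm((a - mean) / std - (b - mean) / std)

# Share of the squared distance contributed by BMI alone.
print(round((a - b)[0] ** 2 / d_raw ** 2, 3))  # > 0.99 before scaling
```

Before scaling, BMI accounts for over 99% of the squared distance between these two points, which is why StandardScaler or RobustScaler is applied before fitting kNN.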
<aside class="notes">
<p>Before modeling, we looked closely at the raw dataset. There were no missing values, but nearly 10% of the data was duplicated - those could bias the model if not removed. BMI and other health features showed outliers, and because kNN uses distance, that’s a problem - large values can dominate. We also checked for correlation but didn’t find any features that were too closely related. One early pattern we noticed: people with diabetes generally had higher BMI, but it wasn’t the only factor.</p>
<style type="text/css">
span.MJX_Assistive_MathML {
position:absolute!important;
clip: rect(1px, 1px, 1px, 1px);
padding: 1px 0 0 0!important;
border: 0!important;
height: 1px!important;
width: 1px!important;
overflow: hidden!important;
display:block!important;
}</style></aside>
</section>
<section id="class-distribution" class="slide level2 smaller">
<h2>Class Distribution</h2>
<h3 class="smaller" id="diabetes-class-imbalance">Diabetes Class Imbalance</h3>
<div class="columns">
<div class="column" style="width:50%;">
<p><strong>Key Points</strong><br>
<span class="fragment">Significant class imbalance observed</span><br>
<span class="fragment">Majority class: No Diabetes (0) – <strong>86.07%</strong></span><br>
<span class="fragment">Minority class: Diabetes (1) – <strong>13.93%</strong></span></p>
<p><strong>Impact on Modeling</strong><br>
<span class="fragment">Imbalance can bias predictions</span><br>
<span class="fragment">Models may underpredict diabetes cases</span></p>
</div><div class="column" style="width:40%;">
<div class="cell">
<div class="cell-output-display">
<p><img data-src="slides_files/figure-revealjs/unnamed-chunk-2-1.png" width="864"></p>
</div>
</div>
</div>
</div>
<aside class="notes">
<p>Another major challenge was class imbalance. About 86% of cases were non-diabetic, and only 14% were diabetic or prediabetic. This imbalance can cause models to favor the majority class and miss early diabetes cases. We addressed this in preprocessing.</p>
<style type="text/css">
span.MJX_Assistive_MathML {
position:absolute!important;
clip: rect(1px, 1px, 1px, 1px);
padding: 1px 0 0 0!important;
border: 0!important;
height: 1px!important;
width: 1px!important;
overflow: hidden!important;
display:block!important;
}</style></aside>
</section>
<section id="preparing-the-data" class="slide level2 smaller">
<h2>Preparing the Data</h2>
<ul>
<li><p><strong>Removed 24,206 duplicate rows</strong><br>
<span class="fragment">Diabetic class increased from 13.9% → 15.3%</span></p></li>
<li><p><strong>Kept ordinal features as numeric</strong><br>
<span class="fragment">Age, Education, Income, and GenHlth retained due to natural ordering</span></p></li>
<li><p><strong>Scaled Features with Outliers</strong><br>
<span class="fragment">BMI, MentHlth, PhysHlth scaled with StandardScaler and RobustScaler</span></p></li>
<li><p><strong>Handled class imbalance</strong><br>
<span class="fragment">Applied SMOTE to generate synthetic diabetic samples</span></p></li>
</ul>
<p>➡️ Final dataset: <strong>clean, scaled, and balanced</strong></p>
<aside class="notes">
<p>To get the data ready, we removed duplicates and kept ordinal features like age, education, and income as numeric. The variables BMI, MentHlth, and PhysHlth were standardized using StandardScaler or RobustScaler to ensure equal contribution during distance calculations, a critical aspect for kNN’s accuracy. To fix the class imbalance, we used SMOTE, which creates synthetic diabetic cases to help the model learn both classes equally. After these steps, the dataset was clean, scaled, and balanced — ready for training our kNN models.</p>
<style type="text/css">
span.MJX_Assistive_MathML {
position:absolute!important;
clip: rect(1px, 1px, 1px, 1px);
padding: 1px 0 0 0!important;
border: 0!important;
height: 1px!important;
width: 1px!important;
overflow: hidden!important;
display:block!important;
}</style></aside>
</section>
<section id="knn-model-setup" class="slide level2 smaller">
<h2>kNN Model Setup</h2>
<ul>
<li>Explored different <strong>k values</strong>: 5, 10, 15<br>
</li>
<li>Compared <strong>distance metrics</strong>: Euclidean vs. Manhattan<br>
</li>
<li>Evaluated <strong>weighting methods</strong>: uniform vs. distance<br>
</li>
<li>Tested multiple <strong>scaling techniques</strong><br>
</li>
<li>Included variations with <strong>SMOTE</strong> and <strong>Feature Selection</strong></li>
</ul>
<div style="overflow-x: auto; font-size: 90%">
<table style="width:100%;">
<colgroup>
<col style="width: 16%">
<col style="width: 16%">
<col style="width: 16%">
<col style="width: 16%">
<col style="width: 16%">
<col style="width: 16%">
</colgroup>
<thead>
<tr class="header">
<th>Model</th>
<th>k</th>
<th>Distance</th>
<th>Weights</th>
<th>Scaler</th>
<th>SMOTE</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>kNN 1</td>
<td>5</td>
<td>Euclidean (p=2)</td>
<td>Uniform</td>
<td>StandardScaler</td>
<td>No</td>
</tr>
<tr class="even">
<td>kNN 2</td>
<td>15</td>
<td>Manhattan (p=1)</td>
<td>Distance</td>
<td>RobustScaler</td>
<td>No</td>
</tr>
<tr class="odd">
<td>kNN 3</td>
<td>10</td>
<td>Euclidean (p=2)</td>
<td>Uniform</td>
<td>StandardScaler</td>
<td>Yes</td>
</tr>
<tr class="even">
<td>kNN 4</td>
<td>15</td>
<td>Euclidean (p=2)</td>
<td>Distance</td>
<td>StandardScaler</td>
<td>Yes (Feature Selection)</td>
</tr>
</tbody>
</table>
</div>
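<p>The configurations in the table map directly onto <code>KNeighborsClassifier</code> parameters. A sketch on synthetic data (SMOTE and feature selection are omitted here for brevity; the Minkowski power <code>p=2</code> gives Euclidean distance, <code>p=1</code> Manhattan):</p>

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Toy imbalanced data standing in for the scaled BRFSS features.
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.85],
                           random_state=0)
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# (k, distance, weighting) for the four variants above.
configs = {
    "kNN 1": dict(n_neighbors=5,  p=2, weights="uniform"),
    "kNN 2": dict(n_neighbors=15, p=1, weights="distance"),
    "kNN 3": dict(n_neighbors=10, p=2, weights="uniform"),
    "kNN 4": dict(n_neighbors=15, p=2, weights="distance"),
}

for name, params in configs.items():
    model = KNeighborsClassifier(**params).fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 3))
```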
<aside class="notes">
<p>To better understand kNN’s behavior, we designed four model variations. We changed the number of neighbors, distance type, and weighting method. We also experimented with different scaling techniques and applied SMOTE to handle class imbalance. In one version, we also used feature selection to reduce dimensional noise. The final model - kNN 4 - combined all these enhancements and delivered the strongest performance overall.</p>
<style type="text/css">
span.MJX_Assistive_MathML {
position:absolute!important;
clip: rect(1px, 1px, 1px, 1px);
padding: 1px 0 0 0!important;
border: 0!important;
height: 1px!important;
width: 1px!important;
overflow: hidden!important;
display:block!important;
}</style></aside>
</section>
<section id="performance-of-knn-variants" class="slide level2 smaller">
<h2>Performance of kNN Variants</h2>
<h4 class="no-title" id="table-3-performance-comparison-of-knn-models">Table 3: Performance Comparison of kNN Models</h4>
<div style="overflow-x: auto; font-size: 90%">
<table>
<colgroup>
<col style="width: 9%">
<col style="width: 9%">
<col style="width: 9%">
<col style="width: 9%">
<col style="width: 9%">
<col style="width: 9%">
<col style="width: 9%">
<col style="width: 9%">
<col style="width: 9%">
<col style="width: 9%">
<col style="width: 9%">
</colgroup>
<thead>
<tr class="header">
<th>Model</th>
<th>k</th>
<th>Distance</th>
<th>Weights</th>
<th>Scaler</th>
<th>SMOTE</th>
<th>Accuracy</th>
<th>ROC_AUC</th>
<th>Precision_1</th>
<th>Recall_1</th>
<th>F1_1</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>kNN 1</td>
<td>5</td>
<td>Euclidean (p=2)</td>
<td>Uniform</td>
<td>StandardScaler</td>
<td>No</td>
<td>0.83</td>
<td>0.70</td>
<td>0.41</td>
<td>0.21</td>
<td>0.27</td>
</tr>
<tr class="even">
<td>kNN 2</td>
<td>15</td>
<td>Manhattan (p=1)</td>
<td>Distance</td>
<td>RobustScaler</td>
<td>No</td>
<td>0.84</td>
<td>0.75</td>
<td>0.45</td>
<td>0.16</td>
<td>0.23</td>
</tr>
<tr class="odd">
<td>kNN 3</td>
<td>10</td>
<td>Euclidean (p=2)</td>
<td>Uniform</td>
<td>StandardScaler</td>
<td>Yes</td>
<td>0.69</td>
<td>0.73</td>
<td>0.28</td>
<td>0.64</td>
<td>0.39</td>
</tr>
<tr class="even">
<td>kNN 4</td>
<td>15</td>
<td>Euclidean (p=2)</td>
<td>Distance</td>
<td>StandardScaler</td>
<td>Yes (FS)</td>
<td>0.78</td>
<td>0.88</td>
<td>0.73</td>
<td>0.88</td>
<td>0.80</td>
</tr>
</tbody>
</table>
</div>
<ul>
<li><p><strong>Best configuration: kNN 4</strong><br>
<span class="fragment">k = 15, Euclidean distance, distance weighting</span><br>
<span class="fragment">StandardScaler, SMOTE, and feature selection</span></p></li>
<li><p><strong>Highest Weighted F1 Score: 0.80</strong><br>
<span class="fragment">Achieved recall = 0.88, precision = 0.73</span></p></li>
<li><p>🩺 Most effective at identifying the diabetic class (1)</p></li>
</ul>
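<p>The columns in Table 3 are standard classification metrics. A sketch with scikit-learn on a tiny, hand-checkable example (the labels and scores below are made up for illustration, not the study’s predictions):</p>

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Hypothetical labels/scores for six patients (1 = diabetes/prediabetes).
y_true  = [0, 0, 0, 1, 1, 1]
y_pred  = [0, 0, 1, 1, 1, 0]
y_score = [0.2, 0.3, 0.6, 0.7, 0.8, 0.4]

# TP = 2, FP = 1, FN = 1, so precision_1 = recall_1 = F1_1 = 2/3.
print("Accuracy   ", accuracy_score(y_true, y_pred))   # 4/6
print("Precision_1", precision_score(y_true, y_pred))  # 2/3
print("Recall_1   ", recall_score(y_true, y_pred))     # 2/3
print("F1_1       ", f1_score(y_true, y_pred))         # 2/3
print("ROC_AUC    ", roc_auc_score(y_true, y_score))
```

Because recall on class 1 counts only the diabetic cases that were actually found, a model can post high accuracy (like kNN 1 and 2) while still missing most diabetic patients.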
<aside class="notes">
<p>Table 3 shows the results for each kNN model. As you can see, model performance varies significantly depending on configuration. For instance, KNN 1 and 2 without SMOTE had higher accuracy but poor recall - meaning they missed a lot of diabetic cases. KNN 4, which combined SMOTE and feature selection, offered the best balance - best F1 score of 0.80 and a recall of 0.88 - especially for minority class detection. That’s why we selected kNN 4 as the final model - it was the best at identifying diabetic patients fairly and consistently.</p>
<style type="text/css">
span.MJX_Assistive_MathML {
position:absolute!important;
clip: rect(1px, 1px, 1px, 1px);
padding: 1px 0 0 0!important;
border: 0!important;
height: 1px!important;
width: 1px!important;
overflow: hidden!important;
display:block!important;
}</style></aside>
</section>
<section id="comparing-knn-with-tree-models" class="slide level2 smaller">
<h2>Comparing kNN with Tree Models</h2>
<h4 class="no-title" id="table-4-best-knn-vs.-tree-based-models">Table 4: Best kNN vs. Tree-Based Models</h4>
<div style="overflow-x: auto; font-size: 90%">
<table style="width:100%;">
<colgroup>
<col style="width: 21%">
<col style="width: 10%">
<col style="width: 14%">
<col style="width: 12%">
<col style="width: 18%">
<col style="width: 14%">
<col style="width: 8%">
</colgroup>
<thead>
<tr class="header">
<th>Model</th>
<th>SMOTE</th>
<th>Accuracy</th>
<th>ROC_AUC</th>
<th>Precision_1</th>
<th>Recall_1</th>
<th>F1_1</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>KNN</td>
<td>Yes</td>
<td>0.78</td>
<td>0.88</td>
<td>0.73</td>
<td>0.88</td>
<td>0.80</td>
</tr>
<tr class="even">
<td>Decision Tree</td>
<td>Yes</td>
<td>0.72</td>
<td>0.80</td>
<td>0.70</td>
<td>0.78</td>
<td>0.74</td>
</tr>
<tr class="odd">
<td>Decision Tree</td>
<td>No</td>
<td>0.86</td>
<td>0.81</td>
<td>0.52</td>
<td>0.15</td>
<td>0.24</td>
</tr>
<tr class="even">
<td>Random Forest</td>
<td>No</td>
<td>0.87</td>
<td>0.82</td>
<td>0.59</td>
<td>0.13</td>
<td>0.21</td>
</tr>
</tbody>
</table>
</div>
<ul>
<li class="fragment"><strong>kNN achieved the highest F1 score</strong> (0.80) with strong recall on the diabetic class</li>
<li class="fragment"><strong>Decision Tree with SMOTE</strong> performed comparably but slightly lower on F1</li>
<li class="fragment"><strong>Random Forest had the highest accuracy</strong>, but its <strong>poor recall</strong> (0.13) shows it struggled to detect diabetic cases</li>
</ul>
<p>Tree-based models offer interpretability, but may need tuning or resampling for minority detection</p>
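<p>A sketch of how such a comparison can be run with scikit-learn on synthetic imbalanced data (the numbers it prints will not match Table 4; it only shows the mechanics of scoring each model on the minority class):</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy data (~86% negative), echoing the BRFSS class split.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.86],
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

models = {
    "kNN": KNeighborsClassifier(n_neighbors=15, weights="distance"),
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
}

for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    # Recall/F1 on the minority (diabetic) class, as in Table 4.
    print(name, round(recall_score(y_te, pred), 2), round(f1_score(y_te, pred), 2))
```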
<aside class="notes">
<p>We also compared the best kNN model to Decision Trees and Random Forests. Random Forest had the highest accuracy — but very poor recall. It missed most diabetic cases. The Decision Tree with SMOTE did better, but it still couldn’t match kNN’s balance of precision and recall. Our tuned kNN outperformed both in detecting the minority class, making it a stronger choice for disease prediction.</p>
<style type="text/css">
span.MJX_Assistive_MathML {
position:absolute!important;
clip: rect(1px, 1px, 1px, 1px);
padding: 1px 0 0 0!important;
border: 0!important;
height: 1px!important;
width: 1px!important;
overflow: hidden!important;
display:block!important;
}</style></aside>
</section>
<section id="roc_auc-curves-comparison" class="slide level2 smaller">
<h2>ROC Curve Comparison</h2>
<p>This plot compares the ROC curves for all four models.<br>
kNN with Feature Selection performs best (AUC = 0.88), followed by Random Forest.</p>
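<p>Each curve is built by sweeping the decision threshold over predicted probabilities and plotting true-positive against false-positive rate; the AUC summarizes the whole curve. A minimal sketch with scikit-learn, using toy scores rather than the study’s outputs:</p>

```python
from sklearn.metrics import auc, roc_curve

# Hypothetical predicted probabilities for four patients.
y_true  = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(auc(fpr, tpr))  # 0.75
```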
<img data-src="slides_files/roc_curve.png" class="r-stretch quarto-figure-center"><p class="caption">ROC Curve</p><aside class="notes">
<p>This ROC curve shows how well each model separates diabetic from non-diabetic cases. Our best kNN model, with SMOTE and feature selection, had the highest AUC of 0.88, meaning it balanced true positives and false positives better than the others. Random Forest and Decision Tree performed reasonably well, but neither matched kNN in class separation. This supports the idea that a well-tuned kNN model can be both accurate and clinically effective.</p>
<style type="text/css">
span.MJX_Assistive_MathML {
position:absolute!important;
clip: rect(1px, 1px, 1px, 1px);
padding: 1px 0 0 0!important;
border: 0!important;
height: 1px!important;
width: 1px!important;
overflow: hidden!important;
display:block!important;
}</style></aside>
</section>
<section id="conclusion" class="slide level2 smaller">
<h2>Conclusion</h2>
<ul>
<li><p>This project demonstrated that the <strong>k-Nearest Neighbors (kNN)</strong> algorithm can be an effective tool for disease prediction when properly tuned and supported by strong preprocessing.<br>
</p></li>
<li><p>Despite its simplicity, kNN achieved competitive results through careful configuration — including scaling, handling class imbalance, and feature selection.<br>
</p></li>
<li><p>Its interpretability, flexibility, and performance make it a practical choice in healthcare settings, where fairness and transparency are essential.<br>
</p></li>
<li><p>Ultimately, this work highlights how even basic algorithms, when thoughtfully applied, can deliver meaningful insights in real-world medical data.<br>
</p></li>
</ul>
<aside class="notes">
<p>In conclusion, our study shows that kNN is a strong candidate for disease prediction, especially when transparency and recall are priorities. With thoughtful tuning, scaling, and SMOTE, kNN outperformed tree-based models in F1 score and minority class detection. Despite being simple, it handled the diabetes prediction task very well - especially after reducing dimensional noise and balancing the data.</p>
<style type="text/css">
span.MJX_Assistive_MathML {
position:absolute!important;