<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>TrueLearn Blogs</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
div.columns{display: flex; gap: min(4vw, 1.5em);}
div.column{flex: auto; overflow-x: auto;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
ul.task-list li input[type="checkbox"] {
width: 0.8em;
margin: 0 0.8em 0.2em -1.6em;
vertical-align: middle;
}
.display.math{display: block; text-align: center; margin: 0.5rem auto;}
/* CSS for syntax highlighting */
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { color: #008000; } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { color: #008000; font-weight: bold; } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>
<link rel="stylesheet" href="./assets/css/blog.css" />
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<header id="title-block-header">
<h1 class="title">TrueLearn Blogs</h1>
</header>
<h2 id="week-1-2">Week 1-2</h2>
<p>10 October 2022</p>
<h3 id="organization">Organization</h3>
<p>During the first week, the team members introduced themselves to each
other and began discussing our project's content. Our project focused on
developing a Python library, TrueLearn, from the available algorithm
logic, and on creating a dashboard to visualize the algorithm's
parameters.</p>
<h3 id="hci-human-computer-interaction">HCI (Human Computer
Interaction)</h3>
<p>We were introduced to the concept of Human-Computer Interaction
(HCI). We spent two weeks learning about:</p>
<ul>
<li>How to gather user requirements</li>
<li>How to represent our target customers from the gathered
requirements</li>
<li>How to design a product using sketches</li>
<li>How to refine sketches to generate a prototype</li>
<li>How to evaluate and iterate on our designs</li>
</ul>
<h3 id="background-information">Background Information</h3>
<p>To learn more about our project, we did some research on the project
initiators and the project itself.</p>
<p>We went through <a href="https://k4all.org/">Knowledge4All
website</a>, learned about their mission and some of their ongoing
projects, and discovered some of their products such as <a
href="http://videolectures.net/">videolectures.net</a> and their
relationship with <a href="https://www.x5gon.org/">x5gon
organization</a>.</p>
<p>We then investigated the TrueLearn family of algorithms. It is a set
of algorithms that builds a knowledge model of the learner from the
implicit data that users generate. The knowledge model built from the
library can be an important part of an educational recommendation
system.</p>
<p>These background studies gave us some high-level overviews of the
project.</p>
<h2 id="week-3-4">Week 3-4</h2>
<p>24 October 2022</p>
<h3 id="gathering-user-requirement">Gathering User Requirements</h3>
<p>As we are building a library and a set of visualizations around it,
we decided to focus on the visualization side of the project for our
HCI assignment, which requires us to “design and evaluate a prototype
for [our] software system.”</p>
<p>In the first week, we spoke to our potential users to better
understand their needs. We discussed some of their requirements, their
experiences and pain points of using the existing platform, and their
expectations of the new platform. Based on these interviews, we
collected two sample responses from our targeted users (students and
teachers) and put the responses in our assignment.</p>
<h3 id="persona">Persona</h3>
<p>Based on the gathered requirements, we tried to represent our users more
systematically by identifying their goals, motivations, pain points, and
characteristics. Using the attributes that we identified, we created
personas for teachers and students and conceptualized how they would use
our products in some scenarios.</p>
<p><img src="./assets/img/blogs/w3-persona-1.jpg" /></p>
<p><img src="./assets/img/blogs/w3-persona-2.jpg" /></p>
<p><img src="./assets/img/blogs/w3-scenario.png" /></p>
<h3 id="sketches-and-iterations">Sketches and Iterations</h3>
<p>In response to the pain points we gathered from our users, we started
working on our design sketches. We produced two versions of the design,
analyzed their strengths and weaknesses based on the feedback collected
from users, and improved them.</p>
<p><img src="./assets/img/blogs/w4-hci-sketch-1.jpg" /></p>
<p><img src="./assets/img/blogs/w4-hci-sketch-2.jpg" /></p>
<h3 id="first-meeting-with-the-client">First meeting with the
client</h3>
<p>Before the first meeting, we put together a list of questions to ask
the client, covering the following areas:</p>
<ul>
<li>Background information about the TrueLearn paper</li>
<li>Python Library: APIs, licenses, potential users</li>
<li>Visualizations: what we should visualize, how we should implement
it, potential users</li>
</ul>
<p>After we had completed our meetings with the client, we had a basic
understanding of the following concepts:</p>
<ul>
<li><p>The functionalities, input, and output of the TrueLearn
algorithms</p></li>
<li><p>How the input data is collected and pre-processed</p></li>
<li><p>Background information about the Python library and its potential
users</p></li>
<li><p>What kind of visualization needs to be created and who are the
users of the visualization</p></li>
</ul>
<h2 id="week-5-6">Week 5-6</h2>
<p>7 November 2022</p>
<h3 id="prototyping">Prototyping</h3>
<p>After we had finished our sketches, we started making our prototype
of the visualization dashboard. We used Balsamiq to draw our prototype
and separated our prototype into two parts: prototype for students and
prototype for teachers.</p>
<p><img src="./assets/img/blogs/w6-hci-prototype-student.jpg" /></p>
<p><img src="./assets/img/blogs/w6-hci-prototype-teacher.jpg" /></p>
<h3 id="evaluations-and-iterations">Evaluations and Iterations</h3>
<p>Given the limited time we had after completing the prototype, we
chose to perform heuristic evaluations based on <a
href="https://www.nngroup.com/articles/ten-usability-heuristics/">10
Usability Heuristics for User Interface Design</a> proposed by Jakob
Nielsen. Based on these ten heuristics, we identified the following problems
in our prototype:</p>
<table>
<colgroup>
<col style="width: 0%" />
<col style="width: 10%" />
<col style="width: 38%" />
<col style="width: 47%" />
<col style="width: 2%" />
</colgroup>
<thead>
<tr class="header">
<th>#</th>
<th>Heuristic</th>
<th>Problem</th>
<th>Solution</th>
<th>Severity</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>Visibility of system status</td>
<td>There is no text indicating the meaning of each diagram on the
analytics page for both teacher and student.</td>
<td>Each diagram in analytics should be grouped into categories with a
subtitle indicating their meaning and text explaining what they are
showing.</td>
<td>4</td>
</tr>
<tr class="even">
<td>2</td>
<td>Visibility of system status</td>
<td>It is not clear to the user how to open the various sub-sections of
the “My Profile” and “My Content” in the sidebar.</td>
<td>Add an icon to indicate that the sub-sections can be opened by
clicking the icon.</td>
<td>2</td>
</tr>
<tr class="odd">
<td>3</td>
<td>Visibility of system status</td>
<td>Courses in the uploaded-videos tab should be labelled with their
names.</td>
<td>Add the course name below each picture.</td>
<td>2</td>
</tr>
<tr class="even">
<td>4</td>
<td>Aesthetic and minimalist design</td>
<td>The topics in the home and history pages are scattered over many
rows at the top.</td>
<td>Topics can be grouped in a carousel so that they are not spread over
multiple rows.</td>
<td>1</td>
</tr>
<tr class="odd">
<td>5</td>
<td>Consistency and standards</td>
<td>In the teacher’s dashboard, there are two “Analytics” which might be
confusing for the user.</td>
<td>Rename the user profile “Analytics” to “My Progress” and the teacher
“Analytics” to “Content Insights.”</td>
<td>2</td>
</tr>
</tbody>
</table>
<p>We immediately performed a round of iteration on the prototype to
solve the problems above.</p>
<p><img src="./assets/img/blogs/w6-hci-improved-prototype.jpg" /></p>
<h3 id="meeting-and-literature-review">Meeting and Literature
Review</h3>
<p>During these two weeks, we had a second meeting with the client. The
meeting focused on how to write our literature review and on
understanding the technical details of the library and
visualizations.</p>
<p>For the literature review, after a discussion with the client, we
decided to split the report into two parts, the Python library and the
visualizations, and to research similar projects and technologies for
each part.</p>
<p>For the Python library, we briefly discussed the mechanism behind the
TrueLearn algorithm, Bayesian knowledge tracing. The client also
suggested that we study how <a
href="https://github.com/CAHLR/pyBKT">pyBKT</a>, a project similar to
TrueLearn, is implemented.</p>
<p>For the visualizations, the client introduced us to the open learner
model, encouraging us to read about the types of visualizations that
are available and how each motivates learners. In addition, the client
suggested building some dynamic visualizations with <a
href="https://reactjs.org/">React</a>, as these could be easily
integrated into their existing video platform <a
href="https://x5learn.org/">x5learn</a>. He also presented some
libraries for building the visualizations, including <a
href="https://d3js.org/">D3.js</a>.</p>
<h2 id="week-7-8">Week 7-8</h2>
<p>21 November 2022</p>
<h3 id="literature-review-tools">Literature Review: tools</h3>
<p>This fortnight, we investigated the tools needed to develop Python
libraries and visualizations.</p>
<p>To build a library that is easy for developers to use and learn, we
believe the Python library should:</p>
<ul>
<li>have thorough documentation for each method</li>
<li>be tested by unit tests</li>
<li>be properly analysed and formatted by linters and formatters</li>
</ul>
<p>For each of these objectives, we investigated the available tools.
For documentation, we looked at <a
href="https://www.sphinx-doc.org/en/master/">Sphinx</a>, <a
href="https://pdoc3.github.io/pdoc/doc/pdoc/#gsc.tab=0">pdoc</a>, and <a
href="https://pydoctor.readthedocs.io/en/latest/">pydoctor</a>,
comparing them on functionality, ease of use, and the UI (User
Interface) of their output, and finally chose Sphinx as our
documentation generator. For testing, we focused on unit testing, doc
testing, and generating test coverage. We selected <a
href="https://pytest.org/">pytest</a> as our testing framework because
it is easy to use and supports additional features through its plugin
system, and coverage.py as our coverage report generator because it is
considerably more powerful than its alternative, <a
href="https://docs.python.org/3/library/trace.html">trace</a>. For
linters and formatters, we chose <a
href="https://prospector.landscape.io/en/master/index.html">Prospector</a>
over <a href="https://pypi.org/project/pylint/">PyLint</a> and <a
href="https://flake8.pycqa.org/en/latest/">Flake8</a> because it
integrates the most publicly available Python linters and works out of
the box.</p>
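<p>As a small illustration of the pytest workflow above, tests are plain
functions whose names start with <code>test_</code> and that use bare
<code>assert</code> statements. The <code>accuracy</code> helper below
is hypothetical, not part of the real library.</p>

```python
# test_metrics.py -- a minimal pytest-style test file. The accuracy()
# helper is a hypothetical example, not part of the truelearn library.

def accuracy(predictions, labels):
    """Return the fraction of predictions that match the labels."""
    matches = sum(p == l for p, l in zip(predictions, labels))
    return matches / len(labels)


def test_accuracy_all_correct():
    assert accuracy([True, False], [True, False]) == 1.0


def test_accuracy_half_correct():
    assert accuracy([True, True], [True, False]) == 0.5
```

<p>Running <code>pytest</code> discovers and runs these functions
automatically, and coverage.py (for example, via the pytest-cov plugin)
reports which lines of code the tests exercise.</p>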
<p>From the point of view of our target users, we believe the
visualizations should provide the richest possible information in both
static and dynamic forms, so we examined static and dynamic
visualization libraries separately. We chose several libraries to
compare:</p>
<table style="width:100%;">
<colgroup>
<col style="width: 16%" />
<col style="width: 72%" />
<col style="width: 10%" />
</colgroup>
<thead>
<tr class="header">
<th><strong>WHERE/VISUALIZATION TYPES</strong></th>
<th><strong>Static</strong></th>
<th><strong>Dynamic</strong></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Front-end</strong></td>
<td>D3.js and <a href="https://www.chartjs.org/">Chart.js</a></td>
<td>D3.js and Chart.js</td>
</tr>
<tr class="even">
<td><strong>Back-end</strong></td>
<td><a href="https://matplotlib.org/">matplotlib</a>, <a
href="https://seaborn.pydata.org/">seaborn</a>, <a
href="https://github.com/plotly/plotly.py">plotly</a></td>
<td>plotly</td>
</tr>
</tbody>
</table>
<p>In terms of ease of development, Plotly came out on top, as it
supports both static and dynamic chart generation on the back end, so
we chose it as our back-end visualization library. On the front end,
Chart.js is functionally weaker than D3.js, although it integrates
better with React; from a functional point of view, we therefore
decided to use D3.js for the front-end visualizations.</p>
<p>Our client also informed us during this week’s meeting that the
library should be extensible, meaning that it is easy for other
developers to add features to the library and to integrate it into
their systems. He suggested we read the paper <a
href="https://arxiv.org/abs/1309.0238">API design for machine learning
software: experiences from the scikit-learn project</a>, written by the
developers of <a
href="https://github.com/scikit-learn/scikit-learn">scikit-learn</a>.</p>
<h3 id="refine-requirements">Refine requirements</h3>
<p>After the last meeting with the client, we started to build a MoSCoW
list to refine some of the requirements for our project. We divided the
requirements into functional requirements and non-functional
requirements. Inside functional requirements, we listed different
requirements for:</p>
<ol type="1">
<li>Python Library</li>
<li>Documentation</li>
<li>Testing</li>
<li>Licensing</li>
<li>Visualizations</li>
</ol>
<p>Inside non-functional requirements, we focused on usability,
compatibility, maintainability, and performance as we need to ensure our
library and visualization are accessible and can be easily used by the
targeted users.</p>
<p>We decided to have a conversation with the client next week and
finalize the requirements at the end of this term.</p>
<h3 id="gant-charts">Gantt Charts</h3>
<p>During these two weeks, we also did some preliminary planning of the
project and completed the first two parts of the Gantt chart:</p>
<p><img src="./assets/img/blogs/w8-gantt.png" /></p>
<h2 id="week-9-10">Week 9-10</h2>
<p>5 December 2022</p>
<h3 id="finalize-requirements">Finalize Requirements</h3>
<p>During these two weeks, we discussed our requirements with our
client, who gave us some advice, such as:</p>
<ul>
<li>Adding descriptions to the generated visualizations.</li>
<li>Removing the ability to show related topics in visualizations, as
it is beyond the scope of the project.</li>
<li>Providing filtering by skill for the generated visualizations on
the frontend.</li>
</ul>
<p>Following these suggestions, we finalized our requirements and
published them on the project website.</p>
<h3 id="literature-review-design">Literature Review: design</h3>
<p>After the last meeting, we read about how scikit-learn designed their
API and used this as a basis for designing TrueLearn's API. We expect
our API to follow the same principles used in scikit-learn:</p>
<ul>
<li>Consistency: “All objects share a consistent interface composed of a
limited set of methods.”</li>
<li>Inspection: “Constructor parameters and parameter values determined
by learning algorithms are stored and exposed as public
attributes.”</li>
<li>Non-proliferation of classes: “Learning algorithms are the only
objects to be represented using custom classes. Datasets are represented
as NumPy arrays or SciPy sparse matrices. Hyper-parameter names and
values are represented as standard Python strings or numbers whenever
possible.”</li>
<li>Composition: “Whenever feasible, meta-algorithms parametrized on
other algorithms are implemented and composed from the existing building
blocks.”</li>
<li>Sensible defaults: “Whenever an operation requires a user-defined
parameter, an appropriate default value is defined by the library.”</li>
</ul>
<p>Implementing these principles in scikit-learn relies on the
estimator, predictor, and transformer interfaces, which we decided to
adopt in the TrueLearn library. This design reduces the learning curve
for users and makes TrueLearn easier to extend, maintain, and use.</p>
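<p>A minimal sketch of what these conventions might look like for an
engagement classifier; the class and parameter names here are
illustrative, not TrueLearn's actual API.</p>

```python
# A sketch of the estimator/predictor conventions described above,
# applied to a hypothetical engagement classifier. Names are
# illustrative, not TrueLearn's actual API.

class ToyEngagementClassifier:
    def __init__(self, threshold=0.5):
        # Sensible defaults: hyper-parameters are plain numbers with
        # default values, stored unchanged on the instance.
        self.threshold = threshold

    def get_params(self):
        # Inspection: constructor parameters are exposed publicly.
        return {"threshold": self.threshold}

    def fit(self, events, labels):
        # Estimator interface: fit() learns from data and returns
        # self, so calls can be chained.
        self.engagement_rate_ = sum(labels) / len(labels)
        return self

    def predict(self, event):
        # Predictor interface: map an input to an engagement label.
        return self.engagement_rate_ >= self.threshold


clf = ToyEngagementClassifier().fit([None] * 4, [1, 1, 1, 0])
print(clf.get_params())   # {'threshold': 0.5}
print(clf.predict(None))  # True
```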
<p>Combining our API design with the final requirements, we finalized
our system design as the following:</p>
<p><img src="./assets/img/blogs/w10-system-design.png" /></p>
<h2 id="week-11-12">Week 11-12</h2>
<p>9 January 2023</p>
<h3 id="ci">CI</h3>
<p>After agreeing on the project's criteria and deliverables in the
previous term, our focus has now shifted to developing the TrueLearn
library. The project can be split into two distinct components:</p>
<ul>
<li>The first is the refactoring of the existing code into an intuitive
and well-designed API.</li>
<li>The second is the generation of the data structures needed to
visualise key quantities such as the learner's perceived skill level.
This would additionally involve using static visualisation tools to
output this information in a human-readable format, with the option of
generating dynamic visualisations if the project constraints allow.</li>
</ul>
<p>After gathering the necessary files for the project, we began
developing our workflow to make development more efficient. This mainly
involved setting up GitHub Actions to automatically run certain tests
and checks on our repository's code. We wanted three checks:</p>
<ol type="1">
<li>Static analysis: ensure that our code follows good programming
practices.</li>
<li>Unit testing: verify the correctness of our API.</li>
<li>Code coverage: indicate how well our unit tests cover all
cases.</li>
</ol>
<p>All the above have been set up to run on certain triggers (i.e.,
events), and the reports produced are integrated directly into GitHub
for easy access.</p>
<p><img src="./assets/img/blogs/w11-truelearn-homepage.png" /></p>
<p><strong><em>Check status is automatically updated and available on
the repository's ‘homepage.’</em></strong></p>
<p><img src="./assets/img/blogs/w11-ci-examples.png" /></p>
<p><strong><em>Errors are formatted using the GitHub Checks
API</em></strong></p>
<h3 id="design">Design</h3>
<p>Moving forward, we aim to make our proposed plan concrete, detailing
how we will refactor the existing code. One element we have discussed
is the use of interfaces to define shared behaviour between the AI
models that provide the recommendations. However, this would constrain
future development of the project to that specific interface. Another
approach, already used by the machine learning library scikit-learn, is
the programming paradigm of duck typing. This more flexible approach
allows developers to add functionality to a model without worrying
about interface constraints.</p>
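<p>A small sketch of the duck-typing idea, assuming hypothetical model
classes: any object that happens to provide <code>fit</code> and
<code>predict</code> can be used, with no shared base class
required.</p>

```python
# A sketch of duck typing between classifiers: evaluate() accepts any
# object with fit() and predict() methods, with no common base class.
# Both classes are illustrative stand-ins, not TrueLearn code.

class AlwaysEngage:
    def fit(self, event, label):
        return self

    def predict(self, event):
        return True


class Persistent:
    def __init__(self):
        self.last_label_ = False

    def fit(self, event, label):
        self.last_label_ = label
        return self

    def predict(self, event):
        return self.last_label_


def evaluate(model, events, labels):
    # Duck typing: we only care that fit() and predict() exist,
    # not what type the model is.
    correct = 0
    for event, label in zip(events, labels):
        if model.predict(event) == label:
            correct += 1
        model.fit(event, label)
    return correct / len(labels)


labels = [True, True, False, False]
print(evaluate(AlwaysEngage(), [None] * 4, labels))  # 0.5
print(evaluate(Persistent(), [None] * 4, labels))    # 0.5
```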
<p>In terms of the visualisations, we plan to determine what the three
key visualisations are and which data structures we can use to
represent the data we would like to model.</p>
<p>In terms of general design, we have proposed a first version of the
'truelearn' package structure, shown below:</p>
<ul>
<li><code>bayesian_models</code>: contains all the classifiers that we
need to implement: knowledge, novel, interest, and INK (meta
classifier)</li>
<li><code>preprocessing</code>: contains the Wikifier code, which uses
the Wikifier API to extract the top-n topics from given texts</li>
<li><code>unit_tests</code>: contains all the unit tests of the
package</li>
<li><code>visualisations</code>: contains the visualization code</li>
</ul>
<h3 id="elevator-pitch">Elevator Pitch</h3>
<p>In preparation for the upcoming elevator pitch, the group held two
online meetings to discuss the script, design the PowerPoint, and
rehearse. We ended up with the following design.</p>
<p><img src="./assets/img/blogs/w12-elevator-slide-1.jpeg" /></p>
<p><img src="./assets/img/blogs/w12-elevator-slide-2.jpeg" /></p>
<p><img src="./assets/img/blogs/w12-elevator-slide-3.jpeg" /></p>
<h2 id="week-13-14">Week 13-14</h2>
<p>23 January 2023</p>
<h3 id="design-package-structure">Design (Package Structure)</h3>
<p>Before we started implementing the library, we had another meeting
with the client to discuss how to structure truelearn, as we felt that
the current design was too simple to accommodate some of the “could
have” features we wanted to implement.</p>
<p>In terms of library structure, we discussed several points during
the meeting:</p>
<ul>
<li>Naming
<ul>
<li>use <code>learning</code> instead of
<code>bayesian_models</code></li>
<li>create a separate sub-package for the implementation of
<code>models</code>, which contains the user model and event model.</li>
</ul></li>
<li>Classifier
<ul>
<li>We could implement the baseline classifiers so that the experiments
presented in the TrueLearn paper are easy to reproduce with our
library</li>
</ul></li>
<li>Dependencies
<ul>
<li>NumPy, trueskill, mpmath, sklearn</li>
<li>pytest (not a runtime dependency of our package, only a test
dependency)</li>
</ul></li>
<li>User Models
<ul>
<li>Design a Topic class that includes an id and a description of the
topic</li>
<li>The user model includes a dictionary mapping each Topic to its mean
and variance, and stores the weights/dynamic factors used in the
training process</li>
</ul></li>
<li>Visualizations
<ul>
<li>Finalized visualizations: Bar charts, Line charts, Pie charts.
Possibly Bubble Charts.</li>
<li>Cosine similarity is used to determine the parameters, and
uncertainty is conveyed in the visualization (for instance, by changing
the colour of the chart)</li>
<li>Library to use: Matplotlib x Plotly</li>
</ul></li>
</ul>
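<p>The cosine-similarity idea above could be sketched roughly as
follows, over hypothetical skill dictionaries mapping topics to mean
skill values; the real library may represent skills differently.</p>

```python
# A sketch of cosine similarity between two skill dictionaries
# (topic -> mean skill). The data here is hypothetical.
import math

def cosine_similarity(a, b):
    topics = set(a) | set(b)
    dot = sum(a.get(t, 0.0) * b.get(t, 0.0) for t in topics)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity({"python": 1.0, "ml": 2.0},
                        {"python": 2.0, "ml": 4.0}))  # approximately 1.0
```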
<p>As a result of the design considerations above, we finalized our
package structure:</p>
<ul>
<li><code>truelearn/learning</code>: contains all the classifiers.</li>
<li><code>truelearn/models</code>: contains the implementation of the
learner model and event model.</li>
<li><code>truelearn/preprocessing</code>: contains the pre-processing
function, such as wikifier.</li>
<li><code>truelearn/util</code>: contains some utility sub-packages:
<ul>
<li><code>metrics</code>: contains methods to calculate precision,
accuracy, recall, and F1 score.</li>
<li><code>visualizations</code>: contains methods to visualize learner
models. It supports bar charts, line charts, pie charts, bubble charts,
etc.</li>
</ul></li>
<li><code>truelearn/tests</code>: contains unit tests for each package
shown above.</li>
</ul>
<p>You can refer to PRs <strong>#5</strong> and <strong>#8</strong> for
more details.</p>
<h3 id="baseline-classifier">Baseline Classifier</h3>
<p>We took the first step of the refactoring in <strong>#9</strong>.</p>
<p>In this PR, we implemented the first three baseline classifiers
presented in the TrueLearn paper: EngageClassifier (always predicts
that the learner will engage with the given event), PersistentClassifier
(predicts based on the last label seen), and MajorityClassifier
(predicts engagement if the number of engagements is greater than the
number of non-engagements).</p>
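<p>Roughly, the three baselines could be sketched as below; these are
simplified stand-ins, and the actual truelearn implementations differ
in their signatures and details.</p>

```python
# Rough sketches of the three baseline classifiers described above;
# the real truelearn implementations differ in signatures and details.

class EngageClassifier:
    """Always predicts that the learner will engage."""
    def fit(self, x, y):
        return self

    def predict(self, x):
        return True


class PersistentClassifier:
    """Predicts the label last seen during training."""
    def __init__(self):
        self.engage_with_last_ = False

    def fit(self, x, y):
        self.engage_with_last_ = y
        return self

    def predict(self, x):
        return self.engage_with_last_


class MajorityClassifier:
    """Predicts engagement iff engagements outnumber non-engagements."""
    def __init__(self):
        self.engagement_ = 0
        self.non_engagement_ = 0

    def fit(self, x, y):
        if y:
            self.engagement_ += 1
        else:
            self.non_engagement_ += 1
        return self

    def predict(self, x):
        return self.engagement_ > self.non_engagement_
```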
<p>To make our API easy to use, we added type hints to our methods'
return types and plan to add type hints to the parameters once we have
finalized the implementation of the learner model. We encountered some
problems when adding type hints for the return values of the instance
methods, as we needed to annotate the return type as the class itself
(the <code>fit</code> method). We initially used quoted annotations as
a workaround to support Python 3.6+. However, after a discussion with
the client, we decided to support only Python 3.7+, as Python 3.6 is
end-of-life; this allows us to use an import from
<code>__future__</code> to resolve the problem.</p>
<div class="sourceCode" id="cb1"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> __future__ <span class="im">import</span> annotations</span></code></pre></div>
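<p>With this import, postponed evaluation of annotations (PEP 563) lets
a method annotate its return type with its own class on Python 3.7+. A
minimal sketch, using a hypothetical <code>BaseClassifier</code>
name:</p>

```python
# With postponed evaluation of annotations (PEP 563), a method can
# reference its own class in a return annotation on Python 3.7+.
# BaseClassifier is an illustrative name, not the actual truelearn class.
from __future__ import annotations

class BaseClassifier:
    def fit(self, x, y: bool) -> BaseClassifier:
        # Without the __future__ import, "BaseClassifier" would have to
        # be written as a quoted string (a forward reference), because
        # the class is not fully defined at this point.
        return self


clf = BaseClassifier().fit(None, True)
print(type(clf).__name__)  # BaseClassifier
```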
<h3 id="model">Model</h3>
<p>We then started a long journey of exploring the implementation of the
learner model.</p>
<p>Initially, we switched back and forth between two different
implementations built around four classes: <code>Topic</code>,
<code>KnowledgeComponent</code>, <code>Knowledge</code>, and
<code>LearnerModel</code>.</p>
<p>You can refer to <strong>#10</strong> for more details. We discussed
how we should implement <code>__repr__</code> and <code>__hash__</code>
for <code>Topic</code>, how we should store the mapping (whether to use
<code>Topic</code> as the key or another form of mapping), and how
different mappings could affect the usability of other components in
our library, such as the visualizations.</p>
<p>After some discussions among the team and the client, we decided to
switch to the following design:</p>
<ul>
<li><code>AbstractKnowledgeComponent</code> defines an abstract
interface that can be inherited by developers to implement their
knowledge components.</li>
<li><code>KnowledgeComponent</code> represents a knowledge component in
the learning process. It contains information about the mean, variance,
title, description and URL of the knowledge component.</li>
<li><code>Knowledge</code> stores a dictionary mapping a
<code>Hashable</code> type (e.g. topic id) to
<code>KnowledgeComponent</code>.</li>
<li><code>LearnerModel</code> stores the <code>Knowledge</code>.</li>
</ul>
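<p>This design can be sketched as follows (a simplified illustration; the names follow the list above, but the actual signatures in the library may differ):</p>

```python
from dataclasses import dataclass
from typing import Dict, Hashable


@dataclass
class KnowledgeComponent:
    # mean/variance model the skill estimate for this component
    mean: float
    variance: float
    title: str = ""
    description: str = ""
    url: str = ""


class Knowledge:
    def __init__(self) -> None:
        # maps a hashable key (e.g. a topic id) to a KnowledgeComponent
        self._kcs: Dict[Hashable, KnowledgeComponent] = {}

    def update_kc(self, key: Hashable, kc: KnowledgeComponent) -> None:
        self._kcs[key] = kc

    def get_kc(self, key: Hashable) -> KnowledgeComponent:
        return self._kcs[key]


@dataclass
class LearnerModel:
    knowledge: Knowledge


learner = LearnerModel(knowledge=Knowledge())
learner.knowledge.update_kc(
    42, KnowledgeComponent(mean=0.5, variance=1.0, title="Algebra")
)
```

<p>Using a <code>Hashable</code> key rather than a dedicated <code>Topic</code> class keeps the mapping flexible for the visualization components discussed above.</p>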
<h3 id="knowledge-classifier">Knowledge Classifier</h3>
<p>Based on the learner model, we implement our first version of
<code>KnowledgeClassifier</code> and fix all the type hints for the
parameters. Now, <code>x</code> in <code>fit</code> and
<code>predict</code> is of type <code>Knowledge</code>.</p>
<p>Now, all the classifiers have the following public APIs:</p>
<ul>
<li><code>fit(x: Knowledge, y: bool)</code>: train the classifier by
using the knowledge of the learnable unit (x) and a label (y) that
indicates whether the learner engages with the learnable unit.</li>
<li><code>predict(x: Knowledge)</code>: predict (output
<code>True/False</code>) whether the learner will engage with the
learnable unit represented by the knowledge.</li>
<li><code>predict_proba(x: Knowledge)</code>: predict (output
probability between 0-1) whether the learner will engage with the
learnable unit represented by the knowledge.</li>
<li><code>get_params()</code>: return the parameters of the classifier
as a dictionary (name => value)
<ul>
<li><p>For <code>EngageClassifier</code>, this returns an empty
dictionary.</p></li>
<li><p>For <code>PersistentClassifier</code>, this returns a dictionary
with only one key-value pair, storing whether the learner engages with
the last <code>Knowledge</code>.</p></li>
<li><p>For <code>MajorityClassifier</code>, this returns a dictionary
with two key-value pairs, storing the number of engagements and
non-engagements.</p></li>
<li><p>For <code>KnowledgeClassifier</code>, this returns a dictionary
like this:</p>
<div class="sourceCode" id="cb2"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> <span class="st">"threshold"</span>: <span class="va">self</span>.__threshold,</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="st">"init_skill"</span>: <span class="va">self</span>.__init_skill,</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> <span class="st">"def_var"</span>: <span class="va">self</span>.__def_var,</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> <span class="st">"beta"</span>: <span class="va">self</span>.__beta,</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> <span class="st">"positive_only"</span>: <span class="va">self</span>.__positive_only</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div></li>
</ul></li>
</ul>
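<p>To illustrate the shape of this API, here is a self-contained sketch of <code>MajorityClassifier</code> (simplified; the real implementation lives in <code>truelearn.learning</code> and takes a <code>Knowledge</code> as <code>x</code>):</p>

```python
class MajorityClassifier:
    def __init__(self) -> None:
        self._engagement = 0
        self._non_engagement = 0

    def fit(self, x, y: bool) -> "MajorityClassifier":
        # x would be the Knowledge of the learnable unit; the majority
        # baseline only needs the label
        if y:
            self._engagement += 1
        else:
            self._non_engagement += 1
        return self

    def predict(self, x) -> bool:
        return self._engagement > self._non_engagement

    def predict_proba(self, x) -> float:
        total = self._engagement + self._non_engagement
        return self._engagement / total if total else 0.5

    def get_params(self) -> dict:
        return {
            "engagement": self._engagement,
            "non_engagement": self._non_engagement,
        }


clf = MajorityClassifier()
for label in [True, True, False]:
    clf.fit(None, label)  # x omitted for brevity in this sketch
```
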
<p>A note from the future: currently, you may notice that
<code>KnowledgeClassifier</code> is not very customizable or powerful
(e.g. it lacks public methods to set parameters). However, we will
gradually enhance these APIs and add more public methods as we move
towards implementing all of the classifiers.</p>
<h2 id="week-15-16">Week 15-16</h2>
<p>6 February 2023</p>
<h3 id="preprocessing-wikifier">Preprocessing (Wikifier)</h3>
<p>While implementing the model and classifier, we also started to
implement the wikifier API.</p>
<p>The main functionality of the <code>Wikifier</code> is to call the
API provided by <a href="https://wikifier.org/">Wikifier</a> and convert
the returned JSON into a list of topics, each represented by a
dictionary containing keys like title, URL, cosine, PageRank and id.</p>
<p>As we need to parse the JSON and convert it to our data structure, we
need a library to load it. For this, we experiment with the Python
standard <code>json</code> library, <a
href="https://github.com/ultrajson/ultrajson">UltraJSON</a>, <a
href="https://github.com/ijl/orjson">orjson</a> and <a
href="https://github.com/python-rapidjson/python-rapidjson">python-rapidjson</a>.
Based on our experimentation, orjson was the best of the four, about
10x faster than the <code>json</code> module in the Python standard
library.</p>
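<p>A sketch of the conversion step, with a graceful fallback when <code>orjson</code> is not installed (the sample payload and field names are illustrative of Wikifier’s response, not copied from it):</p>

```python
import json

try:
    import orjson  # about 10x faster than json in our experiments

    def loads(data):
        return orjson.loads(data)
except ImportError:  # fall back to the standard library
    loads = json.loads

# a trimmed, hypothetical Wikifier response
raw = json.dumps({
    "annotations": [
        {
            "title": "Machine learning",
            "url": "https://en.wikipedia.org/wiki/Machine_learning",
            "cosine": 0.91,
            "pageRank": 0.42,
            "wikiDataItemId": "Q2539",
        }
    ]
})

# convert each annotation into our topic dictionary
topics = [
    {
        "title": ann["title"],
        "url": ann["url"],
        "cosine": ann["cosine"],
        "pagerank": ann["pageRank"],
        "id": ann["wikiDataItemId"],
    }
    for ann in loads(raw)["annotations"]
]
```
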
<h3 id="classifier">Classifier</h3>
<p>In <strong>#13</strong>, we implement all the classifiers, finishing
10+ listed tasks.</p>
<blockquote>
<p>This request aims to implement NoveltyClassifier, InterestClassifier
and INKClassifier and to refactor the structure of the library.</p>
<p>The following steps are required to complete this pull request:</p>
<ul class="task-list">
<li><input type="checkbox" checked="" />Augment LearnerModel: include
engagement data to help the impl of draw probability. Currently, the
impl is incorrect.</li>
<li><input type="checkbox" checked="" />Augment
Abstract/KnowledgeComponent: include timestamps for the impl of
interests.</li>
<li><input type="checkbox" checked="" />Create an EventModel =>
record the knowledge representation of the learnable unit and the
timestamp when the learning event happens (useful in interests)</li>
<li><input type="checkbox" checked="" />Create BaseClassifier (define
abstract methods) and common base class for Knowledge, Novelty, and
Interest Classifier as they share many helper methods</li>
<li><input type="checkbox" checked="" />Fix the draw probability
implementation</li>
<li><input type="checkbox" checked="" />Implement NoveltyClassifier</li>
<li><input type="checkbox" checked="" />Implement
InterestClassifier</li>
<li><input type="checkbox" checked="" />Utilize BaseClassifier for type
checking</li>
<li><input type="checkbox" checked="" />Implement INKClassifier</li>
</ul>
<p>There are some non-functional refactorings to make our library
better:</p>
<ul class="task-list">
<li><input type="checkbox" checked="" />Use <code>@dataclass</code> to
implement <code>LearnerModel</code> and <code>EventModel</code></li>
<li><input type="checkbox" checked="" />Extract some methods from the
class hierarchy and make them free functions
(e.g. <code>team_sum_quality</code>,
<code>select_topic_kc_pairs</code>). These methods are not closely
related to the internal state of the classifier, nor are they part of
the classifier’s behaviour.</li>
<li><input type="checkbox" checked="" />Remove the default argument in
<code>InterestNoveltyKnowledgeBaseClassifier</code>. The default should
be set in the base class.</li>
<li><input type="checkbox" checked="" />Use keyword arguments to make
AbstractKnowledgeComponent difficult to use incorrectly</li>
<li><input type="checkbox" checked="" />Include the
<code>typing_extension</code> package (its support of types like
<code>Self</code> and <code>Final</code> is beneficial to the library
impl) and rewrite some of the type hints</li>
<li><input type="checkbox" checked="" />Remove
<code>AbstractKnowledge</code> and only keep
<code>AbstractKnowledgeComponent</code>.</li>
<li><input type="checkbox" checked="" />Switch to google style docstring
as it’s easier to write and read. (Don’t need to write
<code>-------</code> and Don’t need to maintain the type
information.)</li>
</ul>
</blockquote>
<p>In summary, the main achievements of this PR include:</p>
<ul>
<li><code>truelearn.learning</code>
<ul>
<li>Implement <code>NoveltyClassifier</code>,
<code>InterestClassifier</code> and <code>INKClassifier</code>.</li>
<li>Define a class <code>BaseClassifier</code> and implement type-based
constraint checking <code>validate_params()</code>,
<code>get_params()</code> and <code>set_params()</code> in it</li>
<li>Define a class <code>InterestNoveltyKnowledgeBaseClassifier</code>
that implements the shared methods used by the three classifiers.</li>
</ul></li>
<li><code>truelearn.models</code>
<ul>
<li>Add timestamp to the knowledge components as we need to use time in
<code>InterestClassifier</code> and <code>INKClassifier</code>.</li>
<li>Replace the <code>Knowledge</code> that represents the learnable
unit with an <code>EventModel</code> that models a learning event. In
the event model, we store the timestamp when the event happened. This
timestamp is used in <code>InterestClassifier</code> and
<code>INKClassifier</code>.</li>
</ul></li>
<li>Others
<ul>
<li>Use <code>@dataclass</code></li>
<li>Switch from numpy style docstring to google style docstring as the
latter is more concise (we don’t need to mark the type twice).</li>
<li>Add <code>typing_extensions</code> to our dependencies as it brings
more types from later Python versions to Python 3.7
(e.g. <code>Self</code> and <code>Final</code>).</li>
<li>Add the <a href="https://github.com/psf/black">black</a> formatter,
which keeps our codebase style consistent.</li>
<li>Enable the mypy and bandit linters, which allow us to do stricter
type checking and security checking.</li>
<li>Fix some Python 3.7 compatibility issues caused by the usage of
<code>|</code>, <code>tuple</code> and <code>dict</code>.</li>
</ul></li>
</ul>
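<p>As an illustration of why <code>typing_extensions</code> helps, the <code>Self</code> type lets a base-class <code>fit</code> declare that it returns the subclass it is called on (a hypothetical sketch, not the truelearn class hierarchy):</p>

```python
try:
    from typing_extensions import Self
except ImportError:  # Self moved into typing in Python 3.11
    from typing import Self


class BaseClassifier:
    def fit(self, x, y: bool) -> Self:
        # Typed as "the class fit() is called on", so for type checkers
        # SubClassifier().fit(...) is a SubClassifier, not a BaseClassifier.
        return self


class SubClassifier(BaseClassifier):
    pass


clf = SubClassifier().fit(None, True)
```
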
<p>There were also some discussions around type hints and
private/public variables. You can find them in the PR.</p>
<h3 id="metrics">Metrics</h3>
<p>We implement the following metrics:</p>
<ul>
<li><code>get_precision_score</code></li>
<li><code>get_recall_score</code></li>
<li><code>get_accuracy_score</code></li>
<li><code>get_f1_score</code></li>
</ul>
<p>These functions are implemented by re-exporting the corresponding
functions from the scikit-learn library. We have included them as part
of the <code>truelearn</code> package because we envisage the need to
add more metrics here in the future.</p>
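<p>Because they are thin re-exports, their behaviour matches scikit-learn’s. Conceptually, for binary labels they compute the following (pure-Python equivalents shown only for clarity; the package itself simply re-exports scikit-learn):</p>

```python
from typing import List


def get_accuracy_score(y_true: List[bool], y_pred: List[bool]) -> float:
    # fraction of predictions that match the labels
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def get_precision_score(y_true: List[bool], y_pred: List[bool]) -> float:
    # of everything predicted "engage", how much truly was engagement
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    predicted_positive = sum(y_pred)
    return tp / predicted_positive if predicted_positive else 0.0


def get_recall_score(y_true: List[bool], y_pred: List[bool]) -> float:
    # of all true engagements, how many we predicted
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    actual_positive = sum(y_true)
    return tp / actual_positive if actual_positive else 0.0


def get_f1_score(y_true: List[bool], y_pred: List[bool]) -> float:
    # harmonic mean of precision and recall
    p = get_precision_score(y_true, y_pred)
    r = get_recall_score(y_true, y_pred)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```
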
<h3 id="datasets">Datasets</h3>
<p>As we progressed on classifiers and models, we felt we needed to
provide some ways for developers to experiment with different
classifiers.</p>
<p>We intend to provide APIs that mimic <a
href="https://scikit-learn.org/stable/datasets/toy_dataset.html">those</a>
in scikit-learn. The dataset we use is PEEK-Dataset, which is described
in this <a href="https://arxiv.org/abs/2109.03154">paper</a> and <a
href="https://github.com/sahanbull/PEEK-Dataset">hosted</a> here.</p>
<p>We provide two methods <code>load_peek_dataset()</code> and
<code>load_peek_dataset_raw()</code> to load the PEEK dataset in
parsed/raw format.</p>
<p>To implement these two methods, we initially wanted to ship these
datasets inside our package, like some of the datasets in scikit-learn.
However, we soon realised that this was not feasible as the dataset is
over 30 MB. Including this non-essential resource would have inflated
our package and made it slower for users to download.</p>
<p>We have therefore implemented a basic downloader that can download
the PEEK dataset and validate its sha256sum. Users can use it to
download the data as needed. It also provides caching: when you call
<code>load_peek_dataset/_raw()</code> multiple times, the data is only
downloaded once.</p>
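<p>The downloader essentially implements a hash-checked cache. A simplified sketch (the function names and demo data are hypothetical, not the actual <code>truelearn.datasets</code> internals):</p>

```python
import hashlib
import tempfile
import urllib.request
from pathlib import Path


def _sha256sum(path: Path) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        for chunk in iter(lambda: file.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_file(url: str, dest: Path, expected_sha256: str) -> Path:
    # cache hit: the file is already on disk and its checksum matches
    if dest.exists() and _sha256sum(dest) == expected_sha256:
        return dest
    urllib.request.urlretrieve(url, dest)
    actual = _sha256sum(dest)
    if actual != expected_sha256:
        raise ValueError(f"sha256 mismatch: {actual} != {expected_sha256}")
    return dest


# demo: a pre-existing file with a matching checksum is never re-downloaded
data = b"id,title\n1,AI\n"
cached_path = Path(tempfile.mkdtemp()) / "peek.csv"
cached_path.write_bytes(data)
result = download_file(
    "https://example.invalid/peek.csv",  # never contacted on a cache hit
    cached_path,
    hashlib.sha256(data).hexdigest(),
)
```

<p>Validating the checksum on the cached copy, not just on fresh downloads, also guards against truncated or corrupted files from earlier runs.</p>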
<p>While implementing <code>truelearn.datasets</code>, we also made a
PR to the upstream PEEK-Dataset. The motivation for this PR was to add
some additional information (i.e. title and description) for each topic.
Adding this information to the mapping gives us more choices when
implementing the visualization.</p>
<p>We implemented a crawler in Python to fetch the title from each URL
in the PEEK-Dataset mapping, and then used the fetched title to query
another Wikipedia API
(<code>https://en.wikipedia.org/w/rest.php/v1/search/title?q={title}&limit=10</code>)
to get the relevant description.</p>
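<p>The description-matching step can be sketched like this (the response shape follows the MediaWiki REST search endpoint; the sample payload and helper name are made up):</p>

```python
def description_for(title: str, response: dict):
    # the endpoint returns {"pages": [{"title": ..., "description": ...}, ...]};
    # we take the description of the first exact title match, if any
    for page in response.get("pages", []):
        if page.get("title") == title and page.get("description"):
            return page["description"]
    return None


# a made-up response in the shape returned by the search/title endpoint
sample_response = {
    "pages": [
        {"title": "Algebra", "description": "Area of mathematics"},
        {"title": "Algebra (disambiguation)", "description": None},
    ]
}
```
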
<p>We processed 30,366 URLs in the dataset and discovered some
interesting things about Wikipedia and the first version of the dataset,
which we briefly present below:</p>
<ul>
<li><p>The <code>limit</code> in the API that provides the description
is not simply a cap on the number of returned results. Sometimes, the
first result with <code>limit=1</code> differs from the first result
with <code>limit=10</code>, and the latter is more accurate. This is
probably related to some ranking or voting algorithm (like KNN).</p></li>
<li><p>In Wikipedia, many topics lack descriptions.</p></li>
<li><p>Wikipedia and Wikimedia, though both hosted by the Wikimedia
Foundation, provide different descriptions of the same topics.</p></li>
<li><p>Some topics (their id shown below) are deleted from Wikipedia for
various reasons (lack of evidence, promotion, etc.):</p>
<div class="sourceCode" id="cb3"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>broken_links <span class="op">=</span> [</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> <span class="st">"1256"</span>,</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="st">"3203"</span>,</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> <span class="st">"4924"</span>,</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> <span class="st">"6057"</span>,</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> <span class="st">"8543"</span>,</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> <span class="st">"13172"</span>,</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> <span class="st">"16347"</span>,</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> <span class="st">"20258"</span>,</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> <span class="st">"25968"</span>,</span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a>]</span></code></pre></div></li>
</ul>
<p>From the future: after the upstream merged the <a
href="https://github.com/sahanbull/PEEK-Dataset/pull/1">changes</a>, we
utilized the title and description in our implementation
(<strong>#27</strong>).</p>
<h3 id="doc">Doc</h3>
<p>As we already have the docstrings in the source files, we started to
set up CI to build documentation automatically for each new
commit/release.</p>
<p>We used Sphinx to automatically generate documentation based on some
pre-generated templates and the docstrings in the source files. In
<strong>#14</strong>, we set up a basic template at
<code>docs</code>.</p>
<p>Later, after discussing with the client, we decided to switch to the
state-of-the-art documentation-hosting platform
<code>readthedocs</code> and set up configurations in
<code>readthedocs.yaml</code>. (This is still a work in progress.)</p>
<h3 id="ci-1">CI</h3>
<p>Based on the discussion in <strong>#32</strong>, we decided to make
GitHub Annotations available only when the tests or linting fail, which
reduces the number of checks shown when all the tests are
successful.</p>
<p>Also, CI caching is enabled for static analysis, unit testing and
code coverage. This avoids reinstalling the dependencies every time
these actions run.</p>
<p><code>prospector.yml</code> is slightly adjusted to reflect the
changes in <strong>#13</strong>, where we implemented the classifiers.
We limit the line length to 88 and exclude some checks in
<code>pydocstyle</code> because we switched to Google-style
docstrings.</p>
<h3 id="others">Others</h3>
<p>In <strong>#16</strong>, we made the following changes related to
packaging and project structure:</p>
<ul>
<li>Build and publish on PyPI for new releases</li>
<li>Replace <code>requirements.txt</code> and <code>setup.py</code> with
<code>pyproject.toml</code></li>
<li>Change README.md to README.rst and adjust syntax accordingly</li>
</ul>
<h2 id="week-17-18">Week 17-18</h2>
<p>27 February 2023</p>
<h3 id="datasets-1">Datasets</h3>
<p>As mentioned before, we raised a PR to the upstream PEEK-Dataset
repository, and it was merged. In week 17, we applied the upstream
changes to truelearn. Now, truelearn can provide the title and
description that match each Wikipedia topic id.</p>
<h3 id="refactor">Refactor</h3>
<p>With most of the implementation completed, we gradually started to
refactor the existing code.</p>
<p>We started by replacing the <code>ABC</code> abstract classes in
<code>truelearn.models</code> with <code>Protocols</code>, which makes
our library more extensible: developers implementing their own
<code>KnowledgeComponent</code> don’t necessarily need to inherit from
our <code>AbstractKnowledgeComponent</code>; they just need to implement
the <code>AbstractKnowledgeComponent</code> APIs. This is the benefit of
duck typing. (To be precise, this is explicit duck typing since we
define our API explicitly via Protocol).</p>
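<p>A minimal sketch of this idea (the real protocol in <code>truelearn.models</code> has more members):</p>

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class AbstractKnowledgeComponent(Protocol):
    @property
    def mean(self) -> float: ...

    @property
    def variance(self) -> float: ...


class MyKnowledgeComponent:
    """Satisfies the protocol without inheriting from anything."""

    def __init__(self, mean: float, variance: float) -> None:
        self._mean = mean
        self._variance = variance

    @property
    def mean(self) -> float:
        return self._mean

    @property
    def variance(self) -> float:
        return self._variance


kc = MyKnowledgeComponent(mean=0.5, variance=1.0)
```

<p>Static type checkers verify the full signatures structurally; <code>@runtime_checkable</code> additionally allows <code>isinstance</code> checks on the members’ presence.</p>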
<p>Second, to serve time-related visualizations, we designed the
<code>HistoryAwareKnowledgeComponent</code>, which inherits from the
<code>KnowledgeComponent</code> and can store previous updates into a
history buffer.</p>
<p>In addition to the changes to <code>truelearn.models</code>, we
formally decided to remove <code>truelearn.utils.persistent</code>
because:</p>
<ul>
<li>Most persistence methods, such as <code>pickle.dump</code>, require
only one line of code from the user to save the model locally.
Therefore, there is no need for truelearn to encapsulate these functions
in its own subpackage.</li>
<li>We also referred to the scikit-learn implementation, which likewise
leaves the responsibility of saving and loading the model to the user,
providing examples of saving with <code>pickle</code>,
<code>joblib</code> and <code>skops</code>.</li>
</ul>
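<p>For instance, persisting a model with <code>pickle</code> really is a one-liner in each direction (sketched with a plain dict standing in for a trained classifier):</p>

```python
import pickle

# a plain dict standing in for a trained truelearn classifier
model_state = {"threshold": 0.5, "init_skill": 0.0}

blob = pickle.dumps(model_state)   # saving to disk: pickle.dump(obj, fh)
restored = pickle.loads(blob)      # loading back:  pickle.load(fh)
```
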
<h3 id="testing">Testing</h3>
<p>To improve the usability and reliability of the library, we have
added examples to public classes/methods of <code>models</code>,
<code>learning</code> and <code>datasets</code> in <strong>#27</strong>.
These examples not only help users quickly understand how to use the
classes and methods, but also act as doctests that ensure our
implementation produces consistent and accurate values.</p>
<p>In addition to the doctests embedded in the docstrings, we started
implementing unit tests for all the classes and methods in the truelearn
package.</p>
<p>In the process of implementing the unit test, we experimented with
several advanced features of <code>pytest</code>:</p>
<ul>
<li>fixtures: <code>capsys</code> (to capture <code>stdout</code>) and
<code>monkeypatch</code> (to patch parts of the standard library,
allowing us to simulate exceptional situations) let us share test data
and setup, and implement unit tests more concisely.</li>
<li>extensions: <code>pytest_socket</code> (allows us to simulate
network disconnections for testing) and <code>pytest_cov</code> (helps
us find uncovered code and write tests targeting it).</li>
</ul>
<h3 id="visualizations">Visualizations</h3>
<p>After implementing all the classifiers and merging the upstream
updates of datasets, we started exploring how to use the datasets and
the existing classifiers to generate visualizations.</p>
<p>At the beginning of the project, we had some simple ideas for
visualization:</p>
<ul>
<li>Line chart: shows how the mean of a particular learner’s KC changes
over time. We can also draw several lines in the same chart to compare
the results from different classifiers.</li>
<li>Bar chart: used to visualize multiple KCs. The height represents the
mean, and the colour represents the variance.</li>
<li>Pie chart: used to visualize the learner’s knowledge representation,
with a pop-up to show the mean, variance, title and description of the
knowledge.</li>
</ul>
<p>Through our later literature review, user studies, and discussions
with the client, we have added some ideas:</p>
<ul>
<li>Bar chart: prefer shade to colour. This is friendlier to
colour-blind people; ultimately, these visualizations are meant to help
learners understand their learning status.</li>
<li>Rose pie chart: the distance from the arc to the centre of the
circle represents the mean, the shade represents the variance, and the
angle of each slice is proportional to the number of times the learner
has encountered that KC.</li>
<li>Bubble chart: The knowledge is represented by multiple points (KCs).
The size of the point (KC) is the mean, and the shade of the point is
the variance.</li>
<li>Word cloud: The knowledge of the learner is displayed in the form of
a word cloud. The size of the word (KC) is the mean of that KC.</li>
</ul>
<p>To implement so many different types of visualisations, we define the
following classes:</p>
<ul>
<li><code>BasePlotter</code>: defines the API that each type of plot
needs to implement (the following is the public API)
<ul>
<li><code>plot</code>: plot something to the figure</li>
<li><code>show</code>: show the image in a newly opened window</li>
<li><code>to_jpg/png/html</code>: convert the image to the corresponding format</li>
</ul></li>
<li>Based on this class, we define different plotters, including but not
limited to: <code>LinePlotter</code>, <code>PiePlotter</code>,
<code>RosePlotter</code>, <code>BarPlotter</code>, and
<code>WordPlotter</code>…</li>
</ul>
<p>To use these plotters, the user only needs to pass the
<code>Knowledge</code> in <code>LearnerModel</code> to the
<code>plot</code> function.</p>
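<p>The hierarchy can be sketched as follows (heavily simplified; the real plotters build figures with a plotting backend rather than returning placeholder strings):</p>

```python
from abc import ABC, abstractmethod


class BasePlotter(ABC):
    @abstractmethod
    def plot(self, content) -> "BasePlotter":
        """Plot the given content (e.g. a learner's Knowledge) to the figure."""

    @abstractmethod
    def show(self) -> None:
        """Show the figure in a newly opened window."""

    @abstractmethod
    def to_html(self) -> str:
        """Convert the figure to HTML (to_jpg/to_png work the same way)."""


class LinePlotter(BasePlotter):
    def __init__(self) -> None:
        self._points = []

    def plot(self, content) -> "LinePlotter":
        # content: e.g. a sequence of (timestamp, mean) pairs for one KC
        self._points.extend(content)
        return self

    def show(self) -> None:
        print(self._points)

    def to_html(self) -> str:
        # placeholder for real figure markup
        return "figure(%d points)" % len(self._points)


plotter = LinePlotter().plot([(1, 0.2), (2, 0.5)])
```

<p>Returning <code>self</code> from <code>plot</code> allows chaining, e.g. <code>LinePlotter().plot(data).show()</code>.</p>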
<h3 id="ci-2">CI</h3>
<p>In <strong>#25</strong>, <strong>#34</strong>, <strong>#38</strong>
we optimized the workflow of our CI:</p>
<ul>
<li>Completely replaced <code>requirements.txt</code> and
<code>setup.py</code> with <code>pyproject.toml</code>.</li>
<li>Improve integration with GitHub: when CI fails, an annotation is
generated. We can easily see from GitHub which part of the code fails