README.rst: 15 additions & 2 deletions
@@ -76,7 +76,7 @@ If you use the package, please consider citing the following paper:
 .. code-block:: BibTex
 
   @misc{fazekas2023testing,
-        title={Testing the Consistency of Performance Scores Reported for Binary Classification Problems},
+    title={Testing the Consistency of Performance Scores Reported for Binary Classification Problems},
     author={Attila Fazekas and György Kovács},
     year={2023},
     eprint={2310.12527},
@@ -159,6 +159,8 @@ A simple binary classification testset consisting of ``p`` positive samples (usu
 
     testset = {"p": 10, "n": 20}
 
+We note that alternative notations are supported, such as ``n_positive``, ``n_minority`` or ``n_1`` instead of ``p``, and similarly ``n_negative``, ``n_majority`` or ``n_0`` instead of ``n``.
+
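As a sketch of how such aliases can be reconciled (a hypothetical helper for illustration, not the package's actual internals), all of the alternative notations above describe the same testset:

```python
# Hypothetical sketch: normalize the supported testset aliases to the
# canonical {"p": ..., "n": ...} form.
P_ALIASES = {"p", "n_positive", "n_minority", "n_1"}
N_ALIASES = {"n", "n_negative", "n_majority", "n_0"}

def normalize_testset(testset):
    """Map any supported alias pair to the canonical {"p", "n"} dict."""
    p = [value for key, value in testset.items() if key in P_ALIASES]
    n = [value for key, value in testset.items() if key in N_ALIASES]
    if len(p) != 1 or len(n) != 1:
        raise ValueError("exactly one positive and one negative count expected")
    return {"p": p[0], "n": n[0]}

# All of these describe a testset of 10 positives and 20 negatives:
assert normalize_testset({"p": 10, "n": 20}) == {"p": 10, "n": 20}
assert normalize_testset({"n_positive": 10, "n_negative": 20}) == {"p": 10, "n": 20}
assert normalize_testset({"n_minority": 10, "n_majority": 20}) == {"p": 10, "n": 20}
```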
 One can also specify a commonly used dataset by its name, and the package will look up the ``p`` and ``n`` counts of the dataset from its internal registry (based on the representations in the ``common-datasets`` package):
 
 .. code-block:: Python
@@ -261,7 +263,18 @@ Depending on the experimental setup, the consistency tests developed for binary
261
263
* prevalence threshold (``pt``),
262
264
* diagnostic odds ratio (``dor``),
263
265
* Jaccard index (``ji``),
264
-
* Cohen's kappa (``kappa``)
266
+
* Cohen's kappa (``kappa``).
267
+
268
+
We note that synonyms and full names are also supported, for example:
269
+
270
+
* alternatives to ``sens`` are ``sensitivity``, ``true_positive_rate``, ``tpr`` and ``recall``,
271
+
* alternatives to ``spec`` are ``specificity``, ``true_negative_rate``, ``tnr`` and ``selectivity``,
272
+
* alternatives to ``ppv`` are ``positive_predictive_value`` and ``precision``.
273
+
274
+
Similarly, complements are supported as:
275
+
276
+
* one can specify ``false_positive_rate`` or ``fpr`` as a complement of ``spec``,
277
+
* and similarly, ``false_negative_rate`` or ``fnr`` as a complement of ``sens``.
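The synonym and complement handling described above can be sketched as follows (a hypothetical helper for illustration, not the package's actual API):

```python
# Hypothetical sketch: resolve score synonyms and complements to the short
# canonical keys used in the list above.
SYNONYMS = {
    "sens": {"sens", "sensitivity", "true_positive_rate", "tpr", "recall"},
    "spec": {"spec", "specificity", "true_negative_rate", "tnr", "selectivity"},
    "ppv": {"ppv", "positive_predictive_value", "precision"},
}
COMPLEMENTS = {
    "fpr": "spec", "false_positive_rate": "spec",
    "fnr": "sens", "false_negative_rate": "sens",
}

def canonicalize(scores):
    """Return the scores keyed by canonical names, resolving synonyms
    and converting complements (x -> 1 - x)."""
    result = {}
    for key, value in scores.items():
        if key in COMPLEMENTS:
            result[COMPLEMENTS[key]] = 1 - value  # complement of the score
            continue
        canonical = next((c for c, names in SYNONYMS.items() if key in names), key)
        result[canonical] = value
    return result

scores = canonicalize({"recall": 0.9, "fpr": 0.2})
assert abs(scores["sens"] - 0.9) < 1e-9   # recall is a synonym of sens
assert abs(scores["spec"] - 0.8) < 1e-9   # fpr is the complement of spec
```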
 
 The tests are designed to detect inconsistencies. If the resulting ``inconsistency`` flag is ``False``, the scores may still have been calculated in non-standard ways. However, **if the resulting ``inconsistency`` flag is ``True``, it conclusively indicates that inconsistencies were detected: the reported scores could not be the outcome of the presumed experiment**.
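The idea behind such a flag can be illustrated from first principles (an illustrative sketch, not the package's actual implementation): for a testset with known ``p`` and ``n``, every achievable confusion matrix can be enumerated, and the reported scores are inconsistent exactly when no ``(tp, tn)`` pair reproduces all of them within the numerical uncertainty ``eps``.

```python
# Illustrative sketch of an exhaustive consistency test for a single testset:
# enumerate all confusion matrices (tp, tn) and flag the reported scores as
# inconsistent if none reproduces every score within eps.
def inconsistency(p, n, scores, eps):
    """True if no confusion matrix matches all reported scores within eps."""
    for tp in range(p + 1):
        for tn in range(n + 1):
            candidates = {
                "acc": (tp + tn) / (p + n),
                "sens": tp / p,
                "spec": tn / n,
            }
            if all(abs(candidates[key] - value) <= eps
                   for key, value in scores.items()):
                return False  # a feasible confusion matrix exists
    return True

# With p=10, n=20, sens=0.9 and spec=0.85 force tp=9 and tn=17,
# hence acc must be 26/30 ~ 0.8667 -- a reported acc of 0.95 is infeasible:
assert inconsistency(10, 20, {"acc": 0.95, "sens": 0.9, "spec": 0.85}, 1e-4)
assert not inconsistency(10, 20, {"acc": 0.8667, "sens": 0.9, "spec": 0.85}, 1e-4)
```

Note that a four-decimal score carries an uncertainty of at most ``5e-5`` from rounding alone, which is why the tolerance ``eps`` enters every comparison.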
docs/01a_requirements.rst: 2 additions & 0 deletions
@@ -26,6 +26,8 @@ A simple binary classification testset consisting of ``p`` positive samples (usu
 
     testset = {"p": 10, "n": 20}
 
+We note that alternative notations are supported, such as ``n_positive``, ``n_minority`` or ``n_1`` instead of ``p``, and similarly ``n_negative``, ``n_majority`` or ``n_0`` instead of ``n``.
+
 One can also specify a commonly used dataset by its name, and the package will look up the ``p`` and ``n`` counts of the dataset from its internal registry (based on the representations in the ``common-datasets`` package):
docs/01c_consistency_checking.rst: 12 additions & 1 deletion
@@ -24,7 +24,18 @@ Depending on the experimental setup, the consistency tests developed for binary
 * prevalence threshold (``pt``),
 * diagnostic odds ratio (``dor``),
 * Jaccard index (``ji``),
-* Cohen's kappa (``kappa``)
+* Cohen's kappa (``kappa``).
+
+We note that synonyms and full names are also supported, for example:
+
+* alternatives to ``sens`` are ``sensitivity``, ``true_positive_rate``, ``tpr`` and ``recall``,
+* alternatives to ``spec`` are ``specificity``, ``true_negative_rate``, ``tnr`` and ``selectivity``,
+* alternatives to ``ppv`` are ``positive_predictive_value`` and ``precision``.
+
+Similarly, complements are supported:
+
+* one can specify ``false_positive_rate`` or ``fpr`` as the complement of ``spec``,
+* and similarly, ``false_negative_rate`` or ``fnr`` as the complement of ``sens``.
 
 The tests are designed to detect inconsistencies. If the resulting ``inconsistency`` flag is ``False``, the scores may still have been calculated in non-standard ways. However, **if the resulting ``inconsistency`` flag is ``True``, it conclusively indicates that inconsistencies were detected: the reported scores could not be the outcome of the presumed experiment**.