
Commit 16c2aa1

Revise spam detection metrics in README
Updated the spam detection section with detailed per-feature metrics for PII, Toxicity, Prompt Injection, Spam, NSFW Text, and NSFW Image.
1 parent 3a61032 commit 16c2aa1

1 file changed

Lines changed: 8 additions & 5 deletions

README.md

```diff
@@ -30,12 +30,15 @@ Benchmarked using [CHI 2025 "Lost in Moderation"](https://arxiv.org/html/2503.01
 | Amazon Comprehend | 0.74 | Commercial |
 | Perspective API | 0.62 | Commercial |
 
-### Spam Detection
-
-| System | Balanced Accuracy | Dataset |
-|--------|------------------|---------|
-| **LocalMod** | **0.998** | [UCI SMS Spam Collection](https://archive.ics.uci.edu/ml/datasets/sms+spam+collection) |
 
+| Feature | System | Precision | Recall | F1 | FP | FN | n | Dataset | Eval accuracy (balanced acc) |
+|---|---|---:|---:|---:|---:|---:|---:|---|---:|
+| **PII** | LocalMod | 1.0000 | 1.0000 | 1.0000 | 0 | 0 | 2000 | `synthetic_pii_v1` (balanced) | 1.0000 |
+| **Toxicity** | LocalMod | 0.6007 | 0.8373 | 0.6973 | 1213 | 355 | 5924 | HateXplain (1924) + Civil Comments (2000) + SBIC (2000); **macro-avg** P/R/F1, summed FP/FN | 0.6500 |
+| **Prompt Injection** | LocalMod | 0.9324 | 0.8525 | 0.8907 | 65 | 155 | 2101 | `S-Labs/prompt-injection-dataset` (test, threshold=0.10) | 0.8953 |
+| **Spam** | LocalMod | 0.9861 | 1.0000 | 0.9930 | 1 | 0 | 500 | `ucirvine/sms_spam` (train) | 0.9988 |
+| **NSFW Text** | LocalMod | 0.6034 | 0.9533 | 0.7390 | 188 | 14 | 600 | Proxy: `Maxx0/Texting_sex` (NSFW) vs `ag_news` (SFW), balanced (threshold=0.60) | 0.6633 |
+| **NSFW Image** | LocalMod | 1.0000 | 0.9500 | 0.9744 | 0 | 2 | 80 | Proxy: `x1101/nsfw` (pos) vs `cifar10` (neg), balanced | 0.9750 |
 ---
 
 ## Installation
```
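For readers cross-checking the new table, the metric columns follow the standard binary-classification definitions, and the balanced-accuracy column can be reproduced from the reported counts. Below is a minimal sketch for the Spam row; the TP count is not reported, so it is inferred here from the row's precision and FP count, making this an illustration of the arithmetic rather than the project's evaluation code.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics matching the table's columns."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                    # true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)               # true negative rate
    balanced_acc = (recall + specificity) / 2  # "Eval accuracy (balanced acc)"
    return {"P": precision, "R": recall, "F1": f1, "BalAcc": balanced_acc}

# Spam row: FP=1, FN=0, n=500. TP is inferred from the reported precision:
# TP = FP * P / (1 - P) = 1 * 0.9861 / 0.0139 ≈ 71, so TN = 500 - 71 - 1 = 428.
print(metrics(tp=71, fp=1, fn=0, tn=428))
# -> P≈0.9861, R=1.0000, F1≈0.9930, BalAcc≈0.9988 (matches the Spam row)
```

This also reconciles the old spam-only table's 0.998 balanced accuracy with the new row's 0.9988. The Toxicity row is the one exception to this arithmetic: per its dataset note, P/R/F1 are macro-averaged across the three datasets while FP/FN are summed, so its metrics cannot be recomputed from the pooled counts alone.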
