
Commit 16c2aa1

Revise spam detection metrics in README
Updated the spam detection section with detailed per-feature metrics for PII, Toxicity, Prompt Injection, Spam, NSFW Text, and NSFW Image.
1 parent 3a61032 commit 16c2aa1

1 file changed

Lines changed: 8 additions & 5 deletions

README.md

```diff
@@ -30,12 +30,15 @@ Benchmarked using [CHI 2025 "Lost in Moderation"](https://arxiv.org/html/2503.01
 | Amazon Comprehend | 0.74 | Commercial |
 | Perspective API | 0.62 | Commercial |
 
-### Spam Detection
-
-| System | Balanced Accuracy | Dataset |
-|--------|------------------|---------|
-| **LocalMod** | **0.998** | [UCI SMS Spam Collection](https://archive.ics.uci.edu/ml/datasets/sms+spam+collection) |
 
+| Feature | System | Precision | Recall | F1 | FP | FN | n | Dataset | Eval accuracy (balanced acc) |
+|---|---|---:|---:|---:|---:|---:|---:|---|---:|
+| **PII** | LocalMod | 1.0000 | 1.0000 | 1.0000 | 0 | 0 | 2000 | `synthetic_pii_v1` (balanced) | 1.0000 |
+| **Toxicity** | LocalMod | 0.6007 | 0.8373 | 0.6973 | 1213 | 355 | 5924 | HateXplain (1924) + Civil Comments (2000) + SBIC (2000); **macro-avg** P/R/F1, summed FP/FN | 0.6500 |
+| **Prompt Injection** | LocalMod | 0.9324 | 0.8525 | 0.8907 | 65 | 155 | 2101 | `S-Labs/prompt-injection-dataset` (test, threshold=0.10) | 0.8953 |
+| **Spam** | LocalMod | 0.9861 | 1.0000 | 0.9930 | 1 | 0 | 500 | `ucirvine/sms_spam` (train) | 0.9988 |
+| **NSFW Text** | LocalMod | 0.6034 | 0.9533 | 0.7390 | 188 | 14 | 600 | Proxy: `Maxx0/Texting_sex` (NSFW) vs `ag_news` (SFW), balanced (threshold=0.60) | 0.6633 |
+| **NSFW Image** | LocalMod | 1.0000 | 0.9500 | 0.9744 | 0 | 2 | 80 | Proxy: `x1101/nsfw` (pos) vs `cifar10` (neg), balanced | 0.9750 |
 ---
 
 ## Installation
```
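For readers cross-checking the new table, the metric columns follow the standard binary-classification definitions, and the balanced-accuracy column can be reproduced from the reported counts. Below is a minimal sketch for the Spam row; the TP count is not reported, so it is inferred here from the row's precision and FP count, making this an illustration of the arithmetic rather than the project's evaluation code.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics matching the table's columns."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                    # true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)               # true negative rate
    balanced_acc = (recall + specificity) / 2  # "Eval accuracy (balanced acc)"
    return {"P": precision, "R": recall, "F1": f1, "BalAcc": balanced_acc}

# Spam row: FP=1, FN=0, n=500. TP is inferred from the reported precision:
# TP = FP * P / (1 - P) = 1 * 0.9861 / 0.0139 ≈ 71, so TN = 500 - 71 - 1 = 428.
print(metrics(tp=71, fp=1, fn=0, tn=428))
# -> P≈0.9861, R=1.0000, F1≈0.9930, BalAcc≈0.9988 (matches the Spam row)
```

This also reconciles the old spam-only table's 0.998 balanced accuracy with the new row's 0.9988. The Toxicity row is the one exception to this arithmetic: per its dataset note, P/R/F1 are macro-averaged across the three datasets while FP/FN are summed, so its metrics cannot be recomputed from the pooled counts alone.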
