Nick edited this page Mar 10, 2026 · 2 revisions

PATAS Threshold Calibration Guide

This guide helps you calibrate PATAS thresholds for your specific spam patterns and requirements.


Understanding Thresholds

Pattern Mining Thresholds

Key thresholds:

  1. min_spam_count: Minimum number of spam messages a candidate must match to become a pattern (default: 10)
  2. min_spam_ratio: Minimum ratio of spam among the messages a pattern matches (default: 0.05 = 5%)
  3. semantic_similarity_threshold: Cosine-similarity threshold for semantic clustering (default: 0.75)
  4. semantic_min_cluster_size: Minimum number of messages per semantic cluster (default: 3)
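As a rough illustration of how the first two thresholds combine, here is a hedged sketch; the function name and signature are hypothetical, not part of the PATAS codebase:

```python
def passes_mining_gates(spam_count: int, matched_count: int,
                        min_spam_count: int = 10,
                        min_spam_ratio: float = 0.05) -> bool:
    """Return True if a candidate pattern clears both mining gates.

    spam_count: spam messages the candidate matches.
    matched_count: all messages (spam + ham) the candidate matches.
    """
    if spam_count < min_spam_count:      # absolute-count gate
        return False
    # ratio gate: what share of the candidate's matches are spam?
    return spam_count / matched_count >= min_spam_ratio

# 12 spam hits out of 300 matches: count gate passes, ratio gate (4%) fails.
print(passes_mining_gates(12, 300))  # False
# 12 spam hits out of 200 matches: both gates pass (6% >= 5%).
print(passes_mining_gates(12, 200))  # True
```

Note that both gates must pass: a pattern matching lots of spam still fails if it also matches too much ham.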

Rule Promotion Thresholds

Key thresholds:

  1. min_precision: Minimum precision a rule must achieve in shadow evaluation (default: 0.95)
  2. max_ham_hits: Maximum number of ham (legitimate) messages a rule may match (default: 5)
  3. min_coverage: Minimum fraction of evaluated messages the rule must match (default: 0.01 = 1%)
  4. min_sample_size: Minimum evaluation sample size required before a promotion decision (default: 100)
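The promotion decision can be pictured as a series of gates. The sketch below is illustrative only; in particular, it assumes the evaluation sample is the set of messages a rule matched during shadow evaluation, which may differ from how PATAS defines it:

```python
def eligible_for_promotion(spam_hits: int, ham_hits: int,
                           evaluated_total: int,
                           min_precision: float = 0.95,
                           max_ham_hits: int = 5,
                           min_coverage: float = 0.01,
                           min_sample_size: int = 100) -> bool:
    matched = spam_hits + ham_hits
    if matched < min_sample_size:   # not enough evidence yet
        return False
    if ham_hits > max_ham_hits:     # hard cap on false positives
        return False
    precision = spam_hits / matched
    coverage = matched / evaluated_total
    return precision >= min_precision and coverage >= min_coverage

# 120 spam hits, 3 ham hits over 10,000 replayed messages:
# precision ~0.976, coverage ~1.2%, 3 <= 5 ham hits -> promoted.
print(eligible_for_promotion(120, 3, 10000))  # True
# Same rule with 7 ham hits trips the max_ham_hits cap.
print(eligible_for_promotion(120, 7, 10000))  # False
```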

Calibration Process

Step 1: Start with Defaults

Initial configuration:

pattern_mining:
  min_spam_count: 10
  min_spam_ratio: 0.05
  semantic_similarity_threshold: 0.75
  semantic_min_cluster_size: 3

rule_lifecycle:
  promotion:
    min_precision: 0.95
    max_ham_hits: 5
    min_coverage: 0.01
    min_sample_size: 100

Step 2: Run Initial Pattern Mining

Run pattern mining on historical data:

patas mine-patterns --days=30

Review results:

  • Number of patterns discovered
  • Number of rules generated
  • Pattern types (URL, keyword, semantic, etc.)

Step 3: Evaluate Rules

Run shadow evaluation:

patas eval-rules --days=30

Review metrics:

  • Precision distribution
  • Recall distribution
  • Coverage distribution
  • False positive rate
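The metrics above fall directly out of shadow-evaluation counts. This toy computation is a sketch (variable names are illustrative; the true labels come from your historical ham/spam data, not from PATAS internals):

```python
def shadow_metrics(tp: int, fp: int, fn: int, total_evaluated: int) -> dict:
    """tp: spam correctly matched, fp: ham wrongly matched,
    fn: spam the rule missed, total_evaluated: all messages replayed."""
    matched = tp + fp
    return {
        "precision": tp / matched if matched else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "coverage": matched / total_evaluated,
        "false_positive_rate": fp / total_evaluated,
    }

m = shadow_metrics(tp=95, fp=5, fn=20, total_evaluated=10000)
print(m["precision"])  # 0.95
print(m["coverage"])   # 0.01
```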

Step 4: Adjust Thresholds

If missing patterns:

  • Lower min_spam_count (e.g., 5 instead of 10)
  • Lower min_spam_ratio (e.g., 0.03 instead of 0.05)
  • Lower semantic_similarity_threshold (e.g., 0.70 instead of 0.75)

If too many false positives:

  • Raise min_precision (e.g., 0.98 instead of 0.95)
  • Lower max_ham_hits (e.g., 3 instead of 5)
  • Raise semantic_similarity_threshold (e.g., 0.80 instead of 0.75)

If rules too specific:

  • Lower semantic_similarity_threshold (e.g., 0.70-0.75)
  • Enable semantic mining if disabled
  • Lower semantic_min_cluster_size (e.g., 2 instead of 3)

If rules too broad:

  • Raise semantic_similarity_threshold (e.g., 0.80-0.85)
  • Raise min_spam_ratio (e.g., 0.10 instead of 0.05)
  • Increase min_sample_size for evaluation

Step 5: Iterate

Repeat steps 2-4:

  • Run pattern mining with adjusted thresholds
  • Evaluate rules
  • Review metrics
  • Adjust thresholds based on results
  • Continue until precision, recall, and coverage reach an acceptable balance

Threshold Recommendations by Spam Type

Concentrated Spam (Same Pattern Repeated)

Characteristics:

  • Same spam message sent many times
  • High repetition rate
  • Easy to detect with deterministic patterns

Recommended thresholds:

pattern_mining:
  min_spam_count: 5  # Lower (pattern appears frequently)
  min_spam_ratio: 0.10  # Higher (pattern is concentrated)
  semantic_similarity_threshold: 0.80  # Higher (messages are very similar)
  semantic_min_cluster_size: 5  # Higher (larger clusters expected)

Distributed Spam (Many Variations)

Characteristics:

  • Spam messages vary in wording
  • Same meaning, different words
  • Requires semantic analysis

Recommended thresholds:

pattern_mining:
  min_spam_count: 10  # Default
  min_spam_ratio: 0.03  # Lower (pattern is distributed)
  semantic_similarity_threshold: 0.70  # Lower (catch variations)
  semantic_min_cluster_size: 3  # Default

High False Positive Tolerance

Use case:

  • Can tolerate some false positives
  • Want to catch more spam
  • Manual review available

Recommended thresholds:

rule_lifecycle:
  promotion:
    min_precision: 0.90  # Lower (allow more false positives)
    max_ham_hits: 10  # Higher (allow more false positives)
    min_coverage: 0.01  # Default
    min_sample_size: 50  # Lower (faster promotion)

Low False Positive Tolerance

Use case:

  • Cannot tolerate false positives
  • High-traffic system
  • Automated blocking

Recommended thresholds:

rule_lifecycle:
  promotion:
    min_precision: 0.98  # Higher (fewer false positives)
    max_ham_hits: 3  # Lower (fewer false positives)
    min_coverage: 0.01  # Default
    min_sample_size: 200  # Higher (more confidence)

Semantic Similarity Threshold Tuning

Understanding the Threshold

Semantic similarity threshold:

  • Range: 0.0 to 1.0
  • Higher = stricter (only very similar messages)
  • Lower = more lenient (catches more variations)

Threshold Ranges

0.80-0.85: Very Strict

  • Only very similar messages grouped
  • Fewer false positives
  • May miss legitimate variations
  • Use when: False positives are critical

0.75-0.80: Balanced (Recommended)

  • Good balance of precision and recall
  • Catches most variations
  • Acceptable false positive rate
  • Use when: General purpose

0.70-0.75: More Lenient

  • Catches more variations
  • May include some false positives
  • Better recall
  • Use when: Missing too many spam patterns

<0.70: Very Lenient

  • Catches many variations
  • Higher false positive risk
  • Best recall
  • Use when: Need to catch all variations, manual review available

Tuning Process

  1. Start with 0.75: Good default balance
  2. Run pattern mining: Discover patterns
  3. Evaluate clusters: Check if similar messages are grouped
  4. Adjust based on results:
    • Too many small clusters → Lower threshold
    • Too many false positives → Raise threshold
    • Missing variations → Lower threshold
  5. Test on sample data: Validate before applying to full dataset
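The effect described in step 4 can be seen with a toy greedy clustering pass. This is a simplification for intuition only (real pattern mining presumably uses a proper clustering algorithm over sentence embeddings):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def greedy_clusters(vectors, threshold):
    """Assign each vector to the first cluster seed it clears the
    threshold with, otherwise start a new cluster. One label per vector."""
    seeds, labels = [], []
    for v in vectors:
        for i, s in enumerate(seeds):
            if cosine(v, s) >= threshold:
                labels.append(i)
                break
        else:
            seeds.append(v)
            labels.append(len(seeds) - 1)
    return labels

messages = [[1.0, 0.0], [0.95, 0.31], [0.7, 0.7]]  # toy embeddings
print(greedy_clusters(messages, 0.80))  # [0, 0, 1] -> third message split off
print(greedy_clusters(messages, 0.70))  # [0, 0, 0] -> one merged cluster
```

Lowering the threshold from 0.80 to 0.70 merges the borderline message into the main cluster: the same trade-off you are tuning when you adjust semantic_similarity_threshold.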

Tools for Tuning

Clustering visualization:

# Visualize clusters to understand threshold impact
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# `embeddings` and `cluster_labels` come from your pattern-mining run:
# the message embedding vectors and their assigned cluster IDs.
# Reduce embeddings to 2D for visualization (fix the seed for repeatability).
embeddings_2d = TSNE(n_components=2, random_state=42).fit_transform(embeddings)

# Plot points colored by cluster label
plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1], c=cluster_labels)
plt.show()

Calibration script:

poetry run python scripts/calibrate_similarity_threshold.py \
  --dataset data/sample_messages.jsonl \
  --thresholds 0.70,0.75,0.80,0.85

Calibration Tools

Automated Calibration Script

Usage:

poetry run python scripts/calibrate_thresholds.py \
  --dataset data/historical_messages.jsonl \
  --days 30 \
  --output calibration_results.json

Output:

  • Optimal thresholds for your data
  • Precision/recall trade-offs
  • Recommended configuration

Manual Calibration

Step-by-step:

  1. Run pattern mining with different thresholds
  2. Evaluate rules for each configuration
  3. Compare metrics (precision, recall, coverage)
  4. Choose configuration with best balance

Example:

# Test different min_spam_count values
for count in 5 10 15 20; do
  patas mine-patterns --min-spam-count=$count --days=30
  patas eval-rules --days=30
  # Review results
done

Best Practices

  1. Start conservative: Use higher thresholds initially
  2. Iterate gradually: Adjust thresholds in small increments
  3. Test on sample data: Validate before applying to full dataset
  4. Monitor metrics: Track precision/recall over time
  5. Document changes: Keep track of threshold adjustments
  6. Review regularly: Recalibrate as spam patterns evolve
  7. Use profiles: Create custom profiles for different use cases
  8. A/B testing: Compare different threshold configurations

Troubleshooting

Too Few Patterns Discovered

Symptoms:

  • Very few patterns discovered
  • Missing obvious spam patterns

Solutions:

  • Lower min_spam_count (e.g., 5 instead of 10)
  • Lower min_spam_ratio (e.g., 0.03 instead of 0.05)
  • Lower semantic_similarity_threshold (e.g., 0.70 instead of 0.75)
  • Enable semantic mining if disabled
  • Check if sufficient historical data available

Too Many False Positives

Symptoms:

  • High false positive rate
  • Legitimate messages blocked

Solutions:

  • Raise min_precision (e.g., 0.98 instead of 0.95)
  • Lower max_ham_hits (e.g., 3 instead of 5)
  • Raise semantic_similarity_threshold (e.g., 0.80 instead of 0.75)
  • Increase min_sample_size for evaluation
  • Review and manually deprecate problematic rules

Rules Too Specific

Symptoms:

  • Rules catch only exact matches
  • Missing variations of spam

Solutions:

  • Lower semantic_similarity_threshold (e.g., 0.70-0.75)
  • Enable semantic mining
  • Lower semantic_min_cluster_size (e.g., 2 instead of 3)
  • Check if embeddings are working correctly

Rules Too Broad

Symptoms:

  • Rules match too many messages
  • High coverage but low precision

Solutions:

  • Raise semantic_similarity_threshold (e.g., 0.80-0.85)
  • Raise min_spam_ratio (e.g., 0.10 instead of 0.05)
  • Increase min_sample_size for evaluation
  • Review patterns and split into more specific patterns

Additional Resources

For calibration questions or issues, please open an issue on GitHub.
