# Threshold Calibration Guide
This guide helps you calibrate PATAS thresholds for your specific spam patterns and requirements.
Key pattern-mining thresholds:

- `min_spam_count`: Minimum number of spam messages required to create a pattern (default: 10)
- `min_spam_ratio`: Minimum ratio of spam messages in a pattern (default: 0.05 = 5%)
- `semantic_similarity_threshold`: Cosine similarity threshold for clustering (default: 0.75)
- `semantic_min_cluster_size`: Minimum messages per semantic cluster (default: 3)
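As a mental model, the first two thresholds act as a filter over candidate patterns. The sketch below is purely illustrative; the `Candidate` shape is made up for this example, not a PATAS type:

```python
# Illustrative sketch of how min_spam_count and min_spam_ratio gate
# candidate patterns. The Candidate tuple is hypothetical; PATAS's
# internal representation may differ.
from typing import NamedTuple

class Candidate(NamedTuple):
    pattern: str
    spam_count: int   # spam messages matching the pattern
    total_count: int  # all messages matching the pattern

def keep(c: Candidate, min_spam_count: int = 10,
         min_spam_ratio: float = 0.05) -> bool:
    return (c.spam_count >= min_spam_count
            and c.spam_count / c.total_count >= min_spam_ratio)

candidates = [
    Candidate("free iphone", spam_count=25, total_count=60),    # kept
    Candidate("meeting notes", spam_count=2, total_count=400),  # dropped
]
print([c.pattern for c in candidates if keep(c)])  # ['free iphone']
```

Lowering either threshold keeps more candidates, at the cost of noisier patterns.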
Key rule-promotion thresholds:

- `min_precision`: Minimum precision required for promotion (default: 0.95)
- `max_ham_hits`: Maximum false positives (ham hits) allowed (default: 5)
- `min_coverage`: Minimum coverage required (default: 0.01 = 1%)
- `min_sample_size`: Minimum evaluation sample size (default: 100)
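Conceptually, promotion is a conjunction of these four checks. The function below is a minimal sketch of that gate (parameter names mirror the config keys above; it is not the actual PATAS implementation):

```python
# Minimal sketch of the promotion gate: a rule is promoted only when it
# clears all four thresholds. Illustrative only, not PATAS code.

def should_promote(spam_hits: int, ham_hits: int, total_messages: int,
                   min_precision: float = 0.95, max_ham_hits: int = 5,
                   min_coverage: float = 0.01, min_sample_size: int = 100) -> bool:
    sample = spam_hits + ham_hits
    if sample < min_sample_size:
        return False  # not enough evidence yet
    precision = spam_hits / sample
    coverage = sample / total_messages
    return (precision >= min_precision
            and ham_hits <= max_ham_hits
            and coverage >= min_coverage)

# Example: 197 spam hits, 3 ham hits, out of 10,000 evaluated messages
print(should_promote(197, 3, 10_000))  # True: precision 0.985, coverage 0.02
```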
Initial configuration:

```yaml
pattern_mining:
  min_spam_count: 10
  min_spam_ratio: 0.05
  semantic_similarity_threshold: 0.75
  semantic_min_cluster_size: 3

rule_lifecycle:
  promotion:
    min_precision: 0.95
    max_ham_hits: 5
    min_coverage: 0.01
    min_sample_size: 100
```

Run pattern mining on historical data:

```shell
patas mine-patterns --days=30
```

Review results:
- Number of patterns discovered
- Number of rules generated
- Pattern types (URL, keyword, semantic, etc.)
Run shadow evaluation:

```shell
patas eval-rules --days=30
```

Review metrics:
- Precision distribution
- Recall distribution
- Coverage distribution
- False positive rate
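For intuition, these metrics can all be derived from a rule's raw hit counts. The helper below is an illustrative sketch, not part of the PATAS CLI:

```python
# Illustrative sketch: deriving shadow-evaluation metrics from raw hit
# counts, to make the metric definitions concrete. Not PATAS code.

def eval_metrics(spam_hits: int, ham_hits: int,
                 total_spam: int, total_messages: int) -> dict:
    hits = spam_hits + ham_hits
    return {
        "precision": spam_hits / hits if hits else 0.0,           # share of hits that are spam
        "recall": spam_hits / total_spam if total_spam else 0.0,  # share of spam caught
        "coverage": hits / total_messages,                        # share of traffic matched
        "false_positive_rate": ham_hits / (total_messages - total_spam),
    }

m = eval_metrics(spam_hits=180, ham_hits=4, total_spam=500, total_messages=10_000)
print(m["precision"])  # ~0.978
```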
If missing patterns:

- Lower `min_spam_count` (e.g., 5 instead of 10)
- Lower `min_spam_ratio` (e.g., 0.03 instead of 0.05)
- Lower `semantic_similarity_threshold` (e.g., 0.70 instead of 0.75)
If too many false positives:

- Raise `min_precision` (e.g., 0.98 instead of 0.95)
- Lower `max_ham_hits` (e.g., 3 instead of 5)
- Raise `semantic_similarity_threshold` (e.g., 0.80 instead of 0.75)
If rules too specific:

- Lower `semantic_similarity_threshold` (e.g., 0.70-0.75)
- Enable semantic mining if disabled
- Lower `semantic_min_cluster_size` (e.g., 2 instead of 3)
If rules too broad:

- Raise `semantic_similarity_threshold` (e.g., 0.80-0.85)
- Raise `min_spam_ratio` (e.g., 0.10 instead of 0.05)
- Increase `min_sample_size` for evaluation
Repeat steps 2-4:
- Run pattern mining with adjusted thresholds
- Evaluate rules
- Review metrics
- Adjust thresholds based on results
- Continue until optimal balance
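The iteration above can be sketched as a simple feedback loop. Everything in this snippet is hypothetical scaffolding: `mine_and_eval` is a toy stand-in for running `patas mine-patterns` plus `patas eval-rules`, with a made-up linear precision/recall trade-off:

```python
# Hypothetical sketch of the calibration loop: mine, evaluate, nudge the
# similarity threshold, and stop once precision and recall are acceptable.

def mine_and_eval(threshold: float) -> tuple[float, float]:
    # Toy stand-in for the real pipeline: stricter clustering raises
    # precision but lowers recall. Replace with actual mining + evaluation.
    return 0.70 + 0.35 * threshold, 1.25 - 0.9 * threshold

threshold = 0.70
for _ in range(10):
    precision, recall = mine_and_eval(threshold)
    if precision >= 0.95 and recall >= 0.50:
        break  # acceptable balance found
    # Too many false positives -> stricter; missing spam -> more lenient
    threshold += 0.01 if precision < 0.95 else -0.01

print(round(threshold, 2))  # converges at 0.72 under this toy model
```

In practice each iteration is a mining plus evaluation run, so adjust in small increments and re-measure rather than making large jumps.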
Spam profile: repetitive spam

Characteristics:
- Same spam message sent many times
- High repetition rate
- Easy to detect with deterministic patterns
Recommended thresholds:

```yaml
pattern_mining:
  min_spam_count: 5                    # Lower (pattern appears frequently)
  min_spam_ratio: 0.10                 # Higher (pattern is concentrated)
  semantic_similarity_threshold: 0.80  # Higher (messages are very similar)
  semantic_min_cluster_size: 5         # Higher (larger clusters expected)
```

Spam profile: varied-wording spam

Characteristics:
- Spam messages vary in wording
- Same meaning, different words
- Requires semantic analysis
Recommended thresholds:

```yaml
pattern_mining:
  min_spam_count: 10                   # Default
  min_spam_ratio: 0.03                 # Lower (pattern is distributed)
  semantic_similarity_threshold: 0.70  # Lower (catch variations)
  semantic_min_cluster_size: 3         # Default
```

Promotion profile: lenient

Use case:
- Can tolerate some false positives
- Want to catch more spam
- Manual review available
Recommended thresholds:

```yaml
rule_lifecycle:
  promotion:
    min_precision: 0.90   # Lower (allow more false positives)
    max_ham_hits: 10      # Higher (allow more false positives)
    min_coverage: 0.01    # Default
    min_sample_size: 50   # Lower (faster promotion)
```

Promotion profile: strict

Use case:
- Cannot tolerate false positives
- High-traffic system
- Automated blocking
Recommended thresholds:

```yaml
rule_lifecycle:
  promotion:
    min_precision: 0.98   # Higher (fewer false positives)
    max_ham_hits: 3       # Lower (fewer false positives)
    min_coverage: 0.01    # Default
    min_sample_size: 200  # Higher (more confidence)
```

Semantic similarity threshold:
- Range: 0.0 to 1.0
- Higher = stricter (only very similar messages)
- Lower = more lenient (catches more variations)
0.80-0.85: Very Strict
- Only very similar messages grouped
- Fewer false positives
- May miss legitimate variations
- Use when: False positives are critical
0.75-0.80: Balanced (Recommended)
- Good balance of precision and recall
- Catches most variations
- Acceptable false positive rate
- Use when: General purpose
0.70-0.75: More Lenient
- Catches more variations
- May include some false positives
- Better recall
- Use when: Missing too many spam patterns
<0.70: Very Lenient
- Catches many variations
- Higher false positive risk
- Best recall
- Use when: Need to catch all variations, manual review available
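To make the threshold concrete: clustering compares message embeddings by cosine similarity, and two messages join the same cluster only when their similarity clears the threshold. A minimal sketch, using tiny made-up vectors in place of real embeddings:

```python
# Toy illustration of the cosine-similarity threshold. The 3-dimensional
# vectors stand in for real message embeddings (which typically have
# hundreds of dimensions); the comparison logic is the same.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

msg_a = [0.9, 0.1, 0.3]   # e.g. "Win a free iPhone now!"
msg_b = [0.8, 0.2, 0.35]  # e.g. "Free iPhone -- claim yours today!"

sim = cosine_similarity(msg_a, msg_b)
threshold = 0.75
print(sim >= threshold)  # True: these two messages cluster together at 0.75
```

Raising the threshold toward 0.85 demands near-identical embeddings; lowering it toward 0.70 lets looser paraphrases join the same cluster.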
- Start with 0.75: Good default balance
- Run pattern mining: Discover patterns
- Evaluate clusters: Check if similar messages are grouped
- Adjust based on results:
  - Too many small clusters → Lower threshold
  - Missing variations → Lower threshold
  - Too many false positives → Raise threshold
- Test on sample data: Validate before applying to full dataset
Clustering visualization (the `embeddings` and `cluster_labels` arrays below are random stand-ins for your real message embeddings and mining cluster assignments):

```python
# Visualize clusters to understand threshold impact
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Stand-in data; replace with your message embeddings and cluster labels
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 32))
cluster_labels = rng.integers(0, 5, size=200)

# Reduce embeddings to 2D for visualization
embeddings_2d = TSNE(n_components=2, random_state=0).fit_transform(embeddings)

# Plot clusters, colored by cluster label
plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1], c=cluster_labels)
plt.show()
```

Calibration script:

```shell
poetry run python scripts/calibrate_similarity_threshold.py \
    --dataset data/sample_messages.jsonl \
    --thresholds 0.70,0.75,0.80,0.85
```

Usage:
```shell
poetry run python scripts/calibrate_thresholds.py \
    --dataset data/historical_messages.jsonl \
    --days 30 \
    --output calibration_results.json
```

Output:
- Optimal thresholds for your data
- Precision/recall trade-offs
- Recommended configuration
Step-by-step:
- Run pattern mining with different thresholds
- Evaluate rules for each configuration
- Compare metrics (precision, recall, coverage)
- Choose configuration with best balance
Example:

```shell
# Test different min_spam_count values
for count in 5 10 15 20; do
    patas mine-patterns --min-spam-count=$count --days=30
    patas eval-rules --days=30
    # Review results
done
```

Best practices:

- Start conservative: Use higher thresholds initially
- Iterate gradually: Adjust thresholds in small increments
- Test on sample data: Validate before applying to full dataset
- Monitor metrics: Track precision/recall over time
- Document changes: Keep track of threshold adjustments
- Review regularly: Recalibrate as spam patterns evolve
- Use profiles: Create custom profiles for different use cases
- A/B testing: Compare different threshold configurations
Issue: too few patterns discovered

Symptoms:
- Very few patterns discovered
- Missing obvious spam patterns
Solutions:

- Lower `min_spam_count` (e.g., 5 instead of 10)
- Lower `min_spam_ratio` (e.g., 0.03 instead of 0.05)
- Lower `semantic_similarity_threshold` (e.g., 0.70 instead of 0.75)
- Enable semantic mining if disabled
- Check whether sufficient historical data is available
Issue: too many false positives

Symptoms:
- High false positive rate
- Legitimate messages blocked
Solutions:

- Raise `min_precision` (e.g., 0.98 instead of 0.95)
- Lower `max_ham_hits` (e.g., 3 instead of 5)
- Raise `semantic_similarity_threshold` (e.g., 0.80 instead of 0.75)
- Increase `min_sample_size` for evaluation
- Review and manually deprecate problematic rules
Issue: rules too specific

Symptoms:
- Rules catch only exact matches
- Missing variations of spam
Solutions:

- Lower `semantic_similarity_threshold` (e.g., 0.70-0.75)
- Enable semantic mining
- Lower `semantic_min_cluster_size` (e.g., 2 instead of 3)
- Check that embeddings are working correctly
Issue: rules too broad

Symptoms:
- Rules match too many messages
- High coverage but low precision
Solutions:

- Raise `semantic_similarity_threshold` (e.g., 0.80-0.85)
- Raise `min_spam_ratio` (e.g., 0.10 instead of 0.05)
- Increase `min_sample_size` for evaluation
- Review patterns and split them into more specific patterns
See also:

- Configuration Guide - Configuration options
- Performance Guide - Performance optimization
- FAQ - Frequently asked questions
For calibration questions or issues, please open an issue on GitHub.