This document outlines performance optimizations implemented in HEARTH and potential future improvements.
Problem: Duplicate detection required reading 69+ markdown files on every submission, taking 15-30 seconds in GitHub Actions.
Solution: SQLite database index with automatic updates.
Performance Gain:
- 30-60x faster duplicate detection
- Reduced from ~15-30s to ~0.5s in GitHub Actions
- 99% reduction in file I/O operations
Implementation Details:
- Database:
database/hunts.db(SQLite) - Builder script:
scripts/build_hunt_database.py - Auto-update workflow:
.github/workflows/update-hunt-database.yml - Fallback to file-based approach if database unavailable
Cost Savings:
- ~25-29 seconds saved per workflow run
- Approximately $0.0001-0.0002 saved per run
- Annual savings: ~$10-20 (based on 100 submissions/year)
Maintenance: Fully automated via GitHub Actions
Problem: CTI reports from modern websites were appearing as garbled binary data due to compression.
Solution: Added support for modern compression algorithms and intelligent content extraction.
Features:
- Brotli compression support (via
brotlipackage) - Zstandard compression support (via
zstandardpackage) - JavaScript rendering fallback (via
readability-lxml) - Multiple format support (HTML, PDF, DOCX)
- Smart content parsing targeting article containers
Performance Impact:
- Near 100% success rate on DFIR Report and similar sites
- Reduced manual intervention for CTI submissions
- Better content quality for AI analysis
Implementation:
- File:
.github/scripts/process_issue.py - Dependencies added to all CTI workflows
Status: Not yet implemented
Potential Savings:
- 67% reduction in prompt costs for repeated content
- Faster response times (reduced latency from cached prompts)
- Estimated $50-150/year savings based on current usage
How It Works: Anthropic's prompt caching allows you to cache large, static portions of prompts (like system instructions, examples, or reference material) and reuse them across multiple API calls. Cached content is charged at a 90% discount compared to regular input tokens.
HEARTH currently makes multiple Claude API calls per hunt generation:
-
CTI Summarization (if document > 20,000 chars)
- Chunks large documents into manageable pieces
- Each chunk processed separately (~5-10 API calls per large CTI report)
- Same system instructions repeated each time
-
Hunt Generation
- Multiple attempts (up to 5 retries) with
SYSTEM_PROMPT SYSTEM_PROMPTis ~2,000 tokens and repeated on every call- Same examples and guidelines sent every time
- Multiple attempts (up to 5 retries) with
-
Hunt Refinement (on regeneration)
- User feedback + original CTI + system prompt
- Can involve 2-5 API calls for iterative improvements
The following components are ideal for caching:
SYSTEM_PROMPT = """You are a threat hunter generating HEARTH markdown files.
Each file MUST focus on exactly ONE MITRE ATT&CK technique - no exceptions.
[...2000+ tokens of instructions and examples...]
"""- Size: ~2,000 tokens
- Frequency: Used in every hunt generation call (5+ times per submission)
- Cache Lifetime: 5 minutes (Anthropic default)
- Savings: 90% reduction on repeated calls within 5 minutes
For regeneration requests, the CTI content doesn't change:
cti_content = "..." # Full threat intelligence report- Size: 5,000-20,000 tokens (varies by report)
- Frequency: Used in regeneration attempts (2-5 calls)
- Savings: Significant for large CTI reports
When users request regeneration with feedback:
feedback_context = f"Previous hypothesis: {old_hypothesis}\nUser feedback: {feedback}"- Size: 500-2,000 tokens
- Frequency: Multiple refinement attempts
Code Changes Required:
# In generate_from_cti.py
def generate_hunt_with_claude(prompt, cti_content, temperature=0.7):
"""
Generate hunt using Claude with prompt caching enabled.
"""
response = anthropic_client.messages.create(
model=CLAUDE_MODEL,
max_tokens=1200,
temperature=temperature,
system=[
{
"type": "text",
"text": SYSTEM_PROMPT,
"cache_control": {"type": "ephemeral"} # Enable caching
}
],
messages=[
{
"role": "user",
"content": prompt
}
]
)
return response.content[0].text.strip()Expected Savings:
- First call: Full cost (e.g., 2,000 input tokens @ $0.003/1K = $0.006)
- Subsequent calls within 5 min: 90% discount on system prompt
- Cached: 2,000 tokens @ $0.0003/1K = $0.0006
- Dynamic: Only new user content charged at full rate
- Per submission: ~$0.025 → ~$0.008 (67% reduction)
For regeneration workflows, cache the CTI report:
def regenerate_hunt_with_feedback(cti_content, feedback, old_hypothesis):
"""
Regenerate hunt with user feedback, caching the CTI content.
"""
response = anthropic_client.messages.create(
model=CLAUDE_MODEL,
max_tokens=1200,
temperature=0.7,
system=[
{
"type": "text",
"text": SYSTEM_PROMPT,
"cache_control": {"type": "ephemeral"}
}
],
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": f"# Original CTI Report\n\n{cti_content}",
"cache_control": {"type": "ephemeral"} # Cache CTI too
},
{
"type": "text",
"text": f"# Previous Hypothesis\n\n{old_hypothesis}\n\n# User Feedback\n\n{feedback}\n\nPlease generate an improved hunt based on this feedback."
}
]
}
]
)
return response.content[0].text.strip()Expected Savings:
- First regeneration: Full cost
- Subsequent regenerations within 5 min: 90% discount on both system prompt AND CTI content
- Particularly valuable for hunts with 3-5 regeneration attempts
Cache Warming Strategy:
- Pre-warm cache when issue is labeled
intel-submission - Keep cache alive during active editing sessions
- Monitor cache hit rates via response metadata
Implementation:
def warm_cache_for_submission(cti_content):
"""
Pre-warm the cache with CTI content before generation starts.
This ensures cache is available for all subsequent calls.
"""
anthropic_client.messages.create(
model=CLAUDE_MODEL,
max_tokens=1, # Minimal response, just warming cache
system=[{
"type": "text",
"text": SYSTEM_PROMPT,
"cache_control": {"type": "ephemeral"}
}],
messages=[{
"role": "user",
"content": [{
"type": "text",
"text": f"CTI Report:\n\n{cti_content}",
"cache_control": {"type": "ephemeral"}
}]
}]
)Track cache efficiency using response metadata:
response = anthropic_client.messages.create(...)
# Access usage statistics
usage = response.usage
print(f"Input tokens: {usage.input_tokens}")
print(f"Cache creation tokens: {usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {usage.cache_read_input_tokens}")
print(f"Output tokens: {usage.output_tokens}")
# Calculate savings
if usage.cache_read_input_tokens > 0:
cache_hit_rate = usage.cache_read_input_tokens / (usage.input_tokens + usage.cache_read_input_tokens)
print(f"Cache hit rate: {cache_hit_rate:.1%}")Add to GitHub Actions output:
- name: Report Cache Statistics
run: |
echo "📊 Cache Performance:"
echo " Cache Hit Rate: ${{ steps.generate.outputs.cache_hit_rate }}"
echo " Tokens Saved: ${{ steps.generate.outputs.tokens_saved }}"
echo " Cost Savings: ${{ steps.generate.outputs.cost_savings }}"Current Costs (estimated, 100 submissions/year):
- Average submission: 5 API calls × 3,000 tokens = 15,000 input tokens
- Cost per submission: 15,000 × $0.003/1K = $0.045
- Annual cost: $0.045 × 100 = $4.50/year
With Prompt Caching:
- First call: 3,000 tokens @ full rate = $0.009
- Calls 2-5 (within 5 min):
- 2,000 cached tokens @ 90% discount = $0.0006
- 1,000 new tokens @ full rate = $0.003
- Total per call: $0.0036
- Total per submission: $0.009 + (4 × $0.0036) = $0.0234
- Annual cost: $0.0234 × 100 = $2.34/year
Savings: $4.50 - $2.34 = $2.16/year (48% reduction)
With higher usage (1,000 submissions/year):
- Current: $45/year
- With caching: $23.40/year
- Savings: $21.60/year (48% reduction)
- Update
generate_from_cti.pyto use system parameter with cache_control - Add cache statistics tracking to all Claude API calls
- Update GitHub Actions workflows to report cache performance
- Add cache hit rate monitoring to workflow outputs
- Document cache behavior in troubleshooting guide
- Test cache performance with various submission types
- Monitor cache hit rates over first month
- Optimize cache strategy based on real-world data
Step 1: Update Anthropic client version
pip install --upgrade anthropic>=0.18.0Step 2: Modify API calls to use new format
# OLD FORMAT (deprecated)
response = anthropic_client.messages.create(
model=CLAUDE_MODEL,
max_tokens=1200,
messages=[{"role": "user", "content": f"System: {SYSTEM_PROMPT}\n\nUser: {prompt}"}]
)
# NEW FORMAT (with caching)
response = anthropic_client.messages.create(
model=CLAUDE_MODEL,
max_tokens=1200,
system=[{
"type": "text",
"text": SYSTEM_PROMPT,
"cache_control": {"type": "ephemeral"}
}],
messages=[{"role": "user", "content": prompt}]
)Step 3: Test in development
# Run with a test CTI report
AI_PROVIDER=claude python scripts/generate_from_cti.pyStep 4: Deploy to production
- Update GitHub Actions workflows with new dependencies
- Monitor first 10 submissions for cache performance
- Adjust strategy based on real-world hit rates
Status: Not yet implemented
Priority: ⭐⭐⭐ (Medium - Nice to have)
Problem: Tactic extraction uses keyword matching with ~85% accuracy.
Solution: Include MITRE ATT&CK framework data statically in repo.
Benefits:
- More accurate technique → tactic mapping
- Better technique descriptions for AI context
- No external API dependencies
- Faster lookups (local data)
Implementation:
# Download MITRE ATT&CK data
curl -o data/mitre-attack.json https://raw.githubusercontent.com/mitre/cti/master/enterprise-attack/enterprise-attack.json
# Update duplicate_detection.py to use local data
python scripts/update_mitre_data.pyEstimated Effort: 2-4 hours
Estimated Savings:
- Improved accuracy (85% → 98%)
- Reduced AI hallucinations on technique classification
Status: Not yet implemented
Priority: ⭐⭐ (Low - Advanced feature)
Problem: Current duplicate detection uses simple semantic comparison.
Solution: Pre-compute embeddings for all hunts, use vector similarity.
Benefits:
- More accurate duplicate detection
- Can find "similar but not duplicate" hunts
- Faster comparisons at scale (>1000 hunts)
Challenges:
- Requires embedding model (cost)
- Need vector database or storage approach
- More complex implementation
Estimated Effort: 8-16 hours
When to Implement: When hunt database exceeds 500 hunts
Status: Not yet implemented
Priority: ⭐ (Very Low - Edge case optimization)
Problem: Very long CTI reports (>50,000 tokens) are chunked and summarized.
Solution: Stream processing with incremental summarization.
Benefits:
- Reduced memory usage
- Faster processing for extremely long reports
- Better handling of rate limits
When to Implement: When average CTI report exceeds 50,000 tokens
Current performance baselines (as of November 2025):
| Metric | Before Optimization | After Optimization | Improvement |
|---|---|---|---|
| Duplicate Detection | 15-30s | 0.5s | 30-60x faster |
| Database Query Time | N/A (file-based) | <10ms | Instant |
| CTI Extraction Success Rate | ~60% | ~98% | 63% improvement |
| GitHub Actions Runtime | ~90s | ~65s | 28% faster |
| API Cost per Submission | $0.045 | $0.045 | No change yet |
With prompt caching implemented:
| Metric | Current | With Caching | Improvement |
|---|---|---|---|
| API Cost per Submission | $0.045 | $0.023 | 48% reduction |
| Response Latency | ~3-5s | ~1-2s | 40-60% faster |
| Annual API Costs (100 subs) | $4.50 | $2.34 | $2.16 saved |
| Annual API Costs (1000 subs) | $45.00 | $23.40 | $21.60 saved |
- Anthropic Prompt Caching Documentation
- HEARTH Database Architecture
- SQLite Performance Best Practices
- MITRE ATT&CK CTI Repository
Have ideas for additional optimizations? Open an issue or submit a PR!
Guidelines:
- Measure impact before and after optimization
- Document performance gains with real data
- Consider GitHub Actions cost implications
- Maintain backward compatibility where possible