Incremental Mining

Nick edited this page Nov 21, 2025 · 1 revision

Incremental Pattern Mining

Process only new messages instead of re-analyzing the entire dataset.

Overview

Incremental mining uses the last_processed_message_id stored in a checkpoint to process only the messages added after the last mining run.

Benefits:

  • Faster processing (only new messages)
  • Lower costs (fewer LLM/embedding calls)
  • Suitable for continuous operation

Limitations:

  • Old patterns are not re-evaluated
  • New patterns are discovered only from new messages
  • Use full mining periodically to catch pattern evolution

Usage

CLI

# First run (full mining)
patas mine-patterns --days=7

# Subsequent runs (incremental, from last checkpoint)
patas mine-patterns --days=7 --since-checkpoint <checkpoint_id>

Finding Checkpoint ID

# List recent checkpoints
patas list-checkpoints

# Output shows checkpoint IDs and last_processed_message_id

API

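import requests

BASE_URL = "http://localhost:8000/api/v1"

# Get the most recent checkpoint
response = requests.get(f"{BASE_URL}/checkpoints", params={"limit": 1}, timeout=30)
response.raise_for_status()  # fail early on HTTP errors
checkpoints = response.json()
if not checkpoints:
    raise SystemExit("No checkpoint found; run a full mining pass first")
checkpoint_id = checkpoints[0]["id"]

# Run incremental mining from that checkpoint
response = requests.post(
    f"{BASE_URL}/patterns/mine",
    json={"days": 7, "since_checkpoint": checkpoint_id},
)
response.raise_for_status()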
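If no checkpoint exists yet (for example, before the first mining run), one option is to fall back to a full run by omitting since_checkpoint. The following is a minimal sketch of that idea. It assumes the checkpoints endpoint returns a JSON list of objects with an "id" field, as in the example above; build_mine_payload is a hypothetical helper and not part of the tool itself.

```python
from typing import Any


def build_mine_payload(checkpoints: list[dict[str, Any]], days: int) -> dict[str, Any]:
    """Build the JSON body for POST /api/v1/patterns/mine.

    Hypothetical helper: when the checkpoint list is empty, the
    "since_checkpoint" key is omitted so the request describes a full run.
    """
    payload: dict[str, Any] = {"days": days}
    if checkpoints:
        # Assumes the first entry is the most recent checkpoint.
        payload["since_checkpoint"] = checkpoints[0]["id"]
    return payload


# No checkpoint yet: full run
print(build_mine_payload([], days=7))
# Checkpoint available: incremental run
print(build_mine_payload([{"id": 42}], days=1))
```

The returned dictionary can be passed directly as the json argument of requests.post.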

How It Works

  1. Checkpoint stores last_processed_message_id after each mining run
  2. Incremental mining filters messages: WHERE id > last_processed_message_id
  3. Only new messages are processed for pattern discovery
  4. Old patterns remain unchanged (not re-evaluated)
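The steps above can be illustrated with a small, self-contained sketch that uses an in-memory SQLite database. The table name messages, its columns, and the helper functions are assumptions made for illustration only; the actual schema and storage layer may differ.

```python
import sqlite3


def fetch_new_messages(conn: sqlite3.Connection, last_processed_message_id: int):
    """Return only the messages added after the checkpoint (step 2)."""
    cur = conn.execute(
        "SELECT id, text FROM messages WHERE id > ? ORDER BY id",
        (last_processed_message_id,),
    )
    return cur.fetchall()


def advance_checkpoint(last_processed_message_id: int, rows) -> int:
    """Return the new checkpoint value after a run (step 1).

    If no new rows were processed, the checkpoint stays where it was.
    """
    return max((row[0] for row in rows), default=last_processed_message_id)


# Hypothetical table with four messages; the previous run stopped at id 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO messages (id, text) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
)

checkpoint = 2
new_rows = fetch_new_messages(conn, checkpoint)   # only ids 3 and 4
checkpoint = advance_checkpoint(checkpoint, new_rows)
print(new_rows, checkpoint)
```

Running the fetch again with the advanced checkpoint returns no rows, which is why a second incremental run with no new messages has nothing to process.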

Best Practices

Daily Incremental Mining

# Daily cron job
# Assumes the first number in the list-checkpoints output is the checkpoint ID
0 2 * * * patas mine-patterns --days=1 --since-checkpoint $(patas list-checkpoints --limit=1 --status=completed | grep -oE '[0-9]+' | head -1)

Weekly Full Mining

# Weekly full re-analysis (no --since-checkpoint)
0 3 * * 0 patas mine-patterns --days=7

Hybrid Approach

  • Daily: Incremental mining (new messages only)
  • Weekly: Full mining (catch pattern evolution)
  • Monthly: Full evaluation of all rules
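The hybrid schedule can also be expressed as a small decision function, for example when a single scheduler script is used instead of separate cron entries. This is only a sketch: the job names are illustrative labels, Sunday is taken from the weekly cron entry above, and running the monthly evaluation on the first day of the month is an assumption.

```python
from datetime import date


def jobs_for(day: date) -> list[str]:
    """Return the mining jobs to run on a given day under the hybrid approach."""
    jobs = ["incremental"]            # daily: new messages only
    if day.weekday() == 6:            # Sunday (Monday is 0)
        jobs.append("full")           # weekly: catch pattern evolution
    if day.day == 1:                  # assumption: monthly job runs on the 1st
        jobs.append("full-evaluation")
    return jobs


print(jobs_for(date(2025, 11, 21)))  # a Friday
print(jobs_for(date(2025, 11, 23)))  # a Sunday
print(jobs_for(date(2025, 11, 1)))   # first day of the month
```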

Performance

Incremental (100K new messages):

  • Time: ~20-30 min (vs ~3.5 hours for full 500K)
  • Cost: ~$18 (vs ~$91 for full run)

Summary: On these figures, incremental mining is roughly 7-10x faster and about 5x cheaper than a full run, which makes it the better fit for daily operations.
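The ratios implied by the figures above can be checked with a few lines of arithmetic. The inputs are the approximate numbers quoted in this section; the per-1K-message cost is derived from them and is not a separately measured value.

```python
# Approximate figures quoted in this section
full_minutes = 3.5 * 60            # full run over 500K messages
incremental_minutes = (20, 30)     # incremental run over 100K messages
full_cost, incremental_cost = 91, 18   # USD

# Speedup range: slowest and fastest incremental run vs the full run
speedup_low = full_minutes / incremental_minutes[1]
speedup_high = full_minutes / incremental_minutes[0]

# Cost ratio and cost per 1K messages
cost_ratio = full_cost / incremental_cost
cost_per_1k_incremental = incremental_cost / 100   # 100K messages
cost_per_1k_full = full_cost / 500                 # 500K messages

print(f"speedup: {speedup_low:.1f}x to {speedup_high:.1f}x")
print(f"cost ratio: {cost_ratio:.2f}x")
print(f"cost per 1K messages: {cost_per_1k_incremental:.3f} vs {cost_per_1k_full:.3f}")
```

The per-message cost is nearly the same in both modes, so the cost saving comes almost entirely from processing fewer messages.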
