
Checkpointing

Nick edited this page Mar 10, 2026 · 2 revisions

Pattern Mining Checkpointing

PATAS supports checkpointing for long-running pattern mining operations, allowing you to resume from where you left off if the process is interrupted.

Overview

Checkpointing saves progress during pattern mining operations, including:

  • Last processed message ID
  • Intermediate pattern results
  • Current stage (for two-stage pipeline)
  • Operation metadata

This is especially important for:

  • Large datasets (millions of messages)
  • Long-running operations (hours)
  • Unstable environments
  • Resource-constrained deployments

How It Works

Automatic Checkpointing

Checkpoints are automatically created and updated during pattern mining:

  1. Checkpoint Creation: Created at the start of pattern mining
  2. Periodic Updates: Updated every 5 chunks (configurable)
  3. Completion: Marked as completed when mining finishes
  4. Failure Handling: Marked as failed on errors
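The four lifecycle steps above can be sketched in Python. This is a minimal illustration, not the actual PATAS implementation: the names `Checkpoint`, `CHECKPOINT_EVERY`, and `mine_with_checkpoint` are hypothetical, and the real mining work is elided.

```python
CHECKPOINT_EVERY = 5  # default: write a checkpoint update every 5 chunks


class Checkpoint:
    def __init__(self):
        self.status = "running"               # 1. created at the start of mining
        self.last_processed_message_id = None
        self.chunk_index = 0

    def update(self, chunk_index, last_message_id):
        self.chunk_index = chunk_index
        self.last_processed_message_id = last_message_id


def mine_with_checkpoint(chunks, checkpoint):
    """Process chunks of message IDs, checkpointing every CHECKPOINT_EVERY chunks."""
    try:
        for i, chunk in enumerate(chunks):
            # ... pattern mining work on this chunk would happen here ...
            if (i + 1) % CHECKPOINT_EVERY == 0:
                checkpoint.update(i, chunk[-1])  # 2. periodic update
        checkpoint.status = "completed"          # 3. mark completed on success
    except Exception:
        checkpoint.status = "failed"             # 4. mark failed on error
        raise
```

Because the checkpoint always holds the last fully processed position, a resume only has to skip messages up to `last_processed_message_id`.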

Checkpoint Data

Each checkpoint stores:

  • Parameters: days, min_spam_count
  • Progress: last_processed_message_id, chunk_index
  • State: patterns_in_progress (intermediate results)
  • Stage: Current pipeline stage (stage1, stage2, completed)
  • Metadata: Additional operation information
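The fields above map naturally onto a single record. A sketch of that shape as a dataclass, with assumed defaults (`CheckpointRecord` is an illustrative name, and the contents of `patterns_in_progress` and `metadata` are not specified here):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CheckpointRecord:
    # Parameters
    days: int
    min_spam_count: int
    # Progress
    last_processed_message_id: Optional[int] = None
    chunk_index: int = 0
    # State: intermediate pattern results (dict shape is illustrative)
    patterns_in_progress: dict = field(default_factory=dict)
    # Stage: one of "stage1", "stage2", "completed"
    stage: str = "stage1"
    # Metadata: additional operation information
    metadata: dict = field(default_factory=dict)
```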

Usage

Resuming from Checkpoint

Use the CLI to resume a failed or interrupted pattern mining operation:

# List recent checkpoints
patas list-checkpoints

# Resume from a specific checkpoint
patas resume-mining <checkpoint_id> [--use-llm] [--use-semantic]

Example Workflow

# Start pattern mining (creates checkpoint automatically)
patas mine-patterns 30

# If interrupted, list checkpoints
patas list-checkpoints

# Resume from checkpoint ID 5
patas resume-mining 5 --use-semantic

Checkpoint Status

Checkpoints have three statuses:

  • RUNNING: Operation in progress
  • COMPLETED: Operation finished successfully
  • FAILED: Operation failed or was interrupted
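As an enum, with a helper capturing which statuses can be resumed (the enum name and helper are illustrative; only the three status values come from the page):

```python
from enum import Enum


class CheckpointStatus(Enum):
    RUNNING = "running"      # operation in progress
    COMPLETED = "completed"  # operation finished successfully
    FAILED = "failed"        # operation failed or was interrupted


def is_resumable(status: CheckpointStatus) -> bool:
    # Only running or failed checkpoints make sense to resume;
    # a completed checkpoint has nothing left to do.
    return status in (CheckpointStatus.RUNNING, CheckpointStatus.FAILED)
```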

CLI Commands

List Checkpoints

# List recent checkpoints (default: 10)
patas list-checkpoints [limit] [status]

# Examples
patas list-checkpoints 20              # List 20 most recent
patas list-checkpoints 10 running      # List running checkpoints only
patas list-checkpoints 5 failed        # List failed checkpoints

Resume Mining

patas resume-mining <checkpoint_id> [--use-llm] [--use-semantic]

# Examples
patas resume-mining 5                  # Resume without LLM
patas resume-mining 5 --use-llm        # Resume with LLM
patas resume-mining 5 --use-semantic   # Resume with semantic mining

Database Schema

Checkpoints are stored in the pattern_mining_checkpoints table:

CREATE TABLE pattern_mining_checkpoints (
    id INTEGER PRIMARY KEY,
    started_at TIMESTAMP NOT NULL,
    last_updated TIMESTAMP NOT NULL,
    status VARCHAR NOT NULL,  -- 'running', 'completed', 'failed'
    days INTEGER NOT NULL,
    min_spam_count INTEGER NOT NULL,
    last_processed_message_id INTEGER,
    patterns_in_progress JSON,
    stage VARCHAR,
    metadata JSON
);
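The schema can be exercised directly with Python's stdlib `sqlite3`. The sketch below creates the table, writes one interrupted checkpoint, and then runs the kind of query `list-checkpoints` performs: resumable checkpoints, newest first. The sample values (days=30, min_spam_count=3, message ID 1042) are illustrative.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pattern_mining_checkpoints (
        id INTEGER PRIMARY KEY,
        started_at TIMESTAMP NOT NULL,
        last_updated TIMESTAMP NOT NULL,
        status VARCHAR NOT NULL,
        days INTEGER NOT NULL,
        min_spam_count INTEGER NOT NULL,
        last_processed_message_id INTEGER,
        patterns_in_progress JSON,
        stage VARCHAR,
        metadata JSON
    )
""")

now = datetime.now(timezone.utc).isoformat()
conn.execute(
    "INSERT INTO pattern_mining_checkpoints "
    "(started_at, last_updated, status, days, min_spam_count, "
    " last_processed_message_id, patterns_in_progress, stage, metadata) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (now, now, "failed", 30, 3, 1042,
     json.dumps({"example pattern": 7}), "stage1", json.dumps({})),
)

# Resumable checkpoints, newest first (conceptually what list-checkpoints shows)
row = conn.execute(
    "SELECT id, status, stage, last_processed_message_id "
    "FROM pattern_mining_checkpoints "
    "WHERE status IN ('running', 'failed') "
    "ORDER BY last_updated DESC LIMIT 1"
).fetchone()
```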

Migration

To add checkpoint support to an existing database:

python scripts/migrate_add_checkpoint_table.py

Alternatively, the table will be created automatically on the next startup if the application uses SQLAlchemy's create_all.

Best Practices

  1. Monitor Checkpoints: Regularly check for failed checkpoints
  2. Cleanup Old Checkpoints: Remove completed checkpoints older than 30 days
  3. Resume Promptly: Resume failed operations as soon as possible
  4. Checkpoint Frequency: Adjust the batch size if checkpoint updates are too frequent or too infrequent
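Practice #2 above (removing completed checkpoints older than 30 days) can be done with a single DELETE. A sketch, assuming the `pattern_mining_checkpoints` table from the schema section and ISO-formatted `last_updated` timestamps; the function name is illustrative:

```python
import sqlite3


def cleanup_old_checkpoints(conn, days=30):
    """Remove completed checkpoints older than `days`; return how many were deleted."""
    cur = conn.execute(
        "DELETE FROM pattern_mining_checkpoints "
        "WHERE status = 'completed' "
        "AND last_updated < datetime('now', ?)",
        (f"-{days} days",),
    )
    conn.commit()
    return cur.rowcount
```

Only completed checkpoints are touched, so old failed checkpoints remain visible for investigation and resumption.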

Troubleshooting

Checkpoint Not Created

If checkpoints aren't being created:

  1. Verify database connectivity
  2. Check database permissions
  3. Review application logs for errors

Resume Fails

If resuming from checkpoint fails:

  1. Verify checkpoint exists: patas list-checkpoints
  2. Check the checkpoint status (it must be running or failed; completed checkpoints cannot be resumed)
  3. Review checkpoint metadata for clues
  4. Check the database for data-integrity issues in the related tables

Checkpoint Updates Too Slow

If checkpoint updates are impacting performance:

  1. Checkpoint updates happen every 5 chunks by default
  2. This is a balance between progress tracking and performance
  3. For very large datasets, consider increasing chunk size
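The trade-off in point 3 can be quantified: with the default of one checkpoint write every 5 chunks, the number of messages between writes scales with the chunk size. A trivial sketch (the 1,000-message chunk size is illustrative, not a PATAS default):

```python
def messages_between_checkpoints(chunk_size, checkpoint_every=5):
    # Larger chunks mean fewer checkpoint writes per message processed,
    # at the cost of losing more progress if the process dies mid-interval.
    return chunk_size * checkpoint_every
```

For example, 1,000-message chunks give one checkpoint write per 5,000 messages.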
