
API Quickstart

Nick edited this page Nov 18, 2025 · 1 revision

PATAS API Quickstart

Quick guide to integrating PATAS into your system via the REST API.


Base URL

http://localhost:8000/api/v1

For on-premise deployments, use your internal PATAS server URL.


Authentication

Currently, the PATAS API does not require authentication for local development. For production deployments, implement API key authentication.


Main Endpoint: /api/v1/analyze

The /api/v1/analyze endpoint bundles the complete workflow:

  1. Ingest messages
  2. Run pattern mining (optional)
  3. Evaluate rules (optional)
  4. Export rules (optional)

Perfect for: small-to-medium batch analysis and quick prototyping.

Request

POST /api/v1/analyze
Content-Type: application/json

{
  "messages": [
    {
      "id": "msg_001",
      "text": "Your message text here",
      "is_spam": true,
      "meta": {"lang": "en"}
    }
  ],
  "run_mining": true,
  "run_evaluation": true,
  "export_backend": "sql"
}

Response

{
  "patterns": [
    {
      "id": 1,
      "type": "URL",
      "description": "URL pattern: http://spam.com",
      "group_size": 45,
      "sources_count": 3,
      "senders_count": 2,
      "similarity_reason": "Messages contain the same suspicious URL: http://spam.com",
      "example_report_ids": ["msg_001", "msg_002", "msg_003"],
      "bot_likelihood": 0.85,
      "sql_query": "SELECT id, message_content, sender, source FROM reports WHERE LOWER(message_content) LIKE '%http://spam.com%'"
    }
  ],
  "rules": [
    {
      "id": 1,
      "pattern_id": 1,
      "status": "candidate",
      "sql_expression": "SELECT id FROM messages WHERE ...",
      "precision": 0.95,
      "coverage": 0.15,
      "hits_total": 100
    }
  ],
  "export": "SELECT id FROM messages WHERE ...",
  "meta": {
    "ingested_count": 10,
    "patterns_created": 3,
    "rules_created": 3,
    "evaluation_count": 3,
    "timings": {
      "ingest_seconds": 0.123,
      "mining_seconds": 2.456,
      "evaluation_seconds": 0.789,
      "total_seconds": 3.368
    }
  }
}

Pattern Groups and SQL Output

The /api/v1/analyze endpoint returns pattern groups: clusters of similar spam messages, each with detailed statistics and a ready-to-run SQL query.

Pattern Response Fields

Each pattern in the response includes:

  • group_size: Number of messages in this pattern group
  • sources_count: Number of unique sources/chats where these messages appeared
  • senders_count: Number of unique senders who sent these messages
  • similarity_reason: Human-readable explanation of why messages are similar (e.g., "Job offers with identical structure and varied salary amounts")
  • example_report_ids: Sample message IDs from the group (up to 10)
  • bot_likelihood: Bot probability score (0.0-1.0), where higher values indicate automated/spam behavior
  • sql_query: A SQL SELECT query against the generic reports table that matches this pattern

SQL Query Format

The sql_query field contains a SQL SELECT query that can be executed against a generic reports table with the following schema:

CREATE TABLE reports (
    id VARCHAR PRIMARY KEY,
    message_content TEXT NOT NULL,
    sender VARCHAR,
    source VARCHAR,
    country VARCHAR,
    has_media BOOLEAN,
    label_spam BOOLEAN,
    created_at TIMESTAMP
);

Field Mapping:

  • PATAS messages.text → reports.message_content
  • PATAS messages.meta.sender → reports.sender
  • PATAS messages.meta.source → reports.source
  • PATAS messages.meta.country → reports.country
  • PATAS messages.meta.has_media → reports.has_media
  • PATAS messages.is_spam → reports.label_spam
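As one illustrative way to apply the mapping above, here is a short Python sketch that converts a PATAS message into a reports row. The sender, source, country, and has_media keys inside meta are assumptions inferred from the mapping, not a documented message schema:

```python
# Sketch: convert a PATAS message dict into a row for the generic
# `reports` table, following the field mapping above.
def message_to_report(msg):
    meta = msg.get("meta", {})
    return {
        "id": msg["id"],
        "message_content": msg["text"],
        "sender": meta.get("sender"),          # assumed meta key
        "source": meta.get("source"),          # assumed meta key
        "country": meta.get("country"),        # assumed meta key
        "has_media": meta.get("has_media", False),
        "label_spam": msg.get("is_spam", False),
    }

row = message_to_report({
    "id": "msg_001",
    "text": "Buy now! Click here: http://spam.com",
    "is_spam": True,
    "meta": {"lang": "en", "sender": "user_42"},
})
print(row["message_content"])
```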

Example SQL Query:

SELECT id, message_content, sender, source 
FROM reports 
WHERE LOWER(message_content) LIKE '%http://spam.com%'

This query can be used to:

  • Identify all reports matching the pattern
  • Apply blocking rules in your system
  • Generate reports and analytics
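A minimal sketch of running a returned sql_query locally, using SQLite as the stand-in store. This assumes the query sticks to portable constructs like LOWER and LIKE, as the example above does; queries using other dialect features may need adaptation:

```python
import sqlite3

# Build a local copy of the generic `reports` table and run the
# pattern's sql_query against it.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reports (
        id TEXT PRIMARY KEY,
        message_content TEXT NOT NULL,
        sender TEXT,
        source TEXT,
        country TEXT,
        has_media INTEGER,
        label_spam INTEGER,
        created_at TEXT
    )
""")
conn.executemany(
    "INSERT INTO reports (id, message_content, sender, source) VALUES (?, ?, ?, ?)",
    [
        ("msg_001", "Buy now! Click here: http://spam.com", "u1", "chat_a"),
        ("msg_002", "Visit HTTP://SPAM.COM today", "u2", "chat_b"),
        ("msg_003", "Hello, how are you?", "u3", "chat_a"),
    ],
)

sql_query = (
    "SELECT id, message_content, sender, source FROM reports "
    "WHERE LOWER(message_content) LIKE '%http://spam.com%'"
)
matches = conn.execute(sql_query).fetchall()
print(len(matches))  # 2 — the LOWER() makes the match case-insensitive
```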

Use Cases

  1. Batch Analysis: Send a batch of labeled reports, get pattern groups with SQL queries
  2. Rule Generation: Use sql_query to create blocking rules in your database
  3. Pattern Discovery: Understand why messages are similar via similarity_reason
  4. Bot Detection: Use bot_likelihood to prioritize automated spam patterns
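As a small illustration of the bot-detection use case, this sketch filters and sorts pattern groups by bot_likelihood. The 0.8 threshold is an arbitrary example for review triage, not a PATAS default:

```python
# Sketch: prioritize pattern groups for review by bot_likelihood.
patterns = [
    {"id": 1, "bot_likelihood": 0.85, "group_size": 45},
    {"id": 2, "bot_likelihood": 0.30, "group_size": 12},
    {"id": 3, "bot_likelihood": 0.95, "group_size": 8},
]

THRESHOLD = 0.8  # assumption: tune to your review capacity
priority = sorted(
    (p for p in patterns if p["bot_likelihood"] >= THRESHOLD),
    key=lambda p: p["bot_likelihood"],
    reverse=True,
)
print([p["id"] for p in priority])  # [3, 1]
```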

Example: cURL

curl -X POST http://localhost:8000/api/v1/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "id": "test_001",
        "text": "Buy now! Click here: http://spam.com",
        "is_spam": true,
        "meta": {"lang": "en"}
      }
    ],
    "run_mining": true,
    "run_evaluation": true,
    "export_backend": "sql"
  }'

Example: Python

import requests

# Prepare messages
messages = [
    {
        "id": "msg_001",
        "text": "Buy now! Click here: http://spam.com",
        "is_spam": True,
        "meta": {"lang": "en"}
    }
]

# Send request
response = requests.post(
    'http://localhost:8000/api/v1/analyze',
    json={
        'messages': messages,
        'run_mining': True,
        'run_evaluation': True,
        'export_backend': 'sql'
    }
)

# Process results
result = response.json()

print(f"Discovered {len(result['patterns'])} patterns")
print(f"Generated {len(result['rules'])} rules")

# Export SQL rules
if result.get('export'):
    print("\nSQL Rules:")
    print(result['export'])

Example: JavaScript/Node.js

const fetch = require('node-fetch');

// Prepare messages
const messages = [
  {
    id: 'msg_001',
    text: 'Buy now! Click here: http://spam.com',
    is_spam: true,
    meta: { lang: 'en' }
  }
];

// Send request
fetch('http://localhost:8000/api/v1/analyze', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    messages,
    run_mining: true,
    run_evaluation: true,
    export_backend: 'sql'
  })
})
  .then(res => res.json())
  .then(result => {
    console.log(`Discovered ${result.patterns.length} patterns`);
    console.log(`Generated ${result.rules.length} rules`);
    
    if (result.export) {
      console.log('\nSQL Rules:');
      console.log(result.export);
    }
  });

Lower-Level Endpoints (For Large-Scale Pipelines)

For large-scale or continuous pipelines, use the lower-level endpoints:

1. Ingest Messages

POST /api/v1/messages/ingest
Content-Type: application/json

{
  "messages": [...]
}

2. Run Pattern Mining

POST /api/v1/patterns/mine
Content-Type: application/json

{
  "from_timestamp": "2025-01-01T00:00:00Z",
  "to_timestamp": "2025-01-07T00:00:00Z"
}

3. Evaluate Rules

POST /api/v1/rules/eval-shadow
Content-Type: application/json

{
  "rule_ids": [1, 2, 3],
  "days": 7
}

rule_ids is optional: list specific rule IDs, or pass an empty list to evaluate all shadow rules.

4. Promote Rules

POST /api/v1/rules/promote

5. Export Rules

GET /api/v1/rules/export?backend=sql
GET /api/v1/rules/export?backend=rol

6. List Patterns

GET /api/v1/patterns?limit=100&offset=0

7. List Rules

GET /api/v1/rules?status=active&include_evaluation=true
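A minimal sketch of a thin client wrapping these endpoints. Only the paths and payload shapes are taken from this page; the class name and helper methods are illustrative, and the sketch builds requests without sending them:

```python
from urllib.parse import urlencode

class PatasClient:
    """Illustrative request builder for the lower-level endpoints."""

    def __init__(self, base_url="http://localhost:8000/api/v1"):
        self.base_url = base_url

    def ingest_request(self, messages):
        return ("POST", f"{self.base_url}/messages/ingest",
                {"messages": messages})

    def mine_request(self, from_ts, to_ts):
        return ("POST", f"{self.base_url}/patterns/mine",
                {"from_timestamp": from_ts, "to_timestamp": to_ts})

    def eval_shadow_request(self, rule_ids=None, days=7):
        # Empty rule_ids evaluates all shadow rules.
        return ("POST", f"{self.base_url}/rules/eval-shadow",
                {"rule_ids": rule_ids or [], "days": days})

    def export_url(self, backend="sql"):
        return f"{self.base_url}/rules/export?{urlencode({'backend': backend})}"

client = PatasClient()
method, url, payload = client.mine_request(
    "2025-01-01T00:00:00Z", "2025-01-07T00:00:00Z")
print(method, url)
```

Pair these builders with any HTTP library (requests, httpx, aiohttp) depending on whether your pipeline is synchronous or async.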

Request Limits

Recommended limits:

  • Batch size: Up to 10,000 messages per /api/v1/analyze request
  • Message length: Up to 8,192 characters per message
  • Rate: Adjust based on your server capacity

For larger datasets:

  • Use lower-level endpoints with pagination
  • Process in batches
  • Use async processing if available
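The batching advice above can be sketched as follows. The actual POST is left as a comment so the chunking logic stands alone; BASE_URL is a placeholder for your server:

```python
# Sketch: split a large message list into batches under the
# recommended 10,000-message limit before ingesting.
def batches(items, size=10_000):
    for start in range(0, len(items), size):
        yield items[start:start + size]

messages = [{"id": f"msg_{i:06d}", "text": "example", "is_spam": True}
            for i in range(25_000)]

sizes = []
for batch in batches(messages):
    sizes.append(len(batch))
    # requests.post(f"{BASE_URL}/messages/ingest", json={"messages": batch})

print(sizes)  # [10000, 10000, 5000]
```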

Response Formats

Export Backends

SQL (export_backend: "sql"):

SELECT id FROM messages WHERE text LIKE '%spam-pattern%';
SELECT id FROM messages WHERE text REGEXP 'spam-regex';

ROL (export_backend: "rol"):

{
  "rules": [
    {
      "id": 1,
      "pattern": "...",
      "action": "block"
    }
  ]
}

Error Handling

400 Bad Request

{
  "detail": "Empty messages list"
}

500 Internal Server Error

{
  "detail": "Pattern mining failed: ..."
}

Best Practice: Always check response status and handle errors gracefully.
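A hedged sketch of that practice in client code, using the {"detail": ...} error shape shown above. The helper name is illustrative:

```python
# Sketch: check the status code before trusting the payload, and
# surface the API's "detail" field in the raised error.
def handle_analyze_response(status_code, body):
    if status_code == 200:
        return body["patterns"], body["rules"]
    detail = body.get("detail", "unknown error")
    if status_code == 400:
        raise ValueError(f"Bad request: {detail}")
    raise RuntimeError(f"Server error ({status_code}): {detail}")

patterns, rules = handle_analyze_response(
    200, {"patterns": [{"id": 1}], "rules": []})
print(len(patterns))  # 1
```

With requests, feed it `response.status_code` and `response.json()`.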


Typical Workflow

For Small Batches (< 10,000 messages)

  1. Use /api/v1/analyze with all options enabled
  2. Get patterns, rules, and export in one response
  3. Deploy exported rules to your filtering system

For Large Batches (> 10,000 messages)

  1. Ingest messages in batches via /api/v1/messages/ingest
  2. Mine patterns via /api/v1/patterns/mine
  3. Evaluate rules via /api/v1/rules/eval-shadow
  4. Promote good rules via /api/v1/rules/promote
  5. Export active rules via /api/v1/rules/export

For Continuous Pipelines

  1. Set up scheduled ingestion (cron, task queue, etc.)
  2. Run pattern mining periodically (daily, weekly)
  3. Evaluate shadow rules regularly
  4. Auto-promote based on metrics (or manual review)
  5. Export and deploy active rules

API Documentation

Full OpenAPI documentation available at:

http://localhost:8000/docs

Interactive Swagger UI for exploring all endpoints.


Tip: Start with /api/v1/analyze for quick integration, then move to lower-level endpoints as your needs grow.
