

Nick edited this page Mar 10, 2026 · 1 revision

SQL Rule LLM Validation

PATAS includes an optional LLM-based quality validator for SQL rules that detects false-positive risks before rules are activated.

Overview

After a SQL rule passes deterministic safety checks (validate_sql_rule, check_rule_coverage), an optional LLM validation step can assess the rule for false positive risks and quality issues.

Why LLM Validation?

Deterministic validation catches:

  • ✅ SQL syntax errors
  • ✅ Dangerous operations (UPDATE, DELETE, etc.)
  • ✅ Overly broad rules (matching >80% of messages)
  • ✅ Whitelist violations

LLM validation adds:

  • ✅ Semantic false positive risk assessment
  • ✅ Context-aware quality checks
  • ✅ Suggestions for improvement

How It Works

Integration Point

LLM validation runs after deterministic checks but before rule creation:

LLM generates SQL rule
  ↓
Deterministic validation (validate_sql_rule)
  ↓
Coverage check (check_rule_coverage)
  ↓
LLM quality validation (optional, if LLM available)
  ↓
Rule creation (if all checks pass)

Token Optimization

The validator is designed to be token-efficient:

  1. Reuses existing LLM client - Uses the same OpenAI client instance as pattern mining (no new API key needed)
  2. Compact prompt - Minimal prompt focusing only on false positive risk
  3. Limited examples - Only 3 spam examples, 2 ham examples (truncated to 60 chars)
  4. Short response - Max 800 tokens for response
  5. Low temperature - 0.1 for consistent validation
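A minimal sketch of request settings reflecting the choices above; the dictionary shape and constant name are illustrative assumptions, not the actual client call in v2_sql_llm_validator:

```python
# Illustrative request settings for the quality-validation call.
# NOTE: the constant name and dict shape are assumptions; only the
# model, temperature, and token cap come from the documentation above.
VALIDATION_REQUEST_KWARGS = {
    "model": "gpt-4o-mini",  # same model as pattern mining
    "temperature": 0.1,      # low temperature for consistent validation
    "max_tokens": 800,       # short, capped response
}
```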

Prompt Structure

The validation prompt is compact and focused:

SQL rule quality check:

SQL: <sql_expression>
Pattern: <description>
SPAM examples: <3 truncated examples>
HAM examples: <2 truncated examples> (optional)

Check false positive risk (low/medium/high). JSON:
{"is_safe": bool, "risk_level": "low|medium|high", 
 "false_positive_risks": ["scenario"], 
 "suggestions": ["suggestion"], 
 "reasoning": "brief"}

Token estimate: ~200-400 tokens per validation (vs 1000-2000 for pattern generation)
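The prompt above can be assembled roughly as follows; this is a sketch only, and the function name, example separators, and exact wording are assumptions rather than the actual PATAS implementation:

```python
def build_validation_prompt(sql, description, spam_examples, ham_examples=None,
                            max_chars=60):
    """Assemble the compact validation prompt described above.

    Illustrative sketch: truncates each example to max_chars and limits
    counts to 3 spam / 2 ham, per the token-optimization notes.
    """
    def trunc(messages, limit):
        return "; ".join(m[:max_chars] for m in messages[:limit])

    lines = [
        "SQL rule quality check:",
        "",
        f"SQL: {sql}",
        f"Pattern: {description}",
        f"SPAM examples: {trunc(spam_examples, 3)}",
    ]
    if ham_examples:
        lines.append(f"HAM examples: {trunc(ham_examples, 2)}")
    lines.append('Check false positive risk (low/medium/high). JSON: '
                 '{"is_safe": bool, "risk_level": "low|medium|high", ...}')
    return "\n".join(lines)
```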

Usage

Automatic Integration

LLM validation is automatically enabled when:

  • use_llm=True in pattern mining
  • LLM engine is available
  • SQL rule passes deterministic checks

Manual Usage

from app.v2_sql_llm_validator import create_sql_validator

# Reuse existing LLM engine
validator = create_sql_validator(
    llm_engine=mining_engine,  # Reuse from pattern mining
    model="gpt-4o-mini",
)

if validator:
    # validate_rule_quality is async; call it from within an async function
    result = await validator.validate_rule_quality(
        sql_expression="SELECT id FROM messages WHERE text LIKE '%spam%'",
        pattern_description="Spam keyword pattern",
        example_spam_messages=["Buy now!", "Click here!"],
        example_ham_messages=["Hello, how are you?"],
    )
    
    if result['risk_level'] == 'high':
        # Reject rule
        pass
    elif result['risk_level'] == 'medium':
        # Log warning, proceed with caution
        pass

Response Format

{
  "is_safe": true,
  "risk_level": "low|medium|high",
  "false_positive_risks": [
    "Legitimate messages discussing prices might be blocked",
    "Messages with 'earn' in educational context might match"
  ],
  "suggestions": [
    "Add context check: exclude messages from trusted senders",
    "Require multiple indicators (not just single keyword)"
  ],
  "reasoning": "Rule is generally safe but may match some legitimate price discussions"
}
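A caller consuming this response needs a fallback for replies that are not valid JSON. The sketch below treats unparseable or malformed replies as medium risk; that fallback policy is an assumption, not documented PATAS behavior:

```python
import json

# Conservative fallback for replies that cannot be parsed.
# ASSUMPTION: unparseable output is treated as medium risk.
FALLBACK = {
    "is_safe": False,
    "risk_level": "medium",
    "false_positive_risks": [],
    "suggestions": [],
    "reasoning": "unparseable validator response",
}

def parse_validation_response(raw: str) -> dict:
    """Parse the validator's JSON reply, falling back to a safe default."""
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        return dict(FALLBACK)
    if not isinstance(result, dict):
        return dict(FALLBACK)
    if result.get("risk_level") not in ("low", "medium", "high"):
        result["risk_level"] = "medium"  # normalize unexpected values
    return result
```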

Decision Logic

  • risk_level == "high": the rule is rejected and not created
  • risk_level == "medium": the rule is created, but a warning is logged
  • risk_level == "low": the rule is created normally
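The three-way decision above can be sketched as a small dispatch function; the function name and return values are illustrative, and the pipeline example later on this page performs the same check inline:

```python
def decide_rule_action(result: dict) -> str:
    """Map the validator's risk level to the documented decision logic.

    Illustrative sketch; return values are labels, not PATAS API names.
    Unknown or missing levels default to "medium" as a cautious choice.
    """
    level = result.get("risk_level", "medium")
    if level == "high":
        return "reject"               # rule is not created
    if level == "medium":
        return "create_with_warning"  # created, but a warning is logged
    return "create"                   # low risk: created normally
```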

Cost Optimization

Token Usage

  • Per validation: ~200-400 tokens (input + output)
  • Relative cost: ~10-20% of a single pattern-generation call
  • Total cost: Minimal when reusing existing LLM client

When to Skip

LLM validation can be skipped if:

  • LLM is not available (graceful degradation)
  • Cost is a concern (validation is optional)
  • Deterministic checks are sufficient

Configuration

LLM validation uses the same configuration as pattern mining:

  • Same API key (from PATAS_OPENAI_API_KEY or OPENAI_API_KEY)
  • Same model (default: gpt-4o-mini)
  • Same client instance (reused for efficiency)
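The key lookup order can be sketched as below; the helper name is illustrative, but the precedence (PATAS-specific key first, then the generic OpenAI key) follows the configuration notes above:

```python
import os

def resolve_api_key():
    """Return the API key, preferring the PATAS-specific variable.

    Sketch of the documented lookup order; the helper name is an assumption.
    Returns None if neither environment variable is set.
    """
    return os.environ.get("PATAS_OPENAI_API_KEY") or os.environ.get("OPENAI_API_KEY")
```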

Best Practices

  1. Use for high-stakes rules: Enable validation for rules that will be used in Conservative profile (auto-actions)
  2. Review medium-risk rules: Manually review rules flagged as medium risk
  3. Monitor costs: Track validation token usage if cost-sensitive
  4. Combine with deterministic checks: LLM validation complements, doesn't replace deterministic validation

Example

# In the pattern mining pipeline (inside a loop over candidate patterns,
# hence the `continue` below)
if sql_expr:
    # Deterministic checks
    is_valid, error = validate_sql_rule(sql_expr)
    if is_valid:
        coverage_valid, _, _ = await check_rule_coverage(...)
        if coverage_valid:
            # LLM validation (optional)
            if mining_engine and use_llm:
                validator = create_sql_validator(mining_engine)
                if validator:
                    result = await validator.validate_rule_quality(...)
                    if result['risk_level'] == 'high':
                        # Reject
                        continue
            
            # Create rule
            rule = await lifecycle.create_candidate_rule(...)

Limitations

  • Not deterministic: LLM validation can vary between runs
  • Token cost: Adds ~200-400 tokens per rule (minimal but not zero)
  • Requires LLM: Only works if LLM is available
  • False positives possible: LLM may flag safe rules or miss risky ones

Future Improvements

  • Cache validation results for identical SQL rules
  • Batch validation for multiple rules
  • Fine-tuned model for SQL rule validation
  • Integration with rule evaluation metrics (precision/recall)
