Skip to content

Eliminate O(tags × entries) cross-join in TermQueryBuilder::applyCollectionAndTaxonomyWheres() #1449

@buddy94

Description

@buddy94

Summary

When using the Eloquent Driver with taxonomy terms, the TermQueryBuilder::applyCollectionAndTaxonomyWheres() method generates a query that performs a full cross-join between ALL taxonomy_terms and ALL entries using JSON_CONTAINS inside a WHERE EXISTS subquery. This query cannot use any index and scales as O(tags × entries), making it unusable at production scale.

Worse, this query is silently triggered by any template that accesses a tags field — even just {{ if tags }} — because Statamic's Antlers runtime intercepts Builder objects via PathDataManager::checkForValueIntercept() and executes them through the Query tag.


The Problem

The Killer Query

TermQueryBuilder::applyCollectionAndTaxonomyWheres() (lines 224–305 in statamic/eloquent-driver) generates this SQL:

SELECT slug FROM taxonomy_terms
WHERE taxonomy = 'tags'
  AND EXISTS (
    SELECT 1 FROM entries
    WHERE collection IN ('news')
      AND JSON_CONTAINS(entries.data->'$."tags"', CONCAT('"', taxonomy_terms.slug, '"'))
  )

This evaluates JSON_CONTAINS for every combination of term and entry. With 10,000 tags × 10,000 entries = 100 million evaluations.

Measured Performance

Scale Killer Query Time Page Load (cold)
2,100 tags × 119 news ~2.3s ~4s
2,100 tags × 3,119 news ~6.4s ~12-16s
10,100 tags × 10,119 news ~6s × 3 calls ~23s
12,359 tags × 3,264 news (production) estimated 40–120s TIMEOUT

Hidden Trigger: {{ if tags }} in Templates

The query fires not just when iterating over tags, but any time a template accesses a taxonomy term field:

{{ if tags }}           {{# This triggers the killer query #}}
    Show tags section
{{ /if }}

Call chain:

  1. PathDataManager::getData() (line 634) → accesses tags field
  2. Terms::augment() returns an OrderedQueryBuilder (no SQL yet)
  3. PathDataManager::checkForValueIntercept() (line 467) detects the Builder
  4. NodeProcessor::evaluateDeferredNodeAsTag() (line 486) invokes Statamic\Tags\Query->index()
  5. GetsQueryResults::results() (line 31) calls $query->get()
  6. TermQueryBuilder::get()applyCollectionAndTaxonomyWheres()KILLER QUERY

If a page has multiple sections that check {{ if tags }} (e.g., related news cards, most-read cards), the query fires multiple times per request — we measured 3 executions at ~6s each = ~18s just for tag existence checks.


Proposed Solutions

Solution A: Add whereIn('slug', [...]) optimization (Eloquent Driver)

When the query builder already has a whereIn('slug', $slugs) constraint (which it does — Terms::queryBuilder() adds it at line 182), applyCollectionAndTaxonomyWheres() should skip the expensive EXISTS subquery entirely.

The entry's tags are already stored as slugs in the JSON data. If we know which specific slugs we're querying for, we just need to verify they exist in taxonomy_terms — no need to reverse-scan all entries.

In TermQueryBuilder.php:

private function applyCollectionAndTaxonomyWheres()
{
    // If we already have a slug filter, skip the expensive collection check.
    // The Terms fieldtype always provides specific slugs via whereIn('slug', $ids).
    if ($this->hasSlugConstraint()) {
        return;
    }

    // ... existing logic for unconstrained queries (collection index pages, etc.)
}

Solution B: Lazy evaluation of taxonomy terms in Antlers (CMS core)

PathDataManager::checkForValueIntercept() should NOT execute a Builder when it's used in a boolean context ({{ if tags }}). Instead, it should recognize that checking "is this truthy?" only needs to verify the slug array is non-empty — not execute the full term query.

Solution C: Replace WHERE EXISTS + JSON_CONTAINS with a junction table

The fundamental problem is using JSON_CONTAINS for relational lookups. A proper entry_term junction table would reduce the query from O(tags × entries) to O(1) with proper indexes.


Our Workaround

We created a custom Antlers tag {{ entry_tags }} that reads tags directly from the DB:

// Reads raw tag slugs from entries.data JSON
$data = DB::table('entries')->select('data')->where('id', $entryId)->first();
$slugs = json_decode($data->data, true)['tags'] ?? [];

// Simple indexed lookup
$tags = DB::table('taxonomy_terms')
    ->where('taxonomy', 'tags')
    ->whereIn('slug', $slugs)
    ->get();

And replaced ALL template usage:

  • {{ if tags }}{{ if {entry_tags:has} }}
  • {{ tags }} loops → {{ entry_tags }} loops
  • {{ tags | count }}{{ entry_tags:count }}

Results After Workaround

Page Before After (cold) After (cached)
News show (10k+10k) 12–23s 1.3s 0.6s
Tag page (10k+10k) 12–16s 1.6s 0.6s

10–20x improvement by simply avoiding the augmentation pipeline.


Reproduction Steps

  1. Use statamic/eloquent-driver for entries and taxonomy terms
  2. Create 2000+ taxonomy terms and 1000+ entries with 3+ terms each
  3. Create a template with {{ if tags }}...{{ /if }}
  4. Load the page — observe multi-second query times from applyCollectionAndTaxonomyWheres()
  5. Scale to 10,000 terms — observe query time grows linearly

Key Files

File Location Relevant Lines
applyCollectionAndTaxonomyWheres() statamic/eloquent-driver/src/Taxonomies/TermQueryBuilder.php 224–305
checkForValueIntercept() statamic/cms/src/View/Antlers/Language/Runtime/PathDataManager.php 467–500
evaluateDeferredNodeAsTag() statamic/cms/src/View/Antlers/Language/Runtime/NodeProcessor.php ~1004
Query::index() + evaluate() statamic/cms/src/Tags/Query.php 17–35
GetsQueryResults::results() statamic/cms/src/Tags/Concerns/GetsQueryResults.php 8–34
Terms::augment() statamic/cms/src/Fieldtypes/Terms.php 155–195
OrderedQueryBuilder::get() statamic/cms/src/Query/OrderedQueryBuilder.php 21–29

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions