-
Notifications
You must be signed in to change notification settings - Fork 1
Eliminate O(tags × entries) cross-join in TermQueryBuilder::applyCollectionAndTaxonomyWheres() #1449
Description
Summary
When using the Eloquent Driver with taxonomy terms, the TermQueryBuilder::applyCollectionAndTaxonomyWheres() method generates a query that performs a full cross-join between ALL taxonomy_terms and ALL entries using JSON_CONTAINS inside a WHERE EXISTS subquery. This query cannot use any index and scales as O(tags × entries), making it unusable at production scale.
Worse, this query is silently triggered by any template that accesses a tags field — even just {{ if tags }} — because Statamic's Antlers runtime intercepts Builder objects via PathDataManager::checkForValueIntercept() and executes them through the Query tag.
The Problem
The Killer Query
TermQueryBuilder::applyCollectionAndTaxonomyWheres() (lines 224–305 in statamic/eloquent-driver) generates this SQL:
SELECT slug FROM taxonomy_terms
WHERE taxonomy = 'tags'
AND EXISTS (
SELECT 1 FROM entries
WHERE collection IN ('news')
AND JSON_CONTAINS(entries.data->'$."tags"', CONCAT('"', taxonomy_terms.slug, '"'))
)This evaluates JSON_CONTAINS for every combination of term and entry. With 10,000 tags × 10,000 entries = 100 million evaluations.
Measured Performance
| Scale | Killer Query Time | Page Load (cold) |
|---|---|---|
| 2,100 tags × 119 news | ~2.3s | ~4s |
| 2,100 tags × 3,119 news | ~6.4s | ~12-16s |
| 10,100 tags × 10,119 news | ~6s × 3 calls | ~23s |
| 12,359 tags × 3,264 news (production) | estimated 40–120s | TIMEOUT |
Hidden Trigger: {{ if tags }} in Templates
The query fires not just when iterating over tags, but any time a template accesses a taxonomy term field:
{{ if tags }} {{# This triggers the killer query #}}
Show tags section
{{ /if }}Call chain:
PathDataManager::getData()(line 634) → accessestagsfieldTerms::augment()returns anOrderedQueryBuilder(no SQL yet)PathDataManager::checkForValueIntercept()(line 467) detects the BuilderNodeProcessor::evaluateDeferredNodeAsTag()(line 486) invokesStatamic\Tags\Query->index()GetsQueryResults::results()(line 31) calls$query->get()TermQueryBuilder::get()→applyCollectionAndTaxonomyWheres()→ KILLER QUERY
If a page has multiple sections that check {{ if tags }} (e.g., related news cards, most-read cards), the query fires multiple times per request — we measured 3 executions at ~6s each = ~18s just for tag existence checks.
Proposed Solutions
Solution A: Add whereIn('slug', [...]) optimization (Eloquent Driver)
When the query builder already has a whereIn('slug', $slugs) constraint (which it does — Terms::queryBuilder() adds it at line 182), applyCollectionAndTaxonomyWheres() should skip the expensive EXISTS subquery entirely.
The entry's tags are already stored as slugs in the JSON data. If we know which specific slugs we're querying for, we just need to verify they exist in taxonomy_terms — no need to reverse-scan all entries.
In TermQueryBuilder.php:
private function applyCollectionAndTaxonomyWheres()
{
// If we already have a slug filter, skip the expensive collection check.
// The Terms fieldtype always provides specific slugs via whereIn('slug', $ids).
if ($this->hasSlugConstraint()) {
return;
}
// ... existing logic for unconstrained queries (collection index pages, etc.)
}Solution B: Lazy evaluation of taxonomy terms in Antlers (CMS core)
PathDataManager::checkForValueIntercept() should NOT execute a Builder when it's used in a boolean context ({{ if tags }}). Instead, it should recognize that checking "is this truthy?" only needs to verify the slug array is non-empty — not execute the full term query.
Solution C: Replace WHERE EXISTS + JSON_CONTAINS with a junction table
The fundamental problem is using JSON_CONTAINS for relational lookups. A proper entry_term junction table would reduce the query from O(tags × entries) to O(1) with proper indexes.
Our Workaround
We created a custom Antlers tag {{ entry_tags }} that reads tags directly from the DB:
// Reads raw tag slugs from entries.data JSON
$data = DB::table('entries')->select('data')->where('id', $entryId)->first();
$slugs = json_decode($data->data, true)['tags'] ?? [];
// Simple indexed lookup
$tags = DB::table('taxonomy_terms')
->where('taxonomy', 'tags')
->whereIn('slug', $slugs)
->get();And replaced ALL template usage:
{{ if tags }}→{{ if {entry_tags:has} }}{{ tags }}loops →{{ entry_tags }}loops{{ tags | count }}→{{ entry_tags:count }}
Results After Workaround
| Page | Before | After (cold) | After (cached) |
|---|---|---|---|
| News show (10k+10k) | 12–23s | 1.3s | 0.6s |
| Tag page (10k+10k) | 12–16s | 1.6s | 0.6s |
10–20x improvement by simply avoiding the augmentation pipeline.
Reproduction Steps
- Use
statamic/eloquent-driverfor entries and taxonomy terms - Create 2000+ taxonomy terms and 1000+ entries with 3+ terms each
- Create a template with
{{ if tags }}...{{ /if }} - Load the page — observe multi-second query times from
applyCollectionAndTaxonomyWheres() - Scale to 10,000 terms — observe query time grows linearly
Key Files
| File | Location | Relevant Lines |
|---|---|---|
applyCollectionAndTaxonomyWheres() |
statamic/eloquent-driver/src/Taxonomies/TermQueryBuilder.php |
224–305 |
checkForValueIntercept() |
statamic/cms/src/View/Antlers/Language/Runtime/PathDataManager.php |
467–500 |
evaluateDeferredNodeAsTag() |
statamic/cms/src/View/Antlers/Language/Runtime/NodeProcessor.php |
~1004 |
Query::index() + evaluate() |
statamic/cms/src/Tags/Query.php |
17–35 |
GetsQueryResults::results() |
statamic/cms/src/Tags/Concerns/GetsQueryResults.php |
8–34 |
Terms::augment() |
statamic/cms/src/Fieldtypes/Terms.php |
155–195 |
OrderedQueryBuilder::get() |
statamic/cms/src/Query/OrderedQueryBuilder.php |
21–29 |