perf(cdr): optimize queries for large CDR installations by edospadoni · Pull Request #207 · nethesis/nethvoice-report

edospadoni · 2026-02-26T14:35:44Z

Problem

The tasks component queries take 15-20 minutes on installations with ~2M CDR records.
Main bottlenecks:

date_format() on indexable columns preventing index usage
No indexes on generated cdr_YYYY and cdr_YYYY-MM tables
Correlated subqueries for trunk type detection executed per-row
Correlated subqueries for geo-lookup (region/province) executed millions of times
No indexes on the source cdr table
REGEXP usage where LIKE suffices
Sequential execution of SQL views

Optimizations

1. Indexes on source `cdr` table

Added idx_cdr_calldate (calldate) for range scans
Added idx_cdr_linkedid (linkedid) for GROUP BY
Created both in schema.sql.tmpl and idempotently in cdr.go (ensureCDRSourceIndexes)
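
The idempotent part can be pictured as a pure "diff" between the wanted and the existing index names. This is a minimal sketch, not the code in cdr.go: `missingIndexDDL` is a hypothetical helper, and `existing` stands in for whatever index names a metadata query (for example against information_schema) would return.

```go
package main

import (
	"fmt"
	"sort"
)

// missingIndexDDL returns one ALTER TABLE statement for every wanted index
// that is not already present on the table.
// existing: index names already on the table (assumed to come from a metadata query).
// wanted:   index name -> column list.
func missingIndexDDL(table string, existing map[string]bool, wanted map[string]string) []string {
	names := make([]string, 0, len(wanted))
	for name := range wanted {
		if !existing[name] {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration is random; sort for a stable order

	ddl := make([]string, 0, len(names))
	for _, name := range names {
		ddl = append(ddl, fmt.Sprintf("ALTER TABLE `%s` ADD INDEX `%s` (%s)", table, name, wanted[name]))
	}
	return ddl
}

func main() {
	wanted := map[string]string{
		"idx_cdr_calldate": "calldate",
		"idx_cdr_linkedid": "linkedid",
	}
	// Pretend idx_cdr_calldate was created by an earlier run.
	existing := map[string]bool{"idx_cdr_calldate": true}

	for _, stmt := range missingIndexDDL("cdr", existing, wanted) {
		fmt.Println(stmt)
	}
}
```

Once every wanted index exists the helper returns an empty slice, which is what makes a repeated run cheap.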

2. Fix `date_format()` anti-pattern with range scans

cdr_year.sql: date_format(calldate, "%Y") = "YYYY" replaced with calldate >= 'YYYY-01-01' AND calldate < ... + INTERVAL 1 YEAR
cdr_month.sql: date_format(calldate, "%Y-%m") replaced with range scan using INTERVAL 1 MONTH
Daily INSERT IGNORE: uses DATE(NOW() - INTERVAL 1 DAY) and DATE(NOW()) as the bounds of the range scan
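
To illustrate the half-open range that replaces date_format(), here is a small sketch. The PR computes the bounds inside the SQL templates with INTERVAL; computing them in Go, and the helper names `yearRange` and `monthRange`, are assumptions made only for this example.

```go
package main

import (
	"fmt"
	"time"
)

const sqlDate = "2006-01-02" // Go reference layout for YYYY-MM-DD

// yearRange returns the half-open interval [from, to) covering one calendar year.
func yearRange(year int) (from, to string) {
	start := time.Date(year, time.January, 1, 0, 0, 0, 0, time.UTC)
	return start.Format(sqlDate), start.AddDate(1, 0, 0).Format(sqlDate)
}

// monthRange returns the half-open interval [from, to) covering one calendar month.
func monthRange(year int, month time.Month) (from, to string) {
	start := time.Date(year, month, 1, 0, 0, 0, 0, time.UTC)
	return start.Format(sqlDate), start.AddDate(0, 1, 0).Format(sqlDate)
}

func main() {
	// The predicate compares the bare calldate column, so an index on it stays usable.
	from, to := yearRange(2025)
	fmt.Printf("WHERE calldate >= '%s' AND calldate < '%s'\n", from, to)

	from, to = monthRange(2025, time.December)
	fmt.Printf("WHERE calldate >= '%s' AND calldate < '%s'\n", from, to)
}
```

The exclusive upper bound avoids having to know the last day of the month or the last second of the day.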

3. Pre-compute trunk type with LEFT JOIN

Replaced correlated subquery get_trunk_name(channel) IN (SELECT channelid FROM asterisk.trunks) with a temp table _trunk_list + LEFT JOIN
Reduces the work from one subquery per row to a single JOIN

4. Indexes on generated `cdr_YYYY` and `cdr_YYYY-MM` tables

Yearly tables: idx_type_calldate, idx_type, idx_cnum, idx_dst, idx_channel, idx_dstchannel, idx_type_cnum_calldate
Monthly tables: idx_type_calldate, idx_type, idx_cnum, idx_dst, idx_calldate
Created in SQL templates (ADD INDEX IF NOT EXISTS) and idempotently in Go (ensureTableIndexes)

5. Pre-computed geographic columns

Added 4 columns to cdr_YYYY: src_region, src_province, dst_region, dst_province
Populated via a lookup table built on unique phone numbers (an order of magnitude fewer than the total records)
Incremental updates: WHERE src_region IS NULL (only new records)
Dashboard views (6, 7, 17, 18) rewritten to read columns directly instead of correlated subqueries
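
Why resolving unique numbers helps can be modelled outside SQL. The sketch below is an in-memory analogy, not the PR's implementation: `resolveUnique`, the `geo` type, and the two-digit prefix table are all hypothetical.

```go
package main

import "fmt"

// geo holds the region/province pair stored in the pre-computed columns.
type geo struct{ Region, Province string }

// resolveUnique runs the (expensive) lookup once per distinct number and
// returns the resolved values plus the number of lookups actually performed.
func resolveUnique(numbers []string, lookup func(string) geo) (map[string]geo, int) {
	resolved := make(map[string]geo)
	lookups := 0
	for _, n := range numbers {
		if _, done := resolved[n]; done {
			continue // this number was already resolved for an earlier row
		}
		resolved[n] = lookup(n)
		lookups++
	}
	return resolved, lookups
}

func main() {
	// Hypothetical prefix table used only for this example.
	prefixes := map[string]geo{
		"02": {Region: "Lombardia", Province: "Milano"},
		"06": {Region: "Lazio", Province: "Roma"},
	}
	lookup := func(n string) geo {
		if len(n) >= 2 {
			return prefixes[n[:2]]
		}
		return geo{}
	}

	rows := []string{"0212345", "0698765", "0212345", "0212345"}
	resolved, lookups := resolveUnique(rows, lookup)
	fmt.Println(len(rows), "rows,", lookups, "lookups,", resolved["0212345"].Region)
}
```

The number of lookups grows with the count of distinct numbers rather than with the count of CDR rows, which is the saving the pre-computed columns rely on.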

6. REGEXP replaced with LIKE for dispositions

Replaced REGEXP 'ANSWERED' with LIKE '%ANSWERED%' across 12 SQL files
Updated Go function ExtractDispositions in utils.go
REGEXP 'FOO$' becomes LIKE '%FOO' (semantically equivalent, much faster)
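
The mapping from anchored literal patterns to LIKE can be sketched as follows. This is not the ExtractDispositions code from utils.go; `likeFromRegexp` is a hypothetical helper that assumes the patterns are plain literals with optional ^ and $ anchors and that backslash is the LIKE escape character (the MySQL default).

```go
package main

import (
	"fmt"
	"strings"
)

// likeFromRegexp converts a literal REGEXP pattern with optional ^ / $ anchors
// into a LIKE pattern. It reports false when the pattern contains any other
// regex metacharacter, because those have no LIKE counterpart.
func likeFromRegexp(pattern string) (string, bool) {
	prefix, suffix := "%", "%"
	if strings.HasPrefix(pattern, "^") {
		pattern, prefix = pattern[1:], "" // anchored at start: no leading wildcard
	}
	if strings.HasSuffix(pattern, "$") {
		pattern, suffix = pattern[:len(pattern)-1], "" // anchored at end: no trailing wildcard
	}
	if strings.ContainsAny(pattern, `.*+?()[]{}|\^$`) {
		return "", false
	}
	// % and _ are LIKE wildcards, so escape them inside the literal part.
	literal := strings.NewReplacer("%", `\%`, "_", `\_`).Replace(pattern)
	return prefix + literal + suffix, true
}

func main() {
	for _, p := range []string{"ANSWERED", "FOO$", "^FOO", "A|B"} {
		like, ok := likeFromRegexp(p)
		fmt.Printf("REGEXP %q -> LIKE %q (convertible: %v)\n", p, like, ok)
	}
}
```

Patterns that use alternation or character classes are rejected rather than translated, so a caller would keep REGEXP for those.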

7. Parallel view execution

views.go now executes SQL views in parallel with 4 worker goroutines
Views are independent of each other, so they can run concurrently without conflicts
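
A bounded worker pool of this kind might look like the sketch below. It is not the code in views.go: `runViews`, the file names, and the simulated failure are hypothetical, and the `run` callback stands in for executing one SQL view file.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSimulated is used by the example to show how failures are collected.
var errSimulated = errors.New("simulated failure")

// runViews calls run(file) for every file using at most `workers` goroutines
// and returns the errors keyed by file name.
func runViews(files []string, workers int, run func(string) error) map[string]error {
	jobs := make(chan string)
	failed := make(map[string]error)
	var mu sync.Mutex // protects failed
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range jobs {
				if err := run(f); err != nil {
					mu.Lock()
					failed[f] = err
					mu.Unlock()
				}
			}
		}()
	}

	for _, f := range files {
		jobs <- f // blocks until a worker is free, which bounds concurrency
	}
	close(jobs)
	wg.Wait()
	return failed
}

func main() {
	files := []string{"view_a.sql", "view_b.sql", "view_c.sql", "view_d.sql", "view_e.sql"}
	failed := runViews(files, 4, func(f string) error {
		if f == "view_e.sql" {
			return errSimulated
		}
		return nil // a real runner would execute the file's SQL here
	})
	fmt.Println(len(failed), "view(s) failed")
}
```

Collecting errors per file instead of stopping at the first one lets the remaining independent views finish.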

Benchmark (installation with ~665K records/year, ~8M total)

Typical dashboard query (SELECT type, COUNT(*) ... WHERE type='IN' AND calldate >= ... GROUP BY type):

| | Time | Rows scanned |
|---|---|---|
| Before (full scan) | 1.25s | 665,282 |
| After (index scan) | 0.092s | 332,641 |
| Speedup | 13.6x | Using index (index-only) |

Daily INSERT on source cdr table:

| | Scan type | Rows |
|---|---|---|
| Before | ALL | 7,756,643 |
| After | range | 59,302 |
| Speedup | 130x fewer rows | |

Migration

No manual operations required. Migration is fully automatic:

SQL templates (cdr_year.sql, cdr_month.sql, dashboard views): re-executed nightly by cron (tasks cdr + tasks views), so they update automatically on the first run after the update.
Indexes on existing tables: ensureCDRSourceIndexes() and ensureTableIndexes() in cdr.go check for index existence via information_schema and add them if missing. After the first run they become a fast no-op (metadata SELECT only).
Geographic columns: ADD COLUMN IF NOT EXISTS + WHERE src_region IS NULL ensures:
- First run after update: all rows are updated (new columns = NULL)
- Subsequent runs: only yesterday's new records are updated (incremental)
Schema template (schema.sql.tmpl): indexes on cdr use information_schema logic for compatibility with the @database_name placeholder.

All operations are idempotent and safe to re-execute.

Test plan

Verify indexes: SHOW INDEX FROM cdr_YYYY WHERE Key_name LIKE 'idx_%'
Verify geo columns: SELECT src_region, COUNT(*) FROM cdr_YYYY WHERE type='IN' GROUP BY src_region
Verify EXPLAIN uses indexes: EXPLAIN SELECT COUNT(*) FROM cdr_YYYY WHERE type='IN' AND calldate >= ...
Run tasks cdr with DEBUG=1 and verify completion
Run tasks views and verify completion
Compare before/after times on an installation with ~2M records

…ed geo columns
Replace date_format() anti-patterns with range scans to enable index usage, add indexes on cdr source table and generated cdr_YYYY/cdr_YYYY-MM tables, pre-compute geographic columns (src_region, src_province, dst_region, dst_province) to eliminate millions of correlated subqueries in dashboard views.

…ize views
- Replace correlated IN (SELECT) subqueries for trunk type detection with a temporary table + LEFT JOIN (avoids per-row subquery on 2M rows)
- Replace all dispositions REGEXP patterns with LIKE equivalents in 12 SQL query templates and Go ExtractDispositions function
- Parallelize view execution in views.go using goroutines with bounded concurrency (4 workers) for independent SQL view files

Points to nethesis/nethvoice-report#207 which includes:
- Indexes on source cdr table and generated cdr_YYYY/cdr_YYYY-MM tables
- Range scans replacing date_format() anti-patterns
- Pre-computed geographic columns eliminating correlated subqueries
- LEFT JOIN for trunk type detection instead of per-row subqueries
- REGEXP replaced with LIKE for disposition filters
- Parallel view execution with goroutines

Prevent update timeout by removing index creation from schema.sql.tmpl (runs during container startup) and SQL templates. All DDL and geo population now runs via separate db.Exec() calls in Go using a dedicated connection pool without read/write timeout. Geo UPDATEs are batched (100K rows) to keep each operation bounded.
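
The batching loop can be sketched independently of the database. This is an assumption-laden illustration, not the PR's code: `updateInBatches` is hypothetical, and the `exec` callback stands in for one db.Exec() of a bounded statement (for example an UPDATE restricted to rows WHERE src_region IS NULL with a row limit; the PR may bound its batches differently).

```go
package main

import "fmt"

// updateInBatches calls exec until a batch affects fewer rows than batchSize
// and returns the total number of rows updated.
// exec receives the batch size and returns the rows affected by one statement.
func updateInBatches(batchSize int64, exec func(limit int64) (int64, error)) (int64, error) {
	var total int64
	for {
		n, err := exec(batchSize)
		if err != nil {
			return total, err
		}
		total += n
		if n == 0 || n < batchSize {
			return total, nil // a short or empty batch means nothing is left
		}
	}
}

func main() {
	remaining := int64(250_000) // pretend 250K rows still have NULL geo columns
	fake := func(limit int64) (int64, error) {
		n := limit
		if remaining < n {
			n = remaining
		}
		remaining -= n
		return n, nil
	}

	total, err := updateInBatches(100_000, fake)
	fmt.Println(total, err)
}
```

Each call touches at most one batch of rows, so no single statement grows with the size of the table.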

Add idempotent index creation for queue_log, queue_log_history, report_queue, and cdr in the miner script. These tables are filtered and joined repeatedly by time range, event, queue, callid, agent, and linkedid during each consolidation run. Without proper indexes, MySQL falls back to wide scans, making the job slow and increasing lock time on busy systems. Creating targeted composite indexes improves selectivity for the hottest predicates, reduces I/O and temporary work, and keeps report generation stable as data grows.

Month tables created before the geo migration don't have the src_region/src_province/dst_region/dst_province columns. When the year table gets them via migrateGeoColumns, the subsequent INSERT ... SELECT * into the month table fails with column count mismatch. Fix by ensuring geo columns exist on month tables before running the month template.

Tables created before the geo migration don't have the geo columns. When cdr_year.sql includes NULL geo placeholders in SELECT, or when cdr_month.sql does SELECT * from a year table with geo columns, the INSERT fails with column count mismatch. Fix by:
- Including geo columns in cdr_year.sql CREATE TABLE definition and both SELECT statements (NULL placeholders)
- Calling ensureGeoColumns() on both year and month tables before running their templates

This reverts commit 0104d29. The indexes update will be in freepbx container startup nethesis/ns8-nethvoice#717

Add conditional CDR index creation during FreePBX startup for upgrades, so existing installations get the required indexes only if missing. Add the same indexes to the CDR schema used on fresh installs to keep new deployments aligned with upgraded ones. Refs: nethesis/nethvoice-report#207

Extend slow database updates to create missing indexes on queue and report tables during upgrades, without changing fresh-install schemas. Add conditional indexes on queue_log, queue_log_history, and report_queue to improve report mining query performance. Refs: nethesis/nethvoice-report#207

edospadoni added 2 commits February 26, 2026 15:03

This was referenced Feb 26, 2026

perf: update nethvoice-reports with CDR query optimizations nethesis/ns8-nethvoice#709 (Closed)

perf: update nethvoice-reports with CDR query optimizations nethesis/ns8-nethvoice#710 (Open)

Stell0 and others added 4 commits February 27, 2026 09:29

Revert "perf(queue-miner): Add indexes for queue report queries"

c96fb39

edospadoni wants to merge 7 commits into ns8 from perf/cdr-query-optimization
