perf(cdr): optimize queries for large CDR installations #207

Open
edospadoni wants to merge 7 commits into ns8 from perf/cdr-query-optimization

@edospadoni
Member

Problem

The tasks component queries take 15-20 minutes on installations with ~2M CDR records.
Main bottlenecks:

  • date_format() on indexable columns preventing index usage
  • No indexes on generated cdr_YYYY and cdr_YYYY-MM tables
  • Correlated subqueries for trunk type detection executed per-row
  • Correlated subqueries for geo-lookup (region/province) executed millions of times
  • No indexes on the source cdr table
  • REGEXP usage where LIKE suffices
  • Sequential execution of SQL views

Optimizations

1. Indexes on source cdr table

  • Added idx_cdr_calldate (calldate) for range scans
  • Added idx_cdr_linkedid (linkedid) for GROUP BY
  • Created both in schema.sql.tmpl and idempotently in cdr.go (ensureCDRSourceIndexes)
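
A minimal sketch of how an idempotent check in the spirit of ensureCDRSourceIndexes could look, assuming MySQL/MariaDB and Go's database/sql. The indexDef type and the function names addIndexSQL and ensureIndexes are illustrative and are not taken from cdr.go.

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"
)

// indexDef describes one secondary index (hypothetical helper type).
type indexDef struct {
	Name    string
	Columns []string
}

// addIndexSQL builds the ALTER TABLE statement for one index.
// Identifiers are backtick-quoted because names like cdr_YYYY-MM contain a dash.
func addIndexSQL(table string, idx indexDef) string {
	cols := make([]string, len(idx.Columns))
	for i, c := range idx.Columns {
		cols[i] = "`" + c + "`"
	}
	return fmt.Sprintf("ALTER TABLE `%s` ADD INDEX `%s` (%s)",
		table, idx.Name, strings.Join(cols, ", "))
}

// ensureIndexes adds every wanted index that information_schema does not
// list yet. When nothing is missing it only runs the metadata SELECT.
func ensureIndexes(db *sql.DB, table string, wanted []indexDef) error {
	rows, err := db.Query(
		`SELECT DISTINCT index_name FROM information_schema.statistics
		 WHERE table_schema = DATABASE() AND table_name = ?`, table)
	if err != nil {
		return err
	}
	existing := map[string]bool{}
	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			rows.Close()
			return err
		}
		existing[name] = true
	}
	if err := rows.Err(); err != nil {
		return err
	}
	for _, idx := range wanted {
		if existing[idx.Name] {
			continue // already present, nothing to do
		}
		if _, err := db.Exec(addIndexSQL(table, idx)); err != nil {
			return err
		}
	}
	return nil
}
```

Calling ensureIndexes(db, "cdr", ...) with idx_cdr_calldate and idx_cdr_linkedid would issue at most two ALTER TABLE statements on the first run and none afterwards.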

2. Fix date_format() anti-pattern with range scans

  • cdr_year.sql: date_format(calldate, "%Y") = "YYYY" replaced with calldate >= 'YYYY-01-01' AND calldate < ... + INTERVAL 1 YEAR
  • cdr_month.sql: date_format(calldate, "%Y-%m") replaced with range scan using INTERVAL 1 MONTH
  • Daily INSERT IGNORE: uses DATE(NOW() - INTERVAL 1 DAY) / DATE(NOW())
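
To illustrate the range-scan rewrite, here is a small sketch that computes the half-open interval [start, end) for a year or a month, so the bounds can be passed as parameters instead of wrapping calldate in date_format(). The helper names and the rangeQuery constant are hypothetical; the PR itself computes the upper bound in SQL with INTERVAL.

```go
package main

import "time"

// dateLayout is Go's reference layout for YYYY-MM-DD.
const dateLayout = "2006-01-02"

// rangeQuery is an illustrative query shape: comparing the bare column
// against constants lets the optimizer use an index on calldate.
const rangeQuery = "SELECT COUNT(*) FROM cdr WHERE calldate >= ? AND calldate < ?"

// yearBounds returns the half-open interval [start, end) for one calendar year.
func yearBounds(year int) (start, end string) {
	s := time.Date(year, time.January, 1, 0, 0, 0, 0, time.UTC)
	return s.Format(dateLayout), s.AddDate(1, 0, 0).Format(dateLayout)
}

// monthBounds returns the half-open interval [start, end) for one calendar month.
func monthBounds(year int, month time.Month) (start, end string) {
	s := time.Date(year, month, 1, 0, 0, 0, 0, time.UTC)
	return s.Format(dateLayout), s.AddDate(0, 1, 0).Format(dateLayout)
}
```

Using an exclusive upper bound avoids having to know the last day of the month or the time precision of calldate.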

3. Pre-compute trunk type with LEFT JOIN

  • Replaced correlated subquery get_trunk_name(channel) IN (SELECT channelid FROM asterisk.trunks) with a temp table _trunk_list + LEFT JOIN
  • From one subquery evaluation per row to a single JOIN against a pre-built list
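
The same idea can be shown as an in-memory analogy: build the trunk list once, then probe it per row. This is only an illustration of why the join is cheaper. The real change is done in SQL, and the trunkName parameter is a hypothetical stand-in for the get_trunk_name() SQL function.

```go
package main

// buildTrunkSet loads the trunk channel IDs once, comparable to
// filling the _trunk_list temporary table before the main query.
func buildTrunkSet(channelIDs []string) map[string]struct{} {
	set := make(map[string]struct{}, len(channelIDs))
	for _, id := range channelIDs {
		set[id] = struct{}{}
	}
	return set
}

// countTrunkCalls probes the pre-built set once per row instead of
// re-reading the trunk list for every row.
func countTrunkCalls(channels []string, trunks map[string]struct{}, trunkName func(string) string) int {
	n := 0
	for _, ch := range channels {
		if _, ok := trunks[trunkName(ch)]; ok {
			n++
		}
	}
	return n
}
```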

4. Indexes on generated cdr_YYYY and cdr_YYYY-MM tables

  • Yearly tables: idx_type_calldate, idx_type, idx_cnum, idx_dst, idx_channel, idx_dstchannel, idx_type_cnum_calldate
  • Monthly tables: idx_type_calldate, idx_type, idx_cnum, idx_dst, idx_calldate
  • Created in SQL templates (ADD INDEX IF NOT EXISTS) and idempotently in Go (ensureTableIndexes)

5. Pre-computed geographic columns

  • Added 4 columns to cdr_YYYY: src_region, src_province, dst_region, dst_province
  • Populated via lookup table on unique phone numbers (order of magnitude fewer than total records)
  • Incremental updates: WHERE src_region IS NULL (only new records)
  • Dashboard views (6, 7, 17, 18) rewritten to read columns directly instead of correlated subqueries
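
The gain from resolving unique numbers instead of every row can be sketched as follows. The geo type, the prefix map, and longest-prefix matching are assumptions made for illustration; the PR does not describe how the lookup table is keyed.

```go
package main

// geo holds the two values stored per number (hypothetical type).
type geo struct{ Region, Province string }

// lookupGeo returns the entry for the longest prefix of number found in prefixes.
func lookupGeo(prefixes map[string]geo, number string) (geo, bool) {
	for n := len(number); n > 0; n-- {
		if g, ok := prefixes[number[:n]]; ok {
			return g, true
		}
	}
	return geo{}, false
}

// resolveUnique resolves each distinct number exactly once and reports
// how many lookups were performed.
func resolveUnique(prefixes map[string]geo, numbers []string) (map[string]geo, int) {
	resolved := make(map[string]geo)
	seen := make(map[string]bool)
	lookups := 0
	for _, num := range numbers {
		if seen[num] {
			continue // already handled, no second lookup
		}
		seen[num] = true
		lookups++
		if g, ok := lookupGeo(prefixes, num); ok {
			resolved[num] = g
		}
	}
	return resolved, lookups
}
```

With many calls from the same numbers, the number of lookups follows the count of distinct numbers rather than the count of CDR rows.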

6. REGEXP replaced with LIKE for dispositions

  • Replaced REGEXP 'ANSWERED' with LIKE '%ANSWERED%' across 12 SQL files
  • Updated Go function ExtractDispositions in utils.go
  • REGEXP 'FOO$' becomes LIKE '%FOO' (semantically equivalent, much faster)
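
A hedged sketch of the conversion rule, for literal patterns only. The function regexpToLike is hypothetical and is not the ExtractDispositions code. It refuses any pattern that uses real regex syntax, and it assumes backslash is the LIKE escape character (the MySQL default).

```go
package main

import "strings"

// regexpToLike converts a literal, optionally anchored REGEXP pattern
// into an equivalent LIKE pattern. ok is false when the pattern uses
// regex metacharacters and cannot be translated safely.
func regexpToLike(pattern string) (like string, ok bool) {
	prefix, suffix := "%", "%"
	if strings.HasPrefix(pattern, "^") {
		pattern = pattern[1:]
		prefix = "" // anchored at the start: no leading wildcard
	}
	if strings.HasSuffix(pattern, "$") {
		pattern = pattern[:len(pattern)-1]
		suffix = "" // anchored at the end: no trailing wildcard
	}
	if pattern == "" || strings.ContainsAny(pattern, `.*+?()[]{}|\^$`) {
		return "", false
	}
	// % and _ are plain characters in REGEXP but wildcards in LIKE.
	pattern = strings.NewReplacer("%", `\%`, "_", `\_`).Replace(pattern)
	return prefix + pattern + suffix, true
}
```

Patterns with alternation such as 'A|B' are rejected here and would need separate LIKE conditions joined with OR.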

7. Parallel view execution

  • views.go now executes SQL views in parallel with 4 worker goroutines
  • Views are independent, no conflicts
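
Bounded parallel execution of this kind is commonly written as a fixed worker pool reading from a channel. The sketch below assumes each view file can run independently; runViews and its run callback are illustrative names, not the code in views.go.

```go
package main

import "sync"

// runViews calls run(file) for every file using at most `workers`
// goroutines and returns the errors keyed by file name.
func runViews(files []string, workers int, run func(file string) error) map[string]error {
	if workers < 1 {
		workers = 1 // at least one worker, otherwise the send below would block forever
	}
	jobs := make(chan string)
	errs := make(map[string]error)
	var mu sync.Mutex // guards errs
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range jobs {
				if err := run(f); err != nil {
					mu.Lock()
					errs[f] = err
					mu.Unlock()
				}
			}
		}()
	}
	for _, f := range files {
		jobs <- f
	}
	close(jobs) // lets the workers exit their range loops
	wg.Wait()
	return errs
}
```

Collecting errors per file instead of stopping at the first failure keeps one broken view from blocking the others.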

Benchmark (installation with ~665K records/year, ~8M total)

Typical dashboard query (SELECT type, COUNT(*) ... WHERE type='IN' AND calldate >= ... GROUP BY type):

                      Time      Rows scanned
Before (full scan)    1.25s     665,282
After (index scan)    0.092s    332,641
Speedup               13.6x     Using index (index-only)

Daily INSERT on source cdr table:

           Scan type    Rows
Before     ALL          7,756,643
After      range        59,302
Speedup    130x         fewer rows

Migration

No manual operations required. Migration is fully automatic:

  1. SQL templates (cdr_year.sql, cdr_month.sql, dashboard views): re-executed nightly by cron (tasks cdr + tasks views), so they update automatically on the first run after the update.

  2. Indexes on existing tables: ensureCDRSourceIndexes() and ensureTableIndexes() in cdr.go check for index existence via information_schema and add them if missing. After the first run they become a fast no-op (metadata SELECT only).

  3. Geographic columns: ADD COLUMN IF NOT EXISTS + WHERE src_region IS NULL ensures:

    • First run after update: all rows are updated (new columns = NULL)
    • Subsequent runs: only yesterday's new records are updated (incremental)

  4. Schema template (schema.sql.tmpl): indexes on cdr use information_schema logic for compatibility with the @database_name placeholder.

All operations are idempotent and safe to re-execute.
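
A later commit in this PR batches the geo UPDATEs (100K rows each) so that each statement stays bounded. A minimal sketch of such a loop, written so it can be exercised without a database: runBatched is a hypothetical helper, and the step callback is assumed to wrap a LIMIT-bounded UPDATE via db.Exec and return RowsAffected().

```go
package main

import "errors"

// runBatched calls step until a batch affects fewer rows than batchSize,
// which signals that no work is left. It returns the total rows affected.
// Re-running it after completion is a cheap no-op: the first step returns 0.
func runBatched(batchSize int64, step func(limit int64) (int64, error)) (int64, error) {
	if batchSize <= 0 {
		return 0, errors.New("batchSize must be positive")
	}
	var total int64
	for {
		n, err := step(batchSize)
		if err != nil {
			return total, err // report progress made before the failure
		}
		total += n
		if n < batchSize {
			return total, nil
		}
	}
}
```

For the loop to terminate, every row a step touches must stop matching the WHERE clause afterwards, so numbers with no geo match need some non-NULL marker or a different selection criterion.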

Test plan

  • Verify indexes: SHOW INDEX FROM cdr_YYYY WHERE Key_name LIKE 'idx_%'
  • Verify geo columns: SELECT src_region, COUNT(*) FROM cdr_YYYY WHERE type='IN' GROUP BY src_region
  • Verify EXPLAIN uses indexes: EXPLAIN SELECT COUNT(*) FROM cdr_YYYY WHERE type='IN' AND calldate >= ...
  • Run tasks cdr with DEBUG=1 and verify completion
  • Run tasks views and verify completion
  • Compare before/after times on installation with ~2M records

…ed geo columns

Replace date_format() anti-patterns with range scans to enable index usage,
add indexes on cdr source table and generated cdr_YYYY/cdr_YYYY-MM tables,
pre-compute geographic columns (src_region, src_province, dst_region, dst_province)
to eliminate millions of correlated subqueries in dashboard views.
…ize views

- Replace correlated IN (SELECT) subqueries for trunk type detection
  with a temporary table + LEFT JOIN (avoids per-row subquery on 2M rows)
- Replace all dispositions REGEXP patterns with LIKE equivalents
  in 12 SQL query templates and Go ExtractDispositions function
- Parallelize view execution in views.go using goroutines with
  bounded concurrency (4 workers) for independent SQL view files
edospadoni added a commit to nethesis/ns8-nethvoice that referenced this pull request Feb 26, 2026
Points to nethesis/nethvoice-report#207 which includes:
- Indexes on source cdr table and generated cdr_YYYY/cdr_YYYY-MM tables
- Range scans replacing date_format() anti-patterns
- Pre-computed geographic columns eliminating correlated subqueries
- LEFT JOIN for trunk type detection instead of per-row subqueries
- REGEXP replaced with LIKE for disposition filters
- Parallel view execution with goroutines

Prevent update timeout by removing index creation from schema.sql.tmpl
(runs during container startup) and SQL templates. All DDL and geo
population now runs via separate db.Exec() calls in Go using a dedicated
connection pool without read/write timeout. Geo UPDATEs are batched
(100K rows) to keep each operation bounded.
edospadoni added a commit to nethesis/ns8-nethvoice that referenced this pull request Feb 26, 2026
Stell0 and others added 4 commits February 27, 2026 09:29
Add idempotent index creation for queue_log, queue_log_history,
report_queue, and cdr in the miner script.

These tables are filtered and joined repeatedly by time range, event,
queue, callid, agent, and linkedid during each consolidation run. Without
proper indexes, MySQL falls back to wide scans, making the job slow and
increasing lock time on busy systems.

Creating targeted composite indexes improves selectivity for the hottest
predicates, reduces I/O and temporary work, and keeps report generation
stable as data grows.

Month tables created before the geo migration don't have the
src_region/src_province/dst_region/dst_province columns. When
the year table gets them via migrateGeoColumns, the subsequent
INSERT ... SELECT * into the month table fails with column count
mismatch. Fix by ensuring geo columns exist on month tables before
running the month template.

Tables created before the geo migration don't have the geo columns.
When cdr_year.sql includes NULL geo placeholders in SELECT, or when
cdr_month.sql does SELECT * from a year table with geo columns, the
INSERT fails with column count mismatch.

Fix by:
- Including geo columns in cdr_year.sql CREATE TABLE definition and
  both SELECT statements (NULL placeholders)
- Calling ensureGeoColumns() on both year and month tables before
  running their templates

This reverts commit 0104d29.

The indexes update will be in freepbx container startup
nethesis/ns8-nethvoice#717
Stell0 added a commit to nethesis/ns8-nethvoice that referenced this pull request Mar 3, 2026
Add conditional CDR index creation during FreePBX startup for upgrades,
so existing installations get the required indexes only if missing.

Add the same indexes to the CDR schema used on fresh installs to keep
new deployments aligned with upgraded ones.

Refs: nethesis/nethvoice-report#207
Stell0 added a commit to nethesis/ns8-nethvoice that referenced this pull request Mar 3, 2026
Extend slow database updates to create missing indexes on queue and
report tables during upgrades, without changing fresh-install schemas.

Add conditional indexes on queue_log, queue_log_history, and
report_queue to improve report mining query performance.

Refs: nethesis/nethvoice-report#207
edospadoni added a commit to nethesis/ns8-nethvoice that referenced this pull request Mar 4, 2026
edospadoni added a commit to nethesis/ns8-nethvoice that referenced this pull request Mar 5, 2026
2 participants