Skip to content

time_bucket() silently dropped from projections after nodedb-sql migration #17

@emanzx

Description

@emanzx

Split out from #15 (numeric aggregate fix landed in PR #16, this part remains).

Build: af9a3ae on main.

time_bucket() is not being executed by the new nodedb-sql planner — function calls are silently elided rather than producing an error. All variants return either an empty result (with GROUP BY) or fall through to SELECT * (without GROUP BY).

Reproduction

rm -rf ~/.local/share/nodedb
NODEDB_PORT_ILP=8086 ./target/release/nodedb &
sleep 4
curl -s -X POST http://127.0.0.1:6480/query -H 'Content-Type: application/json' \
  -d '{"sql":"CREATE TIMESERIES dns_bench"}'

Send 100k rows spanning ~100 seconds (so a 1-hour bucket should trivially yield exactly one row of count = 100000):

import socket, random
from datetime import datetime, timezone
rng = random.Random(42)
ts = int(datetime.now(timezone.utc).timestamp() * 1e9)
for _ in range(20):
    lines = []
    for _ in range(5000):
        cip = "10.{}.{}.{}".format(rng.randint(0,255), rng.randint(0,255), rng.randint(1,254))
        lines.append("dns_bench,client_ip={},qtype=A,rcode=NOERROR,cached=true,qname=example.com elapsed_ms=0.5 {}".format(cip, ts))
        ts += 1_000_000
    s = socket.socket(); s.connect(("127.0.0.1",8086))
    s.sendall(("\n".join(lines)+"\n").encode())
    s.shutdown(socket.SHUT_WR)
    try: s.recv(1024)
    except: pass
    s.close()
Q() { curl -s -X POST http://127.0.0.1:6480/query -H 'Content-Type: application/json' -d "{\"sql\":\"$1\"}"; echo; }

Q "SELECT time_bucket('1 hour', timestamp) AS b, COUNT(*) FROM dns_bench GROUP BY b"
Q "SELECT time_bucket(timestamp, '1 hour') AS b, COUNT(*) FROM dns_bench GROUP BY b"
Q "SELECT time_bucket('1 hour', CAST(timestamp AS BIGINT)) AS b, COUNT(*) FROM dns_bench GROUP BY b"
Q "SELECT time_bucket(3600, timestamp) AS b, COUNT(*) FROM dns_bench GROUP BY b"
Q "SELECT time_bucket('5 minutes', timestamp) AS b, qtype, COUNT(*) FROM dns_bench GROUP BY b, qtype"
Q "SELECT time_bucket('1 hour', timestamp) FROM dns_bench LIMIT 3"

Actual

SELECT time_bucket('1 hour', timestamp) AS b, COUNT(*) ... GROUP BY b
  -> {"rows":[[]], "status":"ok"}    EMPTY

SELECT time_bucket(timestamp, '1 hour') AS b, COUNT(*) ... GROUP BY b
  -> {"rows":[[]], "status":"ok"}    EMPTY (arg-order swap)

SELECT time_bucket('1 hour', CAST(timestamp AS BIGINT)) AS b, COUNT(*) ... GROUP BY b
  -> {"rows":[[]], "status":"ok"}    EMPTY (BIGINT cast)

SELECT time_bucket(3600, timestamp) AS b, COUNT(*) ... GROUP BY b
  -> {"rows":[[]], "status":"ok"}    EMPTY (int seconds literal)

SELECT time_bucket('5 minutes', timestamp), qtype, COUNT(*) ... GROUP BY b, qtype
  -> {"rows":[[]], "status":"ok"}    EMPTY (multi-key group)

SELECT time_bucket('1 hour', timestamp) FROM dns_bench LIMIT 3
  -> {"rows":[[
       {"cached":"true","client_ip":"10.57.12.190","elapsed_ms":0.5,"qname":"example.com",
        "qtype":"A","rcode":"NOERROR","timestamp":1775602707091},
       {"cached":"true","client_ip":"10.140.125.58","elapsed_ms":0.5,"qname":"example.com",
        "qtype":"A","rcode":"NOERROR","timestamp":1775602707092},
       {"cached":"true","client_ip":"10.71.52.174","elapsed_ms":0.5,"qname":"example.com",
        "qtype":"A","rcode":"NOERROR","timestamp":1775602707093}
     ]], "status":"ok"}

The last query is the smoking gun: the projection time_bucket('1 hour', timestamp) is silently dropped, the planner falls through to SELECT *, and the original row schema leaks back. No time_bucket(...) column appears in the output, and there is no error returned. This strongly suggests the function is unregistered in the nodedb-sql function table and the call is being elided rather than rejected.

Expected

SELECT time_bucket('1 hour', timestamp), COUNT(*) FROM dns_bench GROUP BY 1
  -> exactly one row, count_all = 100000
SELECT time_bucket('5 minutes', timestamp), COUNT(*) FROM dns_bench GROUP BY 1
  -> ~2 rows (data spans ~100s, may straddle a 5-min boundary)
SELECT time_bucket('1 hour', timestamp) FROM dns_bench LIMIT 3
  -> 3 rows of a single column named like `time_bucket('1 hour', timestamp)` or aliased

Notes

  • Same behavior in b0d0f68 for the literal-arg form, but in that build the CAST(timestamp AS BIGINT) workaround did return a valid bucket. After the nodedb-sql migration even the cast workaround is broken — nothing routes through. So this is a regression of an already-fragile path; the previous DataFusion-backed time_bucket(Utf8, Int64) registration is gone and nothing replaces it.
  • Related: any benchmark or dashboard query that buckets by time will hit this. It is the only query family in our 10-query DNS-telemetry benchmark suite that still fails after PR fix: resolve null aggregates and broken time_bucket after SQL migration #16; everything else (COUNT, SELECT *, WHERE, GROUP BY tag, MIN/MAX/SUM/AVG) works against ILP-ingested timeseries collections.
  • Suggestion: when the function is registered in nodedb-sql, add a regression test asserting both the GROUP BY shape and the bare-projection shape (SELECT time_bucket(...) FROM t LIMIT 1) so a future silent-drop regression can't recur.

Build: af9a3ae on main. Linux x86_64, release build.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions