feat: Implement date_part scalar function #27005
devanbenz wants to merge 62 commits into master-1.x
Remove a lot of code that wasn't needed for date_part, including iterator creation. We can just map values, as the simple math functions do.
| const ( | ||
| DatePartString = "date_part" | ||
| DatePartTimeString = "date_part_time" |
What is date_part_time? I don't see any tests for it.
It's used to create a reference to time, since time is an auxiliary field: https://github.com/influxdata/influxdb/pull/27005/files#diff-609a7e16be956ed6386e1a4a4efadf600b7d4de7dcfea27330dc692d1e901dc8R930-R944. I'm going to create some ValueMapper tests for this.
@gwossum I can add tests for this but it would likely require exporting
Line 881 in 362217b
davidby-influx left a comment
Some changes from the first pass. Will review again after changes.
| // Multiple selectors WITH date_part should also error | ||
| {s: `SELECT value, first(value), last(value), date_part('dow', time) FROM cpu`, err: `mixing multiple selector functions with tags or fields is not supported`}, | ||
| // date_part subquery validation - cannot be sole field | ||
| {s: `SELECT date_part('dow', value) FROM (SELECT value FROM cpu)`, err: `date_part: second argument must be time VarRef`}, |
In the spec, it says:
expression: Time expression to operate on. Can be a constant, column, or function.
Here it looks like you are prohibiting anything other than the time column. Which is correct?
I need to modify our version of the spec. I think it should only operate on the time column. For v3 they can decide if they would like to add more features and allow it to work on constants.
I just changed the spec in the PR description to match: #27001. I recall we had a conversation yesterday during standup about the usefulness of constant timestamps. I verified that other functions in influxdb 1.x (predominantly math functions) don't allow constants, so implementing a constant here would be non-trivial.
in the group by clause, for example: SELECT count(*) FROM cpu GROUP BY date_part('year', time),date_part('month', time),host
Testing with influx cli changes:
Pull request overview
Adds InfluxQL support for a date_part(part, time) scalar function (including validation, evaluation, and query planning hooks) and extends query execution to support GROUP BY date_part(...) by threading date-part-derived grouping metadata through iterators and result emission.
Changes:
- Implement date_part parsing/validation, runtime evaluation (DatePartValuer), and date-part extraction helpers.
- Add iterator/planner plumbing to (a) evaluate date_part() in WHERE conditions using the point timestamp and (b) support GROUP BY date_part(...) via an extensible DimensionGrouper.
- Extend result/CLI structures to carry and display non-tag grouping metadata (grouping_keys) and add broad integration tests.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tsdb/shard.go | Switch illegal "time" tag/field checks to use models.TimeBytes. |
| tsdb/field_validator.go | Switch illegal "time" field checks to use models.TimeBytes. |
| tsdb/engine/tsm1/iterator.gen.go.tmpl | Add timestamp mapping for condition eval and compute date-part aux values for grouping. |
| tsdb/engine/tsm1/iterator.gen.go | Generated iterator changes mirroring template updates (time mapping + date-part aux). |
| query/date_part.go | New: date_part core implementation (validation, extraction, valuer, grouper + encoding). |
| query/dimension_grouper.go | New: abstraction for non-tag/time grouping key resolution. |
| query/iterator.go | Extend IteratorOptions with date-part dimension support and cache NeedTimeRef. |
| query/select.go | Add date-part GROUP BY dimensions as output columns and adjust mapper handling for date_part. |
| query/iterator.gen.go.tmpl | Teach scanners/reducers about date-part grouping entries and time constant usage. |
| query/iterator.gen.go | Generated query iterator changes for grouping-key resolution and scan handling. |
| query/cursor.go | Add grouping_keys propagation via scanner cursor and enable date_part evaluation via timestamp injection. |
| query/emitter.go | Include grouping keys in row emission and header suppression behavior. |
| query/functions.go | Register date_part return type as Integer. |
| query/compile.go | Allow date_part() in dimensions and conditions; validate date_part calls. |
| query/compile_test.go | Add compile success/failure coverage for date_part. |
| query/date_part_test.go | New: unit tests for parsing/validation/valuer and date-part grouper round-trips. |
| models/time.go | Add models.TimeBytes/models.TimeString shared constants. |
| models/rows.go | Add GroupingKeys to API row model and incorporate it into series equality logic. |
| coordinator/statement_executor.go | Use models.TimeString for result column naming. |
| cmd/influx/cli/cli.go | Suppress headers only when grouping keys match; display grouping keys in formatted output. |
| tests/server_test.go | Large integration suite for date_part in WHERE/SELECT/GROUP BY/subqueries. |
… it out of bounds during min/max
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.
| // date_part in both SELECT and GROUP BY with tag | ||
| // NOTE: The explicit SELECT date_part shows incorrect values (always first group's value) | ||
| // when combined with GROUP BY date_part + tag. | ||
| &Query{ | ||
| name: `SELECT date_part with GROUP BY host and year`, | ||
| command: `SELECT COUNT(value), date_part('year', time) FROM db0.rp0.cpu WHERE time >= '2023-01-01T00:00:00Z' AND time <= '2025-12-31T23:59:59Z' GROUP BY host, date_part('year', time)`, | ||
| exp: `{"results":[{"statement_id":0,"series":[` + | ||
| `{"name":"cpu","tags":{"host":"server01"},"grouping_keys":["year"],"columns":["time","count","date_part","year"],"values":[` + | ||
| `["2023-01-01T00:00:00Z",6,2023,2023]` + | ||
| `]},` + | ||
| `{"name":"cpu","tags":{"host":"server02"},"grouping_keys":["year"],"columns":["time","count","date_part","year"],"values":[` + | ||
| `["2023-01-01T00:00:00Z",6,2023,2024]` + | ||
| `]},` + | ||
| `{"name":"cpu","tags":{"host":"server03"},"grouping_keys":["year"],"columns":["time","count","date_part","year"],"values":[` + | ||
| `["2023-01-01T00:00:00Z",7,2023,2025]` + | ||
| `]}]}]}`, |
This test explicitly encodes known-wrong behavior: SELECT COUNT(value), date_part('year', time) ... GROUP BY ... returns date_part=2023 for all groups (see note), even though the grouped year column is correct (2024/2025). This should be fixed in the implementation (e.g., rewrite date_part(...) in the SELECT list to reference the computed date_part dimension VarRef when it matches a GROUP BY date_part), and then the expected JSON should be updated to assert the correct per-group date_part value.
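The rewrite suggested in that comment could be sketched roughly as follows. This is a toy model only: `field`, `rewriteDatePartFields`, and the text-to-column map are hypothetical, and matching on expression text is a simplification (real code would presumably compare AST nodes rather than strings).

```go
package main

import "fmt"

// field is a toy stand-in for a SELECT-list entry (hypothetical).
type field struct {
	Expr string // expression text, e.g. "date_part('year', time)"
	Ref  string // grouping column to read from instead; empty if not rewritten
}

// rewriteDatePartFields returns a copy of fields in which every entry whose
// expression text matches a GROUP BY date_part dimension refers to that
// dimension's column, so each group reports its own value.
// dims maps expression text to the grouping column name.
func rewriteDatePartFields(fields []field, dims map[string]string) []field {
	out := make([]field, len(fields))
	for i, f := range fields {
		out[i] = f
		if col, ok := dims[f.Expr]; ok {
			out[i].Ref = col
		}
	}
	return out
}

func main() {
	dims := map[string]string{"date_part('year', time)": "year"}
	fields := []field{
		{Expr: "COUNT(value)"},
		{Expr: "date_part('year', time)"},
	}
	for _, f := range rewriteDatePartFields(fields, dims) {
		fmt.Printf("%q -> ref %q\n", f.Expr, f.Ref)
	}
}
```

With such a rewrite in place, the expected JSON in the test above would presumably assert 2023, 2024, and 2025 in the date_part column for the three groups, matching the year column.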
Implement date_part(part, expression)
Arguments
Example usage:
Please see #27001 (comment) for additional information on date_part limitations in 1.x.