Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #406 +/- ##
==========================================
+ Coverage 81.80% 82.07% +0.27%
==========================================
Files 124 124
Lines 14230 14419 +189
==========================================
+ Hits 11641 11835 +194
+ Misses 2589 2584 -5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds local time support for tz-naive timestamps by introducing Chronify-based localization during dataset registration, along with new/updated time-zone configuration semantics and tests.
Changes:
- Adds Chronify-powered timestamp localization helpers and wires them into dataset registration.
- Updates time zone format modeling (rename to
aligned_in_local_std_time, allowtime_zone=Nonefor aligned absolute time). - Adds tests for localization routing and UTC-offset parsing in time-in-parts conversion.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_localize_timestamps_if_necessary.py | New tests validating localization routing and no-op behavior. |
| tests/test_create_time_dimensions.py | Minor test fix for periods argument type. |
| tests/test_convert_time_format_if_necessary.py | New tests for UTC offset parsing and timestamp transformation behavior. |
| tests/data/dimension_models/minimal/dimension_test_time.json5 | Updates time zone format string to new enum value. |
| pyproject.toml | Switches Chronify dependency to a Git URL/branch for local-time work. |
| missing_associations/geography__subsector.csv | Adds missing associations CSV fixture. |
| dsgrid/utils/dataset.py | Adds Chronify localization helpers and localize_timestamps_if_necessary. |
| dsgrid/registry/dataset_registry_manager.py | Integrates time-in-parts conversion and timestamp localization into registration. |
| dsgrid/query/query_submitter.py | Uses TIME_ZONE_COLUMN consistently; refactors Chronify conversion routing. |
| dsgrid/dimension/time.py | Renames time zone enum value to aligned_in_local_std_time. |
| dsgrid/dataset/dataset_schema_handler_base.py | Adds Chronify-based timestamp localization pathway at schema-handler level. |
| dsgrid/config/index_time_dimension_config.py | Updates Chronify return type hints for index time config. |
| dsgrid/config/dimensions.py | Updates time config models (offset column, time zone formats, removes legacy model). |
| dsgrid/config/date_time_dimension_config.py | Adds Chronify dtype support + localization plan logic. |
| dsgrid/common.py | Adds TIME_COLUMN constant. |
| dsgrid-test-data | Updates submodule pointer. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
dsgrid/query/query_submitter.py
Outdated
| match (config.backend_engine, config.use_hive_metastore): | ||
| case (BackendEngine.SPARK, True): | ||
| df = convert_time_zone_with_chronify_spark_hive( | ||
| df=df, | ||
| value_column=VALUE_COLUMN, | ||
| from_time_dim=time_dim, | ||
| time_zone=model.result.time_zone, | ||
| scratch_dir_context=scratch_dir_context, | ||
| ) |
There was a problem hiding this comment.
This refactor removed the explicit time_dim.supports_chronify() validation/guard. If a non-Chronify-capable time dimension reaches this path, failures will likely be harder to interpret (thrown deeper in Chronify conversion). Consider restoring the explicit validation and raising a clear DSGInvalidParameter when Chronify conversion isn’t supported.
dsgrid/utils/dataset.py
Outdated
| runtime_config = dsgrid.runtime_config | ||
| match localization_plan: | ||
| case "localize_to_single_tz": | ||
| to_time_zone = time_dim._get_chronify_time_zone() |
There was a problem hiding this comment.
This shouldn't rely on a private method. Why do we need a chronify-specific method? Can't it be time_dim.get_time_zone?
There was a problem hiding this comment.
Because we need the ZoneInfo obj version of time_dim.get_time_zone.
I don't want to make it a public method, though I can, because the method is only for the datetime time config.
elainethale
left a comment
There was a problem hiding this comment.
Looking at this code brings two main questions to mind:
-
Is it possible to know what operations will be/are applied to bring one time convention in line with another?
-
Are we supporting all time dimension types (when reasonably compatible) as possible project time base dimensions and/or supplemental dimensions (are they queryable)?
elainethale
left a comment
There was a problem hiding this comment.
Here are a couple of additional points raised by Claude that make sense to me and we might want to address:
If someone set an ANNUAL or REPRESENTATIVE_PERIOD [project] base dimension, there's no guard — it would fail only at dataset registration or query time with a confusing error. Consider adding a project-level validator or at least documenting that DATETIME is the expected base type.
AlignedTimeSingleTimeZone.time_zone is now str | None. When None, get_time_zones() returns []. This flows through to get_localization_plan() returning None (no localization). This enables true tz-naive datetime data, which is useful. But it also means ALIGNED_IN_ABSOLUTE_TIME with time_zone=None and TIMESTAMP_NTZ produces data with no timezone information anywhere — which may be confusing for downstream consumers. Worth documenting what this combination means.
This reverts commit e15b0d4.
elainethale
left a comment
There was a problem hiding this comment.
Not quite all the way through, but I want to provide feedback ahead of a meeting.
Changes generally look good. There is a little more work to do on the documentation. Please see my comments and generally take one more pass through from a user perspective.
There was a problem hiding this comment.
@lixiliu, please review all the Time content on this page for accuracy and clarity, starting from the Time heading.
There was a problem hiding this comment.
(Feel free to streamline/remove content in favor of links to other locations if that's appropriate.)
|
|
||
| Although the schema accepts multiple `ranges` entries, dsgrid currently only supports a single continuous range. The time zone is specified through `time_zone_format`, which supports two variants: | ||
|
|
||
| - **`aligned_in_absolute_time`** — all geographies share the same timestamps in absolute time. Provide a single `time_zone` (an IANA time zone string such as `"America/New_York"` or `"Etc/GMT+5"`). |
There was a problem hiding this comment.
Is America/New_York supported? If not, please don't list it and also add some clarifying text so users know they have to use constant offset time zones. If so, are there specific requirements for timestamps around DST transitions?
There was a problem hiding this comment.
Alignment is separate from column_format. Localization is only required when column_format is NTZ and the only time time zones are restricted to fixed offset. In short, time zone requirement is tied to column_format, not alignment. It is why all that is already mentioned in the column_format section preceding it.
| Although the schema accepts multiple `ranges` entries, dsgrid currently only supports a single continuous range. The time zone is specified through `time_zone_format`, which supports two variants: | ||
|
|
||
| - **`aligned_in_absolute_time`** — all geographies share the same timestamps in absolute time. Provide a single `time_zone` (an IANA time zone string such as `"America/New_York"` or `"Etc/GMT+5"`). | ||
| - **`aligned_in_std_clock_time`** — timestamps cover the same interval of standard clock time across geographies (e.g., all of 2012 as experienced locally in standard time). The data table must have a `time_zone` column with per-row IANA time zones. Provide a `time_zones` list of all unique time zones in the data table. |
There was a problem hiding this comment.
This description needs to clarify that only constant-offset IANA time zones are allowed.
Also, can we provide users with a reason why they need to list the time_zones they are using? Do we use that list to pre-check that the time zones are all constant-offset, for example?
There was a problem hiding this comment.
I see that this clarification is below, but it's worth repeating early, often, and very clearly. We can't count on people reading everything carefully and all the way through.
There was a problem hiding this comment.
See column_format preceding it.
|
|
||
| - [How to Create Dataset Dimensions](../how_tos/how_to_dimensions) | ||
| - [How to Define a Time Dimension](../how_tos/how_to_time_dimension) | ||
| - [How to Convert Time Zones](../how_tos/how_to_convert_time_zone) |
There was a problem hiding this comment.
Does time zone conversion belong here or under dataset submittal?
There was a problem hiding this comment.
Good point, it should fall under project query rather than dataset submittal. However I don't see a section on that.
| ## Choosing between the two patterns | ||
|
|
||
| | Scenario | Use | | ||
| |---|---| | ||
| | All data should reflect the same local time | `convert_time_zone` | | ||
| | Each geography has its own local time zone | `convert_time_zone_by_column` | | ||
|
|
||
| --- |
There was a problem hiding this comment.
I find this kind of confusing. dsgrid users don't generally choose the pattern--they describe what data they are providing and what they want out and then dsgrid converts as needed to do what the user wants.
|
|
||
| --- | ||
|
|
||
| ## Query backend |
There was a problem hiding this comment.
This information seems more like detailed information than a how-to. Perhaps this should be a page under "Project Queries"? And then I do think a link from this how-to to that page would be appropriate.
There was a problem hiding this comment.
And as its own page, or as part of its own page, it could be expanded. What might users want to know about the differences between DuckDB and Spark? (And do we have some of that information somewhere else already?)
|
|
||
| | Operation | Input | Output | | ||
| |---|---|---| | ||
| | `localize_time_zone` | `timestamp_ntz` | `timestamp_tz` | |
There was a problem hiding this comment.
Above didn't you say that dsgrid always outputs timestamp_ntz because the backends like to report in system time zone???
There was a problem hiding this comment.
Localize_time_zone makes the timestamps tz-aware...
Convert_time_zone is when dsgrid outputs the timestamps as tz-naive to show the actual time zone conversion. This was what we talked about during the meeting.
There was a problem hiding this comment.
LGTM, @lixiliu -- thanks for sticking with it! @daniel-thom, any remaining concerns?
| ) | ||
| if not is_tz_naive: | ||
| return self | ||
| for tz_name in self.time_zone_format.get_time_zones(): |
There was a problem hiding this comment.
Is this the only real use of the time_zones list in the config?
* rename config params * basic implementation * update * update localization calls * fix integration errors * update test * clean up * remove time zone from column_format * update submodule * Revert "update submodule" This reverts commit e15b0d4. * remove stale features * update * update offset parsing * fix offset parsing * refactor tests * add bound on offset * fix bugs * make spark df * change spark session * update * fix bugs * update localize test * fix bugs * update * make time_zone_column optional * last try * address PR comments * Update submodules * Update chronify branch * address comments * Update instruction on datetime time config * add how-to on time dimension + add tz validator * update * Minor documentation edit. * address comments * Delete missing_associations/geography__subsector.csv * revise test_localize_timestamps_if_necessary * add convert_time_zone how-to * add reference to new material throughout * update doc * fix test for CI * update test for CI * update chronify version * final cleanup * Drop deprecated time config class DatetimeExternalTimeZoneDimensionModel from documentation. * update chronify version * Additional clarification for LocalTimeMultipleTimeZones. * clean up * Edits for time zones, utc offsets, ... section. * reorganize / revise instructions on time * fix reference links * Bump the chronify version. * Point to latest dsgrid-test-data. * update test data * commit for now * fix remaining errors * fix errors * update test * reset chronify version --------- Co-authored-by: Daniel Thom <daniel.thom@nrel.gov> Co-authored-by: Elaine Hale <elaine.hale@nrel.gov> Co-authored-by: Elaine Hale <elaine.hale@nlr.gov>
Closes GitHub Issue #
Description
Local time support.
Allows config to describe tz-naive timestamps that are:
aligned_in_absolute_time(with or without time zone) oraligned_in_local_std_time(where time zones are given by a TIME_ZONE_COLUMN in the data table)Time zone localization is triggered during dataset registration when:
aligned_in_local_std_time, localization is based on the TIME_ZONE_COLUMN from the geography dimension.No time zone localization available when dataset is submitted to project
Chronify: NatLabRockies/chronify#61
Checklist