Skip to content

Ll/local time part2#406

Merged
lixiliu merged 61 commits intomainfrom
ll/local_time_part2
Mar 23, 2026
Merged

Ll/local time part2#406
lixiliu merged 61 commits intomainfrom
ll/local_time_part2

Conversation

@lixiliu
Copy link
Copy Markdown
Contributor

@lixiliu lixiliu commented Jan 23, 2026

Closes GitHub Issue #

Description

Local time support.

Allows config to describe tz-naive timestamps that are:

  • aligned_in_absolute_time(with or without time zone) or
  • aligned_in_local_std_time (where time zones are given by a TIME_ZONE_COLUMN in the data table)

Time zone localization is triggered during dataset registration when:

  • timestamps in data table are tz-naive (TIMESTAMP_NTZ) but time config has a time zone.
  • For aligned_in_local_std_time, localization is based on the TIME_ZONE_COLUMN from the geography dimension.

No time zone localization available when dataset is submitted to project

Chronify: NatLabRockies/chronify#61

Checklist

  • Tests exercising the new feature or bug fix
  • All tests pass
  • At least one code review approval
  • Consider transferring TODOs to GitHub

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 23, 2026

Codecov Report

❌ Patch coverage is 86.87500% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.07%. Comparing base (13f0f42) to head (d5c9b9b).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
dsgrid/config/dimensions.py 69.38% 15 Missing ⚠️
dsgrid/utils/dataset.py 87.82% 14 Missing ⚠️
dsgrid/config/date_time_dimension_config.py 84.00% 8 Missing ⚠️
dsgrid/dimension/time.py 50.00% 3 Missing ⚠️
dsgrid/query/query_submitter.py 92.85% 1 Missing ⚠️
dsgrid/registry/dataset_registry_manager.py 98.79% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #406      +/-   ##
==========================================
+ Coverage   81.80%   82.07%   +0.27%     
==========================================
  Files         124      124              
  Lines       14230    14419     +189     
==========================================
+ Hits        11641    11835     +194     
+ Misses       2589     2584       -5     
Flag Coverage Δ
Linux 80.39% <74.06%> (-1.33%) ⬇️
Windows 80.47% <74.06%> (+0.11%) ⬆️
spark 77.32% <79.06%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@lixiliu lixiliu self-assigned this Jan 24, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds local time support for tz-naive timestamps by introducing Chronify-based localization during dataset registration, along with new/updated time-zone configuration semantics and tests.

Changes:

  • Adds Chronify-powered timestamp localization helpers and wires them into dataset registration.
  • Updates time zone format modeling (rename to aligned_in_local_std_time, allow time_zone=None for aligned absolute time).
  • Adds tests for localization routing and UTC-offset parsing in time-in-parts conversion.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
tests/test_localize_timestamps_if_necessary.py New tests validating localization routing and no-op behavior.
tests/test_create_time_dimensions.py Minor test fix for periods argument type.
tests/test_convert_time_format_if_necessary.py New tests for UTC offset parsing and timestamp transformation behavior.
tests/data/dimension_models/minimal/dimension_test_time.json5 Updates time zone format string to new enum value.
pyproject.toml Switches Chronify dependency to a Git URL/branch for local-time work.
missing_associations/geography__subsector.csv Adds missing associations CSV fixture.
dsgrid/utils/dataset.py Adds Chronify localization helpers and localize_timestamps_if_necessary.
dsgrid/registry/dataset_registry_manager.py Integrates time-in-parts conversion and timestamp localization into registration.
dsgrid/query/query_submitter.py Uses TIME_ZONE_COLUMN consistently; refactors Chronify conversion routing.
dsgrid/dimension/time.py Renames time zone enum value to aligned_in_local_std_time.
dsgrid/dataset/dataset_schema_handler_base.py Adds Chronify-based timestamp localization pathway at schema-handler level.
dsgrid/config/index_time_dimension_config.py Updates Chronify return type hints for index time config.
dsgrid/config/dimensions.py Updates time config models (offset column, time zone formats, removes legacy model).
dsgrid/config/date_time_dimension_config.py Adds Chronify dtype support + localization plan logic.
dsgrid/common.py Adds TIME_COLUMN constant.
dsgrid-test-data Updates submodule pointer.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +355 to +363
match (config.backend_engine, config.use_hive_metastore):
case (BackendEngine.SPARK, True):
df = convert_time_zone_with_chronify_spark_hive(
df=df,
value_column=VALUE_COLUMN,
from_time_dim=time_dim,
time_zone=model.result.time_zone,
scratch_dir_context=scratch_dir_context,
)
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactor removed the explicit time_dim.supports_chronify() validation/guard. If a non-Chronify-capable time dimension reaches this path, failures will likely be harder to interpret (thrown deeper in Chronify conversion). Consider restoring the explicit validation and raising a clear DSGInvalidParameter when Chronify conversion isn’t supported.

Copilot uses AI. Check for mistakes.
runtime_config = dsgrid.runtime_config
match localization_plan:
case "localize_to_single_tz":
to_time_zone = time_dim._get_chronify_time_zone()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't rely on a private method. Why do we need a chronify-specific method? Can't it be time_dim.get_time_zone?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we need the ZoneInfo obj version of time_dim.get_time_zone.

I don't want to make it a public method, though I can, because the method is only for the datetime time config.

@lixiliu lixiliu requested a review from daniel-thom February 16, 2026 10:28
@elainethale elainethale self-requested a review February 19, 2026 20:12
Copy link
Copy Markdown
Contributor

@elainethale elainethale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this code brings two main questions to mind:

  1. Is it possible to know what operations will be/are applied to bring one time convention in line with another?

  2. Are we supporting all time dimension types (when reasonably compatible) as possible project time base dimensions and/or supplemental dimensions (are they queryable)?

Copy link
Copy Markdown
Contributor

@elainethale elainethale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a couple of additional points raised by Claude that make sense to me and we might want to address:

If someone set an ANNUAL or REPRESENTATIVE_PERIOD [project] base dimension, there's no guard — it would fail only at dataset registration or query time with a confusing error. Consider adding a project-level validator or at least documenting that DATETIME is the expected base type.
AlignedTimeSingleTimeZone.time_zone is now str | None. When None, get_time_zones() returns []. This flows through to get_localization_plan() returning None (no localization). This enables true tz-naive datetime data, which is useful. But it also means ALIGNED_IN_ABSOLUTE_TIME with time_zone=None and TIMESTAMP_NTZ produces data with no timezone information anywhere — which may be confusing for downstream consumers. Worth documenting what this combination means.

@lixiliu lixiliu requested a review from elainethale March 13, 2026 01:25
Copy link
Copy Markdown
Contributor

@elainethale elainethale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite all the way through, but I want to provide feedback ahead of a meeting.

Changes generally look good. There is a little more work to do on the documentation. Please see my comments and generally take one more pass through from a user perspective.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lixiliu, please review all the Time content on this page for accuracy and clarity, starting from the Time heading.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Feel free to streamline/remove content in favor of links to other locations if that's appropriate.)


Although the schema accepts multiple `ranges` entries, dsgrid currently only supports a single continuous range. The time zone is specified through `time_zone_format`, which supports two variants:

- **`aligned_in_absolute_time`** — all geographies share the same timestamps in absolute time. Provide a single `time_zone` (an IANA time zone string such as `"America/New_York"` or `"Etc/GMT+5"`).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is America/New_York supported? If not, please don't list it and also add some clarifying text so users know they have to use constant offset time zones. If so, are there specific requirements for timestamps around DST transitions?

Copy link
Copy Markdown
Contributor Author

@lixiliu lixiliu Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alignment is separate from column_format. Localization is only required when column_format is NTZ and the only time time zones are restricted to fixed offset. In short, time zone requirement is tied to column_format, not alignment. It is why all that is already mentioned in the column_format section preceding it.

Although the schema accepts multiple `ranges` entries, dsgrid currently only supports a single continuous range. The time zone is specified through `time_zone_format`, which supports two variants:

- **`aligned_in_absolute_time`** — all geographies share the same timestamps in absolute time. Provide a single `time_zone` (an IANA time zone string such as `"America/New_York"` or `"Etc/GMT+5"`).
- **`aligned_in_std_clock_time`** — timestamps cover the same interval of standard clock time across geographies (e.g., all of 2012 as experienced locally in standard time). The data table must have a `time_zone` column with per-row IANA time zones. Provide a `time_zones` list of all unique time zones in the data table.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description needs to clarify that only constant-offset IANA time zones are allowed.

Also, can we provide users with a reason why they need to list the time_zones they are using? Do we use that list to pre-check that the time zones are all constant-offset, for example?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this clarification is below, but it's worth repeating early, often, and very clearly. We can't count on people reading everything carefully and all the way through.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See column_format preceding it.


- [How to Create Dataset Dimensions](../how_tos/how_to_dimensions)
- [How to Define a Time Dimension](../how_tos/how_to_time_dimension)
- [How to Convert Time Zones](../how_tos/how_to_convert_time_zone)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does time zone conversion belong here or under dataset submittal?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, it should fall under project query rather than dataset submittal. However I don't see a section on that.

Comment on lines +186 to +193
## Choosing between the two patterns

| Scenario | Use |
|---|---|
| All data should reflect the same local time | `convert_time_zone` |
| Each geography has its own local time zone | `convert_time_zone_by_column` |

---
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this kind of confusing. dsgrid users don't generally choose the pattern--they describe what data they are providing and what they want out and then dsgrid converts as needed to do what the user wants.


---

## Query backend
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This information seems more like detailed information than a how-to. Perhaps this should be a page under "Project Queries"? And then I do think a link from this how-to to that page would be appropriate.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as its own page, or as part of its own page, it could be expanded. What might users want to know about the differences between DuckDB and Spark? (And do we have some of that information somewhere else already?)


| Operation | Input | Output |
|---|---|---|
| `localize_time_zone` | `timestamp_ntz` | `timestamp_tz` |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above didn't you say that dsgrid always outputs timestamp_ntz because the backends like to report in system time zone???

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Localize_time_zone makes the timestamps tz-aware...
Convert_time_zone is when dsgrid outputs the timestamps as tz-naive to show the actual time zone conversion. This was what we talked about during the meeting.

@lixiliu lixiliu requested a review from elainethale March 17, 2026 05:23
@elainethale elainethale added the run-spark-tests If included, CI runs Spark tests. Required before merging. label Mar 17, 2026
Copy link
Copy Markdown
Contributor

@elainethale elainethale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @lixiliu -- thanks for sticking with it! @daniel-thom, any remaining concerns?

)
if not is_tz_naive:
return self
for tz_name in self.time_zone_format.get_time_zones():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only real use of the time_zones list in the config?

@lixiliu lixiliu merged commit e85bd71 into main Mar 23, 2026
6 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 23, 2026
* rename config params

* basic implementation

* update

* update localization calls

* fix integration errors

* update test

* clean up

* remove time zone from column_format

* update submodule

* Revert "update submodule"

This reverts commit e15b0d4.

* remove stale features

* update

* update offset parsing

* fix offset parsing

* refactor tests

* add bound on offset

* fix bugs

* make spark df

* change spark session

* update

* fix bugs

* update localize test

* fix bugs

* update

* make time_zone_column optional

* last try

* address PR comments

* Update submodules

* Update chronify branch

* address comments

* Update instruction on datetime time config

* add how-to on time dimension + add tz validator

* update

* Minor documentation edit.

* address comments

* Delete missing_associations/geography__subsector.csv

* revise test_localize_timestamps_if_necessary

* add convert_time_zone how-to

* add reference to new material throughout

* update doc

* fix test for CI

* update test for CI

* update chronify version

* final cleanup

* Drop deprecated time config class DatetimeExternalTimeZoneDimensionModel from documentation.

* update chronify version

* Additional clarification for LocalTimeMultipleTimeZones.

* clean up

* Edits for time zones, utc offsets, ... section.

* reorganize / revise instructions on time

* fix reference links

* Bump the chronify version.

* Point to latest dsgrid-test-data.

* update test data

* commit for now

* fix remaining errors

* fix errors

* update test

* reset chronify version

---------

Co-authored-by: Daniel Thom <daniel.thom@nrel.gov>
Co-authored-by: Elaine Hale <elaine.hale@nrel.gov>
Co-authored-by: Elaine Hale <elaine.hale@nlr.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-spark-tests If included, CI runs Spark tests. Required before merging.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants