Skip to content

feat: add tags to asset migration#653

Open
seanmreidy wants to merge 2 commits intomainfrom
03-12-feat_add_tags_to_asset_migration
Open

feat: add tags to asset migration#653
seanmreidy wants to merge 2 commits intomainfrom
03-12-feat_add_tags_to_asset_migration

Conversation

@seanmreidy
Copy link
Contributor

@seanmreidy seanmreidy commented Mar 12, 2026

Fixes MDEV-67

Description

We weren't migrating tags as part of datascope migration in the asset_migrator - this change does that.

Testing

Testing details in Linear ticket.

Copy link
Contributor Author

seanmreidy commented Mar 12, 2026

@seanmreidy seanmreidy marked this pull request as ready for review March 12, 2026 13:49
@greptile-apps
Copy link

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR fixes missing series_tags propagation during datascope migration in AssetMigrator. Previously, add_dataset was called without passing the tags from the source data scope; the new implementation fetches the raw DataScope objects via _list_dataset_scopes() to capture series_tags and passes them through to new_asset.add_dataset(...). A few housekeeping refactors accompany the fix: the _copy_optional_runs helper is inlined and the dataset-copy path is now gated behind an explicit dataset_config is not None check.

Key observations:

  • Core fix looks correct: source_data_scope.series_tags is properly captured and forwarded to add_dataset, which accepts Mapping[str, str] | None — the types align.
  • Dead code: _resolve_destination_dataset (lines 93–110) is no longer called anywhere after its logic was inlined into _copy_asset_datasets; it should be removed.
  • Extra API round-trip: _list_dataset_scopes() and list_datasets() each call _get_latest_api() independently, resulting in two asset-fetch requests where one would suffice.
  • The behavior change around inaccessible datasets (raising ValueError instead of silently skipping) was raised in a previous review thread and is out of scope here.

Confidence Score: 4/5

  • Safe to merge — the core tag-propagation fix is correct and the issues found are minor style/performance concerns.
  • The primary functional change (forwarding series_tags via _list_dataset_scopes()) is correctly implemented and the types align with add_dataset's signature. The two flagged issues (dead code and an extra _get_latest_api() round-trip) are non-blocking; they don't affect correctness or data integrity.
  • nominal/experimental/migration/migrator/asset_migrator.py — contains dead _resolve_destination_dataset method and a redundant API call in _copy_asset_datasets.

Important Files Changed

Filename Overview
nominal/experimental/migration/migrator/asset_migrator.py Adds series_tags propagation when migrating asset datasets; refactors _copy_optional_runs inline and guards _copy_asset_datasets with a dataset_config is not None check. Two minor issues: _resolve_destination_dataset is now dead code, and _list_dataset_scopes() + list_datasets() make redundant back-to-back _get_latest_api() calls.

Sequence Diagram

sequenceDiagram
    participant AM as AssetMigrator
    participant SA as source_asset (Asset)
    participant API as _get_latest_api()
    participant Cat as Catalog API
    participant DM as DatasetMigrator
    participant DA as destination_asset (Asset)

    AM->>SA: _list_dataset_scopes()
    SA->>API: _get_latest_api() [call 1]
    API-->>SA: DataScope[] (with series_tags)
    SA-->>AM: source_data_scopes

    AM->>SA: list_datasets()
    SA->>API: _get_latest_api() [call 2 – redundant]
    API-->>SA: scope RID map
    SA->>Cat: _get_datasets(rids)
    Cat-->>SA: Dataset objects
    SA-->>AM: {dataset_rid: Dataset}

    loop for each DataScope
        AM->>DM: copy_from(source_dataset, DatasetCopyOptions)
        DM-->>AM: new_dataset
        AM->>DA: add_dataset(scope_name, new_dataset, series_tags=source_series_tags)
    end
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nominal/experimental/migration/migrator/asset_migrator.py
Line: 93-110

Comment:
**`_resolve_destination_dataset` is now dead code**

After the refactor, `_resolve_destination_dataset` is defined but never called anywhere — the logic it previously encapsulated was inlined directly into the new `_copy_asset_datasets` implementation. A quick grep confirms there are no callers of this method anywhere in the repository.

Consider removing the method to avoid confusion for future readers who might wonder why it exists.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: nominal/experimental/migration/migrator/asset_migrator.py
Line: 125-126

Comment:
**Redundant `_get_latest_api()` call**

Both `_list_dataset_scopes()` and `list_datasets()` independently call `_get_latest_api()` internally (which makes a network round-trip to fetch the asset). This means the asset is fetched **twice** in sequence — once to get the `DataScope` objects (for `series_tags`) and once more to get the dataset RID-to-scope mapping.

The two results could be derived from a single `_get_latest_api()` call, e.g. by iterating over `source_asset._list_dataset_scopes()` alone and then calling `_get_datasets()` directly on the collected RIDs, avoiding the extra hop entirely.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 671c45c

greptile-apps[bot]

This comment was marked as resolved.

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Open in Devin Review

@seanmreidy seanmreidy force-pushed the 03-12-feat_add_tags_to_asset_migration branch from aba04d4 to ae1edad Compare March 12, 2026 17:30
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 7 additional findings in Devin Review.

Open in Devin Review

@seanmreidy seanmreidy force-pushed the 03-12-feat_add_tags_to_asset_migration branch from ae1edad to ab864e7 Compare March 12, 2026 18:02
devin-ai-integration[bot]

This comment was marked as resolved.

@seanmreidy seanmreidy force-pushed the 03-12-feat_add_tags_to_asset_migration branch from ab864e7 to 08979cd Compare March 12, 2026 18:06
devin-ai-integration[bot]

This comment was marked as resolved.

@seanmreidy seanmreidy changed the base branch from 03-03-feat_refactor_migration_library_and_introduce_statefulness to graphite-base/653 March 12, 2026 21:02
@seanmreidy seanmreidy force-pushed the 03-12-feat_add_tags_to_asset_migration branch from 08979cd to 1eda91a Compare March 12, 2026 21:12
@seanmreidy seanmreidy changed the base branch from graphite-base/653 to main March 12, 2026 21:12
),
)

def _copy_asset_datasets(self, source_asset: Asset, new_asset: Asset, options: AssetCopyOptions) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we consistent with this naming scheme? I could have sworn we were doing source_xyz vs. dest_xyz but who knows.

dataset_mapping = options.old_to_new_dataset_rid_mapping

source_data_scopes = source_asset._list_dataset_scopes()
source_datasets = {data_scope[1].rid: data_scope[1] for data_scope in source_asset.list_datasets()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more readable like:

source_datasets = {ds.rid: ds for _, ds in source_asset.list_datasets()}

Comment on lines +138 to +148
if source_dataset_rid in dataset_mapping:
new_dataset_rid = dataset_mapping[source_dataset_rid]
new_dataset = self.ctx.destination_client.get_dataset(new_dataset_rid)
else:
new_dataset = dataset_migrator.copy_from(
source_dataset,
DatasetCopyOptions(
include_files=options.dataset_config.include_dataset_files,
preserve_uuid=options.dataset_config.preserve_dataset_uuid,
),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conditional assignment :)

@drake-nominal drake-nominal force-pushed the 03-12-feat_add_tags_to_asset_migration branch from 1eda91a to 671c45c Compare March 12, 2026 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants