Code Review
This pull request implements a deduplication strategy for metrics by introducing a write_epoch to track the latest writes for specific key-step pairs. The dedupe_metrics_by_key_step function is utilized in the upload pipeline to filter redundant data. Feedback suggests using generator expressions for memory efficiency and merging a duplicated test class.
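As a rough illustration of the idea summarized above, the sketch below keeps only the most recent write for each key-step pair. The `Metric` class, its `epoch` field, and the function name `dedupe_latest_write` are hypothetical stand-ins chosen for this example; the PR's actual `dedupe_metrics_by_key_step` and its models may differ.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Metric:
    # Hypothetical stand-in for the PR's scalar/media models.
    key: str
    step: int
    epoch: int  # assumed: the write epoch recorded when the value was written
    value: float


def dedupe_latest_write(metrics: Iterable[Metric]) -> List[Metric]:
    """Keep, for each (key, step) pair, the metric with the highest epoch."""
    latest: Dict[Tuple[str, int], Metric] = {}
    for metric in metrics:
        pair = (metric.key, metric.step)
        kept = latest.get(pair)
        # ">=" lets a later item win when two writes share an epoch.
        if kept is None or metric.epoch >= kept.epoch:
            latest[pair] = metric
    # Dicts preserve insertion order, so results follow first appearance of each pair.
    return list(latest.values())


metrics = [
    Metric("loss", 0, 1, 0.9),
    Metric("loss", 1, 2, 0.8),
    Metric("loss", 0, 3, 0.5),  # overwrites step 0 with a newer epoch
]
deduped = dedupe_latest_write(metrics)
```

Here the second write to step 0 replaces the first because its epoch is higher, while step 1 is left untouched.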
    scalars = dedupe_metrics_by_key_step(
        list(filter(self._filter_scalar_by_step, [scalar.to_scalar_model() for scalar in self._scalars]))
    )
    medias = dedupe_metrics_by_key_step(
        list(filter(self._filter_media_by_step, [media.to_media_model(self._run_store.media_dir) for media in self._medias]))
    )
Using list comprehensions [...] here builds complete lists of ScalarModel and MediaModel objects in memory before any filtering happens. For a large number of metrics, this can be memory-intensive. Generator expressions (...) would be more memory-efficient: each model is created only when filter consumes it, so models rejected by the filter are never held in memory alongside the ones that are kept.
-   scalars = dedupe_metrics_by_key_step(
-       list(filter(self._filter_scalar_by_step, [scalar.to_scalar_model() for scalar in self._scalars]))
-   )
-   medias = dedupe_metrics_by_key_step(
-       list(filter(self._filter_media_by_step, [media.to_media_model(self._run_store.media_dir) for media in self._medias]))
-   )
+   scalars = dedupe_metrics_by_key_step(
+       list(filter(self._filter_scalar_by_step, (scalar.to_scalar_model() for scalar in self._scalars)))
+   )
+   medias = dedupe_metrics_by_key_step(
+       list(filter(self._filter_media_by_step, (media.to_media_model(self._run_store.media_dir) for media in self._medias)))
+   )
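The difference in evaluation order can be seen in a small standalone sketch. The helpers `to_model` and `keep_even` are made up for this demonstration and only record when they are called; they are not part of the PR.

```python
events = []


def to_model(x):
    # Hypothetical model constructor; records when each model is built.
    events.append(f"build {x}")
    return x


def keep_even(x):
    # Hypothetical filter predicate; records when each model is checked.
    events.append(f"filter {x}")
    return x % 2 == 0


# List comprehension: every model is built before the first filter call.
eager = list(filter(keep_even, [to_model(x) for x in range(3)]))
eager_events = events[:]
events.clear()

# Generator expression: each model is built only when filter asks for it.
lazy = list(filter(keep_even, (to_model(x) for x in range(3))))
lazy_events = events[:]
```

Both variants produce the same filtered list; only the point at which each model is constructed changes, so the generator form never holds the full unfiltered list at once.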
    def test_overwrite_uses_latest_write_epoch_but_same_storage_file(self):
        with UseMockRunState() as run_state:
            key_obj = self._new_key(run_state)

            first = self._add_line(key_obj, 1, 0)
            overwrite = self._add_line(key_obj, 2, 0)

            assert overwrite.metric_overwrite is True
            assert key_obj._step_epochs[0] == 1
            assert key_obj._write_epoch == overwrite.metric_epoch
            assert overwrite.metric_epoch == first.metric_epoch + 1
            assert overwrite.metric_file_path == first.metric_file_path
The local deduplication strategy is somewhat crude; it will be reworked when the new SDK version is refactored.
Description
Closes: #(issue)
🎯 PRs Should Target Issues
None