feat(pt): add plugin for data modifier #4661 (base: devel)
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
📝 Walkthrough
Adds a data-modifier subsystem: a BaseModifier class and factory, dataset/loader integration and caching, propagation into training/inference/freeze flows (serialized as an extra file in the .pth), ModelWrapper support to subtract modifier outputs at runtime, and unit tests for modifier behaviors.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor CLI as Entrypoint
participant Factory as get_data_modifier
participant Loader as DpLoaderSet / Dataset
participant Data as DeepmdData
participant Trainer as ModelWrapper
participant Freezer as Script Export
participant Eval as DeepEval
CLI->>Factory: request modifier from model params
Factory-->>CLI: BaseModifier instance (jitable or not)
CLI->>Loader: construct loaders with modifier
Loader->>Data: attach modifier to dataset
Loader->>Data: preload_and_modify_all_data_torch(num_worker)
Data->>Data: apply modifier per-frame (threaded) and cache
Loader-->>CLI: loaders ready
CLI->>Trainer: instantiate ModelWrapper with modifier
Trainer->>Trainer: (inference) compute model_pred
Trainer->>Trainer: compute modifier_pred and subtract from model_pred
CLI->>Freezer: freeze -> serialize modifier into extra_files["data_modifier.pth"]
Eval->>Freezer: load .pth with extra_files
Freezer-->>Eval: provide `data_modifier.pth` bytes
Eval->>Factory: torch.jit.load bytes -> modifier instance
Eval->>Trainer: instantiate ModelWrapper with loaded modifier
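Behind the freeze/load steps in the diagram is PyTorch's _extra_files mechanism on torch.jit.save and torch.jit.load. A minimal sketch of the round trip, assuming a scriptable toy modifier (everything except the _extra_files API itself is illustrative):

import io
import torch

class ToyModifier(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 0.5 * x  # stand-in for a real data modifier

# freeze side: serialize the scripted modifier into an in-memory buffer
buf = io.BytesIO()
torch.jit.save(torch.jit.script(ToyModifier()), buf)
extra_files = {"data_modifier.pth": buf.getvalue()}
torch.jit.save(torch.jit.script(ToyModifier()), "frozen_model.pth", _extra_files=extra_files)

# eval side: torch.jit.load fills the dict with the embedded bytes
extra = {"data_modifier.pth": ""}
model = torch.jit.load("frozen_model.pth", _extra_files=extra)
modifier = torch.jit.load(io.BytesIO(extra["data_modifier.pth"]))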
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
📜 Recent review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used
🧬 Code graph analysis (1)
deepmd/pt/utils/dataset.py (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
🔇 Additional comments (4)
Actionable comments posted: 0
🧹 Nitpick comments (2)
deepmd/pt/modifier/base_modifier.py (2)
41-44: Consider simplifying the box assignment.
A ternary operator can reduce verbosity here:
- if data["box"] is None:
-     box = None
- else:
-     box = data["box"][:get_nframes, :]
+ box = None if data["box"] is None else data["box"][:get_nframes, :]
🧰 Tools
🪛 Ruff (0.8.2)
41-44: Use ternary operator box = None if data["box"] is None else data["box"][:get_nframes, :] instead of if-else-block
Replace if-else-block with box = None if data["box"] is None else data["box"][:get_nframes, :]
(SIM108)
47-47: Remove or use the nframes variable.
Currently, nframes = coord.shape[0] is not used, which may confuse future maintainers.
🧰 Tools
🪛 Ruff (0.8.2)
47-47: Local variable nframes is assigned to but never used
Remove assignment to unused variable nframes
(F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- deepmd/pt/modifier/__init__.py (1 hunks)
- deepmd/pt/modifier/base_modifier.py (1 hunks)
- deepmd/pt/train/training.py (7 hunks)
- deepmd/pt/utils/stat.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
deepmd/pt/modifier/__init__.py (1)
deepmd/pt/modifier/base_modifier.py (1)
BaseModifier (9-56)
deepmd/pt/train/training.py (3)
deepmd/pt/modifier/base_modifier.py (2)
BaseModifier (9-56)
modify_data (14-56)
deepmd/pd/train/training.py (1)
get_additional_data_requirement (1163-1187)
deepmd/pd/utils/stat.py (1)
make_stat_input (40-85)
🪛 Ruff (0.8.2)
deepmd/pt/modifier/base_modifier.py
41-44: Use ternary operator box = None if data["box"] is None else data["box"][:get_nframes, :] instead of if-else-block
Replace if-else-block with box = None if data["box"] is None else data["box"][:get_nframes, :]
(SIM108)
47-47: Local variable nframes is assigned to but never used
Remove assignment to unused variable nframes
(F841)
⏰ Context from checks skipped due to timeout of 90000ms (29)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Test C++ (false)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Analyze (python)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Test C++ (true)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Test Python (1, 3.9)
🔇 Additional comments (11)
deepmd/pt/modifier/__init__.py (1)
1-8: Good job exposing the BaseModifier API.
This new __init__.py cleanly re-exports the BaseModifier class and ensures users can import it directly from deepmd.pt.modifier.
deepmd/pt/utils/stat.py (2)
50-51: Conditional logging is well-handled.
Only logging when nbatches > 0 helps keep logs cleaner in scenarios where no batches are processed.
56-59: Logic for handling nbatches == -1 is clear and correct.
This new condition ensures the entire dataset is used when nbatches is -1. No issues found.
deepmd/pt/train/training.py (8)
39-41: Import of BaseModifier is appropriate.
This import makes the newly introduced functionality available where needed.
140-149: Modifier parameter handling is well-structured.
The assertion preventing usage in multi-task scenarios is clear and avoids incompatible configurations.
231-231: Defaulting modifier to None is appropriate.
Makes the modifier usage optional without complicating the training interface.
239-250: Verify data modification logic.
Applying modifier.modify_data to every system might lead to repeated transformations if single_model_stat is called multiple times. Confirm this matches your intended workflow.
345-345: Single-model signature usage is consistent.
Passing modifier=self.modifier ensures the same modifier instance is applied throughout the training flow.
384-384: Multi-task signature usage is consistent.
Again, passing modifier=self.modifier allows uniform data processing across tasks if needed.
1075-1081: Storing the data_modifier state is a good idea.
Consider providing a loading mechanism in the future so that data_modifier can be restored automatically.
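A hypothetical sketch of such a restore path, assuming the checkpoint stores the modifier's serialized parameters under a "data_modifier" key (both the key and the factory call are assumptions based on this PR's naming, not confirmed code):

import torch

def restore_data_modifier(checkpoint_path: str):
    """Hypothetical loader: rebuild a modifier saved in a training checkpoint."""
    state = torch.load(checkpoint_path, map_location="cpu")
    modifier_params = state.get("data_modifier")  # assumed checkpoint key
    if modifier_params is None:
        return None
    return get_data_modifier(modifier_params)  # factory like the one sketched below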
1389-1400: Factory function for data modifiers looks good.
Encapsulates logic for dynamically obtaining modifier classes, making the code more extensible.
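The factory itself is not shown in the excerpt; a minimal sketch of what a type-keyed plugin factory of this shape typically looks like, assuming a registry lookup such as BaseModifier.get_class_by_type (the later review confirms the deep copy, the "type" validation, and the from None pattern):

import copy

def get_data_modifier(modifier_params: dict):
    """Instantiate a registered modifier from its config dict."""
    params = copy.deepcopy(modifier_params)  # avoid mutating the caller's config
    try:
        modifier_type = params.pop("type")
    except KeyError:
        raise KeyError("data modifier config must contain a 'type' key") from None
    cls = BaseModifier.get_class_by_type(modifier_type)  # plugin registry lookup
    return cls(**params)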
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## devel #4661 +/- ##
==========================================
- Coverage 82.15% 81.95% -0.21%
==========================================
Files 709 712 +3
Lines 72468 72824 +356
Branches 3616 3616
==========================================
+ Hits 59535 59681 +146
- Misses 11769 11981 +212
+ Partials 1164 1162 -2
☔ View full report in Codecov by Sentry.
wanghan-iapcm
left a comment
please add UT for the implementation.
Pull Request Overview
This PR introduces a new plugin for modifying data in the PyTorch backend and integrates it into the training workflow. Key changes include:
- Creation of a new BaseModifier class and registration in the modifier package.
- Integration of the data modifier into the training process, including saving its state.
- Minor adjustments to the statistics data preparation in the utils module.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| deepmd/pt/modifier/__init__.py | Exposes BaseModifier for external use. |
| deepmd/pt/modifier/base_modifier.py | Adds a new BaseModifier class for data modification. |
| deepmd/pt/train/training.py | Integrates the data modifier into training data preparation and model saving. |
| deepmd/pt/utils/stat.py | Tweaks logging and batch calculation in the statistics utility. |
Comments suppressed due to low confidence (2)
deepmd/pt/modifier/base_modifier.py:9
- Ensure that make_base_modifier() returns a valid class to use for multiple inheritance with torch.nn.Module. If it does not, consider revising the inheritance structure or renaming for clarity.
class BaseModifier(torch.nn.Module, make_base_modifier()):
deepmd/pt/modifier/base_modifier.py:40
- The variable get_nframes is explicitly set to None, which will slice the full array; if a limit on the number of frames was intended, assign get_nframes an appropriate value.
coord = data["coord"][:get_nframes, :]
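A quick numpy check of the behavior Copilot describes: a None stop index slices the whole array, so get_nframes = None selects every frame.

import numpy as np

data = np.arange(12).reshape(4, 3)  # 4 "frames", 3 values each
get_nframes = None
assert np.array_equal(data[:get_nframes, :], data)  # [:None, :] == [:, :]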
8d02057 to 859292f (force-push)
Actionable comments posted: 2
♻️ Duplicate comments (2)
deepmd/pt/utils/dataset.py (1)
74-75: Remove or implement the commented-out clear_modified_frame_cache method.
Similar to the dataloader, this commented-out method suggests incomplete implementation. Either implement the functionality if needed, or remove the comment to reduce clutter.
This is the same issue as in deepmd/pt/utils/dataloader.py lines 245-247. Consider a unified approach to cache management across both files.
deepmd/pt/modifier/base_modifier.py (1)
50-50: Remove or use the commented-out nframes variable.
The variable nframes is calculated but immediately commented out and never used. Either remove the comment and use the variable, or delete the line entirely to reduce clutter.
Based on learnings, if this is intentionally kept for future use, please clarify with a TODO comment.
🧹 Nitpick comments (2)
deepmd/pt/modifier/base_modifier.py (1)
17-17: Clarify the purpose of the unused data_sys parameter.
The data_sys parameter is flagged as unused by static analysis. If it's intended for future use or required by subclasses, consider adding a docstring note or a TODO comment. Otherwise, consider removing it to reduce the parameter surface.
deepmd/pt/utils/dataloader.py (1)
245-247: Remove or implement the commented-out clear_modified_frame_cache method.
The commented-out method suggests incomplete implementation. Either implement the cache-clearing functionality if it's needed, or remove the comment to keep the codebase clean.
If cache clearing is required in the future, consider opening an issue to track the feature.
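If the method is ever implemented, a plausible minimal version (hypothetical; _modified_frame_cache follows the naming used elsewhere in this PR):

def clear_modified_frame_cache(self) -> None:
    """Drop cached modifier-processed frames so they are recomputed on next access."""
    if getattr(self, "_modified_frame_cache", None) is not None:
        self._modified_frame_cache.clear()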
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- deepmd/pt/entrypoints/main.py (4 hunks)
- deepmd/pt/modifier/__init__.py (1 hunks)
- deepmd/pt/modifier/base_modifier.py (1 hunks)
- deepmd/pt/train/training.py (2 hunks)
- deepmd/pt/utils/dataloader.py (4 hunks)
- deepmd/pt/utils/dataset.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt/modifier/__init__.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-16T21:50:10.680Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.
Applied to files:
deepmd/pt/modifier/base_modifier.py
🧬 Code graph analysis (4)
deepmd/pt/train/training.py (3)
deepmd/pt/utils/dataloader.py (1)
preload_and_modify_all_data (241-243)
deepmd/pt/utils/dataset.py (1)
preload_and_modify_all_data (71-72)
deepmd/utils/data.py (1)
preload_and_modify_all_data (498-513)
deepmd/pt/entrypoints/main.py (3)
deepmd/pt/modifier/base_modifier.py (1)
BaseModifier (12-60)
deepmd/utils/plugin.py (1)
get_class_by_type (144-154)
deepmd/pt/utils/dataloader.py (1)
DpLoaderSet (65-247)
deepmd/pt/utils/dataset.py (2)
deepmd/pt/modifier/base_modifier.py (1)
BaseModifier (12-60)
deepmd/pt/utils/dataloader.py (1)
preload_and_modify_all_data (241-243)
deepmd/pt/utils/dataloader.py (3)
deepmd/pt/modifier/base_modifier.py (1)
BaseModifier (12-60)
deepmd/pt/utils/dataset.py (1)
preload_and_modify_all_data (71-72)
deepmd/utils/data.py (1)
preload_and_modify_all_data (498-513)
🪛 Ruff (0.14.8)
deepmd/pt/entrypoints/main.py
116-116: Avoid specifying long messages outside the exception class
(TRY003)
deepmd/pt/modifier/base_modifier.py
17-17: Unused method argument: data_sys
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
🔇 Additional comments (9)
deepmd/pt/entrypoints/main.py (3)
111-121: LGTM: Data modifier instantiation is well-structured.
The get_data_modifier function correctly validates the presence of the type field, instantiates the modifier using the plugin system, and moves it to the appropriate device.
129-133: LGTM: Modifier wiring integrates cleanly.
The modifier is correctly extracted from model parameters and instantiated only when present, maintaining backward compatibility for configurations without modifiers.
164-180: LGTM: Modifier is correctly passed to both training and validation data loaders.
The modifier is properly threaded through to both DpLoaderSet instances, ensuring consistent data preprocessing across training and validation.
deepmd/pt/train/training.py (2)
353-355: LGTM: Preload and modify step correctly positioned.
The preload_and_modify_all_data() calls are correctly placed after data requirements are added by single_model_stat and before data loaders are constructed. This ensures the modifier operates on the complete, configured data before training begins.
394-396: LGTM: Multi-model path mirrors single-model preload logic.
The multi-model code path correctly applies preload_and_modify_all_data() to each model's training and validation data, maintaining consistency with the single-model approach.
deepmd/pt/utils/dataloader.py (2)
82-102: LGTM: Modifier parameter correctly threaded through DpLoaderSet.
The modifier parameter is properly added to the constructor signature, stored, and forwarded to each DeepmdDataSetForLoader instance. The optional typing (BaseModifier | None) maintains backward compatibility.
241-243: LGTM: Preload method delegates correctly.
The preload_and_modify_all_data() method properly iterates over all systems and delegates to each system's preload method, enabling batch preprocessing before training.
deepmd/pt/utils/dataset.py (2)
22-40: LGTM: Modifier integration is clean and complete.
The modifier parameter is properly added to the constructor, documented, stored as an instance attribute, and forwarded to the underlying DeepmdData instance. This enables the data modification pipeline while maintaining a clear separation of concerns.
71-72: LGTM: Preload method correctly delegates to the data system.
The preload_and_modify_all_data() method properly delegates to self._data_system.preload_and_modify_all_data(), maintaining the abstraction layers and enabling efficient batch preprocessing.
Actionable comments posted: 3
🧹 Nitpick comments (2)
deepmd/pt/modifier/base_modifier.py (2)
43-43: Clarify or remove the get_nframes pattern.
Setting get_nframes = None and then using it in slicing operations (e.g., [:get_nframes, :]) is equivalent to [:None, :], which means "all frames". This pattern appears to be a placeholder for future functionality but currently adds no value and may confuse readers.
If this is intentional for future extension, consider adding a comment explaining the planned usage. Otherwise, simplify to direct slicing:
- get_nframes = None
- coord = data["coord"][:get_nframes, :]
+ coord = data["coord"]
54-59: Consider adding shape validation before reshape.
The code assumes that the outputs from forward() are compatible with the target shapes in data. If a subclass returns tensors with incompatible shapes, the reshape() operations will fail with potentially unclear error messages.
Consider adding assertions to provide clearer error messages:
if "find_energy" in data and data["find_energy"] == 1.0:
    expected_shape = data["energy"].shape
    assert tot_e.numel() == np.prod(expected_shape), \
        f"Energy shape mismatch: forward returned {tot_e.shape}, expected {expected_shape}"
    data["energy"] -= tot_e.reshape(expected_shape)
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- deepmd/pt/modifier/base_modifier.py (1 hunks)
- deepmd/pt/utils/dataloader.py (4 hunks)
- deepmd/pt/utils/dataset.py (2 hunks)
- source/tests/pt/test_data_modifier.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-16T21:50:10.680Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.
Applied to files:
deepmd/pt/modifier/base_modifier.py
🧬 Code graph analysis (3)
deepmd/pt/utils/dataset.py (2)
deepmd/pt/modifier/base_modifier.py (1)
BaseModifier (12-59)
deepmd/utils/data.py (2)
DeepmdData (34-1069)
preload_and_modify_all_data (498-513)
deepmd/pt/modifier/base_modifier.py (1)
deepmd/dpmodel/modifier/base_modifier.py (1)
make_base_modifier (17-77)
deepmd/pt/utils/dataloader.py (3)
deepmd/pt/modifier/base_modifier.py (1)
BaseModifier (12-59)
deepmd/pt/utils/dataset.py (1)
preload_and_modify_all_data (71-72)
deepmd/utils/data.py (1)
preload_and_modify_all_data (498-513)
preload_and_modify_all_data(498-513)
🪛 Ruff (0.14.8)
source/tests/pt/test_data_modifier.py
81-81: Unused method argument: data_sys
(ARG002)
141-141: Unused method argument: data_sys
(ARG002)
deepmd/pt/modifier/base_modifier.py
17-17: Unused method argument: data_sys
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
🔇 Additional comments (9)
deepmd/pt/modifier/base_modifier.py (2)
1-15: LGTM!
The imports and class structure are clean. The BaseModifier correctly inherits from torch.nn.Module and the dpmodel base class.
36-41: Good defensive check.
The early return when no find_* flags are present avoids unnecessary computation.
source/tests/pt/test_data_modifier.py (3)
28-35: LGTM!
The plugin registrations for test modifiers are correctly structured.
159-173: Good test setup.
The test configuration properly loads a base config and customizes training parameters for testing.
175-223: Excellent test coverage.
The tests effectively validate:
- That data modification is applied during initialization (zero_tester)
- That modification happens exactly once and persists through training (random_tester)
This covers the key behaviors of the modifier system.
deepmd/pt/utils/dataloader.py (2)
28-30: Clean integration of modifier support.
The modifier parameter is properly typed, documented in the docstring (implicitly through the system parameter), and correctly propagated to the dataset constructor.
Also applies to: 82-102
241-243: LGTM!
The preload_and_modify_all_data method correctly delegates to each underlying system, providing a convenient way to preload and modify all data in the loader set.
deepmd/pt/utils/dataset.py (2)
12-14: Excellent integration of modifier support.
The modifier parameter is properly:
- Typed with BaseModifier | None
- Documented in the docstring
- Stored as an instance variable
- Propagated to the DeepmdData constructor
The implementation is clean and consistent with the overall modifier pattern.
Also applies to: 22-44
71-72: LGTM!
The preload_and_modify_all_data method correctly delegates to the underlying data system, maintaining a clean separation of concerns.
Actionable comments posted: 2
🧹 Nitpick comments (1)
deepmd/pt/modifier/base_modifier.py (1)
93-107: Clarify the obscure get_nframes = None usage.
Line 93 sets get_nframes = None and then uses it in slicing operations like data["coord"][:get_nframes, :]. When get_nframes is None, the slice [:None, :] is equivalent to [:, :] (selecting all frames).
This pattern is intentional but not immediately obvious to readers.
Consider adding a comment to clarify:
- get_nframes = None
+ get_nframes = None  # None in slice [:None] means select all frames
coord = data["coord"][:get_nframes, :]
Or use a more explicit pattern:
- get_nframes = None
- coord = data["coord"][:get_nframes, :]
- atype = data["atype"][:get_nframes, :]
+ # Process all frames in the batch
+ coord = data["coord"]
+ atype = data["atype"]
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- deepmd/pt/modifier/base_modifier.py (1 hunks)
- deepmd/utils/data.py (3 hunks)
- source/tests/pt/test_data_modifier.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-16T21:50:10.680Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.
Applied to files:
deepmd/pt/modifier/base_modifier.py
🧬 Code graph analysis (1)
deepmd/utils/data.py (3)
deepmd/pt/modifier/base_modifier.py (1)
modify_data (66-117)
deepmd/pt/utils/dataset.py (1)
preload_and_modify_all_data (71-72)
deepmd/pt/utils/dataloader.py (1)
preload_and_modify_all_data(241-243)
🪛 Ruff (0.14.8)
source/tests/pt/test_data_modifier.py
52-52: Unused method argument: atype
(ARG002)
53-53: Unused method argument: box
(ARG002)
54-54: Unused method argument: fparam
(ARG002)
55-55: Unused method argument: aparam
(ARG002)
59-59: Unused method argument: data_sys
(ARG002)
89-89: Unused method argument: atype
(ARG002)
90-90: Unused method argument: box
(ARG002)
91-91: Unused method argument: fparam
(ARG002)
92-92: Unused method argument: aparam
(ARG002)
96-96: Unused method argument: data_sys
(ARG002)
deepmd/pt/modifier/base_modifier.py
66-66: Unused method argument: data_sys
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
🔇 Additional comments (8)
deepmd/utils/data.py (3)
142-147: LGTM! Safe initialization of modifier caching.
The caching mechanism is initialized correctly with appropriate guards. The _modified_frame_cache is only created when a modifier exists, and all cache accesses in get_single_frame properly check self.modifier is not None before accessing the cache.
387-394: LGTM! Efficient cache lookup with proper guards.
The early return from cache is well-guarded with all necessary conditions and provides a good optimization for repeated frame access.
498-514: LGTM! Well-implemented preload method with progress logging.
The preload method correctly handles early returns, efficiently skips already-cached frames, and provides useful progress feedback. The implementation aligns well with the broader data loading pipeline shown in the related files.
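A sketch of the shape such a preload loop typically takes, assuming get_single_frame applies the modifier and populates the cache (method and attribute names follow the PR; the details are illustrative, not the actual diff):

import logging

log = logging.getLogger(__name__)

def preload_and_modify_all_data(self) -> None:
    """Eagerly apply the modifier to every frame and cache the results."""
    if self.modifier is None:
        return  # nothing to do without a modifier
    for idx in range(self.nframes):
        if idx in self._modified_frame_cache:
            continue  # already modified and cached
        self.get_single_frame(idx)  # applies the modifier and caches the frame
        if (idx + 1) % 100 == 0:
            log.info("Modified %d/%d frames", idx + 1, self.nframes)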
deepmd/pt/modifier/base_modifier.py (1)
1-65: LGTM! Well-structured base class with proper abstractions.
The class correctly inherits from torch.nn.Module and uses @abstractmethod to enforce subclass implementation of forward(). The serialization methods follow standard patterns.
source/tests/pt/test_data_modifier.py (4)
29-36: LGTM! Plugin registrations correctly implemented.
Both modifier plugins are properly registered with the modifier_args_plugin registry and return empty argument lists, which is appropriate for these test modifiers.
39-73: LGTM! Test modifier correctly implements custom logic.
The ModifierRandomTester class correctly:
- Sets modifier_type = "random_tester" matching its registration name (line 47)
- Overrides modify_data with custom randomization logic (lines 59-73)
- The forward method returns minimal data since modify_data is completely overridden and never calls the base implementation
The unused parameters flagged by static analysis (atype, box, fparam, aparam in forward, and data_sys in modify_data) are required by the base class interface but not used in this simple test implementation. This is acceptable for test code.
76-110: LGTM! Test modifier correctly implements zeroing behavior.
The ModifierZeroTester class correctly sets modifier_type = "zero_tester" matching its registration and implements the expected zeroing logic in modify_data. Like ModifierRandomTester, it completely overrides the base modify_data implementation, so the minimal forward method is acceptable.
113-187: LGTM! Comprehensive test coverage for modifier integration.
The test cases effectively verify:
- test_init_modify_data confirms that zero_tester correctly zeros out training and validation data (lines 130-147)
- test_full_modify_data validates that random_tester produces consistent results before and after trainer.run(), which tests the caching mechanism (lines 149-178)
The tests cover the critical integration points between modifiers, data loading, and the training pipeline. The cleanup in tearDown is thorough.
149ddf3 to d1c269d (force-push)
- Add data modifier support in model inference pipeline
- Enable saving and loading data modifiers with frozen models
- Add ModifierScalingTester for scaling model predictions as data modification
- Update test cases to verify data modifier functionality in inference
- Enhance modifier argument registration with documentation
This allows data modifiers to be applied during model inference and preserves them when saving frozen models for consistent behavior across training and inference stages.
d1c269d to e898764 (force-push)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deepmd/pt/train/training.py (1)
244-254: Remove obsolete commented-out code.
The signature of single_model_stat was changed to remove the _validation_data and _data_requirement parameters, but the function body still contains commented-out code referencing these removed parameters. Since the data requirement logic has been moved outside this function (lines 340-349 for single-task, lines 384-395 for multi-task), this commented-out code is no longer needed and should be removed for consistency.
🔎 Proposed fix
def single_model_stat(
    _model: Any,
    _data_stat_nbatch: int,
    _training_data: DpLoaderSet,
    _stat_file_path: str | None,
    finetune_has_new_type: bool = False,
) -> Callable[[], Any]:
-    # _data_requirement += get_additional_data_requirement(_model)
-    # _training_data.add_data_requirement(_data_requirement)
-    # if _validation_data is not None:
-    #     _validation_data.add_data_requirement(_data_requirement)
    @functools.lru_cache
    def get_sample() -> Any:
🧹 Nitpick comments (3)
deepmd/pt/modifier/base_modifier.py (1)
115-115: Remove commented-out code.
This appears to be leftover debugging code. The static analysis also flagged this as commented-out code.
🔎 Proposed fix
- # model = self.dp.to(DEVICE)
  prec = NP_PRECISION_DICT[RESERVED_PRECISION_DICT[GLOBAL_PT_FLOAT_PRECISION]]
89-91: Consider removing unnecessary __new__ overrides.
The __new__ methods in all three test modifiers simply delegate to super().__new__(cls) without any customization. These are likely artifacts from the plugin pattern but are redundant here since the base class already handles this.
🔎 Proposed fix for ModifierRandomTester (apply similar pattern to others)
@BaseModifier.register("random_tester")
class ModifierRandomTester(BaseModifier):
-    def __new__(cls, *args, **kwargs):
-        return super().__new__(cls)
-
    def __init__(
        self,
        seed: int = 1,
Also applies to: 133-134, 172-173
deepmd/pt/entrypoints/main.py (1)
387-390: Variable shadowing is acceptable here but could be clearer.
The model parameter (a string path) is reassigned to tester.model (a torch.nn.Module). While this works, using distinct names like model_path and model would improve readability. This is a minor stylistic point.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- deepmd/pt/entrypoints/main.py
- deepmd/pt/infer/deep_eval.py
- deepmd/pt/infer/inference.py
- deepmd/pt/modifier/__init__.py
- deepmd/pt/modifier/base_modifier.py
- deepmd/pt/train/training.py
- deepmd/pt/train/wrapper.py
- deepmd/pt/utils/dataloader.py
- deepmd/pt/utils/dataset.py
- deepmd/utils/data.py
- source/tests/pt/test_data_modifier.py
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt/infer/inference.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-16T21:50:10.680Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.
Applied to files:
deepmd/pt/modifier/base_modifier.py
🧬 Code graph analysis (7)
deepmd/pt/entrypoints/main.py (2)
deepmd/pt/modifier/__init__.py (1)
get_data_modifier (17-23)
deepmd/pt/infer/inference.py (1)
Tester (28-76)
deepmd/utils/data.py (4)
deepmd/pt/modifier/base_modifier.py (1)
modify_data (88-186)
source/tests/pt/test_data_modifier.py (2)
modify_data (114-128)
modify_data (153-167)
deepmd/pt/utils/dataloader.py (1)
preload_and_modify_all_data (241-243)
deepmd/pt/utils/dataset.py (1)
preload_and_modify_all_data (71-72)
deepmd/pt/modifier/base_modifier.py (3)
deepmd/dpmodel/modifier/base_modifier.py (1)
make_base_modifier (17-77)
deepmd/utils/data.py (1)
DeepmdData (35-1061)
source/tests/pt/test_data_modifier.py (5)
forward (102-112)
forward (141-151)
forward (187-209)
modify_data (114-128)
modify_data (153-167)
deepmd/pt/train/training.py (6)
deepmd/pt/loss/property.py (1)
label_requirement (209-221)
deepmd/pt/loss/tensor.py (1)
label_requirement (173-207)
deepmd/pd/train/training.py (1)
get_additional_data_requirement (1222-1246)
deepmd/pt/utils/dataloader.py (2)
add_data_requirement (215-218)
preload_and_modify_all_data (241-243)
deepmd/pt/utils/dataset.py (2)
add_data_requirement (55-69)
preload_and_modify_all_data (71-72)
deepmd/utils/data.py (1)
preload_and_modify_all_data (499-514)
source/tests/pt/test_data_modifier.py (4)
deepmd/pt/infer/deep_eval.py (2)
DeepEval (87-863)
eval (321-395)
deepmd/pt/modifier/base_modifier.py (3)
BaseModifier (30-186)
forward (76-85)
modify_data (88-186)
deepmd/utils/data.py (2)
DeepmdData (35-1061)
add (150-203)
deepmd/pt/train/wrapper.py (1)
forward (155-205)
deepmd/pt/infer/deep_eval.py (1)
deepmd/pt/train/wrapper.py (1)
ModelWrapper(16-217)
deepmd/pt/modifier/__init__.py (1)
deepmd/utils/plugin.py (1)
get_class_by_type(144-154)
🪛 GitHub Check: CodeQL
deepmd/pt/train/training.py
[notice] 253-254: Commented-out code
This comment appears to contain commented-out code.
🪛 Ruff (0.14.10)
deepmd/pt/modifier/base_modifier.py
88-88: Unused method argument: data_sys
(ARG002)
162-165: Avoid specifying long messages outside the exception class
(TRY003)
171-174: Avoid specifying long messages outside the exception class
(TRY003)
180-183: Avoid specifying long messages outside the exception class
(TRY003)
source/tests/pt/test_data_modifier.py
89-89: Unused static method argument: args
(ARG004)
89-89: Unused static method argument: kwargs
(ARG004)
104-104: Unused method argument: coord
(ARG002)
105-105: Unused method argument: atype
(ARG002)
106-106: Unused method argument: box
(ARG002)
107-107: Unused method argument: fparam
(ARG002)
108-108: Unused method argument: aparam
(ARG002)
109-109: Unused method argument: do_atomic_virial
(ARG002)
114-114: Unused method argument: data_sys
(ARG002)
133-133: Unused static method argument: args
(ARG004)
133-133: Unused static method argument: kwargs
(ARG004)
143-143: Unused method argument: coord
(ARG002)
144-144: Unused method argument: atype
(ARG002)
145-145: Unused method argument: box
(ARG002)
146-146: Unused method argument: fparam
(ARG002)
147-147: Unused method argument: aparam
(ARG002)
148-148: Unused method argument: do_atomic_virial
(ARG002)
153-153: Unused method argument: data_sys
(ARG002)
172-172: Unused static method argument: args
(ARG004)
172-172: Unused static method argument: kwargs
(ARG004)
deepmd/pt/modifier/__init__.py
22-22: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
🔇 Additional comments (22)
deepmd/pt/modifier/__init__.py (1)
17-23: LGTM!
The factory function correctly deep-copies the input to avoid mutation, validates the required "type" key, and delegates to the plugin registry. The from None pattern cleanly suppresses the original KeyError traceback.
deepmd/pt/modifier/base_modifier.py (1)
87-186: Solid implementation of the data modification pipeline.
The modify_data method properly handles tensor conversion, calls the abstract forward method, validates output keys, and applies the modifications. The data_sys parameter is unused here but is part of the interface contract for subclasses that may need access to the data system.
source/tests/pt/test_data_modifier.py (2)
296-362: Well-structured inference test with clear verification logic.
The test properly:
- Creates a baseline model and freezes it for use as the modifier.
- Creates a modified model with the scaling modifier.
- Verifies the relationship:
output_model - sfactor * output_modifier = output_ref.
The test provides good coverage of the modifier-in-inference path.
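Numerically, the assertion boils down to something like the following sketch (eval signatures are simplified for illustration; dp_modified, dp_frozen, and dp_ref are hypothetical DeepEval handles, not names from the test):

import numpy as np

def check_scaling_relation(dp_modified, dp_frozen, dp_ref, coord, box, atype, sfactor):
    """Assert: (model trained with scaling modifier) - sfactor * (modifier) == reference."""
    e_model = dp_modified.eval(coord, box, atype)[0]   # model trained with the modifier
    e_modifier = dp_frozen.eval(coord, box, atype)[0]  # frozen baseline acting as modifier
    e_ref = dp_ref.eval(coord, box, atype)[0]          # unmodified reference model
    np.testing.assert_allclose(e_model - sfactor * e_modifier, e_ref, atol=1e-10)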
216-295: Good test coverage for modifier behavior.
The parameterized tests effectively cover:
- test_init_modify_data: Verifies modifier is applied during initialization.
- test_full_modify_data: Ensures modifier is applied exactly once (not reapplied on subsequent accesses).
The use of @parameterized with different batch sizes provides good coverage across edge cases.
deepmd/utils/data.py (3)
143-148: Good initialization of modifier caching infrastructure.
The code properly:
- Defaults apply_modifier_at_load to True.
- Initializes the cache only when a modifier is present.
Minor note: The
_modified_frame_cacheis only initialized whenmodifier is not None, so ensure all cache accesses are guarded by the same condition (which they are in lines 388-394 and 490-495).
490-495: Correct modifier application and caching logic.The implementation correctly:
- Always applies the modifier when present.
- Only caches when
apply_modifier_at_loadisTrue.- Uses
copy.deepcopyto ensure cached frames are independent snapshots.This addresses the previous review feedback about shallow vs. deep copy.
499-514: Useful preloading method for performance optimization.The
preload_and_modify_all_datamethod:
- Correctly guards against no-op scenarios (no modifier or not applying at load).
- Provides progress logging every 100 frames.
- Leverages
get_single_framewhich handles caching internally.One consideration: For large datasets, memory usage could be significant since all frames are cached. This is a reasonable trade-off documented by the method's purpose.
deepmd/pt/infer/deep_eval.py (1)
174-189: Clean implementation of modifier loading from frozen model.The code correctly:
- Extracts the modifier from
extra_filesembedded in the.pthfile.- Uses
io.BytesIOfor in-memory loading, avoiding temp file cleanup issues.- Handles both cases: modifier present and absent.
- Passes the modifier to
ModelWrapperfor inference-time application.deepmd/pt/utils/dataset.py (2)
22-44: Clean integration of modifier into dataset constructor.The modifier parameter is properly:
- Type-hinted with
BaseModifier | None.- Stored as an instance attribute.
- Forwarded to the underlying
DeepmdDataconstructor.- Documented in the docstring.
71-72: Simple delegation pattern for preload method.The
preload_and_modify_all_datamethod correctly delegates to the underlying data system. This maintains the layered architecture whereDeepmdDataSetForLoaderis a thin wrapper.deepmd/pt/train/wrapper.py (2)
17-62: Good extension of ModelWrapper to support modifiers.The constructor properly:
- Accepts an optional
modifierparameter.- Stores it as an instance attribute.
- Keeps it separate from the model/loss module dictionaries.
The modifier is typed as
torch.nn.Module | None, which is appropriate since modifiers extendtorch.nn.ModuleviaBaseModifier.
189-195: Correct application of modifier during inference.The modifier is applied only during inference (
self.inference_only or inference_only), not during training. This is intentional: during training, the modifier adjusts the training data labels (viamodify_data), while during inference, the modifier output is subtracted from the model output to produce the corrected prediction.The subtraction
model_pred[k] = model_pred[k] - vcorrectly reverses the modification applied to training labels.deepmd/pt/entrypoints/main.py (2)
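In dictionary form, the correction amounts to a per-key subtraction; a self-contained sketch of the idea (the key handling is an assumption, not the PR's exact code):

import torch

def apply_modifier_correction(
    model_pred: dict[str, torch.Tensor],
    modifier_pred: dict[str, torch.Tensor],
) -> dict[str, torch.Tensor]:
    """Subtract modifier outputs from model outputs on shared keys."""
    return {
        k: v - modifier_pred[k] if k in modifier_pred else v
        for k, v in model_pred.items()
    }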
118-122: Clean integration of modifier creation in trainer preparation.The code properly:
- Extracts modifier params from model configuration.
- Creates the modifier only when parameters are present.
- Moves the modifier to the target device.
This follows the pattern established in
deepmd/pt/infer/inference.py.
391-407: Excellent use of in-memory buffer for modifier serialization.This implements the suggestion from the past review to use
io.BytesIOinstead of temporary files:
- Creates an in-memory buffer.
- Serializes the JIT-scripted modifier to the buffer.
- Embeds the buffer contents in
extra_files.This approach is cleaner and avoids temp file cleanup issues entirely.
deepmd/pt/utils/dataloader.py (4)
28-30: LGTM!The import of
BaseModifieris clean and necessary for the type annotation in theDpLoaderSetconstructor.
89-89: LGTM!The optional
modifierparameter is appropriately typed and defaults toNone, making the data modification feature opt-in.
97-102: LGTM!The modifier is correctly forwarded to each
DeepmdDataSetForLoaderinstance during construction, ensuring consistent modifier application across all systems.
241-243: LGTM!The convenience method correctly delegates preloading and modification to each system. Based on the training flow in
deepmd/pt/train/training.py, this method is called after data requirements are added and before model statistics are computed, ensuring statistics are computed on modified data.deepmd/pt/train/training.py (4)
340-349: LGTM! Correct ordering of data preparation steps.The sequence is now properly ordered:
- Construct data requirements from loss labels and model parameters
- Add requirements to training and validation datasets
- Preload and modify all data
- Compute model statistics (line 350)
This ensures that model statistics are computed on the modified data, which addresses the concern raised in previous reviews.
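Condensed into pseudocode, the ordering being praised looks roughly like this (method names follow the PR; the wrapper function itself is illustrative):

def prepare_single_task_data(model, loss, training_data, validation_data):
    # 1. collect label- and model-driven data requirements
    reqs = loss.label_requirement + get_additional_data_requirement(model)
    # 2. attach requirements to both datasets
    training_data.add_data_requirement(reqs)
    if validation_data is not None:
        validation_data.add_data_requirement(reqs)
    # 3. apply the modifier (and cache) before statistics are computed
    training_data.preload_and_modify_all_data()
    if validation_data is not None:
        validation_data.preload_and_modify_all_data()
    # 4. only now compute model statistics, so they see the modified labels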
350-358: LGTM!The call to
single_model_statis consistent with the updated signature, correctly omitting the removed_validation_dataand_data_requirementparameters.
384-395: LGTM! Multi-task path mirrors single-task logic correctly.The data preparation steps are correctly applied per model key, maintaining the proper sequence: construct requirements → add to datasets → preload and modify → compute statistics. This ensures each model branch's statistics are computed on appropriately modified data.
396-406: LGTM!The per-model-key call to
single_model_statis consistent with the updated signature and correctly parameterized for the multi-task scenario.
18508da to e4d5ce2 (force-push)
Actionable comments posted: 0
🧹 Nitpick comments (3)
deepmd/pt/train/training.py (3)
244-267: Document the precondition for single_model_stat.
The refactored single_model_stat function now assumes that data requirements have already been added to _training_data before invocation. Consider adding a docstring or inline comment documenting this precondition to help future maintainers understand the expected calling pattern.
Suggested documentation
def single_model_stat(
    _model: Any,
    _data_stat_nbatch: int,
    _training_data: DpLoaderSet,
    _stat_file_path: str | None,
    finetune_has_new_type: bool = False,
) -> Callable[[], Any]:
+    """Compute or load model statistics from training data.
+
+    Note: Assumes data requirements have already been added to _training_data
+    and data has been preloaded/modified if needed.
+    """
    @functools.lru_cache
    def get_sample() -> Any:
335-344: Data requirement and modification flow looks correct.
The sequence is proper: data requirements are computed and attached, data is preloaded and modified, then statistics are computed. This correctly addresses the earlier concern about computing statistics on modified data.
Consider adding a brief comment before the preload_and_modify_all_data() calls to document their purpose, such as:
training_data.add_data_requirement(data_requirement)
if validation_data is not None:
    validation_data.add_data_requirement(data_requirement)
+ # Preload and apply modifiers to all data before computing statistics
training_data.preload_and_modify_all_data()
This would help future maintainers understand why this step occurs before single_model_stat.
379-390: Multi-task data requirement and modification flow is consistent.
The multi-task path correctly mirrors the single-task logic, applying the same data requirement and modification flow for each model key. The per-model data requirement calculation ensures each task gets the correct data.
As with the single-task path, consider adding a comment before the preload_and_modify_all_data() calls to document the purpose of this step in the data preparation pipeline.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deepmd/pt/train/training.py
🧰 Additional context used
🧬 Code graph analysis (1)
deepmd/pt/train/training.py (7)
deepmd/pt/loss/ener_spin.py (1)
label_requirement (287-340)
deepmd/pt/loss/tensor.py (1)
label_requirement (173-207)
deepmd/pt/loss/ener.py (2)
label_requirement (410-485)
label_requirement (602-615)
deepmd/pd/train/training.py (2)
get_additional_data_requirement (1222-1246)
single_model_stat (209-239)
deepmd/pt/utils/dataset.py (2)
add_data_requirement (55-69)
preload_and_modify_all_data (71-72)
deepmd/pt/utils/dataloader.py (2)
add_data_requirement (215-218)
preload_and_modify_all_data (241-243)
deepmd/utils/finetune.py (1)
get_has_new_type(43-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
🔇 Additional comments (1)
deepmd/pt/train/training.py (1)
1365-1398: LGTM: Additional data requirement extraction is well-implemented.
The function correctly extracts additional data requirements (fparam, aparam, spin) from the model, properly handling optional parameters with defaults and required parameters. The defensive check for callable has_spin is good practice.
Actionable comments posted: 0
🧹 Nitpick comments (2)
deepmd/pt/modifier/base_modifier.py (2)
117-119: Simplify unnecessary tiling operation.
Since nframes is hardcoded to 1, the tiling and reshaping on line 119 is redundant. The operation np.tile(data["atype"], 1).reshape(1, -1) is equivalent to data["atype"].reshape(1, -1).
🔎 Proposed simplification
nframes = 1
natoms = len(data["atype"])
-atom_types = np.tile(data["atype"], nframes).reshape(nframes, -1)
+atom_types = data["atype"].reshape(1, -1)
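The claimed equivalence is easy to confirm with a small numpy check:

import numpy as np

atype = np.array([0, 0, 1, 1])
tiled = np.tile(atype, 1).reshape(1, -1)  # current code path with nframes == 1
direct = atype.reshape(1, -1)             # suggested simplification
assert np.array_equal(tiled, direct)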
147-147: Consider documenting why do_atomic_virial is hardcoded to False.
The parameter is always set to False when calling forward(). If this is intentional (e.g., atomic virial modifications are not supported in the current implementation), consider adding a comment explaining this limitation.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- deepmd/pt/modifier/base_modifier.py
- deepmd/pt/train/training.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-16T21:50:10.680Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.
Applied to files:
deepmd/pt/modifier/base_modifier.py
🧬 Code graph analysis (1)
deepmd/pt/train/training.py (7)
deepmd/pt/loss/dos.py (1)
label_requirement (243-266)
deepmd/pt/loss/ener_spin.py (1)
label_requirement (287-340)
deepmd/pt/loss/property.py (1)
label_requirement (209-221)
deepmd/pt/loss/tensor.py (1)
label_requirement (173-207)
deepmd/pd/train/training.py (2)
get_additional_data_requirement (1222-1246)
single_model_stat (209-239)
deepmd/pt/utils/dataloader.py (2)
add_data_requirement (215-218)
preload_and_modify_all_data (241-243)
deepmd/utils/data.py (1)
preload_and_modify_all_data(499-514)
🪛 Ruff (0.14.10)
deepmd/pt/modifier/base_modifier.py
88-88: Unused method argument: data_sys
(ARG002)
161-164: Avoid specifying long messages outside the exception class
(TRY003)
170-173: Avoid specifying long messages outside the exception class
(TRY003)
179-182: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
🔇 Additional comments (9)
deepmd/pt/modifier/base_modifier.py (6)
31-35: LGTM!
The initialization is clean and sets appropriate defaults for the base modifier.
37-50: LGTM!
Serialization correctly includes class metadata and version information.
52-72: LGTM!
The deserialization correctly removes metadata keys before passing to the constructor. This pattern works well for the base class and provides a foundation for subclasses.
74-85: LGTM!
The abstract forward method is properly declared and provides clear interface expectations for subclasses.
88-88: Unused parameter data_sys is likely intentional.
Static analysis flags data_sys as unused. This is acceptable for a base class method signature that subclasses might override and use, or for future extensibility.
159-185: LGTM!
The error checking for missing keys in modifier_data is comprehensive and provides clear diagnostic messages. The reshape operations ensure shape compatibility.
deepmd/pt/train/training.py (3)
335-344: LGTM! Data modification flow is correctly ordered.
The data requirements are properly constructed from loss labels and additional model requirements, then applied to both training and validation datasets. Calling preload_and_modify_all_data() before single_model_stat() ensures that statistics are computed on modified data, which is the correct behavior for the modifier plugin.
379-390: LGTM! Multi-task data modification flow mirrors single-task correctly.
The multi-task path properly handles per-model data requirements and modification, maintaining the correct ordering (requirements → preload/modify → statistics → data loader) for each model branch.
1365-1398: LGTM! Well-implemented helper function.
The get_additional_data_requirement function properly aggregates optional data requirements (fparam, aparam, spin). The fparam handling is particularly well done, supporting optional defaults when has_default_fparam() returns True, which is more sophisticated than the equivalent PaddlePaddle implementation.
deepmd/utils/data.py
Outdated
```python
# The prefix sum stores the range of indices contained in each directory, which is needed by get_item method
self.prefix_sum = np.cumsum(frames_list).tolist()

self.apply_modifier_at_load = True
```
The cache may not work for large datasets due to memory limitations. I would suggest providing users with an option to control this feature.
I made this cache flag user-defined. However, there is still an issue with CUDA re-initialization if the modifier involves torch models. Currently, I skip the UT for this case.
The CUDA re-init issue has been fixed.
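A minimal sketch of the cache wiring this thread converged on, using the attribute names that appear later in the review (`use_cache`, `use_modifier_cache`, `_modified_frame_cache`); the helper names here are illustrative, not the PR's exact code.

```python
import copy


class FrameCachingMixin:
    """Sketch of an opt-in per-frame cache controlled by the modifier."""

    def init_modifier(self, modifier) -> None:
        self.modifier = modifier
        # default to caching unless the modifier explicitly opts out
        self.use_modifier_cache = getattr(modifier, "use_cache", True)
        if modifier is not None:
            self._modified_frame_cache: dict[int, dict] = {}

    def get_frame(self, index: int, load_frame) -> dict:
        # return the cached, already-modified frame when available
        if self.modifier is not None and self.use_modifier_cache:
            if index in self._modified_frame_cache:
                return self._modified_frame_cache[index]
        frame_data = load_frame(index)
        if self.modifier is not None:
            self.modifier.modify_data(frame_data, self)
            if self.use_modifier_cache:
                # deep copy so later mutation cannot corrupt the cache
                self._modified_frame_cache[index] = copy.deepcopy(frame_data)
        return frame_data
```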
…to save the data modification before training or to perform modification on-the-fly.
Actionable comments posted: 0
🧹 Nitpick comments (1)
deepmd/utils/data.py (1)
490-496: Consider returning the cached copy to avoid an extra deep copy. The current pattern applies the modifier to `frame_data`, then deep copies it into the cache, and returns the original `frame_data`. This means non-cached frames are returned directly (mutable), while subsequent calls return cached copies. For consistency and to avoid the extra deep copy operation:
🔎 Proposed optimization

```diff
 if self.modifier is not None:
     # Apply modifier if it exists
     self.modifier.modify_data(frame_data, self)
     if self.use_modifier_cache:
         # Cache the modified frame to avoid recomputation
-        self._modified_frame_cache[index] = copy.deepcopy(frame_data)
+        cached_frame = copy.deepcopy(frame_data)
+        self._modified_frame_cache[index] = cached_frame
+        return cached_frame
 return frame_data
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
deepmd/pt/modifier/base_modifier.py
deepmd/pt/train/training.py
deepmd/utils/data.py
source/tests/pt/test_data_modifier.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-16T21:50:10.680Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.
Applied to files:
deepmd/pt/modifier/base_modifier.py
🧬 Code graph analysis (2)
deepmd/pt/modifier/base_modifier.py (1)
deepmd/dpmodel/modifier/base_modifier.py (1)
`make_base_modifier` (17-77)
deepmd/utils/data.py (4)
deepmd/pt/modifier/base_modifier.py (1)
`modify_data` (90-187)
source/tests/pt/test_data_modifier.py (2)
`modify_data` (123-137), `modify_data` (165-179)
deepmd/pt/utils/dataset.py (1)
`preload_and_modify_all_data` (71-72)
deepmd/pt/utils/dataloader.py (1)
`preload_and_modify_all_data` (241-243)
🪛 Ruff (0.14.10)
deepmd/pt/modifier/base_modifier.py
90-90: Unused method argument: data_sys
(ARG002)
163-166: Avoid specifying long messages outside the exception class
(TRY003)
172-175: Avoid specifying long messages outside the exception class
(TRY003)
181-184: Avoid specifying long messages outside the exception class
(TRY003)
source/tests/pt/test_data_modifier.py
96-96: Unused static method argument: args
(ARG004)
96-96: Unused static method argument: kwargs
(ARG004)
113-113: Unused method argument: coord
(ARG002)
114-114: Unused method argument: atype
(ARG002)
115-115: Unused method argument: box
(ARG002)
116-116: Unused method argument: fparam
(ARG002)
117-117: Unused method argument: aparam
(ARG002)
118-118: Unused method argument: do_atomic_virial
(ARG002)
123-123: Unused method argument: data_sys
(ARG002)
142-142: Unused static method argument: args
(ARG004)
142-142: Unused static method argument: kwargs
(ARG004)
155-155: Unused method argument: coord
(ARG002)
156-156: Unused method argument: atype
(ARG002)
157-157: Unused method argument: box
(ARG002)
158-158: Unused method argument: fparam
(ARG002)
159-159: Unused method argument: aparam
(ARG002)
160-160: Unused method argument: do_atomic_virial
(ARG002)
165-165: Unused method argument: data_sys
(ARG002)
184-184: Unused static method argument: args
(ARG004)
184-184: Unused static method argument: kwargs
(ARG004)
🔇 Additional comments (23)
deepmd/pt/modifier/base_modifier.py (7)
30-38: LGTM! Clean base class initialization. The class properly inherits from both `torch.nn.Module` and the dynamically created base modifier. The `use_cache` parameter is correctly stored for controlling modification caching behavior.
39-52: LGTM! Serialization provides essential metadata. The serialize method correctly includes class, type, and version metadata needed for deserialization and plugin registry lookup.
54-74: LGTM! Deserialization properly handles metadata removal. The fix from past reviews is correctly implemented - metadata keys are popped before passing to constructor.
76-88: LGTM! Abstract forward method with JIT export. The abstract method is correctly declared with `@abstractmethod` and `@torch.jit.export` decorators. The return type annotation `dict[str, torch.Tensor]` properly describes the expected output format.
89-116: The `data_sys` parameter is unused but part of the API contract. The static analysis correctly identifies `data_sys` as unused (ARG002). However, this parameter is intentionally part of the method signature to provide subclasses access to the `DeepmdData` instance when they override `modify_data`. This is acceptable for an extensible API.
117-159: LGTM! Robust input tensor construction. The implementation correctly:
- Handles single-frame data with `nframes=1`
- Reshapes coordinates and box to expected formats
- Uses appropriate precision and device settings
- Handles optional `fparam` and `aparam` inputs
161-187: LGTM! Proper validation of modifier outputs. The implementation correctly validates that required outputs are present before applying modifications. The KeyError messages are descriptive and helpful for debugging, which justifies the slightly long exception messages flagged by static analysis.
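To make the validation pattern concrete, a sketch of the check described above; the `find_*`/label key names mirror those used in the tests, and the reshape/subtract step is an illustration rather than the PR's exact code.

```python
import numpy as np
import torch


def apply_modifier_outputs(data: dict, modifier_data: dict[str, torch.Tensor]) -> None:
    # Sketch: verify that every label present in the dataset has a
    # corresponding modifier output before mutating the labels in place.
    for label in ("energy", "force", "virial"):
        if data.get(f"find_{label}", 0.0) != 1.0:
            continue  # this label is absent from the dataset
        if label not in modifier_data:
            raise KeyError(
                f"modifier output is missing '{label}', which is required "
                f"because the dataset provides '{label}' labels"
            )
        correction = modifier_data[label].detach().cpu().numpy()
        data[label] -= correction.reshape(np.shape(data[label]))
```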
source/tests/pt/test_data_modifier.py (9)
64-91: LGTM! Plugin registration for test modifiers. The modifier argument definitions are properly registered with the plugin system and include appropriate documentation strings. The `use_cache` parameter is consistently available across all modifier types.
94-138: LGTM! ModifierRandomTester implementation is correct. The test modifier properly:
- Uses a seeded RNG for deterministic behavior
- Implements the abstract `forward` method (empty return is valid for test purposes since `modify_data` is overridden)
- Correctly scales energy, force, and virial in `modify_data`
The unused `forward` parameters are acceptable since this test modifier overrides `modify_data` directly.
140-180: LGTM! ModifierZeroTester implementation is correct. The zeroing logic using `data["energy"] -= data["energy"]` is a valid way to zero out values while preserving the array structure and dtype.
182-223: LGTM! ModifierScalingTester demonstrates model-based modification. This test modifier shows the intended use case of applying a frozen model's predictions as data modifications. The implementation correctly:
- Loads a JIT-scripted model
- Calls the model with proper arguments
- Scales the outputs by `sfactor`
225-273: LGTM! Test setup and basic modification test. The parameterized test setup is well-structured. `test_init_modify_data` correctly verifies that the zero_tester produces all-zero outputs.
275-312: LGTM! Test verifies modification is applied only once. `test_full_modify_data` correctly validates that data modification is idempotent - comparing data before and after `trainer.run()` ensures the modifier is not applied multiple times.
314-381: LGTM! Inference test validates end-to-end modifier integration. The test correctly verifies that `output_model` equals `output_ref - sfactor * output_modifier`, confirming proper modifier integration during inference.
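As a sanity check, the asserted identity can be written out directly; a small sketch with illustrative names:

```python
import numpy as np


def check_modifier_identity(model_pred, model_pred_ref, modifier_pred, sfactor: float) -> None:
    # model output == reference output minus the scaled modifier output,
    # for each compared quantity (e.g. energy, force, virial)
    for out, ref, mod in zip(model_pred, model_pred_ref, modifier_pred):
        np.testing.assert_allclose(out, ref - sfactor * mod, rtol=1e-5, atol=1e-8)
```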
383-398: LGTM! Robust tearDown with error handling. The cleanup properly handles potential failures during file removal, ensuring all cleanup attempts are made even if individual deletions fail.
400-499: LGTM! Well-designed helper methods for test data management. The helper methods `get_dataset_nframes`, `get_sampled_data`, and `check_sampled_data` are well-structured and provide clear abstractions for test data handling.
deepmd/utils/data.py (4)
5-5: LGTM! Correct import for deep copying. The `copy` module import is properly added to support the `copy.deepcopy()` call for cache immutability.
143-148: LGTM! Conditional cache initialization. The cache is only initialized when a modifier is present, avoiding unnecessary memory allocation for non-modifier use cases. The `use_modifier_cache` flag correctly respects the modifier's preference when available.
388-395: LGTM! Efficient cache lookup with early return. The cache check correctly requires both `use_modifier_cache` being true and the modifier being present before attempting the cache lookup.
498-513: LGTM! Preload method with progress logging. The method properly:
- Short-circuits when caching is disabled or no modifier exists
- Iterates through all frames and populates the cache
- Provides periodic progress updates every 100 frames
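The behavior summarized above corresponds to roughly the following shape; a sketch assuming the `get_single_frame`/`nframes` names referenced in this review, not the PR's exact code.

```python
import logging

log = logging.getLogger(__name__)


def preload_and_modify_all_data(self) -> None:
    # Sketch: populate the frame cache eagerly so the modifier is not
    # re-applied on every epoch during training.
    if not self.use_modifier_cache or self.modifier is None:
        return  # nothing to preload without caching or a modifier
    log.info("Preloading and modifying all data frames...")
    for index in range(self.nframes):
        self.get_single_frame(index, 1)  # loads, modifies, and caches
        if (index + 1) % 100 == 0:
            log.info(f"Processed {index + 1}/{self.nframes} frames")
```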
deepmd/pt/train/training.py (3)
335-353: LGTM! Correct ordering: requirements → preload → statistics. The implementation correctly:
- Adds data requirements to training/validation data
- Calls `preload_and_modify_all_data()` to apply modifiers
- Then computes statistics via `single_model_stat`
This ensures model statistics are computed on the modified data, which is the intended behavior.
379-401: LGTM! Multi-task path mirrors single-task logic. The per-model-key handling correctly applies the same pattern: add requirements, preload and modify, then compute statistics. The validation data check (`if validation_data[model_key] is not None`) properly handles cases where validation data may not exist for all model keys.
244-267: LGTM! Simplified `single_model_stat` signature. The function signature has been correctly simplified by removing `_validation_data` and `_data_requirement` parameters, as data requirements are now attached before this function is called and validation data is not needed for computing statistics.
Actionable comments posted: 1
♻️ Duplicate comments (1)
deepmd/utils/data.py (1)
501-513: Exceptions from modifier will be silently swallowed. The `executor.submit()` call returns a Future, but without calling `.result()` on it, any exception raised by `modify_data` is silently lost. While the `with` block will wait for completion, it won't propagate exceptions. This could lead to data being cached/returned in an inconsistent state if the modifier fails.
🔎 Proposed fix to capture and propagate exceptions
```diff
 if self.modifier is not None:
     with ThreadPoolExecutor(max_workers=num_worker) as executor:
         # Apply modifier if it exists
-        executor.submit(
+        future = executor.submit(
             self.modifier.modify_data,
             frame_data,
             self,
         )
+        # Wait for completion and propagate any exceptions
+        future.result()
     if self.use_modifier_cache:
         # Cache the modified frame to avoid recomputation
         self._modified_frame_cache[index] = copy.deepcopy(frame_data)
```
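A self-contained illustration of why `.result()` matters; the `boom` function is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def boom() -> None:
    raise RuntimeError("modifier failed")


# Without .result(): the with-block waits for the task on exit, but the
# exception stays inside the discarded Future and never surfaces.
with ThreadPoolExecutor(max_workers=1) as executor:
    executor.submit(boom)

# With .result(): the exception is re-raised in the calling thread.
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(boom)
    try:
        future.result()
    except RuntimeError as err:
        print(f"propagated: {err}")
```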
🧹 Nitpick comments (1)
deepmd/utils/data.py (1)
515-530: Consider batching ThreadPoolExecutor creation for preload efficiency. Currently, each frame creates a new `ThreadPoolExecutor` in `get_single_frame`. For large datasets, this overhead adds up. Consider creating the executor once in this method and passing it down, or processing frames in parallel batches. However, per the PR description, the ThreadPoolExecutor is specifically used to avoid CUDA re-initialization, so the current approach may be intentional for isolation.
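One way the batching could look; a sketch in which `load_raw_frame` is a hypothetical loader returning an unmodified frame dict, and the cache attribute name follows the review.

```python
from concurrent.futures import ThreadPoolExecutor


def preload_with_shared_executor(dataset, num_worker: int) -> None:
    """Sketch: reuse one executor for the whole preload pass."""

    def load_and_modify(index: int) -> None:
        # hypothetical helper: load a raw frame, modify it in place,
        # and store it in the dataset's frame cache
        frame = dataset.load_raw_frame(index)
        dataset.modifier.modify_data(frame, dataset)
        dataset._modified_frame_cache[index] = frame

    with ThreadPoolExecutor(max_workers=num_worker) as executor:
        futures = [executor.submit(load_and_modify, i) for i in range(dataset.nframes)]
        for future in futures:
            future.result()  # propagate any modifier error
```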
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
deepmd/pd/utils/dataset.py
deepmd/pt/train/training.py
deepmd/pt/utils/dataloader.py
deepmd/pt/utils/dataset.py
deepmd/utils/data.py
source/tests/pt/test_data_modifier.py
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt/utils/dataloader.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-08T15:32:11.479Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Applied to files:
source/tests/pt/test_data_modifier.py
🧬 Code graph analysis (4)
deepmd/pt/train/training.py (2)
deepmd/pd/train/training.py (1)
`get_additional_data_requirement` (1222-1246)
deepmd/pt/utils/dataset.py (2)
`add_data_requirement` (58-72), `preload_and_modify_all_data_torch` (74-75)
deepmd/pt/utils/dataset.py (4)
deepmd/pt/modifier/base_modifier.py (1)
`BaseModifier` (30-187)
deepmd/utils/data.py (4)
`DataRequirementItem` (1080-1160), `DeepmdData` (35-1077), `get_item_torch` (256-270), `preload_and_modify_all_data_torch` (515-530)
deepmd/pd/utils/dataset.py (1)
`DeepmdDataSetForLoader` (17-56)
deepmd/pt/utils/dataloader.py (1)
`preload_and_modify_all_data_torch` (241-243)
deepmd/pd/utils/dataset.py (1)
deepmd/utils/data.py (1)
`get_item_paddle` (272-287)
deepmd/utils/data.py (4)
source/tests/pt/test_data_modifier.py (2)
`modify_data` (123-137), `modify_data` (165-179)
deepmd/pt/modifier/base_modifier.py (1)
`modify_data` (90-187)
deepmd/pt/utils/dataloader.py (1)
`preload_and_modify_all_data_torch` (241-243)
deepmd/pt/utils/dataset.py (1)
`preload_and_modify_all_data_torch` (74-75)
🔇 Additional comments (16)
deepmd/pd/utils/dataset.py (1)
8-10: Review NUM_WORKERS default value logic.
`NUM_WORKERS` is properly defined and exported in deepmd/pd/utils/env.py at line 28. However, the default value calculation appears problematic: `min(0, ncpus)` will always evaluate to 0 regardless of the number of CPUs available. This should likely be `max(0, ncpus)` to meaningfully default to the number of CPUs, unless defaulting to 0 as a sentinel value (treated as 1 by the dataloader) is intentional.
deepmd/pt/train/training.py (2)
334-360: LGTM - Data modification correctly applied before statistics computation. The data modification workflow is properly sequenced:
- Build `data_requirement` from loss labels and model requirements
- Add requirements to training/validation data
- Preload and apply modifiers via `preload_and_modify_all_data_torch()`
- Then compute statistics via `single_model_stat()`
This addresses the previous Copilot comment about ordering - statistics are now computed on modified data.
378-401: Multi-task data modification follows consistent pattern. The multi-task path correctly mirrors the single-task implementation, applying data modification per model key before computing statistics.
source/tests/pt/test_data_modifier.py (7)
64-91: Plugin registration functions are well-documented. The modifier argument registration functions properly document parameters with clear descriptions. The `use_cache` parameter aligns with the broader modifier framework.
94-138: Test modifier implementation is appropriate. The unused parameters flagged by static analysis (`args`, `kwargs` in `__new__`, and `coord`, `atype`, etc. in `forward`) are required for interface compliance with `BaseModifier`. The `forward` method returning an empty dict is intentional since this modifier only operates via `modify_data` for testing the data pipeline.
140-179: Zero tester modifier correctly implements data zeroing. The implementation properly subtracts values from themselves to zero out data, which is a valid approach for testing that modifications are applied.
182-222: Scaling tester provides realistic model-based modification. This modifier appropriately loads a frozen model and applies scaled predictions, testing the integration of model inference within the modifier framework.
253-312: Test coverage appropriately validates modifier behavior. The tests verify:
- `test_init_modify_data`: Modifications are applied during trainer initialization
- `test_full_modify_data`: Modifications are applied exactly once (comparing before/after training)
Both use parameterized batch sizes and cache settings for thorough coverage.
314-381: Inference test validates end-to-end modifier integration. The test properly validates the mathematical relationship `output_model = output_ref - sfactor * output_modifier`, confirming that modifier predictions are correctly applied during inference.
383-398: Robust tearDown implementation. The try-except block ensures cleanup continues even if individual file removals fail, addressing the past review suggestion.
deepmd/pt/utils/dataset.py (3)
24-47: Clean modifier integration into dataset wrapper. The modifier is properly propagated to the underlying `DeepmdData` system, maintaining a clean delegation pattern.
52-56: Correct NUM_WORKERS propagation. The `NUM_WORKERS` parameter is now properly passed to `get_item_torch`, enabling parallel data modification as designed.
73-75: Preload method delegates correctly. Simple delegation to the underlying data system's preload method, maintaining the wrapper pattern.
deepmd/utils/data.py (3)
143-148: Cache initialization correctly handles optional modifier settings. The code properly checks for the `use_cache` attribute on the modifier and initializes the frame cache dictionary when a modifier is present.
256-287: Consistent interface updates for parallel data modification. Both `get_item_torch` and `get_item_paddle` properly pass the `num_worker` parameter to `get_single_frame`, maintaining API consistency.
398-406: Efficient cache lookup with early return. The cache check occurs before any data loading, avoiding unnecessary I/O when modified frames are already cached.
…dification during training.
20b8739 to 23eb6a5
Actionable comments posted: 0
🧹 Nitpick comments (5)
source/tests/pt/test_data_modifier.py (4)
65-83: Minor inconsistency in return type annotations.
`modifier_random_tester()` and `modifier_zero_tester()` use `list` while `modifier_scaling_tester()` uses `list[Argument]`. Consider using the more specific type hint consistently.
🔎 Proposed fix

```diff
 @modifier_args_plugin.register("random_tester", doc=doc_random_tester)
-def modifier_random_tester() -> list:
+def modifier_random_tester() -> list[Argument]:
     doc_seed = "Random seed used to initialize the random number generator for deterministic scaling factors."
     doc_use_cache = "Whether to cache modified frames to improve performance by avoiding recomputation."
     return [
         Argument("seed", int, optional=True, doc=doc_seed),
         Argument("use_cache", bool, optional=True, doc=doc_use_cache),
     ]

 @modifier_args_plugin.register("zero_tester", doc=doc_zero_tester)
-def modifier_zero_tester() -> list:
+def modifier_zero_tester() -> list[Argument]:
     doc_use_cache = "Whether to cache modified frames to improve performance by avoiding recomputation."
     return [
         Argument("use_cache", bool, optional=True, doc=doc_use_cache),
     ]
```
174-179: Consider clearer zeroing syntax. Using `data["energy"] -= data["energy"]` works, but `data["energy"] *= 0` or explicit zero assignment would be more readable. The current approach does preserve shape/dtype, which may be intentional.
🔎 Proposed alternative

```diff
 if "find_energy" in data and data["find_energy"] == 1.0:
-    data["energy"] -= data["energy"]
+    data["energy"] *= 0
 if "find_force" in data and data["find_force"] == 1.0:
-    data["force"] -= data["force"]
+    data["force"] *= 0
 if "find_virial" in data and data["find_virial"] == 1.0:
-    data["virial"] -= data["virial"]
+    data["virial"] *= 0
```
378-381: Consider specifying tolerance for floating-point comparison. The `np.testing.assert_allclose` call uses the default `rtol=1e-7`, which may be too strict for GPU/floating-point arithmetic after model inference and scaling operations. Consider specifying explicit tolerances.
🔎 Proposed fix

```diff
 for ii in range(3):
     np.testing.assert_allclose(
-        model_pred[ii], model_pred_ref[ii] - sfactor * modifier_pred[ii]
+        model_pred[ii], model_pred_ref[ii] - sfactor * modifier_pred[ii],
+        rtol=1e-5, atol=1e-8,
     )
```
489-498: Potential KeyError if test_data is missing system or frame IDs. The method assumes `test_data` contains the same system IDs and frame IDs as `ref_data`. If data collection is incomplete, `test_data[sid]` or `test_sys[fid]` could raise `KeyError`. Consider adding defensive checks or a clearer assertion message.
🔎 Proposed fix

```diff
 for sid in ref_data.keys():
+    assert sid in test_data, f"System ID {sid} missing from test data"
     ref_sys = ref_data[sid]
     test_sys = test_data[sid]
     for fid in ref_sys.keys():
+        assert fid in test_sys, f"Frame ID {fid} missing from test data for system {sid}"
         # compare common elements
         try:
             ref_label = to_numpy_array(ref_sys[fid][label_kw])
             test_label = to_numpy_array(test_sys[fid][label_kw])
         except KeyError:
             continue
         np.testing.assert_allclose(ref_label, test_label)
```
deepmd/utils/data.py (1)
502-509: Consider simplifying ThreadPoolExecutor usage for single-task execution. The code creates a thread pool with `max_workers=num_worker` but submits only a single task. While the `with` statement correctly waits for task completion via `shutdown(wait=True)`, consider whether a simpler threading approach would be clearer:

```python
if self.modifier is not None:
    import threading

    t = threading.Thread(target=self.modifier.modify_data, args=(frame_data, self))
    t.start()
    t.join()
    if self.use_modifier_cache:
        self._modified_frame_cache[index] = copy.deepcopy(frame_data)
```

This more directly expresses the intent: run the modifier in a separate thread to avoid CUDA re-initialization issues.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
deepmd/pd/utils/dataset.py
deepmd/pt/train/training.py
deepmd/pt/utils/dataloader.py
deepmd/pt/utils/dataset.py
deepmd/utils/data.py
source/tests/pt/test_data_modifier.py
🚧 Files skipped from review as they are similar to previous changes (2)
- deepmd/pt/utils/dataloader.py
- deepmd/pd/utils/dataset.py
🧰 Additional context used
🧬 Code graph analysis (3)
deepmd/pt/train/training.py (5)
deepmd/pt/loss/property.py (1)
`label_requirement` (209-221)
deepmd/pt/loss/tensor.py (1)
`label_requirement` (173-207)
deepmd/pd/train/training.py (1)
`get_additional_data_requirement` (1222-1246)
deepmd/pt/utils/dataloader.py (2)
`add_data_requirement` (215-218), `preload_and_modify_all_data_torch` (241-243)
deepmd/pt/utils/dataset.py (2)
`add_data_requirement` (58-72), `preload_and_modify_all_data_torch` (74-75)
deepmd/pt/utils/dataset.py (3)
deepmd/pt/modifier/base_modifier.py (1)
`BaseModifier` (30-187)
deepmd/utils/data.py (2)
`DataRequirementItem` (1080-1160), `DeepmdData` (35-1077)
deepmd/pd/utils/dataset.py (1)
`DeepmdDataSetForLoader` (17-56)
deepmd/utils/data.py (5)
data/raw/copy_raw.py (1)
`copy` (11-71)
source/tests/pt/test_data_modifier.py (2)
`modify_data` (123-137), `modify_data` (165-179)
deepmd/pt/modifier/base_modifier.py (1)
`modify_data` (90-187)
deepmd/pt/utils/dataloader.py (1)
`preload_and_modify_all_data_torch` (241-243)
deepmd/pt/utils/dataset.py (1)
`preload_and_modify_all_data_torch` (74-75)
🔇 Additional comments (15)
source/tests/pt/test_data_modifier.py (3)
1-57: LGTM! The module docstring is comprehensive and imports are well-organized. The license header and documentation clearly explain the test scope.
94-138: LGTM! The test modifier implementation is correct. The unused arguments in `__new__`, `forward()`, and `modify_data()` are expected since this is a test stub implementing the `BaseModifier` interface. The deterministic random scaling logic with seeded RNG is appropriate for reproducible tests.
182-222: LGTM! The `ModifierScalingTester` correctly loads a frozen model and applies scaling to predictions. The tuple handling at lines 218-219 properly handles different model return formats.
deepmd/pt/utils/dataset.py (4)
12-17: LGTM! The imports are appropriate for the modifier functionality being added.
25-47: LGTM! The modifier parameter is properly threaded through to the underlying `DeepmdData` instance, and `_natoms_vec` initialization is consistent with the Paddle backend implementation.
54-54: LGTM! Passing `NUM_WORKERS` to `get_item_torch` enables parallel modifier execution during data loading.
74-75: LGTM! The preload method provides a clean interface for eager modifier application and caching.
deepmd/pt/train/training.py (3)
335-344: LGTM! The data requirements are properly assembled from loss and model requirements, and preloading with modifier application occurs before computing statistics, ensuring stats reflect the modified data.
379-390: LGTM! The multi-task path correctly mirrors the single-task logic, building per-model data requirements and preloading modified data before computing statistics for each model.
1365-1398: LGTM! The helper function correctly builds additional data requirements for fparam, aparam, and spin based on model configuration, with appropriate must/default settings.
deepmd/utils/data.py (5)
5-5: LGTM! The `copy` module import is necessary for the deepcopy operation used in frame caching.
143-148: LGTM! Cache initialization correctly defaults to enabled and respects the modifier's `use_cache` preference when available.
256-287: LGTM! The `num_worker` parameter is consistently added to both PyTorch and Paddle item accessors, with proper documentation.
510-512: LGTM! The caching logic correctly uses `deepcopy` to ensure cached frames are immutable, addressing the concern raised in past reviews about shallow copy issues.
515-530: LGTM! The preload method efficiently populates the cache by delegating to `get_single_frame`, with helpful progress logging for large datasets.
Actionable comments posted: 0
🧹 Nitpick comments (2)
deepmd/utils/data.py (2)
502-514: Consider optimizing ThreadPoolExecutor usage. The code creates a ThreadPoolExecutor with `max_workers=num_worker` but submits only a single task. While this works (and reportedly fixes CUDA re-initialization issues), most worker threads remain idle. If the goal is thread isolation, consider using `max_workers=1`. If parallel execution is intended for future enhancements, document the design intent.
🔎 Potential optimization

```diff
-if self.modifier is not None:
-    with ThreadPoolExecutor(max_workers=num_worker) as executor:
-        # Apply modifier if it exists
-        future = executor.submit(
-            self.modifier.modify_data,
-            frame_data,
-            self,
-        )
-        # Wait for completion and propagate any exceptions
-        future.result()
+if self.modifier is not None:
+    with ThreadPoolExecutor(max_workers=1) as executor:
+        future = executor.submit(
+            self.modifier.modify_data,
+            frame_data,
+            self,
+        )
+        # Wait for completion and propagate any exceptions
+        future.result()
```
517-533: Consider parallelizing preload for large datasets. The method sequentially processes frames, which may be slow for large datasets. If thread safety permits, consider using ThreadPoolExecutor or multiprocessing to preload multiple frames in parallel.

```python
def preload_and_modify_all_data_torch(self, num_worker: int) -> None:
    """Preload all frames and apply modifier to cache them.

    This method is useful when use_modifier_cache is True and you want
    to avoid applying the modifier repeatedly during training.
    """
    if not self.use_modifier_cache or self.modifier is None:
        return
    log.info("Preloading and modifying all data frames...")
    # Option: Parallelize frame loading
    with ThreadPoolExecutor(max_workers=num_worker) as executor:
        futures = [
            executor.submit(self.get_single_frame, i, 1)
            for i in range(self.nframes)
            if i not in self._modified_frame_cache
        ]
        for i, future in enumerate(as_completed(futures)):
            future.result()  # Propagate exceptions
            if (i + 1) % 100 == 0:
                log.info(f"Processed {i + 1}/{self.nframes} frames")
    log.info("All frames preloaded and modified.")
```

Note: Only implement if thread safety of the modifier and caching mechanism is verified.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
deepmd/utils/data.py
source/tests/pt/test_data_modifier.py
🧰 Additional context used
🧬 Code graph analysis (1)
deepmd/utils/data.py (3)
deepmd/pt/modifier/base_modifier.py (1)
`modify_data` (90-187)
deepmd/pt/utils/dataset.py (1)
`preload_and_modify_all_data_torch` (74-75)
deepmd/pt/utils/dataloader.py (1)
`preload_and_modify_all_data_torch` (241-243)
🔇 Additional comments (10)
deepmd/utils/data.py (3)
143-148: LGTM! The cache initialization logic correctly defaults to `True` and respects the modifier's `use_cache` attribute when available. The cache dictionary is initialized only when a modifier is present, which is appropriate.
256-287: LGTM! The method signature changes to accept `num_worker` are consistent across both `get_item_torch` and `get_item_paddle`. Delegating to a centralized `get_single_frame` method improves maintainability.
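The delegation amounts to little more than this; a sketch of the two accessors as they might appear on `DeepmdData` (the default value for `num_worker` is an assumption):

```python
def get_item_torch(self, index: int, num_worker: int = 1) -> dict:
    """PyTorch-facing accessor; defers to the shared frame loader."""
    return self.get_single_frame(index, num_worker)


def get_item_paddle(self, index: int, num_worker: int = 1) -> dict:
    """Paddle-facing accessor; same shared path, so caching and the
    modifier behave identically across backends."""
    return self.get_single_frame(index, num_worker)
```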
401-406: LGTM! The early cache check correctly returns cached modified frames when `use_modifier_cache` is enabled, avoiding redundant modifier computations.
source/tests/pt/test_data_modifier.py (7)
64-91: LGTM! The modifier argument registration functions are correctly implemented with appropriate documentation. The arguments align with the corresponding modifier class constructors.
94-138: LGTM! The `ModifierRandomTester` implementation is correct for testing purposes. The deterministic random scaling using a seeded RNG ensures reproducible tests. Note: Static analysis warnings about unused arguments are false positives; these parameters are required by the base class interface.
140-179: LGTM! The `ModifierZeroTester` correctly zeros out energy, force, and virial data for testing. The `data[key] -= data[key]` pattern is a clean way to zero arrays in-place.
182-222: LGTM! The `ModifierScalingTester` correctly implements model-based data modification using a frozen model. The implementation properly handles potential tuple returns from the model and applies scaling to the predictions.
231-251: LGTM! The test setup correctly configures single-step training for fast test execution. The tearDown method appropriately handles cleanup failures with try-except blocks.
Note: File cleanup operates on the current directory, which is acceptable for unit tests but assumes tests run sequentially in isolated environments.
Also applies to: 386-401
253-313: LGTM! Both test methods correctly verify the modifier behavior:
- `test_init_modify_data` ensures modification is applied during initialization
- `test_full_modify_data` ensures modification is applied only once (idempotency)
The parameterized testing across different batch sizes and cache settings provides good coverage.
314-384: LGTM! The `test_inference` method provides comprehensive validation of modifier integration during inference, correctly verifying that `output_model ≈ output_ref - sfactor * output_modifier`. The helper methods are well-designed:
- `get_dataset_nframes`: Simple frame counting
- `get_sampled_data`: Robust data collection with duplicate handling
- `check_sampled_data`: Clean comparison logic with appropriate error handling
Also applies to: 403-502
Overview
This PR adds data modifier plugin functionality to the PyTorch implementation of DeepMD. This feature allows on-the-fly data modification during training and inference, enabling advanced data manipulation capabilities.
Key Changes
1. Added Data Modifier to Training Pipeline
deepmd/pt/entrypoints/main.py: builds the data modifier (`get_data_modifier`) and passes it into the `get_trainer()` function
2. Added Data Modifier to Inference
deepmd/pt/infer/deep_eval.py: wires the data modifier into the `DeepEval` class
3. Implemented Data Modifier Framework
deepmd/pt/modifier/__init__.py (entirely new)
- `BaseModifier` with registration system
- `ModifierRandomTester`: Applies random scaling to energy/force/virial data for testing
- `ModifierZeroTester`: Zeroes out energy/force/virial data for testing
- `ModifierScalingTester`: Applies scaled model predictions as data modifications
4. Added Data Modifier Tests
deepmd/pt/test/test_modifier.py (entirely new)
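For readers who want to extend this, a sketch of what a custom modifier might look like. It mirrors the shape of the test modifiers; the decorator-based registration and the `use_cache` constructor argument are assumptions based on this review, and "my_shift" is a hypothetical type name.

```python
from typing import Optional

import torch

from deepmd.pt.modifier import BaseModifier


@BaseModifier.register("my_shift")  # assumed plugin-registration pattern
class MyShiftModifier(BaseModifier):
    """Hypothetical modifier that shifts every frame's energy label."""

    def __init__(self, shift: float = 0.0, use_cache: bool = True) -> None:
        # assumed base constructor signature storing use_cache
        super().__init__(use_cache=use_cache)
        self.shift = shift

    @torch.jit.export
    def forward(
        self,
        coord: torch.Tensor,
        atype: torch.Tensor,
        box: Optional[torch.Tensor] = None,
        fparam: Optional[torch.Tensor] = None,
        aparam: Optional[torch.Tensor] = None,
        do_atomic_virial: bool = False,
    ) -> dict[str, torch.Tensor]:
        nframes = coord.shape[0]
        # the returned predictions are subtracted from the data labels
        # by the base class's modify_data
        return {"energy": torch.full((nframes, 1), self.shift, dtype=coord.dtype)}
```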