feat(pt): allow `--init-frz-model` for pt model converted from tf model #5091

ChiahsinChu · 2025-12-11T10:57:05Z

Add try-catch block for loading frozen models
Use strict=False when state_dict keys don't match
Log warnings for model state_dict mismatches
Prevent crashes when loading models with different architectures"

Fix bug in #5090

Summary by CodeRabbit

Bug Fixes
- Frozen model loading now uses non‑strict mode; missing or unexpected parameters are logged as warnings instead of causing failure.
Tests
- Added an end-to-end test covering conversion and loading of frozen models from external formats.
- Test I/O standardized to UTF‑8; teardown extended to remove generated artifacts.
New Files
- Added a JSON test model configuration for validation and training workflows.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-11T10:59:40Z

📝 Walkthrough

Walkthrough

When a frozen PyTorch model is provided, training now attempts non‑strict state_dict loading if an initial strict load raises a RuntimeError about missing or unexpected keys and logs the mismatch. Tests add an end‑to‑end pbtxt→pb→.pth conversion path, UTF‑8 JSON I/O, artifact cleanup, and a new JSON test model.

Changes

Cohort / File(s)	Summary
Frozen model state_dict loading `deepmd/pt/train/training.py`	Change: wrap `model.load_state_dict(...)` in try/except; if `RuntimeError` mentions "missing" or "unexpected" keys, retry with `strict=False` and log a warning listing the mismatched keys; otherwise re-raise.
Tests: init_frz_model pbtxt→pb→pth flow `source/tests/pt/test_init_frz_model.py`	Add: new test `test_init_frz_model_pb2pth` that converts a pbtxt→pb, uses `convert_backend` to create a `.pth`, initializes and runs trainer to validate model init; add imports (`convert_pbtxt_to_pb`, `convert_backend`), use `encoding="utf-8"` for JSON I/O, and extend teardown to remove generated `.pb`/`.pth` artifacts.
New test model config `source/tests/pt/model/models/se_e2_a.json`	Add: JSON test model configuration used by the new test (descriptor, fitting_net, data/stat, learning rate, loss, and training parameters).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Check the RuntimeError substring matching is robust and not brittle.
Confirm logged warning format and content (missing/unexpected keys).
Run the new test to validate pbtxt→pb conversion, backend conversion, trainer initialization, and cleanup.

Suggested reviewers

wanghan-iapcm
njzjz

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: allowing `--init-frz-model` parameter for PyTorch models converted from TensorFlow models, which matches the core modifications to training.py and new test coverage.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb83aa9 and e861add.

📒 Files selected for processing (3)

deepmd/pt/train/training.py (1 hunks)
source/tests/pt/model/models/se_e2_a.json (1 hunks)
source/tests/pt/test_init_frz_model.py (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

source/tests/pt/model/models/se_e2_a.json

🧰 Additional context used

🧠 Learnings (6)

📓 Common learnings

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

📚 Learning: 2024-11-25T07:42:55.735Z

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

Applied to files:

source/tests/pt/test_init_frz_model.py
deepmd/pt/train/training.py

📚 Learning: 2025-12-12T13:40:14.334Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: Run core tests with `pytest source/tests/tf/test_dp_test.py::TestDPTestEner::test_1frame -v` to validate basic functionality

Applied to files:

source/tests/pt/test_init_frz_model.py

📚 Learning: 2025-12-12T13:40:14.334Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: Test PyTorch training with `cd examples/water/se_e2_a && dp --pt train input_torch.json --skip-neighbor-stat` and allow timeout of 60+ seconds for validation

Applied to files:

source/tests/pt/test_init_frz_model.py

📚 Learning: 2025-12-12T13:40:14.334Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: Use PyTorch backend with `--pt` flag or `export DP_BACKEND=pytorch`; use `input_torch.json` for configuration typically

Applied to files:

source/tests/pt/test_init_frz_model.py

📚 Learning: 2025-12-12T13:40:14.334Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: Validate Python interface by running `python -c "import deepmd; import deepmd.tf; print('Both interfaces work')"`

Applied to files:

source/tests/pt/test_init_frz_model.py

🧬 Code graph analysis (1)

source/tests/pt/test_init_frz_model.py (1)

deepmd/entrypoints/convert_backend.py (1)

convert_backend (11-31)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)

GitHub Check: Analyze (c-cpp)
GitHub Check: Analyze (python)
GitHub Check: Test Python (5, 3.13)
GitHub Check: Test Python (6, 3.10)
GitHub Check: Test Python (3, 3.10)
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Test Python (3, 3.13)
GitHub Check: Test Python (4, 3.10)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Build C++ (clang, clang)
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Test C++ (true, false, false, true)
GitHub Check: Test C++ (true, true, true, false)
GitHub Check: Test C++ (false, false, false, true)
GitHub Check: Test C++ (false, true, true, false)

🔇 Additional comments (5)

deepmd/pt/train/training.py (1)

614-622: LGTM! Non-strict loading implementation is correct.

The implementation properly handles frozen model initialization with mismatched state_dict keys:

Uses strict=False to allow partial loading

Logs clear warnings when keys don't match

Prevents crashes from architecture differences

This approach was confirmed in previous review discussions and aligns with the PR objectives.

source/tests/pt/test_init_frz_model.py (4)

16-18: LGTM! Import additions support the new test.

The new imports enable end-to-end testing of the pbtxt→pb→pth conversion pipeline, which validates the frozen model initialization path from TensorFlow models.

Also applies to: 26-28

38-38: LGTM! Explicit UTF-8 encoding improves portability.

Adding encoding="utf-8" to JSON file operations ensures consistent behavior across platforms with different default encodings.

Also applies to: 67-67

138-173: LGTM! Comprehensive end-to-end test with clear assertions.

The new test effectively validates the TF→PT frozen model initialization path:

Tests pbtxt→pb→pth conversion pipeline

Includes explicit assertions per previous review feedback

Verifies trainer and model initialization

Uses numb_steps=0 to focus on initialization without training overhead

The test design properly validates the non-strict loading behavior introduced in training.py.

185-186: LGTM! Cleanup handles new test artifacts.

The tearDown addition properly removes .pb files generated by the new test, maintaining test isolation and preventing artifact accumulation.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

deepmd/pt/train/training.py (2)
619-623: Consider handling "Unexpected key(s)" as well.

PyTorch's load_state_dict(strict=True) can raise RuntimeError for both "Missing key(s)" and "Unexpected key(s)". The current check only handles missing keys, so if the frozen model has extra keys not present in self.model, this would not be caught and would propagate as an unhandled exception.

If the intent is to be lenient for any key mismatch, consider a more robust approach:
         try:
             self.model.load_state_dict(frz_model.state_dict())
         except RuntimeError as e:
-            if "Missing key(s) in state_dict" in str(e):
+            err_msg = str(e)
+            if "Missing key(s) in state_dict" in err_msg or "Unexpected key(s) in state_dict" in err_msg:
                 self.model.load_state_dict(frz_model.state_dict(), strict=False)
-                log.warning("Use strict=False to ignore non-matching keys.")
                 log.warning(f"Model state_dict mismatch detected: {e}")
+                log.warning("Loaded with strict=False to ignore non-matching keys.")
             else:
-                raise e
+                raise
Alternatively, you could use strict=False directly and inspect the returned NamedTuple containing missing_keys and unexpected_keys for more explicit control.

625-625: Use bare raise to preserve the original traceback.

Per Ruff TRY201, using raise without the exception name preserves the original traceback, which aids debugging.
             else:
-                raise e
+                raise

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d774ede and 335d53e.

📒 Files selected for processing (1)

deepmd/pt/train/training.py (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

📚 Learning: 2024-11-25T07:42:55.735Z

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

Applied to files:

deepmd/pt/train/training.py

🧬 Code graph analysis (1)

deepmd/pt/train/training.py (1)

deepmd/pd/train/wrapper.py (2)

load_state_dict (186-191)

state_dict (199-203)

🪛 Ruff (0.14.8)

deepmd/pt/train/training.py

625-625: Use raise without specifying exception name

Remove exception name

(TRY201)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)

GitHub Check: Build C++ (clang, clang)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Build C++ (cuda, cuda)
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Test Python (5, 3.12)
GitHub Check: Test Python (6, 3.12)
GitHub Check: Test Python (5, 3.9)
GitHub Check: Test Python (4, 3.9)
GitHub Check: Test Python (4, 3.12)
GitHub Check: Test Python (3, 3.9)
GitHub Check: Test Python (3, 3.12)
GitHub Check: Test Python (6, 3.9)
GitHub Check: Test Python (1, 3.12)
GitHub Check: Test Python (2, 3.12)
GitHub Check: Test Python (1, 3.9)
GitHub Check: Test C++ (false, false, false, true)
GitHub Check: Test Python (2, 3.9)
GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
GitHub Check: Test C++ (true, true, true, false)
GitHub Check: Test C++ (true, false, false, true)
GitHub Check: Test C++ (false, true, true, false)
GitHub Check: Analyze (python)
GitHub Check: Analyze (c-cpp)

codecov · 2025-12-11T11:49:27Z

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 82.14%. Comparing base (b9c6615) to head (e861add).
⚠️ Report is 5 commits behind head on devel.

Files with missing lines	Patch %	Lines
deepmd/pt/train/training.py	75.00%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##            devel    #5091   +/-   ##
=======================================
  Coverage   82.14%   82.14%           
=======================================
  Files         709      709           
  Lines       72453    72457    +4     
  Branches     3615     3615           
=======================================
+ Hits        59514    59519    +5     
  Misses      11775    11775           
+ Partials     1164     1163    -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

source/tests/pt/test_init_frz_model.py (2)
137-163: Consider adding assertions to strengthen the test.

While this smoke test verifies that the TF→PT conversion and initialization doesn't crash (which aligns with the PR objective), it could be enhanced with basic assertions to verify the model loaded successfully.

For example, you could add verification that the trainer was created successfully:
trainer = get_trainer(config, init_frz_model=frozen_model)
assert trainer is not None, "Trainer should be created successfully"
trainer.run()
Or verify that the frozen model file exists before using it:
assert Path(frozen_model).exists(), f"Converted model {frozen_model} should exist"
trainer = get_trainer(config, init_frz_model=frozen_model)
163-163: Remove trailing whitespace.

Line 163 appears to have trailing whitespace.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 335d53e and 7956dd9.

📒 Files selected for processing (2)

deepmd/pt/train/training.py (1 hunks)
source/tests/pt/test_init_frz_model.py (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

deepmd/pt/train/training.py

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

🧬 Code graph analysis (1)

source/tests/pt/test_init_frz_model.py (3)

deepmd/entrypoints/convert_backend.py (1)

convert_backend (11-31)

deepmd/pt/entrypoints/main.py (1)

get_trainer (98-206)

deepmd/pt/train/training.py (1)

run (716-1212)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)

GitHub Check: Test Python (5, 3.12)
GitHub Check: Test Python (4, 3.12)
GitHub Check: Test Python (6, 3.9)
GitHub Check: Test Python (3, 3.9)
GitHub Check: Test Python (1, 3.12)
GitHub Check: Test Python (6, 3.12)
GitHub Check: Test Python (3, 3.12)
GitHub Check: Test Python (4, 3.9)
GitHub Check: Test Python (5, 3.9)
GitHub Check: Test Python (2, 3.9)
GitHub Check: Test Python (1, 3.9)
GitHub Check: Test Python (2, 3.12)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
GitHub Check: Analyze (python)
GitHub Check: Test C++ (false, true, true, false)
GitHub Check: Test C++ (true, true, true, false)
GitHub Check: Test C++ (false, false, false, true)
GitHub Check: Test C++ (true, false, false, true)
GitHub Check: Analyze (c-cpp)
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Build C++ (clang, clang)
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Build C++ (cuda, cuda)
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Build wheels for cp311-macosx_arm64

🔇 Additional comments (4)

source/tests/pt/test_init_frz_model.py (4)

23-28: LGTM!

The new imports are appropriate for testing the TensorFlow to PyTorch model conversion pathway.

37-37: LGTM!

Explicit UTF-8 encoding improves cross-platform compatibility.

66-66: LGTM!

Explicit UTF-8 encoding improves cross-platform compatibility.

174-175: LGTM!

Proper cleanup of test artifacts (.pb files) ensures tests don't leave behind intermediate files.

source/tests/pt/test_init_frz_model.py

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

source/tests/pt/test_init_frz_model.py (1)
151-151: Remove redundant assignment.

This assignment is overridden by line 160, making it dead code.

Apply this diff to remove the redundant line:
-        config["training"]["numb_steps"] = 1
         config["training"]["save_freq"] = 1

🧹 Nitpick comments (1)

source/tests/pt/test_init_frz_model.py (1)

146-146: Consider converting the comment to a docstring.

The comment describing the test purpose would be better placed as a method docstring for improved documentation and discoverability.

Apply this diff:

     def test_init_frz_model_pb2pth(self) -> None:
+        """Test initialization from frozen model with mismatched keys to test strict=False fallback."""
         frozen_model = "frozen_model_pb2pth.pth"
         # Convert pth model from pb model
         convert_pbtxt_to_pb(
             str(Path(__file__).parent / "water/se_e2_a.pbtxt"), "frozen_model.pb"
         )
         convert_backend(INPUT="frozen_model.pb", OUTPUT=frozen_model)
 
-        """Test initialization from frozen model with mismatched keys to test strict=False fallback."""
         # Create a base model

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7956dd9 and b6c525a.

📒 Files selected for processing (1)

source/tests/pt/test_init_frz_model.py (5 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

🧬 Code graph analysis (1)

source/tests/pt/test_init_frz_model.py (1)

deepmd/entrypoints/convert_backend.py (1)

convert_backend (11-31)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)

GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Build C++ (clang, clang)
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Build C++ (cuda, cuda)
GitHub Check: Test Python (5, 3.9)
GitHub Check: Test Python (6, 3.9)
GitHub Check: Test Python (5, 3.12)
GitHub Check: Test Python (6, 3.12)
GitHub Check: Test Python (4, 3.12)
GitHub Check: Test Python (2, 3.12)
GitHub Check: Test Python (4, 3.9)
GitHub Check: Test Python (1, 3.9)
GitHub Check: Test Python (2, 3.9)
GitHub Check: Test Python (3, 3.12)
GitHub Check: Test Python (3, 3.9)
GitHub Check: Test Python (1, 3.12)
GitHub Check: Test C++ (false, true, true, false)
GitHub Check: Test C++ (true, false, false, true)
GitHub Check: Test C++ (true, true, true, false)
GitHub Check: Test C++ (false, false, false, true)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
GitHub Check: Analyze (c-cpp)

🔇 Additional comments (4)

source/tests/pt/test_init_frz_model.py (4)
16-28: LGTM! Imports support the new test functionality.

The imports for convert_backend and convert_pbtxt_to_pb are necessary for the new test case that exercises model conversion from TensorFlow to PyTorch formats.

38-38: LGTM! UTF-8 encoding improves cross-platform compatibility.

Explicitly specifying UTF-8 encoding for JSON file operations is a best practice that prevents encoding issues across different platforms and locales.

Also applies to: 67-67

138-163: Verify test coverage: no assertions present.

The test method successfully exercises the conversion and initialization pathway but includes no assertions to validate correctness. Given the PR objectives focus on preventing crashes when loading converted models, this may be intentional—the test passes if no exceptions are raised.

Consider adding assertions if specific behavior should be validated beyond "does not crash":
# Example: verify the trainer was initialized successfully
self.assertIsNotNone(trainer)
# Or validate that the model can perform inference
# (similar to test_dp_test method)
175-176: LGTM! Proper cleanup of test artifacts.

The tearDown correctly removes the generated .pb files, preventing test pollution and maintaining a clean test environment.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

source/tests/pt/test_init_frz_model.py (1)
138-162: Consider adding assertions to verify test behavior.

The test verifies that initialization from a TensorFlow-converted model doesn't crash, but it lacks assertions to confirm the model was properly initialized. While the implicit "doesn't crash" check is valuable, adding explicit verification would make the test more robust and its intent clearer.

Consider adding assertions after line 162, such as:
# Verify the trainer was initialized successfully
assert trainer is not None
assert trainer.model is not None

# Optionally verify model can perform inference
freeze(model="model.pt", output="test_frozen.pth", head=None)
dp = DeepPot("test_frozen.pth")
# Perform a simple inference check similar to test_dp_test

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6c525a and c96476a.

📒 Files selected for processing (2)

deepmd/pt/train/training.py (1 hunks)
source/tests/pt/test_init_frz_model.py (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

deepmd/pt/train/training.py

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

🧬 Code graph analysis (1)

source/tests/pt/test_init_frz_model.py (3)

deepmd/entrypoints/convert_backend.py (1)

convert_backend (11-31)

deepmd/pt/entrypoints/main.py (2)

main (521-580)

freeze (372-386)

deepmd/main.py (1)

main (973-1020)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)

GitHub Check: Analyze (javascript-typescript)
GitHub Check: Analyze (c-cpp)
GitHub Check: Analyze (python)
GitHub Check: Test C++ (true, false, false, true)
GitHub Check: Test C++ (false, true, true, false)
GitHub Check: Test C++ (false, false, false, true)
GitHub Check: Test C++ (true, true, true, false)
GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Build C++ (cuda, cuda)
GitHub Check: Build C++ (clang, clang)
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Build C++ (cuda120, cuda)

🔇 Additional comments (4)

source/tests/pt/test_init_frz_model.py (4)

16-18: LGTM!

The import is correctly added and properly used in the new test method at line 144.

26-28: LGTM!

The import is correctly added and used in the new test method to convert pbtxt models to pb format.

38-38: LGTM!

Explicitly specifying UTF-8 encoding is good practice for ensuring consistent file I/O behavior across different platforms.

Also applies to: 67-67

174-175: LGTM!

The cleanup logic correctly handles the .pb files created by the new test method, ensuring test artifacts don't persist.

source/tests/pt/test_init_frz_model.py

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

source/tests/pt/test_init_frz_model.py (2)

66-71: Fix temp file handle leak + delete the temp JSON (can break on Windows).
NamedTemporaryFile(delete=False, ...) stays open and the .json is never removed; also reopening an open temp file can fail on Windows. Suggest writing via the temp handle, closing it, and os.remove in finally.

-                tmp_input = tempfile.NamedTemporaryFile(delete=False, suffix=".json")
-                with open(tmp_input.name, "w", encoding="utf-8") as f:
-                    json.dump(empty_config, f, indent=4)
-                run_dp(
-                    f"dp --pt train {tmp_input.name} --init-frz-model {self.models[-1]} --use-pretrain-script --skip-neighbor-stat"
-                )
+                tmp_input = tempfile.NamedTemporaryFile(
+                    delete=False, suffix=".json", mode="w", encoding="utf-8"
+                )
+                try:
+                    json.dump(empty_config, tmp_input, indent=4)
+                finally:
+                    tmp_input.close()
+                try:
+                    run_dp(
+                        f"dp --pt train {tmp_input.name} --init-frz-model {self.models[-1]} --use-pretrain-script --skip-neighbor-stat"
+                    )
+                finally:
+                    os.remove(tmp_input.name)

164-175: TearDown cleanup: also remove se_e2_a.hdf5 (likely produced by new JSON).
Right now .pb cleanup is added (good), but if training.stat_file is honored, ./se_e2_a.hdf5 may remain and cause test pollution.

         for f in os.listdir("."):
@@
             if f.startswith("frozen_model") and f.endswith(".pb"):
                 os.remove(f)
+            if f in ["se_e2_a.hdf5"]:
+                os.remove(f)

🧹 Nitpick comments (1)

source/tests/pt/test_init_frz_model.py (1)

138-163: Nice: pb→pth init coverage; consider asserting conversion outputs exist before training.
A quick Path(...).is_file() check before get_trainer(...) makes failures more local/easier to diagnose if conversion breaks.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c96476a and 4069606.

📒 Files selected for processing (3)

deepmd/pt/train/training.py (1 hunks)
source/tests/pt/model/models/se_e2_a.json (1 hunks)
source/tests/pt/test_init_frz_model.py (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

deepmd/pt/train/training.py

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

🧬 Code graph analysis (1)

source/tests/pt/test_init_frz_model.py (3)

deepmd/entrypoints/convert_backend.py (1)

convert_backend (11-31)

deepmd/pt/entrypoints/main.py (2)

main (521-580)

freeze (372-386)

deepmd/main.py (1)

main (973-1020)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)

GitHub Check: Test Python (5, 3.9)
GitHub Check: Test Python (6, 3.12)
GitHub Check: Test Python (5, 3.12)
GitHub Check: Test Python (1, 3.9)
GitHub Check: Test Python (4, 3.12)
GitHub Check: Test Python (4, 3.9)
GitHub Check: Test Python (6, 3.9)
GitHub Check: Test Python (1, 3.12)
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Test Python (2, 3.9)
GitHub Check: Test Python (2, 3.12)
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Test Python (3, 3.12)
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Test Python (3, 3.9)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
GitHub Check: Analyze (python)
GitHub Check: Analyze (c-cpp)
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Build C++ (cuda, cuda)
GitHub Check: Build C++ (clang, clang)
GitHub Check: Test C++ (false, false, false, true)
GitHub Check: Test C++ (true, false, false, true)
GitHub Check: Test C++ (true, true, true, false)
GitHub Check: Test C++ (false, true, true, false)

🔇 Additional comments (2)

source/tests/pt/test_init_frz_model.py (2)

37-43: UTF-8 I/O: LGTM.
Consistent explicit encoding helps avoid CI locale surprises.

Also applies to: 67-68

16-28: API stability concern is unfounded; both functions are established and stable.

convert_pbtxt_to_pb is a documented TensorFlow utility already widely used across TF tests, and convert_backend is a public CLI subcommand with stable signature used in production model code. Both APIs are well-established and appropriate test dependencies. TensorFlow requirement for convert_pbtxt_to_pb is expected behavior for protobuf text conversion, not a new concern.

source/tests/pt/model/models/se_e2_a.json

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

source/tests/pt/test_init_frz_model.py (1)

138-162: Add assertions to validate successful initialization from the converted pb→pth model.

The test verifies that initialization doesn't crash but lacks assertions to confirm the model was properly initialized. The test_dp_test method in the same file demonstrates appropriate validation patterns—consider adding similar inference checks, such as creating a DeepPot instance from the initialized model and performing a basic evaluation to confirm state transfer was successful.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4069606 and 1be559c.

📒 Files selected for processing (3)

deepmd/pt/train/training.py (1 hunks)
source/tests/pt/model/models/se_e2_a.json (1 hunks)
source/tests/pt/test_init_frz_model.py (5 hunks)

✅ Files skipped from review due to trivial changes (1)

source/tests/pt/model/models/se_e2_a.json

🚧 Files skipped from review as they are similar to previous changes (1)

deepmd/pt/train/training.py

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)

GitHub Check: Test Python (4, 3.9)
GitHub Check: Test Python (6, 3.12)
GitHub Check: Test Python (4, 3.12)
GitHub Check: Test Python (5, 3.12)
GitHub Check: Test Python (6, 3.9)
GitHub Check: Test Python (5, 3.9)
GitHub Check: Test Python (1, 3.9)
GitHub Check: Test Python (1, 3.12)
GitHub Check: Test Python (2, 3.9)
GitHub Check: Test Python (3, 3.12)
GitHub Check: Test Python (2, 3.12)
GitHub Check: Test Python (3, 3.9)
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Build C++ (cuda, cuda)
GitHub Check: Build C++ (clang, clang)
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Test C++ (false, true, true, false)
GitHub Check: Test C++ (true, true, true, false)
GitHub Check: Analyze (c-cpp)
GitHub Check: Test C++ (true, false, false, true)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Analyze (python)
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Test C++ (false, false, false, true)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)

🔇 Additional comments (4)

source/tests/pt/test_init_frz_model.py (4)

16-18: LGTM!

The import is properly used in the new test method to convert frozen models from pb to pth format.

26-28: LGTM!

The import is properly used in the new test method to convert pbtxt models to pb format.

38-38: LGTM!

Explicitly specifying UTF-8 encoding is a best practice for JSON files and ensures consistent behavior across different platforms.

Also applies to: 67-67

174-175: LGTM!

The cleanup correctly removes the .pb files generated during the pb→pth conversion test, maintaining proper test isolation.

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1be559c and 5d2e44d.

📒 Files selected for processing (3)

deepmd/pt/train/training.py (1 hunks)
source/tests/pt/model/models/se_e2_a.json (1 hunks)
source/tests/pt/test_init_frz_model.py (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

deepmd/pt/train/training.py
source/tests/pt/model/models/se_e2_a.json

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

📚 Learning: 2025-12-12T13:40:14.325Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.325Z
Learning: Use PyTorch backend with `--pt` flag or `export DP_BACKEND=pytorch`; use `input_torch.json` for configuration typically

Applied to files:

source/tests/pt/test_init_frz_model.py

📚 Learning: 2025-12-12T13:40:14.325Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.325Z
Learning: Validate Python interface by running `python -c "import deepmd; import deepmd.tf; print('Both interfaces work')"`

Applied to files:

source/tests/pt/test_init_frz_model.py

🧬 Code graph analysis (1)

source/tests/pt/test_init_frz_model.py (1)

deepmd/entrypoints/convert_backend.py (1)

convert_backend (11-31)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)

GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Test C++ (true, false, false, true)
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Test C++ (false, false, false, true)
GitHub Check: Test C++ (true, true, true, false)
GitHub Check: Test C++ (false, true, true, false)
GitHub Check: Analyze (python)
GitHub Check: Analyze (c-cpp)
GitHub Check: Test Python (5, 3.10)
GitHub Check: Test Python (3, 3.13)
GitHub Check: Test Python (1, 3.10)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
GitHub Check: Test Python (6, 3.10)
GitHub Check: Test Python (4, 3.10)
GitHub Check: Test Python (4, 3.13)
GitHub Check: Test Python (2, 3.10)
GitHub Check: Test Python (5, 3.13)
GitHub Check: Test Python (3, 3.10)
GitHub Check: Test Python (6, 3.13)
GitHub Check: Test Python (2, 3.13)
GitHub Check: Test Python (1, 3.13)
GitHub Check: Build C++ (clang, clang)
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Build C++ (rocm, rocm)

🔇 Additional comments (3)

source/tests/pt/test_init_frz_model.py (3)

16-18: LGTM! Imports support the new TF-to-PT conversion test.

The added imports for convert_backend and convert_pbtxt_to_pb are necessary for the new test method and follow the existing import patterns in the file.

Also applies to: 26-28

38-38: LGTM! UTF-8 encoding ensures cross-platform compatibility.

Explicitly specifying UTF-8 encoding for JSON file operations is a good practice that prevents potential encoding issues across different platforms.

Also applies to: 67-67

174-175: LGTM! Proper cleanup of test artifacts.

The teardown correctly removes .pb files generated during the TF-to-PT conversion test, preventing test pollution and maintaining a clean test environment.

source/tests/pt/test_init_frz_model.py

deepmd/pt/train/training.py

wanghan-iapcm · 2025-12-15T04:11:27Z

Please resolve the conservation of coderabbitai, then the pr can be merged.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

source/tests/pt/test_init_frz_model.py (1)
138-174: LGTM: Test successfully validates TF→PT frozen model initialization.

The test properly validates the end-to-end conversion and initialization flow:

Converts TF model (pbtxt→pb→pth) using the appropriate conversion utilities

Initializes trainer from the converted frozen model

Includes explicit assertions on trainer and model attributes (addressing previous feedback)

Verifies the initialization completes without crashes

This directly tests the PR's objective of supporting --init-frz-model for PT models converted from TF models.

Optional enhancement: Consider adding inference verification to confirm the loaded model produces expected results:
# After line 173, optionally add:
# Verify the model can perform inference
dp = DeepPot(frozen_model)
# Use test data from existing test_dp_test method
# Run inference and verify results
This would provide stronger validation that the strict=False loading preserved model functionality, but is not essential for a basic smoke test.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab15c73 and 2ddeeeb.

📒 Files selected for processing (1)

source/tests/pt/test_init_frz_model.py (5 hunks)

🧰 Additional context used

🧠 Learnings (4)

📓 Common learnings

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

📚 Learning: 2024-11-25T07:42:55.735Z

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

Applied to files:

source/tests/pt/test_init_frz_model.py

📚 Learning: 2025-12-12T13:40:14.325Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.325Z
Learning: Use PyTorch backend with `--pt` flag or `export DP_BACKEND=pytorch`; use `input_torch.json` for configuration typically

Applied to files:

source/tests/pt/test_init_frz_model.py

📚 Learning: 2025-12-12T13:40:14.325Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.325Z
Learning: Validate Python interface by running `python -c "import deepmd; import deepmd.tf; print('Both interfaces work')"`

Applied to files:

source/tests/pt/test_init_frz_model.py

🧬 Code graph analysis (1)

source/tests/pt/test_init_frz_model.py (1)

deepmd/entrypoints/convert_backend.py (1)

convert_backend (11-31)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)

GitHub Check: Test Python (4, 3.10)
GitHub Check: Test Python (6, 3.10)
GitHub Check: Test Python (1, 3.10)
GitHub Check: Test Python (6, 3.13)
GitHub Check: Test Python (2, 3.10)
GitHub Check: Test Python (5, 3.13)
GitHub Check: Test Python (2, 3.13)
GitHub Check: Test Python (4, 3.13)
GitHub Check: Test Python (5, 3.10)
GitHub Check: Test Python (3, 3.13)
GitHub Check: Test Python (1, 3.13)
GitHub Check: Test Python (3, 3.10)
GitHub Check: Test C++ (true, true, true, false)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
GitHub Check: Build C++ (clang, clang)
GitHub Check: Test C++ (false, false, false, true)
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Test C++ (true, false, false, true)
GitHub Check: Test C++ (false, true, true, false)
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Analyze (python)
GitHub Check: Analyze (c-cpp)

🔇 Additional comments (3)

source/tests/pt/test_init_frz_model.py (3)

16-18: LGTM: Import additions support the new test.

The new imports for convert_backend and convert_pbtxt_to_pb are necessary for the TF→PT model conversion test and are properly used.

Also applies to: 26-28

38-38: LGTM: Explicit UTF-8 encoding is good practice.

Making encoding explicit prevents platform-specific issues and is consistent with the new test method at line 149.

Also applies to: 67-67

185-186: LGTM: Cleanup handles new .pb artifacts.

The tearDown method correctly removes the intermediate .pb file created during the pbtxt→pb→pth conversion, ensuring a clean test environment.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

source/tests/pt/test_init_frz_model.py (1)

66-71: Optional: simplify temp JSON handling and avoid leaving stray files

The added UTF‑8 encoding is fine, but this pattern leaves a temp JSON file behind and reopens the NamedTemporaryFile by name.

You could simplify and self‑clean with something like:

-                tmp_input = tempfile.NamedTemporaryFile(delete=False, suffix=".json")
-                with open(tmp_input.name, "w", encoding="utf-8") as f:
-                    json.dump(empty_config, f, indent=4)
-                run_dp(
-                    f"dp --pt train {tmp_input.name} --init-frz-model {self.models[-1]} --use-pretrain-script --skip-neighbor-stat"
-                )
+                with tempfile.NamedTemporaryFile(
+                    "w", delete=False, suffix=".json", encoding="utf-8"
+                ) as tmp_input:
+                    json.dump(empty_config, tmp_input, indent=4)
+                    tmp_name = tmp_input.name
+                run_dp(
+                    f"dp --pt train {tmp_name} --init-frz-model {self.models[-1]} --use-pretrain-script --skip-neighbor-stat"
+                )
+                os.remove(tmp_name)

Not required for correctness, but it keeps /tmp (or the platform temp dir) tidy and avoids double‑opening the same file.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ddeeeb and bb83aa9.

📒 Files selected for processing (3)

deepmd/pt/train/training.py (1 hunks)
source/tests/pt/model/models/se_e2_a.json (1 hunks)
source/tests/pt/test_init_frz_model.py (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

source/tests/pt/model/models/se_e2_a.json
deepmd/pt/train/training.py

🧰 Additional context used

🧠 Learnings (6)

📓 Common learnings

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

📚 Learning: 2024-11-25T07:42:55.735Z

Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4302
File: deepmd/pd/infer/inference.py:35-38
Timestamp: 2024-11-25T07:42:55.735Z
Learning: In the file `deepmd/pd/infer/inference.py`, when loading the model checkpoint in the `Tester` class, it's acceptable to not include additional error handling for loading the model state dictionary.

Applied to files:

source/tests/pt/test_init_frz_model.py

📚 Learning: 2025-12-12T13:40:14.325Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.325Z
Learning: Run core tests with `pytest source/tests/tf/test_dp_test.py::TestDPTestEner::test_1frame -v` to validate basic functionality

Applied to files:

source/tests/pt/test_init_frz_model.py

📚 Learning: 2025-12-12T13:40:14.325Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.325Z
Learning: Test PyTorch training with `cd examples/water/se_e2_a && dp --pt train input_torch.json --skip-neighbor-stat` and allow timeout of 60+ seconds for validation

Applied to files:

source/tests/pt/test_init_frz_model.py

📚 Learning: 2025-12-12T13:40:14.325Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.325Z
Learning: Use PyTorch backend with `--pt` flag or `export DP_BACKEND=pytorch`; use `input_torch.json` for configuration typically

Applied to files:

source/tests/pt/test_init_frz_model.py

📚 Learning: 2025-12-12T13:40:14.325Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.325Z
Learning: Validate Python interface by running `python -c "import deepmd; import deepmd.tf; print('Both interfaces work')"`

Applied to files:

source/tests/pt/test_init_frz_model.py

🧬 Code graph analysis (1)

source/tests/pt/test_init_frz_model.py (2)

deepmd/entrypoints/convert_backend.py (1)

convert_backend (11-31)

deepmd/pt/entrypoints/main.py (2)

main (519-578)

get_trainer (96-204)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)

GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Test Python (5, 3.13)
GitHub Check: Test Python (2, 3.13)
GitHub Check: Test Python (6, 3.10)
GitHub Check: Test Python (6, 3.13)
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Test Python (2, 3.10)
GitHub Check: Test Python (3, 3.10)
GitHub Check: Test Python (1, 3.10)
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Test Python (3, 3.13)
GitHub Check: Test Python (4, 3.13)
GitHub Check: Test Python (1, 3.13)
GitHub Check: Build C++ (clang, clang)
GitHub Check: Test Python (4, 3.10)
GitHub Check: Test Python (5, 3.10)
GitHub Check: Test C++ (false, true, true, false)
GitHub Check: Test C++ (false, false, false, true)
GitHub Check: Test C++ (true, false, false, true)
GitHub Check: Test C++ (true, true, true, false)
GitHub Check: Analyze (c-cpp)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
GitHub Check: Analyze (python)
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Build wheels for cp311-win_amd64

🔇 Additional comments (4)

source/tests/pt/test_init_frz_model.py (4)

16-18: Conversion utility imports are well‑scoped and used

convert_backend and convert_pbtxt_to_pb are imported only where needed and keep the test focused on the pbtxt→pb→pth path; no issues here.

Also applies to: 26-28

38-39: Explicit UTF‑8 when reading JSON is a good robustness improvement

Opening se_atten.json with encoding="utf-8" makes the test more portable across environments with different default encodings; the change is correct.

138-174: New pbtxt→pb→pth initialization test is well‑structured

The new test_init_frz_model_pb2pth cleanly exercises:

conversion from .pbtxt to .pb, then to .pth,

get_trainer(..., init_frz_model=...) with numb_steps = 0,

explicit assertions on trainer, trainer.model, and trainer.optimizer,

and a trainer.run() call to ensure the init path doesn’t crash.

This is a solid, clear end‑to‑end test for the new behavior.

175-186: Extended teardown correctly cleans new .pb artifacts

Adding removal of frozen_model*.pb aligns teardown with the new conversion test and keeps the working directory clean after runs; logic is consistent with the existing .pth/.pt cleanup.

tf2pt model) - Add try-catch block for loading frozen models - Use strict=False when state_dict keys don't match - Log warnings for model state_dict mismatches - Prevent crashes when loading models with different architectures"

…it in UT

…el (#5091) - Add try-catch block for loading frozen models - Use strict=False when state_dict keys don't match - Log warnings for model state_dict mismatches - Prevent crashes when loading models with different architectures" Fix bug in #5090  ## Summary by CodeRabbit * **Bug Fixes** * Frozen model loading now uses non‑strict mode; missing or unexpected parameters are logged as warnings instead of causing failure. * **Tests** * Added an end-to-end test covering conversion and loading of frozen models from external formats. * Test I/O standardized to UTF‑8; teardown extended to remove generated artifacts. * **New Files** * Added a JSON test model configuration for validation and training workflows. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub>

…el (deepmodeling#5091) - Add try-catch block for loading frozen models - Use strict=False when state_dict keys don't match - Log warnings for model state_dict mismatches - Prevent crashes when loading models with different architectures" Fix bug in deepmodeling#5090  ## Summary by CodeRabbit * **Bug Fixes** * Frozen model loading now uses non‑strict mode; missing or unexpected parameters are logged as warnings instead of causing failure. * **Tests** * Added an end-to-end test covering conversion and loading of frozen models from external formats. * Test I/O standardized to UTF‑8; teardown extended to remove generated artifacts. * **New Files** * Added a JSON test model configuration for validation and training workflows. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub>

github-actions bot added the Python label Dec 11, 2025

coderabbitai bot reviewed Dec 11, 2025

View reviewed changes

coderabbitai bot reviewed Dec 12, 2025

View reviewed changes

source/tests/pt/test_init_frz_model.py Outdated Show resolved Hide resolved

coderabbitai bot reviewed Dec 12, 2025

View reviewed changes

ChiahsinChu force-pushed the devel-init-from-tf2pt branch from b6c525a to c96476a Compare December 12, 2025 05:57

coderabbitai bot reviewed Dec 12, 2025

View reviewed changes

source/tests/pt/test_init_frz_model.py Outdated Show resolved Hide resolved

ChiahsinChu force-pushed the devel-init-from-tf2pt branch from c96476a to 4069606 Compare December 12, 2025 08:07

coderabbitai bot reviewed Dec 12, 2025

View reviewed changes

source/tests/pt/model/models/se_e2_a.json Show resolved Hide resolved

ChiahsinChu force-pushed the devel-init-from-tf2pt branch from 4069606 to 1be559c Compare December 12, 2025 12:39

coderabbitai bot reviewed Dec 12, 2025

View reviewed changes

ChiahsinChu force-pushed the devel-init-from-tf2pt branch from 1be559c to 5d2e44d Compare December 15, 2025 01:33

coderabbitai bot reviewed Dec 15, 2025

View reviewed changes

source/tests/pt/test_init_frz_model.py Show resolved Hide resolved

wanghan-iapcm reviewed Dec 15, 2025

View reviewed changes

deepmd/pt/train/training.py Show resolved Hide resolved

iProzd approved these changes Dec 15, 2025

View reviewed changes

wanghan-iapcm approved these changes Dec 15, 2025

View reviewed changes

wanghan-iapcm enabled auto-merge December 15, 2025 04:11

auto-merge was automatically disabled December 15, 2025 04:26
Head branch was pushed to by a user without write access

coderabbitai bot reviewed Dec 15, 2025

View reviewed changes

ChiahsinChu force-pushed the devel-init-from-tf2pt branch from 2ddeeeb to bb83aa9 Compare December 15, 2025 08:22

coderabbitai bot reviewed Dec 15, 2025

View reviewed changes

ChiahsinChu added 4 commits December 16, 2025 13:30

add unit test for pb2pth model

bfcbd8a

revise err capture in loading stat_dict

dec0429

adapt coderabbit comment about adding explict assertion of trainer in…

e861add

…it in UT

ChiahsinChu force-pushed the devel-init-from-tf2pt branch from bb83aa9 to e861add Compare December 16, 2025 05:30

wanghan-iapcm added this pull request to the merge queue Dec 16, 2025

github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Dec 16, 2025

njzjz added this pull request to the merge queue Dec 16, 2025

Merged via the queue into deepmodeling:devel with commit 26013cb Dec 16, 2025
58 checks passed

ChiahsinChu mentioned this pull request Dec 17, 2025

[BUG] pt model converted from tf model cannot be used to init training #5090

Closed

ChiahsinChu deleted the devel-init-from-tf2pt branch December 19, 2025 08:47

feat(pt): allow --init-frz-model for pt model converted from tf model #5091

feat(pt): allow --init-frz-model for pt model converted from tf model #5091

Uh oh!

Conversation

ChiahsinChu commented Dec 11, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

wanghan-iapcm commented Dec 15, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

feat(pt): allow `--init-frz-model` for pt model converted from tf model #5091

feat(pt): allow `--init-frz-model` for pt model converted from tf model #5091

ChiahsinChu commented Dec 11, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 11, 2025 •

edited

Loading

codecov bot commented Dec 11, 2025 •

edited

Loading