chore(skills): import DP-GEN simplify agent skill #1879
Merged: njzjz merged 4 commits into deepmodeling:master from njzjz-bot:chore/import-agent-skills (May 8, 2026)
Commits:

- ca1a48c sync(skills): Add dpgen-simplify skill (#59) (hyb1109)
- 7826c81 sync(skills): refactor(skills): move dpgen-simplify to machine-learni… (njzjz-bot)
- bed4e64 sync(skills): Enhance dpgen-simplify templates and runtime guidance (… (hyb1109)
- a8d60ba style(skills): format dpgen JSON templates (njzjz-bot)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,341 @@ | ||
| --- | ||
| name: dpgen-simplify | ||
| description: Prepare, explain, validate, and run DP-GEN simplify workflows for reducing repeated or redundant DeepMD datasets. Use when the user wants to generate or modify `param.json` and `machine.json`, run `dpgen simplify param.json machine.json`, organize repeated simplify experiments, or inspect simplify outputs. | ||
| compatibility: Requires a runnable environment with Python and an activated DP-GEN runtime where `dpgen` is available in PATH for the outer simplify command. Real execution also requires DeePMD-kit and any backend-specific software required by the selected `fp_style`. For scheduler execution, each stage environment must be explicitly activated in `resources.source_list`. | ||
| license: LGPL-3.0-or-later | ||
| metadata: | ||
| author: hyb1109 | ||
| version: 0.2.0 | ||
| repository: https://github.com/deepmodeling/dpgen | ||
| --- | ||
|
|
||
| # DP-GEN Simplify | ||
|
|
||
| Use this skill when the user wants to prepare, explain, validate, or execute the `dpgen simplify` workflow. | ||
|
|
||
| This skill is for dataset simplification workflows where the user already has candidate data in DeepMD-compatible format and wants to reduce repeated or redundant structures through iterative selection. | ||
|
|
||
| ## Core Rule (Critical) | ||
|
|
||
| DP-GEN simplify always uses **two parameter classes** and therefore **two JSON files**: | ||
|
|
||
| - **Workflow parameters** -> `param.json` | ||
| - **Execution / machine parameters** -> `machine.json` | ||
|
|
||
| Run exactly: | ||
|
|
||
| ```bash | ||
| dpgen simplify param.json machine.json | ||
| ``` | ||
|
|
||
| Environment boundary rule: | ||
|
|
||
| - Outer layer: run `dpgen simplify param.json machine.json` in an activated environment where `dpgen --version` works. | ||
| - Inner layer: for scheduler stages, explicitly activate the runtime environment in `resources.source_list` on the server side. | ||
|
|
||
| ## Agent responsibilities | ||
|
|
||
| When using this skill, the agent should: | ||
|
|
||
| 1. confirm that the task is a simplify workflow | ||
| 1. check whether existing configs or templates are already available | ||
| 1. collect only the missing dataset, training, FP, and machine inputs | ||
| 1. generate or patch `param.json` | ||
| 1. generate or patch `machine.json` | ||
| 1. explain important simplify parameters in plain language when asked | ||
| 1. validate the workflow before execution | ||
| 1. provide the exact command for running simplify | ||
| 1. if requested, help structure repeated experiments | ||
| 1. after execution, summarize outputs and next inspection targets | ||
|
|
||
| ## Working policy | ||
|
|
||
| ### 1. Ask only for missing inputs | ||
|
|
||
| Do not ask the user for everything if part of the configuration is already available. | ||
|
|
||
| If the user already provides: | ||
|
|
||
| - a partial `param.json` | ||
| - a partial `machine.json` | ||
| - a known training template | ||
| - a known cluster template | ||
|
|
||
| then patch those files instead of rebuilding everything from scratch. | ||
|
|
||
| ### 2. Preserve the user's scientific choices | ||
|
|
||
| Do not silently change: | ||
|
|
||
| - descriptor family | ||
| - fitting net structure | ||
| - fp backend | ||
| - trust thresholds | ||
| - `type_map` ordering | ||
|
|
||
| If a value looks scientifically questionable, explain the concern instead of silently replacing it. | ||
|
|
||
| ### 3. Keep local and scheduler execution explicit | ||
|
|
||
| If the user wants local execution, produce local-friendly commands. | ||
|
|
||
| If the user wants scheduler execution, produce scheduler-friendly commands and keep queue, partition, and resource requests explicit. | ||
|
|
||
| Do not invent scheduler module names or executable paths. | ||
|
|
||
| ### 4. Do not invent environment activation commands | ||
|
|
||
| If the user already has a working activation command such as: | ||
|
|
||
| - `conda activate ...` | ||
| - `module load ...` | ||
| - `source ...` | ||
|
|
||
| reuse it exactly. | ||
|
|
||
| If execution is requested and the activation method is unknown, ask the user for the precise activation command. | ||
|
|
||
| Do not guess conda environment names, module names, or site-specific paths. | ||
|
|
||
| ### 4.1 Outer launcher policy | ||
|
|
||
| Use an activated DP-GEN environment and verify with: | ||
|
|
||
| ```bash | ||
| dpgen --version | ||
| ``` | ||
|
|
||
| Do not start simplify from a shell where `dpgen` is unavailable. | ||
|
|
||
| ### 4.2 Outer vs inner runtime boundaries (critical) | ||
|
|
||
| Treat simplify execution as two separate environment layers: | ||
|
|
||
| 1. Outer layer: the shell that launches `dpgen simplify param.json machine.json` (must have `dpgen` in PATH) | ||
| 1. Inner layer: stage tasks dispatched by DP-GEN (`train` / `model_devi` / `fp`) on the server/runtime side | ||
|
|
||
| Even if the outer layer is correct, inner stage tasks still need explicit runtime setup in `machine.json`. | ||
| Do not assume the outer shell environment will be inherited by dispatched stage jobs. | ||
| For scheduler-style execution, `resources.source_list` must explicitly activate the required runtime environment. | ||
|
|
||
| ### 5. Prefer reproducible output layout | ||
|
|
||
| When generating a simplify workflow, keep files organized and predictable. | ||
|
|
||
| Recommended structure: | ||
|
|
||
| ```text | ||
| project/ | ||
| ├── param.json | ||
| ├── machine.json | ||
| ├── run.sh | ||
| ├── logs/ | ||
| └── summary/ | ||
| ``` | ||
|
|
||
| For repeated experiments: | ||
|
|
||
| ```text | ||
| project/ | ||
| ├── base/ | ||
| ├── exp_01/ | ||
| ├── exp_02/ | ||
| ├── exp_03/ | ||
| └── summary/ | ||
| ``` | ||
|
|
||
| ## Minimum required inputs | ||
|
|
||
| Collect the following information before generating files. | ||
|
|
||
| ### Dataset information | ||
|
|
||
| - `pick_data` | ||
| - `sys_configs` | ||
| - `init_data_prefix` | ||
| - `init_data_sys` | ||
| - `sys_batch_size` | ||
| - dataset format | ||
| - `type_map` | ||
| - `mass_map` if needed | ||
| - `labeled` | ||
|
|
||
| ### Simplify controls | ||
|
|
||
| - `init_pick_number` | ||
| - `iter_pick_number` | ||
| - `model_devi_f_trust_lo` | ||
| - `model_devi_f_trust_hi` | ||
| - `model_devi_e_trust_lo` / `model_devi_e_trust_hi` if energy trust is used | ||
| - `numb_models` if not already specified | ||
|
|
||
| ### Training setup | ||
|
|
||
| - `train_backend` if required by the environment (for example `pytorch`) | ||
| - `default_training_param` | ||
| - descriptor settings | ||
| - fitting network settings | ||
| - learning rate settings | ||
| - loss settings | ||
| - training step settings | ||
|
|
||
| ### FP setup | ||
|
|
||
| - `fp_style` | ||
| - If data is already labeled (energy/force/virial available) and no re-labeling is requested, set `fp_style` to `none`. | ||
| - if `fp_style != "none"`, collect matching FP runtime settings such as: | ||
| - `fp_task_max` | ||
| - `fp_task_min` | ||
| - `fp_params` | ||
| - pseudopotential or backend file paths if required | ||
|
|
||
| ### Execution setup | ||
|
|
||
| For each stage `train`, `model_devi`, and `fp`, collect or preserve: | ||
|
|
||
| - `command` | ||
| - `machine.batch_type` | ||
| - `machine.context_type` | ||
| - `machine.local_root` | ||
| - `machine.remote_root` | ||
| - `resources.number_node` | ||
| - `resources.cpu_per_node` | ||
| - `resources.gpu_per_node` | ||
| - `resources.group_size` | ||
| - `resources.source_list` (required for scheduler jobs; use it to activate environment explicitly) | ||
| - any explicit queue / partition / custom scheduler flags if the user already uses them | ||
|
|
||
| Choose a runtime profile first, then fill the matching template: | ||
|
|
||
| - server-local Slurm: `assets/machine.template.server-local-slurm.json` | ||
| - local machine -> remote Slurm via SSH: `assets/machine.template.ssh-remote-slurm.json` | ||
| - pure local shell testing: `assets/machine.template.local-shell.json` | ||
|
|
||
| ## How to build `param.json` | ||
|
|
||
| Construct `param.json` around these logical blocks: | ||
|
|
||
| 1. element and mass definitions | ||
| 1. data source and batch settings | ||
| 1. model ensemble count | ||
| 1. default DeePMD training parameters | ||
| 1. FP backend settings | ||
| 1. simplify pick settings | ||
| 1. trust thresholds | ||
|
|
||
| Key fields usually include: | ||
|
|
||
| - `type_map` | ||
| - `mass_map` | ||
| - `pick_data` | ||
| - `init_data_prefix` | ||
| - `init_data_sys` | ||
| - `sys_batch_size` | ||
| - `numb_models` | ||
| - `default_training_param` | ||
| - `fp_style` | ||
| - `shuffle_poscar` | ||
| - `fp_task_max` | ||
| - `fp_task_min` | ||
| - `fp_pp_path` | ||
| - `fp_pp_files` | ||
| - `fp_params` | ||
| - `init_pick_number` | ||
| - `iter_pick_number` | ||
| - `model_devi_f_trust_lo` | ||
| - `model_devi_f_trust_hi` | ||
|
|
||
| If the user is doing grid experiments, keep a base template and derive variants from it. | ||
|
|
||
| Official reference example (QM7-style, adapted with path placeholders): | ||
|
|
||
| - `assets/param.example.qm7.from-official-docs.json` | ||
|
|
||
| ## How to build `machine.json` | ||
|
|
||
| Construct `machine.json` with separate stage blocks for: | ||
|
|
||
| - `train` | ||
| - `model_devi` | ||
| - `fp` | ||
|
|
||
| For each stage, keep the following explicit: | ||
|
|
||
| - `command` | ||
| - machine or context configuration | ||
| - resources | ||
| - queue or partition if needed | ||
| - cpu and gpu counts | ||
| - custom scheduler flags | ||
| - environment activation commands | ||
|
|
||
| Do not merge all stages into one vague machine block. | ||
|
|
||
| ## Validation before run | ||
|
|
||
| Before execution, validate the workflow in this order: | ||
|
|
||
| 1. confirm outer-layer `dpgen` is available: | ||
|
|
||
| ```bash | ||
| dpgen --version | ||
| ``` | ||
|
|
||
| 2. validate JSON syntax: | ||
|
|
||
| ```bash | ||
| python -m json.tool param.json | ||
| python -m json.tool machine.json | ||
| ``` | ||
|
|
||
| 3. verify required dataset paths exist | ||
| 1. verify stage commands match the selected software stack | ||
| 1. if `fp_style` is `none`, do not require FP-specific backend settings | ||
| 1. only then run: | ||
|
|
||
| ```bash | ||
| dpgen simplify param.json machine.json | ||
| ``` | ||
|
|
||
| ## Output contract | ||
|
|
||
| Always provide: | ||
|
|
||
| 1. final absolute paths to `param.json` and `machine.json` | ||
| 1. the exact simplify command to run (`dpgen simplify param.json machine.json`) | ||
| 1. a short pre-run checklist | ||
| 1. any unresolved required fields | ||
| 1. if execution was performed, the main output locations and next files to inspect | ||
|
|
||
| ## Guardrails | ||
|
|
||
| - Never merge workflow and machine parameters into one file. | ||
| - Never run `dpgen simplify` before both JSON files are present. | ||
| - Never hardcode personal cluster, account, queue, or path settings as universal defaults. | ||
| - Never silently change the user's scientific choices. | ||
| - Keep `type_map` ordering consistent with dataset typing. | ||
| - If required inputs are missing, stop and ask instead of guessing. | ||
| - If `fp_style` is `none`, skip FP-specific prompts and keep FP-specific settings disabled or unset. | ||
| - If data is already labeled and the user does not request new labels, enforce `fp_style = "none"` and do not require active FP runtime fields. | ||
| - Do not assume outer-shell activation is inherited by stage jobs; for scheduler execution, require explicit `source_list` per stage. | ||
| - If the user already has working templates, patch them rather than overwriting them blindly. | ||
|
|
||
| ## References and bundled files | ||
|
|
||
| Use these bundled files: | ||
|
|
||
| - `assets/param.template.json` | ||
| - `assets/param.example.qm7.from-official-docs.json` | ||
| - `assets/machine.template.json` | ||
| - `assets/machine.template.server-local-slurm.json` | ||
| - `assets/machine.template.ssh-remote-slurm.json` | ||
| - `assets/machine.template.local-shell.json` | ||
| - `references/param-fields.md` | ||
| - `references/machine-fields.md` | ||
| - `references/workflow-notes.md` | ||
|
|
||
| External references: | ||
|
|
||
| - DP-GEN simplify overview: https://docs.deepmodeling.com/projects/dpgen/en/latest/simplify/simplify.html | ||
| - simplify parameter definitions: https://docs.deepmodeling.com/projects/dpgen/en/latest/simplify/simplify-jdata.html | ||
| - simplify machine definitions: https://docs.deepmodeling.com/projects/dpgen/en/latest/simplify/simplify-mdata.html |
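
The order given under "Validation before run" could be sketched in Python roughly as follows. This is an illustrative helper, not part of DP-GEN: `validate_simplify_inputs` and its `require_dpgen` flag are hypothetical names, and the sketch assumes `pick_data` is either a single literal path or a list of literal paths.

```python
import json
import shutil
from pathlib import Path


def validate_simplify_inputs(param_path, machine_path, require_dpgen=True):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # 1. The outer shell must be able to find the dpgen launcher.
    if require_dpgen and shutil.which("dpgen") is None:
        problems.append("dpgen not found in PATH")

    # 2. Both files must exist and parse as JSON.
    loaded = {}
    for label, path in (("param", param_path), ("machine", machine_path)):
        try:
            loaded[label] = json.loads(Path(path).read_text())
        except FileNotFoundError:
            problems.append(f"{label} file missing: {path}")
        except json.JSONDecodeError as err:
            problems.append(f"{label} file is not valid JSON: {err}")
    if len(loaded) < 2:
        return problems
    param, machine = loaded["param"], loaded["machine"]

    # 3. Dataset paths must exist (assumption: literal paths, no wildcards).
    pick_data = param.get("pick_data")
    if pick_data is None:
        problems.append("pick_data is not set")
    else:
        entries = [pick_data] if isinstance(pick_data, str) else pick_data
        for entry in entries:
            if not Path(entry).exists():
                problems.append(f"pick_data path does not exist: {entry}")

    # 4. An FP stage command is only required when fp_style is not "none".
    fp_style = param.get("fp_style")
    if fp_style is None:
        problems.append("fp_style is not set")
    elif fp_style != "none":
        fp_command = (machine.get("fp") or {}).get("command")
        if not fp_command:
            problems.append("fp_style is set but fp.command is empty")

    return problems
```

A caller would run `dpgen simplify param.json machine.json` only when the returned list is empty, and otherwise report the listed problems back to the user instead of guessing values.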
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| { | ||
| "api_version": "1.0", | ||
| "deepmd_version": "2.0", | ||
| "train": { | ||
| "command": "dp", | ||
| "machine": { | ||
| "batch_type": null, | ||
| "context_type": null, | ||
| "local_root": "./", | ||
| "remote_root": null | ||
| }, | ||
| "resources": { | ||
| "number_node": null, | ||
| "cpu_per_node": null, | ||
| "gpu_per_node": null, | ||
| "group_size": null | ||
| } | ||
| }, | ||
| "model_devi": { | ||
| "command": "dp", | ||
| "machine": { | ||
| "batch_type": null, | ||
| "context_type": null, | ||
| "local_root": "./", | ||
| "remote_root": null | ||
| }, | ||
| "resources": { | ||
| "number_node": null, | ||
| "cpu_per_node": null, | ||
| "gpu_per_node": null, | ||
| "group_size": null | ||
| } | ||
| }, | ||
| "fp": { | ||
| "command": null, | ||
| "machine": { | ||
| "batch_type": null, | ||
| "context_type": null, | ||
| "local_root": "./", | ||
| "remote_root": null | ||
| }, | ||
| "resources": { | ||
| "number_node": null, | ||
| "cpu_per_node": null, | ||
| "gpu_per_node": null, | ||
| "group_size": null | ||
| } | ||
| } | ||
| } | ||
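
Because every stage in the template above starts with `null` placeholders, an agent needs some way to report the "unresolved required fields" that the output contract asks for. The sketch below is one possible approach; `unresolved_fields` and `skip_fp` are hypothetical names, and treating every `null` as unresolved is an assumption of this sketch (some fields may legitimately stay unset for a given context type).

```python
STAGES = ("train", "model_devi", "fp")


def unresolved_fields(machine, skip_fp=False):
    """List dotted paths of stage fields that are still null placeholders."""
    missing = []
    for stage in STAGES:
        if stage == "fp" and skip_fp:
            continue  # fp_style "none": FP settings may stay unset
        block = machine.get(stage)
        if block is None:
            missing.append(stage)  # the whole stage block is absent
            continue
        if block.get("command") is None:
            missing.append(f"{stage}.command")
        for section in ("machine", "resources"):
            for key, value in (block.get(section) or {}).items():
                if value is None:
                    missing.append(f"{stage}.{section}.{key}")
    return missing
```

Passing `skip_fp=True` mirrors the guardrail that FP-specific settings are not required when `fp_style` is `none`.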
**Add `resources.source_list` to all stage templates for scheduler compatibility.**

`SKILL.md` requires explicit inner-stage environment activation via `resources.source_list`, but this base template omits the field in every stage. That mismatch can lead to invalid scheduler configs being generated from this template.

💡 Suggested patch

```diff
   "train": {
     "command": "dp",
     "machine": {
       "batch_type": null,
       "context_type": null,
       "local_root": "./",
       "remote_root": null
     },
     "resources": {
       "number_node": null,
       "cpu_per_node": null,
       "gpu_per_node": null,
-      "group_size": null
+      "group_size": null,
+      "source_list": null
     }
   },
@@
     "resources": {
       "number_node": null,
       "cpu_per_node": null,
       "gpu_per_node": null,
-      "group_size": null
+      "group_size": null,
+      "source_list": null
     }
   },
@@
     "resources": {
       "number_node": null,
       "cpu_per_node": null,
       "gpu_per_node": null,
-      "group_size": null
+      "group_size": null,
+      "source_list": null
     }
   }
```

Also applies to: 27-32, 42-47
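
The change suggested in this review could also be applied programmatically when an agent patches an existing `machine.json`. The following is a minimal sketch only: `add_source_list` and `env_files` are hypothetical names, and it assumes `source_list` holds a list of environment files to source, supplied by the user rather than guessed.

```python
import copy

STAGES = ("train", "model_devi", "fp")


def add_source_list(machine, env_files=None):
    """Return a copy of machine with resources.source_list present in every stage."""
    patched = copy.deepcopy(machine)  # never mutate the user's original config
    for stage in STAGES:
        block = patched.get(stage)
        if block is None:
            continue  # stage not defined; nothing to patch
        resources = block.setdefault("resources", {})
        # Keep an existing value; otherwise use the user's files or a null placeholder.
        if "source_list" not in resources:
            resources["source_list"] = list(env_files) if env_files else None
    return patched
```

Leaving the value as `None` when no files are given keeps the field visible as an unresolved placeholder, consistent with the rule against inventing activation commands.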