-
Notifications
You must be signed in to change notification settings - Fork 15
Add nequip to train. #653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add nequip to train. #653
Conversation
janus train now takes the architecture as an argument. The architecture is match to the appropriate runner.
|
Thank you so much for this! How does nequip fine-tuning work?
Perhaps we should set the default value for mace's
I think this was probably unavoidable given how different training entry points will be, so I think this is fine? |
For fine-tuning it looks simple from what I can tell. For training the config has this model section model:
_target_: nequip.model.NequIPGNNModel
# Then parametersfor fine-tuning it looks like this model:
_target_: nequip.model.ModelFromPackage
package_path: ${model_package_path}where further up the file the path is set model_package_path: /content/NequIP-OAM-L-0.1.nequip.zip |
Yeah looks like that will be a good way to do it, perhaps also use
Me too, but just wanted to remark |
|
Note that |
Now mace and nequip both output to ./janus_results by default.
Ok cool, should we do a similar existence check for |
tests/models/nequip-checkpoint.ckpt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the existing toluene.nequip.pth not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not it has to end in .nequip.zip
InstantiationException: Error in call to target 'nequip.train.ema.EMALightningModule':
AssertionError('NequIP framework packaged files must have the `.nequip.zip` extension but found toluene.nequip.pth')
Then if you change it to that, it complains again
InstantiationException: Error in call to target 'nequip.train.ema.EMALightningModule':
RuntimeError('PytorchStreamReader failed locating file .data/extern_modules: file not found')
If you try and make the .data it fails again complaining of inconsistent module structure
Maybe there is an up to date version of the toulene.equip.pth we have? Where did it come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was one @alinelena trained, but now there are released foundational models, perhaps we could use those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was one @alinelena trained, but now there are released foundational models, perhaps we could use those?
No problem in theory, the smallest looks to be 75 MB (NequIP-MP-L-0.1.nequip). It could be downloaded perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yap good idea
This will add nequip as a
janus train [architecture]option, where the [architecture] is now a required argument to train. AllArchitecturesother than"mace","mace_mp","mace_off","mace_omol", and"nequip"raise an error.It does work with these inputs inputs.zip
Todo:
janus train nequip.One thing I noticed is that
janus train mace ...will currently output outsidejanus_results, but the way I have it for nequip it goes intojanus_results. What's better? E.g. for mace,Or is that something that is control by the mace
config.yml? For nequip I've had to set it manually so I chose./janus_results.Details
Similar to
mace.cli.run_train.run, nequip has anequip.scripts.train.mainentrypoint - but it uses Hydra to manage configuration here.I've done similar to the current code and used the
mainfunction rather than a subprocess. This means using the Compose API which at least one downsides - thehydra.runtime.output_diris not set, but is required, and we have to manually set the HydraConfig singleton (as far as I've found...).This does mean that we have a match that obtains the runner function and its arguments, seperating out modules to avoid conflicts (mace/nequip in fact is one).