Conversation

@harveydevereux (Collaborator) commented Jan 13, 2026

This will add nequip as a janus train [architecture] option, where [architecture] is now a required argument to train. All architectures other than "mace", "mace_mp", "mace_off", "mace_omol", and "nequip" raise an error.

It does work with these inputs: inputs.zip

Todo:

  • Testcase with janus train nequip.
  • Fine-tuning (and test).
  • Documentation.

One thing I noticed is that janus train mace ... currently outputs outside janus_results, but the way I have it for nequip, everything goes into janus_results. Which is better? E.g. for mace,

checkpoints  config.yml  janus_results  logs  results  test_compiled.model  test.model

Or is that something that is controlled by the mace config.yml? For nequip I've had to set it manually, so I chose ./janus_results.


Details

Similar to mace.cli.run_train.run, nequip has a nequip.scripts.train.main entrypoint, but it uses Hydra to manage configuration.

I've done something similar to the current code and used the main function rather than a subprocess. This means using the Compose API, which has at least one downside: hydra.runtime.output_dir is not set but is required, so we have to set the HydraConfig singleton manually (as far as I've found...).
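For reference, a minimal sketch of that workaround, assuming an absolute config directory and using janus_results as the output directory (both illustrative, not necessarily the exact values used here):

    from hydra import compose, initialize_config_dir
    from hydra.core.hydra_config import HydraConfig

    # Compose the nequip config without running a full Hydra application.
    with initialize_config_dir(config_dir="/abs/path/to/configs", version_base=None):
        cfg = compose(config_name="train", return_hydra_config=True)

    # The Compose API leaves hydra.runtime.output_dir unset, so fill it in and
    # register the HydraConfig singleton manually before calling the entrypoint.
    cfg.hydra.runtime.output_dir = "./janus_results"
    HydraConfig.instance().set_config(cfg)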

This does mean that we have a match that obtains the runner function and its arguments, separating out module imports to avoid conflicts (mace/nequip is in fact one such conflict).

janus train now takes the architecture as an argument.

The architecture is matched to the appropriate runner.
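A rough sketch of what that dispatch looks like, assuming lazy imports inside each arm (the helper name and error message are illustrative, not the exact code):

    def _get_runner(arch: str):
        # Import inside each arm so mace and nequip are never loaded together,
        # which avoids module conflicts between the two packages.
        match arch:
            case "mace" | "mace_mp" | "mace_off" | "mace_omol":
                from mace.cli.run_train import run
                return run
            case "nequip":
                from nequip.scripts.train import main
                return main
            case _:
                raise ValueError(f"Training is not supported for architecture '{arch}'.")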
@ElliottKasoar (Member)
Thank you so much for this!

How does nequip fine-tuning work?

One thing I noticed is that janus train mace ... currently outputs outside janus_results, but the way I have it for nequip, everything goes into janus_results. Which is better? E.g. for mace,

Perhaps we should set the default value for mace's --work_dir (https://github.com/ACEsuit/mace/blob/main/mace/tools/arg_parser.py#L38) to janus-core, and then do the equivalent for any new architecture?

This does mean that we have a match that obtains the runner function and its arguments, separating out module imports to avoid conflicts (mace/nequip is in fact one such conflict).

I think this was probably unavoidable given how different training entry points will be, so I think this is fine?

@ElliottKasoar linked an issue on Jan 15, 2026 that may be closed by this pull request: Expand supported models for training.
@harveydevereux (Collaborator, Author)

How does nequip fine-tuning work?

Fine-tuning looks simple, from what I can tell. For training, the config has this model section:

  model:
    _target_: nequip.model.NequIPGNNModel
    # then the model parameters

For fine-tuning, it looks like this:

  model:
    _target_: nequip.model.ModelFromPackage
    package_path: ${model_package_path}

where further up the file the path is set:

model_package_path: /content/NequIP-OAM-L-0.1.nequip.zip

@harveydevereux (Collaborator, Author)

Perhaps we should set the default value for mace's --work_dir (https://github.com/ACEsuit/mace/blob/main/mace/tools/arg_parser.py#L38) to janus-core, and then do the equivalent for any new architecture?

Yeah, that looks like a good way to do it. Perhaps we could also use the FilePrefix argument in train, which can be passed on?

This does mean that we have a match that obtains the runner function and its arguments, separating out module imports to avoid conflicts (mace/nequip is in fact one such conflict).

I think this was probably unavoidable given how different training entry points will be, so I think this is fine?

Me too, but I just wanted to remark on it.

@harveydevereux (Collaborator, Author)

Note that the .yaml extension is a necessity and the error when .yml is used is not very clear (but there is a PR for this), so I added a ValueError check to catch .yml.

facebookresearch/hydra#1050

facebookresearch/hydra#2889
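Roughly the kind of check I mean (the helper name is just illustrative):

    from pathlib import Path

    def _check_config_suffix(config_path: Path) -> None:
        # Hydra's Compose API only resolves .yaml files and its own error is unclear,
        # so fail early with an explicit message for .yml configs.
        if config_path.suffix != ".yaml":
            raise ValueError(
                f"NequIP config files must use the .yaml extension, got '{config_path.name}'."
            )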

@ElliottKasoar (Member) commented Jan 20, 2026

How does nequip fine-tuning work?

Fine-tuning looks simple, from what I can tell. For training, the config has this model section:

  model:
    _target_: nequip.model.NequIPGNNModel
    # then the model parameters

For fine-tuning, it looks like this:

  model:
    _target_: nequip.model.ModelFromPackage
    package_path: ${model_package_path}

where further up the file the path is set:

model_package_path: /content/NequIP-OAM-L-0.1.nequip.zip

Ok cool, should we then do a similar existence check for model_package_path when fine-tuning, as we do for mace?
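Something along these lines, perhaps (the helper name is illustrative):

    from pathlib import Path

    def _check_fine_tune_package(package_path: Path) -> None:
        # Mirror the mace check: the packaged model must exist before the
        # config is handed over to nequip for fine-tuning.
        if not package_path.is_file():
            raise FileNotFoundError(f"Fine-tuning package not found: {package_path}")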

Member
Does the existing toluene.nequip.pth not work?

Collaborator Author
Unfortunately not; it has to end in .nequip.zip:

InstantiationException: Error in call to target 'nequip.train.ema.EMALightningModule':
AssertionError('NequIP framework packaged files must have the `.nequip.zip` extension but found toluene.nequip.pth')

Then if you rename it to that, it complains again:

InstantiationException: Error in call to target 'nequip.train.ema.EMALightningModule':
RuntimeError('PytorchStreamReader failed locating file .data/extern_modules: file not found')

If you try to create the .data directory, it fails again, complaining of an inconsistent module structure.

Maybe there is an up-to-date version of the toluene.nequip.pth that we have? Where did it come from?

Member
I think it was one @alinelena trained, but now there are released foundational models, perhaps we could use those?

Collaborator Author
I think it was one @alinelena trained, but now there are released foundational models, perhaps we could use those?

No problem in theory; the smallest looks to be 75 MB (NequIP-MP-L-0.1.nequip). Perhaps it could be downloaded?

Member
Yep, good idea.
