
connect to calculations and systems through tasks #49

Open · JFRudzinski wants to merge 25 commits into develop from eos-workflow

Conversation

@JFRudzinski (Collaborator)

To create an EOS workflow entry by uploading a workflow yaml

Test Data

@JFRudzinski (Collaborator Author)

So far I was able to populate the energies and volumes correctly from the tasks, which then triggers the calculation of the fits (also successful).

However, something is wrong with the visualization: nothing shows up on the overview page, and I get the attached error upon clicking on workflow2 in the DATA tab.
[Screenshot from 2025-04-09 14-43-12]

It seems possible that the JavaScript plotting was never tested with the workflow2 schema, since the only EOS entries on the repo are from Materials Project and are quite old.

@ladinesa Any thoughts/suggestions about this? What do you think about implementing plotly annotations in the schema and then disconnecting the JavaScript (or avoiding it if the plotly is there)?

Also please actually check the code here to see if this is the best approach, thanks!
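For context, the "energies and volumes → fit" step described above can be sketched roughly as follows. This is a minimal stand-in using a simple quadratic fit to locate the E(V) minimum, not the actual Birch–Murnaghan fitting code in the normalizer; all names are illustrative.

```python
import numpy as np

def fit_eos(volumes, energies):
    """Fit E(V) with a quadratic as a rough stand-in for a proper
    equation-of-state fit; returns the equilibrium volume and energy."""
    c2, c1, c0 = np.polyfit(volumes, energies, 2)  # coefficients, highest degree first
    v0 = -c1 / (2 * c2)                            # vertex of the parabola
    e0 = c0 + c1 * v0 + c2 * v0**2
    return v0, e0

# synthetic E(V) data with a known minimum at V = 11.0, E = -8.0
vols = np.linspace(9.0, 13.0, 9)
energies = 0.5 * (vols - 11.0) ** 2 - 8.0
v0, e0 = fit_eos(vols, energies)
```

In the real workflow the volumes and energies would come from the referenced single-point tasks rather than being synthesized.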

@ladinesa (Collaborator) commented Apr 9, 2025

> So far I was able to populate the energies and volumes correctly from the tasks, which then triggers the calculation of the fits (also successful).
>
> However, something is wrong with the visualization as nothing shows up on the overview page and I get the attached error upon clicking on workflow2 in the DATA tab. [Screenshot from 2025-04-09 14-43-12]
>
> It seems possible that the javascript plotting was never tested with the workflow2 schema, since the only EOS entries on the repo are from materials project and are quite old.
>
> @ladinesa Any thoughts/suggestions about this? What do you think about implementing plotly annotations in the schema and then disconnecting the javascript (or avoiding it if the plotly is there)?
>
> Also please actually check the code here to see if this is the best approach, thanks!

I think it makes more sense to use plotly annotations, but I could also put up a fix to the GUI. Maybe DM me the normalized workflow archive file so I can test it.
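If the plotly-annotation route is taken, the figure itself can be expressed as a plain JSON-style dict following the standard plotly figure schema, without importing plotly at all. The NOMAD-specific annotation wiring is omitted here and the function name is illustrative:

```python
def eos_plotly_figure(volumes, energies):
    """Build a plotly-figure dict for an E(V) curve; the dict follows
    the standard plotly JSON schema (data + layout)."""
    return {
        "data": [
            {
                "type": "scatter",
                "mode": "markers",
                "name": "calculations",
                "x": list(volumes),
                "y": list(energies),
            }
        ],
        "layout": {
            "title": {"text": "Equation of state"},
            "xaxis": {"title": {"text": "Volume"}},
            "yaxis": {"title": {"text": "Energy"}},
        },
    }

fig = eos_plotly_figure([10.0, 11.0, 12.0], [-7.5, -8.0, -7.6])
```

Storing the figure as data rather than relying on hard-coded GUI JavaScript is the main appeal of the annotation approach discussed here.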

@JFRudzinski (Collaborator Author)

@ladinesa I am planning to get back to this issue this week. Any new thoughts from your side or should I just go ahead with the plotly plots?

@ladinesa (Collaborator)

> @ladinesa I am planning to get back to this issue this week. Any new thoughts from your side or should I just go ahead with the plotly plots?

Oh sorry, I forgot about this. We can do both: fix the GUI and introduce plotly plots.

@JFRudzinski (Collaborator Author)

Ok, np, I will add the plotly stuff then, thanks!

@ladinesa (Collaborator)

> However, something is wrong with the visualization as nothing shows up on

I cannot recreate this locally with the upload you provided. I can open the workflow2 section without any error.

@JFRudzinski (Collaborator Author)

> > However, something is wrong with the visualization as nothing shows up on
>
> I cannot recreate this locally with the upload you provided. I can open the workflow2 section without any error.

Huh, strange. I updated my NOMAD to develop again, and then tried again and I get the same error.

Some questions:

  1. Do the EOS plots also show up for you?
  2. Are the EOSFits populated in the archive?
  3. Could you maybe show a screenshot of what your workflow2 section looks like in DATA?

@ladinesa (Collaborator)

> > > However, something is wrong with the visualization as nothing shows up on
> >
> > I cannot recreate this locally with the upload you provided. I can open the workflow2 section without any error.
>
> Huh, strange. I updated my NOMAD to develop again, and then tried again and I get the same error.
>
> Some questions:
>
> 1. Do the EOS plots also show up for you?
> 2. Are the EOSFits populated in the archive?
> 3. Could you maybe show a screenshot of what your workflow2 section looks like in DATA?

Not sure what happened now; maybe I loaded the wrong set of plugins, but I do see the error now. Let me work on it. The EOS plots do not show, even yesterday when I could load the workflow2 section in the data tab.

Comment thread simulationworkflowschema/equation_of_state.py Outdated
@JFRudzinski (Collaborator Author) commented May 7, 2025

@ladinesa This all works perfectly now:
[attachment: EOS]

Not sure if I mentioned this, but this is all to assist Mariano Forti and Thomas Hammerschmidt in supporting their workflows. In the test data that they provided, the first task is a geometry optimization, which is not part of the EOS, but its output structure is the one deformed for the different starting points of the EOS. So I added some code in this direction. Can you review and let me know if you think it makes sense?

Absolutely no rush btw, I will not really have time to work on this again for 2 weeks probably. But if it is good as is (or close), then I would merge it sooner than later I guess.

JFRudzinski requested a review from ladinesa, May 7, 2025 13:24
@JFRudzinski (Collaborator Author)

I just saw now that I broke the tests, so I will have to come back later and look at that, I need to move on to something else now...

@ladinesa (Collaborator) commented May 7, 2025

> I just saw now that I broke the tests, so I will have to come back later and look at that, I need to move on to something else now...

You can just comment it out for now.

@JFRudzinski (Collaborator Author)

  • The test was easy to fix: tasks.task was simply not populated in this case. I can't tell you why, because I don't know how the tasks are set, but I just added a try: to avoid the issue.
  • I now filter for SinglePoint tasks, but I was wondering whether there is a more efficient way than grabbing all the archives?
  • Also, just to mention, there is still 1 warning about how the system, method, etc. are needed for normalization (you can see it in the video I posted above). I guess this was raised somehow before we add the run section. Not sure, but I guess it doesn't matter too much?

@JFRudzinski (Collaborator Author)

> • The test was easy to fix: tasks.task was simply not populated in this case. I can't tell you why because I don't know how the tasks are set, but I just added a try: to avoid the issue
> • I filter now for SinglePoint's, but I was wondering whether maybe there was a more efficient way instead of grabbing all the archives?
> • Also, just to mention, there is still 1 warning about how the system, method, etc is needed for normalization (you can see in the video I posted above). I guess this raised somehow before we add the run section. Not sure, but I guess it doesn't matter too much?

@ladinesa Any thoughts about this?
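The SinglePoint filtering mentioned in the bullets could look roughly like this. The types here are illustrative stand-ins; the real normalizer inspects the workflow sections of the referenced archives rather than a string field:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    workflow_type: str  # e.g. 'SinglePoint', 'GeometryOptimization'

def single_point_tasks(tasks):
    """Keep only the tasks whose referenced workflow is a single-point
    calculation, dropping e.g. a leading geometry optimization."""
    return [t for t in tasks if t.workflow_type == "SinglePoint"]

tasks = [
    Task("relax", "GeometryOptimization"),
    Task("Deformation 0.975", "SinglePoint"),
    Task("Deformation 1.000", "SinglePoint"),
]
eos_tasks = single_point_tasks(tasks)
```

The efficiency question raised above is whether this check can be done on task metadata alone, without loading every referenced archive first.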

@ladinesa (Collaborator)

> • The test was easy to fix: tasks.task was simply not populated in this case. I can't tell you why because I don't know how the tasks are set, but I just added a try: to avoid the issue
> • I filter now for SinglePoint's, but I was wondering whether maybe there was a more efficient way instead of grabbing all the archives?
> • Also, just to mention, there is still 1 warning about how the system, method, etc is needed for normalization (you can see in the video I posted above). I guess this raised somehow before we add the run section. Not sure, but I guess it doesn't matter too much?
>
> @ladinesa Any thoughts about this?

  1. Sorry, which test again? The failing test in a previous commit was on the chemical reaction workflow, not covered here; some change, maybe in the parser, affected this.
  2. I do not quite get it. We specify the tasks in the workflow yaml file, so I am not sure why we need to filter afterwards. In the upload you provided, it does not contain any geometry optimization task, so why was this added to the tasks later? The parent class does not append any tasks.
  3. We can also remove the super().normalize call or put it after populating run.

@JFRudzinski (Collaborator Author)

  1. Oh, I thought it was posted somewhere above, guess not, sorry. It was just this test_eos. It failed because self.tasks.task was empty. I think it's fine now, at least in terms of the changes that I made. My code is just mostly not executed now in this case.
  2. The test data at the top of this PR indeed does not have a GO, but they later provide THIS which does. They consider this GO as the first step in their EOS workflow. That's why I am filtering it out before grabbing the results (energies/volumes). I guess it is a matter of how flexible we want to be. We could skip this filtering and tell them to make a hierarchical workflow, e.g., one workflow for GO+EOS and then also the EOS workflow. For me the answer depends on how common this is, but I figured in this case we could support it... what do you think?
  3. Which do you suggest in this case? Or what are the corresponding possible consequences?

Thanks!

@ladinesa (Collaborator)

  1. I think this can be fixed if we create the run section at the top and call super().normalize before running the rest of the EOS normalization.
  2. Ah, all clear. Yes, they should make it a nested workflow, one with GO+EOS; but yes, we have calculations_ref only for the single point, otherwise we can also put this in the parent.
  3. See 1.

@JFRudzinski (Collaborator Author) commented May 14, 2025

1 and 3. So, when I create run first and then call super().normalize(), it creates a different issue: self._calculations is set to the calculations under run. But I have opted to only put the input of the EOS calculation (i.e., the output of the GO) into run.

I could just overwrite it, but I am hesitant to do that in case there are other routes where this workflow gets populated... or do you think this is ok? Or do you think we should store all the calculations under run? (It seems redundant to me.)

  2. I didn't quite understand your stance in the end. Are you fine with either method (i.e., nested workflow or allowing GOs and filtering), or do you feel strongly that we tell them they have to make a nested workflow?

@ladinesa (Collaborator)

1 and 3. You can simply overwrite it. It only applies to the EOS workflow, so yes, one can undo the parent class normalization in the child; that is common. 2. I suggest that they change their workflow.

@JFRudzinski (Collaborator Author)

@ladinesa I finally found some time to look at this again, here is a summary of the current status and some open issues (sorry for the mess of comments and print statements, I didn't want to delete those yet until I get some answers to these):

  1. I now use the provided input section to populate run. I first implemented everything assuming an MProxy there (as in my practical example with the YAML workflows), but then I ended up generalizing it to the case where you have a resolved section. This was happening in the case of the test which is still failing. But in the end I am not sure if this test is relevant or working properly. The current error is now that the system section does not have an m_parent_index... maybe it's using a very old schema version or something?!
  2. I now make sure to add the input structure as input to each task for standardization. Right now I am using the m_proxy_value to compare whether 2 sections are the same. I am not sure if I need to cover the general case, or how to compare 2 (resolved) sections.
  3. I tried to remove the super().normalize() as you suggested, but this results in breaking issues, and the rest is working at least in this case, so I think we leave that.
  4. I am getting quite inconsistent behavior of the visualization connections between the input structure and all the tasks... however, maybe we don't need to address that here.

@JFRudzinski (Collaborator Author)

@ladinesa I was able to deal with non-proxy sections for the first part, but not the second part of my code. See my comments above. Any other ideas? (Let me know if the commented code is too confusing; I can clean it up. I'm just using it for debugging because I don't have a good testing method for non-proxy sections at the moment.)

I need to try to wrap this up this week so that I have something for Mariano and Thomas early next week 😬

@JFRudzinski (Collaborator Author)

@ladinesa I thought I had this solved, because directly using .m_resolved() and .m_proxy_value was working even when I preresolved the section. However, I now found that these attributes appear to only be present when the section was at some point a proxy.

If I fetch a section with .m_xpath(...), then these attributes do not exist. This indicates that the current code may not work if the tasks/inputs are set by a parser. But I am also looking to add another simplification which will set the sections with .m_xpath(...) (or similar). In that case, the current approach will definitely fail.

Maybe I can somehow create a proxy from a section and then resolve it to obtain these attributes in either case 😵

@ladinesa (Collaborator) commented Jun 4, 2025

can you try to use archive.m_context.resolve_archive(path)?

@JFRudzinski (Collaborator Author)

> can you try to use archive.m_context.resolve_archive(path)?

This appears to return the complete archive; not sure if I am using it correctly. I do:

section = archive.m_context.resolve_archive(
    "../upload/archive/omceU6tDd4MKm9SxXE3DLKG3shxG#/run/0/system/-1"
)

When I print section it is:

section: EntryArchive(processing_logs, run, workflow2, metadata, results)

Maybe the path syntax is wrong? I found some examples in the codebase tests, and it seemed like it's supposed to be the global path.

@ladinesa (Collaborator) commented Jun 4, 2025

Yes, this will resolve the archive, from which you can then resolve the section using root_section.m_resolve("/run/0/system/-1").
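The two-step resolution suggested here, resolve the archive first, then walk a path like /run/0/system/-1 inside it, can be mimicked on plain nested data to show the path semantics. This is only an illustration of how such paths are walked, not the metainfo implementation:

```python
def resolve_path(root, path):
    """Walk a '/run/0/system/-1'-style path through nested dicts/lists;
    integer segments (including negative ones) index into lists."""
    node = root
    for seg in path.strip("/").split("/"):
        if isinstance(node, list):
            node = node[int(seg)]  # '-1' indexes the last element
        else:
            node = node[seg]
    return node

# toy archive with two system sections under run/0
archive = {"run": [{"system": [{"id": "first"}, {"id": "last"}]}]}
section = resolve_path(archive, "/run/0/system/-1")
```

The `#` in the global references discussed above separates the archive part (which entry to load) from this in-archive section path.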

# ! m_proxy_value is not available for "normal" sections
print(f'input_item: {input_item}')
print(f'input_item.section: {input_item.section}')
raw_proxy_value = input_item.section.m_proxy_value
@JFRudzinski (Collaborator Author)

@ladinesa So, if I put back in the old test, this line throws an error "AttributeError: m_proxy_value".

The above print statements give:

input_item: Input system:Link(name, section)
input_item.section: System(atoms, atoms_group, constraint, prototype, springer_material, symmetry)

However, when I create from yaml (and print the same quantities via the logger), I get:

"input_item: Geometry Optimization Output Structure:Link(name, section)"
"input_item.section: System(type, configuration_raw_gid, is_representative, chemical_composition, chemical_composition_hill, chemical_composition_reduced, atoms, symmetry, descriptors)"

@ladinesa (Collaborator)

Can you share your yaml file? How do I reproduce the first one?

@JFRudzinski (Collaborator Author)

Yes, sorry, the first one is from running the pytest for the package: uv run --directory packages/nomad-schema-plugin-simulation-workflow/ pytest

The yaml is:

'workflow2':
  'm_def': 'simulationworkflowschema.equation_of_state.EquationOfState'
  'name': 'EOS Workflow'
  'inputs':
  - 'name': 'Geometry Optimization Output Structure'
    'section': '../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/relax/xc=PBE-PAW.E=450.dk=0.020/OUTCAR.gz#/run/0/system/-1'
  'tasks':
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'Deformation 0.975'
    'task': '../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/volume_relaxed/xc=PBE-PAW.E=450.dk=0.020/OUTCAR.0.975.gz#/workflow2'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'Deformation 0.980'
    'task': '../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/volume_relaxed/xc=PBE-PAW.E=450.dk=0.020/OUTCAR.0.980.gz#/workflow2'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'Deformation 0.985'
    'task': '../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/volume_relaxed/xc=PBE-PAW.E=450.dk=0.020/OUTCAR.0.985.gz#/workflow2'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'Deformation 0.990'
    'task': '../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/volume_relaxed/xc=PBE-PAW.E=450.dk=0.020/OUTCAR.0.990.gz#/workflow2'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'Deformation 0.995'
    'task': '../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/volume_relaxed/xc=PBE-PAW.E=450.dk=0.020/OUTCAR.0.995.gz#/workflow2'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'Deformation 1.000'
    'task': '../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/volume_relaxed/xc=PBE-PAW.E=450.dk=0.020/OUTCAR.1.000.gz#/workflow2'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'Deformation 1.005'
    'task': '../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/volume_relaxed/xc=PBE-PAW.E=450.dk=0.020/OUTCAR.1.005.gz#/workflow2'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'Deformation 1.010'
    'task': '../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/volume_relaxed/xc=PBE-PAW.E=450.dk=0.020/OUTCAR.1.010.gz#/workflow2'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'Deformation 1.015'
    'task': '../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/volume_relaxed/xc=PBE-PAW.E=450.dk=0.020/OUTCAR.1.015.gz#/workflow2'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'Deformation 1.020'
    'task': '../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/volume_relaxed/xc=PBE-PAW.E=450.dk=0.020/OUTCAR.1.020.gz#/workflow2'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'Deformation 1.025'
    'task': '../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/volume_relaxed/xc=PBE-PAW.E=450.dk=0.020/OUTCAR.1.025.gz#/workflow2'
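Since the task list above is entirely repetitive, it could also be generated rather than written by hand. A sketch that builds the same structure as a Python dict (the m_def strings and the path template are copied from the yaml above; dumping to yaml, e.g. with PyYAML, is left out):

```python
def eos_workflow_dict(deformations, path_template):
    """Build the workflow2 dict for an EOS workflow from a list of
    deformation factors, mirroring the hand-written yaml above."""
    return {
        "m_def": "simulationworkflowschema.equation_of_state.EquationOfState",
        "name": "EOS Workflow",
        "tasks": [
            {
                "m_def": "nomad.datamodel.metainfo.workflow.TaskReference",
                "name": f"Deformation {d:.3f}",
                "task": path_template.format(d=d),
            }
            for d in deformations
        ],
    }

template = (
    "../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/volume_relaxed/"
    "xc=PBE-PAW.E=450.dk=0.020/OUTCAR.{d:.3f}.gz#/workflow2"
)
# deformations 0.975, 0.980, ..., 1.025 built from integers to avoid float drift
workflow = eos_workflow_dict([i / 1000 for i in range(975, 1026, 5)], template)
```

This also makes it easy for users like the ones mentioned above to regenerate the yaml when the deformation grid changes.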

@ladinesa (Collaborator) commented Jun 6, 2025

Sorry, I also misunderstood this m_proxy thing, so maybe just do m_proxy_value = getattr(input_section, 'm_proxy_value', input_section.m_path())

@ladinesa (Collaborator) commented Jun 6, 2025

When you run test_eos_workflow, the inputs are created from the parsed archive in parse_trajectory by the parent class SimulationWorkflow, and the input would really be run/system/0. And in your yaml file you specify it to be '../upload/archive/mainfile/Fe-Mo/data/Fe_pv/bulk/A15/relax/xc=PBE-PAW.E=450.dk=0.020/OUTCAR.gz#/run/0/system/-1'. The first is an in-archive reference, which I assume will resolve in place, while the second is a ref to a different archive, so this would return a proxy.

@JFRudzinski (Collaborator Author)

> Sorry I also misunderstood this m_proxy thing, so maybe just do m_proxy_value = getattr(input_section, 'm_proxy_value', input_section.m_path())

No worries. Yeah, this is what I was trying, but m_path() only gives the path relative to the archive root, while m_proxy_value gives the global path. Do you know how to get the global path from a (resolved in-place) section?

@ladinesa (Collaborator) commented Jun 7, 2025

> > Sorry I also misunderstood this m_proxy thing, so maybe just do m_proxy_value = getattr(input_section, 'm_proxy_value', input_section.m_path())
>
> No worries. Yeah, so this is what I was trying, but m_path() only gives the relative archive path to archive root, but m_proxy_value gives the global path. Do you know how to get the global path from a (resolved in-place) section?

You can get the upload id, entry id, and mainfile from archive.metadata, so essentially you can construct the global path from this info.
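Assembling the global path from metadata, as suggested here, might look like the following. The path template is taken from the references used elsewhere in this thread (e.g. '../upload/archive/<entry id>#/run/0/system/-1'); the function is an illustrative sketch, not a verified NOMAD API:

```python
def global_section_path(entry_id, section_path):
    """Combine an entry id and an in-archive section path into an
    inter-archive reference of the form used in the workflow yaml."""
    return f"../upload/archive/{entry_id}#{section_path}"

# entry id and section path from the resolve_archive example above
path = global_section_path("omceU6tDd4MKm9SxXE3DLKG3shxG", "/run/0/system/-1")
```

In practice the entry id would come from the archive's metadata (and section_path from something like m_path()) rather than being hard-coded.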

@JFRudzinski (Collaborator Author)

> > > Sorry I also misunderstood this m_proxy thing, so maybe just do m_proxy_value = getattr(input_section, 'm_proxy_value', input_section.m_path())
> >
> > No worries. Yeah, so this is what I was trying, but m_path() only gives the relative archive path to archive root, but m_proxy_value gives the global path. Do you know how to get the global path from a (resolved in-place) section?
>
> you can get the upload id and entry id and the mainfile from archive.metadata so essentially you can construct the global path with these info

That does make sense to me, but archive.m_root() gives EntryArchive(run, measurement, workflow, workflow2, results) (i.e., no metadata), and archive.m_context.resolve_archive(path) I believe requires the global archive path as input, right? Or is there a usage of the latter that I am not seeing?

@ladinesa (Collaborator) commented Jun 8, 2025

I am pretty sure that the archive that goes into the parse method has metadata, which is essentially the root for the non-proxy or in-place resolved sections. Maybe do not use section.m_root(), just archive, if it is not a proxy.

@JFRudzinski (Collaborator Author) commented Jun 8, 2025

> I am pretty sure that the archive that goes into the parse method has metadata, which is essentially the root for the non-proxy or in-place resolved sections. maybe do not use section.m_root() just archive if it is not a proxy

I get the same thing by directly checking the archive that is the input to normalize(): archive: EntryArchive(run, measurement, workflow, workflow2, results). But keep in mind that this is coming from the test data in tests/data/ase/Cu.traj of the plugin. I am still a bit skeptical about whether there is something strange with this example, but I don't have any other "non-yaml" way to test this normalizer.

@ladinesa (Collaborator) commented Jun 8, 2025

Ah, now I understand, sorry. In this case maybe just put a return if the full path cannot be constructed; this happens only for testing.


if not self._calculations:
    #! Causing test to fail
    try:
@JFRudzinski (Collaborator Author)

@ladinesa

I've adjusted the code now so that the test passes without the code within this try statement. When I include this check for single point workflows I get:

ERROR nomad.normalizing (MetainfoNormalizer): could not normalize section EntryArchive (2025-06-09 16:37.36)
Traceback (most recent call last):
  File "/home/jfrudzinski/work/soft/nomad-distro-dev-run-schema-2025-04/packages/nomad-FAIR/nomad/normalizing/metainfo.py", line 37, in normalize_section
    normalize(archive, logger)
  File "/home/jfrudzinski/work/soft/nomad-distro-dev-run-schema-2025-04/packages/nomad-FAIR/nomad/datamodel/datamodel.py", line 1233, in normalize
    if not archive.metadata.entry_type:
AttributeError: 'NoneType' object has no attribute 'entry_type'

@ladinesa (Collaborator)

I thought I already fixed this months ago. I put a return if metadata is None; maybe it was reverted accidentally. I will fix it.
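The fix described here amounts to an early return when the archive has no metadata, which is exactly what the traceback above trips over. A minimal sketch of the guard (illustrative toy classes, not the actual nomad-FAIR code; the "Workflow" default is a made-up placeholder value):

```python
def normalize_entry(archive, logger=None):
    """Skip normalization when the archive has no metadata section,
    avoiding AttributeError: 'NoneType' object has no attribute 'entry_type'."""
    metadata = getattr(archive, "metadata", None)
    if metadata is None:
        return  # bare test archives may carry no metadata at all
    if not metadata.entry_type:
        metadata.entry_type = "Workflow"  # placeholder default

class Metadata:
    def __init__(self):
        self.entry_type = None

class Archive:
    def __init__(self, metadata=None):
        self.metadata = metadata

normalize_entry(Archive())            # no metadata: returns quietly
entry = Archive(metadata=Metadata())
normalize_entry(entry)                # fills in a default entry_type
```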

@JFRudzinski (Collaborator Author)

🙏 🙏
