Skip to content

Mdwarf contin vac#208

Open
imedan wants to merge 5 commits intomainfrom
mdwarf_contin_vac
Open

Mdwarf contin vac#208
imedan wants to merge 5 commits intomainfrom
mdwarf_contin_vac

Conversation

@imedan
Copy link

@imedan imedan commented Mar 18, 2026

This is for the data model files for the DR20 VAC: https://soji.sdss.utah.edu/collaboration/announcements/vacs/browse/select/sdss-5.20. Will also tag @zachway here as he in one of the other main authors of this VAC. There are three commits to the branch summarized below.

1f65e1e includes the new data model files for the VAC. As a note, dr20.cfg is not available yet in tree, so this was setup with DR19. Can do a replace easily with DR20 if that makes sense?

d78e81d deals with a bug I found in creating the YAML from HDF5 files. Specifically, I found that h5obj.size already returns an int, so this have an error when trying to do .item(). I am unsure if this is a difference in h5py or not.

376d161 deals with a string replacement bug. My file name included md (e.g. /uufs/chpc.utah.edu/common/home/u6033276/.modulefiles/miniconda/datamodel/datamodel/products/yaml/mdwarf-contin.yaml). So, when trying to make the markdown path, it would look for a yaml file that looked like /uufs/chpc.utah.edu/common/home/u6033276/.modulefiles/miniconda/datamodel/datamodel/products/yaml/yamlwarf-contin.yaml. I made the string replacement more explicit to only replace the directory and .yaml.

Would be good to review the above and see what should be kept or not.

@joelbrownstein
Copy link
Contributor

Hi @imedan, thanks for adding this datamodel, and for addressing the bugs that appeared for this case. Would you confirm that the datamodel renders as you'd expect in the DSI, here: https://sas.sdss.org/dsi/file/select-mdwarf-contin/mdwarf_contin_vac ?

I request 2 changes prior to the merge in order that the VAC datamodel be geared toward the data release.

First would you please do a global search and replace of DR19 -> DR20?

Second, VAC_ROOT should be replaced with the most closely related, existing environment variable: MWM_MDWARF, so I can include the VAC in a subfolder in the dr20 volume. So would you do a search and replace of:

$VAC_ROOT/mwm/mdwarf-contin/mdwarf-contin-standardization-[V_VAC].h5
to something like
$MWM_MDWARF/mdwarf-contin/mdwarf-contin-standardization-[V_VAC].h5

(Not needed for this PR, but is the h5 file ready on the SAS so I can stage it in DR20?)

@joelbrownstein joelbrownstein self-assigned this Mar 19, 2026
@imedan
Copy link
Author

imedan commented Mar 19, 2026

Yes, the render looks good to me!

I just pushed the two changes you requested. Do things look correct to you?

For the actual file, no. I just tried to copy the file:

cp /uufs/chpc.utah.edu/common/home/sdss/sdsswork/users/u6033276/mdwarf_contin_VAC/mdwarf_standardization_dr20_VAC.h5 /uufs/chpc.utah.edu/common/home/sdss52/dr20/vac/mwm/m-dwarf/mdwarf-contin/mdwarf-contin-standardization-1.0.0.h5

and I got a permission denied error.

@joelbrownstein
Copy link
Contributor

Thanks @imedan, this looks good to me.

I ran h5check and the h5 file validated, created the checksum file, and mirrored the VAC to JHU for sciserver compute access via the Ceph-fs disk (useful if a tutorial jupyter notebook is planned). I updated your VAC announcement with the datamodel and SAS path info, and updated Anne-Marie's VAC spreadsheet.

I also symlinked the dr20 version into IPL-4, and if you need to make changes to the file prior to its release, you can use /uufs/chpc.utah.edu/common/home/sdss52/ipl-4/vac/mwm/m-dwarf/mdwarf-contin to stage new versions (u6033276 added to the FACL write permissions).

Copy link
Contributor

@havok2063 havok2063 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a new VAC for DR20, how was this first generated for DR19? If this was released in both, there should be an entry for both DR19 and DR20. If this was new for DR20, the datamodel should be generated using the WORK release, then migrated to DR20.

Also, I don't recommend the find+replace method, as there is DR logic used to populate fields and a DR is not simply an entry in the datamodel. This could introduce inconsistencies. If you must do it, you should do it once in the yaml, check all the sub-fields are correct for the new DR, and re-generate the other files automatically. This will ensure those files are correct everywhere.

Comment on lines +225 to 228
base = os.path.splitext(self.output)[0]
yaml_dir = os.path.dirname(base).replace(f'/{self.format}', '/yaml')
cached_file = os.path.join(yaml_dir, os.path.basename(base) + '.yaml')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what these changes are for? it looks like this tries to replace both the suffix and the path part, but the original code already does that.

Comment on lines 91 to 236
@@ -232,7 +232,7 @@ def design_content(self, name: str = '/', description: str = None, hdftype: str
member['attrs'] = self._design_attrs(attrs) or []
if hdftype == 'dataset':
member['shape'] = ds_shape or (10,)
member['size'] = np.prod(member['shape']).item()
member['size'] = np.prod(member['shape'])
member['dtype'] = ds_dtype or 'S10'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these .items were necessary to dump the hdf5 variable into a valid format when serializing into yaml. Any particular reason these got changed? There are tests that are now failing, see https://github.com/sdss/datamodel/actions/runs/23303538048. This change may be the cause. Can you investigate?

design: false
vac: false
recommended_science_product: false
changelog:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a VAC and recommended for science, these should be set to true, and the datamodels re-validated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants