Open
Description
I am looking into this block of eval_omni.py, and the if not k.endswith("full") filtering looks like it causes a problem. If a category (e.g. an action type or input_text) is missing entirely from the predictions, either predicted as null or with the field absent altogether, then full_step / full_type never receive the corresponding count. The denominators end up too small, so the calculated hit rate is higher than the actual fraction of correct predictions.
for key in [k for k in score_dict.keys() if not k.endswith("full")]:
    if key.endswith("grounding"):
        full_step_hit+=score_dict[key]
        full_step+=score_dict[key+'_full']
        full_gr_hit+=score_dict[key]
        full_gr+=score_dict[key+'_full']
    elif key.endswith("text"):
        full_step_hit+=score_dict[key]
        full_step+=score_dict[key+'_full']
    else:
        full_type_hit+=score_dict[key]
        full_type+=score_dict[key+'_full']
    logger.info(f"Type {key} Length {score_dict[key+'_full']} : {(score_dict[key] / score_dict[key+'_full'])}")
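To make the concern concrete, here is a minimal sketch of how an under-counted denominator would inflate the rates. The aggregate helper below only mirrors the loop quoted above. The key names (click, click_grounding) and all counts are made up for illustration, and the sketch assumes that the scoring code increments a _full counter only when the prediction actually contains that field, which is the behaviour described in this issue rather than something verified against the rest of eval_omni.py.

```python
def aggregate(score_dict):
    """Mirror of the aggregation loop above; returns (step_rate, type_rate)."""
    step_hit = step_total = type_hit = type_total = 0
    for key in [k for k in score_dict if not k.endswith("full")]:
        if key.endswith("grounding") or key.endswith("text"):
            step_hit += score_dict[key]
            step_total += score_dict[key + "_full"]
        else:
            type_hit += score_dict[key]
            type_total += score_dict[key + "_full"]
    return step_hit / step_total, type_hit / type_total


# Hypothetical scenario: 10 ground-truth "click" steps, but the model emits
# an action type for only 6 of them (the other 4 are null or missing).

# Assumed current behaviour: *_full counts only the 6 predictions that had the field.
counted_from_predictions = {
    "click": 6, "click_full": 6,
    "click_grounding": 3, "click_grounding_full": 6,
}

# Alternative: *_full counts all 10 ground-truth steps, so misses are penalised.
counted_from_ground_truth = {
    "click": 6, "click_full": 10,
    "click_grounding": 3, "click_grounding_full": 10,
}

inflated = aggregate(counted_from_predictions)
corrected = aggregate(counted_from_ground_truth)
print("denominator from predictions:", inflated)
print("denominator from ground truth:", corrected)
```

If that assumption holds, one possible fix would be to increment the _full counters from the ground-truth annotations, so that a null or missing prediction still counts as an attempted step.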