Key mismatch in gpt4_eval.py

Hi, thank you for your great work on LLM evaluation.
I'm very impressed by it, because there is a lack of evaluation metrics for chatbots.
I want to use this framework to evaluate my model, but I ran into some issues while following the steps in the README.

The main issue is a key error in ```gpt4_eval.py```.

```evaluation_set/flask_evaluation.jsonl``` has no *metrics/text/question_id* keys, so the script raises an error.
I think the file uses *skill/instruction/idx* in their place, but I wonder whether changing just these key names is enough to make it work without any other problems.
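
For reference, here is a minimal sketch of the renaming I have in mind. The one-to-one mapping (*skill* to *metrics*, *instruction* to *text*, *idx* to *question_id*) is only my guess from the key names, and `remap_record` / `remap_jsonl` are hypothetical helpers, not part of the repository.

```python
import json

# Assumed mapping from the keys found in flask_evaluation.jsonl to the keys
# gpt4_eval.py seems to expect. This is a guess, not confirmed by the authors.
KEY_MAP = {"skill": "metrics", "instruction": "text", "idx": "question_id"}


def remap_record(record, key_map=KEY_MAP):
    """Return a copy of record with mapped keys renamed; other keys are kept."""
    return {key_map.get(key, key): value for key, value in record.items()}


def remap_jsonl(src_path, dst_path, key_map=KEY_MAP):
    """Rewrite a JSONL file line by line, renaming keys in every record."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = remap_record(json.loads(line), key_map)
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Writing to a separate output file keeps the original evaluation set untouched, so the change is easy to undo if the mapping turns out to be wrong.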

Additionally, is it possible to know roughly how much it typically costs to evaluate the 1,700 FLASK samples with the GPT-4 API? Evaluating the 80 samples from MT-bench already cost a significant amount, so having this information in advance would be very helpful.
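
To make the question concrete, this is the kind of back-of-the-envelope estimate I would like to fill in. It is only a sketch: `estimate_cost` is a hypothetical helper, and the token counts and per-1K-token prices in the example call are placeholders, not real FLASK statistics or actual API prices.

```python
def estimate_cost(n_samples, avg_prompt_tokens, avg_completion_tokens,
                  prompt_price_per_1k, completion_price_per_1k):
    """Estimate total API cost, assuming every sample uses the average token counts."""
    per_sample = (avg_prompt_tokens / 1000 * prompt_price_per_1k
                  + avg_completion_tokens / 1000 * completion_price_per_1k)
    return n_samples * per_sample


# Placeholder values only; replace them with measured token counts
# and the current prices for the model you use.
total = estimate_cost(
    n_samples=1700,
    avg_prompt_tokens=1000,
    avg_completion_tokens=500,
    prompt_price_per_1k=0.01,
    completion_price_per_1k=0.02,
)
print(f"Estimated cost: {total:.2f}")
```

If you could share the average prompt and completion lengths for the evaluation prompts, I could plug them in myself.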

If it's not too much trouble, a prompt response would be greatly appreciated. Thank you.

Key mismatch in gpt4_eval.py #4

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Key mismatch in gpt4_eval.py #4

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions