Thank you for open-sourcing your excellent work. I had some difficulty following the steps to run the code.
- Seed dataset preparation
In this step, the mm-K12 dataset should be organized into the following format:
```json
[
  {
    "id": "unique identifier for the problem",
    "question": "problem statement",
    "correct_answer": "ground-truth final answer for evaluation and verification",
    "image_path": "/path/to/image.png"
  },
  ...
]
```
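For reference, here is a minimal sketch of how I am checking my converted records against the format above before writing them to disk. It assumes every field is a non-empty string and that `id` values must be unique; `validate_seed_records` and `write_seed_file` are my own hypothetical helpers, not functions from the repository.

```python
import json
from typing import Any

# Keys required by the seed-dataset format (assumption: all are non-empty strings).
REQUIRED_KEYS = ("id", "question", "correct_answer", "image_path")


def validate_seed_records(records: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means the records look valid."""
    problems = []
    seen_ids = set()
    for index, record in enumerate(records):
        for key in REQUIRED_KEYS:
            value = record.get(key)
            if not isinstance(value, str) or not value:
                problems.append(f"record {index}: missing or empty '{key}'")
        record_id = record.get("id")
        # Only string ids are tracked; non-string ids were already reported above.
        if isinstance(record_id, str):
            if record_id in seen_ids:
                problems.append(f"record {index}: duplicate id {record_id!r}")
            seen_ids.add(record_id)
    return problems


def write_seed_file(records: list[dict[str, Any]], path: str) -> None:
    """Validate the records, then write them as a JSON list (hypothetical helper)."""
    problems = validate_seed_records(records)
    if problems:
        raise ValueError("; ".join(problems))
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)
```

Writing with `ensure_ascii=False` keeps any non-ASCII problem text readable in the output file.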
However, after downloading and parsing the mm-K12 dataset, I found only the 10,000 training samples and not the 500 test samples. Where is the test data? Will its absence affect subsequent steps?
- The step after API endpoint setup (Optional)
When executing this step, there is a parameter that requires us to set `MODEL="/path/to/model"`. Which model should we use? The default value in the code is `OpenGVLab/InternVL2_5-8B`; should I use it? Also, what is the relationship between Qwen2.5, mentioned in the previous step (API endpoint setup (Optional)), and `OpenGVLab/InternVL2_5-8B` here? Which subsequent step requires Qwen2.5?
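My current understanding of how the `MODEL` parameter and its default interact is sketched below. This assumes the script reads `MODEL` from the environment and falls back to the hard-coded default when it is unset or blank; the repository may instead take a command-line flag, and `resolve_model` is a hypothetical name, not the actual code.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Default value reported in the code (the real script may define it elsewhere).
DEFAULT_MODEL = "OpenGVLab/InternVL2_5-8B"


def resolve_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return MODEL from the environment, or DEFAULT_MODEL when unset or blank."""
    source = os.environ if env is None else env
    value = source.get("MODEL", "").strip()
    return value or DEFAULT_MODEL


if __name__ == "__main__":
    print(f"Using model: {resolve_model()}")
```

If this assumption holds, leaving `MODEL` unset would load `OpenGVLab/InternVL2_5-8B`, while `MODEL="/path/to/model"` would point to a local checkpoint instead.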
This is my first time working in this field, and I look forward to your response.