Some questions

Thank you for open sourcing your excellent work. I had some difficulty following the steps to run the code.
1. Seed dataset preparation

As I understand it, this step organizes the mm-K12 dataset into the following format:
[
    {
        "id": "unique identifier for the problem",
        "question": "problem statement",
        "correct_answer": "ground-truth final answer for evaluation and verification",
        "image_path": "/path/to/image.png"
    },
    ...
]
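To check my understanding of the format, here is a minimal sketch of the conversion I have in mind. It assumes the raw rows are dicts with hypothetical field names `question`, `answer`, `image` and an optional `id`. I have not confirmed mm-K12's actual schema, so these names would need adjusting. The helper names (`to_seed_record`, `seed_format_errors`, `write_seed_dataset`) are my own and are not from the repository.

```python
import json
from pathlib import Path

# Keys required by the seed format shown above.
REQUIRED_KEYS = ("id", "question", "correct_answer", "image_path")


def to_seed_record(row, index, image_dir):
    """Map one raw row to the seed format.

    The raw field names ("id", "question", "answer", "image") are hypothetical
    placeholders; rename them to match the downloaded mm-K12 files.
    """
    return {
        "id": str(row.get("id", index)),  # fall back to the row index
        "question": row["question"],
        "correct_answer": row["answer"],
        "image_path": str(Path(image_dir) / row["image"]),
    }


def seed_format_errors(records):
    """Return a list of problems: missing/empty/non-string fields or duplicate ids."""
    errors = []
    seen_ids = set()
    for i, rec in enumerate(records):
        for key in REQUIRED_KEYS:
            value = rec.get(key)
            if not isinstance(value, str) or not value:
                errors.append(f"record {i}: missing or empty {key!r}")
        rec_id = rec.get("id")
        if rec_id in seen_ids:
            errors.append(f"record {i}: duplicate id {rec_id!r}")
        seen_ids.add(rec_id)
    return errors


def write_seed_dataset(rows, image_dir, out_path):
    """Convert raw rows, validate them, and write a JSON list to out_path."""
    records = [to_seed_record(r, i, image_dir) for i, r in enumerate(rows)]
    errors = seed_format_errors(records)
    if errors:
        raise ValueError("; ".join(errors))
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=4)
    return records
```

Validating before writing means a malformed row is reported by index instead of surfacing later in the pipeline.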
However, after downloading and parsing the mm-K12 dataset, I found only the 10,000 training samples; the 500 test samples were not there. Where is the test data, and will its absence affect the subsequent steps?
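In the meantime, the only stopgap I can think of is to hold out 500 of the training records myself. This is a hypothetical workaround (the `hold_out_split` name is my own), and it is not the official test split. Any scores on such a held-out set would not be comparable with reported results.

```python
import random


def hold_out_split(records, test_size=500, seed=0):
    """Deterministically split records into (train, test).

    Hypothetical stopgap only: a test set carved out of the training data is
    not the official mm-K12 test split.
    """
    if test_size >= len(records):
        raise ValueError("test_size must be smaller than the number of records")
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)  # fixed seed -> same split every run
    test_idx = set(indices[:test_size])
    # Keep the original ordering inside each split.
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching the global random state.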

2. Which model to use
This step requires setting a parameter, MODEL="/path/to/model". Which model should go here? The default in the code is OpenGVLab/InternVL2_5-8B; should I use that? Also, how does the Qwen2.5 model mentioned in the previous step (API endpoint setup (Optional)) relate to OpenGVLab/InternVL2_5-8B here, and which later step actually requires Qwen2.5?
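For reference, this is how I currently read the MODEL parameter, written as a minimal sketch. I am assuming the script reads MODEL from the environment and falls back to the default named above. I am also assuming that an existing filesystem path means a local checkpoint and anything else is a Hugging Face Hub repo id. I have not confirmed either behavior in the repository, and `resolve_model` is a hypothetical name.

```python
import os
from pathlib import Path

# Default mentioned in the code; the real script may resolve MODEL differently.
DEFAULT_MODEL = "OpenGVLab/InternVL2_5-8B"


def resolve_model(env=None):
    """Return (model, source), where source is "local" or "hub".

    Assumption: a value that exists on disk is a local checkpoint directory;
    anything else is treated as a Hugging Face Hub repo id.
    """
    env = os.environ if env is None else env
    model = env.get("MODEL") or DEFAULT_MODEL  # unset or empty -> default
    source = "local" if Path(model).exists() else "hub"
    return model, source
```

If this reading is right, leaving MODEL unset would use InternVL2.5-8B. A separately served Qwen2.5 API endpoint would then be a different component from this model.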

This is my first time working in this field, and I look forward to your response.

Some questions #3