Some questions #3

@ljjcoder

Thank you for open sourcing your excellent work. I had some difficulty following the steps to run the code.

  1. Seed dataset preparation

As I understand it, this step organizes the mm-K12 dataset into the following form:
[
{
"id": "unique identifier for the problem",
"question": "problem statement",
"correct_answer": "ground-truth final answer for evaluation and verification",
"image_path": "/path/to/image.png"
},
...
]
However, after downloading and parsing the mm-K12 dataset, I found only the 10,000 training samples and none of the 500 test samples. Where is the test data? Will its absence affect subsequent steps?
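
For reference, the conversion I have in mind looks roughly like the sketch below. It maps each parsed record to the seed format quoted above and checks that no required field is empty. The raw field names ("question", "answer", "image") and the helper names are my own assumptions, not anything taken from the repository or from the mm-K12 release, so they would need to be adjusted to the real data.

```python
import json
from pathlib import Path

# Keys required by the seed format quoted above.
REQUIRED_KEYS = ("id", "question", "correct_answer", "image_path")


def to_seed_record(raw, index, image_dir):
    """Map one parsed record to the seed format.

    Assumption: the raw record uses the keys "question", "answer" and
    "image". These names are hypothetical; rename them to match the
    actual parsed data.
    """
    return {
        # Fall back to the running index when the raw record has no id.
        "id": str(raw.get("id", index)),
        "question": raw["question"],
        "correct_answer": str(raw["answer"]),
        "image_path": str(Path(image_dir) / raw["image"]),
    }


def validate(records):
    """Return the ids of records with a missing or empty required key."""
    bad = []
    for rec in records:
        if any(not rec.get(key) for key in REQUIRED_KEYS):
            bad.append(rec.get("id"))
    return bad


def write_seed_file(raw_records, image_dir, out_path):
    """Convert all raw records and write them as one JSON list."""
    records = [
        to_seed_record(raw, i, image_dir) for i, raw in enumerate(raw_records)
    ]
    bad = validate(records)
    if bad:
        raise ValueError(f"records with empty fields: {bad}")
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return records
```

Calling `write_seed_file(parsed_records, "/path/to/images", "seed.json")` would then produce a single JSON list in the form shown above.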

  2. Model selection
    A later step has a parameter that must be set, MODEL="/path/to/model". Which model should I use? The default value in the code is OpenGVLab/InternVL2_5-8B; should I keep it? Also, how does the qwen2.5 model mentioned in the earlier step (API endpoint setup (Optional)) relate to OpenGVLab/InternVL2_5-8B here, and which subsequent step requires qwen2.5?

This is my first time working in this field, and I look forward to your response.
