Discrepancy in the number of problems with test cases #3

@cai-jianfeng

Hi, thank you for releasing this great dataset!
I've been working with the dataset and noticed a discrepancy regarding the number of problems that contain test cases. According to the paper, approximately 32.5k problems include test cases. However, when I processed the dataset, I found the following:

  • Total number of problems: 47,136
  • Problems with test cases (my count): 26,955
  • Problems with test cases (as stated in the paper): ~32,500

There is a gap of roughly 5.5k between my count and the number reported in the paper. I wanted to confirm whether:

  1. I might be using an incorrect method to identify problems with test cases, or
  2. There is a specific version/subset of the dataset I should be using, or
  3. The definition of "having test cases" differs from what I assumed.
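
To illustrate point 3, here is a minimal sketch of how the count can shift depending on what "having test cases" means. The field name `test_cases`, the `strict` flag, and the sample records are all hypothetical and are not taken from the dataset's actual schema; they only show that empty lists or JSON-encoded empty strings can be counted or skipped depending on the check.

```python
import json


def has_test_cases(record, field="test_cases", strict=True):
    """Return True if a record appears to carry test cases.

    `field` is a hypothetical column name, not the dataset's real schema.
    strict=True : the field must decode to a non-empty value.
    strict=False: any present, non-None field counts.
    """
    value = record.get(field)
    if value is None:
        return False
    if not strict:
        return True
    if isinstance(value, str):
        # Test cases may be stored as a JSON string; decode before checking.
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            # Not JSON: treat any non-blank string as present.
            return bool(value.strip())
    return bool(value)


def count_with_tests(records, **kwargs):
    """Count the records for which has_test_cases returns True."""
    return sum(has_test_cases(r, **kwargs) for r in records)


# Hypothetical records covering the ambiguous cases.
sample = [
    {"id": 1, "test_cases": [{"input": "1", "output": "1"}]},
    {"id": 2, "test_cases": []},      # present but empty
    {"id": 3, "test_cases": "[]"},    # JSON-encoded empty list
    {"id": 4, "test_cases": None},    # explicit null
    {"id": 5},                        # field missing
]

strict_count = count_with_tests(sample, strict=True)
loose_count = count_with_tests(sample, strict=False)
print(strict_count, loose_count)
```

On this toy sample the two definitions disagree on records 2 and 3, which is the kind of difference that could account for part of the gap.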

Could you please clarify how the 32.5k figure was calculated? Any guidance on the correct way to filter for problems with test cases would be greatly appreciated.
Thanks in advance!
