Discrepancy in the number of problems with test cases #3

Hi, thank you for releasing this great dataset!
I've been working with the dataset and noticed a discrepancy regarding the number of problems that contain test cases. According to the paper, approximately 32.5k problems include test cases. However, when I processed the dataset, I found the following:

- **Total number of problems: 47,136**
- **Problems with test cases (my count): 26,955**
- **Problems with test cases (as stated in the paper): ~32,500**

There is a gap of roughly 5.5k (32,500 − 26,955 ≈ 5,545) between my count and the number reported in the paper. I wanted to confirm whether:

1. I might be using an incorrect method to identify problems with test cases, or
2. There is a specific version/subset of the dataset I should be using, or
3. The definition of "having test cases" differs from what I assumed.
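To illustrate point 3, here is a minimal sketch of how three plausible definitions of "having test cases" can give different counts on the same records. The record layout is hypothetical: the `test_cases` field and its `input`/`output` keys are assumptions for illustration, not the dataset's actual schema.

```python
from typing import Any, Callable, Dict, List

# Hypothetical records; the field names are assumptions, not the real schema.
problems: List[Dict[str, Any]] = [
    {"id": "p1", "test_cases": [{"input": "1 2", "output": "3"}]},
    {"id": "p2", "test_cases": []},                             # field present but empty
    {"id": "p3"},                                               # field missing entirely
    {"id": "p4", "test_cases": [{"input": "", "output": ""}]},  # placeholder case only
]


def field_present(p: Dict[str, Any]) -> bool:
    """Loosest definition: the test-case field exists at all."""
    return "test_cases" in p


def non_empty_list(p: Dict[str, Any]) -> bool:
    """The field exists and contains at least one test case."""
    return bool(p.get("test_cases"))


def usable_case(p: Dict[str, Any]) -> bool:
    """At least one test case has a non-blank expected output."""
    return any((tc.get("output") or "").strip() for tc in p.get("test_cases") or [])


def count(pred: Callable[[Dict[str, Any]], bool]) -> int:
    """Number of problems satisfying the given definition."""
    return sum(1 for p in problems if pred(p))


counts = {
    "field_present": count(field_present),
    "non_empty_list": count(non_empty_list),
    "usable_case": count(usable_case),
}
print(counts)  # {'field_present': 3, 'non_empty_list': 2, 'usable_case': 1}
```

Depending on which of these checks is applied, the same data yields 3, 2, or 1 "problems with test cases", which is the kind of difference that could account for part of the gap.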

Could you please clarify how the 32.5k figure was calculated? Any guidance on the correct way to filter for problems with test cases would be greatly appreciated.
Thanks in advance!