LCB official scorer instead of skythoughts by slimfrkha · Pull Request #130 · mlfoundations/evalchemy

slimfrkha · 2025-06-13T10:30:44Z

Issue

The current code cannot reproduce the official LCB (LiveCodeBench) scores for open-source models.
Example: qwen3-14b in thinking mode on LCB v5.

Reported in the paper (temperature=0.6, top_p=0.95, top_k=20, max_new_tokens=32768): 63.5
Current code (temperature=0.6, top_p=0.95, max_new_tokens=32768, n_repeat=2): 56.15

The score trails the reported value by about 7 points (63.5 vs. 56.15). A gap this large cannot be explained by seeding or the randomness of generations.
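To make the "not explained by randomness" claim concrete, here is a rough sanity check. It treats every generation as an independent pass/fail trial, which gives a conservative (upper-bound style) estimate of seed noise on a fixed problem set. The problem count `n_problems` is a hypothetical placeholder, not the real size of the LCB v5 split; substitute the actual number of problems evaluated.

```python
import math


def binomial_standard_error(accuracy_pct: float, n_samples: int) -> float:
    """Standard error, in percentage points, of an accuracy estimated
    from n_samples independent pass/fail trials."""
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_samples)


# Scores quoted in this PR description.
reported = 63.5
before_fix = 56.15
gap = reported - before_fix

# ASSUMPTION: hypothetical problem count, for illustration only.
n_problems = 300
n_repeat = 2  # as in the run above

se = binomial_standard_error(before_fix, n_problems * n_repeat)
print(f"gap = {gap:.2f} pts, standard error ~ {se:.2f} pts, ratio = {gap / se:.1f}")
```

Under this assumption the gap is several standard errors wide, which is consistent with a systematic cause (prompt or evaluator mismatch) rather than sampling noise.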

Problem

The prompt is not formatted the same way as in the LCB paper and the official git repo.
The current evaluator (originally from novaSky/Skythoughts) differs from the one in the official LCB GitHub repo.

Solution

Apply the prompt formatting and evaluator from the official LCB GitHub repo.

Results

Results after the fix (temperature=0.6, top_p=0.95, max_new_tokens=32768, n_repeat=2): 61.00

This is more in line with the official results. The remaining difference (61.00 vs. 63.5) is small and is probably due to either seeding or a different n_repeat value.

slimfrkha · 2025-08-25T13:34:08Z

Hi @neginraoof,
Could you take a quick look at this PR when you get a chance? Just need your feedback to decide whether to keep it open or close it. Thanks!

fix(livecodebench): replace nova sky skythoughts scorer with official…

4d81d91

… LCB scorer

slimfrkha wants to merge 1 commit into mlfoundations:main from slimfrkha:fix/lcb-scorer
