
Conversation

@ynankani ynankani commented Jan 12, 2026

What does this PR do?

Type of change: Documentation

Overview: Markdown update adding perplexity and KL-divergence benchmark info.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: NA
  • Did you write any new necessary tests?: NA
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: NA

Summary by CodeRabbit

  • Documentation
    • Expanded accuracy comparison section with three detailed benchmark metrics: MMLU scores, Perplexity (PPL), and KL-divergence.
    • Added comprehensive tables showing results across models and quantization configurations.
    • Included evaluation guides and references for each metric.


Commits:
  • … info (Signed-off-by: unknown <ynankani@nvidia.com>)
  • … info (Signed-off-by: unknown <ynankani@nvidia.com>)
@ynankani ynankani requested a review from a team as a code owner January 12, 2026 08:34
coderabbitai bot commented Jan 12, 2026

📝 Walkthrough

Documentation expanded with three new metric subsections (MMLU Scores, Perplexity, KL-divergence) under Accuracy Comparison, including explanatory text, comparison tables, and evaluation guide links.
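For reviewers less familiar with the metrics: perplexity (PPL) is the exponentiated mean negative log-likelihood of a corpus under the model, so a quantized model's PPL is compared against the FP16 baseline's, and a smaller gap means less accuracy loss. A minimal sketch of the usual computation, not taken from this PR (the model ID and text are illustrative placeholders, assuming a Hugging Face causal LM):

```python
# Minimal perplexity sketch; model ID and text are placeholders,
# not the models benchmarked in examples/windows/Benchmark.md.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

text = "Perplexity is the exponentiated mean negative log-likelihood."
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy
    # loss over next-token predictions.
    loss = model(input_ids, labels=input_ids).loss

ppl = torch.exp(loss).item()  # PPL = exp(mean NLL)
print(f"PPL: {ppl:.2f}")
```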

Changes

Documentation Updates: examples/windows/Benchmark.md
Added subsections 1.2.1 (MMLU Scores), 1.2.2 (Perplexity, with explanation and baseline/quantization details), and 1.2.3 (KL-divergence, with references and a results table). Includes external evaluation guide links.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks: 2 passed, 1 inconclusive

Inconclusive (1):
  • Title check ❓: The title uses a vague branch-like format ('Ynankani/update windows benchmark md') that obscures the specific change. Resolution: revise the title to describe the primary change, e.g. 'Add Perplexity and KL-divergence benchmark metrics to Windows documentation'.

Passed (2):
  • Description check ✅: Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring coverage ✅: No functions found in the changed files; docstring coverage check skipped.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
examples/windows/Benchmark.md (1)

58-58: Consider adding a note about different test configurations.

The Perplexity and KL-divergence sections use RTX 5090 with v0.39.0, while earlier sections use RTX 4090 with v0.19.0. While the configurations are clearly stated in each section, consider adding a brief note explaining that different benchmarks were run at different times with different hardware/software versions to help readers understand why the configurations differ.

Also applies to: 78-78

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5104513 and 42ea13a.

📒 Files selected for processing (1)
  • examples/windows/Benchmark.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (3)
examples/windows/Benchmark.md (3)

27-28: LGTM! Good organizational improvement.

Adding the explicit subsection numbering (1.2.1) improves document structure and makes it consistent with the new sections that follow.


45-68: All verification checks pass: the external documentation links are accessible and the internal evaluation guide path exists at examples/windows/accuracy_benchmark/perplexity_metrics/README.md.


70-92: All references verified—no issues found.

The KL-divergence section is correct: the internal evaluation guide path (./accuracy_benchmark/kl_divergence_metrics/README.md) exists, and both external documentation links are accessible.
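As background on the metric itself: KL-divergence here measures how far the quantized model's next-token distribution drifts from the baseline's at each position, averaged over tokens. A hedged sketch of that computation follows; the function name and tensor shapes are illustrative and may differ from the harness in the kl_divergence_metrics README:

```python
# Sketch of mean per-token KL(P_baseline || Q_quantized); shapes and the
# helper name are illustrative, not taken from the PR's evaluation scripts.
import torch
import torch.nn.functional as F

def mean_token_kl(baseline_logits: torch.Tensor,
                  quantized_logits: torch.Tensor) -> float:
    """Average KL divergence over all token positions.

    Both logit tensors are [batch, seq_len, vocab_size]."""
    log_p = F.log_softmax(baseline_logits, dim=-1)   # baseline distribution
    log_q = F.log_softmax(quantized_logits, dim=-1)  # quantized distribution
    kl = (log_p.exp() * (log_p - log_q)).sum(dim=-1)  # [batch, seq_len]
    return kl.mean().item()

# Dummy logits standing in for real model outputs: a small perturbation
# of the baseline mimics mild quantization error, giving a small KL.
p = torch.randn(1, 8, 32000)
q = p + 0.1 * torch.randn_like(p)
print(f"mean per-token KL: {mean_token_kl(p, q):.4f}")
```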


codecov bot commented Jan 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.62%. Comparing base (5104513) to head (42ea13a).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #762   +/-   ##
=======================================
  Coverage   74.62%   74.62%           
=======================================
  Files         192      192           
  Lines       18989    18989           
=======================================
  Hits        14171    14171           
  Misses       4818     4818           

☔ View full report in Codecov by Sentry.

