# Testing Reckless Changes

This guide explains the layers of testing used in Reckless development,
what the `Bench:` value means, and how to set up local and OpenBench
tests without relying on assumed chess-engine workflow knowledge.

## Overview

Reckless uses four different kinds of validation:

1. Local correctness checks
2. Local benchmarking
3. CI smoke games
4. OpenBench strength testing

These layers answer different questions:

- `cargo test`, `cargo fmt`, and `cargo clippy` answer "did I break the
  build, formatting, lints, or obvious correctness?"
- `bench` answers "did I change the engine's search behavior or
  throughput on the standard bench positions?"
- the CI `fastchess` smoke test answers "does the engine stay stable in
  a minimal game run?"
- OpenBench answers "does this change improve playing strength?"

Do not treat any single layer as a replacement for the others.

## Local Correctness Checks

Run the same checks that CI runs:

```bash
cargo test --verbose
cargo fmt -- --check
cargo clippy -- -D warnings
```

CI also runs `cargo run --verbose -- bench`, so it is worth running
`bench` locally before opening a PR.
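
You can run those checks and `bench` in one go with a small Bash function. This is a sketch, not a script from the repo. `run_checks` is a hypothetical name, and the function stops at the first failing command, the way a CI job would.

```bash
# Hypothetical helper (not part of the Reckless repo): run each argument
# as a shell command, in order, and stop at the first one that fails.
run_checks() {
  local cmd
  for cmd in "$@"; do
    echo ">>> $cmd"
    if ! bash -c "$cmd"; then
      echo "FAILED: $cmd" >&2
      return 1
    fi
  done
  echo "all checks passed"
}

# Usage:
#   run_checks "cargo test --verbose" "cargo fmt -- --check" \
#     "cargo clippy -- -D warnings" "cargo run --verbose -- bench"
```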

Relevant workflows:

- [Reckless CI](../.github/workflows/rust.yml)
- [Games](../.github/workflows/games.yml)
- [PGO](../.github/workflows/pgo.yml)

## What `bench` Does

The built-in `bench` command searches a fixed set of positions from
[`src/tools/bench.rs`](../src/tools/bench.rs)
and prints:

```text
Bench: <nodes> nodes <nps> nps
```

The value that matters for commit messages and OpenBench is the first
number, `<nodes>`. This is what people mean by "the `Bench:` value".

That number is the total number of nodes searched over the built-in
bench suite at the configured depth. The default bench runs one search
thread to a fixed depth, so the count should be repeatable for a given
build on a given machine. For that reason, contributors use it as a
compact fingerprint for the engine's current search behavior.

The second number, `<nps>` (nodes per second), measures throughput. It
is still useful, but it varies between runs and machines. It is not the
canonical `Bench:` value used in commit messages or OpenBench forms.
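
When scripting around `bench`, it is safer to extract the node count than to copy it by hand. This is a minimal sketch. It assumes the output line has exactly the `Bench: <nodes> nodes <nps> nps` shape shown above, and `bench_nodes` and `bench_nps` are hypothetical helper names.

```bash
# Hypothetical helpers: read `bench` output on stdin and print one field
# from the first line of the form "Bench: <nodes> nodes <nps> nps".
bench_nodes() {
  awk '$1 == "Bench:" { print $2; exit }'
}

bench_nps() {
  awk '$1 == "Bench:" { print $4; exit }'
}

# Usage (after `cargo build --release`):
#   ./target/release/reckless bench | bench_nodes
```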

### Default Bench Settings

From [`src/tools/bench.rs`](../src/tools/bench.rs):

- hash: `16`
- threads: `1`
- depth: `12`

So these commands all run bench with the same settings:

```bash
cargo run -- bench
./target/release/reckless bench
./target/release/reckless 'bench 16 1 12'
```

The last two assume you have already built the release binary with
`cargo build --release`. The first uses Cargo's default dev profile, so
its NPS is generally not comparable with a release build's.

The parameter meanings are:

- first argument: transposition-table hash size in MB
- second argument: number of search threads
- third argument: search depth

For example:

```bash
./target/release/reckless 'bench 16 1 12'
```

means "run bench with `Hash=16`, `Threads=1`, `Depth=12`".
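
If you often run `bench` with only one setting changed, a tiny wrapper can fill in the rest. This is a sketch. `bench_args` is hypothetical and uses the defaults listed above. It always passes all three arguments in the documented order, and an empty argument falls back to its default.

```bash
# Hypothetical helper: build the single-string bench command, filling in
# the documented defaults (Hash=16, Threads=1, Depth=12) for omitted or
# empty arguments.
bench_args() {
  local hash="${1:-16}" threads="${2:-1}" depth="${3:-12}"
  echo "bench $hash $threads $depth"
}

# Usage: default hash and threads, but a deeper search.
#   ./target/release/reckless "$(bench_args "" "" 14)"
```

Changing any of the three settings generally changes the node count. Use the defaults for the `Bench:` value you put in commit messages.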

## What to Put in the Commit Message

When maintainers ask for `Bench: ...` in the commit message, they mean
that the commit message should contain the node count from `bench`,
for example:

```text
Bench: 3140512
```

For Reckless, OpenBench reads this line to autofill the bench field
when you create a test.

The usual flow is:

1. make the change
2. run `bench`
3. include `Bench: <nodes>` in the commit message
4. push your branch
5. submit OpenBench tests
6. open the PR once the test passes, or update an already-open PR with
   the result
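
Steps 2 and 3 can be scripted together. This is a sketch that assumes the `Bench: <nodes> nodes <nps> nps` output format. `bench_commit_message` is a hypothetical helper. Its layout (summary line, blank line, `Bench: <nodes>`) is one common convention, so match whatever recent Reckless commits use.

```bash
# Hypothetical helper: read `bench` output on stdin and print a commit
# message made of a summary line, a blank line, and "Bench: <nodes>".
bench_commit_message() {
  local summary="$1" nodes
  nodes=$(awk '$1 == "Bench:" { print $2; exit }')
  if [ -z "$nodes" ]; then
    echo "no 'Bench:' line found in input" >&2
    return 1
  fi
  printf '%s\n\nBench: %s\n' "$summary" "$nodes"
}

# Usage:
#   ./target/release/reckless bench \
#     | bench_commit_message "Simplify quiet move scoring" > msg.txt \
#     && git commit -F msg.txt
```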

If your change is intended to be non-functional, the bench node count
should usually stay the same. If it changes, treat that as a sign that
the patch changed engine behavior, even if the edit looked like a
micro-optimization.
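
One way to apply this check is to save the `bench` output from `main` and from your branch, then compare only the node counts, since NPS varies from run to run. This is a sketch. `same_bench` is a hypothetical helper, and the file and branch names in the usage comment are placeholders.

```bash
# Hypothetical helper: compare the "Bench:" node counts in two saved bench
# outputs. Returns 0 if they match, 1 if they differ, 2 if either is missing.
same_bench() {
  local base dev
  base=$(awk '$1 == "Bench:" { print $2; exit }' "$1" 2>/dev/null)
  dev=$(awk '$1 == "Bench:" { print $2; exit }' "$2" 2>/dev/null)
  if [ -z "$base" ] || [ -z "$dev" ]; then
    echo "missing Bench: line in $1 or $2" >&2
    return 2
  fi
  if [ "$base" = "$dev" ]; then
    echo "unchanged: $base"
    return 0
  fi
  echo "changed: $base -> $dev (search behavior changed)"
  return 1
}

# Usage (branch and file names are placeholders):
#   git switch main && cargo build --release && ./target/release/reckless bench > base.txt
#   git switch my-branch && cargo build --release && ./target/release/reckless bench > dev.txt
#   same_bench base.txt dev.txt
```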

## Architecture Caveat

Bench values are not always identical across architectures. In
practice, Apple Silicon and x86 can disagree on the `Bench:` node
count, likely because of architecture-specific NNUE inference details.

If your local `Bench:` value does not match what other contributors
expect:

1. run `bench` on `main`
2. run `bench` on your branch
3. ask in Discord or check recent Reckless OpenBench tests before
   submitting

If your `main` value already differs from the one other contributors
report, the mismatch most likely comes from your platform rather than
your patch. Do not assume your local Apple Silicon number is the number
the Reckless OpenBench instance expects.
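
When you do ask, include your architecture along with both numbers. This is a sketch. `check_expected_bench` is a hypothetical helper that reads `bench` output on stdin and compares its node count with the value you were told to expect. It names your machine with `uname -m`, which prints, for example, `arm64` on Apple Silicon macOS and `x86_64` on most x86 machines.

```bash
# Hypothetical helper: compare the local "Bench:" node count (stdin) with
# an expected value and print a one-line report that includes `uname -m`.
check_expected_bench() {
  local expected="$1" actual arch
  actual=$(awk '$1 == "Bench:" { print $2; exit }')
  arch=$(uname -m)
  if [ "$actual" = "$expected" ]; then
    echo "match on $arch: Bench: $actual"
    return 0
  fi
  echo "mismatch on $arch: got Bench: ${actual:-none}, expected Bench: $expected" >&2
  return 1
}

# Usage:
#   ./target/release/reckless bench | check_expected_bench 3140512
```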

## CI Smoke Games

The repo's `Games` workflow uses `fastchess` as a minimal stability
smoke test, not as final Elo proof. It checks for:

- `illegal move`
- `disconnect`
- `stall`

CI pins a specific `fastchess` revision in the
[`Games` workflow](../.github/workflows/games.yml) to keep smoke-test
infrastructure reproducible.

Contributors do not generally rely on manual local `fastchess` runs as a
normal part of the Reckless workflow. In practice, the common path is:

1. local correctness checks
2. `bench`
3. OpenBench

If you want a personal sanity check, a local `fastchess` run is fine,
but treat it as optional and low-signal compared with OpenBench.
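
If you do a local run, you can scan its output for the same markers CI looks for. This sketch assumes you saved the `fastchess` output to a log file and that a case-insensitive substring search is good enough. The real check in the `Games` workflow may match differently. `smoke_check_log` is a hypothetical name.

```bash
# Hypothetical helper: scan a saved fastchess log for the failure markers
# listed above. Returns 0 if none is found, 1 if any is, 2 if unreadable.
# This is plain substring matching, so "stall" also matches "install".
smoke_check_log() {
  local log="$1" marker status=0
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  for marker in "illegal move" "disconnect" "stall"; do
    # -q: exit status only, -i: ignore case, -F: fixed string
    if grep -qiF -- "$marker" "$log"; then
      echo "found '$marker' in $log" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   <your fastchess command> 2>&1 | tee fastchess.log
#   smoke_check_log fastchess.log
```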

## PGO Testing

PGO stands for profile-guided optimization. Reckless uses it in CI and
in release workflows:

```bash
cargo pgo instrument
cargo pgo run -- bench
cargo pgo optimize
```

That process:

1. builds an instrumented binary
2. runs `bench` to collect profile data
3. rebuilds using the recorded profile

Running these commands locally requires the `cargo-pgo` subcommand
(`cargo install cargo-pgo`) and the `llvm-tools-preview` rustup
component.

Small hot-path changes can disappear or reverse under PGO, so do not
rely only on plain release builds for performance claims.
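
For local performance evidence, a common approach is to run two binaries several times each and compare median NPS. Alternating between the binaries means background load and thermal drift affect both sides. This is a sketch. `compare_nps` and `median_of` are hypothetical helpers, and they assume each binary accepts `bench` as a command-line argument as shown earlier. They measure throughput only, not Elo.

```bash
# Hypothetical helper: alternate `bench` runs of two engine binaries and
# print the median NPS for each. Usage: compare_nps BASE_BIN DEV_BIN [RUNS]
compare_nps() {
  local base="$1" dev="$2" runs="${3:-5}" i base_file dev_file
  base_file=$(mktemp)
  dev_file=$(mktemp)
  for ((i = 0; i < runs; i++)); do
    # Alternate the binaries so load or thermal drift hits both sides.
    "$base" bench | awk '$1 == "Bench:" { print $4; exit }' >> "$base_file"
    "$dev" bench | awk '$1 == "Bench:" { print $4; exit }' >> "$dev_file"
  done
  echo "base median nps: $(median_of "$base_file")"
  echo "dev median nps: $(median_of "$dev_file")"
  rm -f "$base_file" "$dev_file"
}

# Print the median of the numbers in a file (one per line).
median_of() {
  sort -n "$1" | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) { print "n/a"; exit }
      if (NR % 2) { print v[(NR + 1) / 2] }
      else { printf "%.0f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2 }
    }'
}
```

For example, run `compare_nps ./reckless-main ./target/release/reckless 7`, where `./reckless-main` is a placeholder for a saved copy of the binary built from `main`. Build both sides the same way (both plain release or both PGO) so the comparison is fair.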

If you want the exact repo-style optimized build:

```bash
make pgo
```

## Project Style Note

Reckless is a performance-focused chess engine. It does not strictly
follow conservative Rust style guidelines in the way a general-purpose
library might.

In practice, that means:

- low-level and performance-oriented code is normal here
- `unsafe` or guideline-breaking patterns are not automatically a
  problem
- the important question is whether the code is correct, measured, and
  justified for the engine

When reviewing or proposing changes, optimize for correctness,
performance evidence, and consistency with the existing codebase rather
than generic Rust style advice alone.

## OpenBench Basics

OpenBench is the main strength-testing framework for Reckless. The
upstream project describes it as a distributed framework for running
fixed-game and SPRT (sequential probability ratio test) engine tests:

- <https://github.com/AndyGrant/OpenBench>

Reckless uses its own OpenBench instance:

- <https://recklesschess.space/>

### Important OpenBench Fields

For a normal branch-vs-main test, the key fields are:

- `Dev Source`: the repository that contains your test branch
- `Dev Sha`: the commit you want to test
- `Dev Branch`: your test branch
- `Dev Bench`: the `Bench:` node count for your dev build
- `Base Source`: the repository that contains the base branch
- `Base Sha`: the commit you want as the baseline
- `Base Branch`: usually `main`
- `Base Bench`: the `Bench:` node count for the base build
- `Dev Options` and `Base Options`: UCI options passed to the engine
  during games
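
You can read the branch and SHA fields from your local clone. This is a sketch that assumes Git. `openbench_refs` is a hypothetical helper that prints local commits. Those commits only match what OpenBench fetches if you have pushed both branches to the fork you name as the source.

```bash
# Hypothetical helper: print branch and SHA values for the OpenBench form
# from the local repository. Usage: openbench_refs [DEV_BRANCH] [BASE_BRANCH]
openbench_refs() {
  local dev="${1:-$(git symbolic-ref --short HEAD)}" base="${2:-main}"
  printf 'Dev Branch   %s\n' "$dev"
  printf 'Dev Sha      %s\n' "$(git rev-parse --verify "$dev")"
  printf 'Base Branch  %s\n' "$base"
  printf 'Base Sha     %s\n' "$(git rev-parse --verify "$base")"
}
```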

### Which Repository to Use

If your development branch only exists in your fork, use your fork as
the source repository for both sides of the test.

Example:

- `Dev Source`: `https://github.com/<you>/Reckless`
- `Dev Branch`: `your-branch`
- `Base Source`: `https://github.com/<you>/Reckless`
- `Base Branch`: `main`

This works as long as your fork's `main` matches upstream `main`.

Using the upstream repo for the base side and your fork for the dev side
can be confusing if the instance expects both refs to come from the same
source repository. If in doubt, copy a recent working Reckless test and
only change the branch, SHA, and bench fields.
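
You can check that your fork's `main` matches upstream without cloning anything, because `git ls-remote` reads a remote's refs directly. This is a sketch. `same_main` is a hypothetical helper, and `<upstream-url>` in the usage comment is a placeholder for the upstream Reckless repository.

```bash
# Hypothetical helper: check that refs/heads/main is the same commit in
# two repositories (upstream first, then your fork) via `git ls-remote`.
# Returns 0 if they match, 1 if they differ, 2 if either could not be read.
same_main() {
  local up fork
  up=$(git ls-remote "$1" refs/heads/main | cut -f1)
  fork=$(git ls-remote "$2" refs/heads/main | cut -f1)
  if [ -z "$up" ] || [ -z "$fork" ]; then
    echo "could not read main from $1 or $2" >&2
    return 2
  fi
  if [ "$up" = "$fork" ]; then
    echo "fork main matches upstream main: $up"
    return 0
  fi
  echo "fork main $fork differs from upstream main $up" >&2
  return 1
}

# Usage:
#   same_main <upstream-url> https://github.com/<you>/Reckless
```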

### What the Bench Fields Mean in OpenBench

The `Dev Bench` and `Base Bench` fields should contain the bench node
counts, not the NPS.

Example, for the output `Bench: 3140512 nodes 1133878 nps`:

- correct: `3140512`
- wrong: `1133878`

### What the Engine Options Mean

OpenBench options such as:

```text
Threads=1 Hash=16 Minimal=true MoveOverhead=0
```

map to normal UCI engine options:

- `Threads=1`: use one search thread
- `Hash=16`: use a 16 MB transposition table
- `Minimal=true`: reduce UCI output noise
- `MoveOverhead=0`: reserve zero milliseconds per move for
  GUI/network overhead

This `Hash=16` is the same concept as the first argument to the local
`bench` command.
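
To reproduce an OpenBench option string by hand, you can turn it into standard UCI `setoption` commands. This is a sketch. `to_uci_setoptions` is a hypothetical helper that assumes space-separated `Name=Value` pairs with no spaces inside names or values, as in the example above.

```bash
# Hypothetical helper: turn "Name=Value Name=Value ..." into UCI
# "setoption name <Name> value <Value>" lines, one per option.
to_uci_setoptions() {
  local opt
  local -a opts
  read -ra opts <<< "$1"
  for opt in "${opts[@]}"; do
    printf 'setoption name %s value %s\n' "${opt%%=*}" "${opt#*=}"
  done
}

# Usage: configure the engine, then check that it is ready.
#   { echo uci
#     to_uci_setoptions "Threads=1 Hash=16 Minimal=true MoveOverhead=0"
#     echo isready; echo quit; } | ./target/release/reckless
```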

### A Good Reckless Example

This is a representative Reckless OpenBench test layout:

- dev and base both use your fork as `Source`
- dev branch points at your test branch (or bookmark, if you use
  Jujutsu)
- base branch points at `main`
- both sides use the same network
- both sides use `Threads=1 Hash=16 Minimal=true MoveOverhead=0`

At the time this guide was written, a working example looked like:

```text
Dev Source        https://github.com/joshka/Reckless
Dev Branch        joshka/optimize-quiet-move-scoring
Dev Bench         2786596
Base Source       https://github.com/joshka/Reckless
Base Branch       main
Base Bench        2786596
Dev/Base Options  Threads=1 Hash=16 Minimal=true MoveOverhead=0
```

Treat that as a template for field placement, not as a permanent
universal config. Copy a recent passing Reckless test when possible.

### Approval and Pending Tests

Some OpenBench instances auto-approve tests. Reckless does not appear to
do that for every registered user.

If a test lands in a pending state, that usually means the instance
requires an approver to accept it before workers will run it.

## Recommended Reckless Workflow

For a normal search or evaluation patch:

1. make the change
2. run `cargo test --verbose`
3. run `cargo fmt -- --check`
4. run `cargo clippy -- -D warnings`
5. run `bench`
6. include `Bench: <nodes>` in the commit message
7. push the branch to your fork
8. create an OpenBench test using your fork for both `Dev Source` and
   `Base Source`
9. open the PR after the test passes, or update an existing PR with the
   result

This ordering is intentional. In Reckless development, contributors
often run OpenBench first and only open the PR after the test looks
good.

If the change is specifically about performance:

1. compare release builds locally
2. compare PGO builds locally
3. only then rely on OpenBench to answer the Elo question

## When to Ask for Help

Ask in Discord before spending a lot of worker time if:

- your local `Bench:` value differs from what maintainers expect
- OpenBench cannot find your branch or SHA
- you are not sure whether `Base Source` should point at upstream or
  your fork
- you see a pending test and do not know whether it needs approval
- your patch changes `Bench:` when you thought it was non-functional