Commit 485ca28

Document Reckless testing workflow
1 parent 01b3fe0 commit 485ca28

3 files changed: +365 -0 lines changed

README.md

Lines changed: 4 additions & 0 deletions
@@ -43,6 +43,10 @@ Reckless is an open source competitive chess engine, consistently performing amo

## Getting started

+Additional contributor docs:
+
+- [Testing Reckless Changes](docs/testing.md)
+
### Precompiled binaries

You can download precompiled builds from the [GitHub Releases page](https://github.com/codedeliveryservice/Reckless/releases).

docs/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Docs

- [Testing Reckless Changes](testing.md)

docs/testing.md

Lines changed: 358 additions & 0 deletions
@@ -0,0 +1,358 @@
# Testing Reckless Changes

This guide explains the layers of testing used in Reckless development,
what the `Bench:` value means, and how to set up local and OpenBench
tests without relying on assumed chess-engine workflow knowledge.

## Overview

Reckless uses four different kinds of validation:

1. Local correctness checks
2. Local benchmarking
3. CI smoke games
4. OpenBench strength testing

These layers answer different questions:

- `cargo test`, `cargo fmt`, and `cargo clippy` answer "did I break the
  build or obvious correctness?"
- `bench` answers "did I change the engine's search behavior or
  throughput on the standard bench positions?"
- the CI `fastchess` smoke test answers "does the engine stay stable in
  a minimal game run?"
- OpenBench answers "does this change improve strength?"

Do not treat a single layer as a replacement for the others.

## Local Correctness Checks

Run the same checks that CI runs:

```bash
cargo test --verbose
cargo fmt -- --check
cargo clippy -- -D warnings
```

CI also runs `cargo run --verbose -- bench`, so it is worth running
`bench` locally before opening a PR.
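
The checks above can be chained so that the first failure stops the run.
The following is a minimal sketch, not part of the repository: the
function names `run_steps` and `run_local_checks` are hypothetical, and
the command list simply mirrors the commands quoted in this section.

```bash
#!/usr/bin/env bash
# Run each command string in order and stop at the first failure.
run_steps() {
    local step
    for step in "$@"; do
        echo "==> $step"
        if ! bash -c "$step"; then
            echo "FAILED: $step" >&2
            return 1
        fi
    done
}

# Hypothetical wrapper: the commands this guide lists for parity with CI.
run_local_checks() {
    run_steps \
        "cargo test --verbose" \
        "cargo fmt -- --check" \
        "cargo clippy -- -D warnings" \
        "cargo run --verbose -- bench"
}
```

Sourcing the file and calling `run_local_checks` from the repository
root runs all four steps and returns a non-zero status if any of them
fails.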

Relevant workflows:

- [Reckless CI](../.github/workflows/rust.yml)
- [Games](../.github/workflows/games.yml)
- [PGO](../.github/workflows/pgo.yml)

## What `bench` Does

The built-in `bench` command searches a fixed set of positions from
[`src/tools/bench.rs`](../src/tools/bench.rs)
and prints:

```text
Bench: <nodes> nodes <nps> nps
```

The important value for commit messages and OpenBench is the first
number:

- `Bench: <nodes>`

That number is the total number of nodes searched over the built-in
bench suite at the configured depth. In practice, contributors use it
as a compact fingerprint for the engine's current search behavior.

The second number:

- `<nps>`

is the search speed in nodes per second. It is still useful, but it is
not the canonical `Bench:` value used in commit messages or OpenBench
forms.
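
To avoid copying the wrong number by hand, the two values can be pulled
out of the output with `awk`. This is a sketch that assumes the
`Bench: <nodes> nodes <nps> nps` line format shown above; the function
names are hypothetical.

```bash
#!/usr/bin/env bash
# Print the node count from bench output read on stdin.
# Assumes a line of the form "Bench: <nodes> nodes <nps> nps".
extract_bench_nodes() {
    awk '$1 == "Bench:" && !seen { print $2; seen = 1 }'
}

# Print the NPS from the same line (fourth whitespace-separated field).
extract_bench_nps() {
    awk '$1 == "Bench:" && !seen { print $4; seen = 1 }'
}
```

For example, `./target/release/reckless bench | extract_bench_nodes`
would print only the node count, which is the value that belongs in the
commit message.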

### Default Bench Settings

From [`src/tools/bench.rs`](../src/tools/bench.rs):

- hash: `16`
- threads: `1`
- depth: `12`

So these commands are equivalent:

```bash
cargo run -- bench
./target/release/reckless bench
./target/release/reckless 'bench 16 1 12'
```

The parameter meanings are:

- first argument: transposition-table hash size in MB
- second argument: number of search threads
- third argument: search depth

For example:

```bash
./target/release/reckless 'bench 16 1 12'
```

means "run bench with `Hash=16`, `Threads=1`, `Depth=12`".
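
The positional form can be wrapped in a small helper that fills in the
defaults listed above when an argument is omitted. This is only an
illustration; `bench_args` is a hypothetical name, not an engine or
repository feature.

```bash
#!/usr/bin/env bash
# Build the bench argument string: hash (MB), threads, depth.
# Missing arguments fall back to the defaults from this guide (16, 1, 12).
bench_args() {
    local hash="${1:-16}" threads="${2:-1}" depth="${3:-12}"
    printf 'bench %s %s %s\n' "$hash" "$threads" "$depth"
}
```

With this helper, `./target/release/reckless "$(bench_args 64)"` would
run bench with a 64 MB hash and the default thread count and depth. Note
that non-default settings produce a node count that is not comparable
with the canonical `Bench:` value.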

## What to Put in the Commit Message

When maintainers ask for `Bench: ...` in the commit message, they mean
the full commit message or description should contain the node count
from `bench`, for example:

```text
Bench: 3140512
```

For Reckless, OpenBench uses this to autofill the bench field for a
test.

The usual flow is:

1. make the change
2. run `bench`
3. add `Bench: <nodes>` to the commit message
4. push your branch
5. submit OpenBench tests
6. open the PR once the test passes, or update an already-open PR with
   the result
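
Step 3 of the flow above can be scripted. The sketch below assumes a
message layout of subject line, blank line, then the `Bench:` line; that
layout and the name `bench_commit_message` are illustrative choices, not
a documented project requirement.

```bash
#!/usr/bin/env bash
# Print a commit message: subject, blank line, then the Bench line.
bench_commit_message() {
    local subject="$1" nodes="$2"
    # Reject anything that is not a plain non-negative integer.
    case "$nodes" in
        ''|*[!0-9]*)
            echo "bench value must be an integer: '$nodes'" >&2
            return 1
            ;;
    esac
    printf '%s\n\nBench: %s\n' "$subject" "$nodes"
}
```

The output can be piped straight into git, for example
`bench_commit_message "Tune quiet move scoring" 3140512 | git commit -F -`.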

If your change is intended to be non-functional, the bench node count
should usually stay the same. If it changes, treat that as a sign that
the patch changed engine behavior, even if the edit looked like a
micro-optimization.
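
That rule is easy to check mechanically once you have the node counts
for the base commit and your branch. A minimal sketch, with a
hypothetical function name:

```bash
#!/usr/bin/env bash
# Compare two bench node counts.
# Returns 0 when they match and 1 when they differ.
check_non_functional() {
    local base_nodes="$1" dev_nodes="$2"
    if [ "$base_nodes" = "$dev_nodes" ]; then
        echo "bench unchanged ($dev_nodes): consistent with a non-functional change"
    else
        echo "bench changed ($base_nodes -> $dev_nodes): treat the patch as functional"
        return 1
    fi
}
```

Both counts should come from the same machine and the same bench
settings, otherwise the comparison says nothing about the patch.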

## Architecture Caveat

Bench values are not always identical across architectures. In
practice, Apple Silicon and x86 can disagree on the `Bench:` node
count, likely because of architecture-specific NNUE inference details.

If your local `Bench:` value does not match what other contributors
expect:

1. run `bench` on `main`
2. run `bench` on your branch
3. ask in Discord or check recent Reckless OpenBench tests before
   submitting

Do not assume your local Apple Silicon number is the number the
Reckless OpenBench instance expects.
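
A script that records bench values can print a reminder when it runs on
an ARM machine. The sketch below is an assumption-laden illustration: it
keys off the output of `uname -m`, treating `arm64` (macOS) and
`aarch64` (Linux) as ARM, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print a reminder about the architecture caveat.
# The argument defaults to the output of `uname -m`.
bench_arch_note() {
    local arch="${1:-$(uname -m)}"
    case "$arch" in
        arm64|aarch64)
            echo "$arch: local Bench may differ from x86 results; compare against main first"
            ;;
        *)
            echo "$arch: no architecture caveat listed in this guide"
            ;;
    esac
}
```

The second branch only means this guide names no caveat for that
architecture, not that the value is guaranteed to match.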

## CI Smoke Games

The repo's `Games` workflow uses `fastchess` as a minimal stability
smoke test, not as final Elo proof. It checks for:

- `illegal move`
- `disconnect`
- `stall`
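
A check of this kind amounts to scanning the match log for those
strings. The sketch below shows the idea only; it is not copied from the
`Games` workflow, and the function name and the case-insensitive
substring matching are assumptions.

```bash
#!/usr/bin/env bash
# Return 0 when stdin contains none of the failure markers, 1 otherwise.
# Matching lines are echoed to stderr so the failure is visible.
# Substring matching is deliberate: "disconnects" and "stalls" also count.
smoke_log_clean() {
    if grep -E -i 'illegal move|disconnect|stall' >&2; then
        return 1
    fi
    return 0
}
```

Usage would look like `smoke_log_clean < games.log`, where `games.log`
is a hypothetical file holding the captured `fastchess` output.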

CI pins a specific `fastchess` revision in the
[`Games` workflow](../.github/workflows/games.yml) to keep smoke-test
infrastructure reproducible.

Contributors do not generally rely on manual local `fastchess` runs as a
normal part of the Reckless workflow. In practice, the common path is:

1. local correctness checks
2. `bench`
3. OpenBench

If you want a personal sanity check, a local `fastchess` run is fine,
but treat it as optional and low-signal compared with OpenBench.

## PGO Testing

PGO stands for profile-guided optimization. Reckless uses it in CI and
in release workflows:

```bash
cargo pgo instrument
cargo pgo run -- bench
cargo pgo optimize
```

That process:

1. builds an instrumented binary
2. runs `bench` to collect profile data
3. rebuilds using the recorded profile

Speedups from small hot-path changes can disappear or even reverse
under PGO, so do not rely only on plain release builds for performance
claims.
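
When comparing two builds, it helps to express the NPS difference as a
percentage. The helper below is a minimal sketch with a hypothetical
name; it takes two NPS values read from the `Bench:` line and assumes
the base value is non-zero.

```bash
#!/usr/bin/env bash
# Print the relative NPS change of dev versus base, in percent.
# Returns non-zero if the base value is zero.
nps_change_percent() {
    local base_nps="$1" dev_nps="$2"
    awk -v base="$base_nps" -v dev="$dev_nps" \
        'BEGIN { if (base == 0) exit 1; printf "%.2f\n", (dev - base) / base * 100 }'
}
```

For example, `nps_change_percent 1000000 1050000` prints `5.00`. A
single bench run is noisy, so repeat the measurement several times for
each build before drawing a conclusion.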

If you want the exact repo-style optimized build:

```bash
make pgo
```

## Project Style Note

Reckless is a performance-focused chess engine. It does not strictly
follow conservative Rust style guidelines in the way a general-purpose
library might.

In practice, that means:

- low-level and performance-oriented code is normal here
- `unsafe` or guideline-breaking patterns are not automatically a
  problem
- the important question is whether the code is correct, measured, and
  justified for the engine

When reviewing or proposing changes, optimize for correctness,
performance evidence, and consistency with the existing codebase rather
than generic Rust style advice alone.

## OpenBench Basics

OpenBench is the main strength-testing framework for Reckless. The
upstream project describes it as a distributed framework for running
fixed-game and SPRT engine tests:

- <https://github.com/AndyGrant/OpenBench>

Reckless uses its own OpenBench instance:

- <https://recklesschess.space/>

### Important OpenBench Fields

For a normal branch-vs-main test, the key fields are:

- `Dev Source`: the repository that contains your test branch
- `Dev Sha`: the commit you want to test
- `Dev Branch`: your test branch
- `Dev Bench`: the `Bench:` node count for your dev build
- `Base Source`: the repository that contains the base branch
- `Base Sha`: the commit you want as the baseline
- `Base Branch`: usually `main`
- `Base Bench`: the `Bench:` node count for the base build
- `Dev Options` and `Base Options`: UCI options passed to the engine
  during games

### Which Repository to Use

If your development branch only exists in your fork, use your fork as
the source repository for both sides of the test.

Example:

- `Dev Source`: `https://github.com/<you>/Reckless`
- `Dev Branch`: `your-branch`
- `Base Source`: `https://github.com/<you>/Reckless`
- `Base Branch`: `main`

This works as long as your fork's `main` matches upstream `main`.

Using the upstream repo for the base side and your fork for the dev side
can be confusing if the instance expects both refs to come from the same
source repository. If in doubt, copy a recent working Reckless test and
only change the branch, SHA, and bench fields.
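
The fork-for-both-sides layout can be written out before you open the
form, which makes it harder to mix up the two sides. This is a sketch
for note-taking only: `openbench_fields` is a hypothetical helper, it
does not talk to OpenBench, and it assumes the fork is named `Reckless`
with `main` as the base branch.

```bash
#!/usr/bin/env bash
# Print the source, branch, and bench fields for a fork-only test.
# Arguments: GitHub user, dev branch, dev bench, base bench.
openbench_fields() {
    local user="$1" dev_branch="$2" dev_bench="$3" base_bench="$4"
    local repo="https://github.com/${user}/Reckless"
    printf 'Dev Source   %s\n' "$repo"
    printf 'Dev Branch   %s\n' "$dev_branch"
    printf 'Dev Bench    %s\n' "$dev_bench"
    printf 'Base Source  %s\n' "$repo"
    printf 'Base Branch  %s\n' "main"
    printf 'Base Bench   %s\n' "$base_bench"
}
```

Calling `openbench_fields alice my-branch 3140512 3140512` prints six
lines that can be copied into the matching form fields; `alice` and
`my-branch` are placeholders.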

### What the Bench Fields Mean in OpenBench

The `Dev Bench` and `Base Bench` fields should contain the bench node
counts, not the NPS.

Example:

- correct: `3140512` (a node count)
- wrong: `1133878` (an NPS value)

### What the Engine Options Mean

OpenBench options such as:

```text
Threads=1 Hash=16 Minimal=true MoveOverhead=0
```

map to normal UCI engine options:

- `Threads=1`: use one search thread
- `Hash=16`: use a 16 MB transposition table
- `Minimal=true`: reduce UCI output noise
- `MoveOverhead=0`: reserve zero milliseconds per move for
  GUI/network overhead

This `Hash=16` is the same concept as the first argument to the local
`bench` command.
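
At the protocol level, a UCI option is set with a
`setoption name <id> value <x>` command. The sketch below converts an
options string into those commands to show what each pair means. How
OpenBench and its match runner actually deliver the options to the
engine is not covered by this guide, so treat the helper (and its name)
as illustrative only.

```bash
#!/usr/bin/env bash
# Turn space-separated "Name=Value" pairs into UCI setoption commands.
options_to_uci() {
    local pair
    # Intentional word splitting: the pairs are separated by spaces.
    for pair in $1; do
        printf 'setoption name %s value %s\n' "${pair%%=*}" "${pair#*=}"
    done
}
```

For example, `options_to_uci 'Threads=1 Hash=16'` prints
`setoption name Threads value 1` followed by
`setoption name Hash value 16`.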

### A Good Reckless Example

This is a representative Reckless OpenBench test layout:

- dev and base both use your fork as `Source`
- dev branch points at your testing bookmark or branch
- base branch points at `main`
- both sides use the same network
- both sides use `Threads=1 Hash=16 Minimal=true MoveOverhead=0`

At the time this guide was written, a working example looked like:

```text
Dev Source        https://github.com/joshka/Reckless
Dev Branch        joshka/optimize-quiet-move-scoring
Dev Bench         2786596
Base Source       https://github.com/joshka/Reckless
Base Branch       main
Base Bench        2786596
Dev/Base Options  Threads=1 Hash=16 Minimal=true MoveOverhead=0
```

Treat that as a template for field placement, not as a permanent
universal config. Copy a recent passing Reckless test when possible.

### Approval and Pending Tests

Some OpenBench instances auto-approve tests. Reckless does not appear to
do that for every registered user.

If a test lands in a pending state, that usually means the instance
requires an approver to accept it before workers will run it.

## Recommended Reckless Workflow

For a normal search or evaluation patch:

1. make the change
2. run `cargo test --verbose`
3. run `cargo fmt -- --check`
4. run `cargo clippy -- -D warnings`
5. run `bench`
6. add `Bench: <nodes>` to the commit message
7. push the branch to your fork
8. create an OpenBench test using your fork for both `Dev Source` and
   `Base Source`
9. open the PR after the test passes, or update an existing PR with the
   result

This ordering is intentional. In Reckless development, contributors
often run OpenBench first and only open the PR after the test looks
good.

If the change is specifically about performance:

1. compare release builds locally
2. compare PGO builds locally
3. only then rely on OpenBench to answer the Elo question

## When to Ask for Help

Ask in Discord before spending a lot of worker time if:

- your local `Bench:` value differs from what maintainers expect
- OpenBench cannot find your branch or SHA
- you are not sure whether `Base Source` should point at upstream or
  your fork
- you see a pending test and do not know whether it needs approval
- your patch changes `Bench:` when you thought it was non-functional
