Refine mean/median sequence length output #205

Open
ewels wants to merge 1 commit into s-andrews:master from ewels:mean-median-length-refinements

ewels commented May 29, 2026

Contributor

Builds on #203 / 54b336e (mean & median sequence length). Two small refinements plus the test snapshots that commit didn't include.

Changes

1. Mean Length → 1 decimal place (BasicStats.java)

Currently the mean is integer division (totalBases/actualCount), so a 150.7 bp mean prints as 150. This reports it to 1 d.p. instead:

if (actualCount == 0) return "0.0";
return String.format(Locale.ROOT, "%.1f", (double)totalBases/actualCount);

(Locale.ROOT keeps the decimal separator as . regardless of system locale; the zero guard returns 0.0 for empty input, where an integer division would throw on divide-by-zero and the floating-point division would print NaN.)
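
To make the difference concrete, here is a minimal standalone sketch that wraps the two lines above in a method. The class name MeanLengthSketch, the method name formatMean, and the long parameter types are assumptions for illustration only; the real field types and method layout in BasicStats.java may differ.

```java
import java.util.Locale;

// Illustrative sketch only: class and method names are hypothetical,
// not taken from BasicStats.java.
final class MeanLengthSketch {

    // Formats the mean read length to one decimal place.
    // Parameter types are assumed to be long for this sketch.
    static String formatMean(long totalBases, long actualCount) {
        // Guard: no reads means there is nothing to average.
        if (actualCount == 0) return "0.0";
        // The cast forces floating-point division; Locale.ROOT fixes "." as the separator.
        return String.format(Locale.ROOT, "%.1f", (double) totalBases / actualCount);
    }

    public static void main(String[] args) {
        System.out.println(1507 / 10);                                  // integer division truncates: 150
        System.out.println(formatMean(1507, 10));                       // 150.7
        System.out.println(formatMean(0, 0));                           // 0.0
        System.out.println(String.format(Locale.GERMANY, "%.1f", 150.7)); // 150,7 (why Locale.ROOT matters)
    }
}
```

The last line of main shows what a locale-dependent format can produce: a comma separator that would make the data file harder to parse downstream.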

2. Median tie-breaking (SequenceLengthDistribution.medianLength())

For an even number of reads, the current code returns the upper of the two central values. This change uses the standard definition (the mean of the two central values), rounded up to a whole number:

  • Odd N: the single central value.
  • Even N: (lo + hi + 1) / 2.

Also added a total == 0 guard (returns 0) so an empty distribution doesn't throw. Identical to current behaviour for uniform-length (short-read) data; differs only on even read counts with two distinct central lengths.
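
The tie-breaking rule can be sketched as follows. This is not the code in SequenceLengthDistribution.medianLength(); it assumes a hypothetical histogram layout where counts[len] holds the number of reads of length len, and the class name MedianLengthSketch is made up for the example.

```java
// Illustrative sketch only: the histogram layout (counts[len] = number of
// reads with length len) and all names here are assumptions.
final class MedianLengthSketch {

    static int medianLength(long[] counts) {
        long total = 0;
        for (long c : counts) total += c;
        if (total == 0) return 0; // empty distribution guard

        // 1-based ranks of the two central reads; they coincide when total is odd.
        long loRank = (total + 1) / 2;
        long hiRank = total / 2 + 1;

        int lo = -1;
        int hi = -1;
        long seen = 0;
        for (int len = 0; len < counts.length; len++) {
            seen += counts[len];
            if (lo < 0 && seen >= loRank) lo = len;
            if (seen >= hiRank) {
                hi = len;
                break;
            }
        }
        // Odd N: lo == hi, so integer division returns that value unchanged.
        // Even N: mean of the two central values, rounded up.
        return (lo + hi + 1) / 2;
    }

    public static void main(String[] args) {
        long[] counts = new long[152];
        counts[100] = 2;
        counts[151] = 2;
        // Central values are 100 and 151; their mean 125.5 rounds up to 126.
        System.out.println(medianLength(counts)); // 126
    }
}
```

With uniform-length data both central values are the same length, so the result matches the upper-middle rule, which is consistent with the note above that short-read output is unchanged.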

3. Test snapshots (FileContentsTest_{minimal,complex}_fastqc_*)

Updated the approved data + HTML files for the new Mean Length / Median Length rows. The HTML snapshots are edited in place so the embedded base64 chart images are left untouched (rendering differs per OS).

Notes

  • Labels (Mean Length / Median Length) and filtered-sequence handling are left exactly as in 54b336e.
  • Heads-up: the approved fastqc_data.txt files still carry ##FastQC 0.12.2.devel while the build now emits 0.13.0.devel — a pre-existing version-string mismatch in the snapshots, independent of this change. Left as-is to keep this PR scoped; happy to bump it if you'd prefer.

🤖 Generated with Claude Code

Builds on the mean/median sequence length feature (s-andrews#203):

- Report Mean Length to 1 decimal place rather than truncating to an
  integer, so e.g. a 150.7 bp mean is no longer shown as 150.
- Calculate the median with standard tie-breaking: for an even number of
  reads, average the two central values and round up, rather than taking
  only the upper-middle value.
- Update the FileContentsTest approved snapshots (data + HTML) for the new
  rows. HTML snapshots are edited in place so the embedded chart images
  are left untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread uk/ac/babraham/FastQC/Modules/BasicStats.java Outdated
ewels force-pushed the mean-median-length-refinements branch from 4b9a339 to c38a976 on May 30, 2026 00:29