tests: filter out outliers in performance tests#1788

Open
peaBerberian wants to merge 1 commit into dev from perf-tests-improv

Conversation

@peaBerberian
Collaborator

For multiple years now, we have run performance tests on each PR to detect performance regressions in some key scenarios (load, seek, track switching).

They should be able to catch genuinely large regressions, but it bothers me that they sometimes seem to detect, with high confidence, a very minor regression in the "cold loading multithread" scenario.

That scenario could be particularly sensitive to test ordering and to optimizations made by the browser's cache.

So here I'm experimenting with some strategies to limit possible bias in our performance tests:

  • I run more test iterations. We previously hit what seems to be a CI limitation when launching the browser 128 times; I want to check whether that is still the case, as it's limiting.

  • I remove the 10% most extreme outliers from all samples, for both the previous state and the current state. That may be enough to remove the difference in our cold-loading test.

  • I added a function that tries to detect ordering bias.
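The outlier-trimming step described above could be sketched like this (a minimal illustration, not this PR's actual implementation; the `trimOutliers` name and the symmetric drop of `ratio / 2` from each end are assumptions):

```typescript
/**
 * Drop the most extreme measurements before comparing runs.
 * Hypothetical sketch: removes the lowest and the highest `ratio / 2`
 * fraction of a sorted copy, i.e. 10% of all samples by default.
 */
function trimOutliers(samples: number[], ratio: number = 0.1): number[] {
  const sorted = [...samples].sort((a, b) => a - b);
  const toDrop = Math.floor((sorted.length * ratio) / 2);
  return sorted.slice(toDrop, sorted.length - toDrop);
}
```

Trimming both tails (rather than only the highest values) keeps the comparison symmetric: a suspiciously fast sample biased by a warm cache is discarded just like a suspiciously slow one.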

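One simple way such an ordering-bias check could work (a hypothetical sketch; the PR's actual function is not shown here) is to compare the first half of the collected samples against the second half:

```typescript
/**
 * Relative drift between the first and second half of a run's samples.
 * A large magnitude suggests that measurement order (e.g. browser
 * caches warming up across iterations) is influencing the results.
 * Hypothetical sketch, not this PR's actual implementation.
 */
function orderingBiasRatio(samples: number[]): number {
  const avg = (xs: number[]): number =>
    xs.reduce((a, b) => a + b, 0) / xs.length;
  const half = Math.floor(samples.length / 2);
  const firstHalf = avg(samples.slice(0, half));
  const secondHalf = avg(samples.slice(half));
  return (secondHalf - firstHalf) / firstHalf;
}
```

A ratio close to 0 means both halves measured about the same; a strongly negative ratio (second half faster) would hint at warm-up effects biasing whichever variant ran first.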
@github-actions

✅ Automated performance checks have passed on commit 99b9af7ff62e331e7c354904c6792c468e7e625c with the base branch dev.

Performance tests 1st run output

No significant change in performance for tests:

| Name | Mean | Median |
| --- | --- | --- |
| loading | 24.11ms -> 24.10ms (0.003ms, z: 0.50429) | 36.00ms -> 35.85ms |
| seeking | 308.40ms -> 311.83ms (-3.435ms, z: 0.30503) | 17.25ms -> 17.10ms |
| audio-track-reload | 32.16ms -> 32.22ms (-0.053ms, z: 1.63560) | 48.15ms -> 48.30ms |
| cold loading multithread | 51.26ms -> 50.45ms (0.814ms, z: 24.35218) | 76.65ms -> 75.60ms |
| seeking multithread | 13.72ms -> 13.68ms (0.044ms, z: 1.58353) | 20.55ms -> 20.40ms |
| audio-track-reload multithread | 30.29ms -> 30.13ms (0.160ms, z: 5.85138) | 45.15ms -> 44.85ms |
| hot loading multithread | 20.12ms -> 20.00ms (0.123ms, z: 6.53893) | 30.00ms -> 29.85ms |
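For reference, a z value like the ones in the table can be derived from a two-sample comparison of means. A minimal sketch (the exact statistic computed by the CI may differ; `zScore` and its formula here are assumptions):

```typescript
// Arithmetic mean of a sample set.
const mean = (xs: number[]): number =>
  xs.reduce((a, b) => a + b, 0) / xs.length;

// Population variance of a sample set.
const variance = (xs: number[]): number => {
  const m = mean(xs);
  return xs.reduce((a, b) => a + (b - m) ** 2, 0) / xs.length;
};

/**
 * Two-sample z statistic on the difference of means: how many standard
 * errors apart the base and current runs are. Hypothetical sketch; the
 * CI's exact formula is not shown in this PR.
 */
function zScore(base: number[], current: number[]): number {
  const se = Math.sqrt(
    variance(base) / base.length + variance(current) / current.length,
  );
  return se === 0 ? 0 : Math.abs(mean(current) - mean(base)) / se;
}
```

This illustrates why the "cold loading multithread" row stands out: its z of 24.35 means the means differ by many standard errors, i.e. high statistical confidence in what is nevertheless a sub-millisecond difference — exactly the kind of result that outlier trimming and ordering-bias checks are meant to sanity-check.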

@canalplus canalplus deleted a comment from github-actions Bot Feb 27, 2026
@peaBerberian peaBerberian added the Priority: 3 (Low) This issue or PR has a low priority. label Apr 2, 2026