Understand the outlier benchmarks on 3.14 (main) vs. 3.13.0 #726

@mdboom

As suggested in the last sync meeting, we should understand why some of the benchmarks regressed and others progressed. There are three possible outcomes for each:

  1. The benchmark is poorly designed
  2. There are low-hanging fixes in CPython to reduce the regression
  3. We are reasonably comfortable with the regression given improvements elsewhere

As a first pass, I think we should just classify each benchmark along these lines, then fix CPython where possible, and fix the benchmarks themselves at a lower priority.

For the progressions, it may just be a source of WHATSNEW content.

Let's crowdsource this where possible, reporting back to the checklist below.

Using the last weekly as a guide, the statistically significant regressions are below. For longitudinal details, see the plot of benchmark performance over time below.

  • subparsers, many_optionals (argparse)
  • python_startup / python_startup_no_site
  • json_dumps / json_loads
  • mako
  • nbody
  • coroutines
  • typing_runtime_protocols
  • fannkuch
  • deltablue
  • shortest_path (networkx)
  • pickle_pure_python

The most statistically significant progressions are:

  • mdp (tuple hash caching provided a major speedup)
  • deepcopy / deepcopy_memo
  • go
  • regex / regex_effbot / regex_v8
  • float
  • pylint
  • spectral_norm
  • richards / richards_super
  • xml_etree_parse
  • dulwich_log
  • tomli_loads
  • genshi_text
  • 2to3
  • async stuff
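
For anyone reproducing the classification locally, here is a minimal sketch of how a benchmark could be bucketed as a regression, progression, or unchanged from two sets of timings. It is not the method used to produce the lists above: the function names, the Welch's t threshold of 3.0, and the 2% minimum change are all assumptions chosen for illustration, and the sample timings are made up.

```python
import math
import statistics


def welch_t(base, head):
    """Welch's t statistic for two independent timing samples (head minus base)."""
    se = math.sqrt(
        statistics.variance(base) / len(base)
        + statistics.variance(head) / len(head)
    )
    diff = statistics.mean(head) - statistics.mean(base)
    if se == 0.0:
        # No noise at all: any difference is treated as infinitely significant.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def classify(base, head, t_threshold=3.0, min_change=0.02):
    """Bucket one benchmark given timings (seconds) on the base and head builds.

    t_threshold and min_change are hypothetical cutoffs, not project policy.
    Returns (label, ratio), where ratio > 1.0 means head is slower than base.
    """
    ratio = statistics.mean(head) / statistics.mean(base)
    t = welch_t(base, head)
    if abs(t) < t_threshold or abs(ratio - 1.0) < min_change:
        return "unchanged", ratio
    return ("regression" if ratio > 1.0 else "progression"), ratio


# Made-up timings for illustration only.
base_timings = [1.00, 1.01, 0.99, 1.00, 1.02]
slower = [1.10, 1.11, 1.09, 1.12, 1.10]
faster = [0.80, 0.81, 0.79, 0.80, 0.82]
same = [1.00, 1.02, 0.99, 1.01, 1.00]

for name, head_timings in [("slower", slower), ("faster", faster), ("same", same)]:
    label, ratio = classify(base_timings, head_timings)
    print(f"{name}: {label} ({ratio:.3f}x)")
```

Requiring both a significance cutoff and a minimum effect size keeps very stable benchmarks from being flagged for changes that are real but too small to matter.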
