Fix Sysbench MySQL TPCC restart-on-crash failure; bump sysbench package to rev6 #738

Open
AlexWFMS wants to merge 1 commit into main from users/alexwill/sysbench-tpcc-cleanup

Problem

The QoS MySQL Sysbench TPCC target goals fail ~100% on VirtualClient
restart-on-crash (observed across all VC versions in PPE01). An experiment
reaches the benchmark once, then any restart-on-crash dies in the
database-population step and the experiment runs to its full timeout.

Root cause

The shipped sysbench package has two cooperating defects on the MySQL TPCC path:

  1. cleanup-database.py has no TPCC branch — only MySQL OLTP is handled, so
    the Cleanup action is a no-op for TPCC and never drops the tables.
  2. populate-database.py's prepare is not idempotent — it runs
    sysbench tpcc … prepare against whatever is already in the DB.

When a crash leaves the TPCC tables populated, every restart re-runs prepare
against non-empty tables → sysbench emits
FATAL: … returned error 1062 (Duplicate entry '1' for key 'itemN.PRIMARY')
(or 1061 on indexes) → the run_command FATAL-detector fails the step → VC
restart-on-crash retries 6× → System.AggregateException (MonitorAgentWorkload)
→ the experiment runs to the full timeout.
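The failure chain above depends on a wrapper that treats any FATAL line in sysbench output as a step failure. A minimal sketch of such a detector follows. The shipped run_command is not shown in this PR, so the names contains_fatal and FatalOutputError, and the rule "any line starting with FATAL:", are assumptions for illustration only.

```python
import subprocess
import sys


class FatalOutputError(RuntimeError):
    """Raised when a command exits non-zero or prints a FATAL line (hypothetical name)."""


def contains_fatal(output: str) -> bool:
    """Return True if any line of the output starts with 'FATAL:' (assumed rule)."""
    return any(line.lstrip().startswith("FATAL:") for line in output.splitlines())


def run_command(args):
    """Run a command and return its combined stdout and stderr, raising on failure."""
    result = subprocess.run(args, capture_output=True, text=True)
    output = result.stdout + result.stderr
    if result.returncode != 0 or contains_fatal(output):
        raise FatalOutputError(
            f"command failed (rc={result.returncode}): {output.strip()}"
        )
    return output


if __name__ == "__main__":
    # A child process that exits 0 but prints a FATAL line still counts as a failure.
    try:
        run_command([sys.executable, "-c", "print('FATAL: simulated')"])
    except FatalOutputError as error:
        print("detected:", error)
```

With a detector like this, a prepare that hits a duplicate-key error fails the step even if the process exit code alone would not have signalled it.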

Fix

  • populate-database.py: drop any existing TPCC tables (sysbench tpcc … cleanup)
    before prepare, so populate is idempotent regardless of prior (possibly
    crash-interrupted) state.
  • cleanup-database.py: add the missing MySQL TPCC branch so the Cleanup
    action actually drops the tables.
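The populate change can be sketched as "cleanup, then prepare". This is not the patched script itself: populate_tpcc is a hypothetical helper, base_command stands for whatever sysbench TPCC argument list the real script builds, and run is any callable that executes a command and raises on failure. The sketch also assumes that cleanup succeeds when no TPCC tables exist.

```python
from typing import Callable, List, Sequence


def populate_tpcc(base_command: Sequence[str], run: Callable[[List[str]], object]) -> None:
    """Populate TPCC tables idempotently (hypothetical helper).

    base_command is the sysbench TPCC invocation without the trailing action;
    its contents are left to the caller and are not spelled out here.
    """
    # Drop whatever a previous, possibly crash-interrupted, run left behind.
    run([*base_command, "cleanup"])
    # Populate from a known-empty state so duplicate-key errors cannot occur.
    run([*base_command, "prepare"])


if __name__ == "__main__":
    issued = []
    # Record the commands instead of executing them.
    populate_tpcc(["sysbench", "tpcc"], issued.append)
    for command in issued:
        print(" ".join(command))
```

Passing the runner in as a parameter keeps the ordering logic testable without a database. The cost of this design is that every populate pays for a full drop, including on a clean first run.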

These scripts ship inside the sysbench-1.0.20.revN.zip package in the packages
blob store (not the build), so the package was re-revved:

  • Built sysbench-1.0.20.rev6.zip = rev5 contents + the two fixed scripts,
    and uploaded to the packages store.
  • Bumped all 4 PERF-{MYSQL,POSTGRESQL}-SYSBENCH-{OLTP,TPCC} profiles rev5 → rev6.

While doing this I found the repo's Sysbench/*.py had drifted from the shipped
package — the repo populate-database.py was a stale rev3-era version (using
--host + a truncate loop), while the deployed rev5 uses --hostIpAddress +
a run_command FATAL-detector and no truncate loop. This PR re-syncs the repo
copy to the shipped lineage and adds the idempotency fix on top.

Verification

Reproduced and verified on an Ubuntu 24.04 VM with a VC-equivalent sysbench 1.1.0
build (akopytov/sysbench + Percona tpcc.lua, exactly as
configure-workload-generator.py builds it):

  • Before: re-running prepare against populated tables →
    FATAL … db_bulk_insert_next() failed (1062). Matches production.
  • After: patched populate-database.py recovers on a dirty DB (rc=0, no 1062,
    idempotent across repeated runs); patched cleanup-database.py drops the TPCC
    tables (item1 → no table).
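The before/after behaviour can also be illustrated without a database. The sketch below is a simulation only, not the verification that was actually run: FakeDatabase and DuplicateEntryError are hypothetical stand-ins for MySQL and its error 1062, and the single table name item1 is used purely as a marker for "TPCC tables exist".

```python
class DuplicateEntryError(Exception):
    """Stands in for MySQL error 1062 in this simulation."""


class FakeDatabase:
    """In-memory stand-in that rejects 'prepare' on already-populated tables."""

    def __init__(self):
        self.tables = set()

    def run(self, args):
        action = args[-1]
        if action == "cleanup":
            self.tables.clear()
        elif action == "prepare":
            if "item1" in self.tables:
                raise DuplicateEntryError("item1 is already populated")
            self.tables.add("item1")


def populate_unpatched(db, base_command):
    # Old behaviour: prepare runs against whatever is already in the database.
    db.run([*base_command, "prepare"])


def populate_patched(db, base_command):
    # New behaviour: drop existing tables first, then prepare.
    db.run([*base_command, "cleanup"])
    db.run([*base_command, "prepare"])


if __name__ == "__main__":
    base = ["sysbench", "tpcc"]

    dirty = FakeDatabase()
    populate_unpatched(dirty, base)  # first run succeeds
    try:
        populate_unpatched(dirty, base)  # simulated restart after a crash
    except DuplicateEntryError as error:
        print("unpatched restart fails:", error)

    for _ in range(3):
        populate_patched(dirty, base)  # repeated runs on a dirty database
    print("patched runs succeed; tables:", sorted(dirty.tables))
```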

Files changed

  • src/VirtualClient/VirtualClient.Actions/Sysbench/populate-database.py
  • src/VirtualClient/VirtualClient.Actions/Sysbench/cleanup-database.py
  • src/VirtualClient/VirtualClient.Main/profiles/PERF-MYSQL-SYSBENCH-OLTP.json
  • src/VirtualClient/VirtualClient.Main/profiles/PERF-MYSQL-SYSBENCH-TPCC.json
  • src/VirtualClient/VirtualClient.Main/profiles/PERF-POSTGRESQL-SYSBENCH-OLTP.json
  • src/VirtualClient/VirtualClient.Main/profiles/PERF-POSTGRESQL-SYSBENCH-TPCC.json
  • VERSION (3.3.14 → 3.3.15)

QoS MySQL Sysbench TPCC goals fail ~100% on VirtualClient restart-on-crash. The
shipped sysbench package's cleanup-database.py has no TPCC branch (Cleanup is a
no-op) and populate-database.py's 'prepare' is not idempotent. When a crash
leaves the TPCC tables populated, every restart re-runs 'prepare' against
non-empty tables -> MySQL error 1062 (Duplicate entry itemN.PRIMARY) / 1061 ->
6 crashes -> System.AggregateException -> experiment runs to full timeout.

- populate-database.py: drop existing TPCC tables before 'prepare' so populate
  is idempotent regardless of prior (crash-interrupted) state. This also re-syncs
  the repo copy with the deployed package lineage (the repo had drifted to a
  stale rev3-era version using --host + a truncate loop; the shipped rev5 uses
  --hostIpAddress + a run_command FATAL-detector and no truncate loop).
- cleanup-database.py: add the missing MySQL TPCC branch so Cleanup drops tables.
- Bump all 4 SYSBENCH profiles (MYSQL/POSTGRESQL x OLTP/TPCC) rev5 -> rev6.

NOTE: the fixed scripts ship inside sysbench-1.0.20.rev6.zip in the 'packages'
blob store (not the build), so that package must be uploaded for this to take
effect. Reproduced and verified on an Ubuntu 24.04 VM with VC-equivalent
sysbench 1.1.0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ericavella commented Jun 30, 2026

This is probably adding some confusion. The scripts in source are not being used in the workload. Make sure to make changes to the scripts in the packages in blob store. I think we also edit the version in-place here instead of bumping the package version (keep it at rev5).

ericavella left a comment

Looks like we don't need changes in source yet for this.
