fix: resolve 11 high/medium severity bugs from broad codebase scan by FileSystemGuy · Pull Request #395 · mlcommons/storage

FileSystemGuy · 2026-05-26T23:26:41Z

Summary

This PR fixes 11 bugs identified in a broad static analysis of the codebase. The findings span core framework stability, submission validation correctness, and benchmark execution reliability. Every change is a targeted fix — no refactoring or feature additions.

Issues are grouped by subsystem below. Each entry includes the root cause and what was done to fix it.

Core Framework

CORE-1 — Ctrl+C raises `AttributeError` instead of exiting cleanly

File: mlpstorage_py/config.py

EXIT_CODE.INTERRUPTED and EXIT_CODE.ERROR were referenced in main.py but absent from the EXIT_CODE enum. Pressing Ctrl+C triggered the SIGINT handler, which calls sys.exit(EXIT_CODE.INTERRUPTED). Evaluating the missing EXIT_CODE.INTERRUPTED member raised AttributeError before sys.exit() could run, so the process crashed with a traceback instead of exiting cleanly.

Fix: Added INTERRUPTED = 8 and ERROR = 9 to the EXIT_CODE enum in config.py.
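A minimal sketch of the fixed enum and handler follows. It assumes EXIT_CODE is an IntEnum: sys.exit() prints any non-integer argument to stderr and exits with status 1, so only an integer-valued member yields status 8. The SUCCESS and FAILURE values and the install_handler() helper are hypothetical; only INTERRUPTED = 8 and ERROR = 9 come from this PR.

```python
import signal
import sys
from enum import IntEnum


class EXIT_CODE(IntEnum):
    # SUCCESS/FAILURE values are assumed for illustration; only
    # INTERRUPTED = 8 and ERROR = 9 are taken from this PR.
    SUCCESS = 0
    FAILURE = 1
    INTERRUPTED = 8
    ERROR = 9


def handle_sigint(signum, frame):
    # Before the fix, evaluating EXIT_CODE.INTERRUPTED raised AttributeError
    # here, so sys.exit() was never reached.
    sys.exit(EXIT_CODE.INTERRUPTED)


def install_handler():
    # Hypothetical helper: register the handler once at startup (main thread).
    signal.signal(signal.SIGINT, handle_sigint)
```

With the handler installed, Ctrl+C raises SystemExit with code 8, and the process exits with that status.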

CORE-2 — Run entries validated against datagen metadata

File: mlpstorage_py/submission_checker/loader.py

In the submission loader, metadata_path was set inside the outer loop over datagen timestamps and never updated in the inner loop over run timestamps. Every run entry was therefore loaded with the datagen's metadata file. Checks such as closed_submission_parameters and verify_datasize_usage were reading datagen invocation arguments instead of run invocation arguments, silently corrupting all parameter validation for training runs.

Fix: Added metadata_path = self.find_metadata_path(timestamp_path) as the first statement inside the inner run-timestamp loop. Cross-phase checks that need datagen params already iterate self.submissions_logs.datagen_files explicitly and are unaffected.
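The loop shape before and after the fix can be sketched as follows. This is a simplified illustration, not the loader's code: find_metadata_path() here is a hypothetical stand-in that appends metadata.json, and the run_timestamps_by_datagen mapping is invented only to give the sketch an outer datagen loop and an inner run loop.

```python
from pathlib import Path


def find_metadata_path(timestamp_path: Path) -> Path:
    # Hypothetical stand-in for the loader's find_metadata_path(); the real
    # lookup logic is not shown in the PR.
    return timestamp_path / "metadata.json"


def load_entries(datagen_timestamps, run_timestamps_by_datagen):
    """Return (phase, timestamp_path, metadata_path) tuples."""
    entries = []
    for dg_path in datagen_timestamps:
        metadata_path = find_metadata_path(dg_path)
        entries.append(("datagen", dg_path, metadata_path))
        for timestamp_path in run_timestamps_by_datagen.get(dg_path, []):
            # The fix: recompute per run timestamp. Without this line, every
            # run entry reuses the datagen metadata_path from the outer loop.
            metadata_path = find_metadata_path(timestamp_path)
            entries.append(("run", timestamp_path, metadata_path))
    return entries
```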

CORE-3 — `--params` values containing `=` crash before benchmark starts

File: mlpstorage_py/benchmarks/dlio.py

process_dlio_params() split each --params argument on every = character with no maxsplit. Any param value containing = (base64 credentials, S3 endpoint URIs, connection strings) produced more than 2 parts, causing ValueError: too many values to unpack before the benchmark started. Object-storage workloads with credential parameters were entirely blocked.

Fix: Changed item.split("=") to item.split("=", 1).
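The following sketch contrasts the two splits. It assumes params arrive as a list of "key=value" strings; process_params() is a simplified, hypothetical stand-in for process_dlio_params().

```python
def process_params(items):
    """Parse ["key=value", ...] into a dict, splitting on the first '=' only."""
    params = {}
    for item in items:
        # maxsplit=1 keeps any further '=' characters inside the value,
        # e.g. base64 padding or query strings in an endpoint URI.
        key, value = item.split("=", 1)
        params[key] = value
    return params


# The pre-fix form: "key=val=extra".split("=") yields three parts, so
# unpacking into (key, value) raises ValueError.
print(process_params(["key=val=extra", "secret=YWJjZA=="]))
```

An item with no "=" at all still fails to unpack (ValueError: not enough values to unpack), so malformed params continue to be rejected.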

CORE-4 — `CheckpointingBenchmark._run()` swallows all exceptions silently

File: mlpstorage_py/benchmarks/dlio.py

The except Exception as e block in CheckpointingBenchmark._run() discarded the caught exception without logging it, returning EXIT_CODE.FAILURE with no diagnostic output. Any failure in execute_command() or datasize() was invisible to the operator. TrainingBenchmark._run() already logged str(e) correctly.

Fix: Added self.logger.error(f'Checkpointing benchmark failed: {e}') before the return.
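A minimal sketch of the logging pattern follows. CheckpointingBenchmarkSketch and its simulated execute_command() failure are hypothetical stand-ins, and the EXIT_CODE values are assumed.

```python
import logging
from enum import IntEnum


class EXIT_CODE(IntEnum):
    SUCCESS = 0  # values assumed for illustration
    FAILURE = 1


class CheckpointingBenchmarkSketch:
    """Minimal stand-in for CheckpointingBenchmark; not the real class."""

    def __init__(self, logger):
        self.logger = logger

    def execute_command(self):
        # Simulated failure standing in for execute_command() or datasize().
        raise RuntimeError("DLIO exited with return code 1")

    def _run(self):
        try:
            self.execute_command()
            return EXIT_CODE.SUCCESS
        except Exception as e:
            # The fix: record why the run failed before returning FAILURE.
            self.logger.error(f"Checkpointing benchmark failed: {e}")
            return EXIT_CODE.FAILURE
```

A design alternative, not what this PR does: calling self.logger.exception(...) inside the except block logs at ERROR level and also appends the traceback.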

CORE-5 — `UnboundLocalError` replaces original exception from `_run()`

File: mlpstorage_py/benchmarks/base.py

In Benchmark.run(), result = self._run() was inside a try with a finally that performed cleanup. If _run() raised an exception, the finally block ran correctly but execution then fell through to return result. Since result was never assigned, Python raised UnboundLocalError: local variable 'result' referenced before assignment, replacing the original diagnostic exception with a secondary one and destroying failure information.

Fix: Initialized result = EXIT_CODE.FAILURE immediately before the try block. Also added EXIT_CODE to the import from mlpstorage_py.config in base.py.
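A bare try/finally re-raises the original exception before return result is reached. Falling through to the return therefore implies that Benchmark.run() also catches the exception, and the sketch below assumes an except clause that logs and swallows it. The class, logger name, and _cleanup() helper are hypothetical, and the EXIT_CODE values are assumed.

```python
import logging
from enum import IntEnum


class EXIT_CODE(IntEnum):
    SUCCESS = 0  # values assumed for illustration
    FAILURE = 1


class BenchmarkSketch:
    """Hypothetical stand-in for Benchmark; only run()'s shape matters."""

    def __init__(self):
        self.logger = logging.getLogger("benchmark-sketch")
        self.cleaned_up = False

    def _run(self):
        raise RuntimeError("simulated _run() failure")

    def _cleanup(self):
        self.cleaned_up = True

    def run_without_init(self):
        # Pre-fix shape: result is never bound when _run() raises, so the
        # final return raises UnboundLocalError.
        try:
            result = self._run()
        except Exception as e:
            self.logger.error(f"Benchmark failed: {e}")
        finally:
            self._cleanup()
        return result

    def run(self):
        # The fix: a default that survives a failed _run().
        result = EXIT_CODE.FAILURE
        try:
            result = self._run()
        except Exception as e:
            self.logger.error(f"Benchmark failed: {e}")
        finally:
            self._cleanup()
        return result
```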

CORE-6 — `JSONParser.__contains__` raises `AttributeError` on any `in` test

File: mlpstorage_py/submission_checker/parsers/json_parser.py

__contains__ returned key in self.messages, but JSONParser has no messages attribute; the parsed JSON dict is stored in self.d. Any membership test of the form key in parser therefore raised AttributeError: 'JSONParser' object has no attribute 'messages'.

Fix: Changed self.messages to self.d.
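The fixed membership test can be sketched as follows. The PR does not show the real constructor, so this one is assumed: it parses a JSON string into self.d for illustration.

```python
import json


class JSONParser:
    """Sketch of the fixed membership test; the constructor is assumed."""

    def __init__(self, text: str):
        # Assumed constructor: parse a JSON string and keep it in self.d,
        # the attribute the PR says holds the parsed dict.
        self.d = json.loads(text)

    def __contains__(self, key):
        # Fixed: look up the parsed dict, not the nonexistent self.messages.
        return key in self.d


parser = JSONParser('{"metric": {"train_au_percentage": [95.0]}}')
print("metric" in parser, "missing" in parser)
```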

CORE-12 — Unused `pyarrow` import makes it a hard dependency of all benchmarks

File: mlpstorage_py/benchmarks/base.py

from pyarrow.ipc import open_stream was present at the top of base.py but open_stream was never referenced anywhere in the file. Because it was a top-level import, any environment without pyarrow installed would fail to import the benchmark base class entirely, breaking all benchmarks with ImportError. pyarrow is retained in pyproject.toml as it is needed by DLIO and parquet handling elsewhere.

Fix: Removed the unused import line.

CORE-13 — DLIO exit code discarded; failed runs report `EXIT_CODE.SUCCESS`

File: mlpstorage_py/benchmarks/dlio.py

execute_command() called self._execute_command(...) but discarded the (stdout, stderr, return_code) return value. If DLIO exited with a non-zero code (OOM, assertion failure, I/O error), execute_command() returned silently and TrainingBenchmark._run() returned EXIT_CODE.SUCCESS. Results validation then proceeded against nonexistent or incomplete output files.

Fix: execute_command() now unpacks the return value and raises RuntimeError if the return code is non-zero. The existing except Exception handler in _run() (improved by CORE-4) catches and logs this.
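A minimal sketch of the return-code check follows. _execute_command() here is a hypothetical stand-in built on subprocess.run() that returns the (stdout, stderr, return_code) tuple shape described above. The real methods live on the benchmark class.

```python
import subprocess
import sys


def _execute_command(cmd):
    # Hypothetical stand-in returning (stdout, stderr, return_code).
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout, proc.stderr, proc.returncode


def execute_command(cmd):
    stdout, stderr, return_code = _execute_command(cmd)
    if return_code != 0:
        # Raising lets the caller's except Exception handler log the failure
        # and return FAILURE instead of falling through to SUCCESS.
        raise RuntimeError(
            f"Command exited with return code {return_code}: {stderr.strip()}"
        )
    return stdout


print(execute_command([sys.executable, "-c", "print('ok')"]))
```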

Submission Validation

RULES-1 — 500-steps dataset minimum formula is circular; constraint never fires

File: mlpstorage_py/submission_checker/checks/training_checks.py

The dataset minimum size check computed:

```python
num_steps_per_epoch = max(MIN_STEPS_PER_EPOCH,
                          num_files_train * num_samples_per_file // (batch_size * num_accelerators))
min_samples_steps = num_steps_per_epoch * batch_size * num_accelerators
```

Because the second argument to max() is derived from the actual file count, num_steps_per_epoch is always ≥ the actual steps, making min_samples_steps always ≥ the actual sample count. The steps constraint could never produce a "too few files" error. The canonical computation in rules/utils.py does not have this defect.

Fix: Replaced the two-line calculation with the direct formula:

```python
min_samples_steps = MIN_STEPS_PER_EPOCH * batch_size * num_accelerators
```
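The sketch below makes the circularity concrete with illustrative values: batch_size = 4, num_accelerators = 8, and one sample per file. These numbers are assumptions, not taken from any workload. Under the old formula, 20,000 files produce a "minimum" of exactly 20,000 samples and 40,000 files produce 40,000, so the bound tracks the dataset it is meant to check. The fixed formula gives 500 × 4 × 8 = 16,000 samples however many files are present.

```python
MIN_STEPS_PER_EPOCH = 500  # the 500-step floor named in the RULES-1 title


def min_samples_old(num_files_train, num_samples_per_file, batch_size, num_accelerators):
    # Pre-fix formula: the max() argument is derived from the actual file count.
    num_steps_per_epoch = max(
        MIN_STEPS_PER_EPOCH,
        num_files_train * num_samples_per_file // (batch_size * num_accelerators),
    )
    return num_steps_per_epoch * batch_size * num_accelerators


def min_samples_new(batch_size, num_accelerators):
    # Fixed formula: depends only on the rule and the run configuration.
    return MIN_STEPS_PER_EPOCH * batch_size * num_accelerators


print(min_samples_old(20_000, 1, 4, 8), min_samples_old(40_000, 1, 4, 8))
print(min_samples_new(4, 8))
```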

RULES-3 — `NameError` in subset-mode process count check; check silently passes

File: mlpstorage_py/submission_checker/checks/checkpointing_checks.py

In the closed-submission process count check, model_key was used in the error log inside the if checkpoint_mode == "subset": branch, but model_key was only assigned inside the else: branch (after a regex match on the model name). When a CLOSED subset-mode submission had the wrong process count, the code hit NameError, which was silently swallowed, and the check returned as if it passed. The required 8-process count for subset mode was never enforced.

Fix: Replaced model_key with model_name in the subset-mode error log. model_name is assigned unconditionally at the top of the loop body and is the correct identifier for the message.
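The sketch below shows how the unbound name turned a failing check into a pass. The wrapper that swallows exceptions and reports success is hypothetical, modeled on the behavior described above. Because model_key is assigned in the else branch of the same function, Python raises UnboundLocalError, a subclass of NameError. The lower-cased model name is a placeholder for the real regex match, and non-subset checks are omitted.

```python
import logging

REQUIRED_SUBSET_PROCESSES = 8  # from the PR: subset mode requires 8 processes


def subset_count_ok_buggy(model_name, checkpoint_mode, num_processes, logger):
    if checkpoint_mode == "subset":
        if num_processes != REQUIRED_SUBSET_PROCESSES:
            # Bug: model_key is only bound in the else branch, so this line
            # raises UnboundLocalError (a NameError subclass).
            logger.error(f"{model_key}: expected {REQUIRED_SUBSET_PROCESSES} processes")
            return False
        return True
    else:
        model_key = model_name.lower()  # placeholder for the real regex match
        return True


def subset_count_ok_fixed(model_name, checkpoint_mode, num_processes, logger):
    if checkpoint_mode == "subset" and num_processes != REQUIRED_SUBSET_PROCESSES:
        # Fixed: model_name is always bound.
        logger.error(f"{model_name}: expected {REQUIRED_SUBSET_PROCESSES} processes, "
                     f"got {num_processes}")
        return False
    return True  # non-subset checks omitted from this sketch


def run_check(check, *args):
    # Hypothetical wrapper: any exception is swallowed and treated as a pass.
    try:
        return check(*args)
    except Exception:
        return True
```

With this wrapper, the buggy version reports a pass for a subset run with 4 processes, and the fixed version reports a failure.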

RULES-4 — AU check reads nonexistent DLIO fields; every submission fails spuriously

File: mlpstorage_py/submission_checker/checks/training_checks.py

The accelerator utilization check read train_au_mean_percentage and train_au_meet_expectation from the DLIO summary JSON. Neither field exists in actual DLIO output. The real field is train_au_percentage, a list of per-epoch AU percentage values. Both .get() calls always returned their defaults (0 and ""), causing au_expectation != "success" to always be True. Every training submission was flagged as an AU failure regardless of actual utilization, making it impossible to distinguish passing from failing submissions.

Fix: Replaced the broken field lookups with logic that reads train_au_percentage, computes the mean, and compares it against the 90% minimum threshold specified in the MLPerf Storage rules (Rules.md §3.3.2):

```python
au_values = metrics.get("train_au_percentage", [])
au_mean = sum(au_values) / len(au_values)
if au_mean < 90.0:
    ...  # log and fail
```
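The following is a self-contained sketch of the AU check. It assumes metrics is the dict read from the DLIO summary JSON. It also treats a missing or empty train_au_percentage list as a failure rather than dividing by zero. That empty-list handling is an assumption of this sketch and is not stated in the PR.

```python
import logging

AU_MIN_PERCENT = 90.0  # minimum mean AU cited in the PR (Rules.md §3.3.2)


def au_check_passes(metrics, logger):
    au_values = metrics.get("train_au_percentage", [])
    if not au_values:
        # Assumption: a missing or empty AU list fails the check instead of
        # raising ZeroDivisionError.
        logger.error("train_au_percentage is missing or empty")
        return False
    au_mean = sum(au_values) / len(au_values)
    if au_mean < AU_MIN_PERCENT:
        logger.error(f"Mean AU {au_mean:.2f}% is below the {AU_MIN_PERCENT}% minimum")
        return False
    return True
```

A mean of exactly 90% passes, because only values strictly below the threshold fail.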

Test plan

- pytest tests/unit -v passes with no new failures
- pytest tests/integration -v passes where applicable
- Ctrl+C during a benchmark run exits cleanly with a non-zero code (not a traceback)
- A --params argument containing = in the value (e.g. key=val=extra) is parsed correctly
- A submission whose mean train_au_percentage is at least 90% passes the AU check; one whose mean is below 90% fails it
- A checkpointing run that fails produces a logged error message, not a silent failure

Fixes span core framework stability, submission validation correctness, and benchmark execution reliability. All changes are targeted one-line or small-block fixes with no refactoring.

## CORE-1: Ctrl+C raises AttributeError instead of exiting cleanly

config.py — Added missing EXIT_CODE.INTERRUPTED (8) and EXIT_CODE.ERROR (9) enum members. Without these, the SIGINT/SIGTERM signal handler called sys.exit(EXIT_CODE.INTERRUPTED) and crashed with AttributeError before the process could exit.

## CORE-2: Run entries validated against datagen metadata (stale path)

submission_checker/loader.py — Added metadata_path re-computation inside the inner run-timestamp loop. Previously metadata_path was set only in the outer datagen loop, so every run entry was loaded with the datagen's metadata file. Checks like closed_submission_parameters and verify_datasize_usage were auditing datagen invocation args instead of run invocation args.

## CORE-3: --params values containing '=' crash before benchmark starts

benchmarks/dlio.py — Changed item.split("=") to item.split("=", 1) in process_dlio_params(). Without maxsplit=1, any param value containing '=' (base64 credentials, S3 URIs, endpoint strings) produced more than 2 parts and raised ValueError: too many values to unpack before the benchmark started.

## CORE-4: CheckpointingBenchmark._run() swallows exceptions silently

benchmarks/dlio.py — Added self.logger.error() call in the except block of CheckpointingBenchmark._run(). Previously the caught exception 'e' was discarded with no log output, making all checkpointing failures produce a silent EXIT_CODE.FAILURE. Now matches the logging pattern in TrainingBenchmark._run().

## CORE-5: UnboundLocalError masks original exception from _run()

benchmarks/base.py — Initialized result = EXIT_CODE.FAILURE before the try block in Benchmark.run(). If _run() raised an exception, the finally block completed cleanup correctly but then 'return result' hit UnboundLocalError (result was never assigned), replacing the original diagnostic exception with a secondary one. Also added EXIT_CODE to the config import in base.py.

## CORE-6: JSONParser.__contains__ raises AttributeError on any 'in' test

submission_checker/parsers/json_parser.py — Changed self.messages to self.d in __contains__. The attribute self.messages does not exist; the parsed JSON dict is stored in self.d. Any 'key in parser' test raised AttributeError.

## CORE-12: Unused pyarrow import makes pyarrow a hard benchmark dependency

benchmarks/base.py — Removed unused 'from pyarrow.ipc import open_stream'. The symbol open_stream was never referenced in the file. The top-level import forced pyarrow to be present at import time for all benchmarks, failing with ImportError if absent. pyarrow remains in pyproject.toml as it is needed by DLIO and parquet handling elsewhere.

## CORE-13: DLIO exit code discarded; failed runs report EXIT_CODE.SUCCESS

benchmarks/dlio.py — execute_command() now captures the return code from _execute_command() and raises RuntimeError on non-zero. Previously the return value was discarded entirely, so a DLIO crash or assertion failure left TrainingBenchmark._run() returning SUCCESS and proceeding to validate nonexistent or incomplete results.

## RULES-1: 500-steps dataset minimum formula is circular; check never fires

submission_checker/checks/training_checks.py — Replaced the circular num_steps_per_epoch intermediate with the direct formula:

min_samples_steps = MIN_STEPS_PER_EPOCH * batch_size * num_accelerators

The old code derived num_steps_per_epoch from the actual file count using max(..., actual_steps), then multiplied back. Because actual_steps >= itself, min_samples_steps was always >= actual samples, so the constraint could never produce a "too few files" error. The direct formula matches rules/utils.py.

## RULES-3: NameError in subset-mode process count check; check silently passes

submission_checker/checks/checkpointing_checks.py — Replaced undefined model_key with model_name in the subset-mode error log. model_key was only assigned inside the else branch (after a regex match), but was referenced in the if branch. The NameError was silently swallowed, causing CLOSED subset-mode submissions with the wrong process count to pass validation unchallenged.

## RULES-4: AU check reads nonexistent DLIO fields; every submission fails

submission_checker/checks/training_checks.py — Replaced lookups for train_au_mean_percentage and train_au_meet_expectation (neither exists in DLIO output) with the actual field train_au_percentage (a list of per-epoch AU values). The check now computes the mean of that list and compares it against the 90% minimum required by the MLPerf Storage rules (Rules.md §3.3.2). Previously both .get() calls always returned their defaults (0 and ""), making au_expectation != "success" always True and flagging every submission as an AU failure regardless of actual utilization.

github-actions · 2026-05-26T23:26:50Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

FileSystemGuy requested a review from a team May 26, 2026 23:26


FileSystemGuy wants to merge 1 commit into main from FileSystemGuy-bugs-high
