SIP276 Harden restart checkpoint validation and parser overflow handling by dlebauer · Pull Request #283 · PecanProject/sipnet

dlebauer · 2026-03-04T16:41:04Z

Summary

What: Harden restart checkpoint handling by (1) validating loaded checkpoint boundaries against the midnight-window contract and (2) rejecting overflowed long long restart metadata values. Update restart infrastructure tests to cover both regressions, align mean.npp.* schema assertions, and document serialized tracker payload intent in the developer restart spec.
Motivation: Address contract gaps from the restart refactor review so invalid/tampered restart files fail fast instead of resuming silently.
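
For readers unfamiliar with the overflow case in (2): `strtoll` does not fail loudly on out-of-range input. It clamps the result to `LLONG_MIN`/`LLONG_MAX` and sets `errno` to `ERANGE`. A minimal sketch of a strict parser follows; `parseLongLongStrict` is a hypothetical name used for illustration and is not the function in `restart.c`.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (illustration only, not SIPNET's parser).
 * Returns 1 and stores the parsed value in *out on success.
 * Returns 0 for empty input, trailing characters, or overflow. */
static int parseLongLongStrict(const char *text, long long *out) {
  char *end = NULL;
  errno = 0; /* strtoll only sets errno on failure, so clear it first */
  long long value = strtoll(text, &end, 10);
  if (end == text || *end != '\0') {
    return 0; /* no digits consumed, or junk after the number */
  }
  if (errno == ERANGE) {
    return 0; /* value was clamped to LLONG_MIN or LLONG_MAX */
  }
  *out = value;
  return 1;
}
```

Note that `strtoll` skips leading whitespace, so this sketch accepts `" 42"`; a stricter contract would need an extra check on the first character.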

How was this change tested?

List steps taken to test this change, with appropriate outputs if applicable

make sipnet
make -C tests/sipnet/test_restart_infrastructure tests
make -C tests/sipnet/test_restart_infrastructure run
Output: PASSED testRestartMVP

No tests/smoke/**/sipnet.out files were changed in this PR.

Reproduction steps

If appropriate, list steps to reproduce the change locally

Check out branch codex/restart-contract-gaps.
Build and run restart infrastructure tests:
- make sipnet
- make -C tests/sipnet/test_restart_infrastructure tests
- make -C tests/sipnet/test_restart_infrastructure run
Confirm added regressions in testRestartMVP.c:
- testTamperedBoundaryNotNearMidnightFails
- testProcessedStepsOverflowFails

Related issues

Fixes #: N/A. This is follow-up hardening from the restart contract review of PR #276 (SIP279 SIPNET Restart MVP), which is the PR that fixes 276.

Checklist

Related issues are listed above. PRs without an approved, related issue may not get reviewed.
PR title has the issue number in it ("[#] ")
Tests added/updated for new features (if applicable)
Documentation updated (if applicable)
docs/CHANGELOG.md updated with noteworthy changes
Code formatted with clang-format (run git clang-format if needed)

github-actions · 2026-03-04T16:41:51Z

Cpp-Linter Report ⚠️

Some files did not pass the configured checks!

clang-tidy (v19.1.1) reports: 1 concern(s)

src/sipnet/frontend.c:212:5: error: [clang-analyzer-core.NonNullParamChecker]

Null pointer passed to 1st parameter expecting 'nonnull'

  212 |     fclose(out);
      |     ^      ~~~
/home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:165:7: note: Assuming field 'doMainOutput' is 0
  165 |   if (ctx.doMainOutput) {
      |       ^~~~~~~~~~~~~~~~
/home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:165:3: note: Taking false branch
  165 |   if (ctx.doMainOutput) {
      |   ^
/home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:171:5: note: Null pointer value stored to 'out'
  171 |     out = NULL;
      |     ^~~~~~~~~~
/home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:175:7: note: Assuming field 'dumpConfig' is 0
  175 |   if (ctx.dumpConfig) {
      |       ^~~~~~~~~~~~~~
/home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:175:3: note: Taking false branch
  175 |   if (ctx.dumpConfig) {
      |   ^
/home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:189:7: note: Assuming field 'events' is 0
  189 |   if (ctx.events) {
      |       ^~~~~~~~~~
/home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:189:3: note: Taking false branch
  189 |   if (ctx.events) {
      |   ^
/home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:200:7: note: Assuming field 'doSingleOutputs' is 0
  200 |   if (ctx.doSingleOutputs) {
      |       ^~~~~~~~~~~~~~~~~~~
/home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:200:3: note: Taking false branch
  200 |   if (ctx.doSingleOutputs) {
      |   ^
/home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:211:7: note: Assuming field 'doMainOutput' is not equal to 0
  211 |   if (ctx.doMainOutput) {
      |       ^~~~~~~~~~~~~~~~
/home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:211:3: note: Taking true branch
  211 |   if (ctx.doMainOutput) {
      |   ^
/home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:212:5: note: Null pointer passed to 1st parameter expecting 'nonnull'
  212 |     fclose(out);
      |     ^      ~~~
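
The analyzer path above tests `ctx.doMainOutput` twice and treats the two reads as independent, so it can reach `fclose(out)` with `out == NULL`. One way to satisfy the checker is to key the close on the pointer itself rather than on the flag. The sketch below assumes a hypothetical helper named `closeIfOpen`; it is not part of `frontend.c`.

```c
#include <stdio.h>

/* Hypothetical helper (illustration only).
 * Closes *stream if it is non-NULL and then clears the pointer.
 * Returns 0 if there was nothing to close or the close succeeded,
 * and EOF if fclose reported an error. */
static int closeIfOpen(FILE **stream) {
  int status = 0;
  if (stream != NULL && *stream != NULL) {
    status = fclose(*stream);
    *stream = NULL; /* prevents an accidental second close */
  }
  return status;
}
```

With this shape, the call site would become `closeIfOpen(&out);` with no dependence on re-reading the flag.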


Copilot

Pull request overview

Implements the red-team audit “restart contract-gap” fixes by tightening restart checkpoint boundary validation, hardening strict integer parsing against overflow, and expanding restart infrastructure tests/docs to lock in the contract.

Changes:

Add load-time checkpoint boundary validation and stricter restart parsing (including strtoll overflow rejection).
Move/track cumulative GDD via trackers for restart continuity, and integrate restart load/write into the main run loop.
Add/refresh restart infrastructure tests + fixtures and update developer/user docs for the restart schema/constraints.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Show a summary per file

| File | Description |
| --- | --- |
| tests/sipnet/test_restart_infrastructure/testRestartMVP.c | New end-to-end restart infrastructure test suite covering boundary/overflow/tamper cases |
| tests/sipnet/test_restart_infrastructure/restart_segment2_late.clim | Fixture for “restart must start near midnight” validation |
| tests/sipnet/test_restart_infrastructure/restart_segment2_bad.clim | Fixture for strict climate mismatch failure |
| tests/sipnet/test_restart_infrastructure/restart_segment2.clim | Fixture for valid segment-2 climate forcing |
| tests/sipnet/test_restart_infrastructure/restart_segment1_not_midnight.clim | Fixture for invalid checkpoint boundary not near midnight |
| tests/sipnet/test_restart_infrastructure/restart_segment1.clim | Fixture for valid segment-1 climate forcing |
| tests/sipnet/test_restart_infrastructure/restart_seg2_bad.in | Input for running segment 2 with mismatched climate fixture |
| tests/sipnet/test_restart_infrastructure/restart_seg2.in | Input for running segment 2 from checkpoint |
| tests/sipnet/test_restart_infrastructure/restart_seg1.in | Input for running segment 1 and writing checkpoint |
| tests/sipnet/test_restart_infrastructure/restart_full.clim | Fixture for continuous baseline run |
| tests/sipnet/test_restart_infrastructure/restart_cont.in | Input for continuous baseline run |
| tests/sipnet/test_restart_infrastructure/restart.param | Parameter fixture for restart tests |
| tests/sipnet/test_restart_infrastructure/norestart_b.in | No-restart mode fixture |
| tests/sipnet/test_restart_infrastructure/norestart_a.in | No-restart mode fixture |
| tests/sipnet/test_restart_infrastructure/events_segment2.in | Segmented events fixture for segment 2 |
| tests/sipnet/test_restart_infrastructure/events_segment1.in | Segmented events fixture for segment 1 |
| tests/sipnet/test_restart_infrastructure/events_base.in | Baseline events fixture for continuous run |
| tests/sipnet/test_restart_infrastructure/Makefile | Build/run harness for restart infrastructure tests |
| src/sipnet/state.h | Update GDD semantics and add tracker fields for cumulative GDD continuity |
| src/sipnet/sipnet.c | Integrate restart load/write, processed-step tracking, and GDD logic changes |
| src/sipnet/restart.h | New restart module API |
| src/sipnet/restart.c | New restart implementation with schema validation, boundary checks, and strict parsing |
| src/sipnet/frontend.c | Add `EVENTS_FILE` prefix handling when initializing events input |
| src/sipnet/events.h | Doc comment tweak for event input filename |
| src/sipnet/cli.c | Add `--events-file`, `--restart-in`, `--restart-out`; adjust `--file-name` to require an arg |
| src/common/context.h | Add context fields for events/restart paths |
| src/common/context.c | Add defaults/metadata for `EVENTS_FILE`, `RESTART_IN`, `RESTART_OUT` |
| restart_refactor_prompts.md | Added internal refactor/prompt canvas file |
| mkdocs.yml | Add restart checkpoint spec to documentation nav |
| docs/user-guide/running-sipnet.md | Document new CLI/config keys and restart constraints |
| docs/user-guide/model-inputs.md | Update user-facing option docs (events/restart) and config example |
| docs/developer-guide/restart-checkpoint.md | New developer spec for restart schema v1.0 and validation contract |
| docs/CHANGELOG.md | Changelog entry for restart checkpoints |
| Makefile | Add `restart.c` to SIPNET build sources |

Comments suppressed due to low confidence (2)

src/sipnet/restart.c:1115

processedStepCount is loaded from untrusted checkpoint data (processed_steps) and then incremented on every timestep. If a checkpoint sets processed_steps to a very large value (e.g., LLONG_MAX), ++processedStepCount will overflow (undefined behavior) on the next processed step. Fix by validating processed_steps is within a safe range on load (e.g., 0 <= processed_steps <= LLONG_MAX - 1), and/or guarding the increment in restartNoteProcessedClimateStep with an explicit overflow check that fails with EXIT_CODE_BAD_PARAMETER_VALUE.

void restartNoteProcessedClimateStep(const ClimateNode *climateStep) {
  copyClimateSignature(&lastProcessedClimate, climateStep);
  hasLastProcessedClimate = 1;
  ++processedStepCount;
}
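
The two guards suggested in this comment could look roughly like the sketch below. Both function names are hypothetical and chosen for illustration; the real code would presumably log and exit with `EXIT_CODE_BAD_PARAMETER_VALUE` instead of returning a status.

```c
#include <limits.h>

/* Hypothetical load-time check (illustration only): accept only counts
 * that are non-negative and can still be incremented at least once. */
static int processedStepsInSafeRange(long long steps) {
  return steps >= 0 && steps <= LLONG_MAX - 1;
}

/* Hypothetical increment-time check: returns 1 and bumps the counter,
 * or returns 0 and leaves it unchanged if ++ would overflow. */
static int incrementStepCount(long long *count) {
  if (*count == LLONG_MAX) {
    return 0; /* signed overflow would be undefined behavior */
  }
  ++*count;
  return 1;
}
```

Checking before the increment matters because signed overflow is undefined behavior in C, so testing the result afterwards is not reliable.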

tests/sipnet/test_restart_infrastructure/testRestartMVP.c:49

runModelWithArgs builds a shell command using sprintf into a fixed-size buffer. Even in test code this is brittle and can overflow if paths/args grow (and it also makes whitespace in filenames/args hard to handle). Prefer snprintf with explicit bounds checking, and consider avoiding shell concatenation when possible (e.g., fork/exec with argv) to make the test harness more robust.

static int runModelWithArgs(const char *inputFile, const char *logFile,
                            const char *extraArgs) {
  char cmd[1024];
  if (extraArgs != NULL && extraArgs[0] != '\0') {
    sprintf(cmd, "%s -i %s %s > %s 2>&1", SIPNET_CMD, inputFile, extraArgs,
            logFile);
  } else {
    sprintf(cmd, "%s -i %s > %s 2>&1", SIPNET_CMD, inputFile, logFile);
  }
  return runShell(cmd);
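
A bounded version of the formatting step might look like the following sketch. `buildCommand` is a hypothetical helper, and the executable path is passed in as a parameter here instead of using the `SIPNET_CMD` macro so that the example stands alone.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (illustration only): formats the shell command
 * into buf. Returns 1 on success, 0 if formatting failed or the command
 * did not fit in bufSize bytes (including the terminating NUL). */
static int buildCommand(char *buf, size_t bufSize, const char *exe,
                        const char *inputFile, const char *logFile,
                        const char *extraArgs) {
  int written;
  if (extraArgs != NULL && extraArgs[0] != '\0') {
    written = snprintf(buf, bufSize, "%s -i %s %s > %s 2>&1", exe, inputFile,
                       extraArgs, logFile);
  } else {
    written = snprintf(buf, bufSize, "%s -i %s > %s 2>&1", exe, inputFile,
                       logFile);
  }
  /* snprintf returns the length it wanted to write; a value >= bufSize
   * means the output was truncated. */
  return written >= 0 && (size_t)written < bufSize;
}
```

This only addresses the buffer overflow. Paths containing whitespace would still need quoting, or the fork/exec approach mentioned in the comment.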


Copilot · 2026-03-04T16:54:43Z

src/sipnet/restart.c

 validateCheckpointBoundaryForWrite(const char *restartOut,
                                   const RestartClimateSignature *boundary) {
  double stepHours = boundary->length * 24.0;
  if (stepHours <= RESTART_FLOAT_EPSILON) {
    logError("Cannot write restart checkpoint %s: non-positive timestep length "
             "at boundary (year=%d day=%d time=%.8f length=%.8f)\n",
             restartOut, boundary->year, boundary->day, boundary->time,
             boundary->length);
    exit(EXIT_CODE_BAD_PARAMETER_VALUE);
  }

  double hoursUntilMidnight = 24.0 - boundary->time;
  if (hoursUntilMidnight > (stepHours + RESTART_FLOAT_EPSILON)) {
    logError("Cannot write restart checkpoint %s: last timestep ends more than "
             "one timestep before midnight\n",
             restartOut);
    logError("Boundary timestep: year=%d day=%d time=%.8f length=%.8f\n",
             boundary->year, boundary->day, boundary->time, boundary->length);
    exit(EXIT_CODE_BAD_PARAMETER_VALUE);
  }


validateCheckpointBoundaryForWrite only checks length and the "within one timestep of midnight" condition, but does not validate that the stored boundary timestamp itself is sane (e.g., time in [0,24] and day in [1, daysInYear(year)]). As written, a tampered checkpoint with an out-of-range boundary.time/boundary.day can bypass the midnight-window logic (because 24.0 - time can go negative) and distort downstream boundary checks. Add explicit range validation for boundary.year/day/time (and keep the existing length > 0 check).
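
A range check of the kind requested here could be sketched as below. The helper names are hypothetical, and the leap-year rule is an assumption (Gregorian); SIPNET's own day-of-year logic may differ and should be used instead if it exists. No range is checked for the year itself because the comment does not say what range is valid.

```c
/* Assumption: Gregorian leap-year rule. SIPNET's own calendar helper,
 * if any, should take precedence over this one. */
static int isLeapYear(int year) {
  return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/* Hypothetical check (illustration only): returns 1 if day is a valid
 * day of year for `year` and time is within [0, 24] hours, else 0. */
static int boundaryTimestampInRange(int year, int day, double time) {
  int daysInYear = isLeapYear(year) ? 366 : 365;
  if (day < 1 || day > daysInYear) {
    return 0;
  }
  /* Written as a negated conjunction so that NaN is rejected too. */
  if (!(time >= 0.0 && time <= 24.0)) {
    return 0;
  }
  return 1;
}
```

Calling this before the midnight-window test means `24.0 - boundary->time` can no longer go negative.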

Copilot · 2026-03-04T16:54:44Z

src/sipnet/restart.c

+validateCheckpointBoundaryForLoad(const char *restartIn,
+                                  const RestartClimateSignature *boundary) {
+  double stepHours = boundary->length * 24.0;
+  if (stepHours <= RESTART_FLOAT_EPSILON) {
+    logError("Restart boundary mismatch in %s: checkpoint boundary has "
+             "non-positive timestep length (year=%d day=%d time=%.8f "
+             "length=%.8f)\n",
+             restartIn, boundary->year, boundary->day, boundary->time,
+             boundary->length);
+    exit(EXIT_CODE_BAD_PARAMETER_VALUE);
+  }
+
+  double hoursUntilMidnight = 24.0 - boundary->time;
+  if (hoursUntilMidnight > (stepHours + RESTART_FLOAT_EPSILON)) {
+    logError(
+        "Restart boundary mismatch in %s: checkpoint boundary is more than "
+        "one timestep before midnight\n",
+        restartIn);
+    logError("Checkpoint boundary: year=%d day=%d time=%.8f length=%.8f\n",
+             boundary->year, boundary->day, boundary->time, boundary->length);
+    exit(EXIT_CODE_BAD_PARAMETER_VALUE);
+  }
+}


validateCheckpointBoundaryForLoad has the same issue as the write-side validation: it doesn't reject out-of-range boundary.time/boundary.day values. A checkpoint with boundary.time > 24 makes hoursUntilMidnight negative, which will incorrectly pass the "near midnight" check. Please validate boundary.time is within [0,24] and boundary.day is within the valid day-of-year range for boundary.year before applying the midnight-window check.
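
To make the bypass concrete, the sketch below mirrors the midnight-window arithmetic from the excerpt and compares it with a guarded variant. The epsilon value and both function names are assumptions for illustration; the real `RESTART_FLOAT_EPSILON` is defined in SIPNET.

```c
/* Assumed value for illustration; not SIPNET's actual epsilon. */
#define EXAMPLE_EPSILON 1e-6

/* Mirrors the excerpt's test: returns 1 when the boundary counts as
 * "near midnight", 0 when it would be rejected. No range check. */
static int nearMidnightUnchecked(double time, double lengthDays) {
  double stepHours = lengthDays * 24.0;
  double hoursUntilMidnight = 24.0 - time;
  return !(hoursUntilMidnight > stepHours + EXAMPLE_EPSILON);
}

/* Same test, but an out-of-range time is rejected first. */
static int nearMidnightChecked(double time, double lengthDays) {
  if (!(time >= 0.0 && time <= 24.0)) {
    return 0;
  }
  return nearMidnightUnchecked(time, lengthDays);
}
```

With a 12-hour step (`lengthDays = 0.5`), a tampered `time = 30.0` gives `hoursUntilMidnight = -6.0`, which is not greater than the step length, so the unchecked version accepts it while the checked version rejects it.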

dlebauer · 2026-03-08T06:11:41Z

Folded into #276 and pushed on codex/restart-mvp-master, so closing this stacked follow-up as superseded.

Harden restart checkpoint validation and regression tests

a691392

dlebauer requested review from Alomir, mdietze and mswilburn as code owners March 4, 2026 16:41

Copilot AI review requested due to automatic review settings March 4, 2026 16:41

Copilot started reviewing on behalf of dlebauer March 4, 2026 16:41 View session

dlebauer changed the base branch from master to codex/restart-mvp-master March 4, 2026 16:42

Copilot AI reviewed Mar 4, 2026

dlebauer changed the title ~~Harden restart checkpoint validation and parser overflow handling~~ SIP276 Harden restart checkpoint validation and parser overflow handling Mar 4, 2026

dlebauer closed this Mar 8, 2026

dlebauer mentioned this pull request Mar 8, 2026

SIP279 SIPNET Restart MVP #276

Open

6 tasks

SIP276 Harden restart checkpoint validation and parser overflow handling #283
dlebauer wants to merge 1 commit into codex/restart-mvp-master from codex/restart-contract-gaps
