
SIP276 Harden restart checkpoint validation and parser overflow handling #283

Closed
dlebauer wants to merge 1 commit into codex/restart-mvp-master from codex/restart-contract-gaps

dlebauer (Member) commented Mar 4, 2026

Summary

  • What: Harden restart checkpoint handling by (1) validating loaded checkpoint boundaries against the midnight-window contract and (2) rejecting overflowed long long restart metadata values. Update restart infrastructure tests to cover both regressions, align mean.npp.* schema assertions, and document serialized tracker payload intent in the developer restart spec.
  • Motivation: Address contract gaps from the restart refactor review so invalid/tampered restart files fail fast instead of resuming silently.
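
The overflow rejection described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in restart.c: the helper name parseStrictLongLong is hypothetical, and it assumes the metadata value arrives as a NUL-terminated decimal string. The key point is that strtoll clamps out-of-range input and reports it only through errno, so errno has to be cleared before the call and checked afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not the function in restart.c: parse a base-10
 * long long strictly. Returns 1 on success and stores the value in *out;
 * returns 0 on empty input, trailing characters, or overflow. */
static int parseStrictLongLong(const char *text, long long *out) {
  assert(text != NULL && out != NULL);
  char *end = NULL;
  errno = 0; /* strtoll reports range errors only through errno */
  long long value = strtoll(text, &end, 10);
  if (end == text) {
    return 0; /* no digits consumed */
  }
  if (*end != '\0') {
    return 0; /* trailing characters such as "12abc" */
  }
  if (errno == ERANGE) {
    return 0; /* value was clamped to LLONG_MIN or LLONG_MAX */
  }
  *out = value;
  return 1;
}
```

Note that strtoll itself skips leading whitespace, so a stricter contract would need an extra check on the first character.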

How was this change tested?

List steps taken to test this change, with appropriate outputs if applicable

  • make sipnet
  • make -C tests/sipnet/test_restart_infrastructure tests
  • make -C tests/sipnet/test_restart_infrastructure run
  • Output: PASSED testRestartMVP

No tests/smoke/**/sipnet.out files were changed in this PR.

Reproduction steps

If appropriate, list steps to reproduce the change locally

  1. Check out branch codex/restart-contract-gaps.
  2. Build and run restart infrastructure tests:
    • make sipnet
    • make -C tests/sipnet/test_restart_infrastructure tests
    • make -C tests/sipnet/test_restart_infrastructure run
  3. Confirm added regressions in testRestartMVP.c:
    • testTamperedBoundaryNotNearMidnightFails
    • testProcessedStepsOverflowFails

Related issues

Checklist

  • Related issues are listed above. PRs without an approved, related issue may not get reviewed.
  • PR title has the issue number in it ("[#] ")
  • Tests added/updated for new features (if applicable)
  • Documentation updated (if applicable)
  • docs/CHANGELOG.md updated with noteworthy changes
  • Code formatted with clang-format (run git clang-format if needed)

Copilot AI review requested due to automatic review settings March 4, 2026 16:41

github-actions bot commented Mar 4, 2026

Cpp-Linter Report ⚠️

Some files did not pass the configured checks!

clang-tidy (v19.1.1) reports: 1 concern(s)
  • src/sipnet/frontend.c:212:5: error: [clang-analyzer-core.NonNullParamChecker]

    Null pointer passed to 1st parameter expecting 'nonnull'

      212 |     fclose(out);
          |     ^      ~~~
    /home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:165:7: note: Assuming field 'doMainOutput' is 0
      165 |   if (ctx.doMainOutput) {
          |       ^~~~~~~~~~~~~~~~
    /home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:165:3: note: Taking false branch
      165 |   if (ctx.doMainOutput) {
          |   ^
    /home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:171:5: note: Null pointer value stored to 'out'
      171 |     out = NULL;
          |     ^~~~~~~~~~
    /home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:175:7: note: Assuming field 'dumpConfig' is 0
      175 |   if (ctx.dumpConfig) {
          |       ^~~~~~~~~~~~~~
    /home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:175:3: note: Taking false branch
      175 |   if (ctx.dumpConfig) {
          |   ^
    /home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:189:7: note: Assuming field 'events' is 0
      189 |   if (ctx.events) {
          |       ^~~~~~~~~~
    /home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:189:3: note: Taking false branch
      189 |   if (ctx.events) {
          |   ^
    /home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:200:7: note: Assuming field 'doSingleOutputs' is 0
      200 |   if (ctx.doSingleOutputs) {
          |       ^~~~~~~~~~~~~~~~~~~
    /home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:200:3: note: Taking false branch
      200 |   if (ctx.doSingleOutputs) {
          |   ^
    /home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:211:7: note: Assuming field 'doMainOutput' is not equal to 0
      211 |   if (ctx.doMainOutput) {
          |       ^~~~~~~~~~~~~~~~
    /home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:211:3: note: Taking true branch
      211 |   if (ctx.doMainOutput) {
          |   ^
    /home/runner/work/sipnet/sipnet/src/sipnet/frontend.c:212:5: note: Null pointer passed to 1st parameter expecting 'nonnull'
      212 |     fclose(out);
          |     ^      ~~~
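
One common way to address this kind of finding is to key the close on the pointer itself rather than on the flag that was tested when the file was opened. The sketch below is only an illustration of that idea: closeIfOpen is a hypothetical helper, not code from frontend.c, and whether the analyzer path is reachable in practice is not established by the report.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not code from frontend.c: close a stream only if it
 * was actually opened. Returns 0 when there was nothing to close, otherwise
 * the result of fclose(). */
static int closeIfOpen(FILE **stream) {
  assert(stream != NULL);
  if (*stream == NULL) {
    return 0; /* never opened, or already closed */
  }
  int status = fclose(*stream);
  *stream = NULL; /* prevent a second close through the same variable */
  return status;
}
```

With a helper like this, the call site would become closeIfOpen(&out), which never hands a null pointer to fclose regardless of how the flags evolve.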


dlebauer changed the base branch from master to codex/restart-mvp-master March 4, 2026 16:42
Copilot AI (Contributor) left a comment

Pull request overview

Implements the red-team audit “restart contract-gap” fixes by tightening restart checkpoint boundary validation, hardening strict integer parsing against overflow, and expanding restart infrastructure tests/docs to lock in the contract.

Changes:

  • Add load-time checkpoint boundary validation and stricter restart parsing (including strtoll overflow rejection).
  • Move/track cumulative GDD via trackers for restart continuity, and integrate restart load/write into the main run loop.
  • Add/refresh restart infrastructure tests + fixtures and update developer/user docs for the restart schema/constraints.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/sipnet/test_restart_infrastructure/testRestartMVP.c | New end-to-end restart infrastructure test suite covering boundary/overflow/tamper cases |
| tests/sipnet/test_restart_infrastructure/restart_segment2_late.clim | Fixture for “restart must start near midnight” validation |
| tests/sipnet/test_restart_infrastructure/restart_segment2_bad.clim | Fixture for strict climate mismatch failure |
| tests/sipnet/test_restart_infrastructure/restart_segment2.clim | Fixture for valid segment-2 climate forcing |
| tests/sipnet/test_restart_infrastructure/restart_segment1_not_midnight.clim | Fixture for invalid checkpoint boundary not near midnight |
| tests/sipnet/test_restart_infrastructure/restart_segment1.clim | Fixture for valid segment-1 climate forcing |
| tests/sipnet/test_restart_infrastructure/restart_seg2_bad.in | Input for running segment 2 with mismatched climate fixture |
| tests/sipnet/test_restart_infrastructure/restart_seg2.in | Input for running segment 2 from checkpoint |
| tests/sipnet/test_restart_infrastructure/restart_seg1.in | Input for running segment 1 and writing checkpoint |
| tests/sipnet/test_restart_infrastructure/restart_full.clim | Fixture for continuous baseline run |
| tests/sipnet/test_restart_infrastructure/restart_cont.in | Input for continuous baseline run |
| tests/sipnet/test_restart_infrastructure/restart.param | Parameter fixture for restart tests |
| tests/sipnet/test_restart_infrastructure/norestart_b.in | No-restart mode fixture |
| tests/sipnet/test_restart_infrastructure/norestart_a.in | No-restart mode fixture |
| tests/sipnet/test_restart_infrastructure/events_segment2.in | Segmented events fixture for segment 2 |
| tests/sipnet/test_restart_infrastructure/events_segment1.in | Segmented events fixture for segment 1 |
| tests/sipnet/test_restart_infrastructure/events_base.in | Baseline events fixture for continuous run |
| tests/sipnet/test_restart_infrastructure/Makefile | Build/run harness for restart infrastructure tests |
| src/sipnet/state.h | Update GDD semantics and add tracker fields for cumulative GDD continuity |
| src/sipnet/sipnet.c | Integrate restart load/write, processed-step tracking, and GDD logic changes |
| src/sipnet/restart.h | New restart module API |
| src/sipnet/restart.c | New restart implementation with schema validation, boundary checks, and strict parsing |
| src/sipnet/frontend.c | Add EVENTS_FILE prefix handling when initializing events input |
| src/sipnet/events.h | Doc comment tweak for event input filename |
| src/sipnet/cli.c | Add --events-file, --restart-in, --restart-out; adjust --file-name to require an arg |
| src/common/context.h | Add context fields for events/restart paths |
| src/common/context.c | Add defaults/metadata for EVENTS_FILE, RESTART_IN, RESTART_OUT |
| restart_refactor_prompts.md | Added internal refactor/prompt canvas file |
| mkdocs.yml | Add restart checkpoint spec to documentation nav |
| docs/user-guide/running-sipnet.md | Document new CLI/config keys and restart constraints |
| docs/user-guide/model-inputs.md | Update user-facing option docs (events/restart) and config example |
| docs/developer-guide/restart-checkpoint.md | New developer spec for restart schema v1.0 and validation contract |
| docs/CHANGELOG.md | Changelog entry for restart checkpoints |
| Makefile | Add restart.c to SIPNET build sources |
Comments suppressed due to low confidence (2)

src/sipnet/restart.c:1115

  • processedStepCount is loaded from untrusted checkpoint data (processed_steps) and then incremented on every timestep. If a checkpoint sets processed_steps to a very large value (e.g., LLONG_MAX), ++processedStepCount will overflow (undefined behavior) on the next processed step. Fix by validating processed_steps is within a safe range on load (e.g., 0 <= processed_steps <= LLONG_MAX - 1), and/or guarding the increment in restartNoteProcessedClimateStep with an explicit overflow check that fails with EXIT_CODE_BAD_PARAMETER_VALUE.
void restartNoteProcessedClimateStep(const ClimateNode *climateStep) {
  copyClimateSignature(&lastProcessedClimate, climateStep);
  hasLastProcessedClimate = 1;
  ++processedStepCount;
}

tests/sipnet/test_restart_infrastructure/testRestartMVP.c:49

  • runModelWithArgs builds a shell command using sprintf into a fixed-size buffer. Even in test code this is brittle and can overflow if paths/args grow (and it also makes whitespace in filenames/args hard to handle). Prefer snprintf with explicit bounds checking, and consider avoiding shell concatenation when possible (e.g., fork/exec with argv) to make the test harness more robust.
static int runModelWithArgs(const char *inputFile, const char *logFile,
                            const char *extraArgs) {
  char cmd[1024];
  if (extraArgs != NULL && extraArgs[0] != '\0') {
    sprintf(cmd, "%s -i %s %s > %s 2>&1", SIPNET_CMD, inputFile, extraArgs,
            logFile);
  } else {
    sprintf(cmd, "%s -i %s > %s 2>&1", SIPNET_CMD, inputFile, logFile);
  }
  return runShell(cmd);
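
A bounds-checked variant of the command construction might look like the following sketch. buildModelCommand is a hypothetical name, and the program path is passed as a parameter here instead of using the SIPNET_CMD macro so the example compiles on its own. It addresses only the buffer-overflow concern; quoting of paths with whitespace would still need separate handling or a fork/exec approach.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant of the command builder, not the code in
 * testRestartMVP.c. Writes the command into cmd (capacity cmdSize) and
 * returns 1 on success, 0 if formatting failed or the result was truncated. */
static int buildModelCommand(char *cmd, size_t cmdSize, const char *program,
                             const char *inputFile, const char *extraArgs,
                             const char *logFile) {
  assert(cmd != NULL && cmdSize > 0);
  int written;
  if (extraArgs != NULL && extraArgs[0] != '\0') {
    written = snprintf(cmd, cmdSize, "%s -i %s %s > %s 2>&1", program,
                       inputFile, extraArgs, logFile);
  } else {
    written = snprintf(cmd, cmdSize, "%s -i %s > %s 2>&1", program, inputFile,
                       logFile);
  }
  /* snprintf returns the length it wanted to write; a value >= cmdSize
   * means the output was truncated and must not be executed. */
  return written >= 0 && (size_t)written < cmdSize;
}
```

A caller would check the return value and fail the test setup instead of passing a truncated command to the shell.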


Comment on lines 160 to 179
validateCheckpointBoundaryForWrite(const char *restartOut,
                                   const RestartClimateSignature *boundary) {
  double stepHours = boundary->length * 24.0;
  if (stepHours <= RESTART_FLOAT_EPSILON) {
    logError("Cannot write restart checkpoint %s: non-positive timestep length "
             "at boundary (year=%d day=%d time=%.8f length=%.8f)\n",
             restartOut, boundary->year, boundary->day, boundary->time,
             boundary->length);
    exit(EXIT_CODE_BAD_PARAMETER_VALUE);
  }

  double hoursUntilMidnight = 24.0 - boundary->time;
  if (hoursUntilMidnight > (stepHours + RESTART_FLOAT_EPSILON)) {
    logError("Cannot write restart checkpoint %s: last timestep ends more than "
             "one timestep before midnight\n",
             restartOut);
    logError("Boundary timestep: year=%d day=%d time=%.8f length=%.8f\n",
             boundary->year, boundary->day, boundary->time, boundary->length);
    exit(EXIT_CODE_BAD_PARAMETER_VALUE);
  }

Copilot AI Mar 4, 2026

validateCheckpointBoundaryForWrite only checks length and the "within one timestep of midnight" condition, but does not validate that the stored boundary timestamp itself is sane (e.g., time in [0,24] and day in [1, daysInYear(year)]). As written, a tampered checkpoint with an out-of-range boundary.time/boundary.day can bypass the midnight-window logic (because 24.0 - time can go negative) and distort downstream boundary checks. Add explicit range validation for boundary.year/day/time (and keep the existing length > 0 check).

Comment on lines +183 to +205
validateCheckpointBoundaryForLoad(const char *restartIn,
                                  const RestartClimateSignature *boundary) {
  double stepHours = boundary->length * 24.0;
  if (stepHours <= RESTART_FLOAT_EPSILON) {
    logError("Restart boundary mismatch in %s: checkpoint boundary has "
             "non-positive timestep length (year=%d day=%d time=%.8f "
             "length=%.8f)\n",
             restartIn, boundary->year, boundary->day, boundary->time,
             boundary->length);
    exit(EXIT_CODE_BAD_PARAMETER_VALUE);
  }

  double hoursUntilMidnight = 24.0 - boundary->time;
  if (hoursUntilMidnight > (stepHours + RESTART_FLOAT_EPSILON)) {
    logError(
        "Restart boundary mismatch in %s: checkpoint boundary is more than "
        "one timestep before midnight\n",
        restartIn);
    logError("Checkpoint boundary: year=%d day=%d time=%.8f length=%.8f\n",
             boundary->year, boundary->day, boundary->time, boundary->length);
    exit(EXIT_CODE_BAD_PARAMETER_VALUE);
  }
}

Copilot AI Mar 4, 2026

validateCheckpointBoundaryForLoad has the same issue as the write-side validation: it doesn't reject out-of-range boundary.time/boundary.day values. A checkpoint with boundary.time > 24 makes hoursUntilMidnight negative, which will incorrectly pass the "near midnight" check. Please validate boundary.time is within [0,24] and boundary.day is within the valid day-of-year range for boundary.year before applying the midnight-window check.
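
The bypass described in this comment can be reproduced with a small sketch that mirrors the quoted predicate. SKETCH_EPSILON is an assumed value (the real RESTART_FLOAT_EPSILON may differ), both function names are hypothetical, and the functions return a status instead of calling exit. With a 3-hour timestep (length = 0.125 days), a tampered time of 30.0 gives hoursUntilMidnight = -6.0, which is not greater than 3.0, so the unguarded check accepts it.

```c
#include <assert.h>

/* Assumed value for illustration; the real RESTART_FLOAT_EPSILON is defined
 * in the restart module and may differ. */
#define SKETCH_EPSILON 1e-9

/* Mirrors the midnight-window predicate quoted above: returns 1 when the
 * boundary is accepted as "near midnight". time is in hours, length in
 * days; the length > 0 check is assumed to have run already. */
static int passesMidnightWindow(double time, double length) {
  assert(length > 0.0);
  double stepHours = length * 24.0;
  double hoursUntilMidnight = 24.0 - time;
  return !(hoursUntilMidnight > (stepHours + SKETCH_EPSILON));
}

/* Same predicate with the suggested range guard applied first. */
static int passesMidnightWindowGuarded(double time, double length) {
  if (!(time >= 0.0 && time <= 24.0)) {
    return 0; /* out-of-range or NaN hour of day */
  }
  return passesMidnightWindow(time, length);
}
```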

dlebauer changed the title from "Harden restart checkpoint validation and parser overflow handling" to "SIP276 Harden restart checkpoint validation and parser overflow handling" Mar 4, 2026
dlebauer (Member, Author) commented Mar 8, 2026

Folded into #276 and pushed on codex/restart-mvp-master, so closing this stacked follow-up as superseded.

dlebauer closed this Mar 8, 2026
dlebauer mentioned this pull request Mar 8, 2026