SIP276 Harden restart checkpoint validation and parser overflow handling#283
SIP276 Harden restart checkpoint validation and parser overflow handling#283dlebauer wants to merge 1 commit intocodex/restart-mvp-masterfrom
Conversation
Cpp-Linter Report
|
There was a problem hiding this comment.
Pull request overview
Implements the red-team audit “restart contract-gap” fixes by tightening restart checkpoint boundary validation, hardening strict integer parsing against overflow, and expanding restart infrastructure tests/docs to lock in the contract.
Changes:
- Add load-time checkpoint boundary validation and stricter restart parsing (including
strtolloverflow rejection). - Move/track cumulative GDD via
trackersfor restart continuity, and integrate restart load/write into the main run loop. - Add/refresh restart infrastructure tests + fixtures and update developer/user docs for the restart schema/constraints.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/sipnet/test_restart_infrastructure/testRestartMVP.c | New end-to-end restart infrastructure test suite covering boundary/overflow/tamper cases |
| tests/sipnet/test_restart_infrastructure/restart_segment2_late.clim | Fixture for “restart must start near midnight” validation |
| tests/sipnet/test_restart_infrastructure/restart_segment2_bad.clim | Fixture for strict climate mismatch failure |
| tests/sipnet/test_restart_infrastructure/restart_segment2.clim | Fixture for valid segment-2 climate forcing |
| tests/sipnet/test_restart_infrastructure/restart_segment1_not_midnight.clim | Fixture for invalid checkpoint boundary not near midnight |
| tests/sipnet/test_restart_infrastructure/restart_segment1.clim | Fixture for valid segment-1 climate forcing |
| tests/sipnet/test_restart_infrastructure/restart_seg2_bad.in | Input for running segment 2 with mismatched climate fixture |
| tests/sipnet/test_restart_infrastructure/restart_seg2.in | Input for running segment 2 from checkpoint |
| tests/sipnet/test_restart_infrastructure/restart_seg1.in | Input for running segment 1 and writing checkpoint |
| tests/sipnet/test_restart_infrastructure/restart_full.clim | Fixture for continuous baseline run |
| tests/sipnet/test_restart_infrastructure/restart_cont.in | Input for continuous baseline run |
| tests/sipnet/test_restart_infrastructure/restart.param | Parameter fixture for restart tests |
| tests/sipnet/test_restart_infrastructure/norestart_b.in | No-restart mode fixture |
| tests/sipnet/test_restart_infrastructure/norestart_a.in | No-restart mode fixture |
| tests/sipnet/test_restart_infrastructure/events_segment2.in | Segmented events fixture for segment 2 |
| tests/sipnet/test_restart_infrastructure/events_segment1.in | Segmented events fixture for segment 1 |
| tests/sipnet/test_restart_infrastructure/events_base.in | Baseline events fixture for continuous run |
| tests/sipnet/test_restart_infrastructure/Makefile | Build/run harness for restart infrastructure tests |
| src/sipnet/state.h | Update GDD semantics and add tracker fields for cumulative GDD continuity |
| src/sipnet/sipnet.c | Integrate restart load/write, processed-step tracking, and GDD logic changes |
| src/sipnet/restart.h | New restart module API |
| src/sipnet/restart.c | New restart implementation with schema validation, boundary checks, and strict parsing |
| src/sipnet/frontend.c | Add EVENTS_FILE prefix handling when initializing events input |
| src/sipnet/events.h | Doc comment tweak for event input filename |
| src/sipnet/cli.c | Add --events-file, --restart-in, --restart-out; adjust --file-name to require an arg |
| src/common/context.h | Add context fields for events/restart paths |
| src/common/context.c | Add defaults/metadata for EVENTS_FILE, RESTART_IN, RESTART_OUT |
| restart_refactor_prompts.md | Added internal refactor/prompt canvas file |
| mkdocs.yml | Add restart checkpoint spec to documentation nav |
| docs/user-guide/running-sipnet.md | Document new CLI/config keys and restart constraints |
| docs/user-guide/model-inputs.md | Update user-facing option docs (events/restart) and config example |
| docs/developer-guide/restart-checkpoint.md | New developer spec for restart schema v1.0 and validation contract |
| docs/CHANGELOG.md | Changelog entry for restart checkpoints |
| Makefile | Add restart.c to SIPNET build sources |
Comments suppressed due to low confidence (2)
src/sipnet/restart.c:1115
processedStepCountis loaded from untrusted checkpoint data (processed_steps) and then incremented on every timestep. If a checkpoint setsprocessed_stepsto a very large value (e.g.,LLONG_MAX),++processedStepCountwill overflow (undefined behavior) on the next processed step. Fix by validatingprocessed_stepsis within a safe range on load (e.g.,0 <= processed_steps <= LLONG_MAX - 1), and/or guarding the increment inrestartNoteProcessedClimateStepwith an explicit overflow check that fails withEXIT_CODE_BAD_PARAMETER_VALUE.
void restartNoteProcessedClimateStep(const ClimateNode *climateStep) {
copyClimateSignature(&lastProcessedClimate, climateStep);
hasLastProcessedClimate = 1;
++processedStepCount;
}
tests/sipnet/test_restart_infrastructure/testRestartMVP.c:49
runModelWithArgsbuilds a shell command usingsprintfinto a fixed-size buffer. Even in test code this is brittle and can overflow if paths/args grow (and it also makes whitespace in filenames/args hard to handle). Prefersnprintfwith explicit bounds checking, and consider avoiding shell concatenation when possible (e.g.,fork/execwith argv) to make the test harness more robust.
static int runModelWithArgs(const char *inputFile, const char *logFile,
const char *extraArgs) {
char cmd[1024];
if (extraArgs != NULL && extraArgs[0] != '\0') {
sprintf(cmd, "%s -i %s %s > %s 2>&1", SIPNET_CMD, inputFile, extraArgs,
logFile);
} else {
sprintf(cmd, "%s -i %s > %s 2>&1", SIPNET_CMD, inputFile, logFile);
}
return runShell(cmd);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| validateCheckpointBoundaryForWrite(const char *restartOut, | ||
| const RestartClimateSignature *boundary) { | ||
| double stepHours = boundary->length * 24.0; | ||
| if (stepHours <= RESTART_FLOAT_EPSILON) { | ||
| logError("Cannot write restart checkpoint %s: non-positive timestep length " | ||
| "at boundary (year=%d day=%d time=%.8f length=%.8f)\n", | ||
| restartOut, boundary->year, boundary->day, boundary->time, | ||
| boundary->length); | ||
| exit(EXIT_CODE_BAD_PARAMETER_VALUE); | ||
| } | ||
|
|
||
| double hoursUntilMidnight = 24.0 - boundary->time; | ||
| if (hoursUntilMidnight > (stepHours + RESTART_FLOAT_EPSILON)) { | ||
| logError("Cannot write restart checkpoint %s: last timestep ends more than " | ||
| "one timestep before midnight\n", | ||
| restartOut); | ||
| logError("Boundary timestep: year=%d day=%d time=%.8f length=%.8f\n", | ||
| boundary->year, boundary->day, boundary->time, boundary->length); | ||
| exit(EXIT_CODE_BAD_PARAMETER_VALUE); | ||
| } |
There was a problem hiding this comment.
validateCheckpointBoundaryForWrite only checks length and the "within one timestep of midnight" condition, but does not validate that the stored boundary timestamp itself is sane (e.g., time in [0,24] and day in [1, daysInYear(year)]). As written, a tampered checkpoint with an out-of-range boundary.time/boundary.day can bypass the midnight-window logic (because 24.0 - time can go negative) and distort downstream boundary checks. Add explicit range validation for boundary.year/day/time (and keep the existing length > 0 check).
| validateCheckpointBoundaryForLoad(const char *restartIn, | ||
| const RestartClimateSignature *boundary) { | ||
| double stepHours = boundary->length * 24.0; | ||
| if (stepHours <= RESTART_FLOAT_EPSILON) { | ||
| logError("Restart boundary mismatch in %s: checkpoint boundary has " | ||
| "non-positive timestep length (year=%d day=%d time=%.8f " | ||
| "length=%.8f)\n", | ||
| restartIn, boundary->year, boundary->day, boundary->time, | ||
| boundary->length); | ||
| exit(EXIT_CODE_BAD_PARAMETER_VALUE); | ||
| } | ||
|
|
||
| double hoursUntilMidnight = 24.0 - boundary->time; | ||
| if (hoursUntilMidnight > (stepHours + RESTART_FLOAT_EPSILON)) { | ||
| logError( | ||
| "Restart boundary mismatch in %s: checkpoint boundary is more than " | ||
| "one timestep before midnight\n", | ||
| restartIn); | ||
| logError("Checkpoint boundary: year=%d day=%d time=%.8f length=%.8f\n", | ||
| boundary->year, boundary->day, boundary->time, boundary->length); | ||
| exit(EXIT_CODE_BAD_PARAMETER_VALUE); | ||
| } | ||
| } |
There was a problem hiding this comment.
validateCheckpointBoundaryForLoad has the same issue as the write-side validation: it doesn't reject out-of-range boundary.time/boundary.day values. A checkpoint with boundary.time > 24 makes hoursUntilMidnight negative, which will incorrectly pass the "near midnight" check. Please validate boundary.time is within [0,24] and boundary.day is within the valid day-of-year range for boundary.year before applying the midnight-window check.
|
Folded into #276 and pushed on codex/restart-mvp-master, so closing this stacked follow-up as superseded. |
Summary
long longrestart metadata values. Update restart infrastructure tests to cover both regressions, alignmean.npp.*schema assertions, and document serialized tracker payload intent in the developer restart spec.How was this change tested?
List steps taken to test this change, with appropriate outputs if applicable
make sipnetmake -C tests/sipnet/test_restart_infrastructure testsmake -C tests/sipnet/test_restart_infrastructure runPASSED testRestartMVPNo
tests/smoke/**/sipnet.outfiles were changed in this PR.Reproduction steps
If appropriate, list steps to reproduce the change locally
codex/restart-contract-gaps.make sipnetmake -C tests/sipnet/test_restart_infrastructure testsmake -C tests/sipnet/test_restart_infrastructure runtestRestartMVP.c:testTamperedBoundaryNotNearMidnightFailstestProcessedStepsOverflowFailsRelated issues
276)Checklist
docs/CHANGELOG.mdupdated with noteworthy changesclang-format(rungit clang-formatif needed)