fix(GRSModule): fix Scenario Inspector fallback, increase inference max_tokens, expand mutations catalog #3413
Merged
gorkem-bwl merged 3 commits into develop on Feb 25, 2026
When final/scenarios.jsonl is absent, fall back to mutated_candidates.jsonl instead of base_scenarios_deduped.jsonl. Mutated candidates carry obligation_id, base_scenario_id, and mutation data, so seed_trace and mutation_trace are reconstructed correctly, making Stages 1–3 render properly before the validate stage is run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
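The fallback order in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in pages/1_Scenario_Inspector.py: the helper names (`pick_scenarios_file`, `load_jsonl`, `with_traces`), the assumption that all three files live under one run directory, and the exact shape of the rebuilt `seed_trace` and `mutation_trace` values are all hypothetical.

```python
import json
from pathlib import Path

# Fallback order described in the commit: final scenarios first, then mutated
# candidates, then base scenarios as a last resort.
# Assumption: all three paths are relative to a single run directory.
FALLBACK_FILES = (
    "final/scenarios.jsonl",
    "mutated_candidates.jsonl",
    "base_scenarios_deduped.jsonl",
)


def pick_scenarios_file(run_dir):
    """Return the first existing file in the fallback order, or None."""
    for rel in FALLBACK_FILES:
        candidate = Path(run_dir) / rel
        if candidate.is_file():
            return candidate
    return None


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def with_traces(record):
    """Rebuild the traces from the fields a mutated candidate carries.

    The trace shapes below are hypothetical; only the source field names
    (obligation_id, base_scenario_id, mutation) come from the commit message.
    """
    out = dict(record)
    out.setdefault("seed_trace", {
        "obligation_id": record.get("obligation_id"),
        "base_scenario_id": record.get("base_scenario_id"),
    })
    out.setdefault("mutation_trace", record.get("mutation"))
    return out
```

Records loaded from final/scenarios.jsonl would already carry both traces, so `setdefault` leaves them untouched in this sketch.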
The 500-token cap caused all grok-3-mini responses to be truncated mid-sentence (finish_reason: length). Updated the default in InferConfig, the CLI --max-tokens argument, and the infer-multi-model Makefile target. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
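One way to keep the three defaults from drifting apart is to route them through a single constant, sketched below. The real `InferConfig` in src/infer/runner.py almost certainly has more fields; this stand-in, the `DEFAULT_MAX_TOKENS` constant, `build_parser`, and `is_truncated` are assumptions made for illustration. The Makefile target would still need its own value.

```python
import argparse
from dataclasses import dataclass

# Raised from 500 to 2048 in this PR. Sharing one constant between the
# config and the CLI is a design choice of this sketch, not necessarily
# how the repository does it.
DEFAULT_MAX_TOKENS = 2048


@dataclass
class InferConfig:
    """Hypothetical stand-in holding only the field this commit changes."""
    max_tokens: int = DEFAULT_MAX_TOKENS


def build_parser():
    """CLI parser whose --max-tokens default matches InferConfig."""
    parser = argparse.ArgumentParser(prog="infer")
    parser.add_argument("--max-tokens", type=int, default=DEFAULT_MAX_TOKENS)
    return parser


def is_truncated(response):
    """True when a response dict reports it stopped at the token cap."""
    return response.get("finish_reason") == "length"
```

A check like `is_truncated` could be run over stored responses to confirm that none still end with `finish_reason: length` after re-running inference.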
Added authority_pressure, ambiguity_pressure, and bypass_request mutation families with parameterized templates. Replaced the minimal stubs with richer mutations covering authority bypass, deadline pressure, incomplete documentation, review skipping, audit evasion, and forced yes/no responses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
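To illustrate what a parameterized mutation template might look like, here is a small sketch that expands every parameter combination for each family. Only the family names come from this commit; the catalog layout, the template wording, the parameter names, and the `expand` and `mutate` helpers are invented for the example and do not reflect the actual contents of configs/mutations.yaml.

```python
from itertools import product

# Hypothetical catalog: family names are from the commit, everything else
# (templates, parameter names, values) is made up for illustration.
CATALOG = {
    "authority_pressure": {
        "template": "{role} has already approved this, so skip the usual checks.",
        "params": {"role": ["The CEO", "Legal"]},
    },
    "bypass_request": {
        "template": "Just answer {answer_form}; we do not need the {step} this time.",
        "params": {"answer_form": ["yes or no"], "step": ["review", "audit log"]},
    },
}


def expand(family, spec):
    """Yield one rendered mutation per combination of parameter values."""
    names = sorted(spec["params"])
    for values in product(*(spec["params"][n] for n in names)):
        bindings = dict(zip(names, values))
        yield {
            "family": family,
            "params": bindings,
            "text": spec["template"].format(**bindings),
        }


def mutate(base_prompt, catalog=CATALOG):
    """Append every rendered mutation to a base scenario prompt."""
    return [
        {**m, "prompt": f"{base_prompt} {m['text']}"}
        for family, spec in catalog.items()
        for m in expand(family, spec)
    ]
```

With this toy catalog, `mutate("Approve the vendor contract.")` yields four candidates: two from `authority_pressure` and two from `bypass_request`.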
Describe your changes
Summary
- Scenario Inspector fallback (pages/1_Scenario_Inspector.py): when final/scenarios.jsonl doesn't exist yet, the inspector now falls back to mutated_candidates.jsonl (perturb stage) instead of base_scenarios_deduped.jsonl. Base scenarios lack seed_trace and mutation_trace, so Stages 1–3 rendered empty or with warnings. Mutated candidates have all the required fields; seed_trace and mutation_trace are reconstructed from obligation_id, base_scenario_id, and mutation. Base scenarios are kept as a last-resort fallback.
- Inference token limit (src/infer/runner.py, src/cli.py, Makefile): increased default max_tokens from 500 to 2048. The old limit caused all grok-3-mini responses to be truncated mid-sentence (finish_reason: length). Updated in InferConfig, the --max-tokens CLI default, and the infer-multi-model Makefile target.
- Mutations catalog (configs/mutations.yaml): replaced the minimal mutation stubs with a richer set of adversarial families: authority_pressure, urgency_pressure, ambiguity_pressure, and bypass_request. Each has parameterized templates covering authority bypass, deadline pressure, incomplete documentation, review skipping, audit evasion, and forced yes/no responses.
Test plan
- make run-scenario-viewer and navigate to Scenario Inspector: Stages 1–3 should render correctly before make validate is run
- make infer-multi-model (after clearing existing grok-3-mini responses) and confirm finish_reason is no longer length
- make perturb and verify new mutation families appear in mutated_candidates.jsonl
Write your issue number after "Fixes "
This PR does not intend to fix any specific issue.
Please ensure all items are checked off before requesting a review: