fix(GRSModule): fix Scenario Inspector fallback, increase inference max_tokens, expand mutations catalog #3413

Merged
gorkem-bwl merged 3 commits into develop from fix/grs-scenario-inspector-and-max-tokens
Feb 25, 2026

@sermengi (Contributor) commented Feb 24, 2026

Describe your changes

Summary

  • Scenario Inspector fallback (pages/1_Scenario_Inspector.py): when
    final/scenarios.jsonl doesn't exist yet, the inspector now falls back to
    mutated_candidates.jsonl (perturb stage) instead of
    base_scenarios_deduped.jsonl. Base scenarios lack seed_trace and
    mutation_trace, so Stages 1–3 were rendering empty/warnings. Mutated
    candidates have all the required fields; seed_trace and mutation_trace
    are reconstructed from obligation_id, base_scenario_id, and mutation.
    Base scenarios are kept as a last-resort fallback.

  • Inference token limit (src/infer/runner.py, src/cli.py, Makefile):
    increased default max_tokens from 500 to 2048. The old limit caused all
    grok-3-mini responses to be truncated mid-sentence (finish_reason: length).
    Updated in InferConfig, the --max-tokens CLI default, and the
    infer-multi-model Makefile target.

  • Mutations catalog (configs/mutations.yaml): replaced the minimal
    mutation stubs with a richer set of adversarial families — authority_pressure,
    urgency_pressure, ambiguity_pressure, and bypass_request — each with
    parameterized templates covering authority bypass, deadline pressure,
    incomplete documentation, review skipping, audit evasion, and forced yes/no
    responses.
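
For reviewers, the fallback order from the first Summary item can be sketched roughly as follows. This is an illustration, not the code in pages/1_Scenario_Inspector.py: the function names, the run-directory layout, and the exact shape of seed_trace and mutation_trace are assumptions. Only the three file names and the three source fields come from the description above.

```python
import json
from pathlib import Path

# Priority order described in the PR. The location of each file relative to
# a run directory is an assumption made for this sketch.
CANDIDATES = [
    ("final", Path("final") / "scenarios.jsonl"),
    ("mutated", Path("mutated_candidates.jsonl")),
    ("base", Path("base_scenarios_deduped.jsonl")),
]


def read_jsonl(path):
    """Read one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def with_traces(record):
    """Rebuild seed_trace and mutation_trace for a mutated candidate.

    The trace shapes used here are hypothetical; the real inspector may
    structure them differently.
    """
    out = dict(record)
    out.setdefault("seed_trace", {
        "obligation_id": record.get("obligation_id"),
        "base_scenario_id": record.get("base_scenario_id"),
    })
    mutation = record.get("mutation")
    out.setdefault("mutation_trace", [mutation] if mutation else [])
    return out


def load_scenarios(run_dir):
    """Return (source_label, rows) from the first candidate file that exists."""
    run_dir = Path(run_dir)
    for source, rel_path in CANDIDATES:
        path = run_dir / rel_path
        if path.exists():
            rows = read_jsonl(path)
            if source == "mutated":
                rows = [with_traces(row) for row in rows]
            return source, rows
    return None, []
```

Returning the source label alongside the rows lets the UI state which stage the data came from, which helps when Stages 1–3 still look sparse because only base scenarios exist.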

Test plan

  • Run make run-scenario-viewer and navigate to Scenario Inspector — Stages
    1–3 should render correctly before make validate is run
  • Re-run make infer-multi-model (after clearing existing grok-3-mini
    responses) and confirm finish_reason is no longer length
  • Run make perturb and verify new mutation families appear in
    mutated_candidates.jsonl
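
The last two test-plan checks could be scripted along these lines. This is a minimal sketch under stated assumptions: responses are stored as JSONL with a top-level finish_reason field, and each mutated candidate has a mutation object with a family key. Neither layout is confirmed by this PR, and the function names are hypothetical.

```python
import json
from pathlib import Path

# Family names as listed in the Summary above.
EXPECTED_FAMILIES = {
    "authority_pressure",
    "urgency_pressure",
    "ambiguity_pressure",
    "bypass_request",
}


def iter_jsonl(path):
    """Yield one parsed JSON object per non-empty line."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def truncated_count(responses_path):
    """Count responses cut off by the token limit (finish_reason == 'length')."""
    return sum(
        1 for row in iter_jsonl(responses_path)
        if row.get("finish_reason") == "length"
    )


def missing_families(candidates_path, expected=EXPECTED_FAMILIES):
    """Return the expected mutation families that never appear in the file."""
    seen = set()
    for row in iter_jsonl(candidates_path):
        mutation = row.get("mutation")
        # Assumed layout: {"mutation": {"family": "..."}}
        if isinstance(mutation, dict) and mutation.get("family"):
            seen.add(mutation["family"])
    return set(expected) - seen
```

After re-running inference and make perturb, truncated_count should return 0 and missing_families should return an empty set if both changes took effect.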

Write your issue number after "Fixes "

This PR does not intend to fix any specific issue.

Please ensure all items are checked off before requesting a review:

  • I deployed the code locally.
  • I have performed a self-review of my code.
  • I have included the issue # in the PR.
  • I have labelled the PR correctly.
  • The issue I am working on is assigned to me.
  • I have avoided using hardcoded values to ensure scalability and maintain consistency across the application.
  • I have ensured that font sizes, color choices, and other UI elements are referenced from the theme.
  • My pull request is focused and addresses a single, specific feature.
  • If there are UI changes, I have attached a screenshot or video to this PR.

sermengi and others added 3 commits February 24, 2026 16:29
When final/scenarios.jsonl is absent, fall back to mutated_candidates.jsonl
instead of base_scenarios_deduped.jsonl. Mutated candidates carry
obligation_id, base_scenario_id, and mutation data, so seed_trace and
mutation_trace are reconstructed correctly, making Stages 1–3 render
properly before the validate stage is run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The 500-token cap caused all grok-3-mini responses to be truncated
mid-sentence (finish_reason: length). Updated the default in InferConfig,
the CLI --max-tokens argument, and the infer-multi-model Makefile target.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Added authority_pressure, ambiguity_pressure, and bypass_request mutation
families with parameterized templates. Replaced the minimal stubs with
richer mutations covering authority bypass, deadline pressure, incomplete
documentation, review skipping, audit evasion, and forced yes/no responses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sermengi sermengi self-assigned this Feb 24, 2026
@gorkem-bwl gorkem-bwl merged commit b192e6f into develop Feb 25, 2026
@MuhammadKhalilzadeh MuhammadKhalilzadeh deleted the fix/grs-scenario-inspector-and-max-tokens branch March 12, 2026 07:51