Skip to content

feat: extend code-grader to accept plain-text and exit-code output #1210

@christso

Description

@christso

Background

Issue #1207 was implemented with a new `shell` grader type. On review, this violates the design principle of auditing existing primitives first — `code-grader` already runs shell commands. The `shell` type is just `code-grader` with a simpler output protocol.

Promptfoo's equivalent (`javascript`/`python` assertions) solves this by having a graceful fallback: if a script returns a plain boolean, number, or string instead of the full structured result, it's accepted. They do not have a `shell` type.

Problem with current code-grader

`code-grader` currently requires stdout to be a JSON object with `{ score, assertions }`. Two behaviors make simple one-liners unusable:

  1. Empty stdout + exit 0 → score 0 (wrong; should be score 1)
  2. Non-zero exit code → throws as execution error (should be score 0 / fail)

Desired Behavior

Extend `code-grader` with a plain-text fallback: when stdout is not valid JSON, interpret it as:

stdout (trimmed, case-insensitive) score
empty 1 (exit-code convention: 0 = pass)
`"true"`, `"pass"`, `"1"` 1
`"false"`, `"fail"`, `"0"` 0
numeric string clamped float score
anything else 0

Non-zero exit with non-JSON stdout → score 0 (fail), not execution error.

Outcome

Simple workspace checks become concise without a new grader type:

```yaml

  • type: code-grader
    command: ["bash", "-c", "[ $(pdfinfo report.pdf | grep Pages | awk '{print $2}') -ge 5 ]"]

  • type: code-grader
    command: ["bash", "-c", "echo $(wc -l < output.txt)"] # stdout numeric → score
    ```

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions