feat: add eval progress tracking #5858

Merged
jam-jee merged 3 commits into aws:master from joshuatowner:eval-observability
May 19, 2026

Conversation

@joshuatowner commented May 14, 2026

Contributor

Add eval progress tracking

Replaces the Rich Live spinner in the terminal wait() path with plain-text status prints that stream correctly to agents and piped processes.

Problem

The eval wait() terminal path used a Rich Live spinner that:

  • Outputs ANSI escape sequences agents can't parse
  • Uses transient=True so output disappears after completion
  • Only shows "Current status: Executing" with no step details

Changes

  • Remove Rich Live spinner from terminal wait() path
  • Print pipeline name and execution ARN at start
  • Print step transitions with ✓/⋯, durations, and job ARNs on each poll
  • Print "Running... Xs" elapsed timer for in-progress steps
  • Print S3 output path on success
  • Print failed step details with log group and CloudWatch link on failure
  • All prints use flush=True so each line reaches agents and piped processes immediately instead of waiting in a buffer
  • Log group derived from job ARN type (training/processing/transform)

Jupyter path is unchanged.
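
A minimal sketch of the plain-text approach described above, not the code from this PR. It assumes each poll yields a list of (step name, status) tuples, and it prints only steps whose status changed since the previous poll; the PR may instead reprint every step on each poll. The function name `print_new_transitions` and its parameters are hypothetical, and any status other than Succeeded is shown with the ⋯ marker here.

```python
import io
import sys


def print_new_transitions(steps, last_seen, stream=None):
    """Print one plain-text line per step whose status changed.

    steps: list of (step_name, status) tuples from the latest poll
           (hypothetical shape, not the SDK's actual data model).
    last_seen: dict of step_name -> last printed status, updated in place.
    """
    stream = stream if stream is not None else sys.stdout
    printed = []
    for name, status in steps:
        if last_seen.get(name) == status:
            continue  # no change since the previous poll
        marker = "✓" if status == "Succeeded" else "⋯"
        line = f"  {marker} {name}: {status}"
        # flush=True writes the line out right away, even when stdout is a pipe
        print(line, file=stream, flush=True)
        last_seen[name] = status
        printed.append(line)
    return printed


# Usage: two polls of the same step, captured in a buffer instead of a terminal
buffer = io.StringIO()
seen = {}
print_new_transitions([("CreateEvaluationAction", "Executing")], seen, stream=buffer)
print_new_transitions([("CreateEvaluationAction", "Succeeded")], seen, stream=buffer)
```

Because the output is ordinary text with no cursor movement or color codes, a consumer reading the pipe sees each line once, in order, and the lines remain in the log after the run ends.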

Example output

Evaluation started: eval-meta-5190c33c
Pipeline: SagemakerEvaluation-LLMAJEvaluation-565d3249
Execution ARN: arn:aws:sagemaker:us-west-2:099324990371:pipeline/.../execution/rstzamudixmz

--------------------------------------

Status Transitions:
  ✓ CreateEvaluationAction: Succeeded (3.0s)
  ⋯ EvaluateCustomInferenceModel: Executing (Running... 17s)
    Job ARN: arn:aws:sagemaker:us-west-2:099324990371:training-job/CustomInference-rstzamudixmz-6ZP4YbszZc

Status: Executing (Elapsed: 21.5s)
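
The step lines in the example output could be produced by a formatter along these lines. This is a sketch under assumptions: `format_step_line` is a hypothetical helper, and the rule that finished steps show one decimal place while running steps show whole seconds is inferred only from the sample above.

```python
from datetime import datetime, timedelta, timezone


def format_step_line(name, status, start, end=None, now=None):
    """Build one status line for a pipeline step.

    A step with an end time shows its total duration; a step without one
    shows a running timer measured from its start time.
    """
    marker = "✓" if status == "Succeeded" else "⋯"
    if end is not None:
        detail = f"{(end - start).total_seconds():.1f}s"
    else:
        now = now if now is not None else datetime.now(timezone.utc)
        detail = f"Running... {int((now - start).total_seconds())}s"
    return f"  {marker} {name}: {status} ({detail})"


# Usage with fixed timestamps so the result is reproducible
start = datetime(2026, 5, 14, 12, 0, 0, tzinfo=timezone.utc)
done_line = format_step_line(
    "CreateEvaluationAction", "Succeeded", start, end=start + timedelta(seconds=3)
)
running_line = format_step_line(
    "EvaluateCustomInferenceModel", "Executing", start,
    now=start + timedelta(seconds=17.4),
)
```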

On failure:

Failed
Failure reason: Step failure: One or multiple steps failed.

Failed step: EvaluateCustomInferenceModel
Failure reason: ClientError: No S3 objects found under S3 URL...
Job ARN: arn:aws:sagemaker:us-west-2:099324990371:training-job/CustomInference-xyz
Log group: /aws/sagemaker/TrainingJobs
Log stream prefix: CustomInference-xyz
CloudWatch Logs: https://us-west-2.console.aws.amazon.com/cloudwatch/...
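
The log group and log stream prefix shown above can be derived from the job ARN alone. The sketch below is one way to do that, not the PR's implementation: `log_location` is a hypothetical name, only the training log group appears in this PR, and the processing and transform group names are assumed to follow the same SageMaker naming convention.

```python
# Resource type in the ARN -> CloudWatch log group.
# Only the training entry is confirmed by the PR's example output;
# the other two are assumptions based on SageMaker's usual naming.
LOG_GROUPS = {
    "training-job": "/aws/sagemaker/TrainingJobs",
    "processing-job": "/aws/sagemaker/ProcessingJobs",
    "transform-job": "/aws/sagemaker/TransformJobs",
}


def log_location(job_arn):
    """Return (log_group, log_stream_prefix) for a job ARN, or None.

    ARN layout: arn:partition:service:region:account:resource-type/name
    """
    parts = job_arn.split(":", 5)
    if len(parts) < 6:
        return None  # not a well-formed ARN
    resource_type, _, job_name = parts[5].partition("/")
    if resource_type not in LOG_GROUPS or not job_name:
        return None  # unknown job type or missing job name
    return LOG_GROUPS[resource_type], job_name


# Usage with the ARN from the failure example
location = log_location(
    "arn:aws:sagemaker:us-west-2:099324990371:training-job/CustomInference-xyz"
)
```

Returning None for unrecognized ARNs lets the caller skip the log lines rather than print a misleading log group.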

Testing

  • 382 existing unit tests pass (0 regressions)
  • 4 new tests covering start, step transitions, success, and failure output
  • Manually validated with real LLM-as-Judge eval jobs

@joshuatowner changed the title from "feat: replace eval Rich spinner with plain print observability (match…" to "feat: replace eval Rich spinner with rich terminal output matching training" May 14, 2026
@joshuatowner changed the title from "feat: replace eval Rich spinner with rich terminal output matching training" to "feat: add eval progress tracking" May 15, 2026
@joshuatowner marked this pull request as ready for review May 18, 2026 07:39
@jam-jee merged commit 9c9c50d into aws:master May 19, 2026
19 of 31 checks passed

2 participants