
Add Weather Agent use case with harness, gateway, guardrails, evaluation and observability #1648

Open: JobRamos wants to merge 1 commit into awslabs:main from JobRamos:feature/weather-agent-use-case



@JobRamos

Contributor

Summary

  • New end-to-end use case: Weather Agent web app demonstrating four AgentCore pillars
  • Harness + Gateway (Exa MCP) + Guardrails (PII) + Batch Evaluations + Observability
  • Full-stack: FastAPI backend + React frontend, single ./start.sh command
  • Also includes CLI-only mode (./run.sh) for headless demos

Test plan

  • Complete prerequisites (IAM permissions, enable Claude Haiku 4.5, enable CloudWatch Transaction Search)
  • Run ./start.sh on a fresh account
  • Verify chat works (ask weather questions, confirm tool calls)
  • Verify weather cards appear on the right panel
  • Verify traces show up after sending messages
  • Run batch evaluation and confirm scores appear
  • Run ./cleanup.sh and confirm all AWS resources are deleted

@github-actions


Latest scan for commit: 98feb4c | Updated: 2026-06-11 20:10:35 UTC

Security Scan Results

Scan Metadata

  • Project: ASH
  • Scan executed: 2026-06-11T20:10:18+00:00
  • ASH version: 3.0.0

Summary

Scanner Results

The table below shows findings by scanner, with status based on severity thresholds and dependencies:

Column Explanations:

Severity Levels (S/C/H/M/L/I):

  • Suppressed (S): Security findings that have been explicitly suppressed/ignored and don't affect the scanner's pass/fail status
  • Critical (C): The most severe security vulnerabilities requiring immediate remediation (e.g., SQL injection, remote code execution)
  • High (H): Serious security vulnerabilities that should be addressed promptly (e.g., authentication bypasses, privilege escalation)
  • Medium (M): Moderate security risks that should be addressed in normal development cycles (e.g., weak encryption, input validation issues)
  • Low (L): Minor security concerns with limited impact (e.g., information disclosure, weak recommendations)
  • Info (I): Informational findings for awareness with minimal security risk (e.g., code quality suggestions, best practice recommendations)

Other Columns:

  • Time: Duration taken by each scanner to complete its analysis
  • Action: Total number of actionable findings at or above the configured severity threshold that require attention

Scanner Results:

  • PASSED: Scanner found no security issues at or above the configured severity threshold - code is clean for this scanner
  • FAILED: Scanner found security vulnerabilities at or above the threshold that require attention and remediation
  • MISSING: Scanner could not run because required dependencies/tools are not installed or available
  • SKIPPED: Scanner was intentionally disabled or excluded from this scan
  • ERROR: Scanner encountered an execution error and could not complete successfully

Severity Thresholds (Thresh Column):

  • CRITICAL: Only Critical severity findings cause scanner to fail
  • HIGH: High and Critical severity findings cause scanner to fail
  • MEDIUM (MED): Medium, High, and Critical severity findings cause scanner to fail
  • LOW: Low, Medium, High, and Critical severity findings cause scanner to fail
  • ALL: Any finding of any severity level causes scanner to fail

Threshold Source: Values in parentheses indicate where the threshold is configured:

  • (g) = global: Set in the global_settings section of ASH configuration
  • (c) = config: Set in the individual scanner configuration section
  • (s) = scanner: Default threshold built into the scanner itself

Statistics calculation:

  • All statistics are calculated from the final aggregated SARIF report
  • Suppressed findings are counted separately and do not contribute to actionable findings
  • Scanner status is determined by comparing actionable findings to the threshold
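
| Scanner | S | C | H | M | L | I | Time | Action | Result | Thresh |
|---|---|---|---|---|---|---|---|---|---|---|
| bandit | 0 | 1 | 0 | 0 | 3 | 0 | 987ms | 1 | FAILED | MED (g) |
| cdk-nag | 0 | 0 | 0 | 0 | 0 | 0 | 6.6s | 0 | PASSED | MED (g) |
| cfn-nag | 0 | 0 | 0 | 0 | 0 | 0 | 124ms | 0 | PASSED | MED (g) |
| checkov | 0 | 0 | 0 | 0 | 0 | 0 | 5.9s | 0 | PASSED | MED (g) |
| detect-secrets | 0 | 0 | 0 | 0 | 0 | 0 | 864ms | 0 | PASSED | MED (g) |
| grype | 0 | 0 | 0 | 0 | 0 | 0 | 49.4s | 0 | PASSED | MED (g) |
| npm-audit | 0 | 0 | 0 | 0 | 0 | 0 | 861ms | 0 | PASSED | MED (g) |
| opengrep | 0 | 0 | 0 | 0 | 0 | 0 | <1ms | 0 | SKIPPED | MED (g) |
| semgrep | 0 | 0 | 0 | 0 | 0 | 0 | <1ms | 0 | MISSING | MED (g) |
| syft | 0 | 0 | 0 | 0 | 0 | 0 | 2.2s | 0 | PASSED | MED (g) |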
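
The pass/fail rule described under "Severity Thresholds" can be sketched in a few lines of Python. This is an illustration of the rule as the report words it, not ASH's actual implementation; the names `SEVERITY_ORDER`, `THRESHOLD_FLOOR`, `actionable_count`, and `scanner_result` are hypothetical, and suppressed findings are assumed to be excluded from the counts before this step.

```python
# Severities from least to most severe, as listed in the report legend.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]

# Lowest severity that counts as actionable for each threshold setting.
THRESHOLD_FLOOR = {
    "ALL": "info",
    "LOW": "low",
    "MEDIUM": "medium",
    "HIGH": "high",
    "CRITICAL": "critical",
}


def actionable_count(counts, threshold):
    """Sum the findings at or above the threshold severity.

    `counts` maps severity name -> number of unsuppressed findings.
    """
    floor = SEVERITY_ORDER.index(THRESHOLD_FLOOR[threshold])
    return sum(
        n for severity, n in counts.items()
        if SEVERITY_ORDER.index(severity) >= floor
    )


def scanner_result(counts, threshold):
    """Return FAILED if any actionable finding exists, else PASSED."""
    return "FAILED" if actionable_count(counts, threshold) > 0 else "PASSED"


# The bandit row from the table: 1 critical and 3 low findings, threshold MED.
bandit = {"critical": 1, "high": 0, "medium": 0, "low": 3, "info": 0}
print(actionable_count(bandit, "MEDIUM"), scanner_result(bandit, "MEDIUM"))
```

Under a MEDIUM threshold the three low-severity findings are not counted, which is consistent with the bandit row reporting a single actionable finding.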

Detailed Findings


Finding 1: B104

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B104
  • Location: 01-features/01-harness/02-use-cases/04-weather-agent/backend/main.py:130-131

Description:
Possible binding to all interfaces.

Code Snippet:

    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
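
One possible way to address this finding, assuming the backend only needs to be reachable from the local machine during demos, is to default to the loopback interface and let the operator opt in to a wider bind address explicitly. The sketch below is not taken from the PR: the environment variable names `WEATHER_AGENT_HOST` and `WEATHER_AGENT_PORT` and the helpers `resolve_bind_address` and `main` are hypothetical.

```python
import os

LOOPBACK = "127.0.0.1"


def resolve_bind_address(env=None):
    """Return (host, port) for the dev server, defaulting to loopback.

    The variable names below are illustrative assumptions, not part of the PR.
    """
    env = os.environ if env is None else env
    host = env.get("WEATHER_AGENT_HOST", LOOPBACK)
    port = int(env.get("WEATHER_AGENT_PORT", "8000"))
    return host, port


def main(app):
    """Start uvicorn on the resolved address (imported lazily, as in the snippet)."""
    import uvicorn

    host, port = resolve_bind_address()
    uvicorn.run(app, host=host, port=port)
```

With this shape, a container or remote deployment that genuinely needs to listen on all interfaces would set `WEATHER_AGENT_HOST` in its environment, while a plain local run stays on loopback. Whether this clears the B104 finding should be confirmed by re-running the scan.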

Report generated by Automated Security Helper (ASH) at 2026-06-11T20:10:14+00:00
