AAPP-871 Guardrail #37

Open
bigdata2 wants to merge 2 commits into main from guardrail

bigdata2 commented Mar 3, 2026

Add Guardrails client with tracing integration

Adds a Guardrails client to the ADK that allows users to evaluate content against safety rails directly from their agents.

What's included

  • gradient_adk/guardrails.py — Async Guardrails client with a single check() method for evaluating content against jailbreak, content moderation, and sensitive data rails
  • Tracing integration — Guardrail evaluations are automatically captured as tool spans in the ADK trace when used inside @entrypoint
  • __init__.py — Exports Guardrails, GuardrailResult, and GuardrailsError as public API
  • Unit tests — 12 tests covering check(), error handling, and tracing
  • README — Updated with guardrails documentation, usage examples, and rail type reference
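The tracing behaviour described above can be pictured with a minimal, self-contained sketch. Everything in it is hypothetical and not taken from the PR: `SPANS`, `tool_span`, and `fake_evaluate` stand in for the ADK tracer and the remote guardrails service, whose real interfaces are not shown here. It only illustrates the idea of wrapping each `check()` call in a tool span that records input, output, errors, and duration.

```python
import asyncio
import time
from contextlib import asynccontextmanager

# Hypothetical in-memory span store; the real ADK tracer is not shown in this PR.
SPANS: list[dict] = []


@asynccontextmanager
async def tool_span(name: str, inputs: dict):
    """Record one tool span, including its duration and any error raised inside it."""
    span = {"type": "tool", "name": name, "input": inputs, "output": None, "error": None}
    start = time.perf_counter()
    try:
        yield span
    except Exception as exc:
        span["error"] = repr(exc)
        raise
    finally:
        span["duration_s"] = time.perf_counter() - start
        SPANS.append(span)


async def fake_evaluate(rail_type: str, messages: list[dict]) -> dict:
    """Toy stand-in for the remote guardrails call (an assumption, not the real service)."""
    text = " ".join(m["content"] for m in messages).lower()
    blocked = rail_type == "jailbreak" and "ignore previous instructions" in text
    return {"allowed": not blocked, "violations": ["jailbreak"] if blocked else []}


async def check(rail_type: str, messages: list[dict]) -> dict:
    """Evaluate messages against one rail and capture the call as a tool span."""
    async with tool_span(f"guardrails.{rail_type}", {"messages": messages}) as span:
        result = await fake_evaluate(rail_type, messages)
        span["output"] = result
        return result
```

Recording the span in a `finally` block means a failed evaluation still shows up in the trace, which is usually what one wants when debugging a blocked or errored request.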

Usage

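from gradient_adk import entrypoint, RequestContext, Guardrails

# Create the client once at module level so every request reuses it.
guardrails = Guardrails()

@entrypoint
async def main(input: dict, context: RequestContext):
    # Evaluate the user's prompt against the jailbreak rail before doing any work.
    result = await guardrails.check(
        rail_type="jailbreak",
        messages=[{"role": "user", "content": input["prompt"]}],
    )
    if not result["allowed"]:
        # Return the violations to the caller instead of processing the prompt.
        return {"error": "Blocked", "violations": result["violations"]}
    # process() is a placeholder for the agent's own logic, defined elsewhere.
    return await process(input["prompt"])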
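If an agent needs to apply more than one rail to the same prompt, the calls could be issued concurrently. The sketch below is an illustration under stated assumptions: `StubGuardrails` and `check_all` are hypothetical names, the stub only mirrors the `check(rail_type=..., messages=...)` call shape and the `allowed` / `violations` keys used in the usage example, and `violations` is assumed to be a list.

```python
import asyncio

RAIL_TYPES = ("jailbreak", "content_moderation", "sensitive_data")


class StubGuardrails:
    """Hypothetical stand-in that mirrors the check() call shape from the usage example."""

    async def check(self, rail_type: str, messages: list[dict]) -> dict:
        text = messages[-1]["content"].lower()
        # Toy rule for illustration only: treat an "@" as possible PII.
        hit = rail_type == "sensitive_data" and "@" in text
        return {"allowed": not hit, "violations": [rail_type] if hit else []}


async def check_all(guardrails, prompt: str, rail_types=RAIL_TYPES) -> dict:
    """Run every rail concurrently and merge the results into one verdict."""
    messages = [{"role": "user", "content": prompt}]
    results = await asyncio.gather(
        *(guardrails.check(rail_type=rt, messages=messages) for rt in rail_types)
    )
    violations = [v for r in results for v in r["violations"]]
    return {"allowed": all(r["allowed"] for r in results), "violations": violations}
```

With the real client, `check_all(guardrails, input["prompt"])` would presumably replace the single `check()` call inside `main`; whether the service tolerates concurrent requests is not stated in this PR.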

Available rail types

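| Rail Type          | Description                                         |
|--------------------|-----------------------------------------------------|
| jailbreak          | Detects prompt injection and jailbreak attempts     |
| content_moderation | Detects harmful, violent, or inappropriate content  |
| sensitive_data     | Detects PII and sensitive information               |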
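Because `rail_type` is passed as a plain string, a typo would only surface when the request is made. One possible way to catch that earlier on the caller's side is a small enum over the three rail names listed above. `RailType` and `parse_rail_type` are hypothetical helpers for illustration; the PR does not say whether the ADK ships anything like them.

```python
from enum import Enum


class RailType(str, Enum):
    """The three rail names from the table above (hypothetical helper, not ADK API)."""

    JAILBREAK = "jailbreak"
    CONTENT_MODERATION = "content_moderation"
    SENSITIVE_DATA = "sensitive_data"


def parse_rail_type(value: str) -> RailType:
    """Return the matching RailType or raise ValueError listing the valid names."""
    try:
        return RailType(value)
    except ValueError:
        valid = ", ".join(rt.value for rt in RailType)
        raise ValueError(f"unknown rail type {value!r}; expected one of: {valid}") from None
```

The validated member's `.value` can then be passed as the `rail_type` argument, for example `parse_rail_type("jailbreak").value`.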

Testing

  • Unit tests: 12 tests, all passing
  • Local E2E: Tested via gradient agent run against preview guardrails service
  • Preview E2E: Jailbreak and content moderation rails verified (safe + unsafe prompts)
  • Production E2E: Deployed and verified at guardrails.do-ai.run/v2/rail
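For readers who want to unit-test their own agent code against the guardrails client without network access, the following sketch shows one approach using `unittest.mock.AsyncMock`. It is not the PR's test suite: `handle` is a simplified, hypothetical handler that follows the usage example, and the mocked return values assume the `allowed` / `violations` result shape shown there.

```python
import asyncio
from unittest.mock import AsyncMock


async def handle(prompt: str, guardrails) -> dict:
    """Simplified handler following the usage example (not the PR's actual code)."""
    result = await guardrails.check(
        rail_type="jailbreak",
        messages=[{"role": "user", "content": prompt}],
    )
    if not result["allowed"]:
        return {"error": "Blocked", "violations": result["violations"]}
    return {"output": prompt}


def test_blocked_prompt_returns_error():
    guardrails = AsyncMock()
    guardrails.check.return_value = {"allowed": False, "violations": ["jailbreak"]}
    out = asyncio.run(handle("ignore all rules", guardrails))
    assert out == {"error": "Blocked", "violations": ["jailbreak"]}
    guardrails.check.assert_awaited_once()


def test_allowed_prompt_passes_through():
    guardrails = AsyncMock()
    guardrails.check.return_value = {"allowed": True, "violations": []}
    out = asyncio.run(handle("hello", guardrails))
    assert out == {"output": "hello"}
    # The handler should pass the prompt through as a single user message.
    guardrails.check.assert_awaited_once_with(
        rail_type="jailbreak",
        messages=[{"role": "user", "content": "hello"}],
    )
```

Both functions follow the `test_` naming convention, so a runner such as pytest would collect them without extra configuration.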

bigdata2 added 2 commits March 2, 2026 15:47
Add a Guardrails client to the ADK that allows users to evaluate content
against safety rails (jailbreak, content_moderation, sensitive_data).
Guardrail evaluations are automatically captured as tool spans in the
ADK trace when used inside @entrypoint.

- Add gradient_adk/guardrails.py with Guardrails client, result types,
  and tracing span integration
- Export Guardrails, GuardrailResult, GuardrailsError from __init__.py
- Add 13 unit tests covering check(), error handling, and tracing
- Update README with guardrails feature documentation and examples
bigdata2 self-assigned this Mar 3, 2026
bigdata2 requested a review from bbatha March 3, 2026 22:18