From da52490d560332792fa1a1713eccd04e9e34c622 Mon Sep 17 00:00:00 2001
From: Ralph
Date: Wed, 17 Jun 2026 08:58:47 +0100
Subject: [PATCH] Add eval docs for agent workflows

---
 README.md                  | 14 ++++++++++++++
 evals/protobuf/cases.jsonl |  3 +++
 evals/protobuf/rubric.md   | 27 +++++++++++++++++++++++++++
 3 files changed, 44 insertions(+)
 create mode 100644 evals/protobuf/cases.jsonl
 create mode 100644 evals/protobuf/rubric.md

diff --git a/README.md b/README.md
index 8abb15b..8467843 100644
--- a/README.md
+++ b/README.md
@@ -40,6 +40,20 @@ Triggers on: `*.proto`, `buf.yaml`, `buf.*.yaml`, `buf.gen.yaml`, `buf.gen.*.yam
 - [Protovalidate][protovalidate]
 - [Connect RPC][connectrpc]
 
+## Evals and production telemetry
+
+The `evals/protobuf/` directory contains a small human-review eval set for
+schema evolution, Protovalidate design, and Buf configuration review. The cases
+are harness-neutral so plugin behavior can be checked before release in Claude
+Code or another agent workspace.
+
+If you publish this plugin through Telvine, keep runtime telemetry metadata-only:
+`skill.invocation.start`, `skill.invocation.end`, and `skill.invocation.error`
+for skill behavior, plus `plugin.component.invoked` and
+`plugin.component.error` for non-skill components. Do not emit private schemas,
+repository contents, generated code, connector payloads, tool arguments, or
+model outputs.
+
 ## Community
 
 For help and discussion around Protobuf, best practices, and more, join us on [Slack][slack].
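Review note on the README hunk: the metadata-only telemetry rule could be enforced with a small allowlist check before any event leaves the plugin. The sketch below is only an illustration under stated assumptions. The five event names come from the README text, but the `build_event` helper, the `ALLOWED_FIELDS` set, and the event dict shape are hypothetical, and nothing here reflects an actual Telvine API.

```python
# Event names copied from the README text above.
SKILL_EVENTS = {
    "skill.invocation.start",
    "skill.invocation.end",
    "skill.invocation.error",
}
COMPONENT_EVENTS = {"plugin.component.invoked", "plugin.component.error"}
ALLOWED_EVENTS = SKILL_EVENTS | COMPONENT_EVENTS

# Hypothetical metadata fields: the README does not define an event schema.
ALLOWED_FIELDS = {"component", "duration_ms", "error_type"}


def build_event(name, **metadata):
    """Return a metadata-only event dict, rejecting unknown names or fields."""
    if name not in ALLOWED_EVENTS:
        raise ValueError(f"unknown event name: {name}")
    extra = set(metadata) - ALLOWED_FIELDS
    if extra:
        # Anything outside the allowlist (schemas, tool arguments, model
        # outputs, ...) is refused instead of being silently dropped.
        raise ValueError(f"non-metadata fields rejected: {sorted(extra)}")
    return {"event": name, **metadata}
```

An allowlist is used here rather than a blocklist so that a newly introduced payload field is rejected by default until someone decides it is safe to emit.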
diff --git a/evals/protobuf/cases.jsonl b/evals/protobuf/cases.jsonl
new file mode 100644
index 0000000..8cf765c
--- /dev/null
+++ b/evals/protobuf/cases.jsonl
@@ -0,0 +1,3 @@
+{"id":"schema-evolution","input":"Review a proto change that renames a field, reuses the old tag for a new meaning, and updates generated clients.","expected_outcome":"Flags tag reuse and wire-compatibility risks, recommends reserving old tags/names, and explains generated client impact."}
+{"id":"protovalidate-design","input":"Add validation rules for an order message with currency, amount, status, and optional discount fields.","expected_outcome":"Uses Protovalidate-compatible constraints, avoids over-constraining future enum values, and explains cross-field validation assumptions."}
+{"id":"buf-config-review","input":"Review a buf.yaml and buf.gen.yaml setup for Go and TypeScript generation in a Connect RPC service.","expected_outcome":"Checks lint/breaking config, module/dependency clarity, generation plugin choices, and avoids obsolete protoc-only guidance."}
diff --git a/evals/protobuf/rubric.md b/evals/protobuf/rubric.md
new file mode 100644
index 0000000..2c31f1e
--- /dev/null
+++ b/evals/protobuf/rubric.md
@@ -0,0 +1,27 @@
+# Buf protobuf eval rubric
+
+Score each case from 1-5.
+
+## Compatibility
+
+- 5: Correctly identifies protobuf wire-compatibility and schema-evolution risks.
+- 3: Finds major risks but misses some reserved-name/tag details.
+- 1: Recommends breaking changes without warning.
+
+## Buf ecosystem fit
+
+- 5: Uses Buf, Connect, BSR, generation, lint, breaking, and Protovalidate concepts accurately.
+- 3: Mostly correct with minor tool/config gaps.
+- 1: Gives generic protobuf advice that does not fit Buf workflows.
+
+## Actionability
+
+- 5: Provides concrete config, schema, or migration steps maintainers can apply.
+- 3: Gives useful advice that needs follow-up.
+- 1: Is too vague to act on.
+
+## Privacy and telemetry
+
+- 5: Avoids emitting private schemas, repository files, connector payloads, tool arguments, or model outputs beyond the reviewed snippets.
+- 3: Includes unnecessary operational detail without sensitive data.
+- 1: Exposes private API or schema content.
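
Review note on the eval files: since the cases are harness-neutral and scored by hand, a reviewer may want a small helper that checks the JSONL shape and validates rubric scores. The sketch below is one possible approach, not part of this patch. The key names come from `cases.jsonl` and the four dimensions from the headings in `rubric.md`. The snake_case dimension identifiers, the score dict shape, and the choice to average the four dimensions are assumptions, because the rubric does not say how dimensions combine.

```python
import json
from statistics import mean

# Dimensions mirror the rubric.md section headings (identifiers are hypothetical).
DIMENSIONS = (
    "compatibility",
    "buf_ecosystem_fit",
    "actionability",
    "privacy_and_telemetry",
)
# Keys present on every line of cases.jsonl in this patch.
REQUIRED_KEYS = {"id", "input", "expected_outcome"}


def load_cases(jsonl_text):
    """Parse JSONL text into case dicts, checking the expected keys."""
    cases = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        case = json.loads(line)
        missing = REQUIRED_KEYS - case.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
        cases.append(case)
    return cases


def score_case(scores):
    """Validate one reviewer's scores for a case and return their mean.

    Averaging is an assumption: the rubric only says to score from 1-5.
    """
    if set(scores) != set(DIMENSIONS):
        raise ValueError(f"expected scores for exactly {DIMENSIONS}")
    for name, value in scores.items():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name}: score must be an integer")
        if not 1 <= value <= 5:
            raise ValueError(f"{name}: score must be between 1 and 5")
    return mean(scores.values())
```

A reviewer would call `load_cases` on the contents of `evals/protobuf/cases.jsonl`, then record one score dict per case and pass each to `score_case`.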