From da52490d560332792fa1a1713eccd04e9e34c622 Mon Sep 17 00:00:00 2001
From: Ralph <rforgeon@gmail.com>
Date: Wed, 17 Jun 2026 08:58:47 +0100
Subject: [PATCH] Add eval docs for agent workflows

---
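Note for reviewers (not part of the commit): the case file added below is plain
JSONL with three string fields per line, `id`, `input`, and `expected_outcome`.
A minimal Python sketch for sanity-checking such a file before a review round
might look like the following. The `load_cases` helper is hypothetical and is
not shipped by this patch; the duplicate-id check is an assumption about what a
harness would want, not something the eval set itself requires.

```python
import json

# Field names as they appear in evals/protobuf/cases.jsonl in this patch.
REQUIRED_FIELDS = ("id", "input", "expected_outcome")


def load_cases(text):
    """Parse JSONL text and verify each case has the three non-empty string fields.

    Hypothetical helper for illustration; raises ValueError on the first bad line.
    """
    cases = []
    seen_ids = set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        case = json.loads(line)
        if not isinstance(case, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        for field in REQUIRED_FIELDS:
            value = case.get(field)
            if not isinstance(value, str) or not value:
                raise ValueError(f"line {line_no}: missing or empty {field!r}")
        # Assumption: ids should be unique so scores can be keyed by case id.
        if case["id"] in seen_ids:
            raise ValueError(f"line {line_no}: duplicate id {case['id']!r}")
        seen_ids.add(case["id"])
        cases.append(case)
    return cases
```

Assuming the file is read as UTF-8 text, a call such as
`load_cases(open("evals/protobuf/cases.jsonl", encoding="utf-8").read())`
returns the parsed cases as a list of dicts, ready to be paired with rubric scores.
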
 README.md                  | 14 ++++++++++++++
 evals/protobuf/cases.jsonl |  3 +++
 evals/protobuf/rubric.md   | 27 +++++++++++++++++++++++++++
 3 files changed, 44 insertions(+)
 create mode 100644 evals/protobuf/cases.jsonl
 create mode 100644 evals/protobuf/rubric.md

diff --git a/README.md b/README.md
index 8abb15b..8467843 100644
--- a/README.md
+++ b/README.md
@@ -40,6 +40,20 @@ Triggers on: `*.proto`, `buf.yaml`, `buf.*.yaml`, `buf.gen.yaml`, `buf.gen.*.yam
 - [Protovalidate][protovalidate]
 - [Connect RPC][connectrpc]
 
+## Evals and production telemetry
+
+The `evals/protobuf/` directory contains a small human-review eval set for
+schema evolution, Protovalidate design, and Buf configuration review. The cases
+are harness-neutral so plugin behavior can be checked before release in Claude
+Code or another agent workspace.
+
+If you publish this plugin through Telvine, keep runtime telemetry metadata-only:
+`skill.invocation.start`, `skill.invocation.end`, and `skill.invocation.error`
+for skill behavior, plus `plugin.component.invoked` and
+`plugin.component.error` for non-skill components. Do not emit private schemas,
+repository contents, generated code, connector payloads, tool arguments, or
+model outputs.
+
 ## Community
 
 For help and discussion around Protobuf, best practices, and more, join us on [Slack][slack].
diff --git a/evals/protobuf/cases.jsonl b/evals/protobuf/cases.jsonl
new file mode 100644
index 0000000..8cf765c
--- /dev/null
+++ b/evals/protobuf/cases.jsonl
@@ -0,0 +1,3 @@
+{"id":"schema-evolution","input":"Review a proto change that renames a field, reuses the old tag for a new meaning, and updates generated clients.","expected_outcome":"Flags tag reuse and wire-compatibility risks, recommends reserving old tags/names, and explains generated client impact."}
+{"id":"protovalidate-design","input":"Add validation rules for an order message with currency, amount, status, and optional discount fields.","expected_outcome":"Uses Protovalidate-compatible constraints, avoids over-constraining future enum values, and explains cross-field validation assumptions."}
+{"id":"buf-config-review","input":"Review a buf.yaml and buf.gen.yaml setup for Go and TypeScript generation in a Connect RPC service.","expected_outcome":"Checks lint/breaking config, module/dependency clarity, and generation plugin choices, and avoids obsolete protoc-only guidance."}
diff --git a/evals/protobuf/rubric.md b/evals/protobuf/rubric.md
new file mode 100644
index 0000000..2c31f1e
--- /dev/null
+++ b/evals/protobuf/rubric.md
@@ -0,0 +1,27 @@
+# Buf protobuf eval rubric
+
+Score each case from 1 to 5 on each dimension below; use 2 or 4 when a response falls between the described anchors.
+
+## Compatibility
+
+- 5: Correctly identifies protobuf wire-compatibility and schema-evolution risks.
+- 3: Finds major risks but misses some reserved-name/tag details.
+- 1: Recommends breaking changes without warning.
+
+## Buf ecosystem fit
+
+- 5: Uses Buf, Connect, BSR, generation, lint, breaking, and Protovalidate concepts accurately.
+- 3: Mostly correct with minor tool/config gaps.
+- 1: Gives generic protobuf advice that does not fit Buf workflows.
+
+## Actionability
+
+- 5: Provides concrete config, schema, or migration steps maintainers can apply.
+- 3: Gives useful advice that needs follow-up.
+- 1: Is too vague to act on.
+
+## Privacy and telemetry
+
+- 5: Avoids emitting private schemas, repository files, connector payloads, tool arguments, or model outputs beyond the reviewed snippets.
+- 3: Includes unnecessary operational detail but no sensitive data.
+- 1: Exposes private API or schema content.