17 commits
8ccbdaa
README: drop section sign, replace wrapping Unicode dividers
JayantChopra May 11, 2026
afc7cbb
README: clickable links, real markdown tables, trim repo map
JayantChopra May 11, 2026
a08aa12
README: fix one-character-short border on opening-box arrow row
JayantChopra May 11, 2026
dbcb556
README: normalize all Section 2 step-box widths to 74 chars
JayantChopra May 11, 2026
0f59bd1
agents: reshape 8 personas as buildable scaffolds, not pre-built code
JayantChopra May 13, 2026
641023c
Document working Keystone patterns, add LEARNINGS.md
JayantChopra May 13, 2026
7c77b6f
Repo policy: test agents stay outside the repo; private Polarity feed…
JayantChopra May 13, 2026
952d7ca
Validate 6 scaffolds end-to-end on Keystone with grok-4-fast
JayantChopra May 13, 2026
be759a4
README: centered header with badges + nav; add Apache 2.0 LICENSE
JayantChopra May 13, 2026
37fdf4d
README: correct verified count to 6/9 agents and 9/12 specs, remove e…
JayantChopra May 13, 2026
b1d36e2
Polish for fresh-clone readiness: prompts, walkthrough, OSS files
JayantChopra May 13, 2026
4ffda17
README: remove stray em dash from the first-real-eval walkthrough
JayantChopra May 13, 2026
77db586
Embed trailer + spec-creation video, .github/ for community files, sp…
JayantChopra May 13, 2026
8bd7975
Embed videos as clickable covers (GitHub strips video tags from repo …
JayantChopra May 13, 2026
3dc274e
README: drop caption text under video covers (clickability is obvious)
JayantChopra May 13, 2026
e70a995
GitHub automation: PR template, issue forms, CI validate, CODEOWNERS
JayantChopra May 14, 2026
dbbf602
README: remove broken assets/banner.png placeholder; trailer cover IS…
JayantChopra May 14, 2026
21 changes: 21 additions & 0 deletions .github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# Reviewers auto-assigned by GitHub when a PR touches matching paths.
#
# NOTE: GitHub silently drops rules that reference users / teams without
# write access on the repo. If a path here doesn't have an effective owner
# (e.g. you renamed the org or removed the user), the rule does nothing.

# Default owner for everything.
* @JayantChopra

# Repo-shape changes (CI, templates, code of conduct, security policy).
/.github/ @JayantChopra

# The reference implementation: the only committed agent code.
# Touching this changes the canonical pattern other scaffolds mirror.
/agents/stripe-refund-aud/ @JayantChopra

# Validation tooling: gates every PR.
/scripts/ @JayantChopra

# The field notebook of Keystone behaviors. Be careful when changing.
/LEARNINGS.md @JayantChopra
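
Every rule above names the same owner, so rule order has no visible effect yet. Once a second owner is added it will, because GitHub applies the last matching pattern in the file. The sketch below illustrates that precedence with a deliberately simplified matcher: it only understands the pattern shapes used in this file (`*`, a rooted directory, a rooted file), not the full gitignore-style syntax that real CODEOWNERS files accept. The `matches` and `owner_for` helpers are hypothetical names for illustration.

```python
from typing import List, Optional, Tuple

# Rules in file order, mirroring the CODEOWNERS file above.
RULES: List[Tuple[str, str]] = [
    ("*", "@JayantChopra"),
    ("/.github/", "@JayantChopra"),
    ("/agents/stripe-refund-aud/", "@JayantChopra"),
    ("/scripts/", "@JayantChopra"),
    ("/LEARNINGS.md", "@JayantChopra"),
]


def matches(pattern: str, path: str) -> bool:
    """Simplified matcher: handles only the pattern shapes used above."""
    if pattern == "*":
        return True
    anchored = pattern.lstrip("/")
    if anchored.endswith("/"):
        # Rooted directory pattern: matches everything underneath it.
        return path.startswith(anchored)
    # Rooted file pattern: matches that exact path.
    return path == anchored


def owner_for(path: str, rules: List[Tuple[str, str]] = RULES) -> Optional[str]:
    """Return the owner from the last rule that matches the path."""
    owner = None
    for pattern, candidate in rules:
        if matches(pattern, path):
            owner = candidate  # later matches overwrite earlier ones
    return owner
```

With a hypothetical second owner, `owner_for("scripts/validate.sh", [("*", "@a"), ("/scripts/", "@b")])` resolves to `@b`, which is why the catch-all `*` rule belongs at the top of the file.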
41 changes: 41 additions & 0 deletions .github/CODE_OF_CONDUCT.md
@@ -0,0 +1,41 @@
# Code of Conduct

## Our pledge

We pledge to make participation in this project a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

## Our standards

Examples of behavior that contributes to a positive environment:

- Using welcoming and inclusive language.
- Being respectful of differing viewpoints and experiences.
- Gracefully accepting constructive criticism.
- Focusing on what is best for the community.
- Showing empathy towards other community members.

Examples of unacceptable behavior:

- The use of sexualized language or imagery, and sexual attention or advances of any kind.
- Trolling, insulting or derogatory comments, and personal or political attacks.
- Public or private harassment.
- Publishing others' private information, such as a physical or email address, without their explicit permission.
- Other conduct which could reasonably be considered inappropriate in a professional setting.

## Enforcement responsibilities

Project maintainers are responsible for clarifying and enforcing standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior they deem inappropriate, threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within all project spaces (issues, PRs, discussions) and when an individual is officially representing the project in public spaces.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at **support@polarity.so**. All complaints will be reviewed and investigated promptly and fairly.

All project maintainers are obligated to respect the privacy and security of the reporter of any incident.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org/), version 2.1.
114 changes: 114 additions & 0 deletions .github/CONTRIBUTING.md
@@ -0,0 +1,114 @@
# Contributing

Thanks for your interest. The repo is a planning workspace for Polarity Keystone evals: scaffolds (what an agent should do), specs (acceptance tests), and a working notebook of what runs end-to-end.

## What we welcome

- Bug reports against existing scaffolds, specs, or scripts.
- New agent scaffolds or new spec scenarios.
- Validation of the deferred scaffolds (`db-architect`, `devops-shell`, `data-pipeline`).
- Improvements to docs, especially [LEARNINGS.md](../LEARNINGS.md) when you discover a new Keystone behavior.
- Fixes to the validation tooling under `scripts/`.

## Before you write code

1. Skim [README.md](../README.md) for the repo shape.
2. Read [LEARNINGS.md](../LEARNINGS.md). It's the difference between an hour and a day of debugging.
3. For non-trivial work, open an issue first so we can align on direction.

## Repo policy

The repo ships **scaffolds + specs + one reference implementation**. Test agents you build to validate a scaffold are throwaway: build them under `/tmp/build/<slug>/`, upload as a snapshot for the test, then delete the local copy. Don't commit them.

The single committed implementation is `agents/stripe-refund-aud/agent.py`. It's the canonical pattern reference.
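
One way to make the "build, upload, delete" policy hard to forget is to wrap the scratch directory in a context manager, so the local copy is removed even when the upload or the eval run fails. This is a minimal sketch, not part of the repo: `throwaway_build` is a hypothetical helper, and the upload step is left out because it depends on the SDK.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def throwaway_build(slug: str, root: Path = Path("/tmp/build")) -> Iterator[Path]:
    """Yield a scratch directory for a test agent and delete it afterwards."""
    build_dir = root / slug
    build_dir.mkdir(parents=True, exist_ok=True)
    try:
        yield build_dir
    finally:
        # Remove the local copy even if the upload or the eval run raised.
        shutil.rmtree(build_dir, ignore_errors=True)
```

Inside a `with throwaway_build("<slug>") as build_dir:` block you would write the agent files, upload the snapshot, and run the spec. Nothing under `/tmp/build/<slug>/` survives the block, so there is nothing to commit by accident.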

## Local setup

```bash
# 1. Install the ks CLI
curl -fsSL https://ks.polarity.so/install.sh | bash

# 2. Wire your Keystone API key
ks setup api-key

# 3. Confirm
ks setup doctor
```

For working with the SDK locally (uploading snapshot agents):

```bash
pip install polarity-keystone
```

For the validation script:

```bash
pip install yamllint pyyaml
```

## Workflow for a new spec

```
1. planning/<spec>.md          copy from planning/_template.md
                               fill in the five questions
2. drafts/<spec>.yaml          copy from specs/_template.yaml
                               iterate freely
3. bash scripts/validate.sh
                               local lint passes
4. specs/<domain>/<spec>.yaml
                               promote when stable
5. ks eval run <path>          actually run on Keystone
```

## Workflow for a new scaffold

```
1. agents/<slug>/AGENT.md      copy from agents/_template.md
                               describe purpose, inputs, outputs,
                               acceptance criteria
2. specs/<domain>/<spec>.yaml  the acceptance spec for the scaffold
3. Validate locally            bash scripts/validate.sh
4. Validate end-to-end         build a throwaway agent in /tmp/build/<slug>/,
                               upload, run the spec, mark scaffold-verified,
                               delete the throwaway
5. Open a PR
```

## Validation

Before opening a PR:

```bash
bash scripts/validate.sh
```

The script enforces:

- Valid YAML.
- Required spec fields (`version`, `id`, `base`, `task`, and `scoring` or `invariants`).
- `id` is kebab-case and matches the filename.
- Every scoring rule has a positive weight (Keystone server rejects `weight=0`).
- Every `agent.snapshot` references an `agents/<slug>/` folder.
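
The checks listed above can be sketched as a single function over an already-parsed spec. This is an illustration of the rules, not the contents of `scripts/validate.sh`. Two shapes are assumptions that the list does not spell out: that `scoring` is a list of rules each carrying a `weight`, and that `agent.snapshot` holds the slug of a folder under `agents/`. `check_spec` is a hypothetical name.

```python
import re
from pathlib import Path
from typing import Any, Dict, List

KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
REQUIRED_FIELDS = ("version", "id", "base", "task")


def check_spec(spec: Dict[str, Any], spec_path: Path, agents_dir: Path) -> List[str]:
    """Return a list of problems; an empty list means the spec passes."""
    problems: List[str] = []

    for field in REQUIRED_FIELDS:
        if field not in spec:
            problems.append(f"missing required field: {field}")
    if "scoring" not in spec and "invariants" not in spec:
        problems.append("missing scoring or invariants block")

    spec_id = spec.get("id")
    if isinstance(spec_id, str):
        if not KEBAB_CASE.match(spec_id):
            problems.append(f"id is not kebab-case: {spec_id}")
        if spec_id != spec_path.stem:
            problems.append(f"id {spec_id} does not match filename {spec_path.name}")

    # Assumed shape: scoring is a list of rules, each with a numeric weight.
    for index, rule in enumerate(spec.get("scoring") or []):
        weight = rule.get("weight") if isinstance(rule, dict) else None
        if not isinstance(weight, (int, float)) or weight <= 0:
            problems.append(f"scoring rule {index} needs a positive weight")

    # Assumed shape: agent.snapshot holds the slug of a folder under agents/.
    agent = spec.get("agent")
    snapshot = agent.get("snapshot") if isinstance(agent, dict) else None
    if snapshot is not None and not (agents_dir / str(snapshot)).is_dir():
        problems.append(f"agent.snapshot has no matching folder: agents/{snapshot}/")

    return problems
```

Returning a list of problems rather than raising on the first one lets a contributor fix everything in a single pass before re-running the linter.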

## Conventions

- **Slug = kebab-case = filename stem.** Agent slugs match folder names; spec ids match filenames. The linter enforces this.
- **No emojis in docs or code.**
- **No commit-message attribution to AI tools.** Don't add `Co-Authored-By: Claude` or similar trailers; don't add "Generated with..." footers to PR bodies.
- **Keep PRs focused.** One scaffold or one spec per PR if practical.
- **Update LEARNINGS.md** when you discover a new Keystone behavior, even if you also worked around it in code. Future contributors will thank you.
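
The attribution rule is easy to check mechanically before pushing. The sketch below scans a commit message or PR body for the two kinds of line the conventions rule out. It is a hypothetical helper, not something the repo ships, and the pattern list is an assumption that covers only the examples named above.

```python
import re
from typing import List

# Patterns for the attribution lines the conventions above rule out.
ATTRIBUTION_PATTERNS = (
    re.compile(r"^co-authored-by:", re.IGNORECASE),
    re.compile(r"^generated with\b", re.IGNORECASE),
)


def attribution_lines(message: str) -> List[str]:
    """Return the lines of a commit message or PR body that look like tool attribution."""
    flagged = []
    for line in message.splitlines():
        stripped = line.strip()
        if any(pattern.search(stripped) for pattern in ATTRIBUTION_PATTERNS):
            flagged.append(stripped)
    return flagged
```

An empty result means the message is clean. Any flagged line should be removed before the PR is opened.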

## PR review

We aim to give a first review within a few days. PRs get reviewed faster when they:

- Pass `bash scripts/validate.sh` locally.
- Reference an issue (or include a one-line problem statement).
- Touch one scaffold or one spec at a time.

## License

By contributing, you agree your contributions are licensed under the [Apache License 2.0](../LICENSE).
67 changes: 67 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,67 @@
name: Bug report
description: Something is broken or behaves unexpectedly in a scaffold, spec, or script.
title: "bug: <short description>"
labels: ["bug", "triage"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for filing a bug. The more specific you can be, the faster we can fix it.

        For security issues, **do not file a public issue**. See [SECURITY.md](../blob/main/.github/SECURITY.md).

  - type: textarea
    id: summary
    attributes:
      label: Summary
      description: One or two sentences describing the bug.
      placeholder: "scripts/validate.sh fails with 'missing scoring block' on a spec that has a top-level `scoring:` section."
    validations:
      required: true

  - type: textarea
    id: repro
    attributes:
      label: Steps to reproduce
      description: Minimal steps. Include the spec / scaffold path, command, and any error output.
      placeholder: |
        1. Clone the repo at <commit SHA>.
        2. Run `bash scripts/validate.sh`.
        3. Observe error: ...
      render: markdown
    validations:
      required: true

  - type: textarea
    id: expected
    attributes:
      label: Expected behavior
    validations:
      required: true

  - type: textarea
    id: actual
    attributes:
      label: Actual behavior
      description: Paste full error output / experiment id / scenario JSON if relevant.
    validations:
      required: true

  - type: input
    id: ks_version
    attributes:
      label: Keystone CLI version
      description: Output of `ks --version`.
      placeholder: "ks version v0.1.13"

  - type: input
    id: os
    attributes:
      label: Host OS
      placeholder: "macOS 14.5 / Ubuntu 24.04"

  - type: textarea
    id: notes
    attributes:
      label: Anything else
      description: Logs, screenshots, hypotheses, related issues, etc.
11 changes: 11 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,11 @@
blank_issues_enabled: false
contact_links:
  - name: Polarity Keystone documentation
    url: https://docs.polarity.so/
    about: Questions about Keystone itself (not this repo) belong upstream.
  - name: Security disclosures
    url: https://github.com/Polarityinc/Promising-Spec-Library/blob/main/.github/SECURITY.md
    about: Do NOT file a public issue for vulnerabilities.
  - name: Polarity support
    url: mailto:support@polarity.so
    about: General Polarity questions / commercial inquiries.
66 changes: 66 additions & 0 deletions .github/ISSUE_TEMPLATE/spec_or_scaffold_proposal.yml
@@ -0,0 +1,66 @@
name: New scaffold or spec proposal
description: Propose a new agent scaffold, a new acceptance spec, or both.
title: "proposal: <short description>"
labels: ["proposal", "triage"]
body:
  - type: markdown
    attributes:
      value: |
        Use this template to propose a new agent scaffold, a new acceptance spec, or a pair of both.

        Read [LEARNINGS.md](../blob/main/LEARNINGS.md) first. Keystone has six undocumented behaviors that constrain what's buildable. Knowing them upfront saves redesign cycles.

  - type: dropdown
    id: kind
    attributes:
      label: What are you proposing?
      options:
        - New agent scaffold (with matching spec)
        - New spec for an existing scaffold
        - New scaffold without a spec (just the persona)
    validations:
      required: true

  - type: textarea
    id: purpose
    attributes:
      label: Purpose
      description: What's the agent supposed to do, in one paragraph?
      placeholder: "An agent that reads a Slack channel for outage chatter and posts a structured summary to a Notion page when the chatter clusters around a single incident."
    validations:
      required: true

  - type: textarea
    id: acceptance
    attributes:
      label: Acceptance criteria
      description: How will we know the agent works? What invariants would the spec check?
      placeholder: |
        - summary.md was created
        - summary mentions exactly one incident id
        - LLM judge: summary is faithful to the input messages
    validations:
      required: true

  - type: textarea
    id: services
    attributes:
      label: External services / data sources
      description: Any APIs, databases, or http_mock services the agent talks to. Be specific about whether they need to be reachable inside the sandbox.
      placeholder: "Slack API (read-only), Notion API (write). Both can be mocked via http_mock for the spec's test scenarios."

  - type: dropdown
    id: agent_type
    attributes:
      label: Likely agent type
      description: Snapshot for anything talking to declared services; type:python for embedded code with setup.files inputs.
      options:
        - snapshot (talks to services / reusable)
        - python (embedded code in spec, no services)
        - cli (smoke test, no LLM)
        - not sure

  - type: textarea
    id: notes
    attributes:
      label: Anything else
50 changes: 50 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,50 @@
<!--
Thanks for sending a pull request. Please fill out the sections below.
For non-trivial changes, open an issue first so we can align on direction.
-->

## Summary

<!-- One or two sentences. What does this change do, and why? -->

## Type of change

<!-- Mark with [x] -->

- [ ] `scaffold` — new agent scaffold (`agents/<slug>/AGENT.md`)
- [ ] `spec` — new acceptance spec (`specs/<domain>/<spec>.yaml`)
- [ ] `validation` — verified an existing scaffold end-to-end on Keystone (status flip + LEARNINGS update)
- [ ] `fix` — bug fix in a scaffold, spec, or script
- [ ] `docs` — README / LEARNINGS / docs only
- [ ] `tooling` — `scripts/`, `.github/`, configs
- [ ] `chore` — anything else

## What changed

<!-- Bullet the important changes. -->

## How was this tested

<!--
- [ ] `bash scripts/validate.sh` passes locally
- [ ] If adding/changing a scaffold: built a throwaway agent at `/tmp/build/<slug>/`,
uploaded as a snapshot, ran the linked spec on Keystone, captured the
experiment id below, then deleted the throwaway.
- [ ] If adding a real reference implementation: it's stdlib-only and fits the bundle cap.
-->

**Experiment IDs (if applicable):**

<!-- e.g. exp-abc12345-xyz -->

## Related issues

<!-- e.g. Fixes #123. Required for non-trivial PRs. -->

## Checklist

- [ ] No secrets, API keys, or `.env` content committed.
- [ ] No `Co-Authored-By` or AI-tool attribution lines in commit messages or this PR body.
- [ ] If this adds a scaffold: the AGENT.md follows `agents/_template.md` shape.
- [ ] If this adds a spec: it validates against `scripts/validate.sh` and references an existing `agents/<slug>/`.
- [ ] If this changes Keystone-runtime behavior we discovered: [LEARNINGS.md](../LEARNINGS.md) is updated.