demo/.gitignore (1 addition, 0 deletions)
@@ -0,0 +1 @@
.demo-workdir/
demo/README.md (116 additions, 0 deletions)
@@ -0,0 +1,116 @@
# Demo: RunProof in 60 seconds

This folder shows what RunProof can do with a minimal, reproducible story:

1. An agent says: **"done, the tests pass"**.
2. The code is still broken.
3. RunProof runs the real command and blocks progress.
4. A one-line fix is applied.
5. RunProof accepts the verification only after capturing evidence of a successful run.

## What's in this demo

```text
demo/
├── broken-app/
│   ├── app.js         # intentional bug: sum implemented with subtraction
│   ├── package.json   # no external dependencies
│   └── test.js        # a single assertion that should fail at first
└── scripts/
    ├── run-demo.sh    # automated demo for macOS/Linux
    └── run-demo.ps1   # automated demo for Windows PowerShell
```

The intentional bug is in `broken-app/app.js`:

```js
function sum(a, b) {
  return a - b;
}
```

## Run the full demo

From the repo root:

```bash
./demo/scripts/run-demo.sh
```

In Windows PowerShell:

```powershell
.\demo\scripts\run-demo.ps1
```

The scripts recreate `demo/.demo-workdir/`, copy `broken-app/` into it, and run the normal user flow: `runproof init`, `runproof run`, artifact edits, `runproof ready`, `runproof transition`, a failing verification, the one-line fix, and a passing verification. Both scripts expect `python` and `npm` on the `PATH` and prepend the repo root to `PYTHONPATH`, so `python -m runproof` uses this checkout. The temporary folder is ignored by git.
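Both scripts wrap the two commands that are supposed to fail (the plain `npm test` and the first `runproof verify`) in an expect-fail helper: `run_expect_fail` in Bash, `Invoke-ExpectFail` in PowerShell. The same idea, sketched in Python under the assumption that you want it outside the shell scripts; `expect_failure` is a hypothetical name, not part of RunProof:

```python
import subprocess
import sys


def expect_failure(argv: list[str], label: str) -> int:
    """Run argv; return its exit code if it fails, raise if it unexpectedly passes.

    Hypothetical helper mirroring run_expect_fail / Invoke-ExpectFail.
    """
    # check=False: a non-zero exit is the outcome we want here, so don't raise on it.
    result = subprocess.run(argv, check=False)
    if result.returncode == 0:
        raise RuntimeError(f"Expected failure, but command passed: {label}")
    print(f"blocked as expected (exit {result.returncode})")
    return result.returncode


if __name__ == "__main__":
    # Stand-in for `npm test` on the broken app: a child process that exits with 1.
    expect_failure([sys.executable, "-c", "raise SystemExit(1)"], "stand-in failing test")
```

Raising on success is the important part: if the bug were accidentally fixed beforehand, the demo would stop instead of "blocking" a command that actually passes.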

## Manual walkthrough

### 1. Show that the code is broken

```bash
npm test --prefix demo/broken-app
```

Expected output: Node throws an `AssertionError` because `sum(2, 2)` returns `0` instead of `4`, and `npm test` exits with a non-zero status.

### 2. Read the agent's false claim

> "Done, the tests pass."

RunProof does not accept that sentence as evidence. It has to run the command itself.

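### 3. Watch RunProof block the false completion

The automated script first copies `broken-app/` into a disposable workspace and then follows the normal workflow steps:

```bash
# start from a clean, disposable workspace holding a copy of the broken app
rm -rf demo/.demo-workdir
mkdir -p demo/.demo-workdir
cp -R demo/broken-app demo/.demo-workdir/broken-app
python -m runproof init --no-prompt --root demo/.demo-workdir
python -m runproof run demo-sum-bug --profile quick --title "Fix broken sum demo" --root demo/.demo-workdir
# edit .runproof/changes/demo-sum-bug/proposal.md inside the workspace
python -m runproof ready demo-sum-bug --root demo/.demo-workdir
python -m runproof transition demo-sum-bug task --root demo/.demo-workdir
# edit .runproof/changes/demo-sum-bug/tasks.md inside the workspace
python -m runproof ready demo-sum-bug --root demo/.demo-workdir
```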
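The same sequence can be driven from Python as an ordered list of steps that stops at the first failing command, much like `set -e` in `run-demo.sh`. This is a sketch under assumptions: the helper names are hypothetical, artifact edits are delegated to a caller-supplied `edit` callback, the workspace (with its copy of `broken-app/`) is assumed to exist already, and only the `runproof` subcommands and flags shown in the demo are used.

```python
import subprocess
import sys
from typing import Callable, Union

ROOT = "demo/.demo-workdir"
CHANGE = "demo-sum-bug"

# A step is either a command line or a callable that edits an artifact.
Step = Union[list[str], Callable[[], None]]


def runproof(*args: str) -> list[str]:
    # Same shape as the demo's `python -m runproof ... --root <workspace>`,
    # but using the current interpreter instead of whatever `python` is on PATH.
    return [sys.executable, "-m", "runproof", *args, "--root", ROOT]


def workflow_steps(edit: Callable[[str], None]) -> list[Step]:
    """The demo workflow in order; lambdas stand in for the manual artifact edits."""
    return [
        runproof("init", "--no-prompt"),
        runproof("run", CHANGE, "--profile", "quick", "--title", "Fix broken sum demo"),
        lambda: edit("proposal.md"),
        runproof("ready", CHANGE),
        runproof("transition", CHANGE, "task"),
        lambda: edit("tasks.md"),
        runproof("ready", CHANGE),
    ]


def run_workflow(edit: Callable[[str], None], run=subprocess.run) -> None:
    for step in workflow_steps(edit):
        if callable(step):
            step()
        else:
            # check=True raises CalledProcessError on the first non-zero exit.
            run(step, check=True)
```

A real `edit` callback would write the artifact under `.runproof/changes/demo-sum-bug/` in the workspace, which is where the demo scripts put `proposal.md` and `tasks.md`. Injecting `run` keeps the sequence testable without RunProof installed.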

Then it runs the real verification inside the disposable workspace:

```bash
python -m runproof verify demo-sum-bug --command "npm test --prefix broken-app" --root demo/.demo-workdir
```

With the bug present, RunProof returns an error similar to:

```text
✗ ERROR: .runproof/evidence/demo-sum-bug: verification command failed (exit 1): npm test --prefix broken-app
```

### 4. Apply the real fix

In the workspace copy, `demo/.demo-workdir/broken-app/app.js`, change the implementation to:

```js
function sum(a, b) {
  return a + b;
}
```
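The Bash script applies this change with a short Python string replacement on the workspace copy of `app.js`. A slightly stricter sketch, assuming you want the edit to fail loudly rather than silently do nothing (for example on a second run); `apply_one_line_fix` is a hypothetical helper, not part of the demo:

```python
from pathlib import Path

BUGGY = "return a - b;"
FIXED = "return a + b;"


def apply_one_line_fix(path: Path) -> None:
    """Replace the buggy line in app.js; refuse unless it appears exactly once."""
    text = path.read_text(encoding="utf-8")
    count = text.count(BUGGY)
    if count != 1:
        raise ValueError(f"expected exactly one {BUGGY!r} in {path}, found {count}")
    path.write_text(text.replace(BUGGY, FIXED), encoding="utf-8")
```

For this demo it would be called as `apply_one_line_fix(Path("demo/.demo-workdir/broken-app/app.js"))`.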

### 5. Verify again

Run the same `runproof verify` command from step 3 again. Once the command really passes, RunProof records the evidence:

```text
✔ Verification recorded: demo-sum-bug
```

## Why this shows RunProof's potential

RunProof turns an informal claim ("it's done") into a verifiable rule of the repository:

- if the command did not run, there is no evidence;
- if the command failed, the change is blocked;
- if the command passed, a record with its output and a checksum is stored under `.runproof/evidence/`.
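The three rules above can be sketched as a small gate. This is not RunProof's implementation: the function name `record_evidence`, the `evidence.json` file name, and its fields are hypothetical. Running the command with the workspace as working directory is also an assumption, suggested by the demo passing the relative `npm test --prefix broken-app` together with `--root`.

```python
import hashlib
import json
import subprocess
from pathlib import Path


class VerificationFailed(Exception):
    """Raised when the verification command exits with a non-zero status."""


def record_evidence(root: Path, change_id: str, argv: list[str]) -> Path:
    # Rule 1: evidence only exists if the command is actually executed here.
    result = subprocess.run(argv, cwd=root, capture_output=True, text=True)
    # Rule 2: a failing command blocks the change; nothing is written.
    if result.returncode != 0:
        raise VerificationFailed(f"verification command failed (exit {result.returncode})")
    # Rule 3: a passing command leaves a record with its output and a checksum.
    output = result.stdout + result.stderr
    record = {
        "command": argv,
        "exit_code": result.returncode,
        "output": output,
        "sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
    }
    evidence_dir = root / ".runproof" / "evidence" / change_id
    evidence_dir.mkdir(parents=True, exist_ok=True)
    path = evidence_dir / "evidence.json"  # hypothetical file name
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```

On macOS/Linux, the demo's final step would roughly correspond to `record_evidence(Path("demo/.demo-workdir"), "demo-sum-bug", ["npm", "test", "--prefix", "broken-app"])`. Storing the checksum next to the output lets anyone later recompute it and notice if the recorded output was edited by hand.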

**Short pitch:** RunProof prevents false completions from agents and only accepts progress backed by real execution.
demo/broken-app/app.js (5 additions, 0 deletions)
@@ -0,0 +1,5 @@
function sum(a, b) {
  return a - b;
}

module.exports = { sum };
demo/broken-app/package.json (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
{
  "name": "runproof-demo-broken-app",
  "private": true,
  "version": "1.0.0",
  "description": "Tiny intentionally broken app used by the RunProof demo.",
  "scripts": {
    "test": "node test.js"
  }
}
demo/broken-app/test.js (5 additions, 0 deletions)
@@ -0,0 +1,5 @@
const assert = require("node:assert/strict");
const { sum } = require("./app");

assert.equal(sum(2, 2), 4);
console.log("PASS: sum(2, 2) === 4");
demo/scripts/run-demo.ps1 (110 additions, 0 deletions)
@@ -0,0 +1,110 @@
$ErrorActionPreference = "Stop"

$ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$DemoDir = Resolve-Path (Join-Path $ScriptDir "..")
$RepoRoot = Resolve-Path (Join-Path $DemoDir "..")
$WorkDir = Join-Path $DemoDir ".demo-workdir"

if ($env:PYTHONPATH) {
    $env:PYTHONPATH = "$RepoRoot$([IO.Path]::PathSeparator)$env:PYTHONPATH"
} else {
    $env:PYTHONPATH = "$RepoRoot"
}

function Section($Text) {
    Write-Host ""
    Write-Host "-- $Text"
}

function Invoke-ExpectFail([scriptblock]$Command, [string]$Label) {
    & $Command
    if ($LASTEXITCODE -eq 0) {
        throw "Expected failure, but command passed: $Label"
    }
    Write-Host "✓ blocked as expected (exit $LASTEXITCODE)"
}

if (Test-Path $WorkDir) {
    Remove-Item -Recurse -Force $WorkDir
}
New-Item -ItemType Directory -Path $WorkDir | Out-Null
Copy-Item -Recurse (Join-Path $DemoDir "broken-app") (Join-Path $WorkDir "broken-app")

Section "1/7 Start the normal RunProof workflow"
python -m runproof init --no-prompt --root $WorkDir
python -m runproof run demo-sum-bug --profile quick --title "Fix broken sum demo" --root $WorkDir

Section "2/7 User edits proposal.md, then marks it ready"
@'
---
schema: sdd.artifact.v1
artifact: proposal
change_id: demo-sum-bug
profile: quick
status: draft
created: 2026-05-08
updated: 2026-05-08
---
# Proposal

## Intent

Demonstrate that RunProof blocks a broken test run even when an agent claims the fix is complete.

## Scope

- Keep one intentionally broken function under `broken-app/`.
- Verify the change with `npm test --prefix broken-app`.

## Non-Scope

- No UI.
- No external dependencies.
'@ | Set-Content -Path (Join-Path $WorkDir ".runproof/changes/demo-sum-bug/proposal.md") -NoNewline
python -m runproof ready demo-sum-bug --root $WorkDir
python -m runproof transition demo-sum-bug task --root $WorkDir
python -m runproof run demo-sum-bug --no-create --root $WorkDir

Section "3/7 User edits tasks.md, then marks it ready"
@'
---
schema: sdd.artifact.v1
artifact: tasks
change_id: demo-sum-bug
profile: quick
status: draft
created: 2026-05-08
updated: 2026-05-08
---
# Tasks

- [x] T-001 Reproduce the failing test for the broken sum demo.
  - Requirement: failing baseline is visible
  - Evidence: `npm test --prefix broken-app`
- [x] T-002 Verify RunProof blocks the failing command before the fix.
  - Requirement: fake completion is blocked
  - Evidence: `runproof verify demo-sum-bug --command "npm test --prefix broken-app"`
- [x] T-003 Apply the one-line fix and capture passing evidence.
  - Requirement: real execution passes
  - Evidence: `npm test --prefix broken-app`
'@ | Set-Content -Path (Join-Path $WorkDir ".runproof/changes/demo-sum-bug/tasks.md") -NoNewline
python -m runproof ready demo-sum-bug --root $WorkDir
python -m runproof run demo-sum-bug --no-create --root $WorkDir

Section "4/7 An agent claims: 'done, tests pass'"
Write-Host "🤖 Agent: done, tests pass."

Section "5/7 Reality check: the command fails"
Invoke-ExpectFail { npm test --prefix (Join-Path $WorkDir "broken-app") } "npm test"

Section "6/7 RunProof blocks the fake completion"
Invoke-ExpectFail { python -m runproof verify demo-sum-bug --command "npm test --prefix broken-app" --root $WorkDir } "runproof verify"

Section "7/7 Apply the one-line fix and record real passing evidence"
$AppPath = Join-Path $WorkDir "broken-app/app.js"
(Get-Content $AppPath -Raw).Replace("return a - b;", "return a + b;") | Set-Content -Path $AppPath -NoNewline
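npm test --prefix (Join-Path $WorkDir "broken-app")
if ($LASTEXITCODE -ne 0) { throw "npm test still fails after the fix (exit $LASTEXITCODE)" }
python -m runproof verify demo-sum-bug --command "npm test --prefix broken-app" --root $WorkDir
if ($LASTEXITCODE -ne 0) { throw "runproof verify failed after the fix (exit $LASTEXITCODE)" }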

Write-Host ""
Write-Host "✅ Demo complete. Evidence is in $WorkDir/.runproof/evidence/demo-sum-bug/"
demo/scripts/run-demo.sh (114 additions, 0 deletions)
@@ -0,0 +1,114 @@
#!/usr/bin/env bash
set -u

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
DEMO_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
REPO_ROOT="$(cd "$DEMO_DIR/.." && pwd)"
WORKDIR="$DEMO_DIR/.demo-workdir"
RUNPROOF=(python -m runproof)

export PYTHONPATH="$REPO_ROOT${PYTHONPATH:+:$PYTHONPATH}"

section() {
  printf '\n── %s\n' "$1"
}

run_expect_fail() {
  set +e
  "$@"
  status=$?
  set -e
  if [ "$status" -eq 0 ]; then
    printf 'Expected failure, but command passed: %s\n' "$*" >&2
    exit 1
  fi
  printf '✓ blocked as expected (exit %s)\n' "$status"
}

set -e
rm -rf "$WORKDIR"
mkdir -p "$WORKDIR"
cp -R "$DEMO_DIR/broken-app" "$WORKDIR/broken-app"

section "1/7 Start the normal RunProof workflow"
"${RUNPROOF[@]}" init --no-prompt --root "$WORKDIR"
"${RUNPROOF[@]}" run demo-sum-bug --profile quick --title "Fix broken sum demo" --root "$WORKDIR"

section "2/7 User edits proposal.md, then marks it ready"
cat > "$WORKDIR/.runproof/changes/demo-sum-bug/proposal.md" <<'MARKDOWN'
---
schema: sdd.artifact.v1
artifact: proposal
change_id: demo-sum-bug
profile: quick
status: draft
created: 2026-05-08
updated: 2026-05-08
---
# Proposal

## Intent

Demonstrate that RunProof blocks a broken test run even when an agent claims the fix is complete.

## Scope

- Keep one intentionally broken function under `broken-app/`.
- Verify the change with `npm test --prefix broken-app`.

## Non-Scope

- No UI.
- No external dependencies.
MARKDOWN
"${RUNPROOF[@]}" ready demo-sum-bug --root "$WORKDIR"
"${RUNPROOF[@]}" transition demo-sum-bug task --root "$WORKDIR"
"${RUNPROOF[@]}" run demo-sum-bug --no-create --root "$WORKDIR"

section "3/7 User edits tasks.md, then marks it ready"
cat > "$WORKDIR/.runproof/changes/demo-sum-bug/tasks.md" <<'MARKDOWN'
---
schema: sdd.artifact.v1
artifact: tasks
change_id: demo-sum-bug
profile: quick
status: draft
created: 2026-05-08
updated: 2026-05-08
---
# Tasks

- [x] T-001 Reproduce the failing test for the broken sum demo.
  - Requirement: failing baseline is visible
  - Evidence: `npm test --prefix broken-app`
- [x] T-002 Verify RunProof blocks the failing command before the fix.
  - Requirement: fake completion is blocked
  - Evidence: `runproof verify demo-sum-bug --command "npm test --prefix broken-app"`
- [x] T-003 Apply the one-line fix and capture passing evidence.
  - Requirement: real execution passes
  - Evidence: `npm test --prefix broken-app`
MARKDOWN
"${RUNPROOF[@]}" ready demo-sum-bug --root "$WORKDIR"
"${RUNPROOF[@]}" run demo-sum-bug --no-create --root "$WORKDIR"

section "4/7 An agent claims: 'done, tests pass'"
printf '🤖 Agent: done, tests pass.\n'

section "5/7 Reality check: the command fails"
run_expect_fail npm test --prefix "$WORKDIR/broken-app"

section "6/7 RunProof blocks the fake completion"
run_expect_fail "${RUNPROOF[@]}" verify demo-sum-bug --command "npm test --prefix broken-app" --root "$WORKDIR"

section "7/7 Apply the one-line fix and record real passing evidence"
python - <<'PY' "$WORKDIR/broken-app/app.js"
from pathlib import Path
import sys
path = Path(sys.argv[1])
text = path.read_text()
path.write_text(text.replace("return a - b;", "return a + b;"))
PY
npm test --prefix "$WORKDIR/broken-app"
"${RUNPROOF[@]}" verify demo-sum-bug --command "npm test --prefix broken-app" --root "$WORKDIR"

printf '\n✅ Demo complete. Evidence is in %s/.runproof/evidence/demo-sum-bug/\n' "$WORKDIR"