Take one prompt ("Build analytics platform for 1M users") and produce validated, functional components with health checks and strong contracts. Not just code - actual deliverables.
Current coding agents:
- Generate lots of code that doesn't work
- No validation between components
- Broken contracts compound into unusable systems
- "Millions of tokens of code that all breaks"
Adversarial wrapper that:
- Breaks down the big task into components
- Builds each component (proposer agent)
- Attacks each component (attacker agent validates, tests, finds flaws)
- Validates before moving forward (health checks, tests pass, contracts proven)
- Only proceeds when component is fully functional
A component is "done" when:
- All files created
- Dependencies declared
- Configuration present
- Unit tests written and passing
- Integration tests (if applicable)
- Edge cases covered
- Service starts successfully
- Endpoints respond (for APIs)
- Connections work (for DB, Kafka, etc.)
- Metrics exposed
- Clear interface/API defined
- Input/output schemas documented
- Error handling specified
- Backward compatibility checked
- README with setup instructions
- API docs (if applicable)
- Dependencies listed
- Example usage
- Attacker agent tried to break it
- Security holes found and fixed
- Performance issues identified
- Edge cases covered
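The checklist above can be treated as a gate: a component is done only when every group of criteria is satisfied. The sketch below is one possible encoding, not a fixed design. The class name `DoneChecklist` and its six field names are hypothetical groupings of the items listed above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DoneChecklist:
    """Hypothetical grouping of the 'done' criteria into six gates."""
    files_created: bool = False      # files, dependencies, configuration
    tests_passing: bool = False      # unit, integration, edge cases
    service_starts: bool = False     # starts, endpoints respond, metrics exposed
    contract_defined: bool = False   # interface, schemas, error handling
    docs_written: bool = False       # README, API docs, example usage
    attack_survived: bool = False    # attacker findings resolved

    def missing(self) -> List[str]:
        """Names of the gates that have not been satisfied yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_done(self) -> bool:
        """A component is done only when no gate is missing."""
        return not self.missing()
```

Reporting `missing()` rather than a bare boolean gives the proposer a concrete list of what to fix next.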
Input: "Build analytics platform for 1M users"
  ↓
[Task Decomposer]
  ↓
Components: [SDK, API, Kafka, K8s, DB, Dashboard]
  ↓
For each component:
    ↓
    [Proposer Agent] → Builds component
    ↓
    [Attacker Agent] → Validates, tests, attacks
    ↓
    [Health Check] → Verifies it works
    ↓
    ✅ Component validated → Continue
    ❌ Component broken → Loop back to Proposer
    ↓
    [Integration Check] → Verify new component works with existing
    ↓
    Next component...
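The outer flow can be sketched as a short driver that refuses to advance past a component that fails validation or integration. This is a minimal sketch under stated assumptions: `run_pipeline`, `validate`, and `integrates` are hypothetical names, and `validate` is assumed to wrap the whole proposer/attacker/health-check loop for one component.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def run_pipeline(
    components: Sequence[str],
    validate: Callable[[str], bool],
    integrates: Callable[[List[str], str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Validate components in order and stop at the first failure.

    Returns the accepted components and an error message (None on success).
    """
    accepted: List[str] = []
    for name in components:
        # Hypothetical hook: runs propose -> attack -> health check for one component.
        if not validate(name):
            return accepted, f"{name}: validation failed"
        # Hypothetical hook: checks the new component against those already accepted.
        if not integrates(accepted, name):
            return accepted, f"{name}: integration check failed"
        accepted.append(name)
    return accepted, None
```

Returning the accepted list alongside the error makes it clear how far the build got before it stopped.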
Pick ONE component (e.g., SDK) and prove the system works:
- Proposer builds SDK
- Attacker validates it:
  - Writes tests
  - Checks edge cases
  - Verifies contracts
  - Runs health checks
- Only mark done when all checks pass
- Output: Fully functional, tested SDK
Add second component (e.g., API) that depends on SDK:
- Build API using validated SDK
- Validate API independently
- Integration check: Verify API + SDK work together
- Only proceed when integration proven
Scale to all components with:
- Dependency graph
- Parallel building (where possible)
- Integration checkpoints
- System-level validation
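Do not move forward until the component is proven to work:

code = proposer.build(component)
while True:
    issues = attacker.validate(code)
    # A failed health check is fed back to the proposer like any other issue
    if not issues and not run_health_checks(code):
        issues = ["health checks failed"]
    if not issues:
        break  # component validated
    # Revise the existing code; rebuilding from scratch would discard earlier fixes
    code = proposer.fix(code, issues)

Each component exposes metrics: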
- GET /health for services
- Exit codes for scripts
- Connection tests for infrastructure
- Performance benchmarks
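Two of these checks are simple enough to sketch with the Python standard library. The helper names `script_healthy` and `service_healthy`, the timeouts, and the rule that only HTTP 200 counts as healthy are assumptions made for illustration.

```python
import subprocess
import urllib.request
from typing import Sequence


def script_healthy(argv: Sequence[str], timeout: float = 30.0) -> bool:
    """A script counts as healthy if it exits with code 0 within the timeout."""
    try:
        return subprocess.run(list(argv), timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        # Missing executable, permission error, or a hung script
        return False


def service_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """A service counts as healthy if GET <base_url>/health returns HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses, so refused
        # connections, timeouts, and non-2xx responses all land here.
        return False
```

For example, `service_healthy("http://localhost:3000")` would probe the API from the SDK example, assuming that API serves a `/health` route.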
Explicit interfaces between components:
SDK → API: {request schema, response schema, error codes}
API → Kafka: {event schema, topics, partitions}
Kafka → DB: {data format, consistency guarantees}
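A contract is only useful if it can be checked mechanically. As a rough sketch of the SDK → API request schema, the function below checks one event against the three fields the SDK queues (`event`, `properties`, `timestamp`). The names `EVENT_SCHEMA` and `contract_violations` are hypothetical, and a real system might use JSON Schema or a similar tool instead.

```python
from typing import Any, Dict, List

# Assumed request schema for one event, taken from the fields the SDK queues.
EVENT_SCHEMA: Dict[str, type] = {"event": str, "properties": dict, "timestamp": int}


def contract_violations(payload: Any, schema: Dict[str, type] = EVENT_SCHEMA) -> List[str]:
    """Return human-readable violations; an empty list means the payload conforms."""
    if not isinstance(payload, dict):
        return ["payload must be an object"]
    problems: List[str] = []
    for name, expected in schema.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            actual = type(payload[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    for name in payload:
        if name not in schema:
            problems.append(f"unexpected field: {name}")
    return problems
```

Both sides of the boundary can run the same check: the SDK before sending and the API on receipt.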
Attacker agent's job:
- "Find ways this breaks"
- "What happens with 10x load?"
- "What if Kafka goes down?"
- "Is this secure?"
- "What edge cases are missing?"
Proposer's output:
// sdk.js
class AnalyticsSDK {
  track(event, properties) {
    fetch('/api/events', {
      method: 'POST',
      body: JSON.stringify({event, properties})
    })
  }
}

Attacker's critique:
- ❌ No error handling - what if fetch fails?
- ❌ No retry logic - loses events on network blip
- ❌ No batching - 1 request per event kills performance at scale
- ❌ No validation - can send malformed data
- ❌ No tests - how do we know this works?
Proposer's revised output:
// sdk.js with fixes
class AnalyticsSDK {
  constructor(config = {}) {
    this.queue = []
    this.apiUrl = config.apiUrl || ''  // base URL; empty means same origin
    this.batchSize = config.batchSize || 10
    this.retryAttempts = config.retryAttempts || 3
    this.startFlusher()
  }

  track(event, properties) {
    if (!this.validate(event, properties)) {
      throw new Error('Invalid event')
    }
    this.queue.push({event, properties, timestamp: Date.now()})
    if (this.queue.length >= this.batchSize) {
      this.flush()  // not awaited; sendWithRetry never rejects
    }
  }

  // Resolves to true if the batch was delivered or there was nothing to send
  async flush() {
    if (this.queue.length === 0) return true
    const batch = this.queue.splice(0, this.batchSize)
    return this.sendWithRetry(batch)
  }

  async sendWithRetry(batch, attempt = 1) {
    try {
      const response = await fetch(`${this.apiUrl}/api/events/batch`, {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify(batch)
      })
      if (!response.ok) throw new Error(`HTTP ${response.status}`)
      return true
    } catch (error) {
      if (attempt < this.retryAttempts) {
        await this.sleep(1000 * attempt)  // linear backoff: 1 s, 2 s, ...
        return this.sendWithRetry(batch, attempt + 1)
      }
      console.error('Failed to send events:', error)
      return false  // report the failure to the caller instead of hiding it
    }
  }

  validate(event, properties) {
    // typeof null is 'object', so null has to be rejected explicitly
    return typeof event === 'string' && event.length > 0 &&
      properties !== null && typeof properties === 'object'
  }

  startFlusher() {
    // Keep the handle so the interval can be cleared on shutdown
    this.flushTimer = setInterval(() => this.flush(), 5000)
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms))
  }
}
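// tests/sdk.test.js
// Assumes AnalyticsSDK is in scope, e.g. loaded from sdk.js
describe('AnalyticsSDK', () => {
  beforeEach(() => {
    jest.useFakeTimers()  // keep the 5 s flush interval from firing during tests
    global.fetch = jest.fn().mockResolvedValue({ok: true, status: 200})
  })
  afterEach(() => jest.useRealTimers())

  test('batches events', () => {
    const sdk = new AnalyticsSDK({batchSize: 2})
    sdk.track('click', {button: 'signup'})
    expect(fetch).not.toHaveBeenCalled()
    sdk.track('click', {button: 'login'})
    // Reaching batchSize triggers one request carrying both events
    expect(fetch).toHaveBeenCalledTimes(1)
    expect(JSON.parse(fetch.mock.calls[0][1].body)).toHaveLength(2)
  })

  test('retries on failure', async () => {
    // Fail twice, then fall back to the successful default mock
    fetch.mockRejectedValueOnce(new Error('network down'))
      .mockRejectedValueOnce(new Error('network down'))
    const sdk = new AnalyticsSDK({})
    sdk.sleep = () => Promise.resolve()  // skip the backoff delay
    sdk.track('click', {button: 'signup'})
    await sdk.flush()
    expect(fetch).toHaveBeenCalledTimes(3)
  })

  test('validates events', () => {
    const sdk = new AnalyticsSDK({})
    expect(() => sdk.track('', {})).toThrow()
    expect(() => sdk.track('event', null)).toThrow()
  })
})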
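// health-check.js
// Assumes AnalyticsSDK is in scope, e.g. loaded from sdk.js
async function healthCheck() {
  const sdk = new AnalyticsSDK({apiUrl: 'http://localhost:3000'})
  try {
    sdk.track('health_check', {timestamp: Date.now()})
    // track() only queues the event, so flush and wait for the send to finish
    const delivered = await sdk.flush()
    // A flush() that reports false means the event never reached the API
    if (delivered === false) throw new Error('event was not delivered')
    console.log('✅ SDK health check passed')
    process.exit(0)
  } catch (error) {
    console.error('❌ SDK health check failed:', error)
    process.exit(1)
  }
}

healthCheck()

Validation passed: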
- ✅ Tests run and pass
- ✅ Health check succeeds
- ✅ Contract defined (track method signature, error handling)
- ✅ Performance considerations addressed (batching, retries)
- ✅ Edge cases handled (validation, network failures)
Component approved → move to next
The analytics platform makes a good test case because:
- You know the full system design
- Multiple components with dependencies
- Real-world complexity (1M users)
- Clear success criteria (does it actually work?)
Components:
- SDK - Client-side event tracking
- API - Ingestion endpoint
- Kafka - Event streaming
- Database - Storage (ClickHouse/PostgreSQL)
- Processing - Real-time aggregation
- Dashboard - Visualization
- Kubernetes - Orchestration
Dependencies:
SDK → API → Kafka → [Processing, Database] → Dashboard
↓
Kubernetes (hosts everything)
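The dependency chain above determines a build order, and components with no path between them can be built in parallel. A minimal sketch using the standard-library `graphlib` module (Python 3.9+) follows. `DEPENDS_ON` and `build_waves` are hypothetical names, and Kubernetes is left out of the graph on the assumption that it hosts the components rather than feeding data to them.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each component maps to the components that must be validated before it.
DEPENDS_ON: Dict[str, Set[str]] = {
    "SDK": set(),
    "API": {"SDK"},
    "Kafka": {"API"},
    "Processing": {"Kafka"},
    "Database": {"Kafka"},
    "Dashboard": {"Processing", "Database"},
}


def build_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group components into waves; members of one wave can be built in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # unlocks components that depended on this wave
    return waves
```

With the graph above, Processing and Database land in the same wave, which is the one place in this design where parallel building applies.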
- Build the wrapper - validated_builder.py that implements the adversarial validation loop
- Test on SDK - Prove it can build ONE component completely
- Add health checks - Define validation criteria
- Scale to full system - Build entire analytics platform
- Completion rate: % of components that are actually functional
- First-run success: Does it work without manual fixes?
- Integration success: Do components work together?
- Scale readiness: Can it handle 1M users?
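The first two metrics can be computed from a per-component record of how many proposer attempts were needed. This is only a sketch: the `summarize` helper is hypothetical, and it approximates "works without manual fixes" as "validated on the first attempt", which is an assumption rather than a definition taken from this plan.

```python
from typing import Dict, Optional


def summarize(attempts: Dict[str, Optional[int]]) -> Dict[str, float]:
    """Compute percentages from component name -> attempts needed.

    None means the component was never validated.
    """
    total = len(attempts)
    if total == 0:
        return {"completion_rate": 0.0, "first_run_success": 0.0}
    completed = sum(1 for a in attempts.values() if a is not None)
    first_run = sum(1 for a in attempts.values() if a == 1)
    return {
        "completion_rate": 100.0 * completed / total,
        "first_run_success": 100.0 * first_run / total,
    }
```

Integration success and scale readiness need real integration runs and load tests, so they are not covered by this helper.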
The goal: One prompt → production-ready system