
# Agent Partners — Two Agents, One Task

**Experiment:** Can wrapping two LLM agents together so they act as one make them more reliable and able to handle more complex tasks? This repo compares two relationship models: **cooperative** (a mutually respectful, beneficial partnership) and **adversarial** (one proposes, the other attacks; defend and iterate).

Same interface either way: you call a single “agent” with a task; internally, two agents collaborate or battle it out. The question was which approach is more reliable and when each is worth the extra cost.
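That uniform interface is just a `str → str` callable. A minimal sketch of the contract (the `Agent` alias and names here are illustrative, not the repo's actual definitions):

```python
from typing import Callable

# Both a single agent and a wrapped pair satisfy the same contract:
# take a task string, return an answer string.
Agent = Callable[[str], str]

def run_task(agent: Agent, task: str) -> str:
    # Callers never know whether one model or two are behind `agent`.
    return agent(task)

# A trivial stand-in agent for demonstration.
echo_agent: Agent = lambda task: f"Answer to: {task}"

print(run_task(echo_agent, "Design a rate limiter"))
# prints: Answer to: Design a rate limiter
```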


## Philosophy

### The premise

A single agent can echo its own assumptions. Two agents can either:

  1. Cooperate — like a healthy partnership: triage (solo / consult / full collaboration), draft and review, disagree honestly and protect the outcome.
  2. Adversarially engage — one agent proposes a solution, the other is instructed to attack it (find flaws, edge cases, security holes); the proposer defends or revises; repeat; then synthesize.

The goal was to see whether either setup would:

- Complete tasks more reliably.
- Handle harder or more ambiguous tasks better.
- Be worth the extra API cost (more turns, more tokens).

### Cooperative protocol (“married” partners)

Implemented in `partnership.py`. Two agents are “married” into one callable with the same `str → str` interface as a single agent.

- **Solo** — One partner handles the task (fast, cheap).
- **Consult** — One drafts, the other reviews; optional revision (draft → review → approve or revise).
- **Collaborate** — Full protocol: understand → work → validate → synthesize.

The first agent triages each task (SOLO / CONSULT / COLLABORATE), so the pair stays efficient on simple questions and spends extra effort only when needed.
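The triage-then-dispatch flow can be sketched with plain callables. This is a simplified illustration of the idea, not `partnership.py`'s actual code; `marry_sketch` and the prompt strings are invented here:

```python
from typing import Callable

Agent = Callable[[str], str]

def marry_sketch(a: Agent, b: Agent) -> Agent:
    """Wrap two agents into one str -> str callable with triage."""
    def pair(task: str) -> str:
        # Agent A first classifies how much effort the task needs.
        mode = a(f"Triage this task as SOLO, CONSULT, or COLLABORATE: {task}")
        if "SOLO" in mode:
            return a(task)                       # fast, cheap path
        draft = a(task)                          # A drafts either way
        review = b(f"Review this draft:\n{draft}")
        if "CONSULT" in mode:
            # One optional revision pass: approve or revise.
            return a(f"Revise given this review:\n{review}\n\nDraft:\n{draft}")
        # COLLABORATE: full understand -> work -> validate -> synthesize.
        validated = b(f"Validate and extend:\n{draft}\n{review}")
        return a(f"Synthesize a final answer from:\n{draft}\n{validated}")
    return pair
```

The wrapped pair is itself an `Agent`, so it can be passed anywhere a single agent is expected.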

### Adversarial protocol

Implemented in `adversarial.py`. One agent is the proposer, the other the attacker.

  1. Proposer produces an initial solution.
  2. Attacker must find multiple significant flaws (edge cases, security, performance, design).
  3. Proposer defends or revises.
  4. Repeat for N rounds.
  5. Proposer synthesizes a final solution from the conversation.

No polite agreement: the attacker is instructed not to rubber-stamp. The idea is to stress-test the solution before calling it done.
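The loop above can be sketched as follows. This is a simplified, self-contained illustration; the function name and prompts are invented, not `adversarial.py`'s actual implementation:

```python
from typing import Callable

Agent = Callable[[str], str]

def marry_adversarial_sketch(proposer: Agent, attacker: Agent,
                             rounds: int = 2) -> Agent:
    """Propose -> attack -> defend/revise for N rounds, then synthesize."""
    def pair(task: str) -> str:
        solution = proposer(task)                # 1. initial solution
        transcript = [solution]
        for _ in range(rounds):                  # 4. repeat for N rounds
            # 2. attacker must surface concrete flaws, never rubber-stamp.
            attack = attacker(
                "Find significant flaws (edge cases, security, performance, "
                f"design) in:\n{solution}"
            )
            # 3. proposer defends or revises.
            solution = proposer(
                f"Defend or revise given:\n{attack}\n\nCurrent:\n{solution}"
            )
            transcript += [attack, solution]
        # 5. final synthesis from the whole exchange.
        return proposer("Synthesize a final solution from:\n"
                        + "\n---\n".join(transcript))
    return pair
```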


## What’s in this repo

| Path | Purpose |
| --- | --- |
| `partnership.py` | Cooperative protocol: `marry(agent_a, agent_b)` → one callable. |
| `adversarial.py` | Adversarial protocol: `marry_adversarial(proposer, attacker)` → one callable. |
| `agent/` | Plug-and-play agent module (PydanticAI-based); used by examples and tests. |
| `examples/simple_cooperative.py` | Example: cooperative pair on a design problem. |
| `examples/simple_adversarial.py` | Example: adversarial pair on a design decision. |
| `partnership_cli.py` | CLI to run design tasks in cooperative or adversarial mode. |
| `direct_adversarial_test.py` | Validation experiment: single agent vs adversarial pair on code generation (build an SDK, run tests). |
| `adversarial_coding.py` | Adversarial code build with validation gates (tests, health checks). |
| `COMPARISON.md` | Written comparison of single vs adversarial on a design task (rate limiter), with metrics and takeaways. |

## Testing and validation

### 1. Design task (single vs adversarial) — `COMPARISON.md`

- **Task:** Design a rate-limiting system (same prompt to the single agent and to the adversarial pair).
- **Metrics:** Output size, word count, tokens, API calls, cost.
- **Result:** Adversarial produced ~33% longer output at ~3× the tokens and cost and 5–6× the API calls. Quality differences:
  - Single agent: clear architecture, working approach, generic edge cases.
  - Adversarial: the same, plus race-condition handling (e.g. atomic Lua scripts), memory growth addressed (bucketing, with numbers), a circuit breaker, fail-closed fallback, IPv6 handling, quantified performance, explicit tradeoffs, and more production-ready edge cases.

Conclusion from that run: adversarial review changed the kind of thinking, from “here’s a good design” to “here’s a design that survived being attacked.”

### 2. Code generation with real validation — `direct_adversarial_test.py`

- **Task:** Build a JavaScript Analytics SDK (batching, retry, validation, rate limiting, tests, health check).
- **Setup:** Baseline = single agent generates code once. Adversarial = proposer generates → run tests → if failures, attacker critiques → proposer fixes → repeat (up to 5 rounds).
- **Output:** Writes files into `test_output/baseline_sdk/` and `test_output/adversarial_sdk/`, runs `npm test`, and compares success/failure and file counts. Results are also written to `test_output/comparison.json`.

This tests whether adversarial review and iteration lead to code that actually passes tests more often than a single shot does.
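The generate → test → critique → fix loop can be sketched like this. It is a simplified, self-contained illustration: `run_tests` here is a stand-in for writing files and shelling out to `npm test`, and `adversarial_build` is an invented name, not the script's real function:

```python
from typing import Callable, Tuple

Agent = Callable[[str], str]

def run_tests(code: str) -> Tuple[bool, str]:
    # Stand-in for the real validation gate (write files, run `npm test`);
    # returns (passed, failure_log).
    passed = "bug" not in code
    return passed, "" if passed else "test failed: found 'bug'"

def adversarial_build(proposer: Agent, attacker: Agent, task: str,
                      max_rounds: int = 5) -> Tuple[str, bool]:
    code = proposer(task)                        # initial generation
    for _ in range(max_rounds):
        passed, log = run_tests(code)            # real validation gate
        if passed:
            return code, True                    # stop as soon as tests pass
        critique = attacker(
            f"Tests failed:\n{log}\nCritique this code:\n{code}"
        )
        code = proposer(
            f"Fix the code given this critique:\n{critique}\n\nCode:\n{code}"
        )
    return code, run_tests(code)[0]
```

The key design point is that the attacker only enters the loop when an objective signal (a failing test run) says the proposal is not done.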


## Findings

- **Cooperative (partnership):**
  - Good for mixed workloads: cheap and fast when one agent can handle the task (solo), more thorough when they consult or collaborate.
  - Fits design discussions and decisions where you want two perspectives without an explicit “attack” dynamic.
- **Adversarial:**
  - In the design comparison, it produced measurably stronger output: more edge cases, security and failure-mode thinking, quantification, and explicit tradeoffs.
  - Costs roughly 3× the tokens and 5–6× the API calls; worth it for production-grade or high-stakes design.
  - For code generation, the direct test compares whether the adversarial loop (generate → test → critique → fix) yields passing tests more often than a single-agent baseline.
- **When to use which** (from `COMPARISON.md`):
  - Adversarial: production systems, high-stakes decisions, complex tradeoffs, or when you want the solution “battle-tested.”
  - Single / cooperative: prototypes, well-defined problems, cost-sensitive or low-stakes work.

Summary: The adversarial protocol showed a real quality gain on the design task; the benefit is context-dependent and comes with a real cost increase. Keeping both protocols and choosing by use case is the intended takeaway.


## Conclusion

- Two agents acting as one can be implemented in two ways: cooperative (partnership with triage) and adversarial (propose → attack → defend → synthesize).
- Cooperative is efficient and good for general use; adversarial is a “production-grade” mode that, in our design comparison, produced more robust, attack-aware solutions at about 3× the cost.
- **Recommendation:** Keep both; default to cooperative for speed and cost, and offer adversarial when the outcome needs to be stress-tested (e.g. production design or critical decisions). The codebase is structured so you can plug in any `str → str` agents and run the same task through either protocol.

## Setup and run

### Requirements

- Python 3.10+
- API keys: `ANTHROPIC_API_KEY` (and optionally `OPENAI_API_KEY` if you use OpenAI-backed agents)

```shell
# From repo root
pip install -r requirements.txt
pip install -r agent/requirements.txt
```

### Quick run

Cooperative (consult mode):

```shell
python examples/simple_cooperative.py
```

Adversarial (design decision):

```shell
python examples/simple_adversarial.py
```

CLI (design problems with optional monitoring):

```shell
python partnership_cli.py design "Should we use microservices or monolith?" --mode adversarial --rounds 2 -v
python partnership_cli.py design "Rate limiting strategy" --mode cooperative -o design.md
```

Code-generation comparison (single vs adversarial with test validation):

```shell
python direct_adversarial_test.py
```

Outputs go to `test_output/` (baseline and adversarial SDKs, plus `comparison.json`). Ensure Node/npm are available if the task runs JS tests.

### Optional

- **API monitoring:** See `API_MONITORING.md`. The CLI can log calls to `api_monitor.jsonl` and report token/cost stats.
- **PostHog:** The agent module can send LLM analytics to PostHog when configured; see `agent/agent.py` and `agent/requirements.txt`.

## License

No license file is included; treat as unlicensed unless you add one. If you want to publish as open source, add a license (e.g. MIT) and state it in the README.
