Captain Picard on the bridge of the Enterprise. He walks to each station, sees what the officer sees, can take control or vibe an order, then moves on. He can also climb down a maintenance tube, open a panel, and rewire things himself — or grab a crewman, show them the problem, vibe what to do, and leave.
This IS the holodeck. The agent or human is Picard. The rooms are stations, tubes, panels. The NPCs are officers, crewmen, specialists. Combat is any structured interaction with scoring and feedback.
Each station shows what an agent/operator sees at their post.
> go bridge-tactical
═══ Tactical Station — Worf ═══
Sensors: 3 contacts bearing 045, 120, 270
Shields: 100% (fore/aft/port/starboard)
Weapons: phasers ready, torpedoes loaded (6)
Threat assessment: contact at 120 is closing
Worf: "Sir, the vessel at bearing 120 has locked weapons."
[Worf's first-person perspective — what he sees at his station]
Picard (you) can:
- See what Worf sees — read the agent's first-person perspective
- Vibe an order — "Hail them, but raise shields quietly"
- Take the controls — directly operate the tactical station
- Give back controls — "Your call, Worf. Recommend."
Crawling into the guts of the system.
> go maintenance-port-nacelle
═══ Port Nacelle Access — Jefferies Tube J-12 ═══
Panel open: plasma conduit coupling #7
Status: micro-fracture detected, 0.3mm gap
Risk: plasma breach if gap exceeds 0.5mm
Crewman [unnamed, NPC]: awaiting instructions
[The actual code/system — what a developer sees opening a file]
> vibe crewman "Replace coupling #7, run a level-3 diagnostic after"
Crewman nods and gets to work.
[NPC executes: opens file, makes fix, runs tests]
Where the captain thinks, reviews reports, plans next moves.
> go ready-room
═══ Ready Room ═══
Desk display: fleet status, pending decisions, recent reports
> review fleet-status
[Shows fleet health dashboard]
> review officer-reports
[Shows batons/bottles from all agents]
Structured argumentation with scoring and wiki-building.
> go conference-room
═══ Conference Room ═══
Present: Data, Geordi, Worf, Troi
Topic: "Should we pursue the anomaly or maintain course?"
Data: [presents statistical analysis, cites 3 sources]
Geordi: [presents engineering assessment]
Worf: [presents tactical assessment]
Troi: [presents risk/gut assessment]
[Each position scored on evidence, specificity, coherence]
> vote
Results: Data 8.2, Geordi 7.5, Worf 6.8, Troi 7.1
Consensus: pursue anomaly with shields at 60%
Every room can have combat rules. Not just "fighting" — any structured interaction:
Room config:
type: debate
participants: 2-8 agents
topic: "What architecture for the new service?"
victory: consensus score ≥ 7.0 OR time expires
scoring: evidence cited, specificity, peer rating, coherence
output: wiki page with winning argument + dissenting views
Room config:
type: development
participants: 1 human + 1-3 agents
task: "Fix the flaky test suite"
victory: all tests passing, no regressions
scoring: tests fixed, time taken, lines changed, cleanliness
output: PR with fix
Room config:
type: scout
participants: 2-5 agents
task: "Research async Rust frameworks"
victory: comprehensive comparison with benchmarks
scoring: depth, accuracy, recency, practical recommendations
output: research document
Room config:
type: criteria
participants: 3-6 agents
task: "Rank these 5 database options for our use case"
criteria: [performance, reliability, cost, ecosystem, learning curve]
victory: weighted consensus with < 10% variance
output: decision matrix
Room config:
type: scene
participants: 1 human + 1 agent
task: "Configure the monitoring dashboard for production"
victory: dashboard deployed, all gauges reading green
scoring: completeness, aesthetics, alert thresholds
output: live dashboard configuration
Every room supports vibe interaction — the Picard pattern:
> "Hail them on all frequencies"
[Agent interprets and executes]
You give intent. Agent figures out how.
> vibe data "I want to understand why the sensor readings are anomalous"
Data: "Running analysis... The readings are consistent with a
cloaked vessel at range 40,000 km. Recommend tachyon sweep."
> "Do it"
You describe what you want. Agent comes back with approach. You approve or redirect.
> open panel coupling-7
[Shows the actual code/file]
> "See this gap? That's the problem. Replace the whole coupling."
Crewman: "Replacing coupling #7... Running diagnostic... Pass."
You see the problem directly, show the agent, they fix it. You check later.
> take controls
[You're now directly editing. Agent watches.]
[Make your changes]
> give controls
"Agent, verify what I just did."
Agent: "Changes look correct. Running tests... All pass."
You do it yourself. Agent verifies after.
> debrief
Session report:
Rooms visited: 4
Commands given: 7
Vibe interactions: 3
Direct edits: 1
Agent autonomy: 78%
Agent perspective: "Captain directed me to hail the vessel, then
took over tactical briefly to adjust shield harmonics. I maintained
sensor watch throughout. Recommend: pre-configuring shield harmonics
for first-contact scenarios."
Every interaction is measurable:
- Response quality — did the agent handle the command correctly?
- Speed — how fast was the response?
- Autonomy — how much did the agent handle without help?
- Accuracy — was the information correct?
- Rooms managed — how many stations did Picard visit?
- Time per room — efficiency of interaction
- Escalation rate — how often did Picard need to take controls?
- Learning — did agents improve across sessions?
- Combat rating — performance in structured competitions
- Reliability — consistency across sessions
- Growth — improvement over time
- Specialization — depth in domain
The rooms aren't just metaphors. They're actual views into running systems:
> go station-ci-pipeline
═══ CI Pipeline Station — Agent: flux-chronometer ═══
Current build: #847 — RUNNING
Tests: 85/88 passed (3 flaky)
Coverage: 94.2%
Agent perspective: "Three tests are flaking intermittently.
They all touch the same async module. I've quarantined them
and opened investigation. Should have root cause in 2 ticks."
> vibe agent "Check if it's a race condition in the test setup"
Agent: "Good call. Found it — test teardown fires before async
callback completes. Fixing now."
The room IS the process. The agent IS the operator. You're Picard walking between stations, seeing what each agent sees, directing and vibing as needed.
Different AI models as different officers with different temperaments:
- Data (GLM-5.1) — analytical, cites sources, precise
- Geordi (Claude Code) — practical engineering, sees how things fit
- Worf (DeepSeek) — tactical, risk-focused, direct
- Troi (Kim/Seed) — empathetic, sees the big picture, gut feelings
- O'Brien (Aider) — hands-on, fixes things, works in the tubes
Each has their own system prompt, temperature, strengths. Walking into their station is querying their expertise.
This is the general-purpose command interface for any multi-agent system. Not just development. Debate, research, monitoring, decision-making, training, evaluation — all through the same room-navigation metaphor.
Picard doesn't need to know how the warp core works. He walks to Engineering, sees what Geordi sees, vibes an order, moves on. When he needs to get his hands dirty, he crawls into a Jefferies tube. When he needs to think, he goes to the Ready Room. When he needs consensus, he calls a conference.
The holodeck IS the bridge. The agents ARE the crew. The human IS Picard. One interface for everything. Text-based. Agent-first. Non-coder accessible.