feat: ArduPilot support for Gazebo (+ hardware) drone simulation with video stream in a warehouse environment #1576
snktshrma commented Mar 16, 2026
- Gazebo + ArduPilot SITL: Gazebo video stream (RTP from UDP 5600), configurable as forward-facing camera source in connection module. New blueprints: basic, basic-with-spatial, agentic; all registered.
- Position & motion: MAVLink position target in local NED with velocity feedforward (type mask fixed).
- New agent skills: position target, move-by-distance, and yaw (rotate to heading).
- Tracking skill in Gazebo.
- Sim: MAVLink uses local position NED when present.
- Docs: README section for Gazebo + ArduPilot (SITL, plugin, gz sim, sim_vehicle.py, DimOS).
… tracking, spatial model and warehouse environment
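The position-target item above (local NED with velocity feedforward and a fixed type mask) can be sketched with pymavlink. This is a minimal sketch, not the PR's actual code: the SITL endpoint and helper names are assumptions, and the bit values follow the MAVLink POSITION_TARGET_TYPEMASK enum, where a SET bit means that field is IGNORED by the autopilot.

```python
# Sketch: SET_POSITION_TARGET_LOCAL_NED with position + velocity feedforward.
# POSITION_TARGET_TYPEMASK bits (set = ignore): we ignore acceleration and
# yaw, leaving position and velocity active.
IGNORE_AFX, IGNORE_AFY, IGNORE_AFZ = 0x40, 0x80, 0x100
IGNORE_YAW, IGNORE_YAW_RATE = 0x400, 0x800

def pos_vel_type_mask() -> int:
    """Type mask using position + velocity, ignoring accel and yaw fields."""
    return (IGNORE_AFX | IGNORE_AFY | IGNORE_AFZ
            | IGNORE_YAW | IGNORE_YAW_RATE)

def send_position_target(master, n, e, d, vn=0.0, ve=0.0, vd=0.0):
    """master is a pymavlink mavutil connection (assumed SITL endpoint)."""
    master.mav.set_position_target_local_ned_send(
        0,                          # time_boot_ms (0 = not used)
        master.target_system,
        master.target_component,
        1,                          # MAV_FRAME_LOCAL_NED
        pos_vel_type_mask(),
        n, e, d,                    # position setpoint (m, NED)
        vn, ve, vd,                 # velocity feedforward (m/s)
        0, 0, 0,                    # acceleration (ignored by mask)
        0, 0)                       # yaw, yaw_rate (ignored by mask)

if __name__ == "__main__":
    # Hypothetical SITL usage:
    #   from pymavlink import mavutil
    #   master = mavutil.mavlink_connection("udpin:127.0.0.1:14550")
    #   master.wait_heartbeat()
    #   send_position_target(master, 5.0, 0.0, -2.0, vn=0.5)
    print(bin(pos_vel_type_mask()))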
This is a genuinely exciting project: natural language control for humanoids, quadrupeds, drones, and robotic arms is the kind of thing that makes the "agentic AI" category concrete in a way that purely software agents don't.

One thing that jumps out immediately from the architecture: physical hardware agents need pre-action authorization at a fundamentally higher assurance level than software agents. A hallucinated Jira write is recoverable. A hallucinated command to a robotic arm or drone is not. The standard approach to this problem in software agents, prompt-layer instructions like "always ask before acting", doesn't hold under adversarial conditions. Prompt injection can instruct an agent to skip confirmation steps. For physical hardware, that failure mode is unacceptable.

The pattern that works: APort Agent Guardrails implements pre-action authorization at the platform hook level, not the prompt level. Every tool call is intercepted and evaluated against a YAML policy before it executes. The model cannot skip it; there is no prompt or agent response that bypasses the hook.

For physical agents in dimos, this maps to capability scope enforcement before actuator commands reach hardware. You'd define a policy manifest for each robot/drone's authorized capabilities, and any tool call outside that scope is denied at the framework level before it propagates to the hardware adapter.

The underlying spec, the Open Agent Protocol (OAP), DOI: 10.5281/zenodo.18901596, also defines agent passports: signed capability manifests that declare what an agent is authorized to do. In a multi-agent dimos workflow (say, a planner agent delegating to a hardware execution agent), passports give you chain-of-custody verification that the executing agent is actually scoped for physical actuation.
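The hook-level gate described above could look roughly like this. A minimal sketch, not APort's actual API: the policy dict stands in for a parsed YAML manifest, and the agent IDs, tool names, and limits are all hypothetical.

```python
# Sketch of a pre-action authorization gate: every tool call is checked
# against a capability policy BEFORE it reaches the hardware adapter.
# The POLICY dict stands in for a parsed YAML manifest; the agent ID,
# tool names, and speed limit are hypothetical.
POLICY = {
    "drone-01": {
        "allowed_tools": {"set_position_target", "move_by_distance", "yaw"},
        "max_speed_mps": 2.0,
    }
}

class Denied(Exception):
    """Raised when a tool call falls outside the agent's declared scope."""

def authorize(agent_id: str, tool: str, args: dict) -> None:
    scope = POLICY.get(agent_id)
    if scope is None or tool not in scope["allowed_tools"]:
        raise Denied(f"{agent_id} is not scoped for {tool}")
    if args.get("speed", 0.0) > scope["max_speed_mps"]:
        raise Denied(f"speed {args['speed']} exceeds limit")

def dispatch(agent_id: str, tool: str, args: dict) -> str:
    # The gate sits between tool selection and actuator execution, so no
    # model output can skip it: authorize() runs unconditionally.
    authorize(agent_id, tool, args)
    return f"executing {tool}"  # placeholder for the hardware adapter call
```

A denied call never reaches the actuator: `dispatch("drone-01", "arm_gripper", {})` raises `Denied` before any hardware code runs.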
A few specifics for dimos:
The physical-hardware angle makes the authorization problem more urgent, not just more interesting. Happy to discuss how OAP might fit into the dimos architecture, whether as a framework-level gate before hardware commands or as a passport layer for multi-agent physical workflows. Repo: https://github.com/aporthq/aport-agent-guardrails