Zero-shot, natural language generalized grasping on a $150 robot arm.
No training. No fine-tuning. Just say what you want.
Vector OS: a cross-embodiment robot operating system with industrial-grade SLAM, navigation, generalized grasping, semantic mapping, long-chain task orchestration and explainable task execution.
In development at the CMU Robotics Institute. Nano is the proof of value for the grasping module. Full stack coming soon.
Click to watch full demo video
Vector OS is a cross-embodiment general-purpose robot operating system being developed at the CMU Robotics Institute. It provides plug-and-play robot capabilities:
- Industrial-grade SLAM + Navigation Stack — multi-sensor fusion, dynamic obstacle avoidance, fleet coordination
- Zero-shot Generalized Grasping — pick up any object by describing it, no training required
- Spatial Understanding + Semantic Mapping — 3D scene graphs with object-level semantics
- Explainable Robot Task Execution — interpretable neuro-symbolic planning with human-readable reasoning chains
- Cross-Embodiment — one framework for wheeled, legged, and manipulator platforms
Vector OS combines LLM reasoning with real-time perception and physical manipulation through a unified neuro-symbolic architecture. Robots understand natural language, perceive the world, plan multi-step tasks, and execute them — with full explainability.
Vector OS Nano demonstrates the core grasping capability: zero-shot generalized grasping — pick up any object by describing it in natural language, with no pre-training, no object-specific models, and no fine-tuning. A ~$150 robot arm plus a depth camera (about $420 in total) that understands what you say and acts on it.
User (natural language, Chinese/English)
|
v
┌─────────────────────────────────────────────┐
│ LLM Brain Layer │
│ Claude Haiku (via OpenRouter API) │
│ - Intent parsing & task decomposition │
│ - Tool calling / function execution │
│ - Multi-step planning │
│ - Bilingual: Chinese + English │
├─────────────────────────────────────────────┤
│ Skill Layer │
│ ROS2 Services │
│ - pick(object) detect_all() │
│ - home() / scan() describe_scene() │
│ - get_pose() track(object) │
├─────────────────────────────────────────────┤
│ Perception Layer │
│ - Moondream2 VLM (local, ~4GB GPU) │
│ - EdgeTAM real-time tracking (20fps) │
│ - D405 depth camera (640x480 RGB+D @30fps) │
│ - Workspace calibration (camera->base) │
├─────────────────────────────────────────────┤
│ Control Layer │
│ - Pinocchio FK/IK solver │
│ - Joint trajectory interpolation │
│ - Gripper command with retry logic │
│ - Dynamic position compensation │
├─────────────────────────────────────────────┤
│ Hardware Layer │
│ - SO-ARM100 (6-DOF, STS3215 servos) │
│ - Intel RealSense D405 (USB 3.x) │
│ - Total cost: ~$420 │
└─────────────────────────────────────────────┘
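The brain-to-skill handoff in the diagram can be sketched as a small dispatch table that routes LLM tool calls to skill functions. This is an illustrative sketch: the skill names follow the diagram, but `SKILLS`, `dispatch`, and the return shapes are hypothetical, not the repo's actual API.

```python
# Illustrative sketch of the LLM-brain -> skill-layer handoff.
# Skill names follow the diagram; the dispatch table and return
# shapes are hypothetical, not Vector OS Nano's actual API.

def pick(obj: str) -> dict:
    # A real implementation would call the ROS2 pick service.
    return {"skill": "pick", "target": obj, "ok": True}

def home() -> dict:
    # A real implementation would drive the arm to its home pose.
    return {"skill": "home", "ok": True}

def detect_all() -> dict:
    # A real implementation would query the perception layer.
    return {"skill": "detect_all", "objects": []}

SKILLS = {"pick": pick, "home": home, "detect_all": detect_all}

def dispatch(tool_call: dict) -> dict:
    """Route one LLM tool call to the matching skill function."""
    name = tool_call["name"]
    if name not in SKILLS:
        return {"ok": False, "error": f"unknown skill: {name}"}
    return SKILLS[name](**tool_call.get("arguments", {}))
```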
| Capability | Status |
|---|---|
| Zero-shot natural language grasping | Working |
| Real-time object tracking (20fps) | Working |
| Scene description via VLM | Working |
| Chinese + English commands | Working |
| LLM-powered task interpretation | Working |
| Workspace calibration (14-point) | Working |
| Auto-retry on pick failure | Working |
| Dynamic gripper compensation | Working |
| Place skill | Planned |
| Multi-step task planning | Planned |
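The auto-retry capability above amounts to a bounded retry loop around a pick attempt. A minimal sketch, assuming a boolean success signal and a default of three attempts (both assumptions, not the repo's actual gripper logic):

```python
# Illustrative retry wrapper mirroring the "auto-retry on pick
# failure" capability. The attempt count and boolean success signal
# are assumptions, not the repo's actual gripper logic.

def pick_with_retry(attempt_pick, max_attempts: int = 3) -> bool:
    """Call attempt_pick() until it succeeds or attempts run out."""
    for _ in range(max_attempts):
        if attempt_pick():
            return True
        # A real system would re-detect the object and adjust the
        # grasp pose before the next attempt.
    return False
```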
| Component | Model | Cost |
|---|---|---|
| Robot Arm | LeRobot SO-ARM100 (6-DOF, 3D-printed) | ~$150 |
| Camera | Intel RealSense D405 | ~$270 |
| GPU | Any NVIDIA with 10+ GB VRAM | (existing) |
| Computer | Ubuntu 22.04 + ROS2 Humble | (existing) |
Prerequisites: Ubuntu 22.04, ROS2 Humble, NVIDIA GPU (10+ GB VRAM), Python 3.10
# Clone
git clone https://github.com/yusenthebot/vector-os-nano.git ~/Desktop/vector_ws
cd ~/Desktop/vector_ws
# Install dependencies
pip3 install pin pyserial httpx pyyaml
# Build
conda deactivate # important if using conda
source /opt/ros/humble/setup.bash
export PATH="/usr/bin:$PATH"
colcon build --symlink-install --cmake-args "-DPYTHON_EXECUTABLE=/usr/bin/python3.10"
# Configure OpenRouter API key (for LLM brain)
mkdir -p config
cat > config/system.yaml << 'EOF'
llm:
provider: openrouter
api_key: YOUR_OPENROUTER_API_KEY
model: anthropic/claude-haiku-4-5
max_history: 20
max_tokens: 1024
temperature: 0.0
EOF
# Get your key at: https://openrouter.ai/keys

Run (2 terminals):
# Terminal 1 — Full system (arm + camera + perception + skills)
conda deactivate && source ~/Desktop/vector_ws/install/setup.bash
export MOONDREAM_MODEL=vikhyatk/moondream2
ros2 launch so101_bringup perception.launch.py
# Terminal 2 — Interactive CLI
conda deactivate && source ~/Desktop/vector_ws/install/setup.bash
ros2 run so101_bringup cli

Use:
vector> pick battery # Pick up a battery
vector> grab the red cup # Natural language pick
vector> 捡起桌上的电池 # Chinese: pick up the battery
vector> detect # Detect all visible objects
vector> 看看桌上有什么 # Chinese: what's on the table?
vector> home # Return arm to home position
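Under the hood, each CLI command goes to the LLM brain over the OpenRouter chat-completions endpoint. A minimal sketch of assembling that request from the `config/system.yaml` keys shown earlier; `build_chat_request` is a hypothetical helper, not the repo's actual code.

```python
# Hypothetical helper showing how the config keys from system.yaml
# map onto an OpenRouter chat-completions request. Only the config
# keys and the OpenRouter endpoint are taken as given.

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(cfg: dict, messages: list) -> dict:
    """Assemble the URL, headers, and JSON body for one chat call."""
    llm = cfg["llm"]
    return {
        "url": OPENROUTER_URL,
        "headers": {"Authorization": f"Bearer {llm['api_key']}"},
        "json": {
            "model": llm["model"],
            # Keep only the most recent max_history messages.
            "messages": messages[-llm.get("max_history", 20):],
            "max_tokens": llm.get("max_tokens", 1024),
            "temperature": llm.get("temperature", 0.0),
        },
    }
```

Sending it is then a single `httpx.post(req["url"], headers=req["headers"], json=req["json"])`.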
Before picking, calibrate the workspace (5-10 min):
conda deactivate && source ~/Desktop/vector_ws/install/setup.bash
python3 src/so101_skills/scripts/calibrate_workspace.py

Place an object at 12-15 measured positions. The script computes a camera-to-arm mapping matrix.
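The calibration step can be understood as a least-squares fit of an affine camera-to-arm map from the measured correspondences. A sketch with NumPy; the function names and the affine model are assumptions, and the actual `calibrate_workspace.py` may use a different formulation.

```python
# Least-squares camera-to-arm calibration sketch. The affine model
# and function names are illustrative assumptions; the repo's
# calibrate_workspace.py may use a different formulation.
import numpy as np

def fit_camera_to_arm(cam_pts: np.ndarray, arm_pts: np.ndarray) -> np.ndarray:
    """Fit a 3x4 affine map M so that arm ~= M @ [cam; 1].

    cam_pts, arm_pts: (N, 3) arrays of corresponding points;
    N >= 4 non-coplanar points give a well-posed fit.
    """
    n = cam_pts.shape[0]
    homo = np.hstack([cam_pts, np.ones((n, 1))])        # (N, 4)
    # Solve homo @ M.T ~= arm_pts in the least-squares sense.
    m_t, *_ = np.linalg.lstsq(homo, arm_pts, rcond=None)
    return m_t.T                                         # (3, 4)

def camera_to_arm(M: np.ndarray, p_cam: np.ndarray) -> np.ndarray:
    """Map one camera-frame point into the arm's base frame."""
    return M @ np.append(p_cam, 1.0)
```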
| Package | Role |
|---|---|
| `so101_bringup` | Unified launch + LLM CLI |
| `so101_hardware` | STS3215 servo bridge + gripper |
| `so101_description` | URDF + arm meshes |
| `so101_moveit_config` | MoveIt2 motion planning config |
| `so101_perception` | D405 camera launch |
| `so101_skills` | Pick/place + perception services |
| `track_anything` | EdgeTAM tracking + VLM integration |
| `vlm` | Moondream2/Qwen VLM providers |
| `vector_perception_utils` | Pointcloud + detection utilities |
Vector OS Nano is a proof of value for the grasping module. The full Vector OS stack under development at CMU Robotics Institute includes:
- SLAM + Navigation — LiDAR/visual SLAM, Nav2 integration, multi-floor mapping
- Semantic Mapping — 3D scene graphs, object permanence, spatial reasoning
- Multi-Robot Coordination — fleet management, task allocation, shared world model
- Mobile Manipulation — wheeled, legged, and humanoid platforms
- Explainable Planning — neuro-symbolic task decomposition with reasoning traces
- Visual Servoing — sub-millimeter closed-loop precision manipulation
- Multi-Modal HRI — voice, gesture, gaze-aware human-robot interaction
Demos and phased open-source releases coming soon. Star this repo and stay tuned.
MIT License (non-commercial use only)
Built by Vector Robotics with Claude Code
