Zero-shot, natural language generalized grasping on a $150 robot arm.
No training. No fine-tuning. Just say what you want.
Vector OS: a cross-embodiment robot operating system with industrial-grade SLAM, navigation, generalized grasping, semantic mapping, long-chain task orchestration and explainable task execution.
In development at the CMU Robotics Institute. Nano is the proof of value for the grasping module. Full stack coming soon.
Click to watch full demo video
Vector OS is a cross-embodiment general-purpose robot operating system being developed at the CMU Robotics Institute. It provides plug-and-play robot capabilities:
- Industrial-grade SLAM + Navigation Stack — multi-sensor fusion, dynamic obstacle avoidance, fleet coordination
- Zero-shot Generalized Grasping — pick up any object by describing it, no training required
- Spatial Understanding + Semantic Mapping — 3D scene graphs with object-level semantics
- Explainable Robot Task Execution — interpretable neuro-symbolic planning with human-readable reasoning chains
- Cross-Embodiment — one framework for wheeled, legged, and manipulator platforms
Vector OS combines LLM reasoning with real-time perception and physical manipulation through a unified neuro-symbolic architecture. Robots understand natural language, perceive the world, plan multi-step tasks, and execute them — with full explainability.
Vector OS Nano demonstrates the core grasping capability: zero-shot generalized grasping — pick up any object by describing it in natural language, with no pre-training, no object-specific models, and no fine-tuning. A ~$150 robot arm plus a depth camera (about $420 in total) that understands what you say and acts on it.
User (natural language, Chinese/English)
|
v
┌─────────────────────────────────────────────┐
│ LLM Brain Layer │
│ Claude Haiku (via OpenRouter API) │
│ - Intent parsing & task decomposition │
│ - Tool calling / function execution │
│ - Multi-step planning │
│ - Bilingual: Chinese + English │
├─────────────────────────────────────────────┤
│ Skill Layer │
│ ROS2 Services │
│ - pick(object) detect_all() │
│ - home() / scan() describe_scene() │
│ - get_pose() track(object) │
├─────────────────────────────────────────────┤
│ Perception Layer │
│ - Moondream2 VLM (local, ~4GB GPU) │
│ - EdgeTAM real-time tracking (20fps) │
│ - D405 depth camera (640x480 RGB+D @30fps) │
│ - Workspace calibration (camera->base) │
├─────────────────────────────────────────────┤
│ Control Layer │
│ - Pinocchio FK/IK solver │
│ - Joint trajectory interpolation │
│ - Gripper command with retry logic │
│ - Dynamic position compensation │
├─────────────────────────────────────────────┤
│ Hardware Layer │
│ - SO-ARM100 (6-DOF, STS3215 servos) │
│ - Intel RealSense D405 (USB 3.x) │
│ - Total cost: ~$420 │
└─────────────────────────────────────────────┘
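The brain-to-skill handoff in the diagram can be sketched as a small dispatch table that routes LLM tool calls to skill functions. This is an illustrative sketch: the skill names follow the diagram, but `SKILLS`, `dispatch`, and the return shapes are hypothetical, not the repo's actual API.

```python
# Illustrative sketch of the LLM-brain -> skill-layer handoff.
# Skill names follow the diagram; the dispatch table and return
# shapes are hypothetical, not Vector OS Nano's actual API.

def pick(obj: str) -> dict:
    # A real implementation would call the ROS2 pick service.
    return {"skill": "pick", "target": obj, "ok": True}

def home() -> dict:
    # A real implementation would drive the arm to its home pose.
    return {"skill": "home", "ok": True}

def detect_all() -> dict:
    # A real implementation would query the perception layer.
    return {"skill": "detect_all", "objects": []}

SKILLS = {"pick": pick, "home": home, "detect_all": detect_all}

def dispatch(tool_call: dict) -> dict:
    """Route one LLM tool call to the matching skill function."""
    name = tool_call["name"]
    if name not in SKILLS:
        return {"ok": False, "error": f"unknown skill: {name}"}
    return SKILLS[name](**tool_call.get("arguments", {}))
```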
| Capability | Status |
|---|---|
| Zero-shot natural language grasping | Working |
| Real-time object tracking (20fps) | Working |
| Scene description via VLM | Working |
| Chinese + English commands | Working |
| LLM-powered task interpretation | Working |
| Workspace calibration (14-point) | Working |
| Auto-retry on pick failure | Working |
| Dynamic gripper compensation | Working |
| Place skill | Planned |
| Multi-step task planning | Planned |
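The auto-retry capability above amounts to a bounded retry loop around a pick attempt. A minimal sketch, assuming a boolean success signal and a default of three attempts (both assumptions, not the repo's actual gripper logic):

```python
# Illustrative retry wrapper mirroring the "auto-retry on pick
# failure" capability. The attempt count and boolean success signal
# are assumptions, not the repo's actual gripper logic.

def pick_with_retry(attempt_pick, max_attempts: int = 3) -> bool:
    """Call attempt_pick() until it succeeds or attempts run out."""
    for _ in range(max_attempts):
        if attempt_pick():
            return True
        # A real system would re-detect the object and adjust the
        # grasp pose before the next attempt.
    return False
```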
| Component | Model | Cost |
|---|---|---|
| Robot Arm | LeRobot SO-ARM100 (6-DOF, 3D-printed) | ~$150 |
| Camera | Intel RealSense D405 | ~$270 |
| GPU | Any NVIDIA with 10+ GB VRAM | (existing) |
| Computer | Ubuntu 22.04 + ROS2 Humble | (existing) |
Prerequisites: Ubuntu 22.04, ROS2 Humble, NVIDIA GPU (10+ GB VRAM), Python 3.10
# Clone
git clone https://github.com/yusenthebot/vector-os-nano.git ~/Desktop/vector_ws
cd ~/Desktop/vector_ws
# Install dependencies
pip3 install pin pyserial httpx pyyaml
# Build
conda deactivate # important if using conda
source /opt/ros/humble/setup.bash
export PATH="/usr/bin:$PATH"
colcon build --symlink-install --cmake-args "-DPYTHON_EXECUTABLE=/usr/bin/python3.10"
# Configure OpenRouter API key (for LLM brain)
mkdir -p config
cat > config/system.yaml << 'EOF'
llm:
provider: openrouter
api_key: YOUR_OPENROUTER_API_KEY
model: anthropic/claude-haiku-4-5
max_history: 20
max_tokens: 1024
temperature: 0.0
EOF
# Get your key at: https://openrouter.ai/keys

Run (2 terminals):
# Terminal 1 — Full system (arm + camera + perception + skills)
conda deactivate && source ~/Desktop/vector_ws/install/setup.bash
export MOONDREAM_MODEL=vikhyatk/moondream2
ros2 launch so101_bringup perception.launch.py
# Terminal 2 — Interactive CLI
conda deactivate && source ~/Desktop/vector_ws/install/setup.bash
ros2 run so101_bringup cli

Use:
vector> pick battery # Pick up a battery
vector> grab the red cup # Natural language pick
vector> 捡起桌上的电池 # Chinese: pick up the battery
vector> detect # Detect all visible objects
vector> 看看桌上有什么 # Chinese: what's on the table?
vector> home # Return arm to home position
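Under the hood, each CLI command goes to the LLM brain over the OpenRouter chat-completions endpoint. A minimal sketch of assembling that request from the `config/system.yaml` keys shown earlier; `build_chat_request` is a hypothetical helper, not the repo's actual code.

```python
# Hypothetical helper showing how the config keys from system.yaml
# map onto an OpenRouter chat-completions request. Only the config
# keys and the OpenRouter endpoint are taken as given.

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(cfg: dict, messages: list) -> dict:
    """Assemble the URL, headers, and JSON body for one chat call."""
    llm = cfg["llm"]
    return {
        "url": OPENROUTER_URL,
        "headers": {"Authorization": f"Bearer {llm['api_key']}"},
        "json": {
            "model": llm["model"],
            # Keep only the most recent max_history messages.
            "messages": messages[-llm.get("max_history", 20):],
            "max_tokens": llm.get("max_tokens", 1024),
            "temperature": llm.get("temperature", 0.0),
        },
    }
```

Sending it is then a single `httpx.post(req["url"], headers=req["headers"], json=req["json"])`.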
Before picking, calibrate the workspace (5-10 min):
conda deactivate && source ~/Desktop/vector_ws/install/setup.bash
python3 src/so101_skills/scripts/calibrate_workspace.py

Place an object at 12-15 measured positions. The script computes a camera-to-arm mapping matrix.
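The calibration step can be understood as a least-squares fit of an affine camera-to-arm map from the measured correspondences. A sketch with NumPy; the function names and the affine model are assumptions, and the actual `calibrate_workspace.py` may use a different formulation.

```python
# Least-squares camera-to-arm calibration sketch. The affine model
# and function names are illustrative assumptions; the repo's
# calibrate_workspace.py may use a different formulation.
import numpy as np

def fit_camera_to_arm(cam_pts: np.ndarray, arm_pts: np.ndarray) -> np.ndarray:
    """Fit a 3x4 affine map M so that arm ~= M @ [cam; 1].

    cam_pts, arm_pts: (N, 3) arrays of corresponding points;
    N >= 4 non-coplanar points give a well-posed fit.
    """
    n = cam_pts.shape[0]
    homo = np.hstack([cam_pts, np.ones((n, 1))])        # (N, 4)
    # Solve homo @ M.T ~= arm_pts in the least-squares sense.
    m_t, *_ = np.linalg.lstsq(homo, arm_pts, rcond=None)
    return m_t.T                                         # (3, 4)

def camera_to_arm(M: np.ndarray, p_cam: np.ndarray) -> np.ndarray:
    """Map one camera-frame point into the arm's base frame."""
    return M @ np.append(p_cam, 1.0)
```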
| Package | Role |
|---|---|
| `so101_bringup` | Unified launch + LLM CLI |
| `so101_hardware` | STS3215 servo bridge + gripper |
| `so101_description` | URDF + arm meshes |
| `so101_moveit_config` | MoveIt2 motion planning config |
| `so101_perception` | D405 camera launch |
| `so101_skills` | Pick/place + perception services |
| `track_anything` | EdgeTAM tracking + VLM integration |
| `vlm` | Moondream2/Qwen VLM providers |
| `vector_perception_utils` | Pointcloud + detection utilities |
Vector OS Nano is a proof of value for the grasping module. The full Vector OS stack under development at CMU Robotics Institute includes:
- SLAM + Navigation — LiDAR/visual SLAM, Nav2 integration, multi-floor mapping
- Semantic Mapping — 3D scene graphs, object permanence, spatial reasoning
- Multi-Robot Coordination — fleet management, task allocation, shared world model
- Mobile Manipulation — wheeled, legged, and humanoid platforms
- Explainable Planning — neuro-symbolic task decomposition with reasoning traces
- Visual Servoing — sub-millimeter closed-loop precision manipulation
- Multi-Modal HRI — voice, gesture, gaze-aware human-robot interaction
Demos and phased open-source releases coming soon. Star this repo and stay tuned.
MIT License (non-commercial use only)
Built by Vector Robotics with Claude Code
