
Vector Robotics

Vector OS Nano

Zero-shot, natural language generalized grasping on a $150 robot arm.
No training. No fine-tuning. Just say what you want.

ROS2 Python PyTorch OpenCV Moondream2 EdgeTAM Claude Pinocchio RealSense LeRobot

Vector OS: a cross-embodiment robot operating system with industrial-grade SLAM, navigation, generalized grasping, semantic mapping, long-chain task orchestration, and explainable task execution.
Developed at the CMU Robotics Institute. Nano is the low-cost, low-barrier proof of value for the grasping module; the full stack will be open-sourced in phases.


Demo

Click to watch the full demo video



What is Vector OS?

Vector OS is a cross-embodiment general-purpose robot operating system being developed at the CMU Robotics Institute. It provides plug-and-play robot capabilities:

  • Industrial-grade SLAM + Navigation Stack — multi-sensor fusion, dynamic obstacle avoidance, fleet coordination
  • Zero-shot Generalized Grasping — pick up any object by describing it, no training required
  • Spatial Understanding + Semantic Mapping — 3D scene graphs with object-level semantics
  • Explainable Robot Task Execution — interpretable neuro-symbolic planning with human-readable reasoning chains
  • Cross-Embodiment — one framework for wheeled, legged, and manipulator platforms

Vector OS combines LLM reasoning with real-time perception and physical manipulation through a unified neuro-symbolic architecture. Robots understand natural language, perceive the world, plan multi-step tasks, and execute them — with full explainability.

Vector OS Nano demonstrates the core capability: zero-shot generalized grasping. Describe any object in natural language and the arm picks it up, with no pre-training, no object-specific models, and no fine-tuning. The whole setup is ~$420 of hardware: a $150 arm plus a depth camera, and it understands what you say and acts on it.
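The flow from language to action amounts to a dispatch layer between parsed LLM tool calls and robot skills. Here is a minimal sketch in plain Python; the names (`SKILLS`, `dispatch`, the `pick`/`home` stubs) are illustrative stand-ins, not the repo's actual ROS2 service API:

```python
from typing import Callable, Dict

# Stub skills standing in for the real ROS2 service clients.
def pick(target: str) -> str:
    return f"picking {target}"

def home() -> str:
    return "moving to home pose"

# Registry mapping tool names (as exposed to the LLM) to skill callables.
SKILLS: Dict[str, Callable[..., str]] = {"pick": pick, "home": home}

def dispatch(tool_name: str, **kwargs) -> str:
    """Route a parsed LLM tool call to the matching skill."""
    if tool_name not in SKILLS:
        raise ValueError(f"unknown skill: {tool_name}")
    return SKILLS[tool_name](**kwargs)
```

In the real system the LLM emits a tool call such as `pick(object="battery")`, and a layer like this invokes the corresponding ROS2 service.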

System Architecture

User (natural language, Chinese/English)
  |
  v
┌─────────────────────────────────────────────┐
│              LLM Brain Layer                 │
│  Claude Haiku (via OpenRouter API)           │
│  - Intent parsing & task decomposition       │
│  - Tool calling / function execution         │
│  - Multi-step planning                       │
│  - Bilingual: Chinese + English              │
├─────────────────────────────────────────────┤
│              Skill Layer                     │
│  ROS2 Services                               │
│  - pick(object)     detect_all()             │
│  - home() / scan()  describe_scene()         │
│  - get_pose()       track(object)            │
├─────────────────────────────────────────────┤
│           Perception Layer                   │
│  - Moondream2 VLM (local, ~4GB GPU)         │
│  - EdgeTAM real-time tracking (20fps)        │
│  - D405 depth camera (640x480 RGB+D @30fps) │
│  - Workspace calibration (camera->base)      │
├─────────────────────────────────────────────┤
│            Control Layer                     │
│  - Pinocchio FK/IK solver                    │
│  - Joint trajectory interpolation            │
│  - Gripper command with retry logic          │
│  - Dynamic position compensation             │
├─────────────────────────────────────────────┤
│           Hardware Layer                     │
│  - SO-ARM100 (6-DOF, STS3215 servos)        │
│  - Intel RealSense D405 (USB 3.x)           │
│  - Total cost: ~$420                         │
└─────────────────────────────────────────────┘
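The control layer's joint trajectory interpolation step can be illustrated with a short NumPy sketch. This is linear interpolation between two joint configurations; the actual node may use time parameterization or splines, and `interpolate_joints` is an assumed name:

```python
import numpy as np

def interpolate_joints(q_start, q_goal, steps=50):
    """Linearly interpolate between two joint configurations.

    Returns an array of shape (steps, n_joints) that includes both
    endpoints. Each row is one intermediate joint target to stream
    to the servos.
    """
    q_start = np.asarray(q_start, dtype=float)
    q_goal = np.asarray(q_goal, dtype=float)
    alphas = np.linspace(0.0, 1.0, steps)[:, None]  # (steps, 1)
    return (1.0 - alphas) * q_start + alphas * q_goal
```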

Capabilities

| Capability | Status |
| --- | --- |
| Zero-shot natural language grasping | Working |
| Real-time object tracking (20fps) | Working |
| Scene description via VLM | Working |
| Chinese + English commands | Working |
| LLM-powered task interpretation | Working |
| Workspace calibration (14-point) | Working |
| Auto-retry on pick failure | Working |
| Dynamic gripper compensation | Working |
| Place skill | Planned |
| Multi-step task planning | Planned |
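The auto-retry-on-pick-failure capability can be sketched as a generic retry wrapper. The actual policy in `so101_skills` (for example, re-detecting the object between attempts) may differ; `with_retry` is an illustrative name:

```python
import time

def with_retry(action, attempts=3, delay_s=0.0):
    """Call a grasp action up to `attempts` times until it succeeds.

    `action` is a zero-argument callable returning True on success;
    a raised RuntimeError is treated like a failed attempt.
    """
    for _ in range(attempts):
        try:
            if action():
                return True
        except RuntimeError:
            pass  # treat an error like a failed attempt and retry
        time.sleep(delay_s)
    return False
```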

Hardware (~$420 total)

| Component | Model | Cost |
| --- | --- | --- |
| Robot Arm | LeRobot SO-ARM100 (6-DOF, 3D-printed) | ~$150 |
| Camera | Intel RealSense D405 | ~$270 |
| GPU | Any NVIDIA with 10+ GB VRAM | (existing) |
| Computer | Ubuntu 22.04 + ROS2 Humble | (existing) |
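The D405 supplies aligned RGB and depth frames; a pixel plus its depth reading back-projects to a 3D point in the camera frame via the pinhole model. A sketch with placeholder intrinsics (`fx`, `fy`, `cx`, `cy` here are made-up numbers; real values come from the RealSense driver):

```python
import numpy as np

def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth (meters) to a 3D point
    in the camera frame using the pinhole camera model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# At the principal point, the ray points straight along the optical axis.
point = deproject(320, 240, 0.5, fx=430.0, fy=430.0, cx=320.0, cy=240.0)
```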

Quick Start

Prerequisites: Ubuntu 22.04, ROS2 Humble, NVIDIA GPU (10+ GB VRAM), Python 3.10

# Clone
git clone https://github.com/yusenthebot/vector-os-nano.git ~/Desktop/vector_ws
cd ~/Desktop/vector_ws

# Install dependencies
pip3 install pin pyserial httpx pyyaml

# Build
conda deactivate  # important if using conda
source /opt/ros/humble/setup.bash
export PATH="/usr/bin:$PATH"
colcon build --symlink-install --cmake-args "-DPYTHON_EXECUTABLE=/usr/bin/python3.10"

# Configure OpenRouter API key (for LLM brain)
mkdir -p config
cat > config/system.yaml << 'EOF'
llm:
  provider: openrouter
  api_key: YOUR_OPENROUTER_API_KEY
  model: anthropic/claude-haiku-4-5
  max_history: 20
  max_tokens: 1024
  temperature: 0.0
EOF
# Get your key at: https://openrouter.ai/keys
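The `system.yaml` above can be loaded with PyYAML (already in the pip install list). A minimal sketch, assuming this config schema; `parse_llm_config` is an illustrative helper, not the repo's actual loader:

```python
import yaml

def parse_llm_config(text):
    """Parse the llm section of system.yaml, filling in the defaults
    shown in the README when keys are omitted."""
    cfg = yaml.safe_load(text)
    llm = cfg.get("llm", {})
    llm.setdefault("temperature", 0.0)
    llm.setdefault("max_tokens", 1024)
    llm.setdefault("max_history", 20)
    return llm
```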

Run (2 terminals):

# Terminal 1 — Full system (arm + camera + perception + skills)
conda deactivate && source ~/Desktop/vector_ws/install/setup.bash
export MOONDREAM_MODEL=vikhyatk/moondream2
ros2 launch so101_bringup perception.launch.py

# Terminal 2 — Interactive CLI
conda deactivate && source ~/Desktop/vector_ws/install/setup.bash
ros2 run so101_bringup cli

Use:

vector> pick battery              # Pick up a battery
vector> grab the red cup          # Natural language pick
vector> 捡起桌上的电池             # Chinese: pick up the battery
vector> detect                    # Detect all visible objects
vector> 看看桌上有什么             # Chinese: what's on the table?
vector> home                      # Return arm to home position

First-Time Calibration

Before picking, calibrate the workspace (5-10 min):

conda deactivate && source ~/Desktop/vector_ws/install/setup.bash
python3 src/so101_skills/scripts/calibrate_workspace.py

Place an object at 12-15 measured positions. The script computes a camera-to-arm mapping matrix.
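One common way to compute such a mapping is a least-squares affine fit over the paired camera/arm points. This sketch assumes that model; the repo's actual script may estimate something different:

```python
import numpy as np

def fit_camera_to_base(cam_pts, base_pts):
    """Least-squares affine map base ≈ A @ cam + t from paired 3D points.

    Returns a (3, 4) matrix M = [A | t] such that
    base ≈ M[:, :3] @ cam + M[:, 3].
    """
    cam = np.asarray(cam_pts, dtype=float)
    base = np.asarray(base_pts, dtype=float)
    # Homogeneous coordinates: solve [cam | 1] @ M.T ≈ base.
    X = np.hstack([cam, np.ones((len(cam), 1))])
    M, *_ = np.linalg.lstsq(X, base, rcond=None)
    return M.T
```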

Packages (9 total)

| Package | Role |
| --- | --- |
| so101_bringup | Unified launch + LLM CLI |
| so101_hardware | STS3215 servo bridge + gripper |
| so101_description | URDF + arm meshes |
| so101_moveit_config | MoveIt2 motion planning config |
| so101_perception | D405 camera launch |
| so101_skills | Pick/place + perception services |
| track_anything | EdgeTAM tracking + VLM integration |
| vlm | Moondream2/Qwen VLM providers |
| vector_perception_utils | Point cloud + detection utilities |

What's Coming

Vector OS Nano is a proof of value for the grasping module. The full Vector OS stack under development at CMU Robotics Institute includes:

  • SLAM + Navigation — LiDAR/visual SLAM, Nav2 integration, multi-floor mapping
  • Semantic Mapping — 3D scene graphs, object permanence, spatial reasoning
  • Multi-Robot Coordination — fleet management, task allocation, shared world model
  • Mobile Manipulation — wheeled, legged, and humanoid platforms
  • Explainable Planning — neuro-symbolic task decomposition with reasoning traces
  • Visual Servoing — sub-millimeter closed-loop precision manipulation
  • Multi-Modal HRI — voice, gesture, gaze-aware human-robot interaction

Demos and phased open-source releases coming soon. Star this repo and stay tuned.

License

MIT License (non-commercial use only)




Built by Vector Robotics with Claude Code
