
Commit a5db90e

Merge pull request #11 from blankey1337/feature/nvidia-sim-integration

additional sim improvements, dialing in the workflow for remote training

2 parents 969d6ad + ef2dacb

4 files changed: 295 additions & 4 deletions

File tree

docs/remote_simulation_workflow.md

Lines changed: 110 additions & 0 deletions

@@ -0,0 +1,110 @@
# Remote Simulation & Training Workflow

This guide outlines how to develop, train, and test AlohaMini using a remote cloud GPU (e.g., Lambda Labs, AWS) to run the heavy simulation, while controlling everything from your local laptop (e.g., a MacBook).

## Architecture

* **Cloud Server (The "Lab")**: Runs NVIDIA Isaac Sim. Handles physics, rendering, and training.
* **Local Machine (The "Mission Control")**: Runs the Dashboard and teleoperation scripts. Connects to the cloud via SSH.

## Prerequisites

1. **Cloud Instance**: A server with an NVIDIA RTX-class GPU (A10, A100, RTX 3090/4090).
   * Recommended: Lambda Labs or Brev.dev (Ubuntu 20.04/22.04).
   * Must have **NVIDIA drivers** and **Isaac Sim** installed (or use the Isaac Sim Docker container).
2. **Local Machine**: Your laptop (Mac/Windows/Linux).
3. **SSH Access**: You must be able to SSH into the cloud instance.

## Setup

### 1. Cloud Server Setup

1. SSH into your cloud instance.
2. Clone this repository:
   ```bash
   git clone https://github.com/blankey1337/AlohaMini.git
   cd AlohaMini
   ```
3. Ensure you are in the Python environment that has access to Isaac Sim (often `./python.sh` in the Isaac Sim folder).

### 2. Local Machine Setup

1. Clone this repository locally.
2. Install dependencies:
   ```bash
   pip install -r software/requirements.txt
   ```
## The Workflow

### Phase 1: Data Collection

1. **Start the Simulation (Cloud)**
   Run the simulation environment script. It listens on ports 5555 (commands) and 5556 (observations).
   ```bash
   # On Cloud
   isaac_sim_python software/examples/alohamini/isaac_sim/isaac_alohamini_env.py
   ```

2. **Establish Connection (Local)**
   Forward the ZMQ ports from the cloud to your localhost:
   ```bash
   # On Local Mac
   ssh -L 5555:localhost:5555 -L 5556:localhost:5556 ubuntu@<CLOUD_IP>
   ```
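Before moving on, it can help to confirm the tunnel is actually up. A minimal stdlib sketch (the helper name and structure are mine; only the port numbers come from this guide):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # With the SSH tunnel active, both forwarded ports should accept connections.
    for port in (5555, 5556):
        state = "open" if port_open("127.0.0.1", port) else "closed"
        print(f"port {port}: {state}")
```

Note this only verifies the forwarded TCP ports accept connections; actual data flow still depends on the simulator running on the cloud side.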
3. **Launch Dashboard (Local)**
   Start the web dashboard to see what the robot sees:
   ```bash
   # On Local Mac
   python software/dashboard/app.py
   ```
   Open `http://localhost:5001` in your browser.

4. **Teleoperate & Record**
   * Use the Dashboard to see the camera feed.
   * Run the teleop script in another terminal to control the robot with your keyboard:
     ```bash
     python software/examples/alohamini/standalone_teleop.py --ip 127.0.0.1
     ```
   * **To Record**: Click the **"Start Recording"** button on the Dashboard.
   * Perform the task (e.g., pick up the object).
   * Click **"Stop Recording"**.
   * Repeat 50-100 times. The data is saved to `AlohaMini/data_sim/` on the **Cloud Server**.
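Before training, it is worth checking how many episodes you actually captured. A small sketch, assuming the on-disk layout written by the `DatasetRecorder` in this PR (`data_sim/episode_<timestamp>/frame_NNNNNN.json`); run it on the cloud server:

```python
import glob
import os

def summarize_episodes(root: str = "data_sim") -> dict:
    """Map each recorded episode directory to its frame count."""
    summary = {}
    for ep in sorted(glob.glob(os.path.join(root, "episode_*"))):
        summary[os.path.basename(ep)] = len(glob.glob(os.path.join(ep, "frame_*.json")))
    return summary

if __name__ == "__main__":
    for name, frames in summarize_episodes().items():
        print(f"{name}: {frames} frames")
```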

### Phase 2: Training

You train the model directly on the cloud GPU, where the data lives.

1. **Stop the Simulation** (to free up GPU VRAM).
2. **Run Training**:
   Use the LeRobot training script (or your custom training script), pointing it at the generated dataset:
   ```bash
   # On Cloud
   python software/src/lerobot/scripts/train.py \
     --dataset data_sim \
     --policy act \
     --batch_size 8 \
     --num_epochs 1000
   ```
   *Note: The exact training command depends on the LeRobot configuration.*
3. **Output**: Training produces a model file (e.g., `outputs/train/policy.safetensors`).

### Phase 3: Evaluation

Test the trained model in the simulator to see if it works.

1. **Restart Simulation (Cloud)**:
   ```bash
   isaac_sim_python software/examples/alohamini/isaac_sim/isaac_alohamini_env.py
   ```
2. **Run Inference Node (Cloud or Local)**:
   You need a script that loads the model and closes the loop (read obs -> run model -> send action).
   * *Coming Soon: `eval_sim.py`, which loads the safetensors file and drives the ZMQ robot.*
3. **Watch (Local)**:
   Use the Dashboard to watch the robot perform the task autonomously.
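Until `eval_sim.py` lands, the control loop it needs can be sketched as follows. This is a sketch, not the repo's implementation: `policy` stands in for any callable mapping an observation dict to an action dict, and the message format assumes the JSON-over-ZMQ convention used elsewhere in this workflow:

```python
import json

import zmq

def run_policy_loop(policy, ip="127.0.0.1", obs_port=5556, cmd_port=5555, max_steps=None):
    """Close the loop: read obs -> run model -> send action."""
    ctx = zmq.Context.instance()
    obs_sock = ctx.socket(zmq.SUB)
    obs_sock.setsockopt(zmq.SUBSCRIBE, b"")
    obs_sock.setsockopt(zmq.CONFLATE, 1)  # only the latest observation matters
    obs_sock.connect(f"tcp://{ip}:{obs_port}")

    cmd_sock = ctx.socket(zmq.PUSH)
    cmd_sock.setsockopt(zmq.CONFLATE, 1)
    cmd_sock.connect(f"tcp://{ip}:{cmd_port}")

    steps = 0
    while max_steps is None or steps < max_steps:
        obs = json.loads(obs_sock.recv_string())
        action = policy(obs)  # e.g. {"x.vel": 0.1, "theta.vel": 0.0, ...}
        cmd_sock.send_string(json.dumps(action))
        steps += 1
```

The CONFLATE options mirror the dashboard's sockets: the loop always acts on the freshest observation rather than queuing stale ones.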
## Troubleshooting

* **Laggy Video**: ZMQ over an SSH tunnel is usually fast enough for 640x480, but if it lags, check your connection speed to the cloud server.
* **"Address already in use"**: Ensure no other Python scripts are using ports 5555/5556 on either machine.
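For the "Address already in use" case, the quickest fix is to find the process holding the port. A Linux/macOS sketch (tool availability varies by machine):

```shell
# List processes bound to the ZMQ ports (macOS / Linux with lsof)
lsof -i :5555 -i :5556 || true   # exits non-zero when nothing is listening
# Linux alternative using ss
ss -ltnp 2>/dev/null | grep -E ':(5555|5556)' || true
```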

software/dashboard/app.py

Lines changed: 39 additions & 3 deletions

```diff
@@ -5,23 +5,32 @@
 import zmq
 import cv2
 import numpy as np
-from flask import Flask, render_template, Response, jsonify
+from flask import Flask, render_template, Response, jsonify, request

 app = Flask(__name__)

 # Global state
 latest_observation = {}
 lock = threading.Lock()
 connected = False
+recording = False
+cmd_socket = None

-def zmq_worker(ip='127.0.0.1', port=5556):
-    global latest_observation, connected
+def zmq_worker(ip='127.0.0.1', port=5556, cmd_port=5555):
+    global latest_observation, connected, cmd_socket
     context = zmq.Context()
+
+    # Sub Socket
     socket = context.socket(zmq.SUB)
     socket.setsockopt(zmq.SUBSCRIBE, b"")
     socket.connect(f"tcp://{ip}:{port}")
     socket.setsockopt(zmq.CONFLATE, 1)

+    # Cmd Socket (Push to Sim)
+    cmd_socket = context.socket(zmq.PUSH)
+    cmd_socket.setsockopt(zmq.CONFLATE, 1)
+    cmd_socket.connect(f"tcp://{ip}:{cmd_port}")
+
     print(f"Connecting to ZMQ Stream at {ip}:{port}...")

     while True:
@@ -93,6 +102,33 @@ def video_feed(camera_name):
     return Response(generate_frames(camera_name),
                     mimetype='multipart/x-mixed-replace; boundary=frame')

+@app.route('/api/command', methods=['POST'])
+def send_command():
+    global cmd_socket
+    if not request.json or 'command' not in request.json:
+        return jsonify({'error': 'No command provided'}), 400
+
+    cmd = request.json['command']
+    print(f"Received command: {cmd}")
+
+    # Example handling
+    if cmd == 'reset_sim':
+        # Send reset command (Isaac Sim needs to handle this logic)
+        # For now, we can just zero out velocities or send a special flag
+        if cmd_socket:
+            cmd_socket.send_string(json.dumps({"reset": True}))
+
+    elif cmd == 'start_recording':
+        # Trigger recording logic
+        if cmd_socket:
+            cmd_socket.send_string(json.dumps({"start_recording": True}))
+
+    elif cmd == 'stop_recording':
+        if cmd_socket:
+            cmd_socket.send_string(json.dumps({"stop_recording": True}))
+
+    return jsonify({'status': 'ok'})
+
 @app.route('/api/status')
 def get_status():
     with lock:
```
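The new `/api/command` endpoint can be exercised without the UI, e.g. with `curl` (assuming the dashboard is reachable on port 5001, as the guide's dashboard URL suggests):

```shell
# Ask the dashboard to relay a reset command to the simulator;
# the server replies with {"status": "ok"} on success.
curl -X POST http://localhost:5001/api/command \
     -H 'Content-Type: application/json' \
     -d '{"command": "reset_sim"}' || echo "dashboard not running"
```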

software/dashboard/templates/index.html

Lines changed: 63 additions & 1 deletion

```diff
@@ -11,14 +11,27 @@
         .camera-box img { max-width: 100%; height: auto; display: block; }
         .camera-title { text-align: center; font-size: 0.9em; margin-bottom: 5px; }
         .status-panel { flex: 1; background: #333; padding: 15px; border-radius: 5px; min-width: 300px; }
+        .controls-panel { width: 100%; background: #333; padding: 15px; border-radius: 5px; margin-top: 20px; }
         table { width: 100%; border-collapse: collapse; }
         td, th { padding: 5px; border-bottom: 1px solid #444; font-size: 0.9em; }
         th { text-align: left; color: #aaa; }
         .value { font-family: monospace; color: #4f4; }
+        .btn { padding: 10px 20px; font-size: 16px; margin-right: 10px; cursor: pointer; background: #555; color: white; border: none; border-radius: 4px; }
+        .btn:hover { background: #777; }
+        .btn-record { background: #c00; }
+        .btn-record:hover { background: #e00; }
+        .status-indicator { display: inline-block; width: 10px; height: 10px; border-radius: 50%; margin-right: 5px; }
+        .status-active { background: #0f0; }
+        .status-inactive { background: #555; }
     </style>
 </head>
 <body>
-    <h1>AlohaMini Dashboard</h1>
+    <div style="display:flex; justify-content:space-between; align-items:center;">
+        <h1>AlohaMini Dashboard</h1>
+        <div>
+            <span id="connectionStatus" class="status-indicator status-inactive"></span> <span id="connectionText">Disconnected</span>
+        </div>
+    </div>

     <div class="container">
         <div class="camera-grid" id="cameraGrid">
@@ -33,14 +46,63 @@ <h3>Robot State</h3>
         </div>
     </div>

+    <div class="controls-panel">
+        <h3>Simulation Controls</h3>
+        <p>Control the remote simulation directly from here.</p>
+        <button class="btn" onclick="sendCommand('reset_sim')">Reset Simulation</button>
+        <button class="btn btn-record" id="recordBtn" onclick="toggleRecording()">Start Recording</button>
+        <span id="recordingStatus" style="margin-left: 10px; color: #aaa;"></span>
+    </div>
+
     <script>
         const knownCameras = ['head_top', 'head_back', 'head_front', 'wrist_left', 'wrist_right', 'cam_high', 'cam_low', 'cam_left_wrist', 'cam_right_wrist'];
         const activeCameras = new Set();

+        let isRecording = false;
+
+        function toggleRecording() {
+            isRecording = !isRecording;
+            const btn = document.getElementById('recordBtn');
+            const status = document.getElementById('recordingStatus');
+
+            if (isRecording) {
+                btn.textContent = "Stop Recording";
+                btn.style.background = "#555";
+                status.textContent = "Recording in progress...";
+                sendCommand('start_recording');
+            } else {
+                btn.textContent = "Start Recording";
+                btn.style.background = "#c00";
+                status.textContent = "Saved.";
+                sendCommand('stop_recording');
+                setTimeout(() => status.textContent = "", 3000);
+            }
+        }
+
+        function sendCommand(cmd) {
+            console.log("Sending command:", cmd);
+            fetch('/api/command', {
+                method: 'POST',
+                headers: { 'Content-Type': 'application/json' },
+                body: JSON.stringify({ command: cmd })
+            });
+        }
+
         function updateStatus() {
             fetch('/api/status')
                 .then(response => response.json())
                 .then(data => {
+                    // Update connection indicator
+                    const connInd = document.getElementById('connectionStatus');
+                    const connText = document.getElementById('connectionText');
+                    if (data.connected) {
+                        connInd.className = 'status-indicator status-active';
+                        connText.textContent = "Connected";
+                    } else {
+                        connInd.className = 'status-indicator status-inactive';
+                        connText.textContent = "Disconnected";
+                    }
+
                     const table = document.getElementById('statusTable');
                     let rows = '';
```

software/examples/alohamini/isaac_sim/isaac_alohamini_env.py

Lines changed: 83 additions & 0 deletions

```diff
@@ -1,7 +1,10 @@
 import base64
 import json
 import os
+import random
 import sys
+import time
+from datetime import datetime

 import cv2
 import numpy as np
@@ -40,6 +43,54 @@
 # Locate URDF
 URDF_PATH = os.path.join(repo_root, "software/src/lerobot/robots/alohamini/alohamini.urdf")

+class DatasetRecorder:
+    def __init__(self, root_dir="data"):
+        self.root_dir = root_dir
+        self.is_recording = False
+        self.current_episode_dir = None
+        self.frame_idx = 0
+        self.episode_idx = 0
+
+        if not os.path.exists(root_dir):
+            os.makedirs(root_dir)
+
+    def start_recording(self):
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        self.current_episode_dir = os.path.join(self.root_dir, f"episode_{timestamp}")
+        os.makedirs(self.current_episode_dir)
+        os.makedirs(os.path.join(self.current_episode_dir, "images"))
+        self.frame_idx = 0
+        self.is_recording = True
+        print(f"Started recording to {self.current_episode_dir}")
+
+    def stop_recording(self):
+        if self.is_recording:
+            print(f"Stopped recording. Saved {self.frame_idx} frames.")
+        self.is_recording = False
+        self.current_episode_dir = None
+
+    def save_frame(self, obs, action):
+        if not self.is_recording:
+            return
+
+        # Save JSON data (state + action)
+        data = {
+            "timestamp": time.time(),
+            "observation": {k: v for k, v in obs.items() if not isinstance(v, np.ndarray)},
+            "action": action
+        }
+
+        with open(os.path.join(self.current_episode_dir, f"frame_{self.frame_idx:06d}.json"), "w") as f:
+            json.dump(data, f)
+
+        # Save Images
+        for k, v in obs.items():
+            if isinstance(v, np.ndarray):  # Image
+                img_path = os.path.join(self.current_episode_dir, "images", f"{k}_{self.frame_idx:06d}.jpg")
+                cv2.imwrite(img_path, v)
+
+        self.frame_idx += 1
+
 class IsaacAlohaMini:
     def __init__(self, world, urdf_path):
         self.world = world
@@ -188,6 +239,8 @@ def main():

     print(f"Isaac Sim AlohaMini running. Ports: OBS={PORT_OBS}, CMD={PORT_CMD}")

+    recorder = DatasetRecorder(root_dir="data_sim")
+
     while simulation_app.is_running():
         world.step(render=True)

@@ -203,7 +256,23 @@
             joint_cmds = {}
             vx, vy, vth = 0, 0, 0

+            # Check for system commands
+            if "start_recording" in cmd:
+                recorder.start_recording()
+                continue
+            if "stop_recording" in cmd:
+                recorder.stop_recording()
+                continue
+
             for k, v in cmd.items():
+                if k == "reset" and v is True:
+                    # Reset
+                    print("Resetting robot...")
+                    world.reset()
+                    joint_cmds = {name: 0.0 for name in aloha.dof_names}
+                    aloha.set_joint_positions(joint_cmds)
+                    continue
+
                 if k.endswith(".pos"):
                     joint_name = k.replace(".pos", "")
                     joint_cmds[joint_name] = v
@@ -226,6 +295,20 @@

         # 2. Get Obs & Publish
         obs = aloha.get_observations()
+
+        # Save frame if recording
+        # (Pass current command as action label for now, though it's imperfect as it's the *commanded* not *measured* action)
+        if recorder.is_recording:
+            # Reconstruct action dict from parsed values
+            # This is a simplification; ideally we log exactly what we sent
+            action_log = {
+                "x.vel": vx,
+                "y.vel": vy,
+                "theta.vel": vth,
+                # Add arm joint targets if we had them easily accessible here
+            }
+            recorder.save_frame(obs, action_log)
+
         encoded_obs = {}

         # Process images
```
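For downstream processing, an episode written by `DatasetRecorder` can be read back frame by frame. A sketch (the helper is mine; the paths mirror the `save_frame` layout above):

```python
import glob
import json
import os

def load_frame(episode_dir: str, idx: int) -> dict:
    """Load one recorded step: JSON state/action plus paths of saved images."""
    with open(os.path.join(episode_dir, f"frame_{idx:06d}.json")) as f:
        data = json.load(f)
    # Images are stored as images/<obs_key>_<frame>.jpg by save_frame()
    data["image_paths"] = sorted(
        glob.glob(os.path.join(episode_dir, "images", f"*_{idx:06d}.jpg"))
    )
    return data
```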
