Commits
99 commits
9aeffeb
0.10.7 Update (#101)
Xuwznln Oct 12, 2025
fb93b1c
fix startup env check.
Xuwznln Oct 12, 2025
eb1f3fb
Try fix one-key build on linux
Xuwznln Oct 12, 2025
6b5765b
Complete all one key installation
Xuwznln Oct 12, 2025
51d3e61
fix: rename schema field to resource_schema with serialization and va…
Mile-Away Oct 12, 2025
e0da1c7
Fix one-key installation build
Xuwznln Oct 12, 2025
2a8e8d0
Fix conda pack on windows
Xuwznln Oct 13, 2025
ef3f24e
add plr_to_bioyond, and refactor bioyond stations
TablewareBox Oct 13, 2025
b64466d
modify default config
TablewareBox Oct 13, 2025
c70eafa
Fix one-key installation build for windows
Xuwznln Oct 13, 2025
c85c498
Fix workstation startup
Xuwznln Oct 13, 2025
7c440d1
Fix/resource UUID and doc fix (#109)
Xuwznln Oct 16, 2025
0260cbb
Close #107
Xuwznln Oct 16, 2025
d4415f5
Fix/update resource (#112)
Xuwznln Oct 16, 2025
1b43c53
fix resource_get in action
TablewareBox Oct 17, 2025
166d84a
fix(reaction_station): clear workflow sequence and parameters to avoid repeated execution (#113)
ZiWei09 Oct 17, 2025
bc30f23
Update create_resource device_id
Xuwznln Oct 20, 2025
37ee43d
Update ResourceTracker
TablewareBox Oct 18, 2025
bb3ca64
Update graphio together with workstation design.
ZiWei09 Oct 18, 2025
a2a827d
Update workstation & bioyond example
ZiWei09 Oct 21, 2025
9645609
PRCXI Update
qxw138 Oct 21, 2025
42b78ab
Update resource extra & uuid.
Xuwznln Oct 22, 2025
9bd72b4
Update workstation.
ZiWei09 Oct 27, 2025
5fc7eb7
Interfaces for plate sealer, plate peeler, and consumables station
ElijahChang929 Jun 7, 2025
8807865
Add Raman and XRD related code
WenzheG Nov 5, 2025
b6dfe2b
Resource update & asyncio fix
Xuwznln Oct 31, 2025
813400f
bump version to 0.10.9
Xuwznln Nov 14, 2025
872b3d7
PRCXI Reset Error Correction (#166)
ALITTLELZ Nov 14, 2025
304827f
1114 material manual definition tutorial by xinyu (#165)
lixinyu1011 Nov 14, 2025
448e007
3d sim (#97)
q434343 Nov 14, 2025
a242253
Standardize OPC UA device integration into unilab (#78)
tt11142023 Nov 14, 2025
37e0f10
add new laiyu liquid driver, yaml and json files (#164)
xiaoyu10031 Nov 14, 2025
a625a86
HR material sync, fix frontend display position (#135)
ZiWei09 Nov 14, 2025
b475db6
nmr
WenzheG Sep 29, 2025
4d3475a
Update devices
Xuwznln Nov 14, 2025
891f126
bump version to 0.10.10
Xuwznln Nov 14, 2025
48895a9
Update repo files.
Xuwznln Nov 14, 2025
4189a2c
Add get_resource_with_dir & get_resource method
Xuwznln Nov 15, 2025
549a502
fix camera & workstation & warehouse & reaction station driver
ZiWei09 Nov 16, 2025
75f0903
update docs, test examples
Xuwznln Nov 18, 2025
7f7b1c1
bump version to 0.10.11
Xuwznln Nov 18, 2025
acf5fde
Add startup_json_path, disable_browser, port config
Xuwznln Nov 18, 2025
d39662f
Update oss config
Xuwznln Nov 18, 2025
931614f
feat(bioyond_studio): add project API support and improve material management
ZiWei09 Nov 18, 2025
a662c75
feat(bioyond): add measurement vial warehouse and update warehouse factory function parameters
ZiWei09 Nov 19, 2025
554bcad
Support unilabos_samples key
Xuwznln Nov 19, 2025
d328282
add session_id and normal_exit
Xuwznln Nov 20, 2025
8fa3407
Add result schema and add TypedDict conversion.
Xuwznln Nov 25, 2025
f1ad0c9
Fix port error
Xuwznln Nov 25, 2025
ffc583e
Add backend api and update doc
Xuwznln Nov 26, 2025
ed8ee29
Add get_regular_container func
Xuwznln Nov 27, 2025
d390236
Add get_regular_container func
Xuwznln Nov 27, 2025
6fdd482
Transfer_liquid (#176)
ALITTLELZ Nov 26, 2025
c7c14d2
Auto dump logs, fix workstation input schema
Xuwznln Nov 27, 2025
5ce433e
Fix startup with remote resource error
ZiWei09 Nov 28, 2025
52544a2
signal when host node is ready
Xuwznln Dec 2, 2025
9854ed8
fix ros2 future
Xuwznln Dec 4, 2025
b1cdef9
update version to 0.10.12
Xuwznln Dec 4, 2025
53219d8
Update docs
qxw138 Dec 11, 2025
13a6795
Update organic syn station.
Xuwznln Dec 14, 2025
5dc81ec
bump version to 0.10.3
ZiWei09 Dec 18, 2025
28f9373
Close #208. Fix mock devices.
Xuwznln Dec 28, 2025
bc8c49d
test_transfer_liquid
qxw138 Dec 26, 2025
6ca5c72
Fix drag materials.
Xuwznln Jan 6, 2026
121c398
Update LICENSE
Xuwznln Jan 7, 2026
266366c
Bump version to 0.10.4
Xuwznln Jan 7, 2026
8066c20
Update README.md
Xuwznln Jan 7, 2026
0241568
Fix build on macos-intel
Xuwznln Jan 7, 2026
3f80349
Force update resource when adding new resource / transfer to another …
Xuwznln Jan 7, 2026
8580b84
Fix update with different spot and same parent
Xuwznln Jan 7, 2026
2a5ddd6
Upgrade to py 3.11.14; ROS2 Humble 0.7; unilabos 0.10.16
Xuwznln Jan 8, 2026
38c5c26
Fix Conda Build
dependabot[bot] Jan 27, 2026
176de52
v0.10.17
Xuwznln Jan 27, 2026
a277bd2
CI Check use production mode
Xuwznln Jan 27, 2026
3a2d9e9
transfer liquid handles
Xuwznln Jan 27, 2026
5179a7e
workflow upload & set liquid fix & add set liquid with plate
Xuwznln Jan 28, 2026
b551e69
no opcua installation on macos
Xuwznln Feb 2, 2026
06b6f0d
v0.10.18
Xuwznln Feb 28, 2026
a79c0a8
fix container volume
Xuwznln Mar 2, 2026
145fcaa
support container as example
Xuwznln Mar 3, 2026
c001f6a
v0.10.19
Xuwznln Mar 4, 2026
ccbf537
update workbench example
Xuwznln Mar 6, 2026
67a7417
v0.11.0
Xuwznln Apr 15, 2026
1ad4766
fix possible conversion error
Xuwznln Apr 21, 2026
71107e9
use gitee to install pylabrobot
Xuwznln Apr 22, 2026
f6b2bfa
upgrade to 0.11.1
Xuwznln Apr 22, 2026
f71ea2a
Support display_name & desc in new registry system
Xuwznln Apr 27, 2026
916a6df
env installation fix
Xuwznln Apr 27, 2026
bcb1790
v0.11.2
Xuwznln May 14, 2026
8210bad
0521: local deployment connectivity works, but the edge does not execute actions triggered from the frontend
lixinyu1011 May 21, 2026
42c7d19
Local deployment: fix unresponsive terminate-task and retry-task buttons (cause: blocking future call, thread wake-up)
lixinyu1011 May 21, 2026
8ba4138
fix macos x64 conda artifacts
Xuwznln May 23, 2026
35de4a5
sync recent dev changes to main
Xuwznln May 23, 2026
19ca6b5
v0.11.3
raoyi971102-gif May 23, 2026
6501468
fix windows conda run encoding in workflows
Xuwznln May 24, 2026
1b39662
Add skip option
lixinyu1011 May 27, 2026
2ffbf47
Change the handling logic for when the edge reports that the user's execution policy is skip
lixinyu1011 May 28, 2026
d3c4a1f
Merge branch 'main' into feat/lixinyu/dev
lixinyu1011 May 29, 2026
cdba8fd
0529 branch
lixinyu1011 May 29, 2026
17 changes: 17 additions & 0 deletions .claude/state/current.yaml
@@ -0,0 +1,17 @@
schema_version: '2.0'
session_id: s-1778653243-0ce4eb
role: standalone
status: active
started_at: '2026-05-13T06:20:43Z'
last_heartbeat: '2026-05-28T03:47:33Z'
branch: main
cwd: /Users/dp/software/GitHub/LeapLab/Uni-Lab-OS
current_task: ''
decisions: []
blockers: []
artifacts: []
metrics:
  edits: 18
  commits: 0
  errors_fixed: 0
  decisions_made: 0
Empty file added .claude/state/current.yaml.lock
Empty file.
1 change: 1 addition & 0 deletions .gitignore
@@ -250,5 +250,6 @@ local_test2.py
ros-humble-unilabos-msgs-0.9.13-h6403a04_5.tar.bz2
*.bz2
test_config.py
.history


301 changes: 301 additions & 0 deletions docs/reconnection_design.md
@@ -0,0 +1,301 @@
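# Reconnection Design Document

## Background

When the edge reconnects to the backend after a disconnect, the state changes that happened during the outage must be synchronized:
- On startup, the edge sends a `host_node_ready` message to the backend
- The backend must return the list of jobs it had dispatched to that edge before the disconnect
- The edge must check its actual current state and sync that state to the backend
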
## Message Flow

```
Edge                                 Backend
  |                                     |
  |---> host_node_ready --------------->|
  |     (device list, action list)      |
  |                                     |
  |<--- host_node_ready_response -------|
  |     (jobs dispatched before         |
  |      the disconnect)                |
  |                                     |
  |--- check local state ---|           |
  |                                     |
  |---> reconnection_sync ------------->|
  |     (actual status sync)            |
```

## Data Structures

### host_node_ready_response message format

```json
{
  "action": "host_node_ready_response",
  "data": {
    "pending_jobs": [
      {
        "job_id": "uuid",
        "task_id": "uuid",
        "device_id": "device_name",
        "action_name": "action_name",
        "status": "started|ready|queue",
        "timestamp": 1234567890.0
      }
    ]
  }
}
```

### reconnection_sync message format

```json
{
  "action": "reconnection_sync",
  "data": {
    "synced_jobs": [
      {
        "job_id": "uuid",
        "actual_status": "completed|failed|not_found",
        "result": {},
        "error": "error message if failed"
      }
    ],
    "active_jobs": [
      {
        "job_id": "uuid",
        "device_id": "device_name",
        "action_name": "action_name",
        "status": "started"
      }
    ]
  }
}
```
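
On the edge side, the two payloads above could be given static types so that malformed entries are caught early. The following is only a minimal sketch: the `PendingJob`, `SyncedJob`, and `ActiveJob` classes and the `build_reconnection_sync` helper are hypothetical names chosen for illustration and do not exist in the codebase.

```python
from typing import Any, Dict, List, Literal, TypedDict


class PendingJob(TypedDict):
    """One entry of `pending_jobs` in host_node_ready_response."""
    job_id: str
    task_id: str
    device_id: str
    action_name: str
    status: Literal["started", "ready", "queue"]
    timestamp: float


class SyncedJob(TypedDict, total=False):
    """One entry of `synced_jobs`; `result` and `error` are optional."""
    job_id: str
    actual_status: Literal["completed", "failed", "not_found"]
    result: Dict[str, Any]
    error: str


class ActiveJob(TypedDict):
    """One entry of `active_jobs` in reconnection_sync."""
    job_id: str
    device_id: str
    action_name: str
    status: str


def build_reconnection_sync(
    synced: List[SyncedJob], active: List[ActiveJob]
) -> Dict[str, Any]:
    """Assemble a reconnection_sync message in the format shown above."""
    return {
        "action": "reconnection_sync",
        "data": {"synced_jobs": synced, "active_jobs": active},
    }
```

`TypedDict` adds no runtime validation; it only helps a type checker flag missing or misspelled keys.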

## Implementation

### 1. Message handling (ws_client.py)

Add the following to the `MessageProcessor` class:

```python
from typing import Any, Dict  # `logger` is assumed to be defined in ws_client.py


async def _handle_host_ready_response(self, data: Dict[str, Any]):
    """
    Handle the response to host_node_ready.

    The backend returns the jobs it dispatched before the disconnect. We must:
    1. Check the actual status of these jobs on the edge
    2. Sync the actual status back to the backend
    """
    pending_jobs = data.get("pending_jobs", [])

    if not pending_jobs:
        logger.info("[Reconnection] No pending jobs from backend")
        return

    logger.info(f"[Reconnection] Received {len(pending_jobs)} pending jobs from backend")

    # Collect the actual status of every job
    synced_jobs = []
    active_jobs = []

    for job_info in pending_jobs:
        job_id = job_info.get("job_id")
        device_id = job_info.get("device_id")
        action_name = job_info.get("action_name")
        backend_status = job_info.get("status")  # status as last seen by the backend

        # Check the local status
        actual_status = await self._check_job_actual_status(job_id, device_id, action_name)

        if actual_status["status"] in ("completed", "failed"):
            # Job has finished: sync its result
            synced_jobs.append({
                "job_id": job_id,
                "actual_status": actual_status["status"],
                "result": actual_status.get("result", {}),
                "error": actual_status.get("error", "")
            })
        elif actual_status["status"] == "running":
            # Job is still running
            active_jobs.append({
                "job_id": job_id,
                "device_id": device_id,
                "action_name": action_name,
                "status": "started"
            })
        else:
            # Job does not exist; it may already have been cleaned up
            synced_jobs.append({
                "job_id": job_id,
                "actual_status": "not_found",
                "error": "Job not found in edge"
            })

    # Send the sync message
    self.send_message({
        "action": "reconnection_sync",
        "data": {
            "synced_jobs": synced_jobs,
            "active_jobs": active_jobs
        }
    })

    logger.info(f"[Reconnection] Synced {len(synced_jobs)} completed jobs, {len(active_jobs)} active jobs")
```

### 2. Status check (ws_client.py)

```python
from typing import Any, Dict  # JobStatus and HostNode are assumed to be imported in ws_client.py


async def _check_job_actual_status(
    self, job_id: str, device_id: str, action_name: str
) -> Dict[str, Any]:
    """
    Check the actual status of a job.

    Returns:
    - status: "completed" | "failed" | "running" | "not_found"
    - result: job result (if completed)
    - error: error message (if failed)
    """
    # 1. Check the job status in DeviceActionManager
    job_info = self.device_manager.get_job_info(job_id)
    if job_info:
        if job_info.status == JobStatus.STARTED:
            return {"status": "running"}
        elif job_info.status == JobStatus.QUEUE or job_info.status == JobStatus.READY:
            return {"status": "running"}  # queued jobs also count as running

    # 2. Check the ROS2 goal status in HostNode
    host_node = HostNode.get_instance(0)
    if host_node:
        goal_status = host_node.get_goal_status(job_id)
        if goal_status:
            if goal_status["is_active"]:
                return {"status": "running"}
            elif goal_status["is_succeeded"]:
                return {
                    "status": "completed",
                    "result": goal_status.get("result", {})
                }
            elif goal_status["is_failed"]:
                return {
                    "status": "failed",
                    "error": goal_status.get("error", "Unknown error")
                }

    # 3. Check the locally stored job result (if persisted)
    from unilabos.app.web.controller import get_job_result
    stored_result = get_job_result(job_id)
    if stored_result:
        return {
            "status": stored_result["status"],
            "result": stored_result.get("result", {}),
            "error": stored_result.get("error", "")
        }

    # 4. Job does not exist
    return {"status": "not_found"}
```

### 3. HostNode extension (host_node.py)

Add the following method to the `HostNode` class:

```python
from typing import Any, Dict, Optional

from action_msgs.msg import GoalStatus


def get_goal_status(self, job_id: str) -> Optional[Dict[str, Any]]:
    """
    Get the ROS2 goal status for the given job_id.

    Returns:
    - is_active: whether the goal is currently executing
    - is_succeeded: whether the goal completed successfully
    - is_failed: whether the goal failed
    - result: result data (if completed)
    - error: error message (if failed)
    """
    with self._goal_handles_lock:
        if job_id in self._goal_handles:
            goal_handle = self._goal_handles[job_id]
            status = goal_handle.status

            return {
                "is_active": status in [GoalStatus.STATUS_EXECUTING, GoalStatus.STATUS_ACCEPTED],
                "is_succeeded": status == GoalStatus.STATUS_SUCCEEDED,
                "is_failed": status in [GoalStatus.STATUS_ABORTED, GoalStatus.STATUS_CANCELED],
                "result": self._goal_results.get(job_id, {}),
                "error": self._goal_errors.get(job_id, "")
            }

    return None
```

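## Status Handling Logic

### Scenario 1: Job completed but the backend never received the result

- The edge detects that the job has completed
- It sends the full result to the backend via `reconnection_sync`
- The backend updates the job status

### Scenario 2: Job still executing

- The edge detects that the job is still running
- It tells the backend via `reconnection_sync` that the job status is `started`
- The backend keeps the job status and waits for subsequent `job_status` updates

### Scenario 3: Job does not exist

- The edge cannot find the job locally
- Possible causes:
  - The job never reached the edge
  - The job completed and was cleaned up
  - The edge restarted and lost its in-memory state
- It tells the backend `not_found` via `reconnection_sync`
- The backend decides whether to re-dispatch the job or mark it as failed

### Scenario 4: Job is queued

- The edge detects that the job is waiting in the queue
- It tells the backend via `reconnection_sync` that the job status is `queue`
- The backend keeps the job status and waits for execution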
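
Note that `_handle_host_ready_response` in section 1 reports every non-finished job with `"status": "started"`, because `_check_job_actual_status` collapses queued and ready jobs into `"running"`. If Scenario 4 is to report `queue` as described, the local status has to be carried through to the `active_jobs` entry. One possible mapping is sketched below; the `JobStatus` enum here is a stand-in whose member values are assumptions, and `reported_status` is a hypothetical helper, not existing code.

```python
from enum import Enum


class JobStatus(Enum):
    """Stand-in for the project's JobStatus; the string values are assumed."""
    QUEUE = "queue"
    READY = "ready"
    STARTED = "started"


# Local status -> value of the `status` field in an active_jobs entry.
_REPORTED = {
    JobStatus.STARTED: "started",
    JobStatus.READY: "ready",
    JobStatus.QUEUE: "queue",
}


def reported_status(status: JobStatus) -> str:
    """Return the status string to send to the backend for an active job."""
    return _REPORTED[status]
```

With such a helper, the status check could return the local status alongside `"running"`, and the handler could fill `active_jobs[i]["status"]` from it instead of the constant `"started"`.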

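## Caveats

1. **Timing**: `host_node_ready_response` may arrive immediately after the edge starts, when some components may not be fully initialized yet
   - Mitigation: check that HostNode is ready before handling the message

2. **Concurrency**: a job's status may be changing while it is being checked
   - Mitigation: protect access to the key data structures with locks

3. **Result persistence**: if the edge restarts, the job state held in memory is lost
   - Current approach: rely on `get_job_result` to read from local storage
   - Future improvement: consider persisting key job state to a file

4. **Large job counts**: after a long disconnect there may be many jobs to sync
   - Mitigation: process them in batches so that no single message becomes too large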
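
The batching mentioned in item 4 could look roughly like the sketch below. It assumes the backend accepts several `reconnection_sync` messages for one reconnect, which this document does not specify; `iter_sync_batches` and the default batch size of 50 are hypothetical.

```python
from typing import Any, Dict, Iterator, List


def iter_sync_batches(
    synced_jobs: List[Dict[str, Any]],
    active_jobs: List[Dict[str, Any]],
    batch_size: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield reconnection_sync messages with at most `batch_size` job entries each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    # Tag each entry with the list it belongs to, then slice the combined sequence.
    entries = [("synced_jobs", job) for job in synced_jobs]
    entries += [("active_jobs", job) for job in active_jobs]

    for start in range(0, len(entries), batch_size):
        data: Dict[str, List[Dict[str, Any]]] = {"synced_jobs": [], "active_jobs": []}
        for key, job in entries[start:start + batch_size]:
            data[key].append(job)
        yield {"action": "reconnection_sync", "data": data}
```

Each yielded message keeps the same shape as a single `reconnection_sync`, so the sender can simply loop over the generator and call `send_message` once per batch.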

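## Test Scenarios

1. **Normal reconnect**: the edge reconnects right after a disconnect; jobs are still executing
2. **Delayed reconnect**: the edge reconnects after a long disconnect; some jobs have completed
3. **Restart reconnect**: the edge process restarts and then reconnects; in-memory state is lost
4. **Multi-job reconnect**: multiple jobs on multiple devices were in flight during the disconnect
5. **Concurrent reconnect**: new jobs are dispatched while the reconnect is in progress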

## Implementation Steps

1. ✅ Add a `host_node_ready_response` branch to `MessageProcessor._process_message`
2. ⬜ Implement the `_handle_host_ready_response` method
3. ⬜ Implement the `_check_job_actual_status` method
4. ⬜ Add the `get_goal_status` method to `HostNode`
5. ⬜ Add the required data structures (e.g. `_goal_results`, `_goal_errors`)
6. ⬜ Write unit tests
7. ⬜ Run integration tests

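## Future Improvements

1. Add job state persistence so that state can be recovered after a process restart
2. Improve sync performance for large numbers of jobs
3. Add a retry mechanism for state sync
4. Support incremental state sync (sync only what has changed)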
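
As an illustration of item 3, a retry wrapper with exponential backoff might look like the sketch below. It assumes a `send` coroutine that returns `True` once the sync was delivered; the protocol described in this document defines no such acknowledgement, so both that contract and the `send_with_retry` name are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


async def send_with_retry(
    send: Callable[[], Awaitable[bool]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> bool:
    """Call `send` up to `attempts` times, doubling the wait between tries."""
    for attempt in range(attempts):
        if await send():
            return True
        if attempt < attempts - 1:
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... before retrying.
            await asyncio.sleep(base_delay * (2 ** attempt))
    return False
```

Returning `False` instead of raising leaves the decision to the caller, for example waiting for the next reconnect before trying again.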
4 changes: 4 additions & 0 deletions stat.md
@@ -0,0 +1,4 @@
cd /Users/dp/software/GitHub/LeapLab &&
source /Users/dp/miniforge3/etc/profile.d/conda.sh &&
conda activate unilab &&
unilab --graph unilabos/test/experiments/fault_injection.json --config unilabos/test/experiments/fault_injection_config.py --ak a3d111bb-571a-4548-aa5d-c58ccca64466 --sk c2450c73-e84c-4319-b25f-b5cc4d575e7e --upload_registry --addr http://127.0.0.1:48197/api/v1