fix(build): rebuild vendor checksums before the judge machine's offline build #260
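When the judge machine clones the repository, it recursively filters out hidden files, which deletes os/vendor/*/.cargo-checksum.json. Cargo's vendored directory source requires this file, so an offline make all fails as soon as it loads dependencies (No such file or directory).

This PR adds scripts/restore_vendor_checksums.py, which rebuilds each crate's .cargo-checksum.json offline from Cargo.lock (a non-hidden file, so it survives the filtering). The os-cargo-config rule in the Makefile calls the script after appending the vendored source. Verified inside the judging Docker image with --network none: after recursively deleting every hidden file, both architectures successfully produce kernel-rv / kernel-la.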
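The rebuild step described above can be sketched as follows. This is a minimal illustration, not the PR's script: read_name_version, restore_checksums and demo are hypothetical names, and the sketch assumes the rebuilt file only needs an empty "files" map plus the "package" checksum taken from Cargo.lock. That should be enough for cargo's directory source, which checks the per-file hashes listed in "files" and compares "package" with Cargo.lock. Crates without a Cargo.lock checksum (path or Git dependencies) are simply skipped here; how the PR handles them is not shown.

```python
import json
import os
import re
import tempfile

# Matches `name = "..."` / `version = "..."` inside [package]; double quotes only,
# like the original pkg_name_ver.
KEY_RE = re.compile(r'(name|version)\s*=\s*"([^"]+)"')


def read_name_version(cargo_toml):
    """Return (name, version) from the [package] table of a Cargo.toml."""
    found = {}
    in_pkg = False
    with open(cargo_toml, encoding="utf-8") as f:
        for line in f:
            s = line.strip()
            if s.startswith("["):
                in_pkg = s == "[package]"
                continue
            m = KEY_RE.match(s) if in_pkg else None
            if m:
                found.setdefault(m.group(1), m.group(2))  # keep the first value seen
    return found.get("name"), found.get("version")


def restore_checksums(vendor_dir, checksums):
    """Write a minimal .cargo-checksum.json for every vendored crate that lacks one.

    `checksums` maps (name, version) -> sha256 as recorded in Cargo.lock,
    i.e. the shape parse_lock returns. Returns the crate directories written.
    """
    written = []
    for entry in sorted(os.listdir(vendor_dir)):
        crate_dir = os.path.join(vendor_dir, entry)
        cargo_toml = os.path.join(crate_dir, "Cargo.toml")
        ckfile = os.path.join(crate_dir, ".cargo-checksum.json")
        if not os.path.isfile(cargo_toml) or os.path.exists(ckfile):
            continue  # not a crate directory, or the checksum file survived
        ck = checksums.get(read_name_version(cargo_toml))
        if ck is None:
            continue  # no Cargo.lock checksum (e.g. path/git crate): skipped in this sketch
        with open(ckfile, "w", encoding="utf-8") as f:
            # Empty "files": no per-file hashes to verify (assumption, see lead-in);
            # "package" must equal the checksum Cargo.lock records for this crate.
            json.dump({"files": {}, "package": ck}, f)
        written.append(entry)
    return written


def demo():
    """Restore the checksum file of a throwaway one-crate vendor tree."""
    lock = {("foo", "1.2.3"): "abc123"}  # made-up checksum
    with tempfile.TemporaryDirectory() as vendor:
        os.makedirs(os.path.join(vendor, "foo"))
        with open(os.path.join(vendor, "foo", "Cargo.toml"), "w", encoding="utf-8") as f:
            f.write('[package]\nname = "foo"\nversion = "1.2.3"\n')
        first = restore_checksums(vendor, lock)
        with open(os.path.join(vendor, "foo", ".cargo-checksum.json"), encoding="utf-8") as f:
            data = json.load(f)
        second = restore_checksums(vendor, lock)  # file now exists, nothing to do
    return first, data, second


if __name__ == "__main__":
    print(demo())
```

Skipping crates whose checksum file already exists keeps the step idempotent, so calling it on every build from the Makefile should be harmless.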
Code Review
This pull request introduces a Python script scripts/restore_vendor_checksums.py and integrates it into the Makefile to reconstruct .cargo-checksum.json files from Cargo.lock for vendored dependencies, resolving build failures caused by the evaluation environment deleting hidden files. The code review feedback suggests several improvements to make the script more robust: restricting Cargo.lock parsing to [[package]] blocks to prevent key-value pollution, supporting single quotes when parsing package metadata from Cargo.toml, and explicitly specifying encoding="utf-8" when reading .cargo-checksum.json to prevent potential encoding issues on Windows.
def parse_lock(path):
    """Cargo.lock -> {(name, version): checksum} (registry crates only)."""
    cks, name, ver, ck = {}, None, None, None
    with open(path, encoding="utf-8") as f:
        for line in f:
            s = line.strip()
            if s == "[[package]]":
                if name and ver and ck:
                    cks[(name, ver)] = ck
                name = ver = ck = None
            elif s.startswith("name ="):
                name = s.split("=", 1)[1].strip().strip('"')
            elif s.startswith("version ="):
                ver = s.split("=", 1)[1].strip().strip('"')
            elif s.startswith("checksum ="):
                ck = s.split("=", 1)[1].strip().strip('"')
    if name and ver and ck:
        cks[(name, ver)] = ck
    return cks
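To make the pollution risk concrete: Cargo can record unused [patch] entries in Cargo.lock as [[patch.unused]] tables, which carry name and version keys but are not packages. The sketch below repeats the line-based logic of the parse_lock above and feeds it an invented lock file that ends with such a table; the sample contents and the demo helper are made up for illustration.

```python
import os
import tempfile


def parse_lock(path):
    """Same line-based logic as the parse_lock above (no [[package]] scoping)."""
    cks, name, ver, ck = {}, None, None, None
    with open(path, encoding="utf-8") as f:
        for line in f:
            s = line.strip()
            if s == "[[package]]":
                if name and ver and ck:
                    cks[(name, ver)] = ck
                name = ver = ck = None
            elif s.startswith("name ="):
                name = s.split("=", 1)[1].strip().strip('"')
            elif s.startswith("version ="):  # also hits the top-level `version = 3`
                ver = s.split("=", 1)[1].strip().strip('"')
            elif s.startswith("checksum ="):
                ck = s.split("=", 1)[1].strip().strip('"')
    if name and ver and ck:
        cks[(name, ver)] = ck
    return cks


# Invented Cargo.lock: one registry package followed by a [[patch.unused]] table.
SAMPLE_LOCK = (
    "version = 3\n"
    "\n"
    "[[package]]\n"
    'name = "log"\n'
    'version = "0.4.20"\n'
    'checksum = "aaaa"\n'
    "\n"
    "[[patch.unused]]\n"
    'name = "spin"\n'
    'version = "0.9.8"\n'
)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "Cargo.lock")
        with open(path, "w", encoding="utf-8") as f:
            f.write(SAMPLE_LOCK)
        return parse_lock(path)


if __name__ == "__main__":
    print(demo())  # {('spin', '0.9.8'): 'aaaa'}: log is lost, spin gets its checksum
```

The in_package variant the review proposes commits log when it reaches the [[patch.unused]] header and ignores the keys that follow, so the same input yields {('log', '0.4.20'): 'aaaa'}.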
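Analysis:
The current parse_lock implementation does not restrict parsing to [[package]] blocks. If Cargo.lock contains sections other than packages (for example [metadata] or [workspace]), or if a package is a local or Git dependency without a checksum, the name and version it parses can be overwritten by key-value pairs from later sections. The parser may then associate a checksum with the wrong crate at the end of the file or at the next package.
Suggestion:
Introduce an in_package state flag, parse name, version and checksum only inside [[package]] blocks, and commit the current package safely when another [...] section or the end of the file is reached.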
def parse_lock(path):
    """Cargo.lock -> {(name, version): checksum} (registry crates only)."""
    cks = {}
    name = ver = ck = None
    in_package = False
    with open(path, encoding="utf-8") as f:
        for line in f:
            s = line.strip()
            if s == "[[package]]":
                if in_package and name and ver and ck:
                    cks[(name, ver)] = ck
                name = ver = ck = None
                in_package = True
            elif s.startswith("["):
                if in_package and name and ver and ck:
                    cks[(name, ver)] = ck
                name = ver = ck = None
                in_package = False
            elif in_package:
                if s.startswith("name ="):
                    name = s.split("=", 1)[1].strip().strip('"')
                elif s.startswith("version ="):
                    ver = s.split("=", 1)[1].strip().strip('"')
                elif s.startswith("checksum ="):
                    ck = s.split("=", 1)[1].strip().strip('"')
    if in_package and name and ver and ck:
        cks[(name, ver)] = ck
    return cks

def pkg_name_ver(cargo_toml):
    """Read name/version from the [package] table of a vendored Cargo.toml (already normalized to concrete values)."""
    name = ver = None
    in_pkg = False
    with open(cargo_toml, encoding="utf-8") as f:
        for line in f:
            s = line.strip()
            if s.startswith("["):
                in_pkg = s == "[package]"
                continue
            if in_pkg:
                if name is None:
                    m = re.match(r'name\s*=\s*"([^"]+)"', s)
                    if m:
                        name = m.group(1)
                if ver is None:
                    m = re.match(r'version\s*=\s*"([^"]+)"', s)
                    if m:
                        ver = m.group(1)
                if name and ver:
                    break
    return name, ver
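Analysis:
The current regular expressions only accept double-quoted strings (such as name = "foo"). Although the Cargo.toml files generated by cargo vendor normally use double quotes, the TOML specification also allows single quotes (such as name = 'foo'). To make the script more robust, both quote styles should be accepted.
Suggestion:
Change the regular expressions to match either quote character with ["\'].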
def pkg_name_ver(cargo_toml):
    """Read name/version from the [package] table of a vendored Cargo.toml (already normalized to concrete values)."""
    name = ver = None
    in_pkg = False
    with open(cargo_toml, encoding="utf-8") as f:
        for line in f:
            s = line.strip()
            if s.startswith("["):
                in_pkg = s == "[package]"
                continue
            if in_pkg:
                if name is None:
                    m = re.match(r'name\s*=\s*["\']([^"\']+)["\']', s)
                    if m:
                        name = m.group(1)
                if ver is None:
                    m = re.match(r'version\s*=\s*["\']([^"\']+)["\']', s)
                    if m:
                        ver = m.group(1)
                if name and ver:
                    break
    return name, ver

with open(ckfile) as f:
    cur = json.load(f).get("package")
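Analysis:
When reading .cargo-checksum.json, open(ckfile) does not specify encoding="utf-8". The file normally contains only ASCII characters, but on systems whose default encoding is not UTF-8 (for example some Windows setups), specifying encoding="utf-8" explicitly avoids a potential UnicodeDecodeError and keeps the call consistent with the other open calls in the script.
Suggestion:
Add encoding="utf-8" explicitly to the open call.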
-with open(ckfile) as f:
-    cur = json.load(f).get("package")
+with open(ckfile, encoding="utf-8") as f:
+    cur = json.load(f).get("package")
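Judging from the snippet, the script reads the existing "package" value so it can compare it with the Cargo.lock checksum. The surrounding logic is not shown, so the sketch below only assumes the file is rewritten whenever that value is missing or differs; current_package_checksum, needs_rewrite and demo are hypothetical names. Catching ValueError covers both json.JSONDecodeError and UnicodeDecodeError, so a garbled file is treated like a missing one.

```python
import json
import os
import tempfile


def current_package_checksum(ckfile):
    """Return the "package" field of an existing .cargo-checksum.json, or None."""
    try:
        # Explicit UTF-8 so the result does not depend on the locale's default encoding.
        with open(ckfile, encoding="utf-8") as f:
            return json.load(f).get("package")
    except (OSError, ValueError):
        # Missing or unreadable file, invalid JSON, or bytes that are not UTF-8.
        return None


def needs_rewrite(ckfile, expected):
    """True when the checksum file is missing, unreadable, or records another checksum."""
    return current_package_checksum(ckfile) != expected


def demo():
    with tempfile.TemporaryDirectory() as d:
        ckfile = os.path.join(d, ".cargo-checksum.json")
        missing = needs_rewrite(ckfile, "abc")
        with open(ckfile, "w", encoding="utf-8") as f:
            json.dump({"files": {}, "package": "abc"}, f)
        return missing, needs_rewrite(ckfile, "abc"), needs_rewrite(ckfile, "def")


if __name__ == "__main__":
    print(demo())
```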