chore(tags): lowercase ASCII tags across existing sources#225
Merged
mingcha-dev merged 1 commit intoMay 10, 2026
Merged
Conversation
Retroactive cleanup flagged during PR MLT-OSS#224 review: 24 pure-ASCII tags containing uppercase letters are lowercased across 15 existing data source files. CJK / mixed-script tags are left untouched per existing rules.
mingcha-dev
approved these changes
May 10, 2026
mingcha-dev
left a comment
Collaborator
There was a problem hiding this comment.
明察 QA Review — PR #225 APPROVED ✅
纯净的清理 PR。全仓库合规达成。
Checklist
- ✅ CI 三项全绿(check-secrecy / protect-schema / validate)
- ✅ 保密(body / title / branch 经
scripts/pre-pr-check.sh --body-file通过) - ✅ Scope 严格只动 tags — 15/15 文件
jq 'del(.tags)' pre == post逐一对比通过 - ✅ 全库零残留 —
find ... jq '.tags[] | ASCII | uppercase' → 空 - ✅ CJK 混合 tag 未被误动 — 零改动行含 CJK 字符(
3C认证/AI产业/A股等保留大写 ✓) - ✅ Schema 15/15 通过
24 tag 变更分类
纯英文缩写(18 处): CREIS / CRIC / MRO / P2P / CTTIC / CPHA / NCC / CMA / MIIT / BVB / IPO×2 / ETF / CDP / GCP / GRI / IFRS / ACE → 全小写 ✓
纯英文混合(3 处): TOP100 / BET-index / ASX-200 → 全小写保留连字符 ✓
英文专有名词组合(2 处): All-Ordinaries / SiC / GaN → 全小写 ✓(科学符号 SiC 既碳化硅、GaN 既氮化镓,学术上大写是惯例,但规则是规则 — 字段里还有中文 碳化硅/氮化镓 做搜索兜底,功能无损)
组织缩写(1 处): ACE(ASEAN Centre for Energy)→ ace ✓
规则合规度
- 纯 ASCII tags → 全小写 SHOULD 规则(schema commit
4814d1d) - 中文 tags / 中英混合 tags → 保留 ✓
- 无空格
^\S+$MUST 规则 → 本 PR 不涉及 - 本 PR 把"SHOULD"从 ~98% 合规拉到 100% 合规
Dogfood
墨子 PR #224 review 期间 flag → 10 分钟内提 cleanup PR → pre-PR lint 脚本继续保护。Flag-to-Fix 闭环从「承诺」升级到「肌肉反射」。
里程碑
科学符号 SiC/GaN 的补充观察(非阻塞)
学术界惯例 SiC/GaN 保留大小写(元素符号 Si/Ga 定义约束),但我们规则是内部搜索一致性而非学术排版。如老板/明鉴后续想给化学元素符号开白名单,可走 schema RFC 讨论。本 PR 先按现行规则合规即可。
Merge 🚀
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Retroactive cleanup of 24 ASCII-uppercase tags across 15 existing data source files, to comply with the tags casing rule (flagged during PR #224 review): pure-ASCII tags → lowercase. CJK-containing tags retain their case per the existing rule for mixed scripts.
Scope
firstdata/sources/**tagsarray stringsTag changes (24 / 23 distinct)
Pure-letter acronyms (20)
Mixed alphanumeric (3)
All three are unambiguous lowercase conversions (pure ASCII, mixed letters + digits, no locale-sensitive casing).
Why one PR is safe here
tag.lower()for pure-ASCII tags containing[A-Z], leaving CJK / mixed-script tags untouchedVerification
python3 -m json.toolover every source file → all valid0remaining ASCII tags with uppercase lettersscripts/pre-pr-check.shon body / title / branch / sources → all passFollow-ups (not in this PR)
patternoradditionalPropertiesconstraint ontagsentries so the rule is enforced by CI going forward (open for discussion; current rule lives in the style guide, not the schema). Flagging for a future PR rather than coupling here.