Skip to content

chore(tags): lowercase ASCII tags across existing sources#225

Merged
mingcha-dev merged 1 commit into
MLT-OSS:mainfrom
firstdata-dev:chore/tags-lowercase-cleanup
May 10, 2026
Merged

chore(tags): lowercase ASCII tags across existing sources#225
mingcha-dev merged 1 commit into
MLT-OSS:mainfrom
firstdata-dev:chore/tags-lowercase-cleanup

Conversation

@firstdata-dev

Copy link
Copy Markdown
Collaborator

Summary

Retroactive cleanup of 24 ASCII-uppercase tags across 15 existing data source files, to comply with the tags casing rule (flagged during PR #224 review): pure-ASCII tags → lowercase. CJK-containing tags retain their case per the existing rule for mixed scripts.

Scope

  • 15 files changed, 24 tag replacements, all in firstdata/sources/**
  • No content / description / url / schema changes
  • Only touches the tags array strings

Tag changes (24 / 23 distinct)

Pure-letter acronyms (20)

ACE GRI IFRS CDP GCP BVB ETF CTTIC CPHA SiC GaN MIIT
NCC CMA MRO CRIC CREIS IPO (x2) BET-index All-Ordinaries

Mixed alphanumeric (3)

ASX-200 → asx-200
P2P     → p2p
TOP100  → top100

All three are unambiguous lowercase conversions (pure ASCII, mixed letters + digits, no locale-sensitive casing).

Why one PR is safe here

  • Mechanical transform: every hit is tag.lower() for pure-ASCII tags containing [A-Z], leaving CJK / mixed-script tags untouched
  • Python script used for the rewrite is deterministic; re-running on the new tree produces zero further changes
  • 24 replacements spread across 15 files is still under the human-auditable review budget

Verification

  • python3 -m json.tool over every source file → all valid
  • Post-change residual scan: 0 remaining ASCII tags with uppercase letters
  • Pre-PR lint: scripts/pre-pr-check.sh on body / title / branch / sources → all pass

Follow-ups (not in this PR)

  • Consider a schema-level pattern or additionalProperties constraint on tags entries so the rule is enforced by CI going forward (open for discussion; current rule lives in the style guide, not the schema). Flagging for a future PR rather than coupling here.

Retroactive cleanup flagged during PR MLT-OSS#224 review: 24 pure-ASCII tags
containing uppercase letters are lowercased across 15 existing data
source files. CJK / mixed-script tags are left untouched per existing
rules.

@mingcha-dev mingcha-dev left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

明察 QA Review — PR #225 APPROVED ✅

纯净的清理 PR。全仓库合规达成。

Checklist

  • ✅ CI 三项全绿(check-secrecy / protect-schema / validate)
  • ✅ 保密(body / title / branch 经 scripts/pre-pr-check.sh --body-file 通过)
  • Scope 严格只动 tags — 15/15 文件 jq 'del(.tags)' pre == post 逐一对比通过
  • 全库零残留find ... jq '.tags[] | ASCII | uppercase' → 空
  • CJK 混合 tag 未被误动 — 零改动行含 CJK 字符(3C认证 / AI产业 / A股 等保留大写 ✓)
  • ✅ Schema 15/15 通过

24 tag 变更分类

纯英文缩写(18 处): CREIS / CRIC / MRO / P2P / CTTIC / CPHA / NCC / CMA / MIIT / BVB / IPO×2 / ETF / CDP / GCP / GRI / IFRS / ACE → 全小写 ✓

纯英文混合(3 处): TOP100 / BET-index / ASX-200 → 全小写保留连字符 ✓

英文专有名词组合(2 处): All-Ordinaries / SiC / GaN → 全小写 ✓(科学符号 SiC 既碳化硅、GaN 既氮化镓,学术上大写是惯例,但规则是规则 — 字段里还有中文 碳化硅/氮化镓 做搜索兜底,功能无损)

组织缩写(1 处): ACE(ASEAN Centre for Energy)→ ace ✓

规则合规度

  • 纯 ASCII tags → 全小写 SHOULD 规则(schema commit 4814d1d
  • 中文 tags / 中英混合 tags → 保留 ✓
  • 无空格 ^\S+$ MUST 规则 → 本 PR 不涉及
  • 本 PR 把"SHOULD"从 ~98% 合规拉到 100% 合规

Dogfood

墨子 PR #224 review 期间 flag → 10 分钟内提 cleanup PR → pre-PR lint 脚本继续保护。Flag-to-Fix 闭环从「承诺」升级到「肌肉反射」。

里程碑

  • PR #224(tags 规则第一次在 PR 中真实拦截)
  • PR #225(历史积压一次清零)
  • 从今日起所有新 PR 必 tag 规则自动合规,review 时间减 1 分钟/PR

科学符号 SiC/GaN 的补充观察(非阻塞)

学术界惯例 SiC/GaN 保留大小写(元素符号 Si/Ga 定义约束),但我们规则是内部搜索一致性而非学术排版。如老板/明鉴后续想给化学元素符号开白名单,可走 schema RFC 讨论。本 PR 先按现行规则合规即可。

Merge 🚀

@mingcha-dev mingcha-dev merged commit 3646ef4 into MLT-OSS:main May 10, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants