Commit e5cc5b1

shijiashuai committed
chore: replace legacy docs workflow with unified GitHub Pages deployment, update docs
1 parent eab7b50 commit e5cc5b1

6 files changed

Lines changed: 113 additions & 24 deletions

.github/workflows/ci.yml

Lines changed: 7 additions & 0 deletions
```diff
@@ -6,6 +6,13 @@ on:
   pull_request:
     branches: [ master, main, develop ]
 
+permissions:
+  contents: read
+
+concurrency:
+  group: ci-${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
 jobs:
   lint:
     runs-on: ubuntu-latest
```
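GitHub documents two rules for a concurrency group: a newly queued run cancels any run already *pending* in the same group, and `cancel-in-progress: true` additionally cancels the run that is *in progress*. The toy model below is a simplification rather than GitHub's scheduler, and `group_key` and `ConcurrencyGroup` are hypothetical names. It sketches what the new `ci-${{ github.workflow }}-${{ github.ref }}` group does for rapid pushes to one branch. The same rules apply to the `pages` group, whose `cancel-in-progress` flag this commit also switches to `true`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def group_key(workflow: str, ref: str) -> str:
    """Mirror `ci-${{ github.workflow }}-${{ github.ref }}` from ci.yml."""
    return f"ci-{workflow}-{ref}"


@dataclass
class ConcurrencyGroup:
    """Toy model of one concurrency group (a simplification of GitHub's behavior)."""

    cancel_in_progress: bool
    running: Optional[str] = None
    pending: Optional[str] = None
    canceled: List[str] = field(default_factory=list)

    def submit(self, run_id: str) -> None:
        if self.running is None:
            # Nothing in progress: the new run starts immediately.
            self.running = run_id
            return
        # A newly queued run always replaces an earlier pending run.
        if self.pending is not None:
            self.canceled.append(self.pending)
            self.pending = None
        if self.cancel_in_progress:
            # `cancel-in-progress: true` also cancels the run that is executing.
            self.canceled.append(self.running)
            self.running = run_id
        else:
            # Otherwise the new run waits for the current one to finish.
            self.pending = run_id

    def finish(self) -> None:
        """The running run completes; the pending run (if any) starts."""
        self.running, self.pending = self.pending, None


if __name__ == "__main__":
    key = group_key("CI", "refs/heads/main")
    for mode in (False, True):
        group = ConcurrencyGroup(cancel_in_progress=mode)
        for run in ("run-1", "run-2", "run-3"):
            group.submit(run)
        print(key, mode, group.running, group.pending, group.canceled)
```

Suppose three quick pushes land on the same branch. With `cancel-in-progress: false`, run-1 keeps running, run-3 waits, and run-2 is dropped. With `true`, only run-3 survives. Pushes to different refs produce different keys, so they never cancel each other.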
.github/workflows/pages.yml

Lines changed: 10 additions & 11 deletions

```diff
@@ -1,23 +1,23 @@
-name: Deploy Docs to GitHub Pages
+name: Deploy Docs
 
 on:
   push:
     branches: [master, main]
     paths:
       - "docs/**"
-      - ".github/workflows/deploy-docs.yml"
+      - "README.md"
+      - "README.zh-CN.md"
+      - ".github/workflows/pages.yml"
   workflow_dispatch:
 
-# Set GITHUB_TOKEN permissions to allow deployment to GitHub Pages
 permissions:
   contents: read
   pages: write
   id-token: write
 
-# Allow only one concurrent deployment; skip runs that are queued
 concurrency:
   group: pages
-  cancel-in-progress: false
+  cancel-in-progress: true
 
 jobs:
   build:
@@ -26,7 +26,9 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4
         with:
-          fetch-depth: 0
+          sparse-checkout: |
+            docs
+          sparse-checkout-cone-mode: false
 
       - name: Setup Node.js
         uses: actions/setup-node@v4
@@ -43,20 +45,17 @@ jobs:
         run: npm run docs:build
         working-directory: docs
 
-      - name: Setup Pages
-        uses: actions/configure-pages@v5
-
       - name: Upload artifact
         uses: actions/upload-pages-artifact@v3
         with:
           path: docs/.vitepress/dist
 
   deploy:
+    needs: build
+    runs-on: ubuntu-latest
     environment:
       name: github-pages
       url: ${{ steps.deployment.outputs.page_url }}
-    needs: build
-    runs-on: ubuntu-latest
     steps:
       - name: Deploy to GitHub Pages
         id: deployment
```
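With `sparse-checkout-cone-mode: false`, the `docs` line is read as a gitignore-style pattern. Cone mode always includes root-level files; non-cone mode does not. The rough sketch below approximates which of the paths named in this commit end up in the build's working tree. `in_noncone_sparse_checkout` is a hypothetical helper that handles only a literal, slash-free pattern, not full gitignore syntax.

```python
from pathlib import PurePosixPath


def in_noncone_sparse_checkout(path: str, pattern: str = "docs") -> bool:
    """Approximate whether `path` is checked out by one non-cone sparse pattern.

    Simplification: only a literal pattern with no slash or wildcard is
    supported. Per gitignore rules, such a pattern matches a file or
    directory of that name at any depth, and a matched directory brings in
    everything beneath it. Non-cone mode adds no root-level files implicitly.
    """
    if "/" in pattern or any(ch in pattern for ch in "*?["):
        raise ValueError("sketch handles only literal, slash-free patterns")
    return pattern in PurePosixPath(path).parts


if __name__ == "__main__":
    for p in ["docs/index.md", "README.md", "README.zh-CN.md",
              ".github/workflows/pages.yml"]:
        print(f"{p}: {in_noncone_sparse_checkout(p)}")
```

Under this approximation, only `docs/index.md` is selected. The two READMEs that now trigger the workflow sit at the repository root, so they fall outside the sparse working tree.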

README.md

Lines changed: 4 additions & 1 deletion
```diff
@@ -1,6 +1,9 @@
 # CleanBook — Smart Bookmark Cleaning & Classification
 
-[![Docs](https://img.shields.io/badge/docs-GitHub%20Pages-blue)](https://lessup.github.io/bookmarks-cleaner/)
+[![CI](https://github.com/LessUp/bookmarks-cleaner/actions/workflows/ci.yml/badge.svg)](https://github.com/LessUp/bookmarks-cleaner/actions/workflows/ci.yml)
+[![Docs](https://github.com/LessUp/bookmarks-cleaner/actions/workflows/pages.yml/badge.svg)](https://lessup.github.io/bookmarks-cleaner/)
+[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+[![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/)
 
 English | [简体中文](README.zh-CN.md)
 
```

README.zh-CN.md

Lines changed: 4 additions & 1 deletion
```diff
@@ -1,6 +1,9 @@
 # CleanBook —— Smart Bookmark Cleaning & Classification
 
-[![Docs](https://img.shields.io/badge/docs-GitHub%20Pages-blue)](https://lessup.github.io/bookmarks-cleaner/)
+[![CI](https://github.com/LessUp/bookmarks-cleaner/actions/workflows/ci.yml/badge.svg)](https://github.com/LessUp/bookmarks-cleaner/actions/workflows/ci.yml)
+[![Docs](https://github.com/LessUp/bookmarks-cleaner/actions/workflows/pages.yml/badge.svg)](https://lessup.github.io/bookmarks-cleaner/)
+[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+[![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/)
 
 [English](README.md) | 简体中文
 
```
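Both READMEs replace the static shields.io Docs badge with GitHub's workflow status badge. That badge's image URL follows the pattern `https://github.com/<owner>/<repo>/actions/workflows/<file>/badge.svg`, with an optional `?branch=` query. The hypothetical helper `workflow_badge` below reproduces the two workflow badges added in this commit, which makes that pattern explicit. It is a sketch, not part of the repository.

```python
from typing import Optional
from urllib.parse import urlencode


def workflow_badge(label: str, owner: str, repo: str, workflow_file: str,
                   link: Optional[str] = None, branch: Optional[str] = None) -> str:
    """Build a Markdown status badge for a GitHub Actions workflow file."""
    workflow_url = f"https://github.com/{owner}/{repo}/actions/workflows/{workflow_file}"
    image = f"{workflow_url}/badge.svg"
    if branch:
        # Optional: report the status of a specific branch.
        image += "?" + urlencode({"branch": branch})
    # By default the badge links to the workflow's run history.
    return f"[![{label}]({image})]({link or workflow_url})"


if __name__ == "__main__":
    print(workflow_badge("CI", "LessUp", "bookmarks-cleaner", "ci.yml"))
    print(workflow_badge("Docs", "LessUp", "bookmarks-cleaner", "pages.yml",
                         link="https://lessup.github.io/bookmarks-cleaner/"))
```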

Lines changed: 20 additions & 0 deletions
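```diff
@@ -0,0 +1,20 @@
+# GitHub Pages Optimization (2026-03-10)
+
+## Workflow Fixes and Optimizations
+
+- **pages.yml**: Fix the path-trigger reference (`deploy-docs.yml` → `pages.yml`); the old filename did not match, so changes to the workflow itself never triggered a rebuild
+- **pages.yml**: Widen the path triggers (add `README.md` and `README.zh-CN.md`)
+- **pages.yml**: Add sparse-checkout so that only the `docs/` directory is fetched, speeding up CI builds
+- **pages.yml**: Change `cancel-in-progress` to `true` so docs deployments do not pile up in the queue
+- **pages.yml**: Remove the unnecessary `fetch-depth: 0` setting and `configure-pages` step
+
+## Docs Site Homepage Rewrite
+
+- **docs/index.md**: Enhance the Hero section (the tagline adds Python version info, and the action buttons gain a "System Architecture" entry)
+- **docs/index.md**: Expand the feature cards from 4 to 6, adding "Unified Emoji Cleanup" and "Deduplication · Health Check"
+- **docs/index.md**: Add an ASCII architecture diagram of the processing pipeline (BookmarkProcessor → AIClassifier → Standardizer → Exporter)
+- **docs/index.md**: Add a minimal example code block and a tech-stack table
+
+## README Badges
+
+- **README.md / README.zh-CN.md**: Add CI, License, and Python badges to both; switch the Docs badge to a workflow status badge
```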
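The first changelog entry describes a self-reference bug. The `paths` filter listed `deploy-docs.yml`, so a commit that touched only the workflow file itself never matched. The stdlib sketch below approximates that trigger logic. `push_triggers` is a hypothetical helper, and `fnmatch` only roughly matches GitHub's filter syntax: its `*` also crosses `/`, and `!` negations are not handled.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Path filters before and after this commit (taken from the pages.yml diff).
OLD_PATHS = ["docs/**", ".github/workflows/deploy-docs.yml"]
NEW_PATHS = ["docs/**", "README.md", "README.zh-CN.md", ".github/workflows/pages.yml"]


def push_triggers(changed_files: Iterable[str], path_filters: Iterable[str]) -> bool:
    """Rough model of `on.push.paths`: run if any changed file matches any filter."""
    filters = list(path_filters)  # allow the filters to be iterated repeatedly
    return any(fnmatchcase(f, pat) for f in changed_files for pat in filters)


if __name__ == "__main__":
    edit = [".github/workflows/pages.yml"]  # a commit touching only the workflow
    print("old filters:", push_triggers(edit, OLD_PATHS))
    print("new filters:", push_triggers(edit, NEW_PATHS))
```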

docs/index.md

Lines changed: 68 additions & 11 deletions
```diff
@@ -4,25 +4,82 @@ layout: home
 hero:
   name: CleanBook
   text: Smart Bookmark Cleaning & Classification
-  tagline: Rules + machine learning + optional LLM, offline by default
+  tagline: Rules + ML + optional LLM · offline by default · Python 3.10+
   actions:
     - theme: brand
       text: Quick Start
       link: /quickstart_zh
     - theme: alt
-      text: Design Docs
-      link: /DESIGN
+      text: System Architecture
+      link: /design/system_architecture
     - theme: alt
       text: GitHub
       link: https://github.com/LessUp/bookmarks-cleaner
 
 features:
-  - title: Rules First
-    details: Built on a controlled vocabulary and faceted classification; configuration-driven, so rules and weights can be customized without changing code.
-  - title: ML Assisted
-    details: High-confidence samples are automatically collected into a training set, and a lightweight sklearn model improves classification accuracy.
-  - title: Optional LLM
-    details: Supports OpenAI-compatible APIs (GPT-4o-mini, etc.) and automatically falls back to the offline path on failure.
-  - title: Multi-Format Export
-    details: Outputs HTML (Netscape format, importable into browsers), Markdown, and JSON, with at most two levels of structure.
+  - title: 🎯 Rules First · Config Driven
+    details: Built on a Controlled Vocabulary and Faceted Classification; rules and weights are defined in config.json and taxonomy/*.yaml, so behavior can be customized without changing code.
+  - title: 🤖 ML Assisted · Self-Accumulating
+    details: High-confidence samples are automatically collected into a training set, and a lightweight scikit-learn model progressively improves classification accuracy; train with a single --train flag.
+  - title: 💡 Optional LLM · Automatic Fallback
+    details: Supports OpenAI-compatible APIs (GPT-4o-mini, etc.), including a secondary clustering organizer; falls back to the offline path automatically when no LLM is configured or a call fails.
+  - title: 📦 Multi-Format Export
+    details: Outputs HTML (Netscape format, importable directly into browsers), Markdown, and JSON; categories are at most two levels deep, keeping results concise and readable.
+  - title: 🧹 Unified Emoji Cleanup
+    details: Leading emoji prefixes in titles are stripped at three fallback points (import → standardization → export), so they do not stack up across exports from different browsers.
+  - title: 🔗 Deduplication · Health Check
+    details: Fast and advanced deduplication are always on, making merges of exports from multiple browsers more reliable; an optional --health-check verifies link reachability.
 ---
+
+## Processing Pipeline
+
+```
+Browser bookmark HTML
+
+
+┌──────────────────────────────────────────────────────────┐
+│ BookmarkProcessor                                        │
+│ load → fast dedup → advanced dedup → emoji cleanup       │
+├──────────────────────────────────────────────────────────┤
+│ AIBookmarkClassifier                                     │
+│ ┌───────┐ ┌────┐ ┌──────────┐ ┌─────────┐ ┌─────┐        │
+│ │ Rules │→│ ML │→│ Semantic │→│ Profile │→│ LLM │        │
+│ └───────┘ └────┘ └──────────┘ └─────────┘ └─────┘        │
+│ weighted voting → fused confidence                       │
+├──────────────────────────────────────────────────────────┤
+│ TaxonomyStandardizer                                     │
+│ controlled-vocabulary mapping → subject + resource_type  │
+├──────────────────────────────────────────────────────────┤
+│ DataExporter                                             │
+│ HTML · Markdown · JSON                                   │
+└──────────────────────────────────────────────────────────┘
+```
+
+## Minimal Example
+
+```powershell
+# Install (pipx recommended)
+pipx install .
+
+# Process bookmarks
+cleanbook -i examples/demo_bookmarks.html -o output
+
+# Batch processing + ML training
+cleanbook -i "tests/input/*.html" --train
+
+# Interactive wizard
+cleanbook-wizard
+```
+
+## Tech Stack
+
+| Component | Technology |
+|------|------|
+| Language | Python 3.10+ |
+| CLI | Click + Rich (interactive wizard) |
+| Parsing | BeautifulSoup4 + lxml |
+| ML | scikit-learn · jieba · langdetect |
+| LLM | OpenAI-compatible API (optional) |
+| Export | HTML (Netscape) · Markdown · JSON |
+| Taxonomy | Controlled vocabulary + faceted classification (YAML config) |
+| Quality | pytest · flake8 · mypy |
```
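The homepage says title emoji prefixes are cleaned as a fallback at three points (import, standardization, export). Repeating a cleanup like that is only safe if it is idempotent, and the sketch below illustrates that property. The emoji character class is an assumption (a few common Unicode blocks, not the full Unicode emoji table). `strip_leading_emoji` and `export_title` are hypothetical names, not CleanBook's API.

```python
import re

# Assumption: an approximate emoji class built from common Unicode blocks
# (regional indicators, pictographs, misc symbols/dingbats, VS16, ZWJ);
# it is not the full Unicode emoji table.
_EMOJI_CHARS = "\U0001F1E6-\U0001F1FF\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D"
_LEADING_EMOJI = re.compile(f"^[{_EMOJI_CHARS}\\s]+")


def strip_leading_emoji(title: str) -> str:
    """Remove any run of emoji (and the spaces between them) at the start of a title."""
    return _LEADING_EMOJI.sub("", title.strip())


def export_title(title: str, icon: str = "📁") -> str:
    """Hypothetical exporter step that decorates a title with exactly one icon."""
    return f"{icon} {strip_leading_emoji(title)}"


if __name__ == "__main__":
    raw = "🔗 🔗 GitHub"        # a prefix already duplicated by earlier exports
    once = export_title(raw)
    twice = export_title(once)  # re-import and export again
    print(once, "|", twice)
```

Because each pass strips before decorating, exporting the same bookmark repeatedly yields one prefix, not a growing stack.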

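The pipeline diagram ends the classifier stage with "weighted voting → fused confidence". The sketch below shows one plausible way such a fusion step can work. The `Vote` type, the per-source weights, and the choice to normalize by the total weight of the sources that actually voted are all assumptions for illustration, not CleanBook's implementation.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Vote(NamedTuple):
    source: str        # "rules", "ml", "semantic", "profile" or "llm"
    category: str
    confidence: float  # the source's own confidence, 0.0 to 1.0


# Hypothetical per-source weights, chosen only for illustration.
WEIGHTS: Dict[str, float] = {
    "rules": 1.0, "ml": 0.8, "semantic": 0.6, "profile": 0.5, "llm": 0.9,
}


def fuse(votes: List[Vote], weights: Dict[str, float] = WEIGHTS) -> Optional[Tuple[str, float]]:
    """Weighted vote; fused confidence = winner's score / total weight of voters."""
    scores: Dict[str, float] = defaultdict(float)
    total_weight = 0.0
    for vote in votes:
        w = weights.get(vote.source, 0.0)  # unknown sources get no say
        scores[vote.category] += w * vote.confidence
        total_weight += w
    if total_weight == 0.0:
        return None  # no weighted source voted
    best = max(scores, key=scores.__getitem__)
    return best, scores[best] / total_weight


if __name__ == "__main__":
    votes = [
        Vote("rules", "Dev Tools", 0.9),
        Vote("ml", "Dev Tools", 0.7),
        Vote("semantic", "Reading", 0.6),
        # No "llm" vote: e.g. no LLM is configured, so the offline sources decide.
    ]
    print(fuse(votes))
```

In this design, an absent source (such as an unconfigured LLM) simply casts no vote. The same fusion step therefore covers the offline path without a separate code branch.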