| 1 | +# CleanBook — Smart Bookmark Cleaning & Classification |
| 2 | + |
| 3 | +[Project page](https://lessup.github.io/bookmarks-cleaner/) |
| 4 | + |
| 5 | +[简体中文](README.md) | English |
| 6 | + |
| 7 | +KISS by design: rules first, ML second, LLM optional, and offline-ready by default. Unified emoji cleanup for titles, always-on deduplication, and output as HTML, Markdown, or JSON. |
| 8 | + |
| 9 | +## Features |
| 10 | + |
| 11 | +- Rules first, ML/semantic assist, optional LLM integration (auto-fallback on failure) |
| 12 | +- Unified title cleaning to avoid stacked emoji prefixes |
| 13 | +- Always-on deduplication for stable cross-browser export merging |
| 14 | +- Output classification limited to two levels for cleaner results |
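
The title-cleaning and deduplication features above can be illustrated with a minimal sketch. This is not the project's implementation (that lives in `emoji_cleaner.py` and the processor modules); the function names `strip_leading_emoji`, `normalize_url`, and `dedupe` are hypothetical, and the normalization rules (lowercase host, drop the fragment, ignore a trailing slash) are assumptions chosen for illustration.

```python
import unicodedata
from urllib.parse import urlsplit, urlunsplit


def strip_leading_emoji(title: str) -> str:
    """Drop any run of emoji-like symbols and spaces at the start of a title.

    Assumption: a character counts as "emoji-like" if its Unicode category
    is "So" (Symbol, other), or it is a variation selector / zero-width joiner.
    """
    i = 0
    while i < len(title):
        ch = title[i]
        if ch.isspace() or ch in "\ufe0f\u200d" or unicodedata.category(ch) == "So":
            i += 1
        else:
            break
    return title[i:]


def normalize_url(url: str) -> str:
    """Build a comparison key: lowercase scheme/host, no fragment, no trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe(bookmarks):
    """Keep the first (title, url) pair seen for each normalized URL."""
    seen = set()
    unique = []
    for title, url in bookmarks:
        key = normalize_url(url)
        if key in seen:
            continue
        seen.add(key)
        unique.append((title, url))
    return unique
```

Stripping the prefix before a category emoji is re-applied is what keeps prefixes from stacking across repeated runs; keeping the first occurrence keeps merged exports from several browsers stable in order.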
| 15 | + |
| 16 | +## Installation (pipx Recommended) |
| 17 | + |
| 18 | +```powershell |
| 19 | +python -m pip install --user pipx |
| 20 | +python -m pipx ensurepath |
| 21 | +pipx install . |
| 22 | +``` |
| 23 | + |
| 24 | +Two commands are available after installation: |
| 25 | + |
| 26 | +- `cleanbook`: Command-line processing (equivalent to `python main.py`) |
| 27 | +- `cleanbook-wizard`: Interactive wizard experience |
| 28 | + |
| 29 | +## Quick Example |
| 30 | + |
| 31 | +```powershell |
| 32 | +cleanbook -i examples/demo_bookmarks.html -o output |
| 33 | +cleanbook -i "tests/input/*.html" --train |
| 34 | +cleanbook-wizard |
| 35 | +``` |
| 36 | + |
| 37 | +Common flags: `--workers` (parallel processing), `--train` (train the ML model), `--no-ml` (disable ML), `--health-check` (run a reachability check). |
| 38 | + |
| 39 | +## LLM (Optional) |
| 40 | + |
| 41 | +Edit `config.json` to enable: |
| 42 | + |
| 43 | +```json |
| 44 | +"llm": { |
| 45 | + "enable": true, |
| 46 | + "provider": "openai", |
| 47 | + "base_url": "https://api.openai.com", |
| 48 | + "model": "gpt-4o-mini", |
| 49 | + "api_key_env": "OPENAI_API_KEY" |
| 50 | +} |
| 51 | +``` |
| 52 | + |
| 53 | +Set the environment variable: |
| 54 | + |
| 55 | +```powershell |
| 56 | +$env:OPENAI_API_KEY = "your_api_key" |
| 57 | +``` |
| 58 | + |
| 59 | +Classification falls back to the offline path when the key is unset or the API call fails. |
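
The fallback behavior can be sketched as follows. This is an illustrative outline, not the project's code: `llm_is_usable` and `classify_with_fallback` are hypothetical names, and `llm_call` and `offline_call` stand in for whatever classifiers the project actually wires up. Only the `enable` and `api_key_env` keys come from the `config.json` example above.

```python
import os


def llm_is_usable(llm_cfg, env=None):
    """Return True only if the LLM is enabled and its API key variable is set."""
    env = os.environ if env is None else env
    key_name = llm_cfg.get("api_key_env", "")
    return bool(llm_cfg.get("enable")) and bool(env.get(key_name))


def classify_with_fallback(title, url, llm_cfg, llm_call, offline_call, env=None):
    """Try the LLM first; use the offline classifier if it is unavailable or fails."""
    if llm_is_usable(llm_cfg, env):
        try:
            return llm_call(title, url)
        except Exception:
            # Any API failure (network, auth, rate limit) falls through to offline.
            pass
    return offline_call(title, url)
```

Passing `env` explicitly keeps the decision easy to test without touching the real process environment.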
| 60 | + |
| 61 | +When `organizer.enable` is set, a second LLM pass clusters, sorts, and summarizes categories after classification. |
| 62 | + |
| 63 | +## Project Structure |
| 64 | + |
| 65 | +``` |
| 66 | +. |
| 67 | +├─ src/ |
| 68 | +│ ├─ cleanbook/ # Unified CLI wrapper |
| 69 | +│ │ └─ cli.py |
| 70 | +│ ├─ ai_classifier.py # Rules + ML + semantic + user profile + LLM |
| 71 | +│ ├─ enhanced_classifier.py |
| 72 | +│ ├─ enhanced_clean_tidy.py |
| 73 | +│ ├─ bookmark_processor.py |
| 74 | +│ ├─ emoji_cleaner.py # Title emoji cleaning |
| 75 | +│ └─ ... |
| 76 | +├─ models/ # Models & cache |
| 77 | +├─ examples/ |
| 78 | +├─ docs/ |
| 79 | +├─ config.json |
| 80 | +├─ main.py # Top-level entry |
| 81 | +├─ pyproject.toml # Packaging & CLI entry points |
| 82 | +└─ changelog/ |
| 83 | +``` |
| 84 | + |
| 85 | +## Distribution |
| 86 | + |
| 87 | +- **Local/Team**: `pipx install .` for isolated global commands |
| 88 | +- **Open Source**: GitHub Release with example data; optionally publish to PyPI |
| 89 | +- **Windows standalone**: Optional PyInstaller single-file EXE |
| 90 | + |
| 91 | +## License |
| 92 | + |
| 93 | +MIT — see `LICENSE`. |