This project strictly follows the Spec-Driven Development (SDD) paradigm. All code implementations must treat the specification documents in the /specs directory as the single source of truth.
AI Agent Workflow:
- Review Specs First: Before writing code, read the relevant product docs, RFCs, and API definitions in /specs
- Spec-First: For new features or interface changes, propose spec modifications first and wait for user confirmation before coding
- Implementation: Code must 100% comply with spec definitions (variable names, API paths, data types, etc.)
- Test Verification: Write unit and integration tests based on the acceptance criteria in /specs
For complete AI agent workflow instructions, see AGENTS.md.
CleanBook is an AI-powered bookmark classification system that automatically analyzes, classifies, and organizes browser bookmarks.
Core Components:
- src/ai_classifier.py - Central orchestrator coordinating multiple classification strategies
- src/bookmark_processor.py - Batch bookmark processing coordinator
- src/plugins/ - Modular classifier plugins (rule, ML, embedding, LLM)
- src/services/ - Cross-cutting services (embedding, taxonomy, performance monitoring)
- Follow PEP8 Python coding standards
- Use type hints throughout
- Complete docstrings for all functions and classes
- High-value comments explaining why, not what
- Configuration in JSON format with clear structure
- Ensemble learning combining multiple classification techniques
- LRU caching for performance optimization
- Dynamic rule addition and weight adjustment
- Confidence scoring system
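The ensemble and confidence-scoring ideas above can be illustrated with a small weighted-voting sketch. It is not the project's actual fusion logic: the function name `fuse_predictions`, the `(category, confidence)` tuple shape, the per-strategy weights, and the "Uncategorized" fallback are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Assumed shape: each strategy returns (category, confidence in [0, 1]).
Prediction = Tuple[str, float]


def fuse_predictions(
    predictions: Dict[str, Prediction],
    weights: Dict[str, float],
) -> Prediction:
    """Combine per-strategy predictions by confidence-weighted voting."""
    scores: Dict[str, float] = defaultdict(float)
    total_weight = 0.0
    for strategy, (category, confidence) in predictions.items():
        # Strategies without an explicit weight default to 1.0 (assumption).
        weight = weights.get(strategy, 1.0)
        scores[category] += weight * confidence
        total_weight += weight
    if not scores or total_weight == 0:
        return ("Uncategorized", 0.0)
    best = max(scores, key=lambda category: scores[category])
    # Normalise by the total weight so the fused confidence stays in [0, 1].
    return (best, scores[best] / total_weight)
```

Adjusting a weight at runtime is then just an update to the `weights` dictionary, which is one plausible reading of "dynamic weight adjustment".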
- Parse HTML bookmark file
- Extract bookmark features
- Apply classification rules
- ML model prediction (if enabled)
- Result fusion and optimization
- Output multiple format files
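The first two pipeline steps (parsing and feature extraction) can be sketched with the standard library alone. This is a simplified illustration, assuming the input is a Netscape-style bookmark export in which every bookmark is an `<A HREF=...>` element; `BookmarkParser`, `parse_bookmarks`, and `extract_features` are hypothetical names, and the real project may extract different features.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse


class BookmarkParser(HTMLParser):
    """Collect the URL and title of every <a href=...> element."""

    def __init__(self) -> None:
        super().__init__()
        self.bookmarks: List[Dict[str, str]] = []
        self._current_url: Optional[str] = None
        self._title_parts: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser lowercases tag and attribute names, so "A HREF" arrives as "a"/"href".
        if tag == "a":
            self._current_url = dict(attrs).get("href")
            self._title_parts = []

    def handle_data(self, data: str) -> None:
        if self._current_url is not None:
            self._title_parts.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag == "a" and self._current_url is not None:
            title = "".join(self._title_parts).strip()
            self.bookmarks.append({"url": self._current_url, "title": title})
            self._current_url = None


def parse_bookmarks(html: str) -> List[Dict[str, str]]:
    parser = BookmarkParser()
    parser.feed(html)
    parser.close()
    return parser.bookmarks


def extract_features(bookmark: Dict[str, str]) -> Dict[str, object]:
    """Illustrative features only: the domain and lowercased title tokens."""
    return {
        "domain": urlparse(bookmark["url"]).netloc,
        "title_tokens": bookmark["title"].lower().split(),
    }
```

Whitespace tokenization is used here for brevity; it would not segment Chinese titles, which is presumably why the project lists jieba as an optional dependency.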
- Multi-threaded parallel processing
- Intelligent caching strategy
- Batch processing mechanism
- Lazy component initialization
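As a rough illustration of how batching, a thread pool, and an LRU cache can work together, consider the sketch below. The classification rule is a deliberately trivial stand-in, and `classify_domain`, `chunked`, `classify_batch`, and `classify_all` are hypothetical names rather than the project's API.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Iterator, List, Sequence
from urllib.parse import urlparse


@lru_cache(maxsize=1024)
def classify_domain(domain: str) -> str:
    """Stand-in for an expensive classification step (hypothetical rule)."""
    return "Development" if "github" in domain else "Other"


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_batch(urls: Sequence[str]) -> List[str]:
    # Repeated domains are served from the LRU cache instead of being recomputed.
    return [classify_domain(urlparse(url).netloc) for url in urls]


def classify_all(urls: Sequence[str], batch_size: int = 2, workers: int = 4) -> List[str]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so labels line up with urls.
        results = pool.map(classify_batch, chunked(urls, batch_size))
        return [label for batch in results for label in batch]
```

Threads help most when the per-bookmark work is I/O-bound (for example, calls to an embedding or LLM service); for CPU-bound work in pure Python the gain is limited by the interpreter lock, so the worker count is worth measuring rather than guessing.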
Main configuration file is config.json, containing:
- category_rules: Classification rule definitions
- ai_settings: AI-related settings
- category_order: Category display order
- title_cleaning_rules: Title cleaning rules
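A minimal sketch of loading this file and checking that the four documented top-level keys are present is shown below. Only the key names come from the description above; the function name `load_config`, the decision to raise `KeyError`, and the assumption that all four keys are mandatory are illustrative, and the structure of each value is not assumed.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Top-level keys named in the configuration description.
REQUIRED_KEYS = ("category_rules", "ai_settings", "category_order", "title_cleaning_rules")


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Load the JSON config and fail early if a documented key is missing."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {', '.join(missing)}")
    return config
```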
# Install dependencies
pip install -r requirements.txt
# Install in development mode
pip install -e .
# Run health check
python main.py --health-check
# Process bookmarks (CLI mode)
python main.py -i examples/demo_bookmarks.html
# Start interactive mode
python main.py --interactive
# Run tests
pytest
Processing generates three output formats:
- HTML: Importable to browsers
- JSON: Detailed classification metadata and statistics
- Markdown: Readable classification report
- ML features require additional dependencies (scikit-learn, jieba, etc.)
- Adjust thread count for optimal performance with large bookmark sets
- Customize classification rules and weights via configuration
- Supports both Chinese and English content
- The system can learn from user feedback
The project includes a comprehensive test suite:
- Unit tests covering core functionality
- Integration tests validating component coordination
- Property-based tests using Hypothesis
- End-to-end tests simulating complete workflows
Run tests: pytest
Adding new classification methods:
- Create a new classifier plugin in src/plugins/classifiers/
- Inherit from BaseClassifier
- Implement the classify() method
- Register it in CLASSIFIER_REGISTRY
- Add corresponding test cases
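The steps above might look roughly like the following. `BaseClassifier`, `classify()`, and `CLASSIFIER_REGISTRY` are named in this guide, but their real definitions are not shown here, so the block defines self-contained stand-ins: the `classify(url, title)` signature, the `(category, confidence)` return value, and the dictionary-style registry are assumptions. `KeywordClassifier` and its keyword table are purely hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple, Type


class BaseClassifier(ABC):
    """Stand-in for the project's base class; the real interface may differ."""

    @abstractmethod
    def classify(self, url: str, title: str) -> Tuple[str, float]:
        """Return an assumed (category, confidence) pair."""


# Stand-in for the project's registry; assumed to map a name to a classifier class.
CLASSIFIER_REGISTRY: Dict[str, Type[BaseClassifier]] = {}


class KeywordClassifier(BaseClassifier):
    """Hypothetical plugin that matches fixed keywords in the URL or title."""

    KEYWORDS: Dict[str, str] = {"github": "Development", "arxiv": "Research"}

    def classify(self, url: str, title: str) -> Tuple[str, float]:
        text = f"{url} {title}".lower()
        for keyword, category in self.KEYWORDS.items():
            if keyword in text:
                return (category, 0.8)
        return ("Uncategorized", 0.0)


# Registration step; in the real project, follow whatever mechanism the registry uses.
CLASSIFIER_REGISTRY["keyword"] = KeywordClassifier
```

Per the spec-first workflow, the plugin's expected behavior would be described in /specs before code like this is written, and the test cases would be derived from those acceptance criteria.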