QWEN.md - Qwen Code Interaction Guide

⚠️ Important: Spec-Driven Development (SDD)

This project strictly follows the Spec-Driven Development (SDD) paradigm. All code implementations must treat the specification documents in the /specs directory as the single source of truth.

AI Agent Workflow:

Review Specs First: Before writing code, read relevant product docs, RFCs, and API definitions in /specs
Spec-First: For new features or interface changes, propose spec modifications first and wait for user confirmation before coding
Implementation: Code must comply fully with spec definitions (variable names, API paths, data types, etc.)
Test Verification: Write unit and integration tests based on acceptance criteria in /specs

For complete AI agent workflow instructions, see AGENTS.md.

Project Overview

CleanBook is an AI-powered bookmark classification system that automatically analyzes, classifies, and organizes browser bookmarks.

Core Components:

src/ai_classifier.py - Central orchestrator coordinating multiple classification strategies
src/bookmark_processor.py - Batch bookmark processing coordinator
src/plugins/ - Modular classifier plugins (rule, ML, embedding, LLM)
src/services/ - Cross-cutting services (embedding, taxonomy, performance monitoring)

Code Style Conventions

Follow PEP8 Python coding standards
Use type hints throughout
Complete docstrings for all functions and classes
High-value comments explaining why, not what
Configuration in JSON format with clear structure

Key Implementation Details

Classifier Architecture

Ensemble learning combining multiple classification techniques
LRU caching for performance optimization
Dynamic rule addition and weight adjustment
Confidence scoring system
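
The ideas above can be illustrated with a minimal sketch of weighted result fusion with a confidence score and an LRU cache. It is not the project's implementation: the two strategy functions, their weights, and the `classify` signature are hypothetical stand-ins chosen only to show the pattern.

```python
from collections import defaultdict
from functools import lru_cache
from typing import Callable, Dict, List, Tuple

# A strategy maps (url, title) to (category, confidence in [0, 1]).
Classifier = Callable[[str, str], Tuple[str, float]]


def rule_classifier(url: str, title: str) -> Tuple[str, float]:
    """Hypothetical rule-based strategy: keyword match on the URL."""
    if "github.com" in url:
        return "Development", 0.9
    return "Uncategorized", 0.1


def keyword_classifier(url: str, title: str) -> Tuple[str, float]:
    """Hypothetical second strategy: keyword match on the title."""
    if "news" in title.lower():
        return "News", 0.8
    return "Development", 0.4


# Hypothetical fixed weights; the guide says real weights are adjustable.
WEIGHTED_CLASSIFIERS: List[Tuple[Classifier, float]] = [
    (rule_classifier, 0.6),
    (keyword_classifier, 0.4),
]


@lru_cache(maxsize=1024)
def classify(url: str, title: str) -> Tuple[str, float]:
    """Fuse weighted votes; return the top category and its fused score."""
    scores: Dict[str, float] = defaultdict(float)
    total_weight = 0.0
    for classifier, weight in WEIGHTED_CLASSIFIERS:
        category, confidence = classifier(url, title)
        scores[category] += weight * confidence
        total_weight += weight
    best = max(scores, key=lambda name: scores[name])
    return best, scores[best] / total_weight
```

Because `classify` takes only hashable arguments, repeated bookmarks are served from the cache rather than re-scored.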

Processing Flow

Parse HTML bookmark file
Extract bookmark features
Apply classification rules
ML model prediction (if enabled)
Result fusion and optimization
Output multiple format files
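
The first two steps of this flow might look like the following sketch, which uses only the standard library. It assumes the common Netscape-style bookmark export, where each bookmark is an `<A HREF="...">` element. `BookmarkParser`, `parse_bookmarks`, and `extract_features` are illustrative names, not the project's API.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urlparse


class BookmarkParser(HTMLParser):
    """Collect url/title pairs from anchor elements."""

    def __init__(self) -> None:
        super().__init__()
        self.bookmarks: List[Dict[str, str]] = []
        self._current_url: Optional[str] = None
        self._title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            self._current_url = dict(attrs).get("href")
            self._title_parts = []

    def handle_data(self, data):
        if self._current_url is not None:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_url is not None:
            title = "".join(self._title_parts).strip()
            self.bookmarks.append({"url": self._current_url, "title": title})
            self._current_url = None


def parse_bookmarks(html: str) -> List[Dict[str, str]]:
    """Step 1: parse the exported HTML into bookmark records."""
    parser = BookmarkParser()
    parser.feed(html)
    parser.close()
    return parser.bookmarks


def extract_features(bookmark: Dict[str, str]) -> Dict[str, object]:
    """Step 2: derive simple features (domain and lowercase title words)."""
    domain = urlparse(bookmark["url"]).netloc.lower()
    return {"domain": domain, "title_words": bookmark["title"].lower().split()}
```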

Performance Optimization

Multi-threaded parallel processing
Intelligent caching strategy
Batch processing mechanism
Lazy component initialization
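
As a rough illustration of combining batching with multi-threading, the sketch below splits the input into batches and maps them over a thread pool. The function names and the default `batch_size` and `max_workers` values are assumptions for illustration, not the project's actual settings.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_in_parallel(
    items: Sequence[T],
    worker: Callable[[T], R],
    batch_size: int = 100,
    max_workers: int = 4,
) -> List[R]:
    """Apply worker to every item, one batch per task, preserving order."""

    def run_batch(batch: Sequence[T]) -> List[R]:
        return [worker(item) for item in batch]

    results: List[R] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in submission order.
        for batch_result in pool.map(run_batch, batched(items, batch_size)):
            results.extend(batch_result)
    return results
```

Threads help most when the per-item work waits on I/O, such as network or disk access. For CPU-bound classification in pure Python the gain is usually smaller.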

Configuration Files

Main configuration file is config.json, containing:

category_rules: Classification rule definitions
ai_settings: AI-related settings
category_order: Category display order
title_cleaning_rules: Title cleaning rules
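
A small loader can make the expected shape concrete. In the sketch below only the four top-level key names come from this guide. The values in `SAMPLE_CONFIG`, the `load_config` helper, and the choice to treat all four keys as required are assumptions for illustration.

```python
import json
from typing import Any, Dict

# Top-level keys named in this guide.
EXPECTED_KEYS = (
    "category_rules",
    "ai_settings",
    "category_order",
    "title_cleaning_rules",
)

# Hypothetical contents; the real schema is defined by the project.
SAMPLE_CONFIG = """
{
  "category_rules": {"Development": {"keywords": ["github", "docs"]}},
  "ai_settings": {"use_ml": false},
  "category_order": ["Development", "News"],
  "title_cleaning_rules": {"strip_suffixes": [" - YouTube"]}
}
"""


def load_config(text: str) -> Dict[str, Any]:
    """Parse JSON config text and check that the expected keys exist."""
    config = json.loads(text)
    missing = [key for key in EXPECTED_KEYS if key not in config]
    if missing:
        raise KeyError("config is missing keys: " + ", ".join(missing))
    return config
```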

Common Commands

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .

# Run health check
python main.py --health-check

# Process bookmarks (CLI mode)
python main.py -i examples/demo_bookmarks.html

# Start interactive mode
python main.py --interactive

# Run tests
pytest

Output File Formats

Processing generates three output formats:

HTML: Importable into browsers
JSON: Detailed classification metadata and statistics
Markdown: Readable classification report
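
To give a feel for the JSON and Markdown outputs, here is a hedged sketch that renders classified bookmarks in both forms. The record fields (`title`, `url`, `category`) and the report layout are assumptions. The project's real output schema may differ.

```python
import json
from collections import defaultdict
from typing import Dict, List

Bookmark = Dict[str, str]


def to_json(bookmarks: List[Bookmark]) -> str:
    """Serialize bookmarks with a simple total count (hypothetical schema)."""
    payload = {"total": len(bookmarks), "bookmarks": bookmarks}
    # ensure_ascii=False keeps Chinese titles readable in the output.
    return json.dumps(payload, ensure_ascii=False, indent=2)


def to_markdown(bookmarks: List[Bookmark]) -> str:
    """Group bookmarks by category and render one list per category."""
    by_category: Dict[str, List[Bookmark]] = defaultdict(list)
    for bookmark in bookmarks:
        by_category[bookmark["category"]].append(bookmark)

    lines: List[str] = []
    for category in sorted(by_category):
        lines.append(f"## {category}")
        for bookmark in by_category[category]:
            lines.append(f"- [{bookmark['title']}]({bookmark['url']})")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```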

Important Notes

ML features require additional dependencies (scikit-learn, jieba, etc.)
Adjust thread count for optimal performance with large bookmark sets
Customize classification rules and weights via configuration
Supports both Chinese and English content
The system can learn from user feedback

Test Strategy

The project includes a comprehensive test suite:

Unit tests covering core functionality
Integration tests validating component coordination
Property-based tests using Hypothesis
End-to-end tests simulating complete workflows

Run tests: pytest

Extension Development Guide

Adding new classification methods:

Create new classifier plugin in src/plugins/classifiers/
Inherit from BaseClassifier
Implement classify() method
Register in CLASSIFIER_REGISTRY
Add corresponding test cases
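
The steps above can be sketched as follows. The names `BaseClassifier`, `classify()`, and `CLASSIFIER_REGISTRY` come from this guide, but their definitions here are self-contained stand-ins so the example runs on its own. The real base class, method signature, and registration mechanism may differ, and `DomainSuffixClassifier` is purely hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple, Type
from urllib.parse import urlparse


class BaseClassifier(ABC):
    """Stand-in for the project's base class (assumed interface)."""

    @abstractmethod
    def classify(self, url: str, title: str) -> Tuple[str, float]:
        """Return (category, confidence in [0, 1])."""


# Stand-in for the project's registry: plugin name -> classifier class.
CLASSIFIER_REGISTRY: Dict[str, Type[BaseClassifier]] = {}


class DomainSuffixClassifier(BaseClassifier):
    """Hypothetical plugin: guess a category from the host name suffix."""

    SUFFIX_CATEGORIES = {".edu": "Education", ".gov": "Government"}

    def classify(self, url: str, title: str) -> Tuple[str, float]:
        host = urlparse(url).hostname or ""
        for suffix, category in self.SUFFIX_CATEGORIES.items():
            if host.endswith(suffix):
                return category, 0.7
        return "Uncategorized", 0.0


# Register the plugin under a name so it can be looked up later.
CLASSIFIER_REGISTRY["domain_suffix"] = DomainSuffixClassifier
```

A matching test would instantiate the class from the registry and assert on the returned category and confidence for a few representative URLs.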
