328 changes: 328 additions & 0 deletions QUALITY_SCORING.md
# Multi-Dimensional Quality Scoring

## Overview

This module provides automated quality assessment for structured outputs (JSON, Markdown, Code, Text) using a multi-dimensional scoring algorithm.

## Features

- **Auto-format detection**: Automatically identifies content type
- **5-dimensional scoring**: Comprehensive quality assessment
- **Fast performance**: 100+ submissions per second
- **Actionable feedback**: Specific improvement suggestions
- **Threshold validation**: Pass/fail determination

## Scoring Dimensions

| Dimension | Weight | Description |
|-----------|--------|-------------|
| **Completeness** | 30% | Required fields/sections present |
| **Format Compliance** | 20% | Valid syntax, proper structure |
| **Coverage** | 25% | Depth and breadth of content |
| **Clarity** | 15% | Readability, organization |
| **Validity** | 10% | Logical consistency |

**Pass Threshold**: 0.70 (70%)
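The weights above can be combined as sketched below. This assumes the overall score is a plain weighted sum of the per-dimension scores, which the table implies but does not state. The dictionary keys and function names are illustrative, not the module's actual internals.

```python
# Weights copied from the table above; the key names are illustrative.
WEIGHTS = {
    "completeness": 0.30,
    "format": 0.20,
    "coverage": 0.25,
    "clarity": 0.15,
    "validity": 0.10,
}
PASS_THRESHOLD = 0.70


def weighted_score(scores: dict[str, float]) -> float:
    """Return the weighted sum of per-dimension scores (each in 0.0-1.0)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def passes(scores: dict[str, float]) -> bool:
    """True when the weighted score reaches the pass threshold."""
    return weighted_score(scores) >= PASS_THRESHOLD


# Hypothetical per-dimension scores for one submission.
example = {
    "completeness": 0.90,
    "format": 1.00,
    "coverage": 0.85,
    "clarity": 0.75,
    "validity": 0.80,
}
print(round(weighted_score(example), 3))  # 0.875
print(passes(example))                    # True
```

Because the weights sum to 1.0, the result stays in the same 0.0-1.0 range as the individual dimension scores.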

## Quality Ratings

| Score Range | Rating | Description |
|-------------|--------|-------------|
| 0.90+ | A+ | Excellent |
| 0.85 to < 0.90 | A | Very Good |
| 0.80 to < 0.85 | B+ | Good |
| 0.75 to < 0.80 | B | Above Average |
| 0.70 to < 0.75 | C+ | Acceptable |
| 0.65 to < 0.70 | C | Below Average |
| 0.60 to < 0.65 | D | Poor |
| < 0.60 | F | Failing |
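A minimal sketch of how a score could be mapped to a rating. It treats each row's lower bound as inclusive and checks the bands from highest to lowest; `RATING_BANDS` and `quality_rating_for` are hypothetical names, not taken from the module.

```python
# Lower bounds taken from the table above, ordered from highest to lowest.
RATING_BANDS = [
    (0.90, "A+"),
    (0.85, "A"),
    (0.80, "B+"),
    (0.75, "B"),
    (0.70, "C+"),
    (0.65, "C"),
    (0.60, "D"),
]


def quality_rating_for(score: float) -> str:
    """Return the first rating whose lower bound the score reaches."""
    for lower_bound, rating in RATING_BANDS:
        if score >= lower_bound:
            return rating
    return "F"  # anything below 0.60


print(quality_rating_for(0.875))  # A
print(quality_rating_for(0.59))   # F
```

Comparing only against lower bounds avoids gaps between bands, so a score such as 0.895 still receives a rating.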

## Installation

No external dependencies required. Uses Python 3.10+ standard library only.

```bash
# Copy the module
cp quality_scorer.py your_project/

# Run tests
python3 test_quality_scorer.py

# Run examples
python3 examples.py
```

## Usage

### Basic Usage

```python
from quality_scorer import QualityScorer

scorer = QualityScorer()
result = scorer.score(your_content)

print(f"Score: {result.weighted_score}")
print(f"Rating: {result.quality_rating}")
print(f"Pass: {result.pass_threshold}")
```

### Output Structure

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QualityScore:
    weighted_score: float      # 0.0-1.0
    quality_rating: str        # A+, A, B+, B, C+, C, D, F
    scores: Dict[str, float]   # Individual dimension scores
    feedback: List[str]        # Improvement suggestions
    pass_threshold: bool       # True if weighted_score >= 0.70
```

### Example Output

```json
{
  "weighted_score": 0.875,
  "quality_rating": "A",
  "scores": {
    "completeness": 0.900,
    "format": 1.000,
    "coverage": 0.850,
    "clarity": 0.750,
    "validity": 0.800
  },
  "feedback": [
    "Detected format: json",
    "JSON structure has good depth",
    "Well-formatted with proper indentation"
  ],
  "pass_threshold": true
}
```

## Format-Specific Scoring

### JSON

**Completeness**:
- Non-empty objects/arrays
- Nested structures present
- Reasonable key count (≥3)

**Format**:
- Valid JSON syntax
- Proper nesting

**Coverage**:
- Structure depth (≥2 levels)
- Key count (≥5)
- Content length (≥200 chars)

**Clarity**:
- Formatted with indentation
- Descriptive key names

**Validity**:
- No null/empty values
- No placeholder text
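As an illustration of the JSON coverage criteria, the sketch below measures nesting depth and total key count using only the standard library. The helper names are hypothetical, and the source does not say how these checks are turned into a score, so the function reports plain booleans.

```python
import json


def json_depth(value) -> int:
    """Nesting depth: scalars are 0, a flat object or array is 1."""
    if isinstance(value, dict):
        return 1 + max((json_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((json_depth(v) for v in value), default=0)
    return 0


def count_keys(value) -> int:
    """Total number of object keys across every nesting level."""
    if isinstance(value, dict):
        return len(value) + sum(count_keys(v) for v in value.values())
    if isinstance(value, list):
        return sum(count_keys(v) for v in value)
    return 0


def json_coverage_checks(content: str) -> dict[str, bool]:
    """Evaluate the three coverage thresholds listed above."""
    data = json.loads(content)
    return {
        "depth>=2": json_depth(data) >= 2,
        "keys>=5": count_keys(data) >= 5,
        "length>=200": len(content) >= 200,
    }


sample = '{"a": {"b": 1, "c": 2}, "d": [1, 2], "e": 3, "f": 4}'
print(json_coverage_checks(sample))
```

The sample is two levels deep with six keys, so it satisfies the first two checks but falls short of the 200-character length threshold.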

### Markdown

**Completeness**:
- Headers present
- Sufficient content (>100 chars)
- Lists or structure

**Format**:
- Valid header levels (≤6)
- No broken links
- Proper list syntax

**Coverage**:
- Multiple sections (≥3)
- List items (≥5)
- Word count (≥200)

**Clarity**:
- Logical header hierarchy
- Reasonable line length (<120)
- Whitespace separation

**Validity**:
- No placeholder text
- No empty sections
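The Markdown coverage thresholds could be checked with a few regular expressions, as in this rough sketch. The patterns only recognise ATX-style headers (`#` to `######`) and simple bullet or numbered list items; they are an assumption for illustration, not the module's actual rules.

```python
import re

# ATX headers with 1-6 '#' characters followed by a space and some text.
HEADER_RE = re.compile(r"^#{1,6}[ \t]+\S", re.MULTILINE)
# Bullet items (-, *, +) or numbered items (1.), optionally indented.
LIST_ITEM_RE = re.compile(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+\S", re.MULTILINE)


def markdown_coverage_checks(content: str) -> dict[str, bool]:
    """Evaluate the three coverage thresholds listed above."""
    sections = len(HEADER_RE.findall(content))
    list_items = len(LIST_ITEM_RE.findall(content))
    words = len(content.split())
    return {
        "sections>=3": sections >= 3,
        "list_items>=5": list_items >= 5,
        "words>=200": words >= 200,
    }


doc = "# Title\n\n## A\n\n- one\n- two\n\n## B\n\n1. three\n"
print(markdown_coverage_checks(doc))
```

The sample has three headers and three list items, so only the section check passes.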

### Code

**Completeness**:
- Functions/classes present
- Comments/documentation
- Multi-line structure (>5 lines)

**Format**:
- Balanced braces/parentheses
- Proper syntax

**Coverage**:
- Multiple functions (≥3)
- Comment lines (≥5)
- Total lines (≥50)

**Clarity**:
- Proper indentation
- Reasonable line length (<100)
- Blank line separation

**Validity**:
- No syntax error markers
- No placeholder code
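The "balanced braces/parentheses" check is commonly done with a stack. The sketch below is one such approach and is not necessarily how the module does it. It deliberately ignores string literals and comments, so a bracket inside a string would be counted too.

```python
# Map each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def brackets_balanced(source: str) -> bool:
    """Return True if (), [] and {} are properly nested and closed."""
    stack = []
    for char in source:
        if char in OPENERS:
            stack.append(char)
        elif char in PAIRS:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != PAIRS[char]:
                return False
    return not stack  # leftover openers mean something was never closed


print(brackets_balanced("def f(x): return [x, {1: 2}]"))  # True
print(brackets_balanced("foo(bar[1)]"))                   # False
```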

### Text

**Completeness**:
- Adequate word count (≥50)
- Multiple paragraphs
- Proper punctuation

**Format**:
- Proper spacing
- Capitalization
- No excessive newlines

**Coverage**:
- Multiple paragraphs (≥3)
- Sentences (≥10)
- Word count (≥200)

**Clarity**:
- Reasonable sentence length (10-25 words)
- Paragraph breaks
- Line length (<100)

**Validity**:
- No placeholder text
- No empty sections
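For the sentence-length criterion, a simple approximation is to split on sentence-ending punctuation and average the word counts. This is a sketch under that assumption: the splitting rule is naive (abbreviations such as "e.g." will be miscounted) and the function names are hypothetical.

```python
import re


def sentence_lengths(text: str) -> list[int]:
    """Word count of each sentence, splitting after '.', '!' or '?'."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def sentence_length_ok(text: str, low: int = 10, high: int = 25) -> bool:
    """True when the average sentence length falls within [low, high] words."""
    lengths = sentence_lengths(text)
    if not lengths:
        return False
    average = sum(lengths) / len(lengths)
    return low <= average <= high


print(sentence_lengths("One two three. Four five!"))  # [3, 2]
```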

## Performance

Tested on a MacBook Pro (M1):
- **100 submissions**: < 0.01s
- **1,000 submissions**: < 0.1s
- **10,000 submissions**: < 1s

Meets requirement: **100 submissions < 10s** ✅
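Timings like these can be reproduced with a small harness built on `time.perf_counter`. In the sketch below, `dummy_score` is a stand-in so that the example runs on its own; pass `QualityScorer().score` instead to measure the real scorer.

```python
import time
from typing import Callable, Iterable


def time_batch(score_fn: Callable[[str], object], submissions: Iterable[str]) -> float:
    """Return the wall-clock seconds taken to score every submission."""
    start = time.perf_counter()
    for content in submissions:
        score_fn(content)
    return time.perf_counter() - start


def dummy_score(content: str) -> int:
    """Stand-in scorer (hypothetical); replace with QualityScorer().score."""
    return len(content.split())


elapsed = time_batch(dummy_score, ["sample text " * 20] * 100)
print(f"100 submissions in {elapsed:.4f}s")
```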

## API Reference

### `QualityScorer`

Main scoring class.

#### Methods

##### `detect_format(content: str) -> str`

Auto-detect content format.

**Returns**: `'json'`, `'markdown'`, `'code'`, or `'text'`
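The detection rules are not documented here, so the following is only a guess at how such a heuristic might look: try JSON parsing first, then look for code keywords, then Markdown markers, and fall back to text. `guess_format` and both patterns are hypothetical and intentionally crude.

```python
import json
import re

# Line-leading keywords that suggest source code (far from exhaustive).
CODE_RE = re.compile(
    r"^[ \t]*(?:def |class |import |from \S+ import |function |#include|const |let )",
    re.MULTILINE,
)
# ATX headers, bullet items, or code fences suggest Markdown.
MARKDOWN_RE = re.compile(r"^(?:#{1,6}[ \t]+\S|[-*+][ \t]+\S|```)", re.MULTILINE)


def guess_format(content: str) -> str:
    """Return 'json', 'code', 'markdown', or 'text' using simple heuristics."""
    stripped = content.strip()
    if stripped[:1] in ("{", "["):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass  # looked like JSON but did not parse; keep checking
    if CODE_RE.search(stripped):
        return "code"
    if MARKDOWN_RE.search(stripped):
        return "markdown"
    return "text"


print(guess_format('{"a": 1}'))             # json
print(guess_format("# Title\n\n- item"))    # markdown
print(guess_format("def f():\n    return 1"))  # code
print(guess_format("Just a sentence."))     # text
```

The order of the checks matters: with code tested before Markdown, a Markdown document containing a fenced Python snippet would be classified as code, which is one reason a real detector needs more careful rules.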

##### `score(content: str) -> QualityScore`

Score content across all dimensions.

**Args**:
- `content`: Content to score

**Returns**: `QualityScore` object

### `score_submission(content: str) -> dict`

Convenience function returning dict instead of dataclass.
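If the convenience function simply converts the dataclass, `dataclasses.asdict` is enough. The sketch below redefines a `QualityScore` with the fields from the Output Structure section so that it runs standalone; `to_dict` is a hypothetical name and the real `score_submission` may do more.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QualityScore:
    # Fields mirror the Output Structure section of this document.
    weighted_score: float
    quality_rating: str
    scores: dict[str, float]
    feedback: list[str]
    pass_threshold: bool


def to_dict(result: QualityScore) -> dict:
    """Convert a QualityScore into a plain, JSON-serialisable dict."""
    return asdict(result)


result = QualityScore(0.875, "A", {"format": 1.0}, ["Detected format: json"], True)
print(json.dumps(to_dict(result), indent=2))
```

A plain dict is convenient when the result has to be returned from an HTTP endpoint or written to a log.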

## Testing

```bash
# Run all tests
python3 test_quality_scorer.py

# Expected output:
# ✓ Format detection tests passed
# ✓ JSON scoring tests passed
# ✓ Markdown scoring tests passed
# ✓ Code scoring tests passed
# ✓ Text scoring tests passed
# ✓ Performance test passed: 100 submissions in 0.00s
# ✓ Edge case tests passed
# ✓ Dimension scoring tests passed
# ✓ Quality rating tests passed
# ✓ Feedback generation tests passed
# ✅ All tests passed!
```

## Examples

See `examples.py` for comprehensive usage examples:

```bash
python3 examples.py
```

Includes:
1. JSON content scoring
2. Markdown content scoring
3. Code content scoring
4. Batch scoring
5. Quality comparison

## Integration with ContentSplit API

```python
from quality_scorer import QualityScorer

# `app`, `RepurposeRequest`, and `generate_content` come from your application.

# In your API endpoint
@app.post("/api/repurpose")
async def repurpose_content(request: RepurposeRequest):
    # Generate content
    results = generate_content(request)

    # Score quality
    scorer = QualityScorer()
    for platform, content in results.items():
        quality = scorer.score(content)
        results[platform] = {
            "content": content,
            "quality_score": quality.weighted_score,
            "quality_rating": quality.quality_rating
        }

    return results
```

## Limitations

- **Language**: English-optimized (works with other languages but may need tuning)
- **Context**: No semantic understanding (syntax/structure only)
- **Domain**: General-purpose (not specialized for specific domains)

## Future Enhancements

Potential improvements (not in scope for bounty):

- NLP-based feedback generation
- ML classifier for format detection
- Domain-specific rubrics
- Multi-language support
- Semantic similarity scoring

## License

MIT License - Free to use and modify

## Author

Built for Mint-Claw/content-split bounty #1

## Support

For issues or questions, open an issue on GitHub.