Skip to content

nigiva/cobjectric

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

108 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Cobjectric

Complex Object Metric - A Python library for computing metrics on complex objects (JSON, dictionaries, lists, etc.).

CI codecov PyPI version PyPI downloads Python Version Documentation License Code style: black Ruff

πŸ“– Description

Cobjectric is a library designed to help developers calculate metrics on complex objects such as JSON, dictionaries, and arrays. It was originally created for Machine Learning projects where comparing and evaluating generated JSON structures against ground truth data was a repetitive manual task.

πŸ“¦ Installation

pip install cobjectric

πŸš€ Core Features

Cobjectric provides three main functionalities for analyzing complex structured data:

1. Fill Rate - Measure Data Completeness

Compute how "complete" your data is by measuring which fields are filled vs missing.

from cobjectric import BaseModel

class Person(BaseModel):
    name: str
    age: int
    email: str

person = Person.from_dict({
    "name": "John Doe",
    "age": 30,
    # email is missing
})

result = person.compute_fill_rate()
print(result.fields.name.value)   # 1.0 (present)
print(result.fields.age.value)    # 1.0 (present)
print(result.fields.email.value)  # 0.0 (missing)
print(result.mean())              # 0.667 (2 out of 3 fields filled)

Use cases: Data quality assessment, completeness scoring, field-level statistics.

2. Fill Rate Accuracy - Compare Completeness States

Compare the completeness of two models (got vs expected). Focus on field state (filled/missing), not on actual values.

got = Person.from_dict({"name": "John", "age": 30})           # email missing
expected = Person.from_dict({"name": "Jane", "age": 25, "email": "jane@example.com"})

accuracy = got.compute_fill_rate_accuracy(expected)
print(accuracy.fields.name.value)   # 1.0 (both filled)
print(accuracy.fields.age.value)    # 1.0 (both filled)
print(accuracy.fields.email.value)  # 0.0 (got missing, expected filled)
print(accuracy.mean())              # 0.667 (2 out of 3 states match)

Note: Fill Rate Accuracy compares state only (field present/missing), not values. To validate actual values, use Similarity.

Use cases: Validation pipelines, comparing generated vs expected data structures, quality control.

3. Similarity - Compare Values with Fuzzy Matching

Compare field values between two models with support for fuzzy text matching via rapidfuzz and intelligent list alignment strategies.

from cobjectric import BaseModel, Spec, ListCompareStrategy
from cobjectric.similarity import fuzzy_similarity_factory

class Person(BaseModel):
    name: str = Spec(similarity_func=fuzzy_similarity_factory("WRatio"))
    tags: list[Tag] = Spec(list_compare_strategy=ListCompareStrategy.OPTIMAL_ASSIGNMENT)

got = Person.from_dict({"name": "John Doe", "tags": [...]})
expected = Person.from_dict({"name": "john doe", "tags": [...]})

similarity = got.compute_similarity(expected)
print(similarity.fields.name.value)  # 0.99 (fuzzy match despite case difference)
print(similarity.fields.tags.mean()) # Uses optimal assignment for best matching

Key features:

  • Fuzzy text matching via rapidfuzz: handles typos, case differences, word order
  • List alignment strategies:
    • PAIRWISE: Compare by index (default)
    • LEVENSHTEIN: Order-preserving alignment based on similarity
    • OPTIMAL_ASSIGNMENT: Hungarian algorithm for best one-to-one matching
  • Numeric similarity: Gradual similarity based on difference thresholds

Use cases: ML model evaluation, fuzzy matching, comparing generated text with ground truth, list item matching.

Additional Features

  • Pre-defined Specs: Optimized Specs for common types (KeywordSpec, TextSpec, NumericSpec, BooleanSpec, DatetimeSpec)
  • Contextual Normalizers: Normalizers that receive field context for intelligent type coercion
  • Statistical Aggregation: mean(), std(), var(), min(), max(), quantile() on all results
  • Nested Models: Recursive computation on complex structures
  • List Aggregation: Access aggregated statistics across list items via items.aggregated_fields.name.mean()
  • Path Access: result["address.city"] or result["items[0].name"]
  • Pandas Export: Export results to pandas Series and DataFrames for analysis (requires cobjectric[pandas])
  • Custom Functions: Define your own fill rate, accuracy, or similarity functions per field
  • Field Normalizers: Transform values before validation

See the documentation for complete details.

πŸ“š Full Documentation

πŸ“– https://cobjectric.nigiva.com

The documentation includes:

πŸ› οΈ Development

Getting Started

Prerequisites

  • Python 3.13.9 or higher
  • uv - Fast Python package installer
  1. Install dependencies with uv (including optional extras for testing):
uv sync --dev --all-extras
  1. Install pre-commit hooks:
uv run pre-commit install --hook-type pre-push

Available Commands

The project uses invoke for task management.

To see all available commands:

uv run inv --list
# or shorter:
uv run inv -l

To get help on a specific command:

uv run inv --help <command>
# Example:
uv run inv --help precommit

Release Guide

See the RELEASE.md file for the release guide.

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

Citing Cobjectric

If you use Cobjectric in your research or projects, please consider citing it:

@software{cobjectric2025,
  author = {Nigiva},
  title = {Cobjectric: A Library for Computing Metrics on Complex Objects},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nigiva/cobjectric}},
  version = {3.0.0}
}

About

Complex Object Metric - A Python library for computing metrics on complex objects (JSON, dictionaries, lists, etc.).

Topics

Resources

License

Stars

Watchers

Forks

Contributors

Languages