Merged
18 changes: 9 additions & 9 deletions .pre-commit-config.yaml
@@ -11,7 +11,6 @@ repos:
# pass_filenames: false
# # types: [python]


- id: black
name: black
language: system
@@ -39,24 +38,25 @@ repos:
# Tell ruff to fix sorting of imports
- "--fix"
- "--format=github"
- "--target-version=py37"
- "--target-version=py311"
- "."
# types: [python]
pass_filenames: false

# https://jaredkhan.com/blog/mypy-pre-commit
- id: mypy
name: mypy
# entry: python -c "import sys; print(sys.argv)"
entry: mypy
args: ["--check-untyped-defs"]
language: python
# use your preferred Python version
# language_version: python3.7
# additional_dependencies: ["mypy==0.790"]
types: [python]
# use require_serial so that script
# is only called once per commit
language_version: python3.13
require_serial: true
exclude: shape.py|compareshape.py
# To be able to exclude properly
pass_filenames: true
types: [python]
# args: ["--config-file=pyproject.toml"]
exclude: .*(shape|compareshape)\.py$
# Print the number of files as a sanity-check
# verbose: true

37 changes: 16 additions & 21 deletions README.md
@@ -1,15 +1,14 @@
# [Entityshape](https://www.wikidata.org/wiki/Q119899931)
A python library to compare a wikidata entity
(item or lexeme) with a
[Wikibase Entity Schema](https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas).
# [EntityValidator](https://www.wikidata.org/wiki/Q119899931)
A python library and FastAPI backend to compare a wikidata entity
(item or lexeme) with a [Wikibase Entity Schema](https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas).

Based on https://github.com/Teester/entityshape by Mark Tully
and https://github.com/dpriskorn/PyEntityshape by Dennis Priskorn

# Features
* compare a given wikidata item with an entityschema and dig into missing properties, too many statement, etc.
* determine whether an item is valid according to a certain schema or not
* support for any Wikibase
* the backend currently only support Wikidata, but the library has support for any Wikibase

# Limitations
The shape and compareshape classes currently only support:
@@ -22,41 +21,36 @@ It is still a bit unclear if and how the qualifier validation works.
Validation of lexemes is still considered experimental.
Feel free to open an issue with a working or non-working example.

# Installation
~~# Installation
Get it from pypi

`$ pip install pyentityshape`
`$ pip install entityvalidator`~~

# Usage

## Jupyter Notebooks
~~## Jupyter Notebooks
Example notebooks with code for validation of multiple items:
[hiking paths](https://public-paws.wmcloud.org/User:So9q/Validating%20a%20group%20of%20items-all-hiking-paths-in-sweden.ipynb)
[campsites](https://public-paws.wmcloud.org/User:So9q/Validating%20a%20group%20of%20items-all-campsites-in-sweden.ipynb)
[shelters](https://public-paws.wmcloud.org/User:So9q/Validating%20a%20group%20of%20items-all-shelters-in-sweden.ipynb)
[shelters](https://public-paws.wmcloud.org/User:So9q/Validating%20a%20group%20of%20items-all-shelters-in-sweden.ipynb)~~

## CLI
Example:
```
# Note that we default to English so the lang parameter here is optional.
import pprint
# Note that we default to Wikidata so the mediawiki_api_url and wikibase_url parameters here are optional.
e = EntityShape(eid="E1",
e = EntityValidator(eid="E1",
entity_id="Q1",
lang="en",
# mediawiki_api_url='http://localhost/api.php',
# wikibase_url='http://wikibase.svc'
)
result = e.validate_and_get_result()
# Get human readable result
print(result)
"Valid: False\nProperties_without_enough_correct_statements: instance of (P31)"
# Access the data
print(result.properties_without_enough_correct_statements)
"{'P31'}"
# Machine readable json result
pprint(e.get_result)
```

## Validation
The is_valid method on the Result object mimics all red warnings displayed by https://www.wikidata.org/wiki/User:Teester/EntityShape.js
The is_valid method on the Result object mimics all red warnings displayed by https://www.wikidata.org/wiki/User:Teester/entityvalidator.js

It currently checks these five conditions that all have to be false for the item to be valid:
1. properties with too many statements found
@@ -115,7 +109,8 @@ advice and help with Ruff to make this better.
GPLv3+

# What I learned
* Forking other peoples undocumented spaghetti code is not much fun.
* Forking other peoples undocumented code is not much fun.
* I want to find a more reliable validator that support somevalue and novalue
* Pydantic is wonderful yet again it makes working with OOP easy peasy :)
* Ruff is crazy fast and very nice!
* Ruff is crazy fast and very nice!
* FastAPI is super nice
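
The FastAPI backend added in this PR serves a `GET /v1/validate` route that takes `eid` and a comma-separated `entity_ids` query parameter. A minimal sketch of how a client might build such a request URL follows. The helper name `build_validate_url` and the base URL `http://localhost:8000` are assumptions for illustration and are not part of the PR.

```python
from urllib.parse import urlencode

# Assumption: the backend runs locally on port 8000; adjust for a real deployment.
BASE_URL = "http://localhost:8000"


def build_validate_url(eid: str, entity_ids: list[str], base_url: str = BASE_URL) -> str:
    """Hypothetical helper: build the query URL for the /v1/validate route."""
    # The endpoint expects the entity IDs as one comma-separated string.
    query = urlencode({"eid": eid, "entity_ids": ",".join(entity_ids)})
    return f"{base_url}/v1/validate?{query}"


if __name__ == "__main__":
    print(build_validate_url("E100", ["Q42", "Q43"]))
```

`urlencode` percent-encodes the comma, which the server decodes back before splitting the list.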
109 changes: 109 additions & 0 deletions api.py
@@ -0,0 +1,109 @@
import logging
from typing import Any

from fastapi import APIRouter, FastAPI, HTTPException, Query
from starlette.responses import RedirectResponse

import config
from entityvalidator import (
    ApiError,
    EidError,
    EntityIdError,
    EntityValidator,
    WikibaseEntitySchemaDownloadError,
)
from entityvalidator.exceptions import (
    MissingInformationError,
    NoEntitySchemaDataError,
    WikibasePropertiesDownloadError,
)

app = FastAPI()
logging.basicConfig(level=config.loglevel)
logger = logging.getLogger(__name__)
router = APIRouter()


@app.get("/", include_in_schema=False)
def root_redirect():
    return RedirectResponse(url="/docs")


@router.get("/validate")
def validate_entities(
    # mandatory
    eid: str = Query(..., description="EntitySchema ID, ex. E100"),
    entity_ids: str = Query(
        ..., description="Comma-separated list of entity IDs, e.g. Q42,Q43"
    ),
    # We only support Wikidata for now
    # wikibase_url: str = Query(default="http://www.wikidata.org"),
    # mediawiki_api_url: str = Query(default="https://www.wikidata.org/w/api.php")
) -> dict[str, Any]:
    """
    Validate a list of entity IDs against a specific Wikibase entity schema.

    Args:
        eid (str): The EntitySchema ID to validate against (e.g., "E100").
        entity_ids (str): A comma-separated list of entity IDs to validate (max 100 IDs).

    Returns:
        dict: A dictionary containing the validation results:
            - results (list): Details of the validation.

    Raises:
        HTTPException: Raised in case of validation failure, missing data, API errors,
            or unexpected exceptions.
    """
    # Split by comma and strip whitespace
    entity_list = [e.strip() for e in entity_ids.split(",") if e.strip()]

    # Validate at least 1 entity
    if not entity_list:
        raise HTTPException(
            status_code=400, detail="At least one entity ID must be provided."
        )

    # Optional: validate max length
    if len(entity_list) > 100:
        raise HTTPException(status_code=400, detail="Maximum 100 entity IDs allowed.")
    try:
        entity_validator = EntityValidator(
            entity_ids=entity_list,
            eid=eid,
        )
        entity_validator.__download_and_validate__()

        return {
            "results": entity_validator.get_results,
        }

    except (EntityIdError, EidError) as e:
        raise HTTPException(
            status_code=422,
            detail={"error": "Invalid entity id", "message": str(e)},
        ) from e
    except ApiError as e:
        raise HTTPException(
            status_code=502,
            detail={"error": "Upstream API failed", "message": str(e)},
        ) from e
    except (WikibaseEntitySchemaDownloadError, WikibasePropertiesDownloadError) as e:
        raise HTTPException(
            status_code=502,
            detail={"error": "Wikibase download failed", "message": str(e)},
        ) from e
    except (NoEntitySchemaDataError, MissingInformationError) as e:
        raise HTTPException(
            status_code=422,
            detail={"error": "Missing data", "message": str(e)},
        ) from e
    except Exception as e:
        logger.exception("Unexpected error during validation")
        raise HTTPException(
            status_code=500,
            detail={"error": "Internal server error", "message": str(e)},
        ) from e


app.include_router(router, prefix="/v1")
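
The input handling in `validate_entities` (split on commas, strip whitespace, reject an empty list, cap at 100 IDs) could be exercised without FastAPI if it were factored into a plain function. The sketch below shows one way to do that. The name `parse_entity_ids` and the use of `ValueError` are assumptions for illustration; the PR itself raises `HTTPException` with status 400 inline.

```python
MAX_ENTITY_IDS = 100  # same cap as the endpoint in api.py


def parse_entity_ids(raw: str, max_ids: int = MAX_ENTITY_IDS) -> list[str]:
    """Hypothetical helper mirroring the endpoint's input checks."""
    # Split by comma, strip whitespace, and drop empty fragments.
    entity_list = [part.strip() for part in raw.split(",") if part.strip()]
    if not entity_list:
        raise ValueError("At least one entity ID must be provided.")
    if len(entity_list) > max_ids:
        raise ValueError(f"Maximum {max_ids} entity IDs allowed.")
    return entity_list


if __name__ == "__main__":
    print(parse_entity_ids(" Q42, Q43 ,,"))
```

An endpoint could catch the `ValueError` and translate it into the 400 response, which keeps the parsing rules testable on their own.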
4 changes: 4 additions & 0 deletions config.py
@@ -0,0 +1,4 @@
import logging

loglevel = logging.INFO
user_agent = "EntityValidator (https://github.com/dpriskorn/entityvalidator)"
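
How `config.user_agent` is consumed is not visible in this diff. As a sketch under that caveat, a value like it could be attached to outgoing requests to the Wikidata API roughly as follows. The helper `build_entity_request` is hypothetical, and the request is only constructed here, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Copied from config.py in this PR.
USER_AGENT = "EntityValidator (https://github.com/dpriskorn/entityvalidator)"
# The MediaWiki API URL that appears (commented out) in api.py.
MEDIAWIKI_API_URL = "https://www.wikidata.org/w/api.php"


def build_entity_request(entity_id: str) -> Request:
    """Hypothetical helper: prepare a wbgetentities request with a User-Agent header."""
    query = urlencode({"action": "wbgetentities", "ids": entity_id, "format": "json"})
    return Request(f"{MEDIAWIKI_API_URL}?{query}", headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    req = build_entity_request("Q42")
    print(req.full_url)
    print(req.get_header("User-agent"))
```

`urllib.request.Request` stores header names capitalized as `User-agent`, which is why the lookup uses that spelling.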
71 changes: 0 additions & 71 deletions entityshape/__init__.py

This file was deleted.

11 changes: 0 additions & 11 deletions entityshape/models/property_value.py

This file was deleted.
