Skip to content
This repository was archived by the owner on Mar 3, 2026. It is now read-only.

Latest commit

 

History

History
32 lines (22 loc) · 750 Bytes

File metadata and controls

32 lines (22 loc) · 750 Bytes

Xcrap Parser

Xcrap Parser is a declarative, model-driven parser for extracting data from HTML and JSON files, with the ability to interleave both to extract even more information.

It is inspired by the parser embedded in the Xcrap Framework available for Node.js. It was built using Parsel for HTML parsing and JMESPath for JSON parsing.

Installation

pip install xcrap-parser

Simple Usage

from xcrap_parser import HtmlParsingModel

html = "<html><title>Title</title><body><h1>Heading</h1></body></html>"

root_parsing_model = HtmlParsingModel({
    "title": {
        "query": "title::text"
    },
    "heading": {
        "query": "h1::text"
    }
})

data = root_parsing_model.parse(html)

print(data)