Xcrap Parser

Xcrap Parser is a declarative, model-driven parser for extracting data from HTML and JSON documents. It can also interleave the two formats to extract even more information.

It is inspired by the parser embedded in the Xcrap Framework for Node.js, and it is built on Parsel for parsing HTML and JMESPath for querying JSON.
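To illustrate what a declarative, model-driven parser and "interleaving" HTML and JSON can mean, here is a minimal standard-library sketch. It is not Xcrap Parser's implementation or API. `query_json` is a hypothetical dotted-path stand-in for JMESPath, `JsonScriptCollector` is a hypothetical stand-in for Parsel, and the assumption that the JSON sits inside a `<script type="application/json">` tag is only an example scenario.

```python
import json
from html.parser import HTMLParser


def query_json(data, path):
    """Hypothetical stand-in for JMESPath: resolve dotted paths like "a.b.0"."""
    for key in path.split("."):
        if isinstance(data, dict):
            data = data.get(key)
        elif isinstance(data, list) and key.isdigit():
            index = int(key)
            data = data[index] if index < len(data) else None
        else:
            return None  # path does not exist in this document
    return data


class JsonScriptCollector(HTMLParser):
    """Hypothetical helper: collect the text of <script type="application/json"> blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def parse_with_model(data, model):
    """Declarative step: each output key maps to a spec holding a query."""
    return {key: query_json(data, spec["query"]) for key, spec in model.items()}


html = """
<html><body>
  <script type="application/json">{"product": {"name": "Lamp", "tags": ["home", "light"]}}</script>
</body></html>
"""

# Step 1 (HTML side): pull the embedded JSON text out of the page.
collector = JsonScriptCollector()
collector.feed(html)
collector.close()
embedded = json.loads(collector.blocks[0])

# Step 2 (JSON side): apply a declarative model to the embedded data.
product_model = {
    "name": {"query": "product.name"},
    "first_tag": {"query": "product.tags.0"},
}
print(parse_with_model(embedded, product_model))
# → {'name': 'Lamp', 'first_tag': 'home'}
```

The design point is that the model is plain data: adding a field means adding a key and a query, not writing new extraction code.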

Installation

pip install xcrap-parser

Simple Usage

from xcrap_parser import HtmlParsingModel

html = "<html><title>Title</title><body><h1>Heading</h1></body></html>"

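# Each key in the model becomes a field in the parsed result.
# "query" is a Parsel CSS selector; "::text" selects the element's text.
root_parsing_model = HtmlParsingModel({
    "title": {
        "query": "title::text"
    },
    "heading": {
        "query": "h1::text"
    }
})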

data = root_parsing_model.parse(html)

print(data)
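As a rough mental model only, the sketch below mimics what the model above describes, mapping each output key to the text of an element, using the standard library with plain tag names instead of CSS selectors. It is not how Xcrap Parser works internally; `FirstTextByTag`, `parse_simple_model`, and the `tag` key are hypothetical simplifications, and the printed dict is this sketch's output, not a statement about what `parse()` returns.

```python
from html.parser import HTMLParser


class FirstTextByTag(HTMLParser):
    """Hypothetical helper: record the text of the first occurrence of each wanted tag."""

    def __init__(self, tags):
        super().__init__()
        self.wanted = set(tags)
        self.found = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted and tag not in self.found:
            self._current = tag
            self.found[tag] = ""

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.found[self._current] += data


def parse_simple_model(html, model):
    """Map each output key to the text of the tag named in its (hypothetical) 'tag' spec."""
    parser = FirstTextByTag(spec["tag"] for spec in model.values())
    parser.feed(html)
    parser.close()
    return {key: parser.found.get(spec["tag"]) for key, spec in model.items()}


html = "<html><title>Title</title><body><h1>Heading</h1></body></html>"
model = {"title": {"tag": "title"}, "heading": {"tag": "h1"}}
print(parse_simple_model(html, model))
# → {'title': 'Title', 'heading': 'Heading'}
```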
