14 changes: 11 additions & 3 deletions docs/getting-started.md
@@ -85,15 +85,15 @@ Once you have your DAR file, you can use the `dar_parser.py` tool to read and an
````diff

 ### Step 1: Parse the DAR File

-Run the `dar_parser.py` script to parse the DAR file and extract key information.
+Run the `dar_parser.py` script with the `--summary` flag to parse the DAR file and display a summary.

 ```bash
-python tools/dar_parser.py
+python tools/dar_parser.py --summary output.dar
 ```

 ### Step 2: View the Parsed Data

-The script will prompt you for the path to your DAR file and output a summary, including:
+The script will output a summary that includes:
 - Number of renders captured.
 - Summary of scraping results.
 - Number of HTTP request entries.
````
@@ -110,6 +110,14 @@ parser = DARParser('output.dar')
````diff
 parser.print_dar_summary()
 ```

+### Step 3: Validate the DAR File
+
+Use the validator script to ensure the file conforms to the DAR schema:
+
+```bash
+python tools/validators/dar_validator.py --validate output.dar
+```
+
 ## Integrating DAR in Your Scraping Workflow

 DAR is especially useful when integrated into web scraping workflows, providing insights into how pages are loaded and data is collected. Here’s how you can incorporate DAR into your existing scraping setup.
````
10 changes: 9 additions & 1 deletion docs/usage-examples.md
@@ -46,7 +46,7 @@ Parsing DAR files is straightforward using the `dar_parser.py` script. This pars
````diff
 2. **Parse the DAR file using the `dar_parser.py` script**:

 ```bash
-python tools/dar_parser.py
+python tools/dar_parser.py --summary sample_output.dar
 ```

 3. **View the parsed data**: The script will output a summary of the DAR file, including the number of renders, summary of results, and number of request entries.
````
@@ -122,6 +122,14 @@ parser = DARParser('session.dar')
````diff
 parser.print_dar_summary()
 ```

+## Validating a DAR File
+
+Before using a DAR file in production, you can validate it against the schema:
+
+```bash
+python tools/validators/dar_validator.py --validate session.dar
+```
+
 ## Error Handling and Metrics

 DAR files include detailed error logs and performance metrics that are invaluable for debugging and optimizing your scraping processes. These insights allow you to fine-tune your approach and ensure data quality.
````
28 changes: 24 additions & 4 deletions tools/dar_parser.py
@@ -8,6 +8,7 @@
```diff
 """

 import json
+import argparse


 class DARParser:
```
@@ -89,12 +90,31 @@ def print_dar_summary(self):
print(f"Number of Request Entries: {len(self.get_request_entries())}")


```diff
+def main():
+    arg_parser = argparse.ArgumentParser(
+        description="Parse a DAR file and display information"
+    )
+    arg_parser.add_argument(
+        "file",
+        help="Path to the DAR file to parse",
+    )
+    arg_parser.add_argument(
+        "--summary",
+        action="store_true",
+        help="Print a summary of the DAR file",
+    )
+
+    args = arg_parser.parse_args()
+
+    parser = DARParser(args.file)
+
+    if args.summary:
+        parser.print_dar_summary()
+
+
 if __name__ == "__main__":
-    # Example usage
-    file_path = input("Enter the path to your DAR file: ")
     try:
-        parser = DARParser(file_path)
-        parser.print_dar_summary()
+        main()
     except Exception as e:
         print(f"An error occurred: {e}")
```
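The argument handling added to `tools/dar_parser.py` can be exercised on its own. The following is a minimal sketch that rebuilds the same two arguments outside the repository; the helper name `build_arg_parser` is hypothetical and not part of this change. It illustrates one consequence of making `--summary` a `store_true` flag: when the flag is omitted it defaults to `False`, so `main()` as written would construct the `DARParser` and then print nothing.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Recreate the two arguments defined in main() (hypothetical helper)."""
    arg_parser = argparse.ArgumentParser(
        description="Parse a DAR file and display information"
    )
    arg_parser.add_argument("file", help="Path to the DAR file to parse")
    arg_parser.add_argument(
        "--summary",
        action="store_true",
        help="Print a summary of the DAR file",
    )
    return arg_parser


arg_parser = build_arg_parser()

# The documented invocation: the flag first, then the positional path.
with_flag = arg_parser.parse_args(["--summary", "output.dar"])
print(with_flag.file, with_flag.summary)

# Without the flag, parsing still succeeds but summary is False,
# so the `if args.summary:` branch in main() would be skipped.
without_flag = arg_parser.parse_args(["output.dar"])
print(without_flag.file, without_flag.summary)
```

If a silent run is not intended, one option would be to print the summary by default or to report that no action was requested; that is a design choice for the author rather than something this diff specifies.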

31 changes: 23 additions & 8 deletions tools/validators/dar_validator.py
@@ -8,7 +8,7 @@
```diff
 """

 import json
-import sys
+import argparse


 class DARValidator:
```
@@ -120,15 +120,30 @@ def print_validation_report(self):
print("DAR file is valid and conforms to the schema.")


```diff
-if __name__ == "__main__":
-    if len(sys.argv) != 2:
-        print("Usage: python dar_validator.py <path_to_dar_file>")
-        sys.exit(1)
+def main():
+    arg_parser = argparse.ArgumentParser(
+        description="Validate a DAR file against the schema"
+    )
+    arg_parser.add_argument(
+        "file",
+        help="Path to the DAR file to validate",
+    )
+    arg_parser.add_argument(
+        "--validate",
+        action="store_true",
+        help="Run validation and print a report",
+    )
+
+    args = arg_parser.parse_args()
+
+    validator = DARValidator(args.file)
+    if args.validate:
+        validator.print_validation_report()
+

-    file_path = sys.argv[1]
+if __name__ == "__main__":
     try:
-        validator = DARValidator(file_path)
-        validator.print_validation_report()
+        main()
     except Exception as e:
         print(f"An error occurred: {e}")
```
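Because the `__main__` block catches every `Exception` and only prints it, the validator exits with status 0 even when loading the file fails, which makes failures hard to detect from a shell script or CI job. Usage errors are unaffected: argparse raises `SystemExit`, which `except Exception` does not catch. The sketch below shows one possible way to return a non-zero status. It assumes only what the diff shows about `DARValidator` (a constructor taking a path and a `print_validation_report()` method); `run`, `validator_factory`, and `StubValidator` are hypothetical names introduced so the example runs without the repository.

```python
import argparse
import sys


def run(argv, validator_factory):
    """Parse argv, run the validator, and return a process exit code.

    validator_factory is a hypothetical injection point standing in for
    the DARValidator class.
    """
    arg_parser = argparse.ArgumentParser(
        description="Validate a DAR file against the schema"
    )
    arg_parser.add_argument("file", help="Path to the DAR file to validate")
    arg_parser.add_argument(
        "--validate",
        action="store_true",
        help="Run validation and print a report",
    )
    args = arg_parser.parse_args(argv)

    try:
        validator = validator_factory(args.file)
        if args.validate:
            validator.print_validation_report()
    except Exception as e:
        # Report on stderr and signal failure to the calling shell.
        print(f"An error occurred: {e}", file=sys.stderr)
        return 1
    return 0


class StubValidator:
    """Hypothetical stand-in for DARValidator; fails for one known path."""

    def __init__(self, path):
        if path == "missing.dar":
            raise FileNotFoundError(path)
        self.path = path

    def print_validation_report(self):
        print(f"{self.path}: report would be printed here")


ok_status = run(["--validate", "output.dar"], StubValidator)
failed_status = run(["--validate", "missing.dar"], StubValidator)
print(ok_status, failed_status)
```

In the real script the entry point could then become `sys.exit(run(sys.argv[1:], DARValidator))`. Whether an invalid but readable DAR file should also produce a non-zero status depends on how `print_validation_report()` reports errors, which this diff does not show.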