fix(data): Handle missing papers.jsonl file by VooDisss · Pull Request #6 · UKPLab/PeerQA

VooDisss · 2025-08-18T10:48:48Z

This commit fixes a bug where the data processing pipeline would crash if the `data/papers.jsonl` file was missing or empty.

The `PaperLoader` class in `peerqa/data_loader.py` would unconditionally try to read `papers.jsonl` at initialization, causing a `ValueError` if the file didn't exist. This would prevent the `extract_text_from_pdf.py` script from running and creating the file in the first place.

This commit makes the `PaperLoader` more robust by:

- Checking if `papers.jsonl` exists and is not empty before reading it.
- Initializing an empty DataFrame if the file is missing, allowing the script to proceed.
- Adding a safeguard to `has_paper_id` to handle an empty DataFrame.

This ensures that the data processing pipeline can be run from a clean state without errors.
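
For readers without the diff at hand, the three changes listed above can be sketched roughly as follows. This is a minimal illustration, not the actual code from `peerqa/data_loader.py`: the constructor signature, the `df` attribute, and the column names `paper_id` and `content` are assumptions made for the example, since the PR description does not show them.

```python
import os

import pandas as pd

# Hypothetical column names; the real schema of papers.jsonl is not shown in this PR.
COLUMNS = ["paper_id", "content"]


class PaperLoader:
    """Illustrative stand-in for the PaperLoader described above (not the real class)."""

    def __init__(self, path: str):
        self.path = path
        # Read the file only when it exists and actually contains data.
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            self.df = pd.read_json(path, lines=True)
        else:
            # Missing or empty file: start from an empty frame so callers can proceed.
            self.df = pd.DataFrame(columns=COLUMNS)

    def has_paper_id(self, paper_id: str) -> bool:
        # Safeguard: an empty frame (or one lacking the column) contains no papers.
        if self.df.empty or "paper_id" not in self.df.columns:
            return False
        return bool((self.df["paper_id"] == paper_id).any())
```

With a guard like this, constructing the loader against a path that does not exist yet yields an empty frame, and `has_paper_id` returns `False` instead of raising, so a script such as `extract_text_from_pdf.py` could go on to create the file.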



VooDisss wants to merge 1 commit into UKPLab:main from VooDisss:main
