This project provides a complete pipeline for analyzing and comparing peer reviews generated by large language models (LLMs) with human-written reviews. It includes modules for review generation, semantic similarity computation, knowledge graph construction, and structural analysis.
- The required environment and dependencies are listed in `requirements.txt`. To replicate the environment, run:

  ```bash
  pip install -r requirements.txt
  ```
The `Code/` directory contains all scripts used for data collection, filtering, and analysis. The full pipeline for preparing consistent and labeled paper–review pairs includes:

- **Paper fetching and rating extraction** (`Paper_fetcher.py`)
  Retrieves all submissions and corresponding review ratings from OpenReview for a given conference and year (see the fetching sketch after this list).
- **Consistency filtering using rating variance** (`KDE_filter.py`)
  Applies kernel density estimation (KDE) to identify papers with low inter-review disagreement, i.e. a low standard deviation in ratings (see the filtering sketch after this list).
- **Paper selection and quality labeling** (`Paper_select.py`)
  Selects consistent papers and categorizes them into `good`, `borderline`, or `bad` based on rating quantiles.
- **Full paper download** (`Paper_download.py`)
  Downloads the PDF files for all selected papers in each quality group.
- **Review fetching** (`Paper_review.py`, `Paper_review_process.py`)
  Extracts the full set of human-written reviews associated with the selected papers.
- **Semantic similarity analysis** (`similarity.py`; see the similarity sketch after this list)
  - Loads pre-segmented IMRaD sections (abstract, introduction, related work, method, results, conclusion) encoded by BGE-M3.
  - Computes cosine similarity between embeddings of each review component (summary, strengths, weaknesses, questions) and each paper section.
  - Saves per-paper similarity scores (real vs. LLM reviews) as JSON files under `../Data/<Conference>/<Year>/similarity_results/`.
- **Knowledge graph construction and metrics** (`knowledge_graph_construct.py`; see the graph-metrics sketch after this list)
  - Builds a directed graph for each review segment using PL-Marker predictions (entities + relations).
  - Computes structural metrics (node count, edge count, average degree, label entropy) on each graph.
  - Aligns real and LLM question nodes by filtering them to matching counts, then saves all graph metrics to CSV under `../Data/Knowledge_Graph/<Conference>/<Year>/<Category>/graph_metrics_clean.csv`.
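The fetching step can be reproduced roughly as follows. This is a minimal sketch assuming the public OpenReview REST API (v1) and an ICLR 2023-style invitation string; `Paper_fetcher.py` may instead use the official `openreview-py` client and different invitations.

```python
# Minimal sketch of paper fetching and rating extraction. The invitation
# string is an illustrative assumption (ICLR 2023, API v1).
import requests

API = "https://api.openreview.net/notes"

def fetch_all_notes(invitation):
    """Page through every note matching the given invitation."""
    notes, offset = [], 0
    while True:
        batch = requests.get(
            API,
            params={"invitation": invitation, "offset": offset, "limit": 1000},
            timeout=30,
        ).json()["notes"]
        if not batch:
            return notes
        notes.extend(batch)
        offset += len(batch)

def extract_rating(review_note):
    # Ratings arrive as strings such as "6: marginally above the acceptance
    # threshold"; keep only the leading integer.
    return int(review_note["content"]["rating"].split(":")[0])

submissions = fetch_all_notes("ICLR.cc/2023/Conference/-/Blind_Submission")
```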
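The consistency filter admits several threshold rules; the sketch below keeps papers whose rating standard deviation lies at or below the density peak of the std distribution. That cutoff rule is an illustrative assumption about `KDE_filter.py`, not a description of it.

```python
# Minimal sketch of KDE-based consistency filtering over rating variance.
import numpy as np
from scipy.stats import gaussian_kde

def filter_consistent(ratings_per_paper, n_grid=200):
    """ratings_per_paper: dict mapping paper id -> list of review ratings."""
    stds = {pid: float(np.std(r)) for pid, r in ratings_per_paper.items()}
    # Fit a KDE over the distribution of per-paper rating stds.
    kde = gaussian_kde(list(stds.values()))
    grid = np.linspace(0.0, max(stds.values()), n_grid)
    cutoff = grid[np.argmax(kde(grid))]  # std value at the density peak
    # Keep papers with low inter-review disagreement.
    return [pid for pid, s in stds.items() if s <= cutoff]
```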
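The similarity computation reduces to embedding each review component and paper section and taking cosine similarity. Loading BGE-M3 through `sentence-transformers` is an assumption for this sketch; `similarity.py` may use the FlagEmbedding package instead.

```python
# Minimal sketch of review-component vs. paper-section cosine similarity.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")

def section_similarities(review_components, paper_sections):
    """Both arguments map name -> text. Returns cosine similarity between
    every review component and every paper section."""
    r_names, s_names = list(review_components), list(paper_sections)
    # With normalize_embeddings=True the dot product equals cosine similarity.
    r_emb = model.encode([review_components[n] for n in r_names],
                         normalize_embeddings=True)
    s_emb = model.encode([paper_sections[n] for n in s_names],
                         normalize_embeddings=True)
    sims = r_emb @ s_emb.T
    return {r: {s: float(sims[i, j]) for j, s in enumerate(s_names)}
            for i, r in enumerate(r_names)}
```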
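The graph metrics can be computed with `networkx` as sketched below. The exact shape of the PL-Marker output tuples and the choice to take entropy over node (entity-type) labels are assumptions about `knowledge_graph_construct.py`.

```python
# Minimal sketch of graph construction and structural metrics.
import math
from collections import Counter
import networkx as nx

def build_graph(entities, relations):
    """entities: iterable of (span_text, entity_type);
    relations: iterable of (head_span, relation_type, tail_span)."""
    g = nx.DiGraph()
    for span, etype in entities:
        g.add_node(span, label=etype)
    for head, rtype, tail in relations:
        g.add_edge(head, tail, label=rtype)
    return g

def graph_metrics(g):
    counts = Counter(nx.get_node_attributes(g, "label").values())
    total = sum(counts.values())
    # Shannon entropy of the node-label distribution (assumes a non-empty graph).
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "avg_degree": sum(d for _, d in g.degree()) / g.number_of_nodes(),
        "label_entropy": entropy,
    }
```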
All processed data, including real human reviews, LLM-generated reviews, and the inputs used for semantic similarity analysis and knowledge graph construction, are available via the following link:
🔗 Download Data (Google Drive)
We provide standardized prompt templates used for LLM-based review generation, aligned with official review rubrics from ICLR and NeurIPS. These templates help ensure consistency in review outputs across different models and conferences.
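For illustration only, a rubric-aligned template has roughly the shape below; the field names are hypothetical, and the templates shipped in this repository are the authoritative versions.

```python
# Hypothetical shape of a rubric-aligned review-generation prompt template.
REVIEW_PROMPT = """You are a reviewer for {conference} {year}. Read the paper
below and write a review following the official rubric, with the sections:
Summary, Strengths, Weaknesses, Questions, and a Rating on the {scale} scale.

Paper:
{paper_text}
"""
```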

