Peer-Review

This project provides a complete pipeline for analyzing and comparing peer reviews generated by large language models (LLMs) with human-written reviews. It includes modules for review generation, semantic similarity computation, knowledge graph construction, and structural analysis.

  • The required environment and dependencies are listed in requirements.txt. To replicate the environment, run:
    pip install -r requirements.txt

Code

The Code/ directory contains all scripts used for data collection, filtering, and analysis. The full pipeline for preparing consistent and labeled paper–review pairs includes:

  1. Paper fetching and rating extraction (Paper_fetcher.py)
    Retrieves all submissions and corresponding review ratings from OpenReview for a given conference and year.

  2. Consistency filtering using rating variance (KDE_filter.py)
    Applies kernel density estimation (KDE) to identify papers with low inter-review disagreement (low standard deviation in ratings).

  3. Paper selection and quality labeling (Paper_select.py)
    Selects consistent papers and categorizes them into good, borderline, or bad based on rating quantiles.

  4. Full paper download (Paper_download.py)
    Downloads the PDF files for all selected papers in each quality group.

  5. Review fetching (Paper_review.py, Paper_review_process.py)
    Extracts the full set of human-written reviews associated with the selected papers.

  6. Semantic similarity analysis (similarity.py)

    • Loads pre-segmented “IMRaD” sections (abstract, introduction, related work, method, results, conclusion) encoded by BGE-M3.
    • Computes cosine similarity between embeddings of each review component (summary, strengths, weaknesses, questions) and each paper section.
    • Saves per-paper similarity scores (real vs. LLM reviews) into JSON files under ../Data/<Conference>/<Year>/similarity_results/.
  7. Knowledge graph construction and metrics (knowledge_graph_construct.py)

    • Builds a directed graph for each review segment using PL-Marker predictions (entities + relations).
    • Computes structural metrics (node count, edge count, average degree, label entropy) on each graph.
    • Aligns real vs. LLM question nodes by filtering to match counts, then saves all graph metrics to CSV under ../Data/Knowledge_Graph/<Conference>/<Year>/<Category>/graph_metrics_clean.csv.
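Step 2's consistency filter can be sketched as follows. This is a minimal illustration assuming NumPy/SciPy; the function name `filter_consistent_papers` and the threshold value are hypothetical placeholders, not the actual logic of `KDE_filter.py`:

```python
import numpy as np
from scipy.stats import gaussian_kde

def filter_consistent_papers(ratings_by_paper, std_threshold=1.0):
    """Keep papers whose review ratings show low inter-reviewer disagreement.

    ratings_by_paper: dict mapping paper id -> list of reviewer ratings.
    Papers whose rating standard deviation is below std_threshold are kept.
    Returns the kept paper ids and the KDE density peak over the std values.
    """
    stds = {pid: float(np.std(r)) for pid, r in ratings_by_paper.items()}
    # A KDE over the per-paper standard deviations gives a data-driven view of
    # the disagreement distribution; its density peak could replace the fixed
    # threshold in a more principled variant.
    kde = gaussian_kde(list(stds.values()))
    grid = np.linspace(0.0, max(stds.values()) + 1e-9, 200)
    peak = float(grid[np.argmax(kde(grid))])
    kept = {pid for pid, s in stds.items() if s <= std_threshold}
    return kept, peak
```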
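Step 3's quantile-based labeling admits a similarly compact sketch. The quantile cutoffs (25th/75th) and the function name are assumptions for illustration; the actual `Paper_select.py` may use different boundaries:

```python
import numpy as np

def label_papers(mean_ratings, low_q=0.25, high_q=0.75):
    """Split papers into 'bad' / 'borderline' / 'good' by rating quantiles.

    mean_ratings: dict mapping paper id -> mean review rating.
    """
    scores = np.array(list(mean_ratings.values()))
    lo, hi = np.quantile(scores, [low_q, high_q])
    labels = {}
    for pid, score in mean_ratings.items():
        if score <= lo:
            labels[pid] = "bad"        # bottom quantile
        elif score >= hi:
            labels[pid] = "good"       # top quantile
        else:
            labels[pid] = "borderline" # middle of the distribution
    return labels
```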
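The core computation in step 6 is a cosine-similarity matrix between review-component embeddings and paper-section embeddings. A minimal NumPy sketch (the embeddings themselves would come from BGE-M3; the function name is illustrative):

```python
import numpy as np

def cosine_similarity_matrix(review_vecs, section_vecs):
    """Cosine similarity between review components and paper sections.

    review_vecs:  (n_components, d) array, one row per review part
                  (summary, strengths, weaknesses, questions).
    section_vecs: (n_sections, d) array, one row per IMRaD section.
    Returns an (n_components, n_sections) similarity matrix.
    """
    a = review_vecs / np.linalg.norm(review_vecs, axis=1, keepdims=True)
    b = section_vecs / np.linalg.norm(section_vecs, axis=1, keepdims=True)
    return a @ b.T  # dot product of unit vectors = cosine similarity
```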
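The structural metrics in step 7 can be sketched with plain Python over a node-label map and an edge list (a stand-in for the PL-Marker entity/relation output; the representation and function name are assumptions, not the actual `knowledge_graph_construct.py` code):

```python
import math
from collections import Counter

def graph_metrics(node_labels, edges):
    """Structural metrics for one review segment's directed knowledge graph.

    node_labels: dict mapping node -> entity label (from PL-Marker entities).
    edges:       list of (src, dst) tuples (from PL-Marker relations).
    """
    n, m = len(node_labels), len(edges)
    # Each directed edge contributes one in-degree and one out-degree.
    avg_degree = (2 * m / n) if n else 0.0
    counts = Counter(node_labels.values())
    # Shannon entropy (bits) of the node-label distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    return {"nodes": n, "edges": m,
            "avg_degree": avg_degree, "label_entropy": entropy}
```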

Data

All processed data, including real human reviews, LLM-generated reviews, data used for semantic similarity analysis, and knowledge graph construction, are available via the following link:

🔗 Download Data (Google Drive)

Prompt Template

We provide standardized prompt templates used for LLM-based review generation, aligned with official review rubrics from ICLR and NeurIPS. These templates help ensure consistency in review outputs across different models and conferences.

Prompt Template Diagram
