# TabReX

This repository contains reference code for TabReX, a referenceless, explainable evaluation framework for LLM-generated tables using graph-based reasoning and rubric-aware scoring.

Evaluating tables generated by large language models is hard: many metrics flatten tables into text (losing structure), while reference-based metrics require a gold table, limiting generalization to new schemas and valid alternative table layouts. TabReX addresses this with a referenceless, property-driven evaluation pipeline that provides interpretable scores and cell-level error traces.
TabReX works in three key steps:

1. **Canonicalize to knowledge graphs.** Convert the source text and the generated table into canonical knowledge graphs to preserve structure and factual relations.
2. **LLM-guided graph alignment.** Align nodes/edges between the two graphs using an LLM-guided matching procedure to robustly handle paraphrases, schema variation, and re-orderings.
3. **Rubric-aware scoring + explanations.** Compute interpretable, property-based scores that quantify structural fidelity and factual correctness, producing fine-grained diagnostics (e.g., which cells/relations are unsupported or mismatched).
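The scoring idea in step 3 can be illustrated with a toy sketch. This is *not* the TabReX implementation (which uses LLM-guided alignment and rubric-aware properties); it only shows the underlying notion of comparing two graphs, represented as sets of (subject, relation, object) triples, and emitting error traces alongside the scores:

```python
# Toy illustration of triple-level scoring with error traces.
# NOT the TabReX implementation; real alignment is LLM-guided and
# tolerant of paraphrases, whereas this sketch uses exact matching.

def score_triples(source, generated):
    """Return precision/recall over exact triple matches plus error traces."""
    src, gen = set(source), set(generated)
    matched = src & gen
    precision = len(matched) / len(gen) if gen else 0.0
    recall = len(matched) / len(src) if src else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "unsupported": sorted(gen - src),  # in the table but not the source
        "missing": sorted(src - gen),      # in the source but not the table
    }

source = [("Paris", "capital_of", "France"),
          ("Berlin", "capital_of", "Germany")]
generated = [("Paris", "capital_of", "France"),
             ("Rome", "capital_of", "France")]
report = score_triples(source, generated)
print(report["precision"], report["recall"])  # 0.5 0.5
print(report["unsupported"])  # [('Rome', 'capital_of', 'France')]
```

The "unsupported"/"missing" lists are what makes such a score explainable: they point to the exact relations (and hence cells) responsible for a low score.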
To systematically test evaluation robustness, we introduce TabReX-Bench, a benchmark spanning six domains and twelve planner-driven perturbation types across three difficulty tiers, enabling stress-testing of table evaluation metrics under controlled shifts.
Overall, TabReX is designed to be trustworthy and explainable: it delivers human-aligned judgments, remains stable under harder perturbations, and enables detailed model/prompt analysis for structured generation systems.
This repository contains the code, data, and analysis pipelines for TabReX, a benchmark and evaluation suite for table robustness and table-to-table similarity metrics. It includes:
- The TabReX benchmark (original tables + 12 perturbations per item)
- Implementations/wrappers for multiple metrics (EM/ROUGE/chrF/BERTScore, BLEURT, HScore, PScore, TabEval, TabXEval/TabScore, QuestEval)
- Human-correlation scripts (Spearman/Kendall/Weighted Kendall/RBO, top-k overlap)
- A table-to-graph pipeline for analysis and alignment
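As a toy illustration of what a table→graph conversion produces (hypothetical; the repository's `rule_md`, `rule_html`, and `llm_html` converters are more sophisticated), a Markdown table row can be read as (row-entity, column-header, cell) triples:

```python
# Toy markdown-table -> triples sketch (hypothetical; NOT the repo's
# rule_md converter). Treats the first column as the row entity.

def md_table_to_triples(md):
    lines = [l.strip() for l in md.strip().splitlines() if l.strip()]
    split = lambda l: [c.strip() for c in l.strip("|").split("|")]
    header = split(lines[0])
    triples = []
    for line in lines[2:]:       # skip the |---|---| separator row
        cells = split(line)
        subject = cells[0]       # first column as the row entity
        for col, cell in zip(header[1:], cells[1:]):
            triples.append((subject, col, cell))
    return triples

md = """
| City   | Country | Population |
|--------|---------|------------|
| Paris  | France  | 2.1M       |
"""
print(md_table_to_triples(md))
# [('Paris', 'Country', 'France'), ('Paris', 'Population', '2.1M')]
```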
## Repository Structure

```
.
├── TabReX/                       # Core pipeline and table→graph converters
│   ├── TabRex.py                 # Main driver for graph generation and scoring
│   ├── __init__.py
│   └── table_to_graph_modules/   # rule_md, rule_html, llm_html converters
├── metrics/                      # Metric wrappers and TabXEval integration
│   ├── em_chrf_rouge_bert.py     # EM/ROUGE-L/chrF/BERTScore
│   ├── bleurt_metric.py          # BLEURT (evaluate/TF)
│   ├── hscore_metric.py          # HScore (format/content similarity)
│   ├── pscore_metric.py          # PScore (LLM-based)
│   ├── tabeval_metric.py         # TabEval (unroll + NLI)
│   ├── TabXEval_metric.py        # TabScore via TabXEval pipeline
│   ├── questeval_metric.py       # QuestEval-style QA F1
│   ├── tabxeval.py               # Local TabXEval score_calc hook
│   └── TabXEval/                 # Prompts, pipelines, and examples
├── correlation/                  # Correlation analyses vs. human ranking
│   ├── correlation.py            # 7-way (GT/easy/medium/hard×2) correlation
│   └── t2t_correlation/          # Flat-12 text2table ranking correlation utilities
│       ├── human_ranking.json
│       ├── perturb_mapping.jsonl
│       └── t2t_correlation.py
├── perturbation/                 # Perturbation planning utilities (optional)
│   └── perturbation_planning.py
├── data/                         # Datasets
│   ├── TabReX_Bench.json         # Default dataset (original + 12 perturbations)
│   └── original_710_tbl.json     # 710 original tables (JSON array)
├── requirements.txt              # Project dependencies
└── README.md                     # This file
```
## Setup

1. Clone and create a virtual environment:

   ```bash
   git clone https://github.com/CoRAL-ASU/TabReX.git
   cd TabReX
   python -m venv .venv && source .venv/bin/activate
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Optional, if you use the TabXEval extras:

   ```bash
   pip install -r metrics/TabXEval/requirements.txt
   ```

3. Environment variables. Create a `.env` in the repo root and set:

   ```
   OPENAI_API_KEY=...   # required for PScore/TabEval and some TabReX steps
   GEMINI_API_KEY=...   # only if you plan to run Gemini-specific tools
   ```
## Data

- All metric wrappers default to `data/TabReX_Bench.json` unless otherwise noted.
- File shape: a JSON array of items with keys `original` and `perturbation1`..`perturbation12`, whose values are either strings or `{table: ..., metadata: ...}` objects.
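A minimal loader sketch under the file-shape assumption above (illustrative only, not the repository's loading code; the helper names are hypothetical):

```python
import json

# Minimal loader sketch for the dataset shape described above: a JSON array
# of items, each with "original" and "perturbation1".."perturbation12" keys
# whose values are either raw table strings or {"table": ..., "metadata": ...}
# objects. Illustrative only; not the repository's loading code.

def table_text(value):
    """Normalize a field to its table string regardless of wrapping."""
    return value["table"] if isinstance(value, dict) else value

def iter_pairs(items):
    """Yield (original, perturbed, perturbation_name) triples per item."""
    for item in items:
        original = table_text(item["original"])
        for i in range(1, 13):
            key = f"perturbation{i}"
            if key in item:
                yield original, table_text(item[key]), key

# Tiny in-memory sample standing in for data/TabReX_Bench.json:
sample = json.loads('[{"original": "| a | b |", '
                    '"perturbation1": {"table": "| b | a |", "metadata": {}}}]')
print(list(iter_pairs(sample)))
# [('| a | b |', '| b | a |', 'perturbation1')]
```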
## Running TabReX

The core driver is `TabReX/TabRex.py`; it converts tables to knowledge graphs, aligns summary/table triplets, and computes TabReX scores.

Minimal example (single index):

```bash
python TabReX/TabRex.py --index 0 --table-converter rule_md --output TabRex_out.json --output-pkl TabRex_out.pkl
```

Note: the pipeline uses OpenAI APIs for certain steps; ensure `OPENAI_API_KEY` is set.
## Running Metrics

- HScore

  ```bash
  python metrics/hscore_metric.py --input data/TabReX_Bench.json --out_dir results/metrics --prefix tabrex
  ```

- EM/ROUGE-L/chrF/BERTScore (reads `data/TabReX_Bench.json` by default)

  ```bash
  python metrics/em_chrf_rouge_bert.py
  ```

- BLEURT

  ```bash
  python metrics/bleurt_metric.py --input data/TabReX_Bench.json --out_dir results/metrics --prefix tabrex
  ```

- PScore (requires `OPENAI_API_KEY`)

  ```bash
  python metrics/pscore_metric.py --input data/TabReX_Bench.json --out_dir results/metrics --prefix tabrex --workers 8
  ```

- TabEval (`OPENAI_API_KEY` for unroll, Transformers for NLI)

  ```bash
  python metrics/tabeval_metric.py --input data/TabReX_Bench.json --out_dir results/metrics --prefix tabrex
  ```

  Env overrides: `TABEVAL_NLI_MODEL` (default: `roberta-large-mnli`), `TABEVAL_UNROLL_WORKERS` (default: 100).

- TabXEval/TabScore

  ```bash
  python metrics/TabXEval_metric.py --input data/TabReX_Bench.json --out_dir results/metrics --prefix tabrex --workers 16
  ```

- QuestEval (uses defaults; requires `OPENAI_API_KEY`)

  ```bash
  python metrics/questeval_metric.py
  ```

Metrics write outputs to `--out_dir` (recommended: `results/metrics`).
## Correlation Analyses

- 7-way correlation (EM/HScore/etc. PKLs vs. human ranking heuristics)

  ```bash
  python correlation/correlation.py --in-dir correlation/pkls --out results/correlation/summary.json
  ```

- Flat-12 text2table correlation (model/prompt flattening)

  ```bash
  python correlation/t2t_correlation/t2t_correlation.py \
    --human correlation/t2t_correlation/human_ranking.json \
    --mapping correlation/t2t_correlation/perturb_mapping.jsonl \
    --scores-dir results/metrics \
    --out-dir results/correlation/flat12
  ```
## Notes

- Outputs: prefer a unified results layout, e.g., `results/metrics` and `results/correlation`.
- Models: several scripts have a default model name; you can override it via code/env if needed.
- Heavy dependencies: BLEURT/TensorFlow and Transformers/torch can be large; consider using a GPU environment (optional).
## Citation

If you use this repository in your research, please cite the accompanying paper (TabReX):

```bibtex
@misc{anvekar2025tabrextabularreferenceless,
  title={TabReX : Tabular Referenceless eXplainable Evaluation},
  author={Tejas Anvekar and Juhna Park and Aparna Garimella and Vivek Gupta},
  year={2025},
  eprint={2512.15907},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.15907},
}
```

## License

Please see the LICENSE file if provided. If absent, contact the authors for licensing information.
## Contributing

Contributions are welcome. Please open an issue or a pull request for fixes and improvements.
