This repository contains code and resources for TabXEval, an exhaustive and explainable evaluation framework for table tasks (e.g., table extraction / table generation). TabXEval is built around a rubric-based view of table quality that captures both structural and content-level discrepancies that standard metrics often miss.
Evaluating predicted tables is tricky: two tables can be “close” in content but differ in subtle (and important) ways—headers, alignment, row/column structure, formatting, or small semantic mismatches. Many existing automatic metrics under-diagnose these issues, making it hard to compare systems or debug failures.
TabXEval addresses this by using an explicit evaluation rubric and a two-phase pipeline:
1. **TabAlign (structure-first alignment):** TabXEval first aligns the reference and predicted tables structurally—pairing corresponding headers and cells using a combination of rule-based and LLM-assisted alignment—so that later comparisons are made between the right elements.
2. **TabCompare (fine-grained comparison and explanations):** After alignment, TabXEval performs a systematic semantic and syntactic comparison over the aligned cells to produce granular, interpretable feedback (what is wrong, where, and why).
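The align-then-compare idea can be illustrated with a minimal sketch. This is not the actual pipeline (which combines rule-based and LLM-assisted alignment); it only shows the two phases using simple fuzzy string matching, and the function names and threshold are hypothetical:

```python
from difflib import SequenceMatcher

def align_headers(ref_headers, pred_headers, threshold=0.6):
    """Phase 1 (TabAlign-style): pair each reference header with its
    best fuzzy match among the predicted headers."""
    alignment = {}
    for r in ref_headers:
        best, best_score = None, threshold
        for p in pred_headers:
            score = SequenceMatcher(None, r.lower(), p.lower()).ratio()
            if score > best_score:
                best, best_score = p, score
        alignment[r] = best  # None if no candidate clears the threshold
    return alignment

def compare_cells(ref_table, pred_table, alignment):
    """Phase 2 (TabCompare-style): report per-cell mismatches between
    aligned columns, so feedback says what is wrong and where."""
    issues = []
    for r_col, p_col in alignment.items():
        if p_col is None:
            issues.append(f"missing column: {r_col}")
            continue
        for i, (r_val, p_val) in enumerate(zip(ref_table[r_col], pred_table[p_col])):
            if str(r_val).strip() != str(p_val).strip():
                issues.append(f"row {i}, column {r_col!r}: {r_val!r} != {p_val!r}")
    return issues
```

The key design point this sketch mirrors is that comparison only happens *after* alignment, so a renamed header (e.g. `"Pop."` vs. `"Pop"`) is not spuriously counted as a content error.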
To validate robustness and real-world applicability, the paper introduces TabXBench—a multi-domain benchmark with realistic table perturbations and human annotations—and reports a sensitivity–specificity analysis showing TabXEval’s robustness and explainability across table tasks.
```
.
├── evaluation_pipeline/        # Core evaluation scripts and utilities
│   ├── eval.py                 # Main evaluation script
│   ├── eval_gemini.py          # Gemini model evaluation
│   ├── eval_llama.py           # LLaMA model evaluation
│   ├── fuzzy_table_matching.py # Fuzzy matching utilities
│   └── comparison_utils.py     # Comparison utilities
├── tabxbench/                  # Benchmark datasets and tools
├── EVALUATION_OF_MODELS/       # Evaluation results and analysis
└── TabXEval.pdf                # Research paper
```
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/tabxeval.git
  cd tabxeval
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables: create a `.env` file in the root directory with your API keys:

  ```
  OPENAI_API_KEY=your_openai_api_key
  ```
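The evaluation scripts are assumed to read the key from the environment (for example after loading `.env` with a tool such as python-dotenv). A minimal, hypothetical helper for failing fast when the key is missing:

```python
import os

def get_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: fetch an API key from the environment and
    fail with a clear message if it was never set."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```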
To evaluate a model using the framework:

```bash
python evaluation_pipeline/eval.py \
    --align_prompt path/to/align_prompt.txt \
    --compare_prompt path/to/compare_prompt.txt \
    --input_tables path/to/input_tables.json \
    --output_path path/to/output/
```

The framework supports evaluation of multiple models:

- GPT-4
- Gemini
- LLaMA
If you use this framework in your research, please cite our paper:
```bibtex
@inproceedings{pancholi-etal-2025-tabxeval,
    title = "{T}ab{XE}val: Why this is a Bad Table? An e{X}haustive Rubric for Table Evaluation",
    author = "Pancholi, Vihang and
      Bafna, Jainit Sushil and
      Anvekar, Tejas and
      Shrivastava, Manish and
      Gupta, Vivek",
    editor = "Che, Wanxiang and
      Nabende, Joyce and
      Shutova, Ekaterina and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1176/",
    doi = "10.18653/v1/2025.findings-acl.1176",
    pages = "22913--22934",
    ISBN = "979-8-89176-256-5",
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
