This repository is superseded by V2 of IFC-Bench, hosted on Hugging Face: https://huggingface.co/datasets/sylvainhellin/ifc-bench (the Hugging Face free tier has much more generous rate limits for hosting large datasets). V2 contains all the questions and models from V1 (this archived repository) plus many more: 21 BIM projects with 37 IFC models across architectural, structural, MEP and specialty disciplines, and 1,027 question-answer pairs covering diverse BIM information retrieval tasks. I therefore recommend using V2 from now on.
A benchmark dataset for evaluating BIM (Building Information Modeling) comprehension and reasoning capabilities in AI systems. Provides curated IFC models with question-answer pairs for testing BIM-related AI implementations.
Dataset snapshot:
| | question | answer | ifc_model | project |
|---|---|---|---|---|
| 0 | What is the total gross floor area of the buil... | The total gross floor area of the building is ... | arc | duplex |
| 1 | What is the height of the ceiling in room A203? | The height of the ceiling in room A203 is 2.58 m | arc | duplex |
| 2 | Give me the name of all the rooms in the build... | The list of all the rooms in the building is: ... | arc | duplex |
| 3 | How many windows are there on the north facade? | I cannot calculate the number of window on th... | arc | duplex |
| 4 | What is the width of the door 1hOSvn6df7F8_7Gc... | The width of the door is 1.25 m | arc | duplex |
- Features
- Dataset Structure
- Getting Started
- Models Overview
- Contributing
- License
- Citation
- Acknowledgments
- Versioned datasets: Currently at V1 with 2 BIM projects (5 IFC models) and 105 QA pairs
- Diverse question types:
  - Spatial reasoning
  - Element properties
  - System relationships
  - Construction sequencing
- Rich contextual data:
  - Original IFC files
  - Model snapshots
  - Architectural descriptions
  - License documentation
- Machine-readable format: CSV dataset with clear column structure
ifc-bench/
├── projects/                  # Directory for all projects
│   ├── duplex/                # First project
│   │   ├── arc.ifc            # Architecture model
│   │   ├── mep.ifc            # MEP model
│   │   ├── license.txt        # Project license
│   │   ├── model_card.csv     # Project metadata
│   │   └── snapshot.png       # Visual snapshot
│   └── dental_clinic/         # Second project
│       ├── arc.ifc            # Architecture model
│       ├── str.ifc            # Structural model
│       ├── mep.ifc            # MEP model
│       └── ...                # Other project files
├── questions/                 # Question-answer pairs
│   └── ifc-bench-v1.csv       # Primary dataset
└── docs/                      # Supplementary materials
    └── CONTRIBUTING.md        # Contribution guidelines
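
The layout above can be walked programmatically. The following is a minimal sketch, assuming that every `.ifc` file sits directly inside its project folder under `projects/`; the helper name `list_ifc_models` is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import Dict, List


def list_ifc_models(repo_root: str = ".") -> Dict[str, List[str]]:
    """Map each project folder under projects/ to its IFC model names.

    Hypothetical helper: assumes the projects/<project>/<model>.ifc layout.
    """
    projects_dir = Path(repo_root) / "projects"
    models: Dict[str, List[str]] = {}
    if not projects_dir.is_dir():
        return models
    for project_dir in sorted(p for p in projects_dir.iterdir() if p.is_dir()):
        # Path.stem drops the ".ifc" suffix, e.g. "arc.ifc" -> "arc"
        models[project_dir.name] = sorted(f.stem for f in project_dir.glob("*.ifc"))
    return models


if __name__ == "__main__":
    # Run from the repository root to print one line per project.
    for project, names in list_ifc_models().items():
        print(f"{project}: {', '.join(names)}")
```

Returning model names without the extension keeps them directly comparable to the values in the `ifc_model` column of the dataset.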
Duplex:
- Disciplines: Architectural, MEP
- License: CC-BY-4.0
- Complexity: Simple
- Source: buildingSMART Sample Files

Dental clinic:
- Disciplines: Architectural, Structural, MEP
- License: CC-BY-4.0
- Complexity: Intermediate
- Source: buildingSMART Sample Files
- Python 3.8+
- pandas (for data analysis)
- ifcopenshell (optional, for working with IFC files)
Install requirements:
pip install pandas ifcopenshell

git clone https://github.com/sylvainHellin/ifc-bench.git
cd ifc-bench

import pandas as pd
# Load dataset
df = pd.read_csv('questions/ifc-bench-v1.csv')
# Explore questions by project (the ifc_model column holds values such as 'arc')
duplex_questions = df[df['project'] == 'duplex']
print(f"Duplex project has {len(duplex_questions)} questions")
# Sample question format
sample_q = df.iloc[0]
print(f"""
Question: {sample_q.question}
Answer: {sample_q.answer}
Model: {sample_q.ifc_model}
Project: {sample_q.project}
""")

| Column | Description | Example |
|---|---|---|
| `question` | Natural language question | "What is the total gross floor area of the building?" |
| `answer` | Ground truth answer | "The total gross floor area of the building is 354.67 sqm" |
| `ifc_model` | Model identifier | "arc" |
| `project` | Project identifier | "duplex" |
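
Taken together, the `project` and `ifc_model` columns identify the IFC file a question refers to. The following is a minimal sketch using only the standard library, assuming the `projects/<project>/<ifc_model>.ifc` layout of this repository; `count_questions` and `model_path` are hypothetical helper names, not part of the dataset tooling.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Iterable

REQUIRED_COLUMNS = {"question", "answer", "ifc_model", "project"}


def count_questions(lines: Iterable[str]) -> Counter:
    """Count QA pairs per (project, ifc_model) from CSV text lines."""
    reader = csv.DictReader(lines)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return Counter((row["project"], row["ifc_model"]) for row in reader)


def model_path(project: str, ifc_model: str, repo_root: str = ".") -> Path:
    """Build the assumed location of the IFC file for one dataset row."""
    return Path(repo_root) / "projects" / project / f"{ifc_model}.ifc"


if __name__ == "__main__":
    csv_file = Path("questions/ifc-bench-v1.csv")
    if csv_file.exists():
        # utf-8-sig tolerates an optional byte-order mark in the header row
        with csv_file.open(newline="", encoding="utf-8-sig") as handle:
            counts = count_questions(handle)
        for (project, ifc_model), n in sorted(counts.items()):
            print(f"{model_path(project, ifc_model)}: {n} questions")
```

Checking the header up front makes a renamed or missing column fail loudly instead of surfacing later as a `KeyError`.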
Verify dataset integrity using SHA-256 checksum:
shasum -a 256 questions/ifc-bench-v1.csv
# Expected output: f67a48770d74b6e0ff0868c923c3e1d976110350b2c439564d7ceccc16a46f35

We welcome contributions through:
- 🆕 New IFC models (with permissive licensing)
- ➕ Additional QA pairs for existing models
- ✏️ Documentation improvements
- 🐛 Error corrections in existing answers
Please see our Contribution Guidelines (docs/CONTRIBUTING.md) for details.
- Dataset: Licensed under CC BY 4.0
- Models: Inherit their original licenses (see individual model folders)
If using in research, please cite:
@misc{ifc-bench,
title = {{ifc-bench}: {BIM} Comprehension \& Reasoning Benchmark Dataset},
author = {Sylvain Hellin},
year = {2024},
url = {https://github.com/sylvainHellin/ifc-bench},
note = {Version 1.0}
}

Special thanks to:
- buildingSMART International for providing sample files
- The openBIM community for quality assurance
- Early adopters for feedback and validation
📌 Maintainer: Sylvain Hellin | 📧 Contact: sylvain.hellin@tum.de | 🐛 Issue Tracker: GitHub Issues

