This repository contains the official dataset for QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments (see the dataset directory), along with the source code used to generate the tables and train the models presented in the article of the same name.
The dataset is also available on HuggingFace.
Statistics are divided by splits.
- label: the binary label of the sentence, where 0 means ungrammatical and 1 means grammatical.
- sentence: the sentence.
- source: the URL source of the sentence.
- category: the aggregated BDL category of the sentence's linguistic phenomena.
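As a rough illustration of how the four fields fit together, the sketch below counts grammatical and ungrammatical sentences per category. The rows, the category names (`syntax`, `morphology`), and the helper `label_counts_by_category` are hypothetical placeholders introduced only for this example. They are not real QFrCoLA entries or part of the repository's code.

```python
from collections import Counter, defaultdict

# Hypothetical placeholder rows that only mimic the four documented fields.
# They are NOT real QFrCoLA entries, and the category names are made up.
rows = [
    {"label": 1, "sentence": "<grammatical sentence>",
     "source": "https://example.org/a", "category": "syntax"},
    {"label": 0, "sentence": "<ungrammatical sentence>",
     "source": "https://example.org/a", "category": "syntax"},
    {"label": 1, "sentence": "<grammatical sentence>",
     "source": "https://example.org/b", "category": "morphology"},
]

# Mapping taken from the field description: 0 = ungrammatical, 1 = grammatical.
LABEL_NAMES = {0: "ungrammatical", 1: "grammatical"}


def label_counts_by_category(records):
    """Count grammatical and ungrammatical sentences within each category."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["category"]][LABEL_NAMES[record["label"]]] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {category: dict(counter) for category, counter in counts.items()}


summary = label_counts_by_category(rows)
print(summary)
```

The helper only assumes an iterable of dictionaries with `label` and `category` keys, so it should also work on rows read from the downloaded split files, provided they expose the same field names.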
You can manually download the dataset splits from the dataset directory, or load them with the HuggingFace datasets library as follows:
from datasets import load_dataset
dataset = load_dataset("davebulaval/qfrcola")

This dataset is under CC-BY-NC-SA 4.0.
@inproceedings{beauchemin2025qfrcola,
title={QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments},
author={Beauchemin, David and Khoury, Richard},
booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
pages={119--130},
year={2025}
}
The article_src directory contains the source code used to clean the dataset and compute the statistics, and the la_tda directory contains the code used to fine-tune all our models. The code was adapted from the official repository of the
article Acceptability Judgements via Examining the Topology of Attention Maps.
The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.
| property | value |
|---|---|
| name | QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments |
| alternateName | QFrCoLA |
| url | https://github.com/GRAAL-Research/qfrcola |
| description | QFrCoLA is a dataset of binary normative linguistic acceptability judgments in Quebec French, with in-domain sentences from the Banque de dépannage linguistique (BDL) and out-of-domain sentences from the Académie française. |
| creator | |
| provider | |
| license | |
| citation | ... |

