
GRAAL-Research/QFrCoLA


QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments

This repository contains the official QFrCoLA dataset (see the dataset directory) and the source code used to generate the tables and train the models presented in the article QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments.

About the Dataset


The dataset is available on HuggingFace.

Example Sentences from the Dataset


Statistics About the Dataset Compared to Other Similar Datasets

Statistics are reported per split.

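Per-split figures of this kind can also be recomputed locally. The following is a minimal sketch, not the script in article_src: it assumes each split is an iterable of rows carrying the `label` and `sentence` fields described under Data Fields, and it counts length in whitespace-separated tokens, so its numbers may differ from the published statistics. The function name `split_statistics` is hypothetical.

```python
from typing import Iterable, Mapping


def split_statistics(rows: Iterable[Mapping]) -> dict:
    """Summarize one split: size, share of grammatical sentences, mean length.

    Hypothetical helper. Length is counted in whitespace-separated tokens,
    which is an assumption and may not match the paper's tokenization.
    """
    rows = list(rows)
    if not rows:
        return {"n_sentences": 0, "grammatical_ratio": 0.0, "mean_tokens": 0.0}
    # Label encoding from the dataset card: 1 = grammatical, 0 = ungrammatical.
    n_grammatical = sum(1 for row in rows if row["label"] == 1)
    lengths = [len(row["sentence"].split()) for row in rows]
    return {
        "n_sentences": len(rows),
        "grammatical_ratio": n_grammatical / len(rows),
        "mean_tokens": sum(lengths) / len(lengths),
    }
```

With the `dataset` object returned by `load_dataset("davebulaval/qfrcola")`, `{name: split_statistics(split) for name, split in dataset.items()}` gives one summary per split, since a HuggingFace `DatasetDict` behaves like a dictionary of splits and iterating a split yields one dict per row.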

Dataset Structure

Data Fields

  • label: the binary label of the sentence, where 0 means ungrammatical and 1 means grammatical.
  • sentence: the sentence.
  • source: the source URL of the sentence.
  • category: the aggregated BDL category of the linguistic phenomenon illustrated by the sentence.
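The four fields above can be mirrored in a small typed record, which is handy for validating rows before training. This is a sketch: the class name `QFrCoLAExample`, the `from_row` constructor, and the choice of `int`/`str` types are assumptions for illustration, not part of the released dataset or code.

```python
from dataclasses import dataclass
from typing import Mapping

# Label encoding documented in the field list: 0 = ungrammatical, 1 = grammatical.
LABEL_NAMES = {0: "ungrammatical", 1: "grammatical"}


@dataclass(frozen=True)
class QFrCoLAExample:
    """Hypothetical record mirroring the four documented fields."""

    label: int
    sentence: str
    source: str  # source URL of the sentence
    category: str  # aggregated BDL category

    def __post_init__(self) -> None:
        # Reject labels outside the documented binary encoding.
        if self.label not in LABEL_NAMES:
            raise ValueError(f"label must be 0 or 1, got {self.label!r}")

    @property
    def is_grammatical(self) -> bool:
        return self.label == 1

    @classmethod
    def from_row(cls, row: Mapping) -> "QFrCoLAExample":
        # Keep only the documented fields; any extra columns are ignored.
        return cls(
            label=int(row["label"]),
            sentence=str(row["sentence"]),
            source=str(row["source"]),
            category=str(row["category"]),
        )
```

Making the record frozen keeps examples hashable and prevents accidental relabelling after validation.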

Download the Dataset

You can manually download the dataset splits from the dataset directory, or load them with the HuggingFace datasets library as follows:

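```python
from datasets import load_dataset

# Download (or load from the local cache) all QFrCoLA splits from the HuggingFace Hub.
dataset = load_dataset("davebulaval/qfrcola")
```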
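Each loaded split can then be iterated row by row. As a minimal sketch, the hypothetical helper `grammatical_ratio_by_category` below counts sentences per `category` and the share labelled grammatical in each. It uses only the Python standard library, so it works on any iterable of dicts, and it assumes every row has `category` and `label` keys.

```python
from collections import Counter
from typing import Iterable, Mapping


def grammatical_ratio_by_category(rows: Iterable[Mapping]) -> dict:
    """Hypothetical helper: {category: (n_sentences, share labelled grammatical)}."""
    totals: Counter = Counter()
    grammatical: Counter = Counter()
    # Assumes each row has 'category' and 'label' keys (label 1 = grammatical).
    for row in rows:
        category = row["category"]
        totals[category] += 1
        if row["label"] == 1:
            grammatical[category] += 1
    return {cat: (n, grammatical[cat] / n) for cat, n in totals.items()}
```

For example, `for name, split in dataset.items(): print(name, grammatical_ratio_by_category(split))` prints one breakdown per split without assuming particular split names.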

License

This dataset is released under the CC-BY-NC-SA 4.0 license.

To Cite

```bibtex
@inproceedings{beauchemin2025qfrcola,
  title={QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments},
  author={Beauchemin, David and Khoury, Richard},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={119--130},
  year={2025}
}
```

About the Source Code

The article_src directory contains the source code used to clean the dataset and compute the statistics, and the la_tda directory contains the code used to fine-tune all our models. The fine-tuning code was adapted from the official repository of the article Acceptability Judgements via Examining the Topology of Attention Maps.

Dataset Metadata

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

| property | value |
| --- | --- |
| name | QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments |
| alternateName | QFrCoLA |
| url | |
| description | QFrCoLA is a dataset of binary normative linguistic acceptability judgments in Quebec French, with in-domain sentences from the Banque de dépannage linguistique (BDL) and out-of-domain sentences from the Académie française. |
| creator | name: David Beauchemin; sameAs: https://scholar.google.com/citations?hl=fr&user=ntoPgSUAAAAJ |
| creator | name: Richard Khoury; sameAs: https://scholar.google.com/citations?user=9MrPtC0AAAAJ&hl=en&oi=ao |
| provider | name: GRAIL; sameAs: https://grail.ift.ulaval.ca/ |
| license | name: CC-BY-NC-SA 4.0; url: |
| citation | ... |
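Google Dataset Search reads schema.org `Dataset` markup, usually embedded in a page as JSON-LD. The sketch below maps the rows of the metadata table into that form; the function name `dataset_jsonld` is hypothetical, the `@type` choices (`Person`, `Organization`, `CreativeWork`) are assumptions, and the `url` and `citation` entries are omitted because their values are not given here.

```python
import json


def dataset_jsonld() -> dict:
    """Hypothetical helper: schema.org Dataset description built from the table."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": "QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments",
        "alternateName": "QFrCoLA",
        "description": (
            "QFrCoLA is a dataset of binary normative linguistic acceptability "
            "judgments in Quebec French, with in-domain sentences from the "
            "Banque de dépannage linguistique (BDL) and out-of-domain sentences "
            "from the Académie française."
        ),
        "creator": [
            {
                "@type": "Person",
                "name": "David Beauchemin",
                "sameAs": "https://scholar.google.com/citations?hl=fr&user=ntoPgSUAAAAJ",
            },
            {
                "@type": "Person",
                "name": "Richard Khoury",
                "sameAs": "https://scholar.google.com/citations?user=9MrPtC0AAAAJ&hl=en&oi=ao",
            },
        ],
        "provider": {
            "@type": "Organization",
            "name": "GRAIL",
            "sameAs": "https://grail.ift.ulaval.ca/",
        },
        # The table's url and citation values are not given, so they are left out
        # rather than guessed.
        "license": {"@type": "CreativeWork", "name": "CC-BY-NC-SA 4.0"},
    }


if __name__ == "__main__":
    # ensure_ascii=False keeps the accented French characters readable.
    print(json.dumps(dataset_jsonld(), ensure_ascii=False, indent=2))
```

The printed JSON would typically be placed inside a `<script type="application/ld+json">` element on the dataset's landing page.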

