Libyan Restaurants (LR) is a manually annotated sentiment analysis dataset of restaurant reviews written in the Libyan Arabic dialect. The reviews were collected from real-world comments on Facebook and Google Maps.
The dataset is intended to serve as a benchmark resource for evaluating machine learning and deep learning models in low-resource Arabic dialect NLP. This repository provides the dataset, annotation details, and baseline results to ensure reproducibility and fair comparison across studies.
| Attribute | Value |
|---|---|
| Total comments | 4,609 |
| Positive reviews | 2,509 (55.4%) |
| Negative reviews | 2,020 (44.6%) |
| Language | Libyan Arabic dialect |
| Domain | Restaurant reviews |
| Sources | Facebook, Google Maps |
| Annotation | Manual (3 annotators) |
| Labels | Binary (1 = Positive, 0 = Negative) |
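Since the two classes are only mildly imbalanced, a useful reference point for any model is the accuracy of always predicting the larger class. The short sketch below derives the class shares from the positive and negative counts in the table and computes that reference value; the variable names are illustrative only.

```python
# Counts taken from the statistics table above.
positive = 2509
negative = 2020

# Shares are computed over the two labeled classes.
labeled = positive + negative
positive_share = positive / labeled
negative_share = negative / labeled

# Accuracy of a trivial classifier that always predicts the larger class.
majority_accuracy = max(positive, negative) / labeled

print(f"positive: {positive_share:.1%}, negative: {negative_share:.1%}")
print(f"majority-class accuracy: {majority_accuracy:.3f}")
```

Any trained model should clear this majority-class figure by a comfortable margin to be considered informative.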
The dataset is provided in Excel format with the following structure:
- message: User comment written in Libyan Arabic.
- label: Sentiment label (1 = Positive, 0 = Negative).

Example rows, shown as comma-separated values:
message,label
"المطعم باهي والخدمة سريعة",1
"الخدمة بلهون بكل والطلب تأخر",0
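Rows in the layout shown above can be parsed and sanity-checked with the Python standard library. This is a minimal sketch, assuming the data has been exported to comma-separated text with a `message,label` header; the function name `read_reviews` is hypothetical and not part of this repository. For the Excel file itself, `pandas.read_excel` would be one option, with the file name depending on how the dataset is distributed.

```python
import csv
import io

# The two sample rows from this README, used here as stand-in input.
SAMPLE = '''message,label
"المطعم باهي والخدمة سريعة",1
"الخدمة بلهون بكل والطلب تأخر",0
'''


def read_reviews(handle):
    """Parse (message, label) pairs and check the documented schema."""
    reviews = []
    for line_no, record in enumerate(csv.DictReader(handle), start=2):
        message = record["message"].strip()
        label = int(record["label"])
        if not message:
            raise ValueError(f"line {line_no}: empty message")
        if label not in (0, 1):
            raise ValueError(f"line {line_no}: label must be 0 or 1, got {label}")
        reviews.append((message, label))
    return reviews


reviews = read_reviews(io.StringIO(SAMPLE))
print(len(reviews), [label for _, label in reviews])
```

Validating labels at load time catches stray values early, before they reach a classifier that expects strictly binary targets.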
To ensure high linguistic validity and contextual accuracy, the following procedure was used:
- Filtration: Comments were filtered to retain only Libyan dialect text.
- Labeling: Two native Libyan annotators independently labeled each comment.
- Polarity: Labels reflect overall sentiment polarity.
- Exclusions: Neutral, spam, and non-opinion comments were excluded.
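When two annotators label the same comments independently, their consistency is often summarized with Cohen's kappa, which corrects raw agreement for agreement expected by chance. This README does not report an agreement statistic, so the sketch below is only an illustration of how one could be computed; the label lists are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for eight comments (1 = Positive, 0 = Negative).
annotator_a = [1, 1, 0, 0, 1, 0, 1, 1]
annotator_b = [1, 1, 0, 1, 1, 0, 0, 1]
print(round(cohen_kappa(annotator_a, annotator_b), 3))  # 0.467
```

In this toy example the annotators agree on 6 of 8 items (0.75), while chance agreement is 34/64, giving a kappa of 7/15.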
Baseline experiments were conducted using TF–IDF features and classical machine learning models. These results are provided as reference baselines for future research.
| Model | Accuracy | F1-score |
|---|---|---|
| Multinomial Naïve Bayes | 0.844 | 0.842 |
| Support Vector Machine | 0.833 | 0.833 |
| Logistic Regression | 0.823 | 0.823 |
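A TF-IDF plus classical classifier baseline of the kind reported above could be set up roughly as follows. This is a minimal sketch assuming scikit-learn; the default vectorizer settings, the 80/20 stratified split, the random seed, the macro-averaged F1, and the function name `evaluate_tfidf_baseline` are all assumptions rather than the authors' reported configuration, so its scores need not match the table.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def evaluate_tfidf_baseline(texts, labels, test_size=0.2, seed=42):
    """Train TF-IDF + Multinomial Naive Bayes and score it on a held-out split."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    # The pipeline fits the vectorizer on training text only, avoiding leakage.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(x_train, y_train)
    predictions = model.predict(x_test)
    return {
        "accuracy": accuracy_score(y_test, predictions),
        "f1_macro": f1_score(y_test, predictions, average="macro"),
    }
```

Given the `message` and `label` columns as two Python lists, `evaluate_tfidf_baseline(messages, labels)` returns both metrics. Swapping `MultinomialNB()` for `LinearSVC()` or `LogisticRegression()` from scikit-learn would cover the other two model families in the table.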
This dataset is designed for:
- Sentiment analysis benchmarking.
- Arabic dialect NLP research.
- Low-resource language modeling.
- Comparison of classical ML and deep learning approaches.
- Cross-dialect and transfer learning studies.

The dataset has the following limitations:
- Domain: Contains restaurant reviews only.
- Geography: Focused primarily on Tripoli-based data.
- Labels: Binary sentiment labels only (no neutral or multi-class labels).
- Scale: The dataset size (about 4.6k comments) is likely too small for training very large transformer models from scratch.
If you use this dataset in your research, please cite the following paper:
@article{libyan_restaurants_corpus,
title={Libyan Restaurants: A New Annotated Corpus for Sentiment Analysis of the Libyan Arabic Dialect},
author={Arif, Manar and Lamami, Rabia and Saheri, Weiam and Essgaer, Mansour and Agaal, Asma and Abuhajar, Aisha},
year={2025},
journal={IEEE Conference Proceedings}
}

This dataset is released under the Creative Commons Attribution 4.0 (CC BY 4.0) license. You are free to use, share, and adapt the data with proper attribution.