Libyan Restaurants

A Benchmark Dataset for Sentiment Analysis in the Libyan Arabic Dialect

📌 Overview

Libyan Restaurants (LR) is a manually annotated sentiment analysis dataset written in the Libyan Arabic dialect. The data were collected from real restaurant reviews posted on Facebook and Google Maps.

The dataset is intended to serve as a benchmark resource for evaluating machine learning and deep learning models in low-resource Arabic dialect NLP. This repository provides the dataset, annotation details, and baseline results to ensure reproducibility and fair comparison across studies.

📊 Dataset Statistics

| Attribute | Value |
| --- | --- |
| Total comments | 4,609 |
| Positive reviews | 2,509 (55.4%) |
| Negative reviews | 2,020 (44.6%) |
| Language | Libyan Arabic dialect |
| Domain | Restaurant reviews |
| Sources | Facebook, Google Maps |
| Annotation | Manual (3 annotators) |
| Labels | Binary (1 = Positive, 0 = Negative) |
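
The class shares in the table can be recomputed from the label column. The sketch below is illustrative only: it rebuilds a label list from the two class counts reported above instead of reading the spreadsheet, and the helper name `label_distribution` is hypothetical. Computed over the sum of the two class counts (4,529), it reproduces the reported 55.4% and 44.6%.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: (count, percent)}, with percent rounded to one decimal."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


# Stand-in for the real label column, built from the reported class counts.
labels = [1] * 2509 + [0] * 2020
dist = label_distribution(labels)
print(dist)
```

The same function can be applied to the actual `label` column once the file is loaded.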

📁 Data Format

The dataset is provided as an Excel file (Libyan Resturant.xlsx) with two columns:

message: User comment written in Libyan Arabic.
label: Sentiment label (1 = Positive, 0 = Negative).
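
A minimal sketch of checking rows against this two-column schema, assuming the sheet has been exported to CSV with the same `message` and `label` headers. The export step and the helper name `parse_reviews` are assumptions, not part of the repository. To read the spreadsheet directly, `pandas.read_excel` is one option.

```python
import csv
import io

VALID_LABELS = {0, 1}  # 1 = Positive, 0 = Negative


def parse_reviews(text):
    """Parse `message,label` CSV text into a list of (message, label) tuples."""
    rows = []
    reader = csv.DictReader(io.StringIO(text, newline=""))
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        message = (row["message"] or "").strip()
        label = int(row["label"])
        if not message:
            raise ValueError(f"line {line_no}: empty message")
        if label not in VALID_LABELS:
            raise ValueError(f"line {line_no}: unexpected label {label}")
        rows.append((message, label))
    return rows


# Two rows in the dataset's format, used here only as sample input.
sample = (
    "message,label\n"
    '"المطعم باهي والخدمة سريعة",1\n'
    '"الخدمة بلهون بكل والطلب تأخر",0\n'
)
reviews = parse_reviews(sample)
print(len(reviews), [label for _, label in reviews])
```

Rejecting empty messages and out-of-range labels up front keeps later training code from silently absorbing malformed rows.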

Example Data

message,label
"المطعم باهي والخدمة سريعة",1
"الخدمة بلهون بكل والطلب تأخر",0

🧾 Annotation Process

To ensure high linguistic validity and contextual accuracy, the following procedure was used:

Filtration: Comments were filtered to retain only Libyan dialect text.
Labeling: Two native Libyan annotators independently labeled each comment.
Polarity: Labels reflect overall sentiment polarity.
Exclusions: Neutral, spam, and non-opinion comments were excluded.
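
When annotators label independently, their agreement is often summarized with Cohen's kappa. This README does not say which agreement measure, if any, was used, so the sketch below is only one common choice. The function name and the toy labels are invented for illustration and are not taken from the dataset.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:  # both annotators used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels (hypothetical): the annotators disagree on one of six comments.
annotator_a = [1, 1, 0, 0, 1, 0]
annotator_b = [1, 1, 0, 1, 1, 0]
kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
```

Kappa corrects raw agreement for chance, which matters when one class is more frequent than the other.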

🧪 Benchmark Results (Baseline)

Baseline experiments were conducted using TF–IDF features and classical machine learning models. These results are provided as reference baselines for future research.

| Model | Accuracy | F1-score |
| --- | --- | --- |
| Multinomial Naïve Bayes | 0.844 | 0.842 |
| Support Vector Machine | 0.833 | 0.833 |
| Logistic Regression | 0.823 | 0.823 |
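
For comparing new models against these baselines, the two reported metrics can be computed as in the sketch below. The README does not state which F1 averaging was used, so macro-averaging over the two classes is an assumption here. The function names and toy predictions are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, positive):
    """F1 for one class, treating `positive` as the target label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of per-class F1 (assumed averaging; see note above)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy gold labels and predictions (hypothetical, not dataset results).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(round(accuracy(y_true, y_pred), 3), round(macro_f1(y_true, y_pred), 3))
```

Reporting the averaging scheme alongside the score makes results easier to compare across studies.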

🎯 Intended Use

This dataset is designed for:

Sentiment analysis benchmarking.
Arabic dialect NLP research.
Low-resource language modeling.
Comparison between classical machine learning and deep learning models.
Cross-dialect and transfer learning studies.

⚠️ Limitations

Domain-specific: Contains restaurant reviews only.
Geography: Focused primarily on Tripoli-based data.
Labels: Binary sentiment labels only (no neutral or multi-class labels).
Scale: At roughly 4.6k comments, the dataset is likely too small to train large transformer models from scratch.

📚 Citation

If you use this dataset in your research, please cite the following paper:

@article{libyan_restaurants_corpus,
  title={Libyan Restaurants: A New Annotated Corpus for Sentiment Analysis of the Libyan Arabic Dialect},
  author={Arif, Manar and Lamami, Rabia and Saheri, Weiam and Essgaer, Mansour and Agaal, Asma and Abuhajar, Aisha},
  year={2025},
  journal={IEEE Conference Proceedings}
}

📄 License

This dataset is released under the Creative Commons Attribution 4.0 (CC BY 4.0) license. You are free to use, share, and adapt the data with proper attribution.
