Mansour-Essgaer/Libyan-Resturant


Libyan Restaurants

A Benchmark Dataset for Sentiment Analysis in the Libyan Arabic Dialect

📌 Overview

Libyan Restaurants (LR) is a manually annotated sentiment analysis dataset written in the Libyan Arabic dialect. The data was collected from real-world restaurant reviews posted on Facebook and Google Maps.

The dataset is intended to serve as a benchmark resource for evaluating machine learning and deep learning models in low-resource Arabic dialect NLP. This repository provides the dataset, annotation details, and baseline results to ensure reproducibility and fair comparison across studies.


📊 Dataset Statistics

| Attribute | Value |
|---|---|
| Total comments | 4,609 |
| Positive reviews | 2,509 (55.4%) |
| Negative reviews | 2,020 (44.6%) |
| Language | Libyan Arabic dialect |
| Domain | Restaurant reviews |
| Sources | Facebook, Google Maps |
| Annotation | Manual (3 annotators) |
| Labels | Binary (1 = Positive, 0 = Negative) |
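
The class percentages can be reproduced from the two class counts. The short sketch below computes them as shares of the sum of the positive and negative counts (4,529), which is the denominator that yields the reported 55.4% and 44.6%. The variable names are illustrative only.

```python
# Class counts as reported in the statistics above.
positive = 2509
negative = 2020

# Denominator: the sum of the two labeled classes.
labeled_total = positive + negative

positive_share = round(100 * positive / labeled_total, 1)
negative_share = round(100 * negative / labeled_total, 1)

print(f"Positive: {positive_share}%  Negative: {negative_share}%")
# prints: Positive: 55.4%  Negative: 44.6%
```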

📁 Data Format

The dataset is provided as an Excel file with two columns:

  • message: User comment written in Libyan Arabic.
  • label: Sentiment label (1 = Positive, 0 = Negative).

Example Data

message,label
"المطعم باهي والخدمة سريعة",1
"الخدمة بلهون بكل والطلب تأخر",0
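
As a minimal sketch of how rows in this shape could be read and checked, the snippet below parses the two sample rows with Python's standard `csv` module and verifies that every label is 0 or 1. It assumes the spreadsheet has been exported to CSV with the same `message` and `label` columns; the helper name `load_reviews` is hypothetical and not part of this repository.

```python
import csv
import io

# The two sample rows from the example above, in CSV form.
SAMPLE = '''message,label
"المطعم باهي والخدمة سريعة",1
"الخدمة بلهون بكل والطلب تأخر",0
'''

def load_reviews(handle):
    """Read (message, label) pairs and check that each label is 0 or 1."""
    rows = []
    for record in csv.DictReader(handle):
        label = int(record["label"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label!r}")
        rows.append((record["message"], label))
    return rows

# io.StringIO stands in for a file opened with encoding="utf-8", newline="".
reviews = load_reviews(io.StringIO(SAMPLE))
for message, label in reviews:
    print(label, message)
```

If pandas is available, the Excel file can also be read directly with `pandas.read_excel`, which avoids the CSV export step.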

🧾 Annotation Process

To ensure high linguistic validity and contextual accuracy, the following procedure was used:

  • Filtration: Comments were filtered to retain only Libyan dialect text.
  • Labeling: Two native Libyan annotators independently labeled each comment.
  • Polarity: Labels reflect overall sentiment polarity.
  • Exclusions: Neutral, spam, and non-opinion comments were excluded.
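
When annotators label the same comments independently, their consistency is commonly summarized with Cohen's kappa. This README does not report an agreement score, so the sketch below only illustrates how one could be computed. The function name `cohen_kappa` and the label lists are made up for the example.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(counts_a) | set(counts_b)
    )
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Made-up labels for six comments (1 = Positive, 0 = Negative).
annotator_a = [1, 1, 0, 0, 1, 0]
annotator_b = [1, 1, 0, 1, 1, 0]
kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 3))  # prints 0.667
```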

🧪 Benchmark Results (Baseline)

Baseline experiments were conducted using TF–IDF features and classical machine learning models. These results are provided as reference baselines for future research.

| Model | Accuracy | F1-score |
|---|---|---|
| Multinomial Naïve Bayes | 0.844 | 0.842 |
| Support Vector Machine | 0.833 | 0.833 |
| Logistic Regression | 0.823 | 0.823 |
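
To compare a new model against these baselines, the same two metrics need to be computed on its predictions. The sketch below shows accuracy and a macro-averaged F1 in plain Python. The averaging mode behind the reported F1 scores is not stated here, so macro averaging is an assumption, and the labels and predictions are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores (assumed averaging mode)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

# Invented gold labels and predictions (1 = Positive, 0 = Negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(acc, f1)  # prints 0.75 0.75
```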

🎯 Intended Use

This dataset is designed for:

  • Sentiment analysis benchmarking.
  • Arabic dialect NLP research.
  • Low-resource language modeling.
  • Comparison of classical ML and deep learning approaches.
  • Cross-dialect and transfer learning studies.
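
For benchmarking, results are easier to compare when the train/test split is reproducible and preserves the class balance. No split protocol is specified here, so the following is only one possible approach: a stratified split with a fixed seed. The function name `stratified_split`, the 20% test fraction, the seed, and the placeholder rows are all assumptions.

```python
import random
from collections import defaultdict

def stratified_split(rows, test_fraction=0.2, seed=42):
    """Split (message, label) rows so each label keeps its proportion."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * test_fraction)
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

# Placeholder rows standing in for the real comments: 10 per class.
rows = [(f"comment {i}", i % 2) for i in range(20)]
train, test = stratified_split(rows)
print(len(train), len(test))  # prints 16 4
```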

⚠️ Limitations

  • Domain-specific: Contains restaurant reviews only.
  • Geography: Focused primarily on Tripoli-based data.
  • Labels: Binary sentiment labels only (no neutral or multi-class labels).
  • Scale: At roughly 4.6k comments, the dataset is likely too small for training large transformer models from scratch.

📚 Citation

If you use this dataset in your research, please cite the following paper:

@article{libyan_restaurants_corpus,
  title={Libyan Restaurants: A New Annotated Corpus for Sentiment Analysis of the Libyan Arabic Dialect},
  author={Arif, Manar and Lamami, Rabia and Saheri, Weiam and Essgaer, Mansour and Agaal, Asma and Abuhajar, Aisha},
  year={2025},
  journal={IEEE Conference Proceedings}
}

📄 License

This dataset is released under the Creative Commons Attribution 4.0 (CC BY 4.0) license. You are free to use, share, and adapt the data with proper attribution.
