Credits:
- Yudistira Dwi Cahya
- Wulan Akhsah
- Kamal Muftie Yafi
- Rachel Thyffani Margaretha S
- Vesya Padmadewi
- Dataset: Hoax dataset obtained from MAFINDO (Masyarakat Anti Fitnah Indonesia)
- Slang: Modified Kamus Alay, based on Kamus Alay (Colloquial Indonesian Lexicon)
- Feature extraction: Bag-of-Words, TF-IDF
- Classifiers: Naive Bayes, SVM, Logistic Regression, Decision Tree, kNN, ANN
- Cross-validated hyperparameter search: Grid Search, Random Search
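Grid Search and Random Search differ mainly in which hyperparameter combinations they evaluate, and therefore in how many models have to be trained under k-fold cross-validation. The sketch below only generates the candidate settings and counts the resulting fits. The SVM search space and the 5-fold setting are hypothetical; the values actually tuned in this project are not stated. In scikit-learn, `GridSearchCV` and `RandomizedSearchCV` implement these two strategies together with the cross-validation loop.

```python
import itertools
import random

# Hypothetical search space for an SVM (not the project's actual grid).
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
}
K_FOLDS = 5  # assumed number of cross-validation folds


def grid_candidates(grid):
    """Grid search: every combination of the listed values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*grid.values())]


def random_candidates(grid, n_iter, seed=0):
    """Random search: a fixed number of combinations sampled without replacement."""
    rng = random.Random(seed)
    return rng.sample(grid_candidates(grid), n_iter)


grid = grid_candidates(param_grid)
rand = random_candidates(param_grid, n_iter=4)
print(len(grid), "grid candidates ->", len(grid) * K_FOLDS, "model fits")
print(len(rand), "random candidates ->", len(rand) * K_FOLDS, "model fits")
```

Random search trades exhaustiveness for a budget (`n_iter`) that stays fixed as the number of hyperparameters grows, while the grid grows multiplicatively.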
The dataset contains two labels: "1" for hoax and "0" for not hoax. It has 4,701 samples in total, and the classes are imbalanced: 3,850 samples are labeled hoax and 851 are labeled not hoax.
| Label | Value | Number of samples |
|---|---|---|
| Hoax | 1 | 3,850 |
| Not hoax | 0 | 851 |
| Total | | 4,701 |
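The class counts in the table make it possible to quantify the imbalance before training. The sketch below computes the accuracy of a trivial classifier that always predicts "hoax", and the "balanced" class weights given by the formula `total / (n_classes * class_count)`, which is what scikit-learn uses for `class_weight="balanced"`. Whether this project applied class weighting is an assumption, not something stated here.

```python
# Class counts taken from the dataset table above.
counts = {"hoax (1)": 3850, "not hoax (0)": 851}
total = sum(counts.values())

# Accuracy of a classifier that always predicts the majority class ("hoax").
majority_baseline = max(counts.values()) / total

# "Balanced" class weights: total / (n_classes * class_count).
# Assumption: shown for illustration; the project may not weight classes at all.
n_classes = len(counts)
weights = {label: total / (n_classes * n) for label, n in counts.items()}

print(f"total = {total}")
print(f"majority-class baseline accuracy = {majority_baseline:.3f}")
for label, w in weights.items():
    print(f"weight[{label}] = {w:.3f}")
```

Because always answering "hoax" already scores about 82% accuracy, per-class recall or macro-averaged F1 tends to say more about a model on this data than accuracy alone.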
- Text cleaning/preprocessing
- Non-standard (slang) word replacement using the modified Kamus Alay
- Feature extraction: Bag-of-Words (BoW), TF-IDF
- Classification: Naive Bayes, SVM, Logistic Regression, Decision Tree, kNN, ANN
- Cross-validated hyperparameter search: Grid Search, Random Search
- Post-analysis: topicwizard, Voyant Tools, WordCloud
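The text cleaning and non-standard word replacement steps can be sketched as a regex-based cleaner followed by a dictionary lookup per token. This is a minimal sketch under assumptions: the cleaning rules (lowercasing, dropping URLs, mentions, hashtags, digits, and punctuation) are illustrative choices, and `SLANG` is a hypothetical five-entry excerpt, not the project's modified Kamus Alay.

```python
import re

# Hypothetical excerpt of a slang -> standard Indonesian lookup table.
# The project's actual modified Kamus Alay is not reproduced here.
SLANG = {
    "gk": "tidak",
    "tdk": "tidak",
    "yg": "yang",
    "dgn": "dengan",
    "utk": "untuk",
}


def clean(text):
    """Lowercase and strip URLs, mentions, hashtags, digits, and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)                # mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)               # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()            # collapse repeated whitespace


def normalize_slang(text, lexicon=SLANG):
    """Replace each token found in the lexicon with its standard form."""
    return " ".join(lexicon.get(tok, tok) for tok in text.split())


raw = "Berita ini GK benar!!! cek http://contoh.id #hoaks"
print(normalize_slang(clean(raw)))
```

Cleaning runs first so that tokens such as "GK!!!" are reduced to "gk" before the lookup; otherwise the dictionary would miss them.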
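The two feature extraction methods, BoW and TF-IDF, can be compared on a toy corpus. BoW keeps raw term counts; TF-IDF scales each count by how rare the term is across documents, so words that appear everywhere carry less weight. This sketch assumes scikit-learn's default TF-IDF variant (smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 row normalization); other formulations exist, and the three documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy documents (already cleaned and normalized).
docs = [
    "vaksin mengandung chip",
    "vaksin aman digunakan",
    "chip chip dipasang pada vaksin",
]


def bag_of_words(docs):
    """Raw term counts over a sorted vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    counts = [Counter(doc.split()) for doc in docs]
    return vocab, [[c[term] for term in vocab] for c in counts]


def tf_idf(docs):
    """TF-IDF with smoothed IDF and L2 row normalization."""
    vocab, bow = bag_of_words(docs)
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in bow:
        weighted = [count * w for count, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return vocab, rows


vocab, bow = bag_of_words(docs)
_, tfidf = tf_idf(docs)
print(vocab)
for counts, weights in zip(bow, tfidf):
    print(counts, [round(x, 3) for x in weights])
```

Here "vaksin" appears in every document, so in the third document it ends up with a lower TF-IDF weight than "dipasang" even though both occur once. If scikit-learn is used, `CountVectorizer` and `TfidfVectorizer` produce these kinds of matrices directly.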