This project analyzes TikTok comments related to Hijabistahub to understand public sentiment and engagement patterns, then benchmarks three text-classification models: Naive Bayes, Support Vector Machine (SVM), and Gradient Boosting. The objective is to convert unstructured social feedback into actionable insights that can support content strategy and brand-perception monitoring.
- Sentiment classification of TikTok comments (positive vs negative)
- Engagement insights using view-count trends and influencer comparisons
- End-to-end ML workflow (preprocessing → training/testing → evaluation → visualization)
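The view-count trend and influencer comparison both come down to grouping records and aggregating views. The sketch below uses only the standard library. It assumes each record has hypothetical `date`, `influencer`, and `views` fields, because the project's actual column names are not documented here, and the sample rows are invented for illustration.

```python
from collections import defaultdict
from datetime import date


def views_by_date(records):
    """Sum view counts per calendar date, returned in chronological order."""
    totals = defaultdict(int)
    for r in records:
        totals[r["date"]] += r["views"]
    return sorted(totals.items())


def mean_views_by_influencer(records):
    """Average view count per influencer, highest average first."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for r in records:
        sums[r["influencer"]] += r["views"]
        counts[r["influencer"]] += 1
    means = {name: sums[name] / counts[name] for name in sums}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample rows; real data would come from the project's CSV files.
sample = [
    {"date": date(2024, 1, 2), "influencer": "A", "views": 1200},
    {"date": date(2024, 1, 1), "influencer": "B", "views": 300},
    {"date": date(2024, 1, 2), "influencer": "B", "views": 500},
]

print(views_by_date(sample))
print(mean_views_by_influencer(sample))
```

Averaging rather than summing per influencer keeps creators with many posts from dominating the comparison; either choice is reasonable depending on the question being asked.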
Figures: Sentiment Distribution; Top Comment Terms; View Count Trend Over Time; Influencer vs View Count; Training & Testing Workflow; Text Preprocessing Pipeline.
Data cleaning:
- Cleaned noisy social text (symbols, duplicates, inconsistent casing)
- Structured comments into a usable dataset for modeling

Text preprocessing steps:
- Tokenization
- Case transformation
- Token-length filtering
- Stopword removal (English stopword list plus additional filtering)
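The cleaning and preprocessing steps above can be approximated in plain Python. This is a sketch, not the project's notebook code: the token-length bounds (`min_len=3`, `max_len=25`) and the small stopword set are illustrative assumptions, since the actual settings and the additional filter list are not specified here. Duplicates are detected after normalization, so comments that differ only in casing or punctuation collapse into one.

```python
import re

# Illustrative stopword subset only; the project uses a full English list plus extra filtering.
STOPWORDS = {"the", "and", "is", "it", "this", "that", "for", "you", "are", "was", "with"}


def clean(text):
    """Replace symbols, emoji, and punctuation with spaces, then collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text, min_len=3, max_len=25, stopwords=STOPWORDS):
    """Tokenize, lowercase, filter tokens by length, and remove stopwords."""
    tokens = re.findall(r"[^\W\d_]+", clean(text))  # runs of letters only
    tokens = [t.lower() for t in tokens]  # case transformation
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]  # length filter
    return [t for t in tokens if t not in stopwords]  # stopword removal


def preprocess_corpus(comments):
    """Preprocess all comments, dropping normalized duplicates and comments left empty."""
    seen = set()
    result = []
    for c in comments:
        tokens = preprocess(c)
        key = " ".join(tokens)
        if tokens and key not in seen:
            seen.add(key)
            result.append(tokens)
    return result


print(preprocess("This hijab is SO beautiful!!! 😍😍"))  # → ['hijab', 'beautiful']
```

Note that "so" and "is" disappear through the length filter, while "this" survives it and is removed as a stopword, which is why the order of the steps matters less than their combined effect.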
The following models were trained and evaluated with consistent preprocessing:
- Naive Bayes
- SVM
- Gradient Boosting
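In this project the learners run as RapidMiner operators. To make the Naive Bayes baseline concrete, here is a small multinomial Naive Bayes with add-one (Laplace) smoothing in plain Python. It is a sketch of the general technique, not RapidMiner's implementation, and it assumes token lists like those produced by preprocessing plus hypothetical `"positive"`/`"negative"` labels on invented toy data; SVM and Gradient Boosting are not reproduced here.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over token lists, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        self.totals = {c: sum(self.token_counts[c].values()) for c in self.class_counts}
        return self

    def _log_posterior(self, tokens, label):
        # log P(label) + sum of log P(token | label), computed in log space to avoid underflow
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.token_counts[label][t] + self.alpha) / denom)
        return score

    def predict(self, docs):
        return [max(self.class_counts, key=lambda c: self._log_posterior(d, c)) for d in docs]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data, already tokenized.
train_docs = [["love", "hijab"], ["beautiful", "color"], ["bad", "quality"], ["late", "delivery", "bad"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNB().fit(train_docs, train_labels)
preds = model.predict([["love", "color"], ["bad", "delivery"]])
print(preds, accuracy(["positive", "negative"], preds))
```

For a fair benchmark, each model would be scored on the same held-out test split produced from the same preprocessed data.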

Tools and data:
- RapidMiner Studio (workflow-based modeling and evaluation)
- Python (Jupyter Notebooks) for preprocessing and labeling support
- CSV files as training and testing inputs

To run the RapidMiner workflows:
- Open RapidMiner Studio
- Load the `.rmp` processes
- Ensure the CSV file paths are mapped correctly
- Run the training/testing workflows and the visualization process

To run the notebooks:
- Open the notebooks (`.ipynb`)
- Run the cells in order to reproduce preprocessing and dataset preparation
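The dataset-preparation step itself is not shown in this README. As one hedged illustration, the sketch below splits a labeled comments CSV into training and testing CSVs that a RapidMiner process could then read. The column name `label`, the file names, the 80/20 ratio, and the fixed seed are all assumptions, not the project's actual settings. Splitting per label keeps the positive/negative ratio roughly equal in both files.

```python
import csv
import random
from collections import defaultdict


def stratified_split(rows, label_key="label", test_ratio=0.2, seed=42):
    """Split rows into train/test so each label keeps roughly the same proportion."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def write_csv(path, rows, fieldnames):
    """Write dict rows to a UTF-8 CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def prepare(src="comments_labeled.csv", train_path="train.csv", test_path="test.csv"):
    """Hypothetical file names; adjust them to match the paths mapped in RapidMiner."""
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fieldnames = reader.fieldnames
    train, test = stratified_split(rows)
    write_csv(train_path, train, fieldnames)
    write_csv(test_path, test, fieldnames)
    return len(train), len(test)
```

Opening files with `newline=""` is what the `csv` module expects, and UTF-8 keeps emoji and non-English text in TikTok comments intact.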





