This project is a sentiment analysis application designed for Indonesian-language text. It classifies user input as Positive, Negative, or Neutral and reports a confidence score for each class.
- Indonesian Language Focus: Specifically tailored for Indonesian text preprocessing
- Machine Learning: Uses Logistic Regression with TF-IDF vectorization
- Real-time Analysis: Instant sentiment prediction with confidence scores
- Comprehensive Preprocessing: Includes slang normalization, stemming, and stopword removal
- Three-class Classification - Positive, Negative, and Neutral sentiments
- Confidence Scores - View probability distribution across all classes
- Text Preprocessing Pipeline
  - Twitter-specific cleaning (mentions, hashtags, URLs)
  - Emoji removal
  - Slang normalization
  - Stopword removal
  - Stemming (Sastrawi)
  - Capital ratio feature extraction
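
The preprocessing steps listed above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the `SLANG` and `STOPWORDS` entries, the emoji pattern, and the function names are all assumptions made for the example. The real application loads its slang dictionary from `Assets/`, takes stopwords from NLTK, and also applies Sastrawi stemming, which is omitted here.

```python
import re

# Hypothetical mini slang dictionary; the project loads its own from
# Assets/combined_slang_words.txt, whose format is not shown in this README.
SLANG = {"gk": "tidak", "bgt": "banget", "yg": "yang"}

# Hypothetical stopword subset; the project takes its list from NLTK.
STOPWORDS = {"yang", "di", "dan", "ini", "itu"}

# Covers the main emoji blocks only; not an exhaustive pattern.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def capital_ratio(text: str) -> float:
    """Share of alphabetic characters that are uppercase (0.0 if no letters)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)


def clean_tweet(text: str) -> str:
    """Apply the cleaning steps in the order listed; stemming is left out."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)                # mentions and hashtags
    text = EMOJI.sub(" ", text)                         # emoji
    text = re.sub(r"[^a-z\s]", " ", text.lower())       # keep letters only
    tokens = [SLANG.get(t, t) for t in text.split()]    # slang normalization
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return " ".join(tokens)


raw = "@user Filmnya BAGUS bgt!!! 😍 https://t.co/abc #review"
print(clean_tweet(raw))  # filmnya bagus banget
```

Note that a capital ratio has to be computed on the raw text, before lowercasing, since the cleaning step discards case information.
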
- Python 3.8+ - Programming language
- Streamlit - Web application framework
- scikit-learn - Machine learning library
- pandas - Data manipulation
- NLTK - Natural language processing
- Sastrawi - Indonesian stemmer
- TextBlob - Text processing
- NLTK Stopwords - Stopword removal
- TF-IDF Vectorization - Text feature extraction
- Logistic Regression - Classification model
- scipy - Sparse matrix operations
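
To illustrate where per-class confidence scores can come from, here is a small standard-library sketch. It assumes a multinomial Logistic Regression, in which one linear score per class is passed through a softmax; the trained model's actual configuration is not shown in this README. The `LABELS` order and the example scores are hypothetical.

```python
import math

# Assumed class order, for illustration only.
LABELS = ["Negative", "Neutral", "Positive"]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most probable label and the full probability distribution."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], dict(zip(LABELS, probs))


label, confidence = classify([0.2, 0.1, 2.0])  # made-up scores
print(label)  # Positive
```

In scikit-learn, the fitted classifier's `predict_proba` method returns this kind of distribution directly, so the application would not normally need to compute it by hand.
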
- Python 3.8 or higher
- pip (Python package manager)
- Git
git clone https://github.com/AmeliaSyahla/sentiment-analysis.git
cd sentiment-analysis

# Windows
python -m venv .venv
.venv\Scripts\activate
# macOS/Linux
python3 -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"

Ensure you have these files in your project directory:
sentiment_analysis/
├── Assets/
│ ├── Fix_Final_Berita_dan_Tweet.csv # Your dataset
│ ├── full_lexicon.csv # Sentiment lexicon
│ └── combined_slang_words.txt # Slang dictionary (optional)
├── app.py
├── train_model.py
└── requirements.txt
Before using the application, you need to train the sentiment analysis model:
python train_model.py

This will generate:
- model.pkl - Trained Logistic Regression model
- vectorizer.pkl - TF-IDF vectorizer
- model_config.json - Model configuration
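
Before launching the app, it can be useful to confirm that training produced all three files. The helper below is a hypothetical sketch, not part of the repository; it assumes the artifacts are written to the directory from which `train_model.py` was run.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED = ("model.pkl", "vectorizer.pkl", "model_config.json")


def missing_artifacts(project_dir="."):
    """Return the expected artifact names that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing artifacts, run train_model.py first:", ", ".join(missing))
    else:
        print("All model artifacts found.")
```
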
streamlit run app.py

The application will open in your default browser at http://localhost:8501
