A smart waste classification system using Natural Language Processing (NLP) with TF-IDF and Cosine Similarity to categorize waste items into Organic, Anorganic, and B3 (Hazardous) categories.
- Real-time text-based waste classification
- Supports Indonesian language input
- Three categories: Organic, Anorganic, B3 (Hazardous)
- Confidence scores for each category displayed as progress bars
- Uses TF-IDF (Term Frequency-Inverse Document Frequency) with Cosine Similarity
- Add, edit, and delete training data
- View all dataset entries in a sortable table
- Search and filter by category
- Real-time word count calculation
- Persistent storage using JSON
- Re-train model with updated dataset
- View all past classifications
- Detailed view with confidence scores
- Timestamps for each classification
- Sorted by newest first
- Persistent storage using JSON
- Modern, intuitive GUI built with Java Swing
- Color-coded categories for easy identification
- Responsive design with split-pane views
- Hover effects and smooth interactions
- Download the JAR file
  - Go to the Releases page
  - Download WasteSense-1.0-SNAPSHOT.jar
- Run the application
  java -jar WasteSense-1.0-SNAPSHOT.jar
  Or simply double-click the JAR file if your system has Java configured.
- Clone the repository
  git clone https://github.com/yourusername/wastesense-ai.git
  cd wastesense-ai
- Build with Maven
  mvn clean package
- Run the application
  java -jar target/WasteSense-AI-1.0.jar
- Java: JDK 11 or higher
- Operating System: Windows, macOS, or Linux
- Memory: Minimum 512 MB RAM
- Disk Space: 50 MB free space
- First Launch
  - The application creates a data/ folder automatically
  - Two JSON files are created: dataset.json and classification_history.json
- Adding Training Data
  - Click "Manage Dataset" button on the main screen
  - Click "Add Entry" button
  - Select category: Organic, Anorganic, or B3
  - Enter waste item text (e.g., "botol plastik bekas")
  - Click "Save"
- Training the Model
  - After adding dataset entries, click "Re-train Model"
  - Ensure each category has at least one entry
  - Model will be trained with TF-IDF vectors
- Classifying Waste
  - Enter waste description in the text area (e.g., "kaleng minuman aluminium")
  - Click "Classify" button
  - View the predicted category and confidence scores
- Click "Manage Dataset" → "Add Entry"
- Fill in category and text
- Word count is calculated automatically
- Double-click any row in the table, or
- Select row and click "Edit" button
- Modify and save changes
- Select row and click "Delete" button
- Confirm deletion
- Use the search bar to filter by text
- Use category dropdown to filter by category
- Click "Clear" to reset filters
- Click column headers to sort
- Click "View History", or open the history view from the menu if the button is not shown
- View all past classifications sorted by newest
- Click any row to see detailed confidence scores
- Click "Refresh" to reload data
TF-IDF (Term Frequency-Inverse Document Frequency)
- Converts text into numerical vectors
- Term Frequency (TF): How often a word appears in a document
- Inverse Document Frequency (IDF): How rare a word is across all documents
Cosine Similarity
- Measures similarity between input text and category documents
- For TF-IDF vectors, ranges from 0 (no terms in common) to 1 (same term proportions)
- The category with highest similarity score wins
Formula:
TF-IDF(term, doc) = TF(term, doc) × IDF(term)
IDF(term) = log(N / df(term))
Cosine Similarity = (A · B) / (||A|| × ||B||)
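The formulas above can be sketched in plain Java as follows. This is a minimal illustration, not the project's actual code: the class name `TfIdfSketch`, the whitespace tokenizer, the use of raw counts for TF, and the one-document-per-category layout are all assumptions made for the example.

```java
import java.util.*;

// Hypothetical sketch of TF-IDF + cosine similarity; not the project's NLPPipelineService.
class TfIdfSketch {
    // Assumed tokenizer: lowercase, split on whitespace.
    static List<String> tokenize(String text) {
        String t = text.toLowerCase().trim();
        return t.isEmpty() ? List.of() : Arrays.asList(t.split("\\s+"));
    }

    // IDF(term) = log(N / df(term)), where N is the number of documents.
    static Map<String, Double> idf(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs)
            for (String term : new HashSet<>(doc))
                df.merge(term, 1, Integer::sum);
        Map<String, Double> idf = new HashMap<>();
        for (var e : df.entrySet())
            idf.put(e.getKey(), Math.log((double) docs.size() / e.getValue()));
        return idf;
    }

    // TF-IDF(term, doc) = TF(term, doc) × IDF(term), with TF as a raw count.
    // Terms outside the training vocabulary are ignored.
    static Map<String, Double> vectorize(List<String> doc, Map<String, Double> idf) {
        Map<String, Double> vec = new HashMap<>();
        for (String term : doc)
            if (idf.containsKey(term))
                vec.merge(term, idf.get(term), Double::sum);
        return vec;
    }

    // Cosine Similarity = (A · B) / (||A|| × ||B||); 0 if either vector is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (var e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns the category whose document is most similar to the input.
    static String classify(String input, Map<String, String> categoryDocs) {
        List<String> names = new ArrayList<>(categoryDocs.keySet());
        List<List<String>> docs = new ArrayList<>();
        for (String name : names) docs.add(tokenize(categoryDocs.get(name)));
        Map<String, Double> idf = idf(docs);
        Map<String, Double> query = vectorize(tokenize(input), idf);
        String best = null;
        double bestScore = -1;
        for (int i = 0; i < names.size(); i++) {
            double score = cosine(query, vectorize(docs.get(i), idf));
            if (score > bestScore) { bestScore = score; best = names.get(i); }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("organic", "kulit pisang sisa makanan");
        docs.put("anorganic", "botol plastik kaleng minuman");
        docs.put("b3", "baterai bekas lampu neon");
        System.out.println(classify("botol plastik bekas", docs));
    }
}
```

With these three toy documents, "botol plastik bekas" shares two terms with the anorganic document and one with the b3 document, so the anorganic category scores highest. Note that with IDF defined as log(N / df), a term that appears in every document gets a weight of 0 and does not influence the result.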
- Format: JSON
- Location: data/ folder in application directory
- Files:
  - dataset.json - Training dataset
  - classification_history.json - Classification logs
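Creating this storage layout on first launch could look roughly like the sketch below. The class and method names are hypothetical, and seeding each file with an empty JSON array (`[]`) is an assumption based on both files holding arrays; the application may initialize them differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of first-launch setup; not the project's actual code.
class DataFilesSketch {
    // Creates <baseDir>/data and the two JSON files if they are missing.
    // Existing files are left untouched so stored data is never overwritten.
    static void ensureDataFiles(Path baseDir) throws IOException {
        Path dataDir = baseDir.resolve("data");
        Files.createDirectories(dataDir); // no-op if the folder already exists
        for (String name : new String[] {"dataset.json", "classification_history.json"}) {
            Path file = dataDir.resolve(name);
            if (Files.notExists(file)) {
                // Assumption: an empty JSON array is a valid initial state.
                Files.write(file, "[]".getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Uses a temporary directory so the demo does not touch real data.
        Path base = Files.createTempDirectory("wastesense-demo");
        ensureDataFiles(base);
        System.out.println("Created: " + base.resolve("data"));
    }
}
```

Checking `Files.notExists` before writing makes the call safe to repeat on every startup.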
Dataset Structure:
[
{
"id": 1,
"category": "organic",
"text": "kulit pisang",
"wordCount": 2
}
]
History Structure:
[
{
"timestamp": "2024-12-22T14:30:00",
"input": "botol plastik",
"classified": "anorganic",
"scores": {
"organic": 15.5,
"anorganic": 78.2,
"b3": 6.3
}
}
]
Domain-Driven Design (DDD)
presentation/ - UI components (Swing)
application/ - Application services
domain/ - Business logic and models
infrastructure/ - Persistence (JSON)
Key Components:
- DatasetRepository - Manages dataset CRUD operations
- ClassificationHistoryRepository - Manages history logs
- NLPPipelineService - Handles TF-IDF and classification
- PredictionUI - Main classification interface
- DatasetManagerUI - Dataset management interface
- ClassificationHistoryUI - History viewer
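To illustrate the kind of CRUD contract a dataset repository provides, here is a minimal in-memory sketch. It is not the project's `DatasetRepository`: the class name, method signatures, and auto-incrementing ids are assumptions, and only the entry fields (id, category, text, wordCount) follow the dataset structure shown earlier.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory repository sketch; names and signatures are illustrative only.
class DatasetRepositorySketch {
    // Mirrors the dataset JSON fields: id, category, text, wordCount.
    static final class Entry {
        final int id;
        final String category;
        final String text;
        final int wordCount;

        Entry(int id, String category, String text) {
            this.id = id;
            this.category = category;
            this.text = text;
            this.wordCount = countWords(text); // derived, never set by hand
        }
    }

    // Counts whitespace-separated words; blank text counts as 0.
    static int countWords(String text) {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    private final Map<Integer, Entry> entries = new LinkedHashMap<>();
    private int nextId = 1;

    Entry add(String category, String text) {
        Entry e = new Entry(nextId++, category, text);
        entries.put(e.id, e);
        return e;
    }

    // Replaces an existing entry; returns empty if the id is unknown.
    Optional<Entry> update(int id, String category, String text) {
        if (!entries.containsKey(id)) return Optional.empty();
        Entry e = new Entry(id, category, text);
        entries.put(id, e);
        return Optional.of(e);
    }

    boolean delete(int id) {
        return entries.remove(id) != null;
    }

    List<Entry> findAll() {
        return new ArrayList<>(entries.values());
    }

    List<Entry> findByCategory(String category) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries.values())
            if (e.category.equals(category)) result.add(e);
        return result;
    }

    public static void main(String[] args) {
        DatasetRepositorySketch repo = new DatasetRepositorySketch();
        Entry e = repo.add("organic", "kulit pisang");
        System.out.println(e.id + " " + e.category + " " + e.wordCount);
    }
}
```

Keeping persistence behind a small interface like this is what lets the UI and application services stay unaware of whether entries live in memory or in a JSON file.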
src/
└── main/
    └── java/
        └── org/
            └── waste_sense/
                ├── application/
                │   └── service/
                │       ├── ClassificationHistoryApplicationService.java
                │       ├── DatasetApplicationService.java
                │       ├── DatasetApplicationServiceTest.java
                │       ├── NLPPipelineService.java
                │       └── NLPPipelineServiceTest.java
                ├── domain/
                │   ├── dataset/
                │   │   ├── model/
                │   │   │   ├── DatasetEntry.java
                │   │   │   └── DatasetEntryTest.java
                │   │   └── repository/
                │   │       ├── DatasetPersistenceRepository.java
                │   │       └── DatasetRepository.java
                │   ├── history_record/
                │   │   ├── model/
                │   │   │   ├── ClassificationHistory.java
                │   │   │   ├── ClassificationHistoryTest.java
                │   │   │   ├── ClassificationScores.java
                │   │   │   └── ClassificationType.java
                │   │   └── repository/
                │   │       └── ClassificationHistoryRepository.java
                │   └── nlp/
                │       ├── model/
                │       │   └── ClassificationResult.java
                │       └── service/
                │           ├── Classificator.java
                │           ├── ClassificatorTest.java
                │           ├── CosineSimilarity.java
                │           ├── CosineSimilarityTest.java
                │           ├── Normalizer.java
                │           ├── NormalizerTest.java
                │           ├── Tokenizer.java
                │           ├── TokenizerTest.java
                │           ├── Vectorizer.java
                │           ├── VectorizerTest.java
                │           ├── VocabularyBuilder.java
                │           └── VocabularyBuilderTest.java
                ├── infrastructure/
                │   └── repository/
                │       ├── InMemoryDatasetRepository.java
                │       ├── InMemoryDatasetRepositoryTest.java
                │       ├── JsonClassificationHistoryRepository.java
                │       ├── JsonClassificationHistoryRepositoryTest.java
                │       ├── JsonPersistenceDatasetRepository.java
                │       └── JsonPersistenceDatasetRepositoryTest.java
                ├── Main.java
                └── presentation/
                    └── ui/
                        ├── ClassificationHistoryUI.java
                        ├── DatasetManagerUI.java
                        └── PredictionUI.java
This project is licensed under the MIT License - see the LICENSE file for details.