A smart waste classification system using Natural Language Processing (NLP) with TF-IDF and Cosine Similarity to categorize waste items into Organic, Anorganic, and B3 (Hazardous) categories.
Waste Classification
- Real-time text-based waste classification
- Supports Indonesian language input
- Three categories: Organic, Anorganic, B3 (Hazardous)
- Confidence scores for each category displayed as progress bars
- Uses TF-IDF (Term Frequency-Inverse Document Frequency) with Cosine Similarity
Dataset Management
- Add, edit, and delete training data
- View all dataset entries in a sortable table
- Search by text and filter by category
- Real-time word count calculation
- Persistent storage using JSON
- Re-train the model with the updated dataset
Classification History
- View all past classifications
- Detailed view with confidence scores
- Timestamps for each classification
- Sorted by newest first
- Persistent storage using JSON
User Interface
- Modern, intuitive GUI built with Java Swing
- Color-coded categories for easy identification
- Responsive design with split-pane views
- Hover effects and smooth interactions
Installation
Option 1: Download the Pre-built JAR
- Go to the Releases page and download WasteSense-1.0-SNAPSHOT.jar
- Run the application:
  java -jar WasteSense-1.0-SNAPSHOT.jar
  Or simply double-click the JAR file if your system has Java configured.
Option 2: Build from Source
- Clone the repository:
  git clone https://github.com/yourusername/wastesense-ai.git
  cd wastesense-ai
- Build with Maven:
  mvn clean package
- Run the application:
  java -jar target/WasteSense-AI-1.0.jar
System Requirements
- Java: JDK 11 or higher
- Operating System: Windows, macOS, or Linux
- Memory: Minimum 512 MB RAM
- Disk Space: 50 MB free space
First Launch
- The application creates a data/ folder automatically
- Two JSON files are created inside it: dataset.json and classification_history.json
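A minimal sketch of what this first-launch step could look like with java.nio.file, assuming missing files are seeded with an empty JSON array and existing files are left alone. DataFolderInitializer and ensureDataFiles are hypothetical names, not taken from the project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical first-launch helper (not the project's actual code).
final class DataFolderInitializer {

    private static final String[] DATA_FILES = {"dataset.json", "classification_history.json"};

    private DataFolderInitializer() {}

    /**
     * Ensures baseDir/data exists and contains both JSON files.
     * Missing files are created holding an empty JSON array; existing files are not touched.
     */
    static Path ensureDataFiles(Path baseDir) throws IOException {
        Path dataDir = baseDir.resolve("data");
        Files.createDirectories(dataDir); // does nothing if the folder already exists
        for (String name : DATA_FILES) {
            Path file = dataDir.resolve(name);
            if (Files.notExists(file)) {
                Files.write(file, "[]".getBytes(StandardCharsets.UTF_8));
            }
        }
        return dataDir;
    }
}
```

Seeding with "[]" means a later JSON reader always sees a valid, empty list instead of an empty file.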
Adding Training Data
- Click the "Manage Dataset" button on the main screen
- Click the "Add Entry" button
- Select a category: Organic, Anorganic, or B3
- Enter the waste item text (e.g., "botol plastik bekas", a used plastic bottle)
- Click "Save"
Training the Model
- After adding dataset entries, click "Re-train Model"
- Ensure each category has at least one entry
- The model is then trained on TF-IDF vectors built from the current dataset
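The algorithm notes further down compare the input against "category documents". One plausible way to build them, sketched here under that assumption, is to join every entry's text per category and refuse to train when a category is empty. TrainingSetBuilder, buildCategoryDocuments, and the WasteCategory enum are hypothetical; the project's Classificator and NLPPipelineService may work differently.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical category type; the project's ClassificationType may differ.
enum WasteCategory { ORGANIC, ANORGANIC, B3 }

final class TrainingSetBuilder {

    private TrainingSetBuilder() {}

    /**
     * Joins all entry texts of each category into one "category document".
     * Fails fast if any category has no entries, mirroring the rule that each category needs one.
     */
    static Map<WasteCategory, String> buildCategoryDocuments(
            Map<WasteCategory, List<String>> textsByCategory) {
        Map<WasteCategory, String> documents = new EnumMap<>(WasteCategory.class);
        for (WasteCategory category : WasteCategory.values()) {
            List<String> texts = textsByCategory.getOrDefault(category, List.of());
            if (texts.isEmpty()) {
                throw new IllegalStateException("No training entries for category " + category);
            }
            documents.put(category, String.join(" ", texts));
        }
        return documents;
    }
}
```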
Classifying Waste
- Enter a waste description in the text area (e.g., "kaleng minuman aluminium", an aluminium drink can)
- Click the "Classify" button
- View the predicted category and the confidence score for each category
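The sample classification_history.json record further down stores scores of 15.5, 78.2, and 6.3, which add up to 100. That suggests the raw cosine similarities are rescaled into percentages before they are displayed. The sketch below assumes simple proportional scaling; ScoreNormalizer, toPercentages, and argMax are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: converts raw similarity scores into display percentages.
final class ScoreNormalizer {

    private ScoreNormalizer() {}

    /** Scales non-negative scores proportionally so they sum to 100; an all-zero input stays all zero. */
    static Map<String, Double> toPercentages(Map<String, Double> similarities) {
        double total = 0.0;
        for (double score : similarities.values()) {
            total += score;
        }
        Map<String, Double> percentages = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : similarities.entrySet()) {
            percentages.put(e.getKey(), total == 0.0 ? 0.0 : 100.0 * e.getValue() / total);
        }
        return percentages;
    }

    /** Returns the key with the highest score (first key in iteration order wins ties), or null if empty. */
    static String argMax(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }
}
```

Proportional scaling does not change which category wins; it only makes the progress bars comparable.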
Adding a Dataset Entry
- Click "Manage Dataset" → "Add Entry"
- Fill in the category and text
- The word count is calculated automatically
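How words are counted is not specified further. A minimal sketch, assuming a word is any run of non-whitespace characters, which gives 2 for the "kulit pisang" sample entry (wordCount 2) shown further down. WordCounter and countWords are hypothetical names.

```java
// Hypothetical helper: word count as stored in each dataset entry's "wordCount" field.
final class WordCounter {

    private WordCounter() {}

    /** Counts runs of non-whitespace characters; null or blank text counts as zero words. */
    static int countWords(String text) {
        if (text == null || text.isBlank()) {
            return 0;
        }
        return text.trim().split("\\s+").length;
    }
}
```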
Editing a Dataset Entry
- Double-click any row in the table, or select a row and click the "Edit" button
- Modify the entry and save the changes
Deleting a Dataset Entry
- Select a row and click the "Delete" button
- Confirm the deletion
Searching and Filtering the Dataset
- Use the search bar to filter by text
- Use the category dropdown to filter by category
- Click "Clear" to reset the filters
- Click a column header to sort by that column
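Conceptually, the search bar and the category dropdown act as two filters applied together. A sketch with Java streams over a hypothetical DatasetRow type, assuming case-insensitive substring matching for the search text; DatasetRow, DatasetFilter, and filter are illustrative names only.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical row type; the project's DatasetEntry may look different.
final class DatasetRow {
    final int id;
    final String category;
    final String text;

    DatasetRow(int id, String category, String text) {
        this.id = id;
        this.category = category;
        this.text = text;
    }
}

final class DatasetFilter {

    private DatasetFilter() {}

    /**
     * Keeps rows whose text contains the query (case-insensitive) and whose category matches.
     * A null or blank query, or a null category, disables that filter. Results are sorted by id.
     */
    static List<DatasetRow> filter(List<DatasetRow> rows, String query, String category) {
        String needle = query == null ? "" : query.trim().toLowerCase(Locale.ROOT);
        return rows.stream()
                .filter(r -> needle.isEmpty() || r.text.toLowerCase(Locale.ROOT).contains(needle))
                .filter(r -> category == null || r.category.equalsIgnoreCase(category))
                .sorted(Comparator.comparingInt((DatasetRow r) -> r.id))
                .collect(Collectors.toList());
    }
}
```

In a Swing JTable, the same behaviour is commonly provided by javax.swing.table.TableRowSorter (which also gives click-to-sort column headers) combined with javax.swing.RowFilter; whether DatasetManagerUI uses them is not stated here.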
Viewing Classification History
- Click "View History" (if available) or open the history view from the menu
- View all past classifications, newest first
- Click any row to see its detailed confidence scores
- Click "Refresh" to reload the data
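Newest-first ordering can be derived from the ISO-8601 timestamps stored in each history record (for example "2024-12-22T14:30:00"). A sketch using java.time.LocalDateTime; HistoryRow and HistorySorter are hypothetical names, not the project's ClassificationHistory model.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical history row mirroring the JSON fields "timestamp", "input", and "classified".
final class HistoryRow {
    final LocalDateTime timestamp;
    final String input;
    final String classified;

    HistoryRow(String isoTimestamp, String input, String classified) {
        this.timestamp = LocalDateTime.parse(isoTimestamp); // ISO-8601, e.g. "2024-12-22T14:30:00"
        this.input = input;
        this.classified = classified;
    }
}

final class HistorySorter {

    private HistorySorter() {}

    /** Returns a new list ordered from the newest to the oldest timestamp; the input is not modified. */
    static List<HistoryRow> newestFirst(List<HistoryRow> history) {
        List<HistoryRow> sorted = new ArrayList<>(history);
        sorted.sort(Comparator.comparing((HistoryRow row) -> row.timestamp).reversed());
        return sorted;
    }
}
```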
TF-IDF (Term Frequency-Inverse Document Frequency)
- Converts text into numerical vectors
- Term Frequency (TF): How often a word appears in a document
- Inverse Document Frequency (IDF): How rare a word is across all documents; words that appear in many documents receive a lower weight
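A compact sketch of TF-IDF vectorization, assuming raw term counts for TF, the natural logarithm for IDF(term) = log(N / df(term)), and a tokenizer that lowercases and splits on anything that is not a letter or digit. The TfIdf class is hypothetical; the project splits this work across its own Normalizer, Tokenizer, VocabularyBuilder, and Vectorizer classes, whose exact behaviour may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical TF-IDF sketch: TF = raw count, IDF = ln(N / df). Not the project's Vectorizer.
final class TfIdf {

    final List<String> vocabulary; // sorted terms; index i is vector dimension i
    private final Map<String, Double> idf = new HashMap<>();

    /** Learns the vocabulary and IDF weights from the given documents. */
    TfIdf(List<String> documents) {
        Map<String, Integer> documentFrequency = new HashMap<>();
        for (String doc : documents) {
            for (String term : new HashSet<>(tokenize(doc))) { // each term counted once per document
                documentFrequency.merge(term, 1, Integer::sum);
            }
        }
        vocabulary = new ArrayList<>(new TreeSet<>(documentFrequency.keySet()));
        int n = documents.size();
        for (Map.Entry<String, Integer> e : documentFrequency.entrySet()) {
            idf.put(e.getKey(), Math.log((double) n / e.getValue()));
        }
    }

    /** Lowercases the text and splits it on any character that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Builds the TF-IDF vector of a text; words not seen during fitting are ignored. */
    double[] vectorize(String text) {
        Map<String, Integer> termFrequency = new HashMap<>();
        for (String term : tokenize(text)) {
            termFrequency.merge(term, 1, Integer::sum);
        }
        double[] vector = new double[vocabulary.size()];
        for (int i = 0; i < vector.length; i++) {
            String term = vocabulary.get(i);
            vector[i] = termFrequency.getOrDefault(term, 0) * idf.get(term);
        }
        return vector;
    }
}
```

With IDF(term) = log(N / df(term)), a term that occurs in every document gets weight 0, so with only three category documents, words shared by all three categories do not influence the result.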
Cosine Similarity
- Measures similarity between input text and category documents
- Ranges from 0 (no shared terms) to 1 (identical term proportions), since TF-IDF weights are never negative
- The category with the highest similarity score wins
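A sketch of cosine similarity and the "highest score wins" rule, operating on plain double[] vectors such as those a TF-IDF vectorizer produces. Cosine, similarity, and bestCategory are hypothetical names (the project has its own CosineSimilarity class). Returning 0 when a vector is all zeros, for example an input containing no known words, is an assumption, because the formula divides by zero in that case.

```java
import java.util.Map;

// Hypothetical helper; the project's CosineSimilarity class may have a different API.
final class Cosine {

    private Cosine() {}

    /** Computes dot(A, B) / (||A|| * ||B||); returns 0 if either vector is all zeros. */
    static double similarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the category whose vector is most similar to the input; the first one wins ties. */
    static String bestCategory(Map<String, double[]> categoryVectors, double[] input) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : categoryVectors.entrySet()) {
            double score = similarity(input, e.getValue());
            if (score > bestScore) {
                best = e.getKey();
                bestScore = score;
            }
        }
        return best;
    }
}
```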
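Formulas:
TF-IDF(term, doc) = TF(term, doc) × IDF(term)
IDF(term) = log(N / df(term))
Cosine Similarity = (A · B) / (||A|| × ||B||)
where N is the total number of documents, df(term) is the number of documents containing the term, A and B are the TF-IDF vectors being compared, A · B is their dot product, and ||A|| is the Euclidean length of A.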
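A small worked example of the formulas above, assuming N = 3 documents (one per category) and the natural logarithm:

```latex
% Worked example, assuming N = 3 documents and the natural logarithm
\mathrm{IDF}(\text{term}) = \log\frac{N}{\mathrm{df}(\text{term})}:\quad
  \log\frac{3}{1} \approx 1.099 \ (\text{term in one document}),\qquad
  \log\frac{3}{3} = 0 \ (\text{term in every document})

\text{TF-IDF}(\text{term}, \text{doc}) = \mathrm{TF} \times \mathrm{IDF}
  = 2 \times \log 3 \approx 2.197 \ (\text{term occurs twice in doc and in no other document})

A = (1, 2, 0),\ B = (2, 1, 0):\quad
\frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{1 \cdot 2 + 2 \cdot 1 + 0 \cdot 0}{\sqrt{5}\,\sqrt{5}} = \frac{4}{5} = 0.8
```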
Data Storage
- Format: JSON
- Location: data/ folder in the application directory
- Files:
  - dataset.json: training dataset
  - classification_history.json: classification logs
Dataset Structure:
[
  {
    "id": 1,
    "category": "organic",
    "text": "kulit pisang",
    "wordCount": 2
  }
]
History Structure:
[
  {
    "timestamp": "2024-12-22T14:30:00",
    "input": "botol plastik",
    "classified": "anorganic",
    "scores": {
      "organic": 15.5,
      "anorganic": 78.2,
      "b3": 6.3
    }
  }
]
Architecture: Domain-Driven Design (DDD)
presentation/ - UI components (Swing)
application/ - Application services
domain/ - Business logic and models
infrastructure/ - Persistence (JSON)
Key Components:
- DatasetRepository: manages dataset CRUD operations
- ClassificationHistoryRepository: manages history logs
- NLPPipelineService: handles TF-IDF and classification
- PredictionUI: main classification interface
- DatasetManagerUI: dataset management interface
- ClassificationHistoryUI: history viewer
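The layering above implies that application services depend on a domain-level repository interface, while in-memory or JSON-backed implementations live in the infrastructure layer (compare DatasetRepository and InMemoryDatasetRepository in the tree below). A sketch of that split; EntryRepository, InMemoryEntryRepository, and all method names are hypothetical and do not describe the project's actual interfaces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Domain layer (hypothetical port): application services depend only on this interface.
interface EntryRepository {
    int add(String category, String text);                // returns the generated id
    Optional<String> findText(int id);
    boolean update(int id, String category, String text); // false if the id does not exist
    boolean delete(int id);                               // false if the id does not exist
    List<Integer> ids();                                  // ids in insertion order
}

// Infrastructure layer (hypothetical adapter): a JSON-backed class could implement the same interface.
final class InMemoryEntryRepository implements EntryRepository {

    private static final class Row {
        final String category;
        final String text;

        Row(String category, String text) {
            this.category = category;
            this.text = text;
        }
    }

    private final Map<Integer, Row> rows = new LinkedHashMap<>();
    private int nextId = 1;

    @Override
    public int add(String category, String text) {
        int id = nextId++;
        rows.put(id, new Row(category, text));
        return id;
    }

    @Override
    public Optional<String> findText(int id) {
        return Optional.ofNullable(rows.get(id)).map(row -> row.text);
    }

    @Override
    public boolean update(int id, String category, String text) {
        if (!rows.containsKey(id)) {
            return false;
        }
        rows.put(id, new Row(category, text));
        return true;
    }

    @Override
    public boolean delete(int id) {
        return rows.remove(id) != null;
    }

    @Override
    public List<Integer> ids() {
        return new ArrayList<>(rows.keySet());
    }
}
```

Because services only see the interface, swapping the in-memory adapter for a JSON-backed one (or the reverse, in tests) requires no change outside the infrastructure layer.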
Project Structure
src/
└── main/
└── java/
└── org/
└── waste_sense/
├── application/
│ └── service/
│ ├── ClassificationHistoryApplicationService.java
│ ├── DatasetApplicationService.java
│ ├── DatasetApplicationServiceTest.java
│ ├── NLPPipelineService.java
│ └── NLPPipelineServiceTest.java
├── domain/
│ ├── dataset/
│ │ ├── model/
│ │ │ ├── DatasetEntry.java
│ │ │ └── DatasetEntryTest.java
│ │ └── repository/
│ │ ├── DatasetPersistenceRepository.java
│ │ └── DatasetRepository.java
│ ├── history_record/
│ │ ├── model/
│ │ │ ├── ClassificationHistory.java
│ │ │ ├── ClassificationHistoryTest.java
│ │ │ ├── ClassificationScores.java
│ │ │ └── ClassificationType.java
│ │ └── repository/
│ │ └── ClassificationHistoryRepository.java
│ └── nlp/
│ ├── model/
│ │ └── ClassificationResult.java
│ └── service/
│ ├── Classificator.java
│ ├── ClassificatorTest.java
│ ├── CosineSimilarity.java
│ ├── CosineSimilarityTest.java
│ ├── Normalizer.java
│ ├── NormalizerTest.java
│ ├── Tokenizer.java
│ ├── TokenizerTest.java
│ ├── Vectorizer.java
│ ├── VectorizerTest.java
│ ├── VocabularyBuilder.java
│ └── VocabularyBuilderTest.java
├── infrastructure/
│ └── repository/
│ ├── InMemoryDatasetRepository.java
│ ├── InMemoryDatasetRepositoryTest.java
│ ├── JsonClassificationHistoryRepository.java
│ ├── JsonClassificationHistoryRepositoryTest.java
│ ├── JsonPersistenceDatasetRepository.java
│ └── JsonPersistenceDatasetRepositoryTest.java
├── Main.java
└── presentation/
└── ui/
├── ClassificationHistoryUI.java
├── DatasetManagerUI.java
└── PredictionUI.java
This project is licensed under the MIT License - see the LICENSE file for details.