WasteSense AI - Waste Classification System

A smart waste classification system that uses Natural Language Processing (NLP), specifically TF-IDF with cosine similarity, to sort waste items into three categories: Organic, Anorganic (inorganic), and B3 (Hazardous).

✨ Features

1. Waste Classification

  • Real-time text-based waste classification
  • Supports Indonesian language input
  • Three categories: Organic, Anorganic, B3 (Hazardous)
  • Confidence scores for each category displayed as progress bars
  • Uses TF-IDF (Term Frequency-Inverse Document Frequency) with Cosine Similarity

2. Dataset Management

  • Add, edit, and delete training data
  • View all dataset entries in a sortable table
  • Search and filter by category
  • Real-time word count calculation
  • Persistent storage using JSON
  • Re-train model with updated dataset

3. Classification History

  • View all past classifications
  • Detailed view with confidence scores
  • Timestamps for each classification
  • Sorted by newest first
  • Persistent storage using JSON

4. User Interface

  • Modern, intuitive GUI built with Java Swing
  • Color-coded categories for easy identification
  • Responsive design with split-pane views
  • Hover effects and smooth interactions

🚀 How to Run

Option 1: Using Pre-built JAR (Recommended)

  1. Download the JAR file

    • Go to the Releases page
    • Download WasteSense-1.0-SNAPSHOT.jar
  2. Run the application

    java -jar WasteSense-1.0-SNAPSHOT.jar

    Or simply double-click the JAR file if your system has Java configured.

Option 2: Building from Source

  1. Clone the repository

    git clone https://github.com/yourusername/wastesense-ai.git
    cd wastesense-ai
  2. Build with Maven

    mvn clean package
  3. Run the application

    java -jar target/WasteSense-AI-1.0.jar

    If Maven produces a differently named JAR, use the file name found in target/.

💻 System Requirements

  • Java: JDK 11 or higher
  • Operating System: Windows, macOS, or Linux
  • Memory: Minimum 512 MB RAM
  • Disk Space: 50 MB free space

📖 Usage Guide

Getting Started

  1. First Launch

    • The application creates a data/ folder automatically
    • Two JSON files are created: dataset.json and classification_history.json
  2. Adding Training Data

    • Click "Manage Dataset" button on the main screen
    • Click "Add Entry" button
    • Select category: Organic, Anorganic, or B3
    • Enter waste item text (e.g., "botol plastik bekas")
    • Click "Save"
  3. Training the Model

    • Make sure each category has at least one entry
    • After adding dataset entries, click "Re-train Model"
    • The model is rebuilt from the TF-IDF vectors of the updated dataset
  4. Classifying Waste

    • Enter waste description in the text area (e.g., "kaleng minuman aluminium")
    • Click "Classify" button
    • View the predicted category and confidence scores

Dataset Management

Adding Data

  • Click "Manage Dataset" → "Add Entry"
  • Fill in category and text
  • Word count is calculated automatically

Editing Data

  • Double-click any row in the table, or
  • Select row and click "Edit" button
  • Modify and save changes

Deleting Data

  • Select row and click "Delete" button
  • Confirm deletion

Searching & Filtering

  • Use the search bar to filter by text
  • Use category dropdown to filter by category
  • Click "Clear" to reset filters
  • Click column headers to sort

Classification History

  • Click "View History" if the button is available, or open the history from the menu
  • View all past classifications sorted by newest
  • Click any row to see detailed confidence scores
  • Click "Refresh" to reload data

🔧 Technical Details

Classification Algorithm

TF-IDF (Term Frequency-Inverse Document Frequency)

  • Converts text into numerical vectors
  • Term Frequency (TF): How often a word appears in a document
  • Inverse Document Frequency (IDF): How rare a word is across all documents; rarer words receive a higher weight

Cosine Similarity

  • Measures similarity between input text and category documents
  • For TF-IDF vectors, which have no negative components, ranges from 0 (no shared terms) to 1 (vectors point in the same direction)
  • The category with highest similarity score wins

Formula:

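TF-IDF(term, doc) = TF(term, doc) × IDF(term)
IDF(term) = log(N / df(term))
  where N is the total number of documents and df(term) is the number of documents that contain the term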

Cosine Similarity = (A · B) / (||A|| × ||B||)
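
The formulas above can be sketched in plain Java as follows. This is a minimal illustration, not the project's own code: the class name TfIdfSketch, its method names, the whitespace tokenizer, and the use of raw counts for TF are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names are hypothetical and do not mirror
// the project's Vectorizer or CosineSimilarity classes.
final class TfIdfSketch {

    // Assumed, simplified tokenizer: lowercase, then split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // IDF(term) = log(N / df(term)), where df counts documents containing the term.
    static Map<String, Double> idf(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log((double) docs.size() / e.getValue()));
        }
        return idf;
    }

    // TF-IDF(term, doc) = TF(term, doc) x IDF(term), with TF as a raw count.
    // Terms that are missing from the IDF table are ignored.
    static Map<String, Double> vectorize(List<String> doc, Map<String, Double> idf) {
        Map<String, Double> vec = new HashMap<>();
        for (String term : doc) {
            Double weight = idf.get(term);
            if (weight != null) vec.merge(term, weight, Double::sum);
        }
        return vec;
    }

    // Cosine similarity = (A . B) / (||A|| x ||B||); returns 0 if either vector is zero.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) normB += v * v;
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

With one document per category, an input such as "botol plastik bekas" would score highest against a document containing "botol plastik", lower against one that shares only "bekas", and 0 against one with no shared terms. Note that with this IDF definition a term that occurs in every document gets weight log(1) = 0 and so contributes nothing to the similarity.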

Data Storage

  • Format: JSON
  • Location: data/ folder in application directory
  • Files:
    • dataset.json - Training dataset
    • classification_history.json - Classification logs

Dataset Structure:

[
  {
    "id": 1,
    "category": "organic",
    "text": "kulit pisang",
    "wordCount": 2
  }
]
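
As a rough illustration of how an entry like the one above could be modelled, the sketch below mirrors the four JSON fields and derives wordCount from the text. The class name DatasetEntrySketch and the rule "count whitespace-separated tokens" are assumptions; the project's DatasetEntry class may be shaped differently.

```java
// Hypothetical mirror of one dataset.json entry, for illustration only.
final class DatasetEntrySketch {
    final int id;
    final String category;
    final String text;
    final int wordCount;

    private DatasetEntrySketch(int id, String category, String text, int wordCount) {
        this.id = id;
        this.category = category;
        this.text = text;
        this.wordCount = wordCount;
    }

    // Builds an entry and derives wordCount from the text.
    static DatasetEntrySketch of(int id, String category, String text) {
        return new DatasetEntrySketch(id, category, text, countWords(text));
    }

    // Assumed rule: words are runs of non-whitespace characters.
    static int countWords(String text) {
        String trimmed = text == null ? "" : text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}
```

Under this rule, "kulit pisang" yields a wordCount of 2, which matches the sample entry.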

History Structure:

[
  {
    "timestamp": "2024-12-22T14:30:00",
    "input": "botol plastik",
    "classified": "anorganic",
    "scores": {
      "organic": 15.5,
      "anorganic": 78.2,
      "b3": 6.3
    }
  }
]
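
The sample scores sum to 100, which suggests they are percentages rather than raw cosine values, though how the application derives them is not stated here. The sketch below shows one possible approach under that assumption: scale raw similarities so they sum to 100, then pick the highest. ScoreSketch and both method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper; not the project's ClassificationScores class.
final class ScoreSketch {

    // Assumed normalization: each similarity as a share of the total, times 100.
    // If every similarity is 0, all percentages are 0.
    static Map<String, Double> toPercentages(Map<String, Double> similarities) {
        double total = 0;
        for (double v : similarities.values()) total += v;
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : similarities.entrySet()) {
            result.put(e.getKey(), total == 0 ? 0.0 : 100.0 * e.getValue() / total);
        }
        return result;
    }

    // Returns the category with the highest score, or null for an empty map.
    // On a tie, the first category in iteration order is kept.
    static String topCategory(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }
}
```

Applied to the sample scores, topCategory returns "anorganic", which agrees with the "classified" field in the record.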

Architecture

Domain-Driven Design (DDD)

presentation/     - UI components (Swing)
application/      - Application services
domain/          - Business logic and models
infrastructure/  - Persistence (JSON)

Key Components:

  • DatasetRepository - Manages dataset CRUD operations
  • ClassificationHistoryRepository - Manages history logs
  • NLPPipelineService - Handles TF-IDF and classification
  • PredictionUI - Main classification interface
  • DatasetManagerUI - Dataset management interface
  • ClassificationHistoryUI - History viewer
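
To illustrate the layering, here is a minimal sketch of the repository pattern the components above imply: the domain layer declares an interface, and the infrastructure layer supplies an implementation (in memory here, JSON-backed in the real application). All type and method names below are hypothetical; the project's DatasetRepository may expose a different API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain model, for illustration only.
final class WasteEntry {
    final int id;
    final String category;
    final String text;

    WasteEntry(int id, String category, String text) {
        this.id = id;
        this.category = category;
        this.text = text;
    }
}

// Domain-side contract: callers depend on this, not on the storage details.
interface WasteEntryRepository {
    WasteEntry add(String category, String text);
    Optional<WasteEntry> findById(int id);
    List<WasteEntry> findAll();
    boolean deleteById(int id);
}

// Infrastructure-side implementation that keeps entries in insertion order.
final class InMemoryWasteEntryRepository implements WasteEntryRepository {
    private final Map<Integer, WasteEntry> entries = new LinkedHashMap<>();
    private int nextId = 1;

    @Override
    public WasteEntry add(String category, String text) {
        WasteEntry entry = new WasteEntry(nextId++, category, text);
        entries.put(entry.id, entry);
        return entry;
    }

    @Override
    public Optional<WasteEntry> findById(int id) {
        return Optional.ofNullable(entries.get(id));
    }

    @Override
    public List<WasteEntry> findAll() {
        return new ArrayList<>(entries.values());
    }

    @Override
    public boolean deleteById(int id) {
        return entries.remove(id) != null;
    }
}
```

Because services and UI code would depend only on the interface, a JSON-backed implementation could be swapped in without changing them, and the in-memory one stays useful for tests.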

📁 Project Structure

src/
└── main/
   └── java/
       └── org/
           └── waste_sense/
               ├── application/
               │   └── service/
               │       ├── ClassificationHistoryApplicationService.java
               │       ├── DatasetApplicationService.java
               │       ├── DatasetApplicationServiceTest.java
               │       ├── NLPPipelineService.java
               │       └── NLPPipelineServiceTest.java
               ├── domain/
               │   ├── dataset/
               │   │   ├── model/
               │   │   │   ├── DatasetEntry.java
               │   │   │   └── DatasetEntryTest.java
               │   │   └── repository/
               │   │       ├── DatasetPersistenceRepository.java
               │   │       └── DatasetRepository.java
               │   ├── history_record/
               │   │   ├── model/
               │   │   │   ├── ClassificationHistory.java
               │   │   │   ├── ClassificationHistoryTest.java
               │   │   │   ├── ClassificationScores.java
               │   │   │   └── ClassificationType.java
               │   │   └── repository/
               │   │       └── ClassificationHistoryRepository.java
               │   └── nlp/
               │       ├── model/
               │       │   └── ClassificationResult.java
               │       └── service/
               │           ├── Classificator.java
               │           ├── ClassificatorTest.java
               │           ├── CosineSimilarity.java
               │           ├── CosineSimilarityTest.java
               │           ├── Normalizer.java
               │           ├── NormalizerTest.java
               │           ├── Tokenizer.java
               │           ├── TokenizerTest.java
               │           ├── Vectorizer.java
               │           ├── VectorizerTest.java
               │           ├── VocabularyBuilder.java
               │           └── VocabularyBuilderTest.java
               ├── infrastructure/
               │   └── repository/
               │       ├── InMemoryDatasetRepository.java
               │       ├── InMemoryDatasetRepositoryTest.java
               │       ├── JsonClassificationHistoryRepository.java
               │       ├── JsonClassificationHistoryRepositoryTest.java
               │       ├── JsonPersistenceDatasetRepository.java
               │       └── JsonPersistenceDatasetRepositoryTest.java
               ├── Main.java
               └── presentation/
                   └── ui/
                       ├── ClassificationHistoryUI.java
                       ├── DatasetManagerUI.java
                       └── PredictionUI.java

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.