Skip to content

Latest commit

 

History

History
442 lines (348 loc) · 12.4 KB

File metadata and controls

442 lines (348 loc) · 12.4 KB

🎭 Customer Sentiment Analysis API

Production-ready sentiment analysis API powered by fine-tuned DistilBERT. Analyze customer reviews at scale with 90%+ accuracy.

Model Dataset Demo Python FastAPI License

🎯 What I learned with this project:

  • HuggingFace Expertise: Model fine-tuning, dataset creation, model hub publishing
  • Transformer Models: Fine-tuning DistilBERT for domain-specific tasks
  • Production ML: Model optimization, API deployment, monitoring
  • MLOps: Training pipelines, model versioning, A/B testing setup
  • API Development: FastAPI with async support, batch processing
  • Documentation: Comprehensive model cards, API docs, deployment guides

🌟 Features

graph LR
    A[Customer Review] --> B[FastAPI Endpoint]
    B --> C[Preprocessing]
    C --> D[DistilBERT Model]
    D --> E[Post-processing]
    E --> F[JSON Response]
    
    G[Batch Reviews] --> H[Async Processing]
    H --> D
    
    style D fill:#ffe1e1
    style F fill:#e1ffe1
Loading

API Capabilities

  • 🚀 Fast Inference: < 50ms response time
  • 📊 Batch Processing: Handle multiple reviews efficiently
  • 🎯 High Accuracy: 90.2% on test set
  • 📈 Confidence Scores: Get prediction confidence
  • 🔄 Async Support: Non-blocking requests
  • 📝 Comprehensive Logging: Track all predictions
  • 🐳 Docker Ready: One-command deployment

Model Features

  • Optimized: Quantized version available (4x smaller)
  • 🌍 Public: Published on HuggingFace Hub
  • 📚 Well-Documented: Complete model card
  • 🧪 Tested: 90+ unit and integration tests
  • 🔧 Flexible: Easy to fine-tune on your data

🚀 Quick Start

Try the Live Demo

🎮 Interactive Demo on HuggingFace Spaces

Run Locally

Docker (Recommended)

git clone https://github.com/IberaSoft/sentiment-analysis-api.git
cd sentiment-analysis-api
docker-compose up -d

# Test
curl -X POST "http://localhost:8000/api/v1/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "This product is amazing!"}'

Local Development

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000

Visit http://localhost:8000/docs for interactive API documentation.

Use Model Directly

from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="IberaSoft/customer-sentiment-analyzer"
)

result = classifier("This product exceeded my expectations!")
print(result)  # [{'label': 'positive', 'score': 0.9823}]

📡 API Overview

Base URL: http://localhost:8000/api/v1

Interactive Docs: Visit /docs for Swagger UI

Quick Example

# Single prediction
curl -X POST "http://localhost:8000/api/v1/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Great product!"}'

# Response
{
  "sentiment": "positive",
  "confidence": 0.94,
  "scores": {"positive": 0.94, "negative": 0.03, "neutral": 0.03},
  "processing_time_ms": 35
}

Main Endpoints

  • POST /predict - Analyze single text
  • POST /predict/batch - Analyze multiple texts (max 100)
  • GET /model/info - Model information and metrics
  • GET /health - Health check

Full API documentation: See docs/API.md

📊 Model Performance

Metrics

Metric Score
Accuracy 90.2%
F1 Score 0.89
Precision 0.90
Recall 0.89
Inference Time 35ms (CPU)

Confusion Matrix

                Predicted
              Pos  Neu  Neg
Actual Pos  [ 728   45   27 ]
       Neu  [  38  430   32 ]
       Neg  [  22   48  630 ]

Benchmark Results

Batch Size Throughput (req/s) Latency P95 (ms)
1 28 45
8 89 120
32 156 280

Tested on Intel i7-11700K

🏗️ Architecture

flowchart TB
    subgraph Client
        A[Web/Mobile App]
        B[Backend Service]
    end
    
    subgraph API Layer
        C[FastAPI Server]
        D[Request Validation]
        E[Response Formatting]
    end
    
    subgraph ML Layer
        F[Preprocessing]
        G[DistilBERT Model]
        H[Postprocessing]
    end
    
    subgraph Storage
        I[Model Cache]
        J[Logs]
    end
    
    A --> C
    B --> C
    C --> D
    D --> F
    F --> G
    G --> H
    H --> E
    E --> C
    G <--> I
    C --> J
    
    style G fill:#ffe1e1
    style C fill:#e3f2fd
Loading

🛠️ Tech Stack

Component Technology Purpose
ML Framework HuggingFace Transformers Model training & inference
Base Model DistilBERT Pre-trained transformer
API Framework FastAPI REST API server
Web Server Uvicorn ASGI server
Validation Pydantic Request/response validation
Testing Pytest Unit & integration tests
Load Testing Locust Performance testing
Containerization Docker Deployment
CI/CD GitHub Actions Automated testing & deployment
Monitoring Prometheus Metrics collection

📁 Project Structure

sentiment-analysis-api/
├── app/
│   ├── main.py                  # FastAPI application
│   ├── config.py                # Configuration
│   ├── api/
│   │   └── endpoints/
│   │       ├── predict.py       # Prediction endpoints
│   │       ├── batch.py         # Batch processing
│   │       └── health.py        # Health checks
│   ├── core/
│   │   ├── model.py             # Model loading & inference
│   │   ├── preprocessing.py     # Text preprocessing
│   │   └── cache.py             # Response caching
│   ├── schemas/
│   │   ├── request.py           # Request models
│   │   └── response.py          # Response models
│   └── utils/
│       ├── logger.py            # Logging configuration
│       └── metrics.py           # Prometheus metrics
│
├── training/
│   ├── prepare_dataset.py       # Dataset preparation
│   ├── train.py                 # Model training
│   ├── evaluate.py              # Model evaluation
│   ├── optimize.py              # Model optimization
│   └── configs/
│       └── training_config.yaml
│
├── tests/
│   ├── unit/
│   │   ├── test_preprocessing.py
│   │   ├── test_model.py
│   │   └── test_api.py
│   ├── integration/
│   │   └── test_end_to_end.py
│   └── load/
│       └── locustfile.py
│
├── scripts/
│   ├── download_data.py
│   ├── upload_to_hf.py
│   └── benchmark.py
│
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_model_training.ipynb
│   └── 03_error_analysis.ipynb
│
├── docs/
│   ├── API.md                  # API reference
│   ├── TRAINING.md             # Model training guide
│   ├── DEPLOYMENT.md           # Deployment options
│   ├── SPACES_GUIDE.md         # HuggingFace Spaces setup
│   ├── HF_TOKEN_GUIDE.md       # Token setup guide
│   └── TROUBLESHOOTING.md      # Common issues & solutions
│
├── .github/
│   └── workflows/
│       ├── test.yml
│       └── deploy.yml
│
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── requirements-dev.txt
├── .env.example
├── README.md
└── LICENSE

💻 Development

Setup

git clone https://github.com/IberaSoft/sentiment-analysis-api.git
cd sentiment-analysis-api
python -m venv venv
source venv/bin/activate
pip install -r requirements-dev.txt
pre-commit install

Train Model

cd training
python prepare_dataset.py --output-dir ./data
python train.py --config configs/training_config.yaml

python evaluate.py --model-dir ./models/customer-sentiment-v1
python ../scripts/upload_to_hf.py --model-dir ./models/customer-sentiment-v1

Full training guide: See docs/TRAINING.md

Run Tests

pytest tests/ -v                          # All tests
pytest tests/ --cov=app --cov-report=html # With coverage

🚀 Deployment

Docker

docker-compose up -d

HuggingFace Spaces

  1. Fork this repository
  2. Create Space on HuggingFace
  3. Connect GitHub repo
  4. Auto-deploys!

Spaces guide: See docs/SPACES_GUIDE.md

Cloud Deployment

Supports AWS, GCP, Azure, DigitalOcean, and more.

Full deployment guide: See docs/DEPLOYMENT.md

📈 Monitoring

Metrics: Available at /metrics (Prometheus format)

Logging: Structured JSON logs for easy parsing

🔧 Customization

Fine-tune on your own data:

python training/train.py \
  --base-model IberaSoft/customer-sentiment-analyzer \
  --dataset your-username/your-dataset \
  --output-dir ./models/custom-model

See docs/TRAINING.md for details.

📚 Documentation

🔗 Links

🗺️ Roadmap

v1.1 - Enhanced Features (Next)

  • Multi-language support (Spanish, French, German)
  • Aspect-based sentiment analysis
  • Confidence calibration improvements
  • Real-time model updates

v1.2 - Performance (Planned)

  • ONNX optimization
  • Model distillation (smaller, faster)
  • GPU batch processing
  • Response streaming

v2.0 - Advanced (Future)

  • Multi-model ensemble
  • Active learning pipeline
  • A/B testing framework
  • Explainability (SHAP, LIME)

v3.0 - Enterprise (Future)

  • Multi-tenancy support
  • Custom model training UI
  • Advanced analytics dashboard
  • SLA monitoring

🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

Ways to contribute:

  • 🐛 Report bugs
  • 💡 Suggest features
  • 📝 Improve documentation
  • 🧪 Add tests
  • 🎨 Improve UI/UX

📝 License

This project is licensed under the MIT License - see LICENSE for details.

🙏 Acknowledgments

  • HuggingFace for Transformers library and model hub
  • FastAPI team for excellent framework
  • DistilBERT authors for the efficient base model
  • Community for feedback and contributions

Project Links:


⭐ Star this repo if you find it useful!

Built with ❤️ by an aspiring AI/ML Engineer

GitHub stars GitHub forks

Try the live demo: HuggingFace Spaces