Production-grade AI evaluation system that scores, ranks, and improves AI-generated responses using machine learning and real-time APIs.
Scorix AI is a full-stack AI benchmarking platform that simulates enterprise-level LLM evaluation systems.
It allows you to:
- ⚡ Evaluate AI responses using ML models
- 🧠 Rank multiple responses intelligently
- 📊 Track evaluation logs & feedback
- 🔁 Retrain models with new datasets

The evaluation engine scores responses using:
- TF-IDF vectorization
- Cosine similarity
- Feature engineering

Output: a quality score from 0 to 10.
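
The relevance signal can be illustrated with scikit-learn: fit one TF-IDF vocabulary over the prompt and its candidate responses, then take the cosine similarity between the prompt row and each response row. This is a minimal sketch of the technique, not the project's code; the example sentences are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

prompt = "What is AI?"
responses = [
    "AI is the simulation of human intelligence in machines.",
    "The weather is sunny today.",
]

# Fit one vocabulary so the prompt and responses share the same TF-IDF space.
vectorizer = TfidfVectorizer(max_features=5000)
matrix = vectorizer.fit_transform([prompt] + responses)

# Row 0 is the prompt; compare it with every response row.
similarities = cosine_similarity(matrix[0], matrix[1:])[0]
print(similarities)
```

The on-topic answer shares more weighted terms with the prompt ("ai" and "is") than the off-topic one, so it gets the higher similarity.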

Ranking:
- Compare multiple responses to the same prompt
- Return a list ranked by score
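
Ranking reduces to scoring each candidate and sorting by score. A minimal sketch, assuming a scoring function with the signature `(prompt, response) -> float`; `rank_responses` and `toy_score` are hypothetical stand-ins for the trained model.

```python
from typing import Callable


def rank_responses(
    prompt: str,
    responses: list[str],
    score_fn: Callable[[str, str], float],
) -> list[dict]:
    """Score every candidate, then return them best-first with a 1-based rank."""
    scored = [{"response": r, "score": score_fn(prompt, r)} for r in responses]
    # sorted() stays stable with reverse=True, so tied scores keep input order.
    ranked = sorted(scored, key=lambda item: item["score"], reverse=True)
    for position, item in enumerate(ranked, start=1):
        item["rank"] = position
    return ranked


def toy_score(prompt: str, response: str) -> float:
    """Hypothetical stand-in for the model: one point per word, capped at 10."""
    return float(min(10, len(response.split())))


ranked = rank_responses(
    "What is AI?",
    [
        "AI is computer intelligence.",
        "AI is the simulation of human intelligence in machines.",
        "AI is random.",
    ],
    toy_score,
)
# With the toy scorer, the 9-word answer ranks first and "AI is random." last.
```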

Feedback:
- Store human feedback on responses
- Use it for retraining and improvement
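
Feedback only helps retraining if it is stored alongside the prompt and response it refers to. The project uses SQLAlchemy on SQLite; this sketch uses the standard-library `sqlite3` module for brevity, and the `feedback` table layout (including the 0 to 10 `human_score` column) is a hypothetical schema.

```python
import sqlite3
from typing import Optional


def init_feedback_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical feedback table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS feedback (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            prompt TEXT NOT NULL,
            response TEXT NOT NULL,
            model_score REAL,
            human_score REAL NOT NULL CHECK (human_score BETWEEN 0 AND 10)
        )
        """
    )


def store_feedback(
    conn: sqlite3.Connection,
    prompt: str,
    response: str,
    model_score: Optional[float],
    human_score: float,
) -> int:
    """Insert one feedback record and return its row id."""
    cur = conn.execute(
        "INSERT INTO feedback (prompt, response, model_score, human_score) VALUES (?, ?, ?, ?)",
        (prompt, response, model_score, human_score),
    )
    conn.commit()
    return cur.lastrowid


def feedback_as_training_rows(conn: sqlite3.Connection) -> list[tuple]:
    """Human scores become regression targets for the next retraining run."""
    return conn.execute(
        "SELECT prompt, response, human_score FROM feedback ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
init_feedback_table(conn)
store_feedback(conn, "What is AI?", "AI is random.", model_score=6.2, human_score=1.0)
```

The `CHECK` constraint rejects out-of-range human scores at the database level, so bad feedback never reaches the training set.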

Retraining:
- Upload a CSV dataset
- Retrain the model via the API
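
Before retraining, an uploaded CSV has to be checked against the columns the trainer expects. A minimal validation sketch; the column names `prompt`, `response`, and `score`, and the 0 to 10 range check, are assumptions about the dataset format, which this README does not specify.

```python
import csv
import io

# Hypothetical dataset schema; adjust to whatever the trainer actually expects.
REQUIRED_COLUMNS = {"prompt", "response", "score"}


def parse_training_csv(text: str) -> list[tuple[str, str, float]]:
    """Validate an uploaded CSV and return (prompt, response, score) rows."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = []
    for row in reader:
        score = float(row["score"])  # non-numeric scores raise ValueError here
        if not 0.0 <= score <= 10.0:
            # reader.line_num is the input line the current record ended on.
            raise ValueError(f"line {reader.line_num}: score {score} is outside 0-10")
        rows.append((row["prompt"], row["response"], score))
    return rows


sample = "prompt,response,score\nWhat is AI?,AI is random.,1.5\n"
rows = parse_training_csv(sample)
```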

Model pipeline:
- TF-IDF (5,000 features)
- Cosine similarity
- Length features
- Word count
- Keyword overlap

These features feed a Gradient Boosting Regressor.
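
The pipeline above can be sketched in scikit-learn as TF-IDF columns stacked next to a few dense hand-crafted features, with a `GradientBoostingRegressor` on top. This is a minimal sketch under assumptions: which text is vectorized (here, the response), the exact feature definitions, the helper names, and the clipping of predictions to the 0 to 10 range are illustrative choices, and the toy training data is made up.

```python
import re

import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def handcrafted_features(prompt: str, response: str, vectorizer: TfidfVectorizer) -> list[float]:
    """Cosine similarity, character length, word count and keyword overlap."""
    vecs = vectorizer.transform([prompt, response])
    similarity = cosine_similarity(vecs[0], vecs[1])[0, 0]
    prompt_words = set(re.findall(r"\w+", prompt.lower()))
    response_words = re.findall(r"\w+", response.lower())
    # Share of prompt words that reappear in the response.
    overlap = len(prompt_words & set(response_words)) / max(len(prompt_words), 1)
    return [similarity, float(len(response)), float(len(response_words)), overlap]


def build_matrix(pairs, vectorizer):
    """Stack sparse TF-IDF columns of each response with the dense features."""
    tfidf_part = vectorizer.transform([response for _, response in pairs])
    dense_part = csr_matrix(np.array([handcrafted_features(p, r, vectorizer) for p, r in pairs]))
    return hstack([tfidf_part, dense_part]).tocsr()


def train(pairs, scores):
    vectorizer = TfidfVectorizer(max_features=5000)
    vectorizer.fit([p for p, _ in pairs] + [r for _, r in pairs])
    model = GradientBoostingRegressor(random_state=0)
    model.fit(build_matrix(pairs, vectorizer), scores)
    return vectorizer, model


def predict_score(prompt, response, vectorizer, model) -> float:
    raw = model.predict(build_matrix([(prompt, response)], vectorizer))[0]
    return float(np.clip(raw, 0.0, 10.0))  # keep predictions on the 0-10 scale


# Made-up toy data; the real model is trained on the project's dataset.
pairs = [
    ("What is AI?", "AI is the simulation of human intelligence in machines."),
    ("What is AI?", "AI is computer intelligence."),
    ("What is AI?", "AI is random."),
    ("What is ML?", "Machine learning lets systems learn patterns from data."),
]
scores = [9.0, 6.0, 1.0, 8.5]
vectorizer, model = train(pairs, scores)
```

Keeping the TF-IDF block sparse avoids materializing a dense 5,000-column matrix, and `GradientBoostingRegressor` accepts the CSR matrix directly.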

Reported model performance:
- 🔥 R² score: 0.9658
- 🔥 MAE: 0.2558
- 📊 Dataset size: 50,000 samples
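
Metrics like these can be computed with scikit-learn's `r2_score` and `mean_absolute_error` on true versus predicted scores. The numbers below are toy values, not the project's evaluation data. MAE is expressed in the same units as the target score, while R² compares the model's squared error with that of always predicting the mean.

```python
from sklearn.metrics import mean_absolute_error, r2_score

# Toy true vs predicted quality scores on the 0-10 scale (made-up values).
y_true = [9.0, 6.0, 1.0, 8.5]
y_pred = [8.5, 6.5, 1.5, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # mean of |y_true - y_pred|
r2 = r2_score(y_true, y_pred)  # 1 - SS_res / SS_tot
print(f"MAE={mae:.4f}  R2={r2:.4f}")  # MAE=0.5000  R2=0.9751
```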

Architecture:

Frontend (Vanilla JS) → FastAPI backend → ML model (TF-IDF + Gradient Boosting) → Evaluation + ranking engine → Database (SQLite)

| Layer | Technology |
|---|---|
| Backend | FastAPI |
| ML Model | Scikit-learn |
| NLP | TF-IDF + Cosine Similarity |
| Database | SQLite |
| ORM | SQLAlchemy |
| Frontend | HTML, CSS, Vanilla JS |
| Server | Uvicorn |

Setup:
- Install dependencies: `pip install -r requirements.txt`
- Start the server: `python -m uvicorn backend.main:app --reload`

Once the server is running:
- App → http://localhost:8000
- Docs → http://localhost:8000/docs
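
API endpoints:
- `POST /evaluate`: score a single response. Example body: `{"prompt": "What is AI?", "response": "AI is the simulation of human intelligence in machines."}`
- `POST /rank`: rank several responses to the same prompt. Example body: `{"prompt": "What is AI?", "responses": ["AI is computer intelligence.", "AI is the simulation of human intelligence in machines.", "AI is random."]}`
- `POST /feedback`: store human feedback
- `POST /upload-dataset`: upload a CSV dataset
- `GET /health`: health check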

Highlights:
- ✔ End-to-end ML pipeline (dataset → training → deployment)
- ✔ Real-time scoring and ranking APIs
- ✔ Feature-engineered ML model (not just a black box)
- ✔ RLHF-style feedback system

Future improvements:
- 🔁 Reinforcement learning (RLHF loop)
- 📊 Model comparison leaderboard
- ⚡ Async batch evaluation
- 🧠 Deep learning-based scoring

Author: Shivansh Thakur (LinkedIn)
If you like this project, give it a ⭐ on GitHub 🚀