diff --git a/CLAUDE.md b/CLAUDE.md
index a8846e2..92d0781 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-Bishop State Student Success Prediction - Full-stack ML + web application predicting student outcomes for Bishop State Community College. Uses 5 ML models to generate retention predictions, early warnings, time-to-credential estimates, credential type forecasts, and GPA predictions for ~4K students.
+Bishop State Student Success Prediction - Full-stack ML + web application predicting student outcomes for Bishop State Community College. Uses 6 ML models to generate retention predictions, time-to-credential estimates, credential type forecasts, gateway math/English success predictions, and first-semester low-GPA predictions for ~4K students. Authoritative AI/ML inventory: `codebenders-dashboard/content/ai-transparency.ts`.
 
 ## Tech Stack
 
@@ -15,14 +15,14 @@ Bishop State Student Success Prediction - Full-stack ML + web application predic
 | Charts | Recharts |
 | UI Components | shadcn/ui (Radix UI) |
 | Database | Postgres (Supabase), pg driver |
-| AI Features | OpenAI (natural language query analysis) |
+| AI Features | OpenAI gpt-4o-mini for: NL query → SQL analyzer (`codebenders-dashboard/app/api/analyze`), query result summarizer (`codebenders-dashboard/app/api/query-summary`), course-pairing explainer (`codebenders-dashboard/app/api/courses/explain-pairing`). Rule-based fallback at `codebenders-dashboard/lib/prompt-analyzer.ts`. Authoritative list: `codebenders-dashboard/content/ai-transparency.ts`. |
 | Infrastructure | Docker Compose, Vercel |
 
 ## Key Directories
 
 | Directory | Purpose |
 |-----------|---------|
-| `ai_model/` | Python ML pipeline - 5 models (XGBoost + Random Forest) |
+| `ai_model/` | Python ML pipeline - 6 models (XGBoost + Random Forest). Authoritative inventory: `codebenders-dashboard/content/ai-transparency.ts`. |
 | `codebenders-dashboard/` | Next.js web application |
 | `codebenders-dashboard/app/` | App Router pages and API routes |
 | `codebenders-dashboard/components/` | React components (shadcn/ui based) |
@@ -87,6 +87,7 @@ Check these files for detailed information on specific topics:
 
 | Topic | File |
 |-------|------|
+| AI / ML surface inventory (models, LLM routes, data flows) | `codebenders-dashboard/content/ai-transparency.ts` |
 | Architectural patterns | `.claude/docs/architectural_patterns.md` |
 | Project overview | `README.md` |
 | Quick start guide | `QUICKSTART.md` |
diff --git a/README.md b/README.md
index 35f124d..5768964 100644
--- a/README.md
+++ b/README.md
@@ -17,23 +17,26 @@ A comprehensive machine learning pipeline for predicting student success outcome
 
 ## 🎯 Overview
 
-This project implements five machine learning models to predict various aspects of student success:
+This project implements six machine learning models to predict various aspects of student success (authoritative names and data flows: [`codebenders-dashboard/content/ai-transparency.ts`](codebenders-dashboard/content/ai-transparency.ts)):
 
 1. **Retention Prediction** - Will the student be retained?
-2. **Early Warning System** - Is the student at risk?
-3. **Time-to-Credential** - How long until graduation?
-4. **Credential Type** - What credential will they earn?
-5. **Course Success** - What will their GPA be?
+2. **Time-to-Credential** - How long until credential completion?
+3. **Credential Type** - What credential will they earn?
+4. **Gateway Math Success** - Will the student succeed in gateway math?
+5. **Gateway English Success** - Will the student succeed in gateway English?
+6. **First-Semester Low-GPA Prediction** - Is the student at risk of a low first-semester GPA?
 
 The models use demographic, academic preparation, enrollment, and course performance data to generate actionable predictions for student support services.
 
+The Next.js dashboard adds **natural language query (NLQ)** features: three OpenAI `gpt-4o-mini` API routes (`codebenders-dashboard/app/api/analyze/route.ts`, `codebenders-dashboard/app/api/query-summary/route.ts`, `codebenders-dashboard/app/api/courses/explain-pairing/route.ts`), a **rule-based fallback** in `codebenders-dashboard/lib/prompt-analyzer.ts`, and (when not using direct database mode) an **external data API** at `schools.syntex-ai.com`. See the same `ai-transparency.ts` file for the full inventory.
+
 ## 📁 Project Structure
 
 ```
 codebenders-datathon/
 ├── ai_model/                          # Machine learning models and scripts
 │   ├── __init__.py                    # Package initialization
-│   ├── complete_ml_pipeline.py        # Main ML pipeline (5 models)
+│   ├── complete_ml_pipeline.py        # Main ML pipeline (6 models)
 │   ├── generate_bishop_state_data.py  # Synthetic data generation
 │   └── merge_bishop_state_data.py     # Data merging script
 │
@@ -59,11 +62,11 @@ codebenders-datathon/
 
 ### Prediction Capabilities
 
-- **Retention Risk Assessment**: Identify students at risk of not returning
-- **Early Warning Alerts**: Four-level alert system (URGENT, HIGH, MODERATE, LOW)
+- **Retention Risk Assessment**: Retention probability and risk categories; dashboard alert views (URGENT / HIGH / MODERATE / LOW) are driven by these signals
 - **Graduation Timeline**: Predict time to credential completion
 - **Credential Path**: Forecast credential type (Certificate, Associate's, Bachelor's)
-- **Academic Performance**: Predict expected GPA and identify over/underperformers
+- **Gateway Success**: Predict gateway math and English completion outcomes
+- **Early Academic Risk**: First-semester low-GPA risk prediction
 
 ### Technical Features
 
@@ -139,7 +142,7 @@ python complete_ml_pipeline.py
 This will:
 1. Test database connection
 2. Load and preprocess data
-3. Train all 5 models
+3. Train all 6 models
 4. Generate predictions for all students
 5. Save results to **Postgres database** (or CSV files as fallback)
 6. Save model performance metrics to database
@@ -195,47 +198,31 @@ For more details, see [operations/README.md](operations/README.md).
 
 ## 🤖 Models
 
-### 1. Retention Prediction Model
-
-**Algorithm**: XGBoost Classifier
-**Target**: Binary (Retained / Not Retained)
-**Features**: 40+ demographic, academic, and performance features
-
-**Output**:
-- `retention_probability`: Probability of retention (0-1)
-- `retention_prediction`: Binary prediction (0/1)
-- `retention_risk_category`: Risk level (Critical/High/Moderate/Low)
-
-### 2. Early Warning System
+Authoritative descriptions (inputs, algorithms, data flow): [`codebenders-dashboard/content/ai-transparency.ts`](codebenders-dashboard/content/ai-transparency.ts). The summaries below match that inventory.
 
-**Algorithm**: Composite Risk Score
-**Target**: Binary (At Risk / Not At Risk)
-**Approach**: Combines retention probability with performance metrics
+### 1. Retention Prediction
 
-**Risk Factors**:
-- Retention probability (50% weight)
-- GPA performance (20% weight)
-- Course completion rate (20% weight)
-- Credit progress (10% weight)
+**Algorithm**: XGBoost classifier (model family selected in `ai_model/complete_ml_pipeline.py`)
+**Target**: Binary (Retained / Not Retained)
+**Features**: Demographic, enrollment, year-one performance, and program signals
 
-**Output**:
-- `risk_score`: Comprehensive risk score (0-100)
-- `at_risk_alert`: Alert level (URGENT/HIGH/MODERATE/LOW)
-- `at_risk_probability`: Risk probability (0-1)
-- `at_risk_prediction`: Binary prediction (0/1)
+**Output** (examples):
+- Retention probability and binary prediction
+- Retention risk category (Critical / High / Moderate / Low)
+- Dashboard risk alerts combine retention and related metrics
 
-### 3. Time-to-Credential Model
+### 2. Time-to-Credential Prediction
 
-**Algorithm**: XGBoost Regressor
+**Algorithm**: Random Forest regressor
 **Target**: Continuous (Years to credential)
 
 **Output**:
 - `predicted_time_to_credential`: Years to completion
 - `predicted_graduation_year`: Expected graduation year
 
-### 4. Credential Type Model
+### 3. Credential Type Prediction
 
-**Algorithm**: Random Forest Classifier
+**Algorithm**: Random Forest multi-class classifier
 **Target**: Multi-class (No Credential / Certificate / Associate's / Bachelor's)
 
 **Output**:
@@ -243,14 +230,26 @@ For more details, see [operations/README.md](operations/README.md).
 - `predicted_credential_label`: Text label
 - `prob_no_credential`, `prob_certificate`, `prob_associate`, `prob_bachelor`: Class probabilities
 
-### 5. Course Success Model
+### 4. Gateway Math Success Prediction
 
-**Algorithm**: Random Forest Regressor
-**Target**: Continuous (GPA 0-4 scale)
+**Algorithm**: XGBoost classifier
+**Target**: Binary (success in gateway math)
 
-**Output**:
-- `predicted_gpa`: Expected GPA (0-4 scale)
-- `gpa_performance`: Performance vs. expected (Above/Below/As Expected)
+**Output**: Probability and prediction fields written to `student_predictions` (see data dictionary and pipeline outputs).
+
+### 5. Gateway English Success Prediction
+
+**Algorithm**: XGBoost classifier
+**Target**: Binary (success in gateway English)
+
+**Output**: Probability and prediction fields written to `student_predictions`.
+
+### 6. First-Semester Low-GPA Prediction
+
+**Algorithm**: XGBoost classifier
+**Target**: Binary (low first-semester GPA risk)
+
+**Output**: Probability and prediction fields written to `student_predictions`.
 
 ## 📊 Data
 
@@ -299,6 +298,7 @@ If database connection fails, predictions are saved to CSV:
 
 ## 📚 Documentation
 
+- **[codebenders-dashboard/content/ai-transparency.ts](codebenders-dashboard/content/ai-transparency.ts)**: Authoritative inventory of ML models, OpenAI/NLQ routes, rule-based fallback, and external data API surfaces
 - **[DATA_DICTIONARY.md](DATA_DICTIONARY.md)**: Detailed descriptions of all data fields
 - **[ML_MODELS_GUIDE.md](ML_MODELS_GUIDE.md)**: In-depth guide to machine learning models
 - **[DOCKER_SETUP.md](DOCKER_SETUP.md)**: Docker Compose setup for local Postgres
diff --git a/codebenders-dashboard/DASHBOARD_README.md b/codebenders-dashboard/DASHBOARD_README.md
index 1a25f44..bf013f5 100644
--- a/codebenders-dashboard/DASHBOARD_README.md
+++ b/codebenders-dashboard/DASHBOARD_README.md
@@ -4,6 +4,8 @@
 
 A modern, interactive dashboard for visualizing student success metrics and predictive analytics for Bishop State Community College.
 
+**Authoritative AI/ML inventory** (six trained models, three OpenAI `gpt-4o-mini` routes, rule-based NLQ fallback, external data API when not on direct DB): [`content/ai-transparency.ts`](content/ai-transparency.ts) (path from repo root: `codebenders-dashboard/content/ai-transparency.ts`).
+
 ## Features
 
 ### 📊 Executive Dashboard (Home Page - `/`)
@@ -38,9 +40,9 @@ Color-coded from red (critical) to green (low risk).
 
 ### 🔍 SQL Query Interface (`/query`)
 Advanced query interface for custom data analysis:
-- Natural language to SQL conversion
+- Natural language to SQL via OpenAI `gpt-4o-mini` (`app/api/analyze/route.ts`), optional result summarization (`app/api/query-summary/route.ts`), and course-pairing explanations (`app/api/courses/explain-pairing/route.ts`), with **rule-based fallback** in `lib/prompt-analyzer.ts` when the LLM path is off or disabled
 - Support for multiple institutions (Bishop State, University of Akron, Cal State San Bernardino, Thomas More)
-- Direct database or API mode
+- Direct database or **external data API** (`schools.syntex-ai.com`) when not in direct DB mode (see `content/ai-transparency.ts`)
 - Interactive visualizations (line, bar, pie charts, tables)
 - Query plan visualization
 
@@ -154,11 +156,19 @@ Returns retention risk categories:
 ### Query APIs
 
 #### `POST /api/analyze`
-Analyzes natural language prompts and generates SQL queries.
+Analyzes natural language prompts and generates SQL (OpenAI `gpt-4o-mini`, with rule-based fallback in `lib/prompt-analyzer.ts`).
+
+#### `POST /api/query-summary`
+Summarizes query results in natural language (OpenAI `gpt-4o-mini`).
+
+#### `POST /api/courses/explain-pairing`
+Explains course-pairing recommendations (OpenAI `gpt-4o-mini`).
 
 #### `POST /api/execute-sql`
 Executes SQL queries directly against the database.
 
+Full route list and data-flow notes: [`content/ai-transparency.ts`](content/ai-transparency.ts).
+
 ## Environment Variables
 
 Create a `.env.local` file:
@@ -252,6 +262,7 @@ See `/DASHBOARD_VISUALIZATIONS.md` for a comprehensive list of additional visual
 
 ## References
 
+- **AI / ML surface inventory**: `codebenders-dashboard/content/ai-transparency.ts` (authoritative list of models, LLM routes, and integrations)
 - **Visualization Guide**: `/DASHBOARD_VISUALIZATIONS.md`
 - **Schema Documentation**: `codebenders-dashboard/env.example`
 - **Project PRD**: `/AI_Powered_Student_Success_PRD.md`