7 changes: 4 additions & 3 deletions CLAUDE.md
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

## Project Overview

Bishop State Student Success Prediction - Full-stack ML + web application predicting student outcomes for Bishop State Community College. Uses 5 ML models to generate retention predictions, early warnings, time-to-credential estimates, credential type forecasts, and GPA predictions for ~4K students.
Bishop State Student Success Prediction - Full-stack ML + web application predicting student outcomes for Bishop State Community College. Uses 6 ML models to generate retention predictions, time-to-credential estimates, credential type forecasts, gateway math/English success predictions, and first-semester low-GPA predictions for ~4K students. Authoritative AI/ML inventory: `codebenders-dashboard/content/ai-transparency.ts`.

## Tech Stack

@@ -15,14 +15,14 @@ Bishop State Student Success Prediction - Full-stack ML + web application predic
| Charts | Recharts |
| UI Components | shadcn/ui (Radix UI) |
| Database | Postgres (Supabase), pg driver |
| AI Features | OpenAI (natural language query analysis) |
| AI Features | OpenAI gpt-4o-mini for: NL query → SQL analyzer (`codebenders-dashboard/app/api/analyze`), query result summarizer (`codebenders-dashboard/app/api/query-summary`), course-pairing explainer (`codebenders-dashboard/app/api/courses/explain-pairing`). Rule-based fallback at `codebenders-dashboard/lib/prompt-analyzer.ts`. Authoritative list: `codebenders-dashboard/content/ai-transparency.ts`. |
| Infrastructure | Docker Compose, Vercel |

## Key Directories

| Directory | Purpose |
|-----------|---------|
| `ai_model/` | Python ML pipeline - 5 models (XGBoost + Random Forest) |
| `ai_model/` | Python ML pipeline - 6 models (XGBoost + Random Forest). Authoritative inventory: `codebenders-dashboard/content/ai-transparency.ts`. |
| `codebenders-dashboard/` | Next.js web application |
| `codebenders-dashboard/app/` | App Router pages and API routes |
| `codebenders-dashboard/components/` | React components (shadcn/ui based) |
@@ -87,6 +87,7 @@ Check these files for detailed information on specific topics:

| Topic | File |
|-------|------|
| AI / ML surface inventory (models, LLM routes, data flows) | `codebenders-dashboard/content/ai-transparency.ts` |
| Architectural patterns | `.claude/docs/architectural_patterns.md` |
| Project overview | `README.md` |
| Quick start guide | `QUICKSTART.md` |
90 changes: 45 additions & 45 deletions README.md
@@ -17,23 +17,26 @@ A comprehensive machine learning pipeline for predicting student success outcome

## 🎯 Overview

This project implements five machine learning models to predict various aspects of student success:
This project implements six machine learning models to predict various aspects of student success (authoritative names and data flows: [`codebenders-dashboard/content/ai-transparency.ts`](codebenders-dashboard/content/ai-transparency.ts)):

1. **Retention Prediction** - Will the student be retained?
2. **Early Warning System** - Is the student at risk?
3. **Time-to-Credential** - How long until graduation?
4. **Credential Type** - What credential will they earn?
5. **Course Success** - What will their GPA be?
2. **Time-to-Credential** - How long until credential completion?
3. **Credential Type** - What credential will they earn?
4. **Gateway Math Success** - Will the student succeed in gateway math?
5. **Gateway English Success** - Will the student succeed in gateway English?
6. **First-Semester Low-GPA Prediction** - Is the student at risk of a low first-semester GPA?

The models use demographic, academic preparation, enrollment, and course performance data to generate actionable predictions for student support services.

The Next.js dashboard adds **natural language query (NLQ)** features: three OpenAI `gpt-4o-mini` API routes (`codebenders-dashboard/app/api/analyze/route.ts`, `codebenders-dashboard/app/api/query-summary/route.ts`, `codebenders-dashboard/app/api/courses/explain-pairing/route.ts`), a **rule-based fallback** in `codebenders-dashboard/lib/prompt-analyzer.ts`, and (when not using direct database mode) an **external data API** at `schools.syntex-ai.com`. See the same `ai-transparency.ts` file for the full inventory.

## 📁 Project Structure

```
codebenders-datathon/
├── ai_model/ # Machine learning models and scripts
│ ├── __init__.py # Package initialization
│ ├── complete_ml_pipeline.py # Main ML pipeline (5 models)
│ ├── complete_ml_pipeline.py # Main ML pipeline (6 models)
│ ├── generate_bishop_state_data.py # Synthetic data generation
│ └── merge_bishop_state_data.py # Data merging script
```
@@ -59,11 +62,11 @@ codebenders-datathon/

### Prediction Capabilities

- **Retention Risk Assessment**: Identify students at risk of not returning
- **Early Warning Alerts**: Four-level alert system (URGENT, HIGH, MODERATE, LOW)
- **Retention Risk Assessment**: Retention probability and risk categories; dashboard alert views (URGENT / HIGH / MODERATE / LOW) are driven by these signals
- **Graduation Timeline**: Predict time to credential completion
- **Credential Path**: Forecast credential type (Certificate, Associate's, Bachelor's)
- **Academic Performance**: Predict expected GPA and identify over/underperformers
- **Gateway Success**: Predict gateway math and English completion outcomes
- **Early Academic Risk**: First-semester low-GPA risk prediction

### Technical Features

@@ -139,7 +142,7 @@ python complete_ml_pipeline.py
This will:
1. Test database connection
2. Load and preprocess data
3. Train all 5 models
3. Train all 6 models
4. Generate predictions for all students
5. Save results to **Postgres database** (or CSV files as fallback)
6. Save model performance metrics to database
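
The save step with its CSV fallback (step 5 above) could be sketched as follows. This is a minimal illustration, not the code in `complete_ml_pipeline.py`: the function name `save_predictions`, the injected `save_to_db` callable, and the `student_id` column in the usage note are all hypothetical.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List


def save_predictions(
    rows: List[Dict[str, object]],
    save_to_db: Callable[[List[Dict[str, object]]], None],
    csv_path: Path,
) -> str:
    """Write predictions to the database, or to a CSV file if that fails.

    Returns "database" or "csv" so the caller can report where results went.
    `save_to_db` is a hypothetical callable standing in for the real DB writer.
    """
    try:
        save_to_db(rows)
        return "database"
    except Exception as exc:  # broad on purpose: any DB failure triggers the fallback
        print(f"Database save failed ({exc}); writing CSV fallback to {csv_path}")

    # Fallback path: one CSV row per student, columns taken from the first row.
    fieldnames = list(rows[0].keys()) if rows else []
    with csv_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return "csv"
```

Calling `save_predictions([{"student_id": 1, "retention_probability": 0.8}], writer, Path("predictions.csv"))` returns `"csv"` only when `writer` raises; otherwise the CSV file is never touched.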
@@ -195,62 +198,58 @@ For more details, see [operations/README.md](operations/README.md).

## 🤖 Models

### 1. Retention Prediction Model

**Algorithm**: XGBoost Classifier
**Target**: Binary (Retained / Not Retained)
**Features**: 40+ demographic, academic, and performance features

**Output**:
- `retention_probability`: Probability of retention (0-1)
- `retention_prediction`: Binary prediction (0/1)
- `retention_risk_category`: Risk level (Critical/High/Moderate/Low)

### 2. Early Warning System
Authoritative descriptions (inputs, algorithms, data flow): [`codebenders-dashboard/content/ai-transparency.ts`](codebenders-dashboard/content/ai-transparency.ts). The summaries below match that inventory.

**Algorithm**: Composite Risk Score
**Target**: Binary (At Risk / Not At Risk)
**Approach**: Combines retention probability with performance metrics
### 1. Retention Prediction

**Risk Factors**:
- Retention probability (50% weight)
- GPA performance (20% weight)
- Course completion rate (20% weight)
- Credit progress (10% weight)
**Algorithm**: XGBoost classifier (model family selected in `ai_model/complete_ml_pipeline.py`)
**Target**: Binary (Retained / Not Retained)
**Features**: Demographic, enrollment, year-one performance, and program signals

**Output**:
- `risk_score`: Comprehensive risk score (0-100)
- `at_risk_alert`: Alert level (URGENT/HIGH/MODERATE/LOW)
- `at_risk_probability`: Risk probability (0-1)
- `at_risk_prediction`: Binary prediction (0/1)
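
As a worked illustration of the weighted composite score listed above (50% / 20% / 20% / 10%), the sketch below combines four inputs into a 0-100 value. Only the weights come from the text; how each factor is turned into a risk component (for example `1 - retention_probability`, or `1 - gpa / 4`) is an assumption, and the function name is hypothetical.

```python
def composite_risk_score(
    retention_probability: float,  # 0-1, higher means more likely to be retained
    gpa: float,                    # 0-4 scale
    completion_rate: float,        # 0-1 share of attempted courses completed
    credit_progress: float,        # 0-1 share of expected credits earned
) -> float:
    """Return a 0-100 risk score; higher means more at risk."""

    def clamp(value: float) -> float:
        return max(0.0, min(1.0, value))

    # Assumed conversion: each factor becomes "distance from the best case".
    components = {
        "retention": 1.0 - clamp(retention_probability),
        "gpa": 1.0 - clamp(gpa / 4.0),
        "completion": 1.0 - clamp(completion_rate),
        "credits": 1.0 - clamp(credit_progress),
    }
    # Weights as listed under "Risk Factors".
    weights = {"retention": 0.5, "gpa": 0.2, "completion": 0.2, "credits": 0.1}

    score = 100.0 * sum(weights[name] * components[name] for name in weights)
    return round(score, 1)
```

Under these assumptions, a student at the midpoint of every input (`0.5`, `2.0`, `0.5`, `0.5`) scores 50.0. Thresholds for the URGENT / HIGH / MODERATE / LOW alert levels are not given in the text, so none are shown here.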
**Output** (examples):
- Retention probability and binary prediction
- Retention risk category (Critical / High / Moderate / Low)
- Dashboard risk alerts combine retention and related metrics

### 3. Time-to-Credential Model
### 2. Time-to-Credential Prediction

**Algorithm**: XGBoost Regressor
**Algorithm**: Random Forest regressor
**Target**: Continuous (Years to credential)

**Output**:
- `predicted_time_to_credential`: Years to completion
- `predicted_graduation_year`: Expected graduation year

### 4. Credential Type Model
### 3. Credential Type Prediction

**Algorithm**: Random Forest Classifier
**Algorithm**: Random Forest multi-class classifier
**Target**: Multi-class (No Credential / Certificate / Associate's / Bachelor's)

**Output**:
- `predicted_credential_type`: Numeric code (0-3)
- `predicted_credential_label`: Text label
- `prob_no_credential`, `prob_certificate`, `prob_associate`, `prob_bachelor`: Class probabilities
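
The relationship between the three output groups can be sketched like this: the numeric code is the index of the most probable class, and the label is looked up from that code. The mapping of codes 0-3 to labels in the order listed under **Target** is an assumption, as is the function name.

```python
from typing import Dict, Sequence, Union

# Assumed ordering: code 0-3 follows the order of the classes listed in the target.
CREDENTIAL_LABELS = ["No Credential", "Certificate", "Associate's", "Bachelor's"]
PROBABILITY_FIELDS = [
    "prob_no_credential",
    "prob_certificate",
    "prob_associate",
    "prob_bachelor",
]


def credential_outputs(
    class_probabilities: Sequence[float],
) -> Dict[str, Union[int, str, float]]:
    """Build the output fields from one student's four class probabilities."""
    if len(class_probabilities) != len(CREDENTIAL_LABELS):
        raise ValueError("expected exactly four class probabilities")

    # Predicted class = index of the largest probability (first one wins on ties).
    code = max(range(len(class_probabilities)), key=lambda i: class_probabilities[i])

    outputs: Dict[str, Union[int, str, float]] = {
        "predicted_credential_type": code,
        "predicted_credential_label": CREDENTIAL_LABELS[code],
    }
    outputs.update(zip(PROBABILITY_FIELDS, class_probabilities))
    return outputs
```

With a scikit-learn style classifier, the input would typically be one row of `predict_proba`, provided the model's class order matches the assumed code order.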

### 5. Course Success Model
### 4. Gateway Math Success Prediction

**Algorithm**: Random Forest Regressor
**Target**: Continuous (GPA 0-4 scale)
**Algorithm**: XGBoost classifier
**Target**: Binary (success in gateway math)

**Output**:
- `predicted_gpa`: Expected GPA (0-4 scale)
- `gpa_performance`: Performance vs. expected (Above/Below/As Expected)
**Output**: Probability and prediction fields written to `student_predictions` (see data dictionary and pipeline outputs).

### 5. Gateway English Success Prediction

**Algorithm**: XGBoost classifier
**Target**: Binary (success in gateway English)

**Output**: Probability and prediction fields written to `student_predictions`.

### 6. First-Semester Low-GPA Prediction

**Algorithm**: XGBoost classifier
**Target**: Binary (low first-semester GPA risk)

**Output**: Probability and prediction fields written to `student_predictions`.

## 📊 Data

@@ -299,6 +298,7 @@ If database connection fails, predictions are saved to CSV:

## 📚 Documentation

- **[codebenders-dashboard/content/ai-transparency.ts](codebenders-dashboard/content/ai-transparency.ts)**: Authoritative inventory of ML models, OpenAI/NLQ routes, rule-based fallback, and external data API surfaces
- **[DATA_DICTIONARY.md](DATA_DICTIONARY.md)**: Detailed descriptions of all data fields
- **[ML_MODELS_GUIDE.md](ML_MODELS_GUIDE.md)**: In-depth guide to machine learning models
- **[DOCKER_SETUP.md](DOCKER_SETUP.md)**: Docker Compose setup for local Postgres
17 changes: 14 additions & 3 deletions codebenders-dashboard/DASHBOARD_README.md
@@ -4,6 +4,8 @@

A modern, interactive dashboard for visualizing student success metrics and predictive analytics for Bishop State Community College.

**Authoritative AI/ML inventory** (six trained models, three OpenAI `gpt-4o-mini` routes, rule-based NLQ fallback, external data API when not on direct DB): [`content/ai-transparency.ts`](content/ai-transparency.ts) (path from repo root: `codebenders-dashboard/content/ai-transparency.ts`).

## Features

### 📊 Executive Dashboard (Home Page - `/`)
@@ -38,9 +40,9 @@ Color-coded from red (critical) to green (low risk).
### 🔍 SQL Query Interface (`/query`)

Advanced query interface for custom data analysis:
- Natural language to SQL conversion
- Natural language to SQL via OpenAI `gpt-4o-mini` (`app/api/analyze/route.ts`), with a **rule-based fallback** in `lib/prompt-analyzer.ts` when the LLM path is disabled; optional result summarization (`app/api/query-summary/route.ts`) and course-pairing explanations (`app/api/courses/explain-pairing/route.ts`)
- Support for multiple institutions (Bishop State, University of Akron, Cal State San Bernardino, Thomas More)
- Direct database or API mode
- Direct database mode, or the **external data API** (`schools.syntex-ai.com`) when direct DB mode is not in use (see `content/ai-transparency.ts`)
- Interactive visualizations (line, bar, pie charts, tables)
- Query plan visualization

@@ -154,11 +156,19 @@ Returns retention risk categories:
### Query APIs

#### `POST /api/analyze`
Analyzes natural language prompts and generates SQL queries.
Analyzes natural language prompts and generates SQL (OpenAI `gpt-4o-mini`, with rule-based fallback in `lib/prompt-analyzer.ts`).

#### `POST /api/query-summary`
Summarizes query results in natural language (OpenAI `gpt-4o-mini`).

#### `POST /api/courses/explain-pairing`
Explains course-pairing recommendations (OpenAI `gpt-4o-mini`).

#### `POST /api/execute-sql`
Executes SQL queries directly against the database.

Full route list and data-flow notes: [`content/ai-transparency.ts`](content/ai-transparency.ts).
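
The LLM-with-fallback pattern described for `POST /api/analyze` can be sketched as below. The real route is TypeScript; this Python version only illustrates the control flow. The function and parameter names are hypothetical, and falling back when the LLM call raises (not only when it is disabled) is an assumption.

```python
from typing import Callable, Optional, Tuple


def analyze_prompt(
    prompt: str,
    llm_analyzer: Optional[Callable[[str], str]],
    rule_based_analyzer: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (sql, source), where source is "llm" or "rules".

    `llm_analyzer` is None when the LLM path is disabled (for example, no API key).
    """
    if llm_analyzer is not None:
        try:
            return llm_analyzer(prompt), "llm"
        except Exception:
            # Assumed behaviour: any LLM failure drops through to the rules.
            pass
    return rule_based_analyzer(prompt), "rules"
```

Returning the source alongside the SQL lets the caller show users whether a query came from the model or from the rule-based analyzer.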

## Environment Variables

Create a `.env.local` file:
@@ -252,6 +262,7 @@ See `/DASHBOARD_VISUALIZATIONS.md` for a comprehensive list of additional visual

## References

- **AI / ML surface inventory**: `codebenders-dashboard/content/ai-transparency.ts` (authoritative list of models, LLM routes, and integrations)
- **Visualization Guide**: `/DASHBOARD_VISUALIZATIONS.md`
- **Schema Documentation**: `codebenders-dashboard/env.example`
- **Project PRD**: `/AI_Powered_Student_Success_PRD.md`