AD331 Artificial Intelligence Course Project

A machine learning project that predicts security incident detection times using linear regression. It covers data analysis, model training, evaluation, and an interactive web dashboard.

🎯 Project Overview

This project implements an end-to-end machine learning pipeline to predict the time it takes to detect security incidents. The solution includes:

Data Analysis: Comprehensive exploratory data analysis with visualizations
Model Training: Linear regression with advanced feature engineering
Model Evaluation: Cross-validation and performance metrics
Web Dashboard: Interactive Next.js frontend with visualizations
API Backend: FastAPI backend for model predictions
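
As an illustration of the cross-validation step listed above, here is a minimal sketch of how a dataset could be split into folds. The helper name `k_fold_indices`, the fold count, and the shuffling seed are assumptions made for this example; the project's own scripts may rely on scikit-learn's utilities instead.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


if __name__ == "__main__":
    # 800 is used as an example training-set size; k=5 is an assumption.
    for fold, (train_idx, val_idx) in enumerate(k_fold_indices(800, k=5)):
        print(f"fold {fold}: {len(train_idx)} train, {len(val_idx)} validation")
```

Each sample lands in exactly one validation fold, so every row is used for validation once and for training in the remaining folds.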

📊 Model Performance

R² Score: 0.8047 (80.47% of variance explained)
Accuracy: 97% of predictions within 20% of actual values
Model Type: Linear Regression with advanced feature engineering
Dataset: 1,000 samples (800 training, 200 test)
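
For readers unfamiliar with the two headline metrics, the sketch below shows one way they could be computed. The function names and the sample values are made up for illustration, and the exact definition of "within 20%" used by the project's scripts is an assumption (relative error measured against the actual value).

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def within_tolerance(y_true, y_pred, tolerance=0.20):
    """Fraction of predictions whose relative error is at most `tolerance`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) <= tolerance * abs(t))
    return hits / len(y_true)


if __name__ == "__main__":
    # Made-up detection times, for illustration only.
    actual = [10.0, 20.0, 30.0, 40.0]
    predicted = [11.0, 19.0, 33.0, 50.0]
    print(f"R^2: {r2_score(actual, predicted):.4f}")
    print(f"within 20%: {within_tolerance(actual, predicted):.0%}")
```

On this toy data the last prediction misses by 25%, so three of four predictions fall inside the tolerance band even though R² stays fairly high; the two metrics answer different questions.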

🚀 Quick Start

Prerequisites

Node.js (v18+ recommended)
Python (3.13+)
npm or yarn

Installation

Clone the repository

git clone <repository-url>
cd ad331_artificial_intelligence

Install dependencies

Frontend (Next.js):
```
npm install
```
Backend (Python):
```
pip install -r backend/requirements.txt
```

Running the Application

Option 1: Start Everything at Once (Recommended)

python3 start_all.py

This script will:

Check requirements
Optionally install/update dependencies
Start both frontend and backend servers

Option 2: Manual Start

Start Frontend:

npm run dev

Frontend will be available at http://localhost:3000

Start Backend:

cd backend
python main.py

Backend will be available at http://localhost:8000
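
Once the backend is started, a quick reachability check can be scripted. This is a minimal sketch that assumes the FastAPI default interactive docs route (`/docs`) has not been disabled; the helper name `backend_is_up` is hypothetical and not part of the repository.

```python
import urllib.request


def backend_is_up(base_url="http://localhost:8000", timeout=2.0):
    """Return True if GET {base_url}/docs answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/docs", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers URLError/HTTPError, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    print("backend reachable:", backend_is_up())
```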

Alternative Startup Scripts

Linux/Mac: ./start_all.sh
Windows: start_all.bat

Assignment 8: LLM Evaluation

Send a POST request to http://localhost:8000/api/assignment8/evaluate to score the Assignment 7 classifier on the held-out test set. The endpoint reports accuracy and macro-averaged precision, recall, and F1, and writes a normalized confusion matrix to public/visualizations/static/assignment8_confusion_matrix.png.
Macro-averaged F1 is the primary metric: F1 is computed for each class and the per-class scores are averaged with equal weight, so the minority label cannot hide behind majority-class accuracy. Accuracy alone can look strong even when one class (e.g., subjective statements) is frequently misclassified, so F1 better reflects real quality on imbalanced text data.
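
The gap between accuracy and macro F1 is easiest to see on a deliberately lopsided example. The sketch below is not the backend's implementation; the helper names and the 90/10 class split are assumptions, and the labels borrow the factual/opinion classes from Assignment 7.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its total, leaving per-class recall on the diagonal."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in matrix]


def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores."""
    n = len(matrix)
    scores = []
    for i in range(n):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


if __name__ == "__main__":
    labels = ["factual", "opinion"]
    # Degenerate classifier: predicts "factual" for every sample.
    y_true = ["factual"] * 90 + ["opinion"] * 10
    y_pred = ["factual"] * 100
    cm = confusion_matrix(y_true, y_pred, labels)
    print("accuracy:", accuracy(cm))
    print("macro F1:", round(macro_f1(cm), 4))
    print("row-normalized:", normalize_rows(cm))
```

Here accuracy is 0.9 while macro F1 is about 0.47, because the opinion class contributes an F1 of zero and counts as much as the majority class in the average.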

📘 Assignments Overview (Goals + Function)

Assignment 1 — Dev Setup & Iris EDA: Stand up the ML toolkit (NumPy/Pandas/Matplotlib/Seaborn) and explore the iris dataset; compute stats and plot histograms/box/scatter charts to practice basic data profiling.
Assignment 2 — Time to Detection Analysis: Inspect the security incident dataset, surface correlations, and visualize the regression target/feature relationships in the Next.js dashboard.
Assignment 3 — MNIST Classification: Train a feedforward neural net on handwritten digits and serve predictions; includes an interactive canvas to draw digits, trigger training, and view accuracy/loss curves from the backend.
Assignment 4 — Large Language Models: Load TinyLlama via Hugging Face, experiment with temperature/top-p/max-tokens, and compare generations; comes with quick test cases plus an interactive chat panel.
Assignment 5 — Retrieval-Augmented Generation: Build a lightweight RAG pipeline over a D&D 2024 rules summary; chunk and embed text (MiniLM), retrieve relevant passages, and ground TinyLlama responses with a targeted test suite.
Assignment 6 — Prompt Engineering for Structured Extraction: Craft/evaluate prompts that pull Name, Price, and Date into strict JSON under paraphrases and noisy inputs; tracks runs, compliance, and optimized prompt variants.
Assignment 7 — PEFT (LoRA) News Classifier: Fine-tune roberta-base with LoRA adapters to label news as factual vs opinion while keeping the base model frozen; includes dataset loading, training controls, and metric visualizations.
Assignment 8 — Model Evaluation: Score the Assignment 7 classifier on a held-out split with macro metrics and a confusion matrix; backend endpoint exports the plot to public/visualizations/static/assignment8_confusion_matrix.png.
Assignment 9 — Placeholder: Slot reserved for the next module; UI stub points learners to check the assignments folder as new materials land.
Assignment 10 — Final Project: Placeholder for the capstone that will synthesize course concepts into a single end-to-end AI project.
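
To make the "strict JSON" requirement of Assignment 6 concrete, here is a minimal sketch of a compliance check. The key names come from the assignment description above; the helper names, the exact-key rule, and the sample outputs are assumptions for illustration, not the project's actual scoring logic.

```python
import json

REQUIRED_KEYS = {"Name", "Price", "Date"}


def is_compliant(raw_output):
    """Return True if `raw_output` is a JSON object with exactly the required keys."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        # Any surrounding chatter or malformed JSON counts as non-compliant.
        return False
    return isinstance(parsed, dict) and set(parsed) == REQUIRED_KEYS


def compliance_rate(outputs):
    """Share of model outputs that pass the strict-JSON check."""
    return sum(is_compliant(o) for o in outputs) / len(outputs)


if __name__ == "__main__":
    # Hypothetical model outputs for one prompt variant.
    outputs = [
        '{"Name": "Widget", "Price": 9.99, "Date": "2024-01-05"}',
        'Sure! {"Name": "Widget", "Price": 9.99, "Date": "2024-01-05"}',
        '{"Name": "Widget"}',
    ]
    print(f"compliance: {compliance_rate(outputs):.0%}")
```

Rejecting any output that is not parseable as a whole keeps the check strict: a response that wraps valid JSON in extra prose fails, which is the behavior a downstream parser would see.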

📋 Running Analysis Scripts

Complete Analysis Pipeline

python3 run_analysis.py

Interactive menu to run all scripts in sequence.

Individual Scripts

Data Analysis

# Generate comprehensive visualizations
python3 scripts/data_analysis/visualize_data.py

# Analyze failed-login patterns
python3 scripts/data_analysis/failed_logins_analysis.py

Model Training

# Train the linear regression model
python3 scripts/model_training/train_linear_regression.py

# Test model performance on test data
python3 scripts/model_training/test_model_performance.py

Utilities

# Interactive predictions
python3 scripts/utilities/predict_with_model.py

# View generated plots
python3 scripts/utilities/view_plots.py

🌐 Web Interface

Main Dashboard

Route: /assignment2
Features:
- Dataset overview and statistics
- Model performance metrics
- Feature analysis results
- Links to all visualizations

Interactive Visualizations

Scatter Matrix (/assignment2/scatter-matrix): Feature relationships
3D Scatter Plot (/assignment2/3d-scatter): 3D feature visualization
Correlation Heatmap (/assignment2/correlation): Feature correlations
Failed Logins Analysis (/assignment2/failed-logins): Failed login patterns

📁 Project Structure

ad331_artificial_intelligence/
├── assignments/                 # Weekly assignments
│   ├── week1/
│   ├── week2/
│   └── week3/
├── backend/                    # FastAPI backend
│   ├── main.py                # Backend server
│   └── requirements.txt       # Python dependencies
├── scripts/                    # Python analysis scripts
│   ├── data_analysis/         # Data exploration & visualization
│   ├── model_training/        # Model training & evaluation
│   ├── utilities/             # Helper scripts
│   └── README.md             # Script documentation
├── src/app/                    # Next.js frontend
│   ├── assignment2/          # Assignment 2 dashboard
│   │   ├── page.tsx          # Main dashboard
│   │   ├── scatter-matrix/   # Interactive scatter matrix
│   │   ├── 3d-scatter/       # 3D scatter plot
│   │   ├── correlation/      # Correlation heatmap
│   │   └── failed-logins/    # Failed logins analysis
│   └── ...                   # Other assignment pages
├── public/                     # Generated files
│   ├── models/               # Trained ML models
│   ├── reports/              # Analysis reports
│   └── visualizations/       # All plots and charts
│       ├── static/           # PNG visualizations
│       └── interactive/      # HTML interactive plots
├── test_data/                 # Dataset
│   ├── time_to_detection_train.csv
│   └── time_to_detection_test.csv
├── start_all.py              # Main startup script
├── run_analysis.py           # Analysis launcher
└── README.md                 # This file

📊 Generated Outputs

Models

best_linear_regression_model.pkl: Trained model ready for predictions

Reports

data_analysis_report.md: Comprehensive data analysis
model_results_summary.txt: Training results
model_test_performance_report.md: Test performance details
WEEK2_DATA_ANALYSIS.md: Week 2 specific analysis

Visualizations

Static (PNG):

Dataset overview
Feature analysis
Correlation heatmap
Time analysis
Model evaluation
Feature importance
Test performance
Failed logins hourly analysis

Interactive (HTML):

Scatter matrix
3D scatter plot
Correlation heatmap
Interactive 3D scatter
Failed logins heatmap and charts

🔧 Technical Stack

Frontend

Framework: Next.js 15.5.6
UI: React 19.1.0
Styling: Tailwind CSS 4
Language: TypeScript 5

Backend

Framework: FastAPI 0.115.2
Server: Uvicorn 0.30.6
Validation: Pydantic 2.9.2

Machine Learning

Framework: scikit-learn 1.5.2
Data Processing: pandas 2.2.3, numpy 2.3.4
Visualization: matplotlib 3.9.2, seaborn 0.13.2, plotly
Additional: TensorFlow 2.20.0, PyTorch 2.9.0

📈 Model Insights

Top Predictive Features

Alert Priority (0.622 correlation)
Privilege Escalations (0.599 correlation)
Average CPU Percent (0.285 correlation)
Failed Logins (0.236 correlation)
Data Transfer (0.209 correlation)
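
The values in parentheses are correlations between each feature and the detection time. As a sketch of how such a ranking could be produced, the code below computes Pearson correlation and sorts features by absolute strength. The column names and numbers are toy data, and whether the project uses Pearson correlation specifically is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Sort (name, correlation) pairs by absolute correlation, strongest first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Toy values; the column names are hypothetical.
    features = {
        "failed_logins": [2, 1, 4, 3],
        "alert_priority": [1, 2, 3, 4],
    }
    detection_time = [10, 20, 30, 40]
    for name, r in rank_features(features, detection_time):
        print(f"{name}: {r:.3f}")
```

Correlation only captures linear, one-feature-at-a-time association, so a ranking like this is a starting point for feature selection rather than a measure of each feature's contribution inside the fitted model.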

Performance Characteristics

All priority levels achieve good prediction accuracy
Higher priority alerts show slightly better R² scores
Consistent performance across different security scenarios

🎯 Use Cases

Security incident response planning
Detection time optimization
Risk assessment and prioritization
Security team resource allocation
Incident response training

📚 Documentation

Project Overview: See PROJECT_OVERVIEW.md for detailed project information
Script Documentation: See scripts/README.md for script usage
Reports: Check public/reports/ for generated analysis reports

🔍 Development

Building for Production

Frontend:

npm run build
npm start

Running Tests

python3 scripts/model_training/test_model_performance.py

🤝 Contributing

This is a course project for AD331 Artificial Intelligence. For questions or issues, please refer to the course materials or contact the instructor.

📝 License

This project is part of an academic course and is for educational purposes.

Course: AD331 Artificial Intelligence
Project: Time to Detection Prediction Model
Status: Active Development
