BradleyCharles/ad331_artificial_intelligence

AD331 Artificial Intelligence Course Project

A comprehensive machine learning project for predicting security incident detection times using linear regression. Features data analysis, model training, evaluation, and an interactive web dashboard.

🎯 Project Overview

This project implements an end-to-end machine learning pipeline to predict the time it takes to detect security incidents. The solution includes:

  • Data Analysis: Comprehensive exploratory data analysis with visualizations
  • Model Training: Linear regression with advanced feature engineering
  • Model Evaluation: Cross-validation and performance metrics
  • Web Dashboard: Interactive Next.js frontend with visualizations
  • API Backend: FastAPI backend for model predictions

📊 Model Performance

  • R² Score: 0.8047 (80.47% variance explained)
  • Accuracy: 97% of predictions within 20% of actual values
  • Model Type: Linear Regression with advanced feature engineering
  • Dataset: 1,000 samples (800 training, 200 test)
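
The two headline metrics above can be reproduced with a few lines of plain Python. This is a minimal sketch, assuming "within 20%" means a relative error of at most 20% of the actual value; the function names and the sample numbers are illustrative and are not taken from the project's scripts or data.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def fraction_within_tolerance(actual, predicted, tolerance=0.20):
    """Share of predictions whose relative error is at most `tolerance`.

    Assumption: the tolerance is measured relative to the actual value.
    """
    hits = sum(
        1 for a, p in zip(actual, predicted)
        if abs(p - a) <= tolerance * abs(a)
    )
    return hits / len(actual)


# Made-up detection times, for illustration only (not project data).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 33.0, 30.0]

print(f"R^2: {r2_score(actual, predicted):.4f}")
print(f"Within 20%: {fraction_within_tolerance(actual, predicted):.0%}")
```

The two numbers answer different questions: R² measures how much of the variance the model explains overall, while the tolerance metric counts how often an individual prediction is close enough to be useful.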

🚀 Quick Start

Prerequisites

  • Node.js (v18+ recommended)
  • Python (3.13+)
  • npm or yarn

Installation

  1. Clone the repository

    git clone <repository-url>
    cd ad331_artificial_intelligence
  2. Install dependencies

    Frontend (Next.js):

    npm install

    Backend (Python):

    pip install -r backend/requirements.txt

Running the Application

Option 1: Start Everything at Once (Recommended)

python3 start_all.py

This script will:

  • Check requirements
  • Optionally install/update dependencies
  • Start both frontend and backend servers

Option 2: Manual Start

Start Frontend:

npm run dev

Frontend will be available at http://localhost:3000

Start Backend:

cd backend
python main.py

Backend will be available at http://localhost:8000
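
After a manual start, it can be handy to confirm that both servers are actually listening. The following is a small sketch using only the standard library; the helper names are hypothetical and are not part of start_all.py, and the ports are simply the ones from the URLs above.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the URLs above: 3000 (frontend) and 8000 (backend).
SERVICES = {"frontend": 3000, "backend": 8000}


def report_services(host="localhost"):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}
```

Calling `report_services()` returns a dictionary such as `{"frontend": True, "backend": False}`, depending on which servers are running at that moment.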

Alternative Startup Scripts

  • Linux/Mac: ./start_all.sh
  • Windows: start_all.bat

Assignment 8: LLM Evaluation

  • Send a POST request to http://localhost:8000/api/assignment8/evaluate to score the Assignment 7 classifier on the held-out test set. The endpoint reports accuracy, precision, recall, and F1 (macro) and writes a normalized confusion matrix to public/visualizations/static/assignment8_confusion_matrix.png.
  • Macro-averaged F1 is the primary metric: it balances precision and recall per class so the minority label cannot hide behind majority-class accuracy. Accuracy alone can look strong even when one class (e.g., subjective statements) is frequently misclassified, so F1 better reflects real quality on imbalanced text data.
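
The gap between accuracy and macro-averaged F1 is easiest to see on a toy example. The sketch below is not the backend's implementation; it is a plain-Python illustration with made-up labels, in which a classifier that always predicts the majority class still reaches 90% accuracy.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy imbalanced data: 9 "factual" items, 1 "opinion" item.
# The hypothetical classifier predicts "factual" for everything.
y_true = ["factual"] * 9 + ["opinion"]
y_pred = ["factual"] * 10

print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
print(f"macro F1: {macro_f1(y_true, y_pred):.2f}")
```

Here accuracy is 0.90, but macro F1 is about 0.47 because the "opinion" class scores an F1 of zero and carries the same weight as the majority class.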

📘 Assignments Overview (Goals + Function)

  • Assignment 1 - Dev Setup & Iris EDA: Stand up the ML toolkit (NumPy/Pandas/Matplotlib/Seaborn) and explore the iris dataset; compute stats and plot histograms/box/scatter charts to practice basic data profiling.
  • Assignment 2 - Time to Detection Analysis: Inspect the security incident dataset, surface correlations, and visualize the regression target/feature relationships in the Next.js dashboard.
  • Assignment 3 - MNIST Classification: Train a feedforward neural net on handwritten digits and serve predictions; includes an interactive canvas to draw digits, trigger training, and view accuracy/loss curves from the backend.
  • Assignment 4 - Large Language Models: Load TinyLlama via Hugging Face, experiment with temperature/top-p/max-tokens, and compare generations; comes with quick test cases plus an interactive chat panel.
  • Assignment 5 - Retrieval-Augmented Generation: Build a lightweight RAG pipeline over a D&D 2024 rules summary; chunk and embed text (MiniLM), retrieve relevant passages, and ground TinyLlama responses with a targeted test suite.
  • Assignment 6 - Prompt Engineering for Structured Extraction: Craft/evaluate prompts that pull Name, Price, and Date into strict JSON under paraphrases and noisy inputs; tracks runs, compliance, and optimized prompt variants.
  • Assignment 7 - PEFT (LoRA) News Classifier: Fine-tune roberta-base with LoRA adapters to label news as factual vs opinion while keeping the base model frozen; includes dataset loading, training controls, and metric visualizations.
  • Assignment 8 - Model Evaluation: Score the Assignment 7 classifier on a held-out split with macro metrics and a confusion matrix; backend endpoint exports the plot to public/visualizations/static/assignment8_confusion_matrix.png.
  • Assignment 9 - Placeholder: Slot reserved for the next module; UI stub points learners to check the assignments folder as new materials land.
  • Assignment 10 - Final Project: Placeholder for the capstone that will synthesize course concepts into a single end-to-end AI project.
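
To make the "strict JSON" compliance idea from Assignment 6 concrete, here is one possible way to score model outputs. It is a sketch under assumptions: the project's actual compliance rules are not documented here, so this version simply requires a JSON object with exactly the keys Name, Price, and Date, and the helper names are hypothetical.

```python
import json

# Field names taken from the Assignment 6 description above.
REQUIRED_KEYS = {"Name", "Price", "Date"}


def is_compliant(raw_output):
    """True if raw_output is a JSON object with exactly the required keys.

    Assumption: any extra text around the JSON counts as non-compliant.
    """
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and set(parsed) == REQUIRED_KEYS


def compliance_rate(outputs):
    """Fraction of outputs that pass the strict JSON check."""
    return sum(is_compliant(text) for text in outputs) / len(outputs)


# Made-up model outputs, for illustration only.
samples = [
    '{"Name": "Widget", "Price": 9.99, "Date": "2024-01-05"}',
    'Sure! {"Name": "Widget", "Price": 9.99, "Date": "2024-01-05"}',
    '{"Name": "Widget", "Price": 9.99}',
    '{"Name": "Gadget", "Price": 19.5, "Date": "2024-02-10"}',
]
print(f"compliance: {compliance_rate(samples):.0%}")
```

A strict check like this penalizes chatty preambles and missing fields equally, which is usually the point when the output feeds another program.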

📋 Running Analysis Scripts

Complete Analysis Pipeline

python3 run_analysis.py

Interactive menu to run all scripts in sequence.

Individual Scripts

Data Analysis

# Generate comprehensive visualizations
python3 scripts/data_analysis/visualize_data.py

# Analyze failed-login patterns
python3 scripts/data_analysis/failed_logins_analysis.py

Model Training

# Train the linear regression model
python3 scripts/model_training/train_linear_regression.py

# Test model performance on test data
python3 scripts/model_training/test_model_performance.py

Utilities

# Interactive predictions
python3 scripts/utilities/predict_with_model.py

# View generated plots
python3 scripts/utilities/view_plots.py

🌐 Web Interface

Main Dashboard

  • Route: /assignment2
  • Features:
    • Dataset overview and statistics
    • Model performance metrics
    • Feature analysis results
    • Links to all visualizations

Interactive Visualizations

  • Scatter Matrix (/assignment2/scatter-matrix): Feature relationships
  • 3D Scatter Plot (/assignment2/3d-scatter): 3D feature visualization
  • Correlation Heatmap (/assignment2/correlation): Feature correlations
  • Failed Logins Analysis (/assignment2/failed-logins): Failed login patterns

📁 Project Structure

ad331_artificial_intelligence/
├── assignments/                 # Weekly assignments
│   ├── week1/
│   ├── week2/
│   └── week3/
├── backend/                    # FastAPI backend
│   ├── main.py                # Backend server
│   └── requirements.txt       # Python dependencies
├── scripts/                    # Python analysis scripts
│   ├── data_analysis/         # Data exploration & visualization
│   ├── model_training/        # Model training & evaluation
│   ├── utilities/             # Helper scripts
│   └── README.md             # Script documentation
├── src/app/                    # Next.js frontend
│   ├── assignment2/          # Assignment 2 dashboard
│   │   ├── page.tsx          # Main dashboard
│   │   ├── scatter-matrix/   # Interactive scatter matrix
│   │   ├── 3d-scatter/       # 3D scatter plot
│   │   ├── correlation/      # Correlation heatmap
│   │   └── failed-logins/    # Failed logins analysis
│   └── ...                   # Other assignment pages
├── public/                     # Generated files
│   ├── models/               # Trained ML models
│   ├── reports/              # Analysis reports
│   └── visualizations/       # All plots and charts
│       ├── static/           # PNG visualizations
│       └── interactive/      # HTML interactive plots
├── test_data/                 # Dataset
│   ├── time_to_detection_train.csv
│   └── time_to_detection_test.csv
├── start_all.py              # Main startup script
├── run_analysis.py           # Analysis launcher
└── README.md                 # This file

📊 Generated Outputs

Models

  • best_linear_regression_model.pkl: Trained model ready for predictions
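
Outside of scripts/utilities/predict_with_model.py, the saved model could be loaded along these lines. This is a hedged sketch: it assumes the file lives under public/models/ (as the project tree suggests) and was written with the standard pickle module; if it was saved with joblib, `joblib.load` would be the matching call. The feature columns the model expects are not documented here, so no prediction call is shown.

```python
import pickle
from pathlib import Path

# Assumed location, inferred from the project tree (public/models/).
MODEL_PATH = Path("public/models/best_linear_regression_model.pkl")


def load_model(path=MODEL_PATH):
    """Unpickle a saved model from `path`.

    Only unpickle files you trust: pickle can execute arbitrary code.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Because the training script applies feature engineering, inputs passed to the loaded model would need the same preprocessing that was used during training.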

Reports

  • data_analysis_report.md: Comprehensive data analysis
  • model_results_summary.txt: Training results
  • model_test_performance_report.md: Test performance details
  • WEEK2_DATA_ANALYSIS.md: Week 2 specific analysis

Visualizations

Static (PNG):

  • Dataset overview
  • Feature analysis
  • Correlation heatmap
  • Time analysis
  • Model evaluation
  • Feature importance
  • Test performance
  • Failed logins hourly analysis

Interactive (HTML):

  • Scatter matrix
  • 3D scatter plot
  • Correlation heatmap
  • Interactive 3D scatter
  • Failed logins heatmap and charts

🔧 Technical Stack

Frontend

  • Framework: Next.js 15.5.6
  • UI: React 19.1.0
  • Styling: Tailwind CSS 4
  • Language: TypeScript 5

Backend

  • Framework: FastAPI 0.115.2
  • Server: Uvicorn 0.30.6
  • Validation: Pydantic 2.9.2

Machine Learning

  • Framework: scikit-learn 1.5.2
  • Data Processing: pandas 2.2.3, numpy 2.3.4
  • Visualization: matplotlib 3.9.2, seaborn 0.13.2, plotly
  • Additional: TensorFlow 2.20.0, PyTorch 2.9.0

📈 Model Insights

Top Predictive Features

  1. Alert Priority (0.622 correlation)
  2. Privilege Escalations (0.599 correlation)
  3. Average CPU Percent (0.285 correlation)
  4. Failed Logins (0.236 correlation)
  5. Data Transfer (0.209 correlation)
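
A ranking like the one above can be produced by correlating each feature with the target and sorting by absolute value. The sketch below assumes the listed values are Pearson correlations with detection time, which the README does not state explicitly; the feature names and numbers in the example are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Sort (name, correlation) pairs by absolute correlation, strongest first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up columns, for illustration only (not project data).
target = [1, 2, 3, 4, 5]
features = {
    "feature_c": [1, 3, 2, 3, 1],
    "feature_b": [5, 3, 4, 1, 2],
    "feature_a": [2, 4, 6, 8, 10],
}
for name, score in rank_features(features, target):
    print(f"{name}: {score:+.3f}")
```

Sorting by absolute value keeps strongly negative correlations near the top, since they are as informative for a linear model as strongly positive ones.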

Performance Characteristics

  • All priority levels achieve good prediction accuracy
  • Higher-priority alerts show slightly better R² scores
  • Consistent performance across different security scenarios

🎯 Use Cases

  • Security incident response planning
  • Detection time optimization
  • Risk assessment and prioritization
  • Security team resource allocation
  • Incident response training

📚 Documentation

  • Project Overview: See PROJECT_OVERVIEW.md for detailed project information
  • Script Documentation: See scripts/README.md for script usage
  • Reports: Check public/reports/ for generated analysis reports

🔍 Development

Building for Production

Frontend:

npm run build
npm start

Running Tests

python3 scripts/model_training/test_model_performance.py

🀝 Contributing

This is a course project for AD331 Artificial Intelligence. For questions or issues, please refer to the course materials or contact the instructor.

📝 License

This project is part of an academic course and is for educational purposes.


Course: AD331 Artificial Intelligence
Project: Time to Detection Prediction Model
Status: Active Development

About

Course repository for all assignments related to AD331
