A comprehensive machine learning project for predicting security incident detection times using linear regression. Features data analysis, model training, evaluation, and an interactive web dashboard.
This project implements an end-to-end machine learning pipeline to predict the time it takes to detect security incidents. The solution includes:
- Data Analysis: Comprehensive exploratory data analysis with visualizations
- Model Training: Linear regression with advanced feature engineering
- Model Evaluation: Cross-validation and performance metrics
- Web Dashboard: Interactive Next.js frontend with visualizations
- API Backend: FastAPI backend for model predictions
- R² Score: 0.8047 (80.47% of variance explained)
- Accuracy: 97% of predictions within 20% of actual values
- Model Type: Linear Regression with advanced feature engineering
- Dataset: 1,000 samples (800 training, 200 test)
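For readers who want to see how the two headline metrics are defined, here is a minimal sketch in plain Python. It assumes "within 20%" means a relative error of at most 20% of the actual value; the project's scripts may define it differently. The function names and sample numbers are illustrative only and are not taken from the project's data.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def within_tolerance(
    y_true: Sequence[float], y_pred: Sequence[float], tolerance: float = 0.20
) -> float:
    """Fraction of predictions whose relative error is at most `tolerance`."""
    hits = sum(
        1 for t, p in zip(y_true, y_pred) if abs(p - t) <= tolerance * abs(t)
    )
    return hits / len(y_true)


# Made-up detection times for illustration (not project data).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 33.0, 50.0]

print(f"R^2: {r2_score(actual, predicted):.4f}")
print(f"Within 20%: {within_tolerance(actual, predicted):.0%}")
```

On this toy data the last prediction misses by 25%, so three of four predictions fall inside the tolerance band even though the squared-error term for that point dominates R².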
- Node.js (v18+ recommended)
- Python (3.13+)
- npm or yarn
- Clone the repository
git clone <repository-url>
cd ad331_artificial_intelligence
- Install dependencies
Frontend (Next.js):
npm install
Backend (Python):
pip install -r backend/requirements.txt
python3 start_all.py
This script will:
- Check requirements
- Optionally install/update dependencies
- Start both frontend and backend servers
Start Frontend:
npm run dev
Frontend will be available at http://localhost:3000
Start Backend:
cd backend
python main.py
Backend will be available at http://localhost:8000
- Linux/Mac:
./start_all.sh
- Windows:
start_all.bat
- Run POST http://localhost:8000/api/assignment8/evaluate to score the Assignment 7 classifier on the held-out test set, produce accuracy/precision/recall/F1 (macro), and emit a normalized confusion matrix at public/visualizations/static/assignment8_confusion_matrix.png.
- Macro-averaged F1 is the primary metric: it balances precision and recall per class so the minority label cannot hide behind majority-class accuracy. Accuracy alone can look strong even when one class (e.g., subjective statements) is frequently misclassified, so F1 better reflects real quality on imbalanced text data.
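To illustrate why macro F1 is preferred over accuracy on imbalanced labels, the sketch below computes both in plain Python. It is not the backend's implementation; the label strings `factual` and `opinion` and the toy predictions are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy imbalanced split: a classifier that always answers "factual".
truth = ["factual"] * 8 + ["opinion"] * 2
preds = ["factual"] * 10

print(f"accuracy: {accuracy(truth, preds):.2f}")
print(f"macro F1: {macro_f1(truth, preds):.2f}")
```

The always-majority classifier reaches 80% accuracy here, while its macro F1 is far lower because the `opinion` class contributes an F1 of zero to the average.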
- Assignment 1 - Dev Setup & Iris EDA: Stand up the ML toolkit (NumPy/Pandas/Matplotlib/Seaborn) and explore the iris dataset; compute stats and plot histograms/box/scatter charts to practice basic data profiling.
- Assignment 2 - Time to Detection Analysis: Inspect the security incident dataset, surface correlations, and visualize the regression target/feature relationships in the Next.js dashboard.
- Assignment 3 - MNIST Classification: Train a feedforward neural net on handwritten digits and serve predictions; includes an interactive canvas to draw digits, trigger training, and view accuracy/loss curves from the backend.
- Assignment 4 - Large Language Models: Load TinyLlama via Hugging Face, experiment with temperature/top-p/max-tokens, and compare generations; comes with quick test cases plus an interactive chat panel.
- Assignment 5 - Retrieval-Augmented Generation: Build a lightweight RAG pipeline over a D&D 2024 rules summary; chunk and embed text (MiniLM), retrieve relevant passages, and ground TinyLlama responses with a targeted test suite.
- Assignment 6 - Prompt Engineering for Structured Extraction: Craft/evaluate prompts that pull Name, Price, and Date into strict JSON under paraphrases and noisy inputs; tracks runs, compliance, and optimized prompt variants.
- Assignment 7 - PEFT (LoRA) News Classifier: Fine-tune roberta-base with LoRA adapters to label news as factual vs opinion while keeping the base model frozen; includes dataset loading, training controls, and metric visualizations.
- Assignment 8 - Model Evaluation: Score the Assignment 7 classifier on a held-out split with macro metrics and a confusion matrix; backend endpoint exports the plot to public/visualizations/static/assignment8_confusion_matrix.png.
- Assignment 9 - Placeholder: Slot reserved for the next module; UI stub points learners to check the assignments folder as new materials land.
- Assignment 10 - Final Project: Placeholder for the capstone that will synthesize course concepts into a single end-to-end AI project.
python3 run_analysis.py
Interactive menu to run all scripts in sequence.
# Generate comprehensive visualizations
python3 scripts/data_analysis/visualize_data.py
# Analyze failed logins patterns
python3 scripts/data_analysis/failed_logins_analysis.py
# Train the linear regression model
python3 scripts/model_training/train_linear_regression.py
# Test model performance on test data
python3 scripts/model_training/test_model_performance.py
# Interactive predictions
python3 scripts/utilities/predict_with_model.py
# View generated plots
python3 scripts/utilities/view_plots.py
- Route: /assignment2
- Features:
  - Dataset overview and statistics
  - Model performance metrics
  - Feature analysis results
  - Links to all visualizations
- Scatter Matrix (/assignment2/scatter-matrix): Feature relationships
- 3D Scatter Plot (/assignment2/3d-scatter): 3D feature visualization
- Correlation Heatmap (/assignment2/correlation): Feature correlations
- Failed Logins Analysis (/assignment2/failed-logins): Failed login patterns
ad331_artificial_intelligence/
├── assignments/                 # Weekly assignments
│   ├── week1/
│   ├── week2/
│   └── week3/
├── backend/                     # FastAPI backend
│   ├── main.py                  # Backend server
│   └── requirements.txt         # Python dependencies
├── scripts/                     # Python analysis scripts
│   ├── data_analysis/           # Data exploration & visualization
│   ├── model_training/          # Model training & evaluation
│   ├── utilities/               # Helper scripts
│   └── README.md                # Script documentation
├── src/app/                     # Next.js frontend
│   ├── assignment2/             # Assignment 2 dashboard
│   │   ├── page.tsx             # Main dashboard
│   │   ├── scatter-matrix/      # Interactive scatter matrix
│   │   ├── 3d-scatter/          # 3D scatter plot
│   │   ├── correlation/         # Correlation heatmap
│   │   └── failed-logins/       # Failed logins analysis
│   └── ...                      # Other assignment pages
├── public/                      # Generated files
│   ├── models/                  # Trained ML models
│   ├── reports/                 # Analysis reports
│   └── visualizations/          # All plots and charts
│       ├── static/              # PNG visualizations
│       └── interactive/         # HTML interactive plots
├── test_data/                   # Dataset
│   ├── time_to_detection_train.csv
│   └── time_to_detection_test.csv
├── start_all.py                 # Main startup script
├── run_analysis.py              # Analysis launcher
└── README.md                    # This file
best_linear_regression_model.pkl: Trained model ready for predictions
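A minimal sketch of loading the saved model from Python follows. It assumes the file lives under public/models/ (as the project tree suggests) and was written with the standard `pickle` module; if the training script used joblib instead, swap in `joblib.load`. Unpickling a scikit-learn estimator also requires scikit-learn to be installed, and the feature columns the model expects are not documented here, so the sketch stops at loading. The `load_model` helper is hypothetical.

```python
import pickle
from pathlib import Path

# Assumed location, inferred from the public/models/ folder in the project tree.
MODEL_PATH = Path("public/models/best_linear_regression_model.pkl")


def load_model(path: Path = MODEL_PATH):
    """Unpickle and return the saved object. Only load files you trust."""
    with path.open("rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    if MODEL_PATH.exists():
        model = load_model()
        print(f"Loaded {type(model).__name__} from {MODEL_PATH}")
    else:
        print(f"{MODEL_PATH} not found; run the training script first.")
```

For interactive predictions, scripts/utilities/predict_with_model.py remains the supported entry point.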
data_analysis_report.md: Comprehensive data analysis
model_results_summary.txt: Training results
model_test_performance_report.md: Test performance details
WEEK2_DATA_ANALYSIS.md: Week 2 specific analysis
Static (PNG):
- Dataset overview
- Feature analysis
- Correlation heatmap
- Time analysis
- Model evaluation
- Feature importance
- Test performance
- Failed logins hourly analysis
Interactive (HTML):
- Scatter matrix
- 3D scatter plot
- Correlation heatmap
- Interactive 3D scatter
- Failed logins heatmap and charts
- Framework: Next.js 15.5.6
- UI: React 19.1.0
- Styling: Tailwind CSS 4
- Language: TypeScript 5
- Framework: FastAPI 0.115.2
- Server: Uvicorn 0.30.6
- Validation: Pydantic 2.9.2
- Framework: scikit-learn 1.5.2
- Data Processing: pandas 2.2.3, numpy 2.3.4
- Visualization: matplotlib 3.9.2, seaborn 0.13.2, plotly
- Additional: TensorFlow 2.20.0, PyTorch 2.9.0
- Alert Priority (0.622 correlation)
- Privilege Escalations (0.599 correlation)
- Average CPU Percent (0.285 correlation)
- Failed Logins (0.236 correlation)
- Data Transfer (0.209 correlation)
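The correlation values listed above are presumably Pearson coefficients between each feature and the detection time; that is an assumption, since the analysis scripts are not shown here. A minimal sketch of the calculation, using made-up numbers rather than the project's dataset:

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only (hypothetical feature and target columns).
alert_priority = [1.0, 2.0, 3.0, 4.0, 5.0]
detection_time = [2.0, 1.0, 4.0, 3.0, 5.0]

print(f"correlation: {pearson(alert_priority, detection_time):.3f}")
```

With pandas, `df.corr()` produces the same coefficient for every pair of numeric columns at once.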
- All priority levels achieve good prediction accuracy
- Higher priority alerts show slightly better R² scores
- Consistent performance across different security scenarios
- Security incident response planning
- Detection time optimization
- Risk assessment and prioritization
- Security team resource allocation
- Incident response training
- Project Overview: See PROJECT_OVERVIEW.md for detailed project information
- Script Documentation: See scripts/README.md for script usage
- Reports: Check public/reports/ for generated analysis reports
Frontend:
npm run build
npm start
python3 scripts/model_training/test_model_performance.py
This is a course project for AD331 Artificial Intelligence. For questions or issues, please refer to the course materials or contact the instructor.
This project is part of an academic course and is for educational purposes.
Course: AD331 Artificial Intelligence
Project: Time to Detection Prediction Model
Status: Active Development