JunHyun-K/Analyzing-Server-Connection-Patterns-

Analyzing Server Connection Patterns Using Clustering and Regression

This project was completed as part of the UBC DSCI 100: Introduction to Data Science course.
It explores how server resources can be optimized based on patterns in player connection behavior using clustering, visualization, and K-Nearest Neighbors (KNN) regression.

📁 Files

  • final.ipynb: Main notebook containing all code, visualizations, and modeling.
  • players.csv: Metadata of 196 players, including experience, subscription status, and gameplay hours.
  • sessions.csv: 1,535 session records, including session times and durations.

🎯 Project Goal

The goal is to predict when players are most active so that server resources can be allocated accordingly.
This requires understanding how the number of concurrent connections varies by day of the week and hour of the day.

🔍 Key Steps

1. Data Cleaning and Feature Engineering

  • Converted time strings to datetime objects
  • Calculated session durations
  • Extracted hour and day of week features
  • Removed sessions with 0 duration
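
The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names `start_time` and `end_time`, the timestamp format, and the three toy rows are assumptions made for the example.

```python
import pandas as pd

# Toy rows with hypothetical column names; the real sessions.csv may differ.
sessions = pd.DataFrame({
    "start_time": ["2024-05-04 01:30", "2024-05-04 23:45", "2024-05-06 09:00"],
    "end_time":   ["2024-05-04 02:15", "2024-05-05 00:30", "2024-05-06 09:00"],
})

def clean_sessions(df: pd.DataFrame) -> pd.DataFrame:
    """Parse timestamps, derive duration/hour/day features, drop empty sessions."""
    out = df.copy()
    # Convert time strings to datetime objects
    out["start_time"] = pd.to_datetime(out["start_time"])
    out["end_time"] = pd.to_datetime(out["end_time"])
    # Session duration in minutes
    out["duration_min"] = (out["end_time"] - out["start_time"]).dt.total_seconds() / 60
    # Hour of day (0-23) and day of week, both taken from the session start
    out["hour"] = out["start_time"].dt.hour
    out["day_of_week"] = out["start_time"].dt.day_name()
    # Remove sessions with 0 duration
    return out[out["duration_min"] > 0].reset_index(drop=True)

cleaned = clean_sessions(sessions)
```

Taking the hour and day from the session start is one possible choice; a session that crosses midnight could also be attributed to both days.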

2. Exploratory Data Analysis (EDA)

  • Plotted connections per hour and per day using Altair
  • Created bubble charts that act as heatmaps of activity by day and hour
  • Identified late-night and weekend peak usage
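
The per-hour and per-day plots rest on a count of sessions for each (day, hour) slot. A minimal sketch of that aggregation, assuming a cleaned table with hypothetical `day_of_week` and `hour` columns, might look like this:

```python
import pandas as pd

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def connections_by_day_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Return a 7 x 24 table of session counts (rows: days, columns: hours)."""
    counts = df.groupby(["day_of_week", "hour"]).size().unstack(fill_value=0)
    # Reindex so every day and hour appears, even those with no sessions
    return counts.reindex(index=DAYS, columns=range(24), fill_value=0)

# Toy input for illustration only
toy = pd.DataFrame({
    "day_of_week": ["Saturday", "Saturday", "Monday"],
    "hour": [2, 2, 9],
})
grid = connections_by_day_hour(toy)
```

This counts sessions by their start slot only, which is a simplification: a strict measure of concurrent connections would also count a session in every later hour it spans. The resulting table can be reshaped to long form for plotting.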

3. Clustering with K-Means

  • Used the elbow method to choose k = 3 clusters
  • Grouped time slots into Low, Medium, and High connection density
  • Labeled data with cluster names for better interpretability
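
As an illustration of the clustering step, the sketch below runs scikit-learn's `KMeans` on made-up connection counts, records the inertia for several values of k (the quantity inspected in an elbow plot), and maps the three clusters to Low, Medium, and High by centroid size. Clustering on the count alone is an assumption; the notebook may use additional features.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up connection counts per time slot, for illustration only
counts = np.array(
    [1, 2, 2, 3, 10, 11, 12, 13, 30, 32, 35, 31], dtype=float
).reshape(-1, 1)

# Elbow method: inertia (within-cluster sum of squares) for each candidate k
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(counts).inertia_
    for k in range(1, 7)
}

# Fit the chosen k = 3 model
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(counts)

# Cluster ids are arbitrary, so sort them by centroid value before naming
order = np.argsort(model.cluster_centers_.ravel())
names = {int(cid): name for cid, name in zip(order, ["Low", "Medium", "High"])}
density = [names[int(c)] for c in model.labels_]
```

Plotting `inertias` against k and looking for the point where the curve flattens is the usual way to read the elbow.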

4. Modeling with KNN Regression

Evaluated three strategies using Root Mean Squared Percentage Error (RMSPE):

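| Model                         | Average RMSPE | Notes                               |
|-------------------------------|---------------|-------------------------------------|
| Full dataset (no clusters)    | 0.42          | Least accurate                      |
| Full dataset + density labels | 0.23          | Most accurate                       |
| Cluster-specific models       | 0.32          | Intermediate accuracy, more modular |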
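
A minimal sketch of how one such model could be tuned and scored is shown below. It uses synthetic data, tunes `n_neighbors` with `GridSearchCV`, and computes RMSPE with a helper that follows the expansion given above (relative error, squared, averaged, square-rooted). The helper's formula, the features, and the parameter grid are assumptions, not the notebook's confirmed setup.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor

def rmspe(y_true, y_pred) -> float:
    """Root mean squared percentage error (assumed definition; y_true must be nonzero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2)))

# Synthetic data: connections peak around 2 AM (illustrative only)
rng = np.random.default_rng(0)
hours = rng.integers(0, 24, size=200)
days = rng.integers(0, 7, size=200)
X = np.column_stack([hours, days])
y = 10 + 3 * np.cos(2 * np.pi * (hours - 2) / 24) + rng.normal(0, 0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Cross-validated search over the number of neighbours
search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": list(range(1, 16))},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

score = rmspe(y_test, search.predict(X_test))
```

Under the same assumptions, the "density labels" strategy would add an encoded cluster label as an extra column of `X`, and the cluster-specific strategy would fit one such search per cluster.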

5. 3D Visualization

  • Created interactive 3D Plotly surfaces to show connection density across hours and days
  • Compared predicted vs actual patterns using KNN-regressed surfaces
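
A surface like the ones described above needs a 7 x 24 grid of values (days by hours). The sketch below builds that grid with NumPy and passes it to Plotly's `go.Surface`. The function names and the integer encoding of days (0 = Monday) are hypothetical choices for this example.

```python
import numpy as np

def surface_grid(days, hours, counts) -> np.ndarray:
    """Accumulate counts into a 7 x 24 array indexed by [day, hour]."""
    z = np.zeros((7, 24))
    # np.add.at sums repeated (day, hour) pairs instead of overwriting them
    np.add.at(z, (np.asarray(days), np.asarray(hours)), np.asarray(counts, dtype=float))
    return z

def surface_figure(z: np.ndarray):
    """Build an interactive 3D surface from the grid."""
    # Imported here so surface_grid can be used without plotly installed
    import plotly.graph_objects as go

    fig = go.Figure(data=[go.Surface(z=z, x=list(range(24)), y=list(range(7)))])
    fig.update_layout(
        scene=dict(
            xaxis_title="Hour of day",
            yaxis_title="Day of week (0 = Monday)",
            zaxis_title="Connections",
        )
    )
    return fig

# Toy example: two entries fall on Saturday (5) at 2 AM and are summed
z = surface_grid(days=[5, 5, 0], hours=[2, 2, 9], counts=[3, 4, 1])
# surface_figure(z).show()  # opens the interactive plot
```

Calling `surface_grid` once on observed counts and once on model predictions gives two grids that can be plotted side by side for the predicted-versus-actual comparison.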

📊 Key Findings

  • Activity peaked between 11:30 PM and 4:30 AM, especially around 2:00 AM on Saturday
  • Lowest activity during weekday mornings
  • Friday showed surprisingly low usage
  • Density-aware KNN models outperformed the unclustered model: average RMSPE fell from 0.42 (no clusters) to 0.23 (with density labels) and 0.32 (cluster-specific models)

🛠️ Tools Used

  • Python (Jupyter Notebook)
  • pandas, numpy
  • altair, plotly
  • scikit-learn (KMeans, KNN, GridSearchCV)

📌 Authors

JunHyun Kim
David Liu
Layni Janzen
Sydney Lee

This project was completed as part of UBC's DSCI 100 (Winter 2024) — Final Group Project (Group 10).
