Analyzing Server Connection Patterns Using Clustering and Regression

This project was completed as part of the UBC DSCI 100: Introduction to Data Science course.
It explores how server resources can be optimized based on patterns in player connection behavior using clustering, visualization, and K-Nearest Neighbors (KNN) regression.

📁 Files

final.ipynb: Main notebook containing all code, visualizations, and modeling.
players.csv: Metadata of 196 players, including experience, subscription status, and gameplay hours.
sessions.csv: 1,535 session records, including session times and durations.

🎯 Project Goal

To predict and optimize server resource allocation by analyzing when players are most active.
This involves understanding the number of concurrent connections across days and hours.

🔍 Key Steps

1. Data Cleaning and Feature Engineering

Converted time strings to datetime objects
Calculated session durations
Extracted hour and day of week features
Removed sessions with 0 duration
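
The cleaning steps above might look roughly like the following pandas sketch. The column names (`start_time`, `end_time`), the day-first timestamp format, and the sample rows are assumptions made for illustration; the actual `sessions.csv` schema may differ.

```python
import pandas as pd

# Hypothetical sample rows; the real sessions.csv columns may be named differently.
sessions = pd.DataFrame({
    "start_time": ["30/06/2024 18:12", "17/06/2024 23:33", "25/07/2024 17:34"],
    "end_time":   ["30/06/2024 18:24", "17/06/2024 23:46", "25/07/2024 17:34"],
})

# Convert time strings to datetime objects (day-first format is an assumption).
for col in ["start_time", "end_time"]:
    sessions[col] = pd.to_datetime(sessions[col], format="%d/%m/%Y %H:%M")

# Session duration in minutes.
sessions["duration_min"] = (
    sessions["end_time"] - sessions["start_time"]
).dt.total_seconds() / 60

# Hour of day (0-23) and day of week (Monday=0 ... Sunday=6) of the session start.
sessions["hour"] = sessions["start_time"].dt.hour
sessions["day_of_week"] = sessions["start_time"].dt.dayofweek

# Drop sessions with zero duration.
cleaned = sessions[sessions["duration_min"] > 0].reset_index(drop=True)
```

Here the third sample row starts and ends at the same minute, so it is dropped and `cleaned` keeps two sessions.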

2. Exploratory Data Analysis (EDA)

Plotted connections per hour and per day using Altair
Created bubble charts to visualize activity heatmaps
Identified late-night and weekend peak usage
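
As a rough illustration of the aggregation behind such charts, the sketch below counts sessions per (day, hour) slot and pivots them into a 7 x 24 grid. The data and column names are hypothetical, and counting session starts is a simplification: the notebook may count concurrent connections differently.

```python
import pandas as pd

# Hypothetical cleaned sessions: one row per session with start hour and weekday.
cleaned = pd.DataFrame({
    "day_of_week": [5, 5, 5, 6, 0, 0],   # Monday=0 ... Sunday=6
    "hour":        [2, 2, 3, 23, 9, 2],
})

# Count sessions starting in each (day, hour) slot.
counts = (
    cleaned.groupby(["day_of_week", "hour"])
    .size()
    .reset_index(name="connections")
)

# Pivot to a 7 x 24 grid, filling slots that have no sessions with 0.
grid = (
    counts.pivot(index="day_of_week", columns="hour", values="connections")
    .reindex(index=range(7), columns=range(24))
    .fillna(0)
    .astype(int)
)

# Locate the busiest slot in the long-format table.
top = counts.loc[counts["connections"].idxmax()]
busiest_day, busiest_hour = int(top["day_of_week"]), int(top["hour"])
```

The long-format `counts` table is the shape a charting library typically expects, while `grid` is convenient for heatmap-style or surface plots.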

3. Clustering with K-Means

Used the elbow method to choose k = 3
Grouped time slots into Low, Medium, and High connection density
Labeled data with cluster names for better interpretability
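
A minimal sketch of this step with scikit-learn, assuming each time slot is described by a single connection count. The sample values are invented, and the notebook may cluster on different features.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical connection counts for a handful of time slots.
connections = np.array(
    [1, 2, 2, 3, 10, 11, 12, 13, 30, 31, 33, 35], dtype=float
).reshape(-1, 1)

# Elbow method: fit K-Means for several k and record the within-cluster
# sum of squares (inertia); the "elbow" is where the decrease levels off.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(connections)
    inertias[k] = model.inertia_

# Fit the chosen model (k = 3, as in the project).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(connections)

# Cluster ids are arbitrary, so rank the centroids to map ids to names.
order = np.argsort(kmeans.cluster_centers_.ravel())
names = {int(cid): name for cid, name in zip(order, ["Low", "Medium", "High"])}
density = [names[int(c)] for c in kmeans.labels_]
```

Ranking the centroids before naming them keeps the Low, Medium, and High labels stable even if K-Means assigns cluster ids in a different order on another run.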

4. Modeling with KNN Regression

Evaluated three strategies using Root Mean Squared Percentage Error (RMSPE):

| Model | Average RMSPE | Notes |
|---|---|---|
| Full dataset (no clusters) | 0.42 | Least accurate |
| Full dataset + density labels | 0.23 | Most accurate |
| Cluster-specific models | 0.32 | Intermediate accuracy, more modular |
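
The sketch below shows one way the tuning and evaluation could be set up with scikit-learn. It runs on synthetic data, uses only day and hour as features, and defines a hypothetical `rmspe` helper that follows the percentage form named above; the notebook's exact metric definition, features, and parameter grid are not shown here and may differ.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def rmspe(y_true, y_pred):
    """Root mean squared percentage error; assumes no zero values in y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2)))


# Synthetic stand-in data: one row per (day, hour) slot, peaking around 2 AM.
rng = np.random.default_rng(0)
days, hours = np.meshgrid(np.arange(7), np.arange(24), indexing="ij")
X = np.column_stack([days.ravel(), hours.ravel()])
y = 5 + 10 * np.cos((X[:, 1] - 2) * np.pi / 12) ** 2 + rng.normal(0, 0.5, len(X))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale features, then tune the number of neighbors with 5-fold cross-validation.
pipeline = make_pipeline(StandardScaler(), KNeighborsRegressor())
search = GridSearchCV(
    pipeline,
    param_grid={"kneighborsregressor__n_neighbors": list(range(1, 16))},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

best_k = search.best_params_["kneighborsregressor__n_neighbors"]
test_rmspe = rmspe(y_test, search.predict(X_test))
```

To mirror the "density labels" strategy, a numeric encoding of the cluster label could be appended as an extra column of `X`. For cluster-specific models, the same search would be repeated on the rows of each cluster separately.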

5. 3D Visualization

Created interactive 3D Plotly surfaces to show connection density across hours and days
Compared predicted vs actual patterns using KNN-regressed surfaces
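
One possible way to build such surfaces is sketched below, using synthetic data and a fixed `n_neighbors=5` chosen only for illustration. The helper `make_surface_figure` is hypothetical and imports Plotly lazily, so the grid construction can run without it.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic observed connection counts on a 7 x 24 (day x hour) grid.
days = np.arange(7)
hours = np.arange(24)
day_grid, hour_grid = np.meshgrid(days, hours, indexing="ij")
rng = np.random.default_rng(0)
actual = (
    5
    + 10 * np.cos((hour_grid - 2) * np.pi / 12) ** 2
    + rng.normal(0, 0.5, day_grid.shape)
)

# Fit KNN on every slot, then predict the same grid to get a smoothed surface.
X = np.column_stack([day_grid.ravel(), hour_grid.ravel()])
knn = KNeighborsRegressor(n_neighbors=5).fit(X, actual.ravel())
predicted = knn.predict(X).reshape(day_grid.shape)


def make_surface_figure(actual, predicted):
    """Overlay actual and predicted surfaces; requires plotly to be installed."""
    import plotly.graph_objects as go

    # z has shape (len(y), len(x)): rows are days, columns are hours.
    fig = go.Figure([
        go.Surface(x=hours, y=days, z=actual, name="Actual",
                   opacity=0.6, showscale=False),
        go.Surface(x=hours, y=days, z=predicted, name="Predicted",
                   showscale=False),
    ])
    fig.update_layout(
        scene=dict(
            xaxis_title="Hour",
            yaxis_title="Day of week",
            zaxis_title="Connections",
        )
    )
    return fig
```

Calling `make_surface_figure(actual, predicted).show()` in a notebook would render both surfaces in one interactive scene.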

📊 Key Findings

Peak activity occurred between 11:30 PM and 4:30 AM, especially around 2:00 AM on Saturday
Lowest activity during weekday mornings
Friday showed surprisingly low usage
Density-aware KNN models outperformed the unclustered model: average RMSPE of 0.23 (with density labels) and 0.32 (cluster-specific) versus 0.42

🛠️ Tools Used

Python (Jupyter Notebook)
pandas, numpy
altair, plotly
scikit-learn (KMeans, KNN, GridSearchCV)

📌 Authors

JunHyun Kim
David Liu
Layni Janzen
Sydney Lee

This project was completed as part of UBC's DSCI 100 (Winter 2024) — Final Group Project (Group 10).
