Spotify Top Songs Clustering Analysis

Overview

This project analyzes the top Spotify songs dataset using exploratory data analysis (EDA), dimensionality reduction techniques (PCA, t-SNE, UMAP), and clustering algorithms (K-Means, DBSCAN, Agglomerative Clustering). The goal is to uncover trends and patterns in popular music based on features such as energy, danceability, acousticness, and valence.

The results demonstrate the strengths and weaknesses of different dimensionality reduction and clustering methods in grouping songs by their musical characteristics.

Features

Univariate and bivariate exploratory data analysis
Correlation heatmaps and pairwise relationship visualization
Comparison of dimensionality reduction methods: PCA, t-SNE, UMAP
Clustering with K-Means, DBSCAN, and Agglomerative on reduced embeddings
Cluster interpretability with feature summaries
Insights into popular music trends based on clustering results

Dataset

The dataset used is Top Spotify Songs.
Source: Kaggle Spotify Top Songs Dataset
Download the CSV file from Kaggle and place it in your working directory or upload it to your environment.
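
Once the CSV is in place, loading it might look like the sketch below. The column names are an assumption taken from the features named in the Overview, and the file name in the usage comment is hypothetical; the actual Kaggle file may use different headers.

```python
import pandas as pd

# Assumed column names (from the features listed in the Overview);
# adjust them to match the headers in the downloaded CSV.
FEATURE_COLS = ["energy", "danceability", "acousticness", "valence"]


def load_features(csv_path, feature_cols=FEATURE_COLS):
    """Read the CSV and return only the selected audio-feature columns."""
    df = pd.read_csv(csv_path)
    missing = [c for c in feature_cols if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found in CSV: {missing}")
    # Drop rows with missing feature values so later steps get clean input.
    return df[list(feature_cols)].dropna().reset_index(drop=True)


# Usage (hypothetical file name):
# features = load_features("spotify_top_songs.csv")
```

Failing early on missing columns makes a header mismatch obvious before any analysis runs.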

Usage

Clone this repository:

git clone https://github.com/yourusername/spotify-clustering.git
cd spotify-clustering

Install dependencies:

pip install -r requirements.txt

Launch the Jupyter notebook (SpotifyMusicGenre.ipynb).
Follow the notebook cells to run EDA, dimensionality reduction, clustering, and visualization.

Requirements

Python 3.7+
Required Python libraries:
pandas
numpy
matplotlib
seaborn
scikit-learn
plotly
umap-learn

Methodology Summary

Exploratory Data Analysis (EDA): Understand distributions, correlations, and relationships between song features.
Dimensionality Reduction:
- PCA: Captures linear variance, but separates groups poorly on this dataset.
- t-SNE and UMAP: Reveal nonlinear structure with better cluster separation.
Clustering:
- K-Means: Produces the clearest clusters, especially on UMAP embeddings.
- DBSCAN: Less effective, because the features vary continuously and leave few density gaps between groups.
- Agglomerative Clustering: Limited cluster interpretability for this dataset.
Insights:
Popular songs generally show high energy, danceability, and positive valence, while acoustic and live performances form smaller, distinct groups.
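
The reduce-then-cluster pipeline summarized above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: PCA stands in for UMAP so the example depends only on scikit-learn, the data is synthetic, and the function name and parameter defaults are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def reduce_and_cluster(X, n_components=2, n_clusters=3, random_state=0):
    """Standardize features, project to a low-dimensional embedding, run K-Means."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components, random_state=random_state)
    embedding = pca.fit_transform(X_scaled)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(embedding)
    # Silhouette score lies in [-1, 1]; higher means better-separated clusters.
    score = silhouette_score(embedding, labels)
    return embedding, labels, pca.explained_variance_ratio_, score


def make_synthetic_features(n_per_group=50, seed=0):
    """Synthetic stand-in for four audio features with three loose groups."""
    rng = np.random.default_rng(seed)
    centers = np.array([
        [0.8, 0.8, 0.1, 0.7],  # energetic / danceable
        [0.3, 0.4, 0.8, 0.4],  # acoustic
        [0.5, 0.5, 0.5, 0.2],  # low valence
    ])
    groups = [rng.normal(c, 0.05, size=(n_per_group, 4)) for c in centers]
    return np.vstack(groups)


X = make_synthetic_features()
embedding, labels, variance_ratio, score = reduce_and_cluster(X)
print("explained variance ratio:", variance_ratio)
print("silhouette score:", round(float(score), 3))
```

To compare reducers, the PCA step could be swapped for `sklearn.manifold.TSNE` or `umap.UMAP` (from umap-learn) while keeping the scaling and clustering steps unchanged.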

Results

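The leading PCA components explain only a limited share of the variance, which suggests that the relationships between features are not well captured by a linear projection.
t-SNE and UMAP produce low-dimensional embeddings in which songs are more spread out and groups separate more clearly than in the PCA projection.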
K-Means clustering on UMAP embeddings offers the most interpretable clusters.
Cluster feature summaries differentiate groups by musical characteristics, e.g., energetic/danceable vs. acoustic.
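
A cluster feature summary of this kind can be computed with a pandas group-by. The sketch below uses a tiny made-up table and a hypothetical helper name; it illustrates the idea and does not reproduce the notebook's results.

```python
import pandas as pd


def summarize_clusters(features, labels):
    """Return per-cluster feature means plus the number of songs in each cluster."""
    labeled = features.copy()
    labeled["cluster"] = labels
    grouped = labeled.groupby("cluster")
    summary = grouped.mean()
    summary["n_songs"] = grouped.size()
    return summary


# Toy data for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "energy": [0.9, 0.8, 0.2, 0.3],
    "acousticness": [0.1, 0.2, 0.9, 0.8],
})
print(summarize_clusters(toy, [0, 0, 1, 1]))
```

Reading the rows side by side shows which features distinguish each cluster, for example high energy with low acousticness versus the reverse.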
