This repository contains the Jupyter notebook used to analyze the Spotify dataset from Kaggle.
The project has two goals: to understand how different features of a song relate to its reputation, and to build a linear regression model that predicts a song's reputation from those features.
The dataset used in this analysis is sourced from Kaggle and can be found here.
- Data Loading and Preprocessing: Load the dataset and perform necessary preprocessing steps such as handling missing values, encoding categorical variables, and scaling numerical features.
- Exploratory Data Analysis (EDA): Perform EDA to understand the distribution of data, identify correlations, and visualize relationships between features.
- Feature Selection: Select relevant features for the linear regression model.
- Model Building: Build a linear regression model using the selected features.
- Model Evaluation: Evaluate the model's performance using appropriate metrics and visualize the results.
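
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. It uses a small synthetic table in place of the Kaggle data, and the column names (`danceability`, `energy`, `tempo`, `genre`, `reputation`) are assumptions made for the example. Feature selection is reduced here to a hand-picked column list, and missing values are simply dropped.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the Kaggle data; every column name is hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "danceability": rng.uniform(0, 1, n),
    "energy": rng.uniform(0, 1, n),
    "tempo": rng.normal(120, 25, n),
    "genre": rng.choice(["pop", "rock", "jazz"], n),
})
df["reputation"] = (
    40
    + 30 * df["danceability"]
    + 10 * df["energy"]
    + np.where(df["genre"] == "pop", 8.0, 0.0)
    + rng.normal(0, 5, n)
)

# Preprocessing: imitate missing values, then drop incomplete rows.
df.loc[df.sample(10, random_state=0).index, "tempo"] = np.nan
df = df.dropna()

# Feature selection (hand-picked for this sketch).
numeric = ["danceability", "energy", "tempo"]
categorical = ["genre"]
X = df[numeric + categorical]
y = df["reputation"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale numeric columns and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Model building: preprocessing and regression in one pipeline.
model = Pipeline([("preprocess", preprocess), ("regress", LinearRegression())])
model.fit(X_train, y_train)

# Model evaluation on the held-out split.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"R^2: {r2:.3f}, RMSE: {rmse:.3f}")
```

Wrapping the scaler and encoder in a `Pipeline` means they are fitted on the training split only, so no information from the test split leaks into preprocessing.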
- Python
- Jupyter Notebook
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
The linear regression model predicted the reputation of songs with reasonable accuracy, and the analysis identified the features that most strongly influence a song's reputation on Spotify.
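
One common way to see which features a linear model leans on most is to fit it on standardized inputs and compare coefficient magnitudes. The sketch below illustrates that idea on synthetic data. The feature names and the generated target are assumptions for the example, and it does not reproduce the notebook's results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic features; names are hypothetical placeholders.
rng = np.random.default_rng(1)
n = 400
X = pd.DataFrame({
    "danceability": rng.uniform(0, 1, n),
    "energy": rng.uniform(0, 1, n),
    "loudness": rng.normal(-8, 3, n),
})
# By construction, danceability matters most and loudness not at all.
y = 50 + 40 * X["danceability"] + 5 * X["energy"] + rng.normal(0, 3, n)

# Standardize so coefficients are on a comparable scale.
X_scaled = StandardScaler().fit_transform(X)
reg = LinearRegression().fit(X_scaled, y)

# Rank features by absolute coefficient size, largest first.
ranking = pd.Series(reg.coef_, index=X.columns).sort_values(
    key=np.abs, ascending=False
)
print(ranking)
```

Coefficient size on standardized features reflects how much the prediction moves per standard deviation of the feature. It is not a test of statistical significance, and strongly correlated features can make individual coefficients unstable.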