This project focuses on statistical data analysis techniques such as regression analysis, statistical inference, and descriptive analysis. The goal is to explore a dataset, gain insights, and determine the key factors that influence the rental price of apartments using Python and various data analysis tools.
- Introduction
- Dataset
- Data Exploration and Descriptive Analysis
- Regression Analysis
- Statistical Inference
- Conclusion and Recommendations
- Methodology
- Discussion
- Limitations and Future Work
- Installation
- Usage
- License
- Acknowledgements
The objective of this analysis is to assess statistical data analysis skills, focusing on multiple linear regression and statistical inference. The assignment explores the factors that affect rental prices and uses Python for the analysis.
The dataset used is from Kaggle: Apartments for Rent Classified Dataset.
- Sample size: 10,000 apartments
- Variables: Includes details such as `id`, `category`, `title`, `price`, `bedrooms`, `bathrooms`, `square_feet`, `location`, and `amenities`.
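Loading the data could look roughly like the sketch below. It assumes the Kaggle file is a `;`-separated CSV and uses a `state` column to stand for location; both are assumptions to check against the actual download, so a small hypothetical inline sample replaces the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle CSV; separator and column layout are assumptions.
SAMPLE_CSV = """id;category;title;price;bedrooms;bathrooms;square_feet;state;amenities
1;housing/rent/apartment;Studio downtown;950;0;1;450;TX;Parking
2;housing/rent/apartment;Two bed near park;1450;2;1;900;CA;Pool,Gym
3;housing/rent/apartment;Large family flat;2100;3;2;1400;CA;
"""


def load_apartments(source) -> pd.DataFrame:
    """Read the listings and coerce the numeric columns used in the analysis."""
    df = pd.read_csv(source, sep=";")
    numeric = ["price", "bedrooms", "bathrooms", "square_feet"]
    # Non-numeric entries (e.g. "null") become NaN instead of raising an error.
    df[numeric] = df[numeric].apply(pd.to_numeric, errors="coerce")
    return df


df = load_apartments(io.StringIO(SAMPLE_CSV))
print(df[["price", "bedrooms", "bathrooms", "square_feet", "state"]])
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by the CSV path.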
Key steps include:
- Summary statistics: Examining basic statistics for important variables such as `price`, `square_feet`, `bathrooms`, and `bedrooms`.
- Data visualizations: Analyzing rental prices across states and cities, and exploring correlations between features and price.
- Data transformations: Outlier removal and price normalization are applied for better analysis results.
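The steps above can be sketched with pandas as follows. The README does not say which outlier rule or normalization was used; this sketch assumes Tukey's 1.5 × IQR fences on `price` and a log transform, and runs on a small hypothetical DataFrame with one extreme listing.

```python
import numpy as np
import pandas as pd

# Hypothetical listings standing in for the Kaggle data (the last row is an extreme price).
df = pd.DataFrame({
    "price":       [950, 1450, 2100, 1200, 1800, 2600, 25000],
    "bedrooms":    [0, 2, 3, 1, 2, 3, 4],
    "bathrooms":   [1, 1, 2, 1, 2, 2, 3],
    "square_feet": [450, 900, 1400, 650, 1100, 1500, 3000],
    "state":       ["TX", "CA", "CA", "TX", "TX", "CA", "NY"],
})
numeric = ["price", "square_feet", "bathrooms", "bedrooms"]

# 1. Summary statistics: count, mean, std, min, quartiles, max per column.
summary = df[numeric].describe()

# 2. Outlier removal (assumed rule): keep Q1 - 1.5*IQR <= price <= Q3 + 1.5*IQR.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = df[df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)].copy()

# 3. Price normalization (assumed): log transform of a right-skewed, positive price.
clean["log_price"] = np.log(clean["price"])

# 4. Median rent per state (input for a bar chart) and feature correlations with price.
price_by_state = clean.groupby("state")["price"].median().sort_values(ascending=False)
corr_with_price = clean[numeric].corr()["price"].drop("price")

print(summary.round(1))
print(price_by_state)
print(corr_with_price.round(3))
```

Here the 25,000 listing falls above the upper fence and is dropped before the per-state medians and correlations are computed.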
- Dependent Variable: Rental price.
- Independent Variables: Number of bedrooms, bathrooms, square footage, state, and price per square foot.
- Model Used: Multiple Linear Regression using Statsmodels and sklearn.
- Results:
- Adjusted R-squared of 0.964: even after penalizing for the number of predictors, the model explains roughly 96% of the variance in rental prices.
- F-Test: A large F-statistic indicates that the predictors collectively explain a significant portion of the variability in the rental price.
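- P-values: The p-value of the overall F-test is reported as 0.0, meaning it is smaller than the displayed precision rather than exactly zero. This is strong evidence against the null hypothesis that all slope coefficients are zero.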
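A minimal NumPy sketch of this kind of model and of the statistics quoted above, on synthetic data with known coefficients (so the numbers will not match the project's 0.964). For brevity it uses bedrooms, bathrooms, square footage and state only; state enters as dummy variables with CA as the baseline. The formulas are the standard ones for OLS with an intercept and p predictors: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) and F = (R²/p) / ((1 − R²)/(n − p − 1)). In statsmodels a comparable fit would be `smf.ols("price ~ bedrooms + bathrooms + square_feet + C(state)", data=df).fit()`, whose results expose `rsquared_adj`, `fvalue` and `f_pvalue`; the project's exact specification is not reproduced here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic listings with known effects (illustrative numbers, not study results).
df = pd.DataFrame({
    "bedrooms": rng.integers(0, 4, n),
    "bathrooms": rng.integers(1, 3, n),
    "square_feet": rng.uniform(400, 2000, n),
    "state": rng.choice(["CA", "NY", "TX"], n),
})
state_effect = df["state"].map({"CA": 600.0, "NY": 800.0, "TX": 0.0})
df["price"] = (200 + 150 * df["bedrooms"] + 100 * df["bathrooms"]
               + 0.8 * df["square_feet"] + state_effect + rng.normal(0, 50, n))

# Design matrix: intercept, numeric features, state dummies (CA is the dropped baseline).
X = pd.get_dummies(df[["bedrooms", "bathrooms", "square_feet", "state"]],
                   columns=["state"], drop_first=True, dtype=float)
X.insert(0, "const", 1.0)
X_arr = X.to_numpy(dtype=float)
y = df["price"].to_numpy()

# Ordinary least squares: minimise ||y - X b||^2.
beta, *_ = np.linalg.lstsq(X_arr, y, rcond=None)
coefs = pd.Series(beta, index=X.columns)
y_hat = X_arr @ beta


def fit_statistics(y, y_hat, p):
    """R^2, adjusted R^2 and overall F-statistic; p counts predictors excluding the intercept."""
    n_obs = len(y)
    ss_res = float(np.sum((y - y_hat) ** 2))     # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - p - 1)
    # Tests H0: all slope coefficients are zero, with (p, n - p - 1) degrees of freedom.
    f_stat = (r2 / p) / ((1.0 - r2) / (n_obs - p - 1))
    return r2, adj_r2, f_stat


r2, adj_r2, f_stat = fit_statistics(y, y_hat, p=X.shape[1] - 1)
print(coefs.round(2))
print(f"R^2={r2:.3f}  adj. R^2={adj_r2:.3f}  F={f_stat:.1f}")
```

The F-test p-value is the upper tail of the F(p, n − p − 1) distribution, e.g. `scipy.stats.f.sf(f_stat, p, n - p - 1)`. For a statistic this large it underflows toward zero, which is why regression summaries often print 0.0.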
- Key Findings: Bedrooms, bathrooms, square footage, and location significantly impact rental prices.
- Recommendations:
- For landlords: Helps set competitive prices.
- For renters: Highlights the features that most affect rental costs.
- Data Preprocessing: Outliers were removed, and transformations were applied to `price` for improved model performance.
- Software Used: Python (Pandas, Statsmodels, sklearn).
- Validation: An 80/20 train-test split was used.
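An 80/20 hold-out check can be sketched with NumPy alone: shuffle the row indices, fit OLS on 80% of the rows, and score R² on the remaining 20%. The seed, the single-feature synthetic data and the helper names are hypothetical; with scikit-learn the split itself is usually done with `sklearn.model_selection.train_test_split(X, y, test_size=0.2, random_state=42)`.

```python
import numpy as np


def train_test_split_indices(n: int, test_size: float = 0.2, seed: int = 42):
    """Shuffle row indices and return (train_idx, test_idx); hypothetical helper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(round(n * test_size))
    return idx[n_test:], idx[:n_test]


def r2_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical data: price driven by square footage plus noise.
rng = np.random.default_rng(1)
sqft = rng.uniform(400, 2000, 200)
price = 300 + 0.9 * sqft + rng.normal(0, 60, 200)
X = np.column_stack([np.ones_like(sqft), sqft])  # intercept + feature

train, test = train_test_split_indices(len(price))
# Fit only on the training rows, then evaluate on rows the model has not seen.
beta, *_ = np.linalg.lstsq(X[train], price[train], rcond=None)
test_r2 = r2_score(price[test], X[test] @ beta)
print(f"train n={len(train)}, test n={len(test)}, held-out R^2={test_r2:.3f}")
```

A held-out R² close to the in-sample value suggests the model is not simply memorizing the training rows.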
- Findings: Positive correlations were found between rental prices and the number of bathrooms and bedrooms. Larger apartments and the presence of amenities also influenced prices.
- Comparison with literature: The results align with previous studies on rental pricing factors.
- Limitations: Potential bias from the dataset's origin and limited granularity on the quality of amenities.
- Future Work: Expanding the dataset and incorporating more granular data could improve the robustness of the results.
- Clone the repository and change into its directory:
git clone https://github.com/MahmoudElMahdi/Apartment-for-Rent.git
cd Apartment-for-Rent
- Install the required dependencies:
pip install -r requirements.txt
- Run the Jupyter notebook to explore the dataset and perform regression analysis.
- The scripts folder contains additional Python scripts for data preprocessing and model estimation.
This project is licensed under the MIT License. See the LICENSE file for details.
Dataset from Kaggle.