As a data scientist at a housing agency in Boston, MA, I have been granted access to a U.S. Census Bureau dataset on housing prices. The goal is to present insights to the senior management team to support informed decision-making. This project addresses the following key questions:
- Is there a significant difference in median value between houses bounded by the Charles River and those that are not?
- Is there any difference in the mean values of houses based on the proportion of owner-occupied units built before 1940?
- Can we conclude that there is no relationship between nitric oxide concentrations and the proportion of non-retail business acres per town?
- What is the impact of one additional unit of weighted distance to the five Boston employment centers on the mean value of owner-occupied homes?
Using appropriate statistical analyses and visualizations, this project aims to provide insights into the above questions. The following sections will detail the methodology and results.
Statistical Analysis:
- Conduct hypothesis tests to determine differences in median and mean values.
- Perform correlation analysis to explore relationships between variables.
- Use regression analysis to assess the impact of distances on housing values.
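The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual analysis. It runs on synthetic stand-in data rather than the Census Bureau dataset, and the column names (`chas`, `age`, `indus`, `nox`, `dis`, `medv`), the three age bands, and the specific tests chosen (Welch's t-test, one-way ANOVA, Pearson correlation, simple linear regression) are all assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (NOT the real dataset); all column names are assumed.
rng = np.random.default_rng(0)
n = 200
chas = rng.integers(0, 2, size=n)       # 1 = bounded by the Charles River
age = rng.uniform(0, 100, size=n)       # % of owner-occupied units built before 1940
indus = rng.uniform(0, 30, size=n)      # % of non-retail business acres per town
nox = 0.4 + 0.01 * indus + rng.normal(0, 0.05, size=n)  # nitric oxide concentration
dis = rng.uniform(1, 12, size=n)        # weighted distance to employment centers
medv = 20 + 3 * chas + 0.5 * dis + rng.normal(0, 5, size=n)  # house value


def run_tests(chas, age, indus, nox, dis, medv):
    """Run one test per project question and return the key statistics."""
    # Q1: Welch's t-test on house values, river-bounded vs. not.
    t_res = stats.ttest_ind(medv[chas == 1], medv[chas == 0], equal_var=False)

    # Q2: one-way ANOVA across three age bands (cut points 35 and 70 are assumed).
    bands = np.digitize(age, [35, 70])  # 0: below 35, 1: 35 to below 70, 2: 70 and up
    f_res = stats.f_oneway(*(medv[bands == k] for k in range(3)))

    # Q3: Pearson correlation between nitric oxide and non-retail acreage.
    r, r_p = stats.pearsonr(nox, indus)

    # Q4: simple linear regression of house value on weighted distance.
    reg = stats.linregress(dis, medv)

    return {
        "q1_p": float(t_res.pvalue),
        "q2_p": float(f_res.pvalue),
        "q3_r": float(r),
        "q3_p": float(r_p),
        "q4_slope": float(reg.slope),
        "q4_p": float(reg.pvalue),
    }


results = run_tests(chas, age, indus, nox, dis, medv)
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

Each p-value would be compared against a chosen significance level (0.05 is a common choice), and `q4_slope` estimates the change in house value for one additional unit of weighted distance.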
Visualizations:
- Generate relevant plots and charts to illustrate findings.
- Each visualization will include explanations to provide context and insights.
- The analysis results are presented in this section, giving the senior management team clear and actionable insights.
The project details are broken down into the following sections:
- Data Collection and Preparation: Steps taken to collect, clean, and prepare the data for analysis.
- Exploratory Data Analysis (EDA): Initial analysis to understand the data distribution and identify patterns.
- Statistical Tests: Detailed description of the tests conducted and their results.
- Conclusion: Summary of findings and their implications for decision-making.