Project Proposal: Estimating heating and cooling loads based on building characteristics
Dataset Link: https://archive.ics.uci.edu/dataset/242/energy+efficiency
The dataset for this analysis comes from the UCI Machine Learning Repository and contains the heating load and cooling load requirements of buildings (that is, their energy efficiency) as a function of building parameters. The dataset has 8 input variables, 2 target variables, and 768 observations with no missing values. The data were generated by energy analysis of 12 different building shapes simulated in Ecotect. The target variables are heating load (HL) and cooling load (CL), which represent the energy required to maintain thermal comfort within buildings.
Variables:
X1 - Relative Compactness: A measure of the building’s shape efficiency
X2 - Surface Area: The total surface area of the building
X3 - Wall Area: The area of the walls, contributing to heat transfer
X4 - Roof Area: The area of the roof, affecting thermal insulation
X5 - Overall Height: Building height, impacting air flow and heat transfer
X6 - Orientation: Cardinal direction of the building’s facade
X7 - Glazing Area: Total window area, influencing natural light and insulation
X8 - Glazing Area Distribution: Spread of window area on each facade
Y1 (Response Variable) - Heating Load: Energy required for heating
Y2 (Response Variable) - Cooling Load: Energy required for cooling
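As a convenience for the later analysis, the coded labels in the list above can be mapped to descriptive column names. The sketch below is only illustrative: it assumes the data file's header uses the codes X1 to X8, Y1, and Y2 as listed here, and the snake_case names and the helper rename_header are hypothetical choices, not part of the dataset.

```python
# Descriptive names for the coded columns, taken from the variable list above.
# The snake_case spellings are an assumption made for this sketch.
COLUMN_NAMES = {
    "X1": "relative_compactness",
    "X2": "surface_area",
    "X3": "wall_area",
    "X4": "roof_area",
    "X5": "overall_height",
    "X6": "orientation",
    "X7": "glazing_area",
    "X8": "glazing_area_distribution",
    "Y1": "heating_load",
    "Y2": "cooling_load",
}

# The 8 input variables and the 2 response variables, in order.
FEATURES = [COLUMN_NAMES[f"X{i}"] for i in range(1, 9)]
TARGETS = [COLUMN_NAMES["Y1"], COLUMN_NAMES["Y2"]]


def rename_header(header):
    """Map coded labels to descriptive names; unknown labels pass through unchanged."""
    return [COLUMN_NAMES.get(label.strip(), label.strip()) for label in header]
```

Keeping the feature and target names in two separate lists makes it easy to fit a model for each response variable without repeating column names.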
The main objective of this project is to develop a predictive model that estimates heating and cooling loads from the building characteristics. We will also identify and interpret the influence of each building feature on energy efficiency, providing insights that can inform sustainable design practices, and optimize model performance by experimenting with different regression techniques and feature selection methods.
First, we will explore the dataset to determine whether the variables are normally distributed. We will then evaluate regression models by splitting the data into training and test sets and computing prediction errors on the test set to assess model performance. The analysis will use the following methods:
- Multiple Linear Regression
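- Multinomial Logistic Regression (applicable only if the loads are first binned into categories, since this method models a categorical response)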
- Decision Tree Regression
- Correlation Matrix
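The splitting step and the correlation matrix can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: the 20% test fraction, the fixed seed, and the function names are assumptions, and in practice a statistics library would likely be used instead.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out test_fraction of them as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    split = len(shuffled) - n_test
    return shuffled[:split], shuffled[split:]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """columns maps a name to a list of values; returns pairwise correlations as nested dicts."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}
```

With 768 observations and a 20% test fraction, this split would hold out 154 rows and train on the remaining 614. Fixing the seed keeps the split reproducible, so every model is compared on the same test rows.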
To assess the accuracy of each model, we will use the following error metrics:
- Mean Absolute Error (MAE)
- Root Mean Squared Error (RMSE)
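For reference, both metrics compare observed values y with predictions ŷ over n test observations: MAE is the mean of |y - ŷ|, and RMSE is the square root of the mean of (y - ŷ)². A minimal sketch follows; the function names are chosen for illustration only.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference; penalizes large errors more than MAE."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))
```

Both metrics are expressed in the same units as the load being predicted. RMSE is never smaller than MAE on the same predictions, and a large gap between the two suggests a few large errors.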
Because there are two response variables, a separate model will be fit for each so that R² and prediction errors can be compared.
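The per-target workflow can be sketched as a loop that fits and scores one model for each response. To stay short, this illustration uses a one-predictor least-squares line as a stand-in for the multiple regression models named above. The function names and the dictionary layout are assumptions.

```python
def fit_simple_linear(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def evaluate_per_target(x_train, x_test, targets_train, targets_test):
    """Fit one model per target on the training data and report its test-set R^2."""
    scores = {}
    for name in targets_train:
        slope, intercept = fit_simple_linear(x_train, targets_train[name])
        predictions = [slope * x + intercept for x in x_test]
        scores[name] = r_squared(targets_test[name], predictions)
    return scores
```

Here targets_train and targets_test would each map "Y1" and "Y2" to lists of load values, so the returned dictionary holds one R² score per response and the two models can be compared side by side.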