---
title: "Building Stock Analysis"
format: html
execute:
  echo: false
---
## Load Libraries and Data
```{python}
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('AggData.csv')
# Hard-coded total; it is not derived from AggData.csv
print('Total number of buildings recognised:', 1961)
print("Number of buildings with footprints:", len(df))
df["Atemp"] = pd.to_numeric(df["Atemp"], errors="coerce")
df["BldYear"] = pd.to_numeric(df["BldYear"], errors="coerce")
df["PrimEnergy"] = pd.to_numeric(df["PrimEnergy"], errors="coerce")
df["nbFloor"] = pd.to_numeric(df["nbFloor"], errors="coerce")
print('Buildings with missing construction year (NaN or 0):', df["BldYear"].isna().sum()+(df["BldYear"] == 0).sum())
print('Buildings with missing primary energy:', df["PrimEnergy"].isna().sum())
print('Buildings with missing number of floors:', df["nbFloor"].isna().sum())
print('Buildings with missing heated area:', df["Atemp"].isna().sum())
# remove invalid construction years
df = df[(df["BldYear"].notna()) & (df["BldYear"] != 0)]
# drop rows with other missing values
df = df.dropna(subset=["Atemp","PrimEnergy","nbFloor"])
print("Number of buildings after filtering out nan values:", len(df))
print("Oldest building:", df["BldYear"].min())
print("Newest building:", df["BldYear"].max())
```
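The cleaning step relies on `pd.to_numeric(..., errors="coerce")` turning unparseable entries into `NaN`, and on treating a construction year of 0 as missing. The following is a minimal sketch of that behaviour on a made-up three-row frame. The sample values are illustrative assumptions and are not taken from `AggData.csv`.

```python
import pandas as pd

# Hypothetical sample rows; not taken from AggData.csv.
sample = pd.DataFrame({
    "BldYear": ["1975", "0", "unknown"],
    "Atemp": ["120.5", "n/a", "300"],
})

# Unparseable strings become NaN instead of raising an error.
sample["BldYear"] = pd.to_numeric(sample["BldYear"], errors="coerce")
sample["Atemp"] = pd.to_numeric(sample["Atemp"], errors="coerce")

# A year of 0 is counted as missing alongside NaN.
missing_year = int(sample["BldYear"].isna().sum() + (sample["BldYear"] == 0).sum())
missing_area = int(sample["Atemp"].isna().sum())

# Keep only rows with a valid year and a heated area.
clean = sample[sample["BldYear"].notna() & (sample["BldYear"] != 0)]
clean = clean.dropna(subset=["Atemp"])

print(missing_year, missing_area, len(clean))
```

Because the missing counts are computed per column, a building with several missing fields is counted once in each, so the counts need not add up to the number of rows dropped.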
## Construction Year Distribution
```{python}
plt.figure()
plt.hist(df["BldYear"], bins=20)
plt.xlabel("Construction Year")
plt.ylabel("Number of Buildings")
plt.title("Distribution of Building Construction Years")
plt.show()
```
## Energy Performance vs Construction Year
```{python}
plt.figure()
plt.scatter(df["BldYear"], df["PrimEnergy"])
plt.xlabel("Construction Year")
plt.ylabel("Primary Energy (kWh/m².y)")
plt.title("Energy Performance vs Construction Year")
plt.show()
```
## Heated Area Distribution
```{python}
plt.figure()
plt.hist(df["Atemp"], bins=20)
plt.xlabel("Heated Area (m²)")
plt.ylabel("Number of Buildings")
plt.title("Distribution of Heated Area")
plt.show()
```
## Building Floors Distribution
```{python}
floor_counts = df["nbFloor"].value_counts().sort_index()
plt.figure()
plt.bar(floor_counts.index, floor_counts.values)
plt.xlabel("Number of Floors")
plt.ylabel("Number of Buildings")
plt.title("Building Height Distribution")
plt.show()
```
## Construction Period Analysis
```{python}
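# Right-closed intervals: (-inf, 1959], (1959, 1975], (1975, 1990], (1990, 2005], (2005, inf)
# so that 1960 falls in "1960–75" and buildings older than 1900 are not dropped
bins = [float("-inf"), 1959, 1975, 1990, 2005, float("inf")]
labels = ["<1960","1960–75","1976–90","1991–2005","2006+"]
df["period"] = pd.cut(df["BldYear"], bins=bins, labels=labels)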
period_counts = df["period"].value_counts().sort_index()
plt.figure()
plt.bar(period_counts.index, period_counts.values)
plt.xlabel("Construction Period")
plt.ylabel("Number of Buildings")
plt.title("Building Stock by Construction Period")
plt.show()
```
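`pd.cut` uses right-closed intervals by default, so a year equal to a bin edge lands in the bin that ends at that edge, and any year outside the outermost edges becomes `NaN`. The sketch below illustrates this with made-up years and hypothetical edges and labels; it is an illustration of the binning behaviour, not the report's own configuration.

```python
import pandas as pd

# Made-up construction years chosen to sit on or outside the bin edges.
years = pd.Series([1890, 1959, 1960, 1961, 1975])

# Default right=True gives the intervals (1900, 1960] and (1960, 1975].
default_cut = pd.cut(years, bins=[1900, 1960, 1975], labels=["<=1960", "1961-75"])

# Open-ended lower edge keeps very old buildings; using 1959 as an edge
# puts the year 1960 into the second bin.
open_cut = pd.cut(years, bins=[float("-inf"), 1959, 1975], labels=["<1960", "1960-75"])

print(default_cut.tolist())
print(open_cut.tolist())
```

With the first set of edges the 1890 building is assigned `NaN` and would drop out of a `value_counts()` bar chart without any warning, so it is worth comparing the sum of the period counts with `len(df)`.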
## Energy Declaration by Building Age
```{python}
plt.figure()
# Skip rows without an energy class so sorted() does not mix strings with NaN
for ec in sorted(df["EnergyClass"].dropna().unique()):
    subset = df[df["EnergyClass"] == ec]
    plt.scatter(subset["BldYear"], subset["PrimEnergy"], label=ec)
plt.legend(title="Energy Class")
plt.xlabel("Construction Year")
plt.ylabel("Primary Energy (kWh/m².y)")
plt.title("Energy Performance by Building Age")
plt.show()
```