Commit 4d35f23

[Example] REDD example and update doc
1 parent e0f0245 commit 4d35f23

6 files changed: 563 additions & 0 deletions

docs/zh/examples/REDD.md

Lines changed: 157 additions & 0 deletions
# Classification Model for Enterprise Carbon Emission Levels

## Background

This project builds an enterprise carbon-emission-level classification model on PaddlePaddle. It fuses remote-sensing satellite observations, meteorological information, and enterprise geographic attributes; uses a weighted FocalLoss to counter class imbalance; and adds Dropout and a multilayer-perceptron structure to improve expressiveness and generalization. The model predicts an enterprise's CO₂ emission level under different spatial, temporal, and environmental conditions, visualizes multiple training metrics, and suits typical applications such as carbon-quota management, carbon-peaking roadmap analysis, and emission-structure governance. After training, classification accuracy exceeds 75%.

This project has the following features:

- Multi-source modeling that fuses satellite retrievals, ground emission data, and wind-field features;
- Stratified sampling to keep the training-set distribution balanced and improve generalization;
- Weighted FocalLoss to strengthen learning of the Mid-High and High classes;
- Training-stability monitoring via metrics such as the gradient norm;
- Multi-angle visualization of classification performance and result export.

## How the Model Works

Using fused multi-source data, the model builds an intelligent classifier for an enterprise's CO₂ emission level per unit time, supporting precise supervision and emission assessment.

---

## 🌐 Theoretical Basis

### ✅ Multi-factor hypothesis

Enterprise carbon emissions are driven by multiple factors, including:

- Spatial location (latitude, longitude)
- Meteorological conditions (wind direction, wind speed)
- Temporal factors (hour, month)
- Satellite remote-sensing observations (xco₂)

### ✅ Level-based modeling

Continuous CO₂ emission values are binned into intervals and mapped to four levels, turning the task into a multi-class classification problem.

### ✅ Class-imbalance handling

FocalLoss with per-class weights improves recognition of Mid-High and High samples and suppresses interference from the dominant classes.
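
To see how focal loss down-weights easy examples, here is a minimal NumPy sketch of the standard formulation (illustrative only; the project's actual implementation is the Paddle `FocalLoss` class in the full script below):

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, class_weights=None):
    """Mean focal loss given per-class probabilities (N, C) and integer targets (N,)."""
    pt = probs[np.arange(len(targets)), targets]   # probability assigned to the true class
    logpt = -np.log(pt)                            # per-sample cross-entropy
    w = 1.0 if class_weights is None else np.asarray(class_weights)[targets]
    return float(np.mean(w * (1.0 - pt) ** gamma * logpt))

# An easy (confident, correct) sample is down-weighted far more than a hard one:
easy = focal_loss(np.array([[0.9, 0.05, 0.03, 0.02]]), np.array([0]))
hard = focal_loss(np.array([[0.3, 0.4, 0.2, 0.1]]), np.array([0]))
print(easy, hard)
```

With `gamma=0` and uniform weights this reduces to ordinary cross-entropy; larger `gamma` shifts the loss toward hard, misclassified samples.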

### ✅ Spatio-temporal features

The enterprise's location and sampling time are included as inputs, with `hour` and `month` extracted to capture the spatio-temporal heterogeneity of emissions.
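
For instance, the time features can be derived with pandas (toy rows; the column name 匹配时间 follows the dataset used by the script):

```python
import pandas as pd

# Two hypothetical match-time values in the dataset's format
df = pd.DataFrame({"匹配时间": ["2024/1/1 3:00", "2024/7/15 18:00"]})
df["匹配时间"] = pd.to_datetime(df["匹配时间"])
df["hour"] = df["匹配时间"].dt.hour     # 0-23
df["month"] = df["匹配时间"].dt.month   # 1-12
print(df[["hour", "month"]].values.tolist())  # → [[3, 1], [18, 7]]
```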

---

## 📊 Fields and Features

| Field | Meaning | Type | Notes |
|------------------|--------------|--------|------------------------------------|
| 企业CO₂排放量 (kg) | Enterprise CO₂ emissions | Numeric | Basis for the model's four target classes |
| 卫星CO₂浓度 (xco2) | Satellite-observed CO₂ | Numeric | Regional background CO₂ concentration reference |
| 风向、风速 | Wind direction and speed | Numeric | Transport and dispersion factors |
| 纬度、经度 | Enterprise latitude and longitude | Numeric | Regional spatial features |
| 企业省份 | Enterprise province | Categorical | One-hot encoded |
| 匹配时间 | Data-collection (match) time | Temporal | hour and month extracted as time features |

---

## 🏷 Label Classification Rules

| CO₂ emission range (kg) | Level label |
|------------------|--------|
| < 1500 | Low (0) |
| 1500 – 7800 | Mid-Low (1) |
| 7800 – 40000 | Mid-High (2) |
| ≥ 40000 | High (3) |
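
In code, the binning is a chain of threshold comparisons; this is the `classify_emission` function used by the full script:

```python
def classify_emission(value):
    """Map a CO₂ emission value (kg) to a level: 0=Low, 1=Mid-Low, 2=Mid-High, 3=High."""
    if value < 1500:
        return 0
    elif value < 7800:
        return 1
    elif value < 40000:
        return 2
    else:
        return 3

print([classify_emission(v) for v in (800, 1500, 7800, 40000)])  # → [0, 1, 2, 3]
```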

---

## Building the Model

The model uses a classic multilayer perceptron (MLP), well suited to classification on low-dimensional, fused features. The construction steps are as follows:

### 🔍 Feature Preprocessing

- **Numeric features**: standardized with `StandardScaler`
- **Categorical features**: encoded with `OneHotEncoder`
- **Integration**: a `ColumnTransformer` pipeline unifies the preprocessing
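
A minimal sketch of this pipeline on toy data (the column names here are made up; the script applies the same pattern to the dataset's real numeric and province columns):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame: two numeric features plus one categorical (province-like) feature
df = pd.DataFrame({
    "wind_speed": [0.3, 1.4, 0.9],
    "xco2": [419.2, 420.1, 418.7],
    "province": ["A", "B", "A"],
})
ct = ColumnTransformer([
    ("num", StandardScaler(), ["wind_speed", "xco2"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["province"]),
])
X = ct.fit_transform(df)
print(X.shape)  # 2 scaled numeric columns + 2 one-hot columns
```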

### 🏗 Model Architecture (MLP)

```text
Input features → Linear(256) → ReLU → Dropout(0.3)
              → Linear(128) → ReLU → Dropout(0.2)
              → Linear(64)  → ReLU
              → Linear(32)  → ReLU
              → Linear(4)   → Softmax → class probabilities
```
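
A quick sanity check on model size: each `Linear(d_in, d_out)` layer holds `d_in * d_out` weights plus `d_out` biases. Pure-Python sketch; the input dimension 20 is only an assumed example, since the real one depends on the one-hot encoding:

```python
def mlp_param_count(dims):
    """Total weights + biases for a stack of Linear layers with the given sizes."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))

# Input dim is hypothetical; the hidden sizes match the diagram above.
print(mlp_param_count([20, 256, 128, 64, 32, 4]))  # → 48740
```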

## Training and Evaluation

### Data-Splitting Strategy

- **Method**: `StratifiedShuffleSplit`
- **Goal**: keep per-class proportions stable and avoid a skewed training set
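
A small illustration with synthetic labels, showing that the split preserves class proportions:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

X = np.arange(40).reshape(-1, 1)
y = np.array([0] * 20 + [1] * 12 + [2] * 8)          # imbalanced classes

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(sss.split(X, y))
print(np.bincount(y[test_idx]))   # test-set counts mirror the 20:12:8 ratio
```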

### Training Settings

- Epochs: 500
- Batching: full-batch training (mini-batch training is a possible extension)
- Learning rate: 0.01 (Adam optimizer)
- Validation frequency: every 20 epochs

### Key Monitoring Metrics

- 📈 **Loss curve**: trend of training loss over epochs
- **Accuracy curve**: classification accuracy on the validation set
- 📊 **Confusion matrix**: assesses misclassification patterns
- 🔍 **Gradient norm**: tracks per-epoch gradient magnitude to monitor training stability
- 📉 **Per-class recall curves**: shows how well each level is learned

---

## 📌 Evaluation Metrics

| Metric | Meaning |
|-----------|--------------------------|
| Accuracy | Fraction of all samples classified correctly |
| Precision | Per class, the fraction of predicted positives that are correct |
| Recall | Per class, the fraction of actual samples that are identified |
| F1-score | Harmonic mean of Precision and Recall |
| Confusion matrix | Misclassification patterns between classes |
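
These are the same quantities scikit-learn's `classification_report` produces; a toy computation:

```python
from sklearn.metrics import accuracy_score, recall_score

y_true = [0, 0, 1, 1, 2, 3]   # actual levels
y_pred = [0, 1, 1, 1, 2, 2]   # predicted levels

acc = accuracy_score(y_true, y_pred)                                   # 4 of 6 correct
rec = recall_score(y_true, y_pred, average=None, labels=[0, 1, 2, 3])  # per-class recall
print(acc, rec.tolist())
```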

---

## Visualizing Results

Training produces the following charts:

- 📉 Loss and accuracy curves
![Loss](https://www.craes-air.cn/official/REDD_Loss.png)

- 📊 Per-class recall bar chart
![Recall](https://www.craes-air.cn/official/REDD_Recall.png)

- Confusion-matrix heatmap
![Confusion matrix heatmap](https://www.craes-air.cn/official/REDD_Confusion.png)

## Sample Results

The exported results take the following form (headers match the script's CSV output):

```csv
Enterprise Name,Actual Class,Predicted Class
企业A,2,2
企业B,3,2
企业C,1,1
```

## Full Code

Make sure the data file is in the current directory, then run:

``` py linenums="1" title="examples/REDD/REDD.py"
--8<--
examples/REDD/REDD.py
--8<--
```

## References

- https://github.com/PaddlePaddle/PaddleSlim
- https://scikit-learn.org/stable/modules/classes.html
Lines changed: 8 additions & 0 deletions

stationIdC,year,mon,day,hour,prs,winDAvg2mi,winSAvg2mi,tem,rhu,pre3h,date,city,station
53392,2024,1,1,0,857.8,335,0.3,-23.7,78,0,2024/1/1 0:00,蠟소왯,영괏
53392,2024,1,1,3,857.6,175,0.6,-12.1,85,0,2024/1/1 3:00,蠟소왯,영괏
53392,2024,1,1,6,855.9,217,1.4,-8.8,67,0,2024/1/1 6:00,蠟소왯,영괏
53392,2024,1,1,9,856.1,276,0.6,-12.3,72,0,2024/1/1 9:00,蠟소왯,영괏
53392,2024,1,1,12,856.4,147,1.4,-17.8,85,0,2024/1/1 12:00,蠟소왯,영괏
53392,2024,1,1,15,856.2,256,0.9,-17.4,84,0,2024/1/1 15:00,蠟소왯,영괏
53392,2024,1,1,18,856.1,245,1.3,-18.4,86,0,2024/1/1 18:00,蠟소왯,영괏

examples/REDD/20240101_data.xlsx

10 KB
Binary file not shown.

examples/REDD/Fusion_Data.xlsx

74.2 KB
Binary file not shown.

examples/REDD/REDD.py

Lines changed: 201 additions & 0 deletions
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler

matplotlib.rcParams["font.family"] = "SimHei"
matplotlib.rcParams["axes.unicode_minus"] = False

# ==== Parameter Configuration (Customizable) ====
EPOCHS = 500
LR = 0.01
CLASS_WEIGHTS = [3.5, 3.5, 2.0, 2.5]  # Class order: Low, Mid-Low, Mid-High, High


# ==== Classification Label Function ====
def classify_emission(value):
    if value < 1500:
        return 0
    elif value < 7800:
        return 1
    elif value < 40000:
        return 2
    else:
        return 3


# ==== FocalLoss Definition ====
class FocalLoss(nn.Layer):
    def __init__(self, gamma=2, weight=None):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.weight = weight

    def forward(self, input, target):
        logpt = F.cross_entropy(input, target, weight=self.weight, reduction="none")
        pt = paddle.exp(-logpt)
        loss = ((1 - pt) ** self.gamma) * logpt
        return loss.mean()


# ==== Load and Clean Data ====
df = pd.read_excel("./Fusion_Data.xlsx")
df = df.dropna(
    subset=[
        "企业CO₂排放量 (kg)",
        "匹配时间",
        "企业省份",
        "卫星中心纬度",
        "卫星中心经度",
        "卫星CO₂浓度 (xco2)",
        "风向",
        "风速",
    ]
)
df["匹配时间"] = pd.to_datetime(df["匹配时间"])
df["hour"] = df["匹配时间"].dt.hour
df["month"] = df["匹配时间"].dt.month

# ==== Feature Processing ====
numeric_features = ["卫星中心纬度", "卫星中心经度", "卫星CO₂浓度 (xco2)", "风向", "风速", "hour", "month"]
categorical_features = ["企业省份"]
X_raw = df[numeric_features + categorical_features]
y_raw = df["企业CO₂排放量 (kg)"].values.reshape(-1, 1)
labels = np.vectorize(classify_emission)(y_raw.flatten())
enterprise_names = df["企业名称"].values

ct = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_features),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
    ]
)
X = ct.fit_transform(X_raw)

# ==== Stratified Sampling ====
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_idx, test_idx in sss.split(X, labels):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = labels[train_idx], labels[test_idx]
    name_train, name_test = enterprise_names[train_idx], enterprise_names[test_idx]

X_train = paddle.to_tensor(X_train, dtype="float32")
y_train = paddle.to_tensor(y_train, dtype="int64")
X_test = paddle.to_tensor(X_test, dtype="float32")
y_test = paddle.to_tensor(y_test, dtype="int64")


# ==== Network Architecture ====
class EmissionClassifier(nn.Layer):
    def __init__(self, input_dim):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, 64),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 4))

    def forward(self, x):
        x = self.shared(x)
        return self.classifier(x)


# ==== Model Training ====
model = EmissionClassifier(input_dim=X.shape[1])
optimizer = paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=LR)
loss_fn = FocalLoss(gamma=2, weight=paddle.to_tensor(CLASS_WEIGHTS, dtype="float32"))

train_loss_record = []
val_acc_record = []

for epoch in range(EPOCHS):
    model.train()
    logits = model(X_train)
    loss = loss_fn(logits, y_train)
    loss.backward()
    optimizer.step()
    optimizer.clear_grad()
    train_loss_record.append(float(loss))  # store a Python float so plotting/formatting works

    if (epoch + 1) % 20 == 0:
        model.eval()
        with paddle.no_grad():
            val_logits = model(X_test)
            preds = paddle.argmax(val_logits, axis=1)
            acc = accuracy_score(y_test.numpy(), preds.numpy())
            val_acc_record.append(acc)
        print(f"[Epoch {epoch+1}] loss={float(loss):.4f}, acc={acc:.4f}")

# ==== Model Evaluation ====
model.eval()
X_all_tensor = paddle.to_tensor(X, dtype="float32")
with paddle.no_grad():
    preds = paddle.argmax(model(X_all_tensor), axis=1).numpy()

print("\n🎯 Overall Accuracy: {:.2f}%".format(accuracy_score(labels, preds) * 100))
print("\n📊 Classification Report:")
report = classification_report(
    labels, preds, target_names=["Low", "Mid-Low", "Mid-High", "High"], output_dict=True
)
print(
    classification_report(
        labels, preds, target_names=["Low", "Mid-Low", "Mid-High", "High"]
    )
)

# ==== 📈 Training Loss Curve ====
plt.figure()
plt.plot(train_loss_record, label="Training Loss")
plt.title("Training Loss Curve")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()

# ==== 📊 Recall per Class Bar Chart ====
plt.figure()
target_names = ["Low", "Mid-Low", "Mid-High", "High"]
recalls = [report[name]["recall"] for name in target_names]
plt.bar(target_names, recalls)
plt.title("Recall per Class")
plt.ylabel("Recall")
plt.ylim(0, 1)
plt.grid(axis="y")
plt.tight_layout()
plt.show()

# ==== Confusion Matrix ====
cm = confusion_matrix(labels, preds)
ConfusionMatrixDisplay(
    confusion_matrix=cm, display_labels=["Low", "Mid-Low", "Mid-High", "High"]
).plot(cmap="Blues")
plt.title("Predicted vs Actual Class")
plt.tight_layout()
plt.show()

# ==== Export Results ====
pd.DataFrame(
    {
        "Enterprise Name": enterprise_names,
        "Actual Class": labels,
        "Predicted Class": preds,
    }
).to_csv("carbon_emission_prediction_results.csv", index=False)
print("✅ Results exported to: carbon_emission_prediction_results.csv")
