Lightweight Face Detection and Recognition System
SCRFD-500M + MobileFaceNet/ArcFace | PyTorch Implementation
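| Module | Model | Input size | Parameters | Notes |
|---|---|---|---|---|
| Detection | SCRFD-500M | 640×640 | ~0.57M | Anchor-free single-stage detector with multi-scale training support |
| Recognition | MobileFaceNet | 112×112 | ~1.0M | Mobile backbone; supports ArcFace and knowledge distillation |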
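- 🚀 Pure PyTorch implementation with no heavyweight framework dependencies
- 🎯 Anchor-free detection head (QFL + DFL + GIoU) with compact, readable code
- 🧠 Knowledge distillation: InsightFace buffalo_l as the teacher model, with cosine-similarity and relational distillation losses
- ⚡ Mixed-precision training (AMP) + cosine warmup + early stopping
- 📦 ONNX export for onnxruntime / TensorRT inference
- 📊 Complete training logs and experiment-record templates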
| Experiment | Model | AP50 | Epochs | Input size | Augmentation |
|---|---|---|---|---|---|
| DET_BASE_001 | SCRFD-500M | 0.5458 | 165 (early stop) | 640 | Multi-scale + HSV + Blur |
| Experiment | Model | Acc | TAR@FAR=1e-3 | TAR@FAR=1e-4 | EER | Epochs |
|---|---|---|---|---|---|---|
| REC_BASE_001 | MobileFaceNet + ArcFace | 0.7502 | — | — | — | 45 (early stop) |
| REC_DISTILL_001 | MobileFaceNet + ArcFace + Distill | 0.9561 | 0.0878 | 0.0416 | 0.2646 | 26 (early stop) |
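⚠️ Note: the numbers above are baseline results. The detector's AP50 is low mainly because of the limited amount of training data. The distilled recognition model reaches 95.6% on LFW, which is still below production-grade SOTA (~99.8%), so it is best suited to research comparisons and to exploring lightweight deployment.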
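For readers unfamiliar with the verification metrics in the table, the sketch below shows one common way to compute TAR@FAR and EER from pair similarity scores. It is an illustration only, not the code in tools/eval/; the function names `tar_at_far` and `equal_error_rate` are hypothetical, and the repository's exact thresholding rule may differ.

```python
def tar_at_far(scores, labels, far_target):
    """Return (TAR, threshold) at the loosest threshold with FAR <= far_target.

    scores: similarity per pair (higher means more likely the same identity).
    labels: 1 for same-identity pairs, 0 for different-identity pairs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    # At most k impostor pairs may score above the threshold.
    k = int(far_target * len(neg))
    threshold = neg[k] if k < len(neg) else float("-inf")
    # A pair is accepted when its score is strictly above the threshold.
    tar = sum(s > threshold for s in pos) / len(pos)
    return tar, threshold


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping every observed score as a threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s in neg) / len(neg)  # impostors accepted
        frr = sum(s < t for s in pos) / len(pos)   # genuine pairs rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

Because TAR@FAR fixes the false-accept budget first, it can stay low even when plain accuracy at the best threshold looks high, which is why the two columns are reported separately.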
📈 Training curves
Training logs are stored under runs/train/. The following scripts generate the curve plots:
# Detection training curves
python tools/plot_detection_training_curves.py \
--log-dir runs/train/detection/DET_BASE_001 \
--out-dir runs/figures/
# Recognition training curves
python tools/plot_recognition_training_curves.py \
--log-dir runs/train/recognition/REC_DISTILL_001 \
--out-dir runs/figures/face-light/
├── models/                          # Model definitions
│   ├── scrfd.py                     # SCRFD-500M detector (backbone + FPN + head)
│   ├── mobilefacenet.py             # MobileFaceNet recognition backbone
│   ├── losses.py                    # QFL / DFL / GIoU / ArcFace loss functions
│   └── datasets.py                  # Detection + recognition dataset loaders
├── configs/
│   └── train/                       # Training configs
│       ├── det_scrfd_base.yaml
│       ├── rec_mobilefacenet_arcface_base.yaml
│       └── rec_mobilefacenet_arcface_distill.yaml
├── tools/
│   ├── data_prep/                   # Data preparation (WIDER Face / custom datasets → YOLO format)
│   ├── data_qc/                     # Data quality checks (corrupt images / class imbalance / out-of-bounds annotations)
│   ├── eval/                        # Evaluation tools (AP / TAR@FAR / EER / threshold sweep)
│   ├── deploy/                      # Deployment tools (ONNX export pipeline / real-time edge inference)
│   ├── report/                      # Report generation (training curves / comparison plots)
│   └── plot_*.py                    # Training-curve visualization
├── runs/
│   └── train/                       # Training logs (training_log.json + experiment records)
├── train_detector.py                # Detection training entry point
├── train_recognizer.py              # Recognition training entry point (baseline)
├── train_recognizer_distill.py      # Recognition training entry point (knowledge distillation)
├── export_onnx.py                   # ONNX export
└── requirements.txt
- Python 3.10+
- PyTorch 2.0+ (CUDA recommended)
- 8GB+ GPU memory for training, 4GB+ for inference
git clone https://github.com/lechan775/face-light.git
cd face-light
pip install -r requirements.txt

Detection data (YOLO format)
data/processed/detection/
├── train/
│   ├── images/   # *.jpg
│   └── labels/   # *.txt (one box per line: class cx cy w h, normalized)
└── val/
    ├── images/
    └── labels/
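As a quick illustration of the label format above, the following sketch converts one normalized `class cx cy w h` line into pixel corner coordinates. The helper name `yolo_line_to_xyxy` is hypothetical and is not part of the repository's loaders.

```python
def yolo_line_to_xyxy(line, img_w, img_h):
    """Convert 'class cx cy w h' (normalized to [0, 1]) into pixel corners.

    Returns (class_id, x1, y1, x2, y2).
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:])
    # Centre/size are fractions of the image; scale to pixels.
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return class_id, x1, y1, x2, y2
```

A check like `0 <= x1 < x2 <= img_w` on the result is one simple way to spot out-of-bounds annotations before training.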
Convert from WIDER Face:
python tools/data_prep/convert_widerface.py \
--wider-root /path/to/WIDERFace \
--out-dir data/processed/detection

Recognition data (one folder per identity)
data/processed/recognition/
└── train/
    ├── id_000001/
    │   ├── 000.jpg
    │   └── 001.jpg
    ├── id_000002/
    │   └── ...
    └── ...
LFW validation set (pairs.txt protocol):
data/processed/recognition/eval/
├── lfw/                  # Aligned LFW face images
└── protocols/
    └── lfw/
        └── pairs.txt     # 3000 positive + 3000 negative pairs
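The sketch below shows how a standard LFW pairs.txt could be turned into labeled image pairs. It assumes the usual LFW conventions (three fields for a same-identity pair, four for a different-identity pair, images named `Name/Name_0001.jpg`); the directory layout of the aligned images in this repository may differ, and `parse_lfw_pairs` is a hypothetical helper.

```python
def _image_path(name, index):
    # Standard LFW naming: <Name>/<Name>_<4-digit index>.jpg (assumed here).
    return f"{name}/{name}_{int(index):04d}.jpg"


def parse_lfw_pairs(lines):
    """Parse pairs.txt lines into (path_a, path_b, is_same) tuples."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            # Same identity: name, image index 1, image index 2.
            name, i, j = fields
            pairs.append((_image_path(name, i), _image_path(name, j), 1))
        elif len(fields) == 4:
            # Different identities: name A, index, name B, index.
            name_a, i, name_b, j = fields
            pairs.append((_image_path(name_a, i), _image_path(name_b, j), 0))
        # Anything else (the fold header, blank lines) is skipped.
    return pairs
```

Reading the file with `open(path).read().splitlines()` and passing the result to `parse_lfw_pairs` yields paths relative to the LFW image root.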
python train_detector.py \
--config configs/train/det_scrfd_base.yaml \
--seed 42 \
--exp-id DET_MY_EXP \
--log-dir runs/train/detection/DET_MY_EXP

Key hyperparameters (configs/train/det_scrfd_base.yaml):
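| Parameter | Default | Description |
|---|---|---|
| batch_size | 128 | Per-GPU batch size |
| epochs | 200 | Total number of training epochs |
| lr | 0.01 | Initial SGD learning rate |
| warmup_epochs | 5 | Learning-rate warmup epochs |
| multi_scale_train | [512,544,576,608,640,672] | Input sizes used for multi-scale training |
| early_stop_patience | 20 | Early-stopping patience |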
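To make the interaction of `lr`, `warmup_epochs`, and `epochs` concrete, here is a minimal per-epoch sketch of a linear warmup followed by cosine decay. It is an assumption about the general shape of the schedule, not the repository's scheduler: the real implementation may step per iteration or warm up from a different starting value, and `min_lr` is a hypothetical parameter.

```python
import math


def lr_at_epoch(epoch, base_lr=0.01, warmup_epochs=5, total_epochs=200, min_lr=0.0):
    """Learning rate for a 0-indexed epoch: linear warmup, then cosine decay."""
    if epoch < warmup_epochs:
        # Ramp linearly so the last warmup epoch reaches base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warmup schedule that has elapsed, clamped to [0, 1].
    span = max(1, total_epochs - warmup_epochs)
    progress = min(1.0, (epoch - warmup_epochs) / span)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With the defaults from the table, the rate rises over the first 5 epochs, then decays smoothly toward `min_lr` by epoch 200; early stopping may end training before the decay completes.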
python train_recognizer.py \
--config configs/train/rec_mobilefacenet_arcface_base.yaml \
--seed 42 \
--exp-id REC_MY_EXP \
--log-dir runs/train/recognition/REC_MY_EXP

# Download the InsightFace buffalo_l teacher model first.
# Default path: ~/.insightface/models/buffalo_l/w600k_r50.onnx
python train_recognizer_distill.py \
--config configs/train/rec_mobilefacenet_arcface_distill.yaml \
--seed 42 \
--exp-id REC_DISTILL_MY_EXP \
--log-dir runs/train/recognition/REC_DISTILL_MY_EXP

Distillation loss components:
L_total = 1.0 * L_cls(ArcFace) + 18.0 * L_feat(cosine) + 32.0 * L_rel(smooth_l1)
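The weighted sum above can be sketched in plain Python as follows. This is a framework-free illustration under stated assumptions, not the training code: it assumes L_feat is one minus the cosine similarity between student and teacher embeddings, and L_rel is a smooth L1 penalty between the student's and the teacher's pairwise similarity matrices within a batch. The function name `distill_loss` and the `beta` parameter are hypothetical.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _smooth_l1(diff, beta=1.0):
    d = abs(diff)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta


def distill_loss(l_cls, student, teacher, w_cls=1.0, w_feat=18.0, w_rel=32.0):
    """Combine classification, feature, and relational terms.

    l_cls: ArcFace classification loss already computed for the batch.
    student, teacher: lists of embedding vectors, one per sample.
    Returns (total, l_feat, l_rel).
    """
    s = [_normalize(v) for v in student]
    t = [_normalize(v) for v in teacher]
    n = len(s)
    # Feature term: pull each student embedding toward its teacher embedding.
    l_feat = sum(1.0 - _dot(a, b) for a, b in zip(s, t)) / n
    # Relational term: match pairwise similarities inside the batch.
    l_rel = sum(
        _smooth_l1(_dot(s[i], s[j]) - _dot(t[i], t[j]))
        for i in range(n)
        for j in range(n)
    ) / (n * n)
    total = w_cls * l_cls + w_feat * l_feat + w_rel * l_rel
    return total, l_feat, l_rel
```

The large weights on the feature and relational terms compensate for their small magnitude relative to the classification loss.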
python train_recognizer.py \
--config configs/train/rec_mobilefacenet_arcface_base.yaml \
--exp-id REC_RESUME --log-dir runs/train/recognition/REC_RESUME \
--resume runs/train/recognition/REC_PREV/last.pt

# Detector evaluation (AP50)
python tools/eval/eval_detector.py \
--model-path runs/train/detection/DET_BASE_001/best.pt \
--backend pytorch \
--dataset-root data/processed/detection \
--split val --anno-format yolo \
--out-dir runs/eval/detection/DET_EVAL
# Recognizer evaluation (LFW accuracy / TAR@FAR / EER)
python tools/eval/eval_recognizer.py \
--model-path runs/train/recognition/REC_DISTILL_001/best.pt \
--embed-dim 512 \
--pairs-file data/processed/recognition/eval/protocols/lfw/pairs.txt \
--lfw-dir data/processed/recognition/eval/lfw \
--out-dir runs/eval/recognition/REC_EVAL

# Export the detector
python export_onnx.py \
--task detection \
--weights runs/train/detection/DET_BASE_001/best.pt \
--imgsz 640 640 \
--opset 13 \
--out models/export/detector.onnx
# Export the recognizer
python export_onnx.py \
--task recognition \
--weights runs/train/recognition/REC_DISTILL_001/best.pt \
--imgsz 112 112 \
--opset 13 \
--out models/export/recognizer.onnx

import onnxruntime as ort
import numpy as np
# Detection
det_sess = ort.InferenceSession("detector.onnx")
cls, reg, ltrb = det_sess.run(None, {"images": img_np})
# Recognition
rec_sess = ort.InferenceSession("recognizer.onnx")
emb = rec_sess.run(["embedding"], {"images": face_np})[0] # (1, 512)

python tools/deploy/realtime_edge_runtime.py \
--det-engine runs/train/detection/DET_BASE_001/best.pt \
--rec-engine runs/train/recognition/REC_DISTILL_001/best.pt \
--source 0 \
--run-seconds 30

Input (3×640×640)
│
▼
Stem: ConvBNReLU(3→16, s=2) → 320×320
│
▼
Backbone (4-stage depthwise separable)
Stage1: DSConv(16→32, s=2) + IRB×1 → 160×160 (C2)
Stage2: DSConv(32→64, s=2) + IRB×2 → 80×80 (C3) ◄── FPN P3
Stage3: DSConv(64→128, s=2) + IRB×2 → 40×40 (C4) ◄── FPN P4
Stage4: DSConv(128→128, s=2) + IRB×1 → 20×20 (C5) ◄── FPN P5
│
▼
FPN (top-down + lateral, out=64ch)
P5 → P4 → P3
│
▼
Shared Detection Head (per level)
├── cls: DSConv×2 → Conv(64→1) → QFL loss
└── reg: DSConv×2 → Conv(64→36) → DFL decode → GIoU loss
(4 × (reg_max+1) where reg_max=8)
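The 36-channel regression output in the diagram above is 4 sides × (reg_max + 1) = 9 bins. A minimal sketch of DFL decoding for a single location follows: each side's 9 logits go through a softmax, and the expected bin index, scaled by the level stride, gives the distance to that box edge. The side order (left, top, right, bottom), the side-major channel layout, and the stride scaling are assumptions; `dfl_decode` and `ltrb_to_xyxy` are hypothetical names.

```python
import math


def dfl_decode(logits, reg_max=8, stride=8):
    """Decode 4 * (reg_max + 1) logits into (l, t, r, b) distances in pixels."""
    bins = reg_max + 1
    if len(logits) != 4 * bins:
        raise ValueError(f"expected {4 * bins} logits, got {len(logits)}")
    distances = []
    for side in range(4):
        chunk = logits[side * bins:(side + 1) * bins]
        # Numerically stable softmax over the bins of this side.
        peak = max(chunk)
        exps = [math.exp(v - peak) for v in chunk]
        total = sum(exps)
        # Expected bin index under the predicted distribution.
        expected = sum(i * e / total for i, e in enumerate(exps))
        distances.append(expected * stride)
    return tuple(distances)


def ltrb_to_xyxy(cx, cy, ltrb):
    """Turn distances from an anchor point into box corners."""
    left, top, right, bottom = ltrb
    return (cx - left, cy - top, cx + right, cy + bottom)
```

With reg_max = 8, each side can reach at most 8 × stride pixels from its anchor point, so larger faces are covered by the coarser P4 and P5 levels.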
Input (3×112×112)
│
▼
ConvBN(3→64, s=2) → 56×56
DWConv(64→64, s=1) → 56×56
│
▼
Bottleneck Stages
Stage1: IRB(64→64, s=2) + IRB×3 → 28×28
Stage2: IRB(64→128, s=2) + IRB×5 → 14×14
Stage3: IRB(128→128, s=2) + IRB×1 → 7×7
│
▼
ConvBN(128→512, 1×1) → 7×7
GDConv(512→512, k=7, groups=512) → 1×1
│
▼
Flatten → BatchNorm1d → L2-Normalize → 512-d embedding
│
▼
ArcFace Head
cos(θ+m)×s → CrossEntropy Loss
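The ArcFace step at the end of the diagram can be illustrated for a single sample as below. The sketch adds the angular margin m to the target class angle only, scales all logits by s, and applies softmax cross-entropy. The values m = 0.5 and s = 64 are the defaults from the ArcFace paper and are assumptions here, since this section does not state the repository's settings; the function names are hypothetical.

```python
import math


def arcface_logits(cosines, target, margin=0.5, scale=64.0):
    """Apply cos(theta + m) * s to the target class and cos(theta) * s elsewhere.

    cosines: cosine similarity between the embedding and each class centre.
    """
    logits = []
    for k, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))
        if k == target:
            if c > math.cos(math.pi - margin):
                c = math.cos(math.acos(c) + margin)
            else:
                # theta + m would pass pi; use the common linear fallback
                # so the target logit stays monotonic in theta.
                c = c - math.sin(math.pi - margin) * margin
        logits.append(scale * c)
    return logits


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed stably."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_z - logits[target]
```

The margin lowers the target logit, so the network must produce a smaller angle to the correct class centre to reach the same loss, which tightens each identity's cluster.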
Each training run automatically writes a structured log (training_log.json):
{
"epoch": 25,
"lr": 0.007842,
"elapsed_s": 121.8,
"train": {
"loss": 28.45,
"loss_cls": 21.30,
"loss_feat": 0.95,
"loss_rel": 0.005,
"cls_top1": 0.023,
"cls_top5": 0.089
},
"val": {
"accuracy": 0.9561,
"tar_at_far_1e_3": 0.0878,
"tar_at_far_1e_4": 0.0416,
"eer": 0.2646
}
}

Consolidate the experiment record:
python tools/eval/freeze_baseline.py \
--task recognition \
--exp-id REC_DISTILL_001 \
--train-dir runs/train/recognition/REC_DISTILL_001 \
--eval-dir runs/eval/recognition/REC_DISTILL_001

- SCRFD: Sample and Computation Redistribution for Efficient Face Detection (Guo et al., 2021)
- MobileFaceNet: MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification (Chen et al., 2018)
- ArcFace: ArcFace: Additive Angular Margin Loss for Deep Face Recognition (Deng et al., 2019)
- DFL: Generalized Focal Loss (Li et al., 2020)
- InsightFace: InsightFace: 2D and 3D Face Analysis Project
MIT License. See the LICENSE file for details.
Built with ❤️ for the open-source face recognition community