Summary
We want a very minimal fusion layer that combines per-image predictions from our existing backbones (e.g., FasterViT, EfficientCNN) into a single, more accurate score. The emphasis is on fast, lightweight integration with our small models. We can add a compact meta learner (logistic regression, XGBoost, CatBoost, or similar) if it delivers a clear accuracy gain without excessive latency overhead. This is likely a longer-term integration challenge.
Minimal fusion design
- Inputs and outputs: accept per-model logits and probabilities for each face image; return one fused probability.
- Methods (incremental):
  - simple mean and weighted mean,
  - calibrated averaging,
  - stacking with tiny meta learners (logistic regression first, then XGBoost or CatBoost with shallow depth and few trees).
- Training data protocol: build out-of-fold (OOF) predictions on the training split to avoid leakage when fitting the meta learner.
- Evaluation: compare AUC against the best single model and simple averaging.
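The incremental methods above can be sketched as plain functions over a per-model score matrix. This is a minimal illustration, not the final API: it assumes probabilities/logits arrive as a NumPy array of shape `(n_samples, n_models)`, and the function names (`fuse_mean`, `fuse_weighted`, `fuse_calibrated`) are placeholders. Calibrated averaging is shown here as per-model temperature scaling of logits before the sigmoid, which is one common choice; Platt scaling would slot in the same way.

```python
import numpy as np

def fuse_mean(probs):
    """Simple mean of per-model probabilities; probs has shape (n_samples, n_models)."""
    return probs.mean(axis=1)

def fuse_weighted(probs, weights):
    """Weighted mean of per-model probabilities; weights are normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return probs @ w

def fuse_calibrated(logits, temperatures):
    """Temperature-scale each model's logits, apply sigmoid, then average.

    One form of calibrated averaging: each model j gets its own temperature
    T_j (fit on a held-out split), so overconfident models are softened
    before fusion.
    """
    t = np.asarray(temperatures, dtype=float)
    calibrated = 1.0 / (1.0 + np.exp(-logits / t))  # sigmoid of scaled logits
    return calibrated.mean(axis=1)
```

In practice the weights and temperatures would be fit on a validation split; the sketch treats them as given.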
Goals
- Create a small fusion API (inputs: per-model scores; output: fused score).
- Implement simple and calibrated averaging; add a logistic regression baseline.
- Add XGBoost or CatBoost stacking behind a flag (using OOF training data).
- Provide a minimal script to generate OOF predictions and train the meta model.
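The OOF protocol and the logistic regression baseline can be sketched together. This is a hedged sketch, not the final script: the sklearn-style estimators stand in for our actual backbones (which would instead dump per-fold probabilities to disk), and `make_oof` / `fit_meta` are hypothetical names. The key property is that each training row's stacked features come from models that never saw that row.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def make_oof(models, X, y, n_splits=5, seed=0):
    """Build out-of-fold probability features: one column per base model.

    Each fold's validation rows are predicted by copies of the models fit
    only on the remaining folds, so the meta learner is trained without
    leakage from the base models' own training data.
    """
    oof = np.zeros((len(y), len(models)))
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for tr_idx, va_idx in skf.split(X, y):
        for j, model in enumerate(models):
            m = clone(model)
            m.fit(X[tr_idx], y[tr_idx])
            oof[va_idx, j] = m.predict_proba(X[va_idx])[:, 1]
    return oof

def fit_meta(oof, y):
    """Logistic regression meta learner fit on the OOF feature matrix only."""
    return LogisticRegression(max_iter=1000).fit(oof, y)
```

Swapping the meta learner for a shallow XGBoost or CatBoost model would only change `fit_meta`; the OOF construction stays the same.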