Improving Calibration for Long-Tailed Recognition (CVPR2021)
PyTorch implementation of our ECCV 2022 paper "Rethinking Confidence Calibration for Failure Prediction"
[ICCV 2025 CVAMD] The official implementation of the paper "Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models".
[ACL 2025] Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?
[IEEE Trans. Med. Imaging] The official implementation of the paper "Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles".
Investigation of how noise perturbations impact neural network calibration and generalisation
Service to examine data processing pipelines (e.g., machine learning or deep learning pipelines) for uncertainty consistency (calibration), fairness, and other safety-relevant aspects.
[ACL 2026 Main] VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Model Reasoning
[MICCAI 2025] The official implementation of the paper "Exposing and Mitigating Calibration Biases and Demographic Unfairness in MLLM Few-Shot In-Context Learning for Medical Image Classification".
Python framework for high quality confidence estimation of deep neural networks, providing methods such as confidence calibration and ordinal ranking
Code for enhancing Conformal Prediction using Temperature Scaling. Explore more of our work at:
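Temperature scaling, referenced in the entry above, rescales a model's logits by a single scalar T (fitted on held-out data) before the softmax; T > 1 softens overconfident predictions, T < 1 sharpens them. A minimal illustrative sketch of the idea, not code from the listed repository:

```python
import numpy as np

def temperature_scale(logits, temperature):
    """Apply temperature scaling: softmax(logits / T).

    T > 1 flattens the distribution (lower confidence);
    T < 1 sharpens it. Illustrative sketch only.
    """
    scaled = logits / temperature
    # Numerically stable softmax: subtract the row-wise max first
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)

# Example: the same logits become less confident at T = 2
logits = np.array([[4.0, 1.0, 0.5]])
probs_t1 = temperature_scale(logits, 1.0)
probs_t2 = temperature_scale(logits, 2.0)
```

In practice T is fitted by minimizing negative log-likelihood on a validation set; since dividing by a positive scalar preserves the argmax, accuracy is unchanged while confidence is recalibrated.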
Adaptive movement rehabilitation with confidence calibration: a novel Movement Calibration Gap metric combining real-time pose estimation with metacognitive self-assessment.
Evaluate high school math reasoning in LLMs with baseline and Chain-of-Thought (CoT) prompts. Includes confidence calibration metrics, JSON output parsing, and reliability analysis.
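Expected calibration error (ECE) is the most common of the confidence calibration metrics mentioned above: predictions are binned by confidence, and the gaps between per-bin accuracy and mean confidence are averaged, weighted by bin size. A minimal sketch of the standard equal-width binned version (illustrative only, not code from the listed repository):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE.

    confidences: array of predicted confidences in (0, 1].
    correct: array of 0/1 indicators for whether each prediction was right.
    Returns the weighted mean |accuracy - confidence| over non-empty bins.
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()       # empirical accuracy in the bin
            conf = confidences[mask].mean()  # mean confidence in the bin
            ece += (mask.sum() / n) * abs(acc - conf)
    return ece
```

A perfectly calibrated model (e.g. 80% confidence, 80% accuracy) yields ECE near zero; a model that is always 100% confident but only half right yields ECE of 0.5.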
Multimodal deepfake detection with explainable AI, robustness validation, and calibrated trust scoring for real-world media.
FADE: AI that deliberately forgets like humans do, using memory degradation as intrinsic confidence signal. Reduces hallucinations, enables epistemic humility, solves stateful deployment. Conceptual proposal seeking implementation and validation.
The repository of our paper about confidence calibration on RAG.
A complete Streamlit app scaffold that lets you enter your Gemini API key in the sidebar, upload up to four MRI images, invoke Gemini's advanced image analysis (labels, objects, text), and view the raw JSON analytics directly in the app.
yuragi — LLM Confidence Fragility Analyzer. Perturbation-driven hallucination detection with workshop-grade real benchmarks (TruthfulQA n=412 ensemble AUC 0.73, TriviaQA n=200 confidence-inversion AUC 0.75).
🔍 Analyze the mathematical reasoning abilities of the Mistral-7B model using diverse prompting techniques on multi-step math problems.