diff --git a/_layouts/project.html b/_layouts/project.html index 9b18b0d..7d0ac8b 100644 --- a/_layouts/project.html +++ b/_layouts/project.html @@ -15,7 +15,7 @@
- {{ page.title }} + {{ page.title }}

{{ page.title }}

diff --git a/_projects/llm-validity.md b/_projects/llm-validity.md new file mode 100644 index 0000000..e92b5c1 --- /dev/null +++ b/_projects/llm-validity.md @@ -0,0 +1,32 @@ +--- +title: "LLM Validity via Enhanced Conformal Prediction: Conditional Guarantees, Score Learning, and Response-Level Calibration Under Dependent Claims" +type: Article +status: completed +date: 2026-02-07 +cover: /assets/img/proj/llm_validity.png +header_image: /assets/img/proj/llm_validity_cropped.png +authors: + - name: "Davide Beltrame" + avatar: "/assets/img/team/davide_beltrame.jpg" + link: "https://www.linkedin.com/in/davide-beltrame/" + desc: "MSc in Artificial Intelligence" +--- + +Finite-sample validity guarantees for large language model (LLM) outputs are attractive because they are post-hoc and model-agnostic, but they are fragile when prompts are heterogeneous and factuality signals are noisy. + +Cherian et al. (2024) propose enhanced conformal methods for *factuality filtering* that (i) replace marginal guarantees with function-class conditional guarantees and (ii) improve utility via level-adaptive calibration and conditional boosting that differentiates through conditional conformal cutoffs. + +This discussion paper reviews the conformal prediction and LLM factuality context, presents the selected paper and proposes a future direction: *response-level conformalization under dependent claims*. +The key idea is to treat an entire response as the exchangeable unit, use blocked calibration that keeps all claims from a response together, and calibrate response-level tail losses conditionally on prompt/response features to align guarantees with user-facing risk when claim errors are dependent. + +
+ Full Paper + Code (Conformal Safety) + Code (Conditional Conformal) +
+
+ +**Notes:** +- This work is a discussion paper of Cherian et al. (2024). +- The authors released a [filtered `MedLFQA` benchmark](https://github.com/jjcherian/conformal-safety) with non-health-related prompts removed, as well as the generated/parsed text and experiment notebooks. +- Their [conditional conformal inference Python package](https://github.com/jjcherian/conditional-conformal) now supports level-adaptive conformal prediction. diff --git a/assets/img/proj/llm_validity.png b/assets/img/proj/llm_validity.png new file mode 100644 index 0000000..a34f6a1 Binary files /dev/null and b/assets/img/proj/llm_validity.png differ diff --git a/assets/img/proj/llm_validity_cropped.png b/assets/img/proj/llm_validity_cropped.png new file mode 100644 index 0000000..2b39431 Binary files /dev/null and b/assets/img/proj/llm_validity_cropped.png differ diff --git a/assets/img/team/davide_beltrame.jpg b/assets/img/team/davide_beltrame.jpg index 0a18f49..57554a9 100644 Binary files a/assets/img/team/davide_beltrame.jpg and b/assets/img/team/davide_beltrame.jpg differ diff --git a/assets/reports/llm_validity_paper.pdf b/assets/reports/llm_validity_paper.pdf new file mode 100644 index 0000000..15a0057 Binary files /dev/null and b/assets/reports/llm_validity_paper.pdf differ