Commit 20aa55f
Feat/project llm validity (#46)
* feat(project): add llm validity project
* fix(propic): update Davide Beltrame's assets/img/team/ file
* feat(ui): add alternative (full) picture for project cover and support for distinct cover (all-projects page) and header image (single-project page)
* fixes

Co-authored-by: giacomo-ciro <giacomociro02@gmail.com>
1 parent 6d9e12c commit 20aa55f

File tree

6 files changed: +33 −1 lines


_layouts/project.html

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
     <article class="article">

       <div class="post-img">
-        <img src="{{ page.cover }}" alt="{{ page.title }}">
+        <img src="{{ page.header_image | default: page.cover }}" alt="{{ page.title }}">
       </div>

     <h2 id="project-title" class="title text-center">{{ page.title }}</h2>

_projects/llm-validity.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
---
title: "LLM Validity via Enhanced Conformal Prediction: Conditional Guarantees, Score Learning, and Response-Level Calibration Under Dependent Claims"
type: Article
status: completed
date: 2026-02-07
cover: /assets/img/proj/llm_validity.png
header_image: /assets/img/proj/llm_validity_cropped.png
authors:
  - name: "Davide Beltrame"
    avatar: "/assets/img/team/davide_beltrame.jpg"
    link: "https://www.linkedin.com/in/davide-beltrame/"
    desc: "MSc in Artificial Intelligence"
---

Finite-sample validity guarantees for large language model (LLM) outputs are attractive because they are post-hoc and model-agnostic, but they are fragile when prompts are heterogeneous and factuality signals are noisy.

Cherian et al. (2024) propose enhanced conformal methods for *factuality filtering* that (i) replace marginal guarantees with function-class conditional guarantees and (ii) improve utility via level-adaptive calibration and conditional boosting, which differentiates through conditional conformal cutoffs.

This discussion paper reviews the conformal prediction and LLM factuality background, presents the selected paper, and proposes a future direction: *response-level conformalization under dependent claims*. The key idea is to treat an entire response as the exchangeable unit, to use blocked calibration that keeps all claims from a response together, and to calibrate response-level tail losses conditionally on prompt/response features, aligning the guarantee with user-facing risk when claim errors are dependent.

<div class="d-flex align-items-center justify-content-around">
  <a href="/assets/reports/llm_validity_paper.pdf" class="btn-custom">Full Paper</a>
  <a href="https://github.com/jjcherian/conformal-safety" class="btn-custom">Code (Conformal Safety)</a>
  <a href="https://github.com/jjcherian/conditional-conformal" class="btn-custom">Code (Conditional Conformal)</a>
</div>
<br>

**Notes:**
- This work is a discussion of Cherian et al. (2024).
- The authors released a [filtered `MedLFQA` benchmark](https://github.com/jjcherian/conformal-safety) with non-health-related prompts removed, as well as the generated/parsed text and experiment notebooks.
- Their [conditional conformal inference Python package](https://github.com/jjcherian/conditional-conformal) now supports level-adaptive conformal prediction.
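The blocked-calibration idea in the new project page can be sketched as plain split conformal over responses rather than claims. The following is an illustrative reconstruction, not the authors' code: the function names are hypothetical, and it assumes a calibration set in which the false claims of each response are annotated with confidence scores. With probability at least 1 − α over a new exchangeable response, every false claim in it falls at or below the cutoff and is filtered out.

```python
import numpy as np

def blocked_conformal_threshold(calib_false_scores, alpha=0.1):
    """Split-conformal cutoff where each *response* is the exchangeable
    unit ("blocked" calibration: all claims of a response stay together,
    so within-response dependence is never split across the calibration set).

    calib_false_scores: list of 1-D arrays; entry i holds the confidence
    scores of the claims annotated as *false* in calibration response i.
    """
    # Per-response score: the highest-confidence false claim. Keeping only
    # claims strictly above this value removes every false claim in the
    # response; -inf marks responses with no false claims.
    s = np.array([np.max(r, initial=-np.inf) for r in calib_false_scores])
    n = len(s)
    # Finite-sample conformal quantile: the ceil((n+1)(1-alpha))-th
    # smallest calibration score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:                  # alpha too small for this calibration size
        return np.inf          # conservative: keep no claims
    return np.sort(s)[k - 1]

def filter_claims(confidences, tau):
    """Indices of claims confident enough to keep (strictly above tau)."""
    return [i for i, c in enumerate(confidences) if c > tau]

# Synthetic usage with hypothetical scores: 300 calibration responses,
# each with 0-4 annotated false claims scored in (0, 0.6).
rng = np.random.default_rng(0)
calib = [rng.uniform(0.0, 0.6, size=rng.integers(0, 5)) for _ in range(300)]
tau = blocked_conformal_threshold(calib, alpha=0.1)
kept = filter_claims([0.2, 0.95, 0.5], tau)
```

Conditioning the cutoff on prompt/response features, as in the conditional-guarantee part of Cherian et al. (2024), would replace the single quantile above with a feature-dependent one; this sketch shows only the marginal, response-level case.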

assets/img/proj/llm_validity.png (213 KB)

Three further binary image files changed (558 KB, 1.25 MB, 447 KB); binary files not shown.
