-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathindex.html
More file actions
188 lines (162 loc) · 10.1 KB
/
index.html
File metadata and controls
188 lines (162 loc) · 10.1 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Automatically Labeling Clinical Trial Outcomes: A Large-Scale Benchmark for Drug Development</title>
<style>
/* Remove or minimize custom styles as Bootstrap will handle most of the styling */
body {
font-family: sans-serif;
color: #333;
padding-top: 70px; /* Add padding to body to account for fixed navbar height */
}
html {
scroll-padding-top: 70px; /* Account for fixed navbar height during smooth scroll */
}
#wrapper {
margin: 3em auto;
max-width: 1000px;
}
h1 {
font-weight: bold;
color: #FF5F0F;
text-align: center;
}
h2 {
font-weight: bold;
color: #FF5F0F;
}
blockquote {
border-left: 5px solid #CCC;
padding-left: 20px;
margin-left: 0;
}
li {
margin: 6px;
}
ul {
list-style: square;
}
.latest {
font-weight: bold;
color: #FF5F0F;
}
</style>
</head>
<body>
<nav class="navbar navbar-default navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false"
aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav" id="header-links">
<li><a href="#abstract">Abstract</a></li>
<li><a href="#data-viewer">Dataset Viewer</a></li>
<li><a href="#usage-instructions">Usage Instructions</a></li>
<li><a href="#citation">Citation</a></li>
<li><a href="#other-material">Other Material and Related Work</a></li>
<li><a href="#acknowledgements">Special Thanks</a></li>
<li><a href="#license">License</a></li>
</ul>
</div>
</div>
</nav>
<div class="container" id="wrapper">
<h1 class="text-center">Automatically Labeling Clinical Trial Outcomes: A Large-Scale Benchmark for Drug Development</h1>
<p class="text-muted text-center">Chufan Gao, Jathurshan Pradeepkumar, Trisha Das, Shivashankar Thati, Jimeng Sun</p>
<p class="text-muted text-center">University of Illinois Urbana-Champaign</p>
<div style="margin:2em 0" class="text-center">
<p>
<a href="https://huggingface.co/datasets/chufangao/CTO" class="btn btn-primary button">Download Dataset</a>
<a href="https://github.com/chufangao/ctod" class="btn btn-primary button">Code</a>
<a href="https://arxiv.org/abs/2406.10292" class="btn btn-primary button">Paper</a>
<a href="https://chufangao.github.io/CTOD/tutorials/resources/CTO_slides.pdf" class="btn btn-primary button">Slides</a>
<a href="https://github.com/chufangao/CTOD/tree/main/tutorials" class="btn btn-primary button">Tutorial Notebooks</a>
<a href="https://chufangao.github.io/CTOD/tutorials/resources/CTO_intro.mp4" class="btn btn-primary button">Introduction Video</a>
</p>
</div>
<!-- <p class="text-muted"><em>Note: The paper is from a previous version and will be updated as soon as possible to reflect the changes we have made.</em></p> -->
<hr>
<h2 id="abstract">Abstract</h2>
<p><strong>Background</strong> The cost of drug discovery and development is substantial, with clinical trial outcomes playing a critical role in regulatory approval and patient care. However, access to large-scale, high-quality clinical trial outcome data remains limited, hindering advancements in predictive modeling and evidence-based decision-making.</p>
<p><strong>Methods</strong> We present the Clinical Trial Outcome (CTO) benchmark, a fully reproducible, large-scale repository encompassing approximately 125,000 drug and biologics trials. CTO integrates large language model (LLM) interpretations of publications, trial phase progression tracking, sentiment analysis from news sources, stock price movements of trial sponsors, and additional trial-related metrics. Furthermore, we manually annotated a dataset of clinical trials conducted between 2020 and 2024 to enhance the quality and reliability of outcome labels.</p>
<p><strong>Results</strong>
The trial outcome labels in the CTO benchmark agree strongly with expert annotations, achieving an F1 score of 94 for Phase 3 trials and 91 across all phases. Additionally, benchmarking standard machine learning models on our manually annotated dataset revealed distribution shifts in recent trials, underscoring the necessity of continuously updated labeling approaches.</p>
<p><strong>Conclusions</strong> By analyzing CTO's performance on recent clinical trials, we demonstrate the ongoing need for high-quality, up-to-date trial outcome labels. We publicly release the CTO knowledge base and annotated labels at <a href="https://chufangao.github.io/CTOD">https://chufangao.github.io/CTOD</a>, with regular updates to support research on clinical trial outcomes and inform data-driven improvements in drug development.</p>
<h3>Definition: Trial Success</h3> Clinical trial outcomes are multifaceted and have diverse implications. These outcomes can involve meeting the primary endpoint as defined in the study, advancing to the next phase of the trial, obtaining regulatory approval, impacting the financial outcome for the sponsor (either positively or negatively), and influencing patient outcomes such as adverse events and trial dropouts.
<br><br>
<b>Our paper defines the trial outcome as a binary indicator (0 for Failure and 1 for Success)</b>, inidcating whether the trial achieves its primary endpoints and can progress to the next stage of drug development. For example, for Phase 1 and 2 trials, success may mean moving to the next phase, such as from Phase 1 to Phase 2, and from Phase 2 to Phase 3. In Phase 3, success is measured by regulatory approval.
<h2 id="data-viewer">Dataset Viewer</h2>
<h3>Example Views</h3>
<ul>
<li> View human labels' study dates and overall status
<pre>SELECT
nct_id,
study_first_submitted_date,
study_first_posted_date,
completion_date,
overall_status,
labels
FROM
human_labels</pre>
<iframe src="https://huggingface.co/datasets/chufangao/CTO/embed/sql-console/epU7yTZ" frameborder="0" width="100%" height="280px"></iframe>
</li>
<li> Get interesting positive CTO predictions. I.e. Phase 3 trials with less than 100% predicted probability of success
<pre>SELECT *
FROM phase3_cto_preds
WHERE pred_proba != 1
ORDER BY pred_proba DESC
LIMIT 10</pre>
<iframe src="https://huggingface.co/datasets/chufangao/CTO/embed/sql-console/dMo9vS5" frameborder="0" width="100%" height="280px"></iframe>
We see that these trials are likely to succeed given that each trial had a positive effect on stock price and were able to be linked to a previous trial, despite there not being an explicit p-value.
</li>
</ul>
<p><b>Below is a preview of the full, raw, dataset. The full dataset + descriptions can be accessed <a href="https://huggingface.co/datasets/chufangao/CTO">here</a>.</b></p>
<iframe src="https://huggingface.co/datasets/chufangao/CTO/embed/viewer/human_labels/test" frameborder="0" width="100%"
height="560px"></iframe>
<h2 id="usage-instructions">Usage Instructions</h2>
<ul>
<li><strong>The latest version will always be shown in <a href="https://huggingface.co/datasets/chufangao/CTO">Huggingface</a>. Instructions in obtaining the full dataset is shown there as well.</strong></li>
<li>You can also load specific files using the Python <a href="https://pandas.pydata.org/docs/index.html">Pandas library</a>
<pre>import pandas as pd
CTO_phase1_preds = pd.read_csv("https://huggingface.co/datasets/chufangao/CTO/raw/main/phase1_CTO_rf.csv")
CTO_phase2_preds = pd.read_csv("https://huggingface.co/datasets/chufangao/CTO/raw/main/phase2_CTO_rf.csv")
CTO_phase3_preds = pd.read_csv("https://huggingface.co/datasets/chufangao/CTO/raw/main/phase3_CTO_rf.csv")</pre>
</li>
<li>Please see <a href="https://github.com/chufangao/CTOD/tree/main/tutorials">Tutorials</a> for examples getting started with this dataset. This includes off-the-shelf Google Collab notebooks!</li>
</ul>
<h2 id="citation">Citation</h2>
<blockquote>
<pre>@article{gao2024automatically,
title={Automatically Labeling Clinical Trial Outcomes: A Large-Scale Benchmark for Drug Development},
author={Gao, Chufan and Pradeepkumar, Jathurshan and Das, Trisha and Thati, Shivashankar and Sun, Jimeng},
journal={arXiv preprint arXiv:2406.10292},
year={2024}
}</pre>
</blockquote>
<h2 id="other-material">Other Material and Related Work</h2>
<ul>
<li><a href=https://www.linkedin.com/posts/jimengsun_automatically-labeling-200b-life-saving-activity-7221928418169212931-bFq-/>LinkedIn Post by Professor Jimeng Sun</a></li>
<li><a href=https://aiscientist.substack.com/p/musing-53-automatically-labeling>External Blog Post: Musing 53: Automatically Labeling $200B Life-Saving Datasets: A Large Clinical Trial Outcome Benchmark</a></li>
</ul>
<h2 id="acknowledgements">Special Thanks</h2>
<ul>
<li>A huge thanks to <a href="https://serpapi.com/">SerpApi</a> for their powerful news search API--an invaluable resource for scalably gathering clinical trial news, making our research faster and more efficient.</li>
</ul>
<h2 id="license">License</h2>
<p>The dataset is licensed under the <a href="https://github.com/chufangao/CTOD/blob/main/LICENSE">MIT</a> license.</p>
</div>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js"></script>
</body>
</html>