src/content/projects/hiperrag-literature-extraction.md
+15 -14 (15 additions & 14 deletions)
@@ -1,6 +1,6 @@
---
title: "HiPerRAG for Literature-based Data Extraction on Priority Pathogens"
-description: "Leveraging high-performance retrieval-augmented generation to extract and curate structured biological data for CEPI priority pathogens from scientific literature"
+description: "Leveraging high-performance retrieval-augmented generation to extract and curate structured biological data for priority pathogens from scientific literature"
| Blessy Antony | Virginia Tech (Non-BRC) | Data integration and validation |
+| James McFeeters | CViSB (Non-BRC) | Scientific Advisor |
+| Maliha Aziz | George Washington University (Non-BRC) | Data integration and validation |
+| Ozan Gokdemir | Argonne National Laboratory, BV-BRC/ANL | AI/ML Engineer, Data Curation Lead |
+| Yitian Chen | Scripps Research (Non-BRC) | Priority Pathogen Data Alignment |
**Project Summary**
This project leverages HiPerRAG—a high-performance retrieval-augmented generation system optimized for large scientific corpora—to extract and curate structured data for priority pathogens. By targeting key relationship types such as protein–protein interactions (PPIs), host–pathogen interactions, and drug–protein binding data, the project aims to produce curated, machine-readable datasets for integration with BV-BRC knowledgebases.
**Goals and Objectives**
-- Goal 1: Define target data types relevant to CEPI and BV-BRC (e.g., PPIs, drug-protein interactions)
+- Goal 1: Define target data types relevant to BV-BRC (e.g., PPIs, drug-protein interactions)
- Goal 2: Deploy HiPerRAG on relevant literature corpora to extract structured relationships
@@ -56,11 +57,11 @@ HiPerRAG will be configured to parse biomedical literature and extract relations
**Expected Outcomes / Deliverables**
-Curated datasets of structured biological relationships for CEPI priority pathogens, integrated into BV-BRC pipelines.
+Curated datasets of structured biological relationships for priority pathogens, integrated into BV-BRC pipelines.
**Potential Impact and Next Steps**
-This project demonstrates scalable AI-driven literature mining for infectious disease research. It will enable automated knowledge enrichment and accelerate understanding of pathogen biology, supporting CEPI's 100-day mission and BV-BRC's informatics goals.
+This project demonstrates scalable AI-driven literature mining for infectious disease research. It will enable automated knowledge enrichment and accelerate understanding of pathogen biology, supporting BV-BRC's informatics goals.
| David Moi | University of Lausanne | Project leader (model development) / AI, ML, snakemake, HPC, docker, structural biology, phylogenetics |
-| Dongwook Kim | University of Lausanne | Project leader (validation and integration) / Phylogenetics, sequence analysis, protein structures, Protein Language Models |
-| TBD | - | Model development part / AI expert(s) on transformer models, biological language models, sequence embeddings |
| Alex Partin | Argonne National Laboratory, BV-BRC/ANL | Model development part / AI expert(s) on transformer models, biological language models, sequence embeddings |
+| Christian Zmasek | J. Craig Venter Institute, BV-BRC | Validation part / Biology and/or Bioinformatics expert(s) on virology, viral taxonomy, sequence analysis, sequence database |
+| Dave Moi | University of Lausanne / SIB, PDN | Project leader (model development) / AI, ML, snakemake, HPC, docker, structural biology, phylogenetics |
+| Dongwook Kim | University of Lausanne / SIB, PDN | Project leader (validation and integration) / Phylogenetics, sequence analysis, protein structures, Protein Language Models |
+| Jamie Overbeek | Argonne National Laboratory, BV-BRC/ANL | Model development part / AI expert(s) on transformer models, biological language models, sequence embeddings |