ARCHITECTURE.txt
================================================================================
THREE-TIER VALIDATION ARCHITECTURE
================================================================================
                              ┌─────────────────┐
                              │  User Question  │
                              │  "Is PQQ needed │
                              │   for M. ext?"  │
                              └────────┬────────┘
                                       │
                                       ↓
                           ┌────────────────────────┐
                           │        SQLAgent        │
                           │ (with 3-tier support)  │
                           └────────────────────────┘
                                       │
          ┌────────────────────────────┼────────────────────────────┐
          │                            │                            │
          ↓                            ↓                            ↓
  ┌───────────────┐           ┌──────────────────┐         ┌──────────────────┐
  │  TIER 1:      │           │  TIER 2:         │         │  TIER 3:         │
  │  KG-Microbe   │           │  Evidence        │         │  Genome          │
  │               │           │  Retrieval       │         │  Function        │
  └───────────────┘           └──────────────────┘         └──────────────────┘
          │                            │                            │
          │                            │                            │
          ↓                            ↓                            ↓
  ┌───────────────┐           ┌──────────────────┐         ┌──────────────────┐
  │ MediaDive:    │           │ PDF Search:      │         │ Genome Query:    │
  │ 0/3141 media  │           │ 13 citations     │         │ 5 genes found    │
  │ contain PQQ   │           │ found for PQQ    │         │ (pqqA-E)         │
  └───────────────┘           └──────────────────┘         └──────────────────┘
          │                            │                            │
          └────────────────────────────┼────────────────────────────┘
                                       │
                                       ↓
                           ┌────────────────────────┐
                           │   KGValidationHelper   │
                           │  (combines all tiers)  │
                           └────────────────────────┘
                                       │
                                       ↓
                           ┌────────────────────────┐
                           │   Validation Result    │
                           ├────────────────────────┤
                           │ Confidence: HIGH       │
                           │ Evidence: 3 tiers      │
                           │                        │
                           │ • KG: 0 in MediaDive   │
                           │ • PDF: 13 citations    │
                           │ • Genome: 5 genes      │
                           │                        │
                           │ Decision: Include PQQ  │
                           │ Reason: Biosynthesized │
                           │         but trace      │
                           │         amounts needed │
                           └────────────────────────┘

DATA FLOW:
==========

1. TIER 1 (KG-Microbe): DuckDB → MediaDive stats
   ↓
2. TIER 2 (Evidence): If KG inconclusive → Query PDFs/web
   ↓
3. TIER 3 (Genome): If biosynthesis claim → Query genome annotations
   ↓
4. KGValidationHelper: Combine all evidence → Final decision
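
The escalation in the data flow can be sketched in a few lines of Python. This is only an illustration of the control flow, not the project's implementation: `TierResult`, `run_tiers`, the `conclusive` and `biosynthesis_claim` flags, and the stub lookups are all hypothetical names, and the document does not say which tier raises the biosynthesis claim (here it is assumed to come from the evidence tier). The stubs reuse the PQQ figures from the diagram.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TierResult:
    """Outcome of one tier. All field names are illustrative assumptions."""
    tier: str
    summary: str
    conclusive: bool = False          # did this tier settle the question?
    biosynthesis_claim: bool = False  # did this tier suggest biosynthesis?


Lookup = Callable[[str], TierResult]


def run_tiers(ingredient: str, kg: Lookup,
              evidence: Optional[Lookup] = None,
              genome: Optional[Lookup] = None) -> List[TierResult]:
    """Run Tier 1 always; escalate only when the stated trigger fires."""
    results = [kg(ingredient)]
    # Tier 2: the KG result is inconclusive and an evidence lookup was supplied.
    if evidence is not None and not results[0].conclusive:
        results.append(evidence(ingredient))
    # Tier 3: an earlier tier raised a biosynthesis claim.
    if genome is not None and any(r.biosynthesis_claim for r in results):
        results.append(genome(ingredient))
    return results


# Hypothetical stub lookups echoing the PQQ example from the diagram.
def kg_stub(name: str) -> TierResult:
    return TierResult("KG", "0/3141 media contain " + name)


def evidence_stub(name: str) -> TierResult:
    # Assumption: the citations are what suggest the organism makes PQQ itself.
    return TierResult("PDF", "13 citations found for " + name,
                      biosynthesis_claim=True)


def genome_stub(name: str) -> TierResult:
    return TierResult("Genome", "5 genes found (pqqA-E)", conclusive=True)


results = run_tiers("PQQ", kg_stub, evidence_stub, genome_stub)
print([r.tier for r in results])
```

Passing only `kg_stub` stops after Tier 1, which mirrors the "no services" mode described under BACKWARDS COMPATIBILITY.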

CONFIDENCE SCORING:
===================

Tier 1 Only:     LOW        (KG data alone)
Tier 1 + 2:      HIGH       (KG + citations)
Tier 1 + 2 + 3:  VERY HIGH  (KG + citations + genome)
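
The scoring table maps directly onto a small function. The sketch below is an assumption about how such a mapping might look; `confidence_label` and its parameters are hypothetical names. The table does not cover Tier 1 + 3 without Tier 2, so that case is treated conservatively as LOW here, which is a choice made for this example only.

```python
def confidence_label(has_evidence: bool, has_genome: bool) -> str:
    """Map the tiers that contributed (beyond Tier 1) to a confidence label."""
    if has_evidence and has_genome:
        return "VERY HIGH"  # Tier 1 + 2 + 3: KG + citations + genome
    if has_evidence:
        return "HIGH"       # Tier 1 + 2: KG + citations
    # Tier 1 only. Genome data without citations is not in the table,
    # so this sketch falls back to LOW (assumption).
    return "LOW"


for flags in [(False, False), (True, False), (True, True)]:
    print(flags, confidence_label(*flags))
```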

BACKWARDS COMPATIBILITY:
========================

Old Code (still works):
    validator = KGValidationHelper()  # No services
    # Uses hardcoded TRACE_COFACTORS, STRONG_CHELATORS

New Code (Tier 2):
    evidence_service = EvidenceRetrievalService()
    validator = KGValidationHelper(evidence_service=evidence_service)
    # Uses KG + PDF evidence

New Code (Tier 3):
    evidence_service = EvidenceRetrievalService()
    genome_agent = GenomeFunctionAgent()
    validator = KGValidationHelper(
        evidence_service=evidence_service,
        genome_agent=genome_agent
    )
    # Uses KG + PDF + Genome
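
One common way to get this kind of backwards compatibility is to make each service an optional constructor argument that defaults to `None`. The sketch below shows that pattern with a hypothetical stand-in class, `ValidatorSketch`; it is not the real `KGValidationHelper`, whose internals the document does not show, and `active_tiers` is an invented helper used only to make the behavior visible.

```python
from typing import Any, List, Optional


class ValidatorSketch:
    """Hypothetical stand-in for KGValidationHelper; not the project's code."""

    def __init__(self, evidence_service: Optional[Any] = None,
                 genome_agent: Optional[Any] = None) -> None:
        # Both services are optional, so old call sites with no
        # arguments keep working unchanged.
        self.evidence_service = evidence_service
        self.genome_agent = genome_agent

    def active_tiers(self) -> List[str]:
        """Report which evidence sources this instance can draw on."""
        tiers = ["KG"]  # Tier 1 is always available
        if self.evidence_service is not None:
            tiers.append("PDF")
        if self.genome_agent is not None:
            tiers.append("Genome")
        return tiers


# object() stands in for the real service instances.
print(ValidatorSketch().active_tiers())
print(ValidatorSketch(evidence_service=object()).active_tiers())
print(ValidatorSketch(evidence_service=object(),
                      genome_agent=object()).active_tiers())
```

Defaulting to `None` rather than constructing services inside the class keeps the no-argument path free of extra dependencies, which is what lets the old call style keep working.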
================================================================================