Skip to content

Commit 10686c1

Browse files
committed
docs: update README with current architecture diagram
UPDATED ARCHITECTURE: - Replaced outdated simple architecture diagram with comprehensive layered architecture showing all current components - Added visual representation of 5 architectural layers: * Frontend Layer (React/Next.js) * API Layer (FastAPI) * Agent & Pipeline Layer (Deep, Document, Synthesis, Q&A) * Intelligence Layer (LLM, Memory, Retrieval, Skills) * Processing Layer (Parsers, Analyzers, Processors, Guardrails) * Storage Layer (Postgres+PGVector, MinIO, Cache) - Documented new modules added in recent phases: * analyzers/ - Quality analysis & benchmarking * conversation/ - Conversational AI & context management * exploration/ - Interactive document exploration * processors/ - Document & text processors * qa/ - Question-answering systems * synthesis/ - Document synthesis & generation MODULE STRUCTURE: - Added complete src/ directory structure with 22 modules - Clearly marked NEW modules from recent development - Shows relationships between layers and data flows DOCUMENTATION NOTES: - Added note about archived historical documentation - Linked to doc/.archive/README.md for archived content index - Fixed markdown linting issues (blank lines around lists) This update ensures README accurately reflects the current state of the codebase after Phase 1-3 implementations and recent quality enhancements. Related: Root cleanup commit 5f1d7ad
1 parent 5f1d7ad commit 10686c1

File tree

2 files changed

+257
-22
lines changed

2 files changed

+257
-22
lines changed

README.md

Lines changed: 94 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -97,28 +97,97 @@ The Unstructured Data RAG Platform provides:
9797

9898
## Architecture
9999

100+
### System Overview
101+
100102
```
101-
┌───────────────┐
102-
│ Frontend │
103-
│ (React/Next) │
104-
└───────▲───────┘
105-
106-
107-
┌──────────────┐
108-
│ Backend │
109-
│ (FastAPI) │
110-
└──────▲───────┘
111-
112-
┌───────────────────┼───────────────────┐
113-
▼ ▼ ▼
114-
[Parsers] [Postgres+PGVector] [MinIO]
115-
(PDF/Word/ (JSON + embeddings) (raw files,
116-
PlantUML/Drawio) binaries, images)
117-
118-
┌───────────────────────────────────┐
119-
│ LangChain DeepAgent │
120-
│ Retrieval + Generation + Judge │
121-
└───────────────────────────────────┘
103+
┌─────────────────────────────────────────────────────────────────┐
104+
│ Frontend Layer │
105+
│ (React/Next.js UI) │
106+
└──────────────────────────────┬──────────────────────────────────┘
107+
108+
109+
┌─────────────────────────────────────────────────────────────────┐
110+
│ API Layer (FastAPI) │
111+
└──────────────────────────────┬──────────────────────────────────┘
112+
113+
114+
┌─────────────────────────────────────────────────────────────────┐
115+
│ Agent & Pipeline Layer │
116+
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
117+
│ │ Deep │ │Document │ │Synthesis │ │ Q&A │ │
118+
│ │ Agent │ │ Agent │ │ Agent │ │ Agent │ │
119+
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
120+
│ │
121+
│ ┌──────────────────────────────────────────────────────────┐ │
122+
│ │ Pipelines: Chat Flow │ Document │ Conversation │ │
123+
│ └──────────────────────────────────────────────────────────┘ │
124+
└────────┬────────────┬────────────┬────────────┬─────────────────┘
125+
│ │ │ │
126+
▼ ▼ ▼ ▼
127+
┌─────────────────────────────────────────────────────────────────┐
128+
│ Intelligence Layer │
129+
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
130+
│ │ LLM │ │ Memory │ │Retrieval │ │ Skills │ │
131+
│ │ Clients │ │ (Short/ │ │ (Vector │ │ (Web, │ │
132+
│ │(OpenAI/ │ │ Long) │ │ Search) │ │ Code) │ │
133+
│ │Anthropic)│ └──────────┘ └──────────┘ └──────────┘ │
134+
│ └──────────┘ │
135+
└────────┬────────────┬────────────┬─────────────────────────────┘
136+
│ │ │
137+
▼ ▼ ▼
138+
┌─────────────────────────────────────────────────────────────────┐
139+
│ Processing Layer │
140+
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
141+
│ │ Parsers │ │Analyzers │ │Processors│ │Guardrails│ │
142+
│ │(PDF/DOCX/│ │ (Quality │ │ (Doc/ │ │ (PII/ │ │
143+
│ │PlantUML) │ │ Checks) │ │ Text) │ │ Safety) │ │
144+
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
145+
└────────┬────────────┬─────────────────────────────────────┬─────┘
146+
│ │ │
147+
▼ ▼ ▼
148+
┌─────────────────────────────────────────────────────────────────┐
149+
│ Storage Layer │
150+
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐│
151+
│ │ Postgres + │ │ MinIO │ │ Cache/Log ││
152+
│ │ PGVector │ │ (Raw Files) │ │ (Redis) ││
153+
│ │(JSON+Embeddings) │ │ │ │ ││
154+
│ └──────────────────┘ └──────────────────┘ └────────────────┘│
155+
└─────────────────────────────────────────────────────────────────┘
156+
157+
▲ ▲
158+
│ │
159+
└──────────┬───────────────┘
160+
161+
┌──────────────────────┐
162+
│ Prompt Engineering │
163+
│ & Conversation │
164+
│ Management │
165+
└──────────────────────┘
166+
```
167+
168+
### Core Module Structure
169+
170+
```
171+
src/
172+
├── agents/ → Agent implementations (Deep, Document, Synthesis, Q&A)
173+
├── analyzers/ → Quality analysis & benchmarking (NEW)
174+
├── conversation/ → Conversational AI & context management (NEW)
175+
├── exploration/ → Interactive document exploration (NEW)
176+
├── fallback/ → LLM fallback & recovery logic
177+
├── guardrails/ → PII filtering, safety, validation
178+
├── handlers/ → Input/output processing, error handling
179+
├── llm/ → Multi-provider LLM clients (OpenAI, Anthropic, etc.)
180+
├── memory/ → Short-term & long-term memory
181+
├── parsers/ → Document parsers (PDF, DOCX, PlantUML, Mermaid, DrawIO)
182+
├── pipelines/ → Workflow orchestration
183+
├── processors/ → Document & text processors (NEW)
184+
├── prompt_engineering/ → Template management, few-shot learning
185+
├── qa/ → Question-answering systems (NEW)
186+
├── retrieval/ → Vector search & document retrieval
187+
├── skills/ → Agent capabilities (web search, code execution)
188+
├── synthesis/ → Document synthesis & generation (NEW)
189+
├── utils/ → Logging, caching, rate limiting, tokens
190+
└── vision_audio/ → Multimodal processing
122191
```
123192

124193
---
@@ -255,11 +324,14 @@ Detailed documentation for specific features:
255324
- **[Architecture Details](doc/architecture/)** - System architecture templates and domain context
256325
- **[Business Documentation](doc/business/)** - Stakeholder analysis and differentiation strategy
257326
- **[Specifications](doc/specs/)** - Feature specs and templates
258-
- **[Historical Docs](doc/.archive/)** - Implementation reports and phase summaries
327+
- **[Historical Docs](doc/.archive/)** - Archived implementation reports and phase summaries from previous development cycles
328+
329+
> **Note:** Historical documentation from previous implementation phases has been archived to `doc/.archive/` to maintain a clean root directory. See the [archive README](doc/.archive/README.md) for a complete index of archived files.
259330
260331
### Contributing to Documentation
261332

262333
If you would like to contribute to the documentation, please:
334+
263335
1. Follow the templates in `doc/specs/`
264336
2. Ensure code examples are tested and work
265337
3. Submit a pull request with clear description

ROOT_CLEANUP_COMPLETE.md

Lines changed: 163 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,163 @@
1+
# Root Directory Cleanup - COMPLETE ✅
2+
3+
## Summary
4+
5+
Successfully cleaned up the root directory by archiving 37 historical markdown files, reducing root clutter from 44+ files to just 8 core project files.
6+
7+
## What Was Done
8+
9+
### 1. File Categorization and Review
10+
- Identified all 44+ markdown files in root directory
11+
- Created categorization plan (ROOT_CLEANUP_PLAN.md)
12+
- Reviewed key files for unique content before archiving
13+
- Integrated valuable information into proper documentation
14+
15+
### 2. Content Integration
16+
**Updated: `doc/developer-guide/development-setup.md`**
17+
- Extracted test script setup workflow from AGENTS.md
18+
- Added "Option A: Using Test Script" section
19+
- Documented `.venv_ci` isolated testing environment
20+
- Preserved unique information before archiving AGENTS.md
21+
22+
### 3. Archive Organization
23+
Created organized archive structure at `doc/.archive/`:
24+
25+
```
26+
doc/.archive/
27+
├── README.md # Comprehensive index and navigation
28+
├── phase1/ # Phase 1 implementation docs (1 file)
29+
├── phase2/ # Phase 2 implementation docs (2 files)
30+
├── phase3/ # Phase 3 implementation docs (2 files)
31+
└── working-docs/ # Operational documents (32+ files)
32+
├── Summary reports (10)
33+
├── Analysis & status (13)
34+
├── Quick reference (4)
35+
├── Completion reports (5)
36+
└── Planning docs (2)
37+
```
38+
39+
### 4. Files Moved (37 total)
40+
41+
**Phase Documentation:**
42+
- `PHASE_1_IMPLEMENTATION_SUMMARY.md` → phase1/
43+
- `PHASE_2_COMPLETION_STATUS.md` → phase2/
44+
- `PHASE_2_IMPLEMENTATION_SUMMARY.md` → phase2/
45+
- `PHASE_3_COMPLETE.md` → phase3/
46+
- `PHASE_3_PLAN.md` → phase3/
47+
48+
**Working Documents (32 files):**
49+
50+
*Summary Reports:*
51+
- AGENT_CONSOLIDATION_SUMMARY.md
52+
- CONFIG_UPDATE_SUMMARY.md
53+
- DELIVERABLES_SUMMARY.md
54+
- DOCLING_REORGANIZATION_SUMMARY.md
55+
- DOCUMENT_PARSER_ENHANCEMENT_SUMMARY.md
56+
- ITERATION_SUMMARY.md
57+
- REORGANIZATION_SUMMARY.md
58+
- TEST_FIXES_SUMMARY.md
59+
- TEST_RESULTS_SUMMARY.md
60+
- TEST_VERIFICATION_SUMMARY.md
61+
62+
*Analysis & Status:*
63+
- BENCHMARK_RESULTS_ANALYSIS.md
64+
- CEREBRAS_ISSUE_DIAGNOSIS.md
65+
- CI_PIPELINE_STATUS.md
66+
- CODE_QUALITY_IMPROVEMENTS.md
67+
- CONSISTENCY_ANALYSIS.md
68+
- DEPLOYMENT_CHECKLIST.md
69+
- INTEGRATION_ANALYSIS_requirements_agent.md
70+
- PR_UPDATE.md
71+
- PRE_TASK4_ENHANCEMENTS.md
72+
- DOCUMENT_AGENT_CONSOLIDATION.md
73+
- EXAMPLES_FOLDER_REORGANIZATION.md
74+
- STREAMLIT_UI_IMPROVEMENTS.md
75+
- TEST_EXECUTION_REPORT.md
76+
77+
*Quick Reference & Setup:*
78+
- QUICK_REFERENCE.md
79+
- DOCUMENTAGENT_QUICK_REFERENCE.md
80+
- STREAMLIT_QUICK_START.md
81+
- OLLAMA_SETUP_COMPLETE.md
82+
83+
*Completion Reports:*
84+
- API_MIGRATION_COMPLETE.md
85+
- CONSOLIDATION_COMPLETE.md
86+
- DOCUMENTATION_CLEANUP_COMPLETE.md
87+
- PARSER_CONSOLIDATION_COMPLETE.md
88+
- REORGANIZATION_COMPLETE.md
89+
90+
*Planning & Tracking:*
91+
- GIT_COMMIT_SUMMARY.md
92+
- ROOT_CLEANUP_PLAN.md
93+
94+
## Final State
95+
96+
### Root Directory (8 core files only) ✅
97+
```
98+
AGENTS.md # Agent system documentation
99+
CODE_OF_CONDUCT.md # Community guidelines
100+
CONTRIBUTING.md # Contribution guide
101+
LICENSE.md # MIT License
102+
NOTICE.md # Legal notices
103+
README.md # Project overview
104+
SECURITY.md # Security policy
105+
SUPPORT.md # Support information
106+
```
107+
108+
### Archive (64 total files)
109+
- Comprehensive archive index: `doc/.archive/README.md`
110+
- Organized by category with full descriptions
111+
- All file history preserved via git mv
112+
- Easy search and navigation instructions
113+
114+
## Git Operations
115+
116+
All moves performed with `git mv` to preserve file history:
117+
- 37 files successfully archived
118+
- Version control history intact
119+
- Single comprehensive commit created
120+
121+
**Commit:** `5f1d7ad` - "docs: clean root directory and archive historical documentation"
122+
123+
## Benefits Achieved
124+
125+
1. **Professional Appearance**: Root directory now clean and organized
126+
2. **Better Navigation**: Easy to find core project files
127+
3. **History Preserved**: All historical docs available in organized archive
128+
4. **Content Integrated**: Unique information moved to proper documentation
129+
5. **Searchable Archive**: Comprehensive index with search guidance
130+
6. **Maintainable**: Clear structure for future documentation
131+
132+
## Access Archived Content
133+
134+
```bash
135+
# View archive index
136+
cat doc/.archive/README.md
137+
138+
# List all archived files
139+
find doc/.archive -name "*.md" | sort
140+
141+
# Search archived content
142+
grep -r "search term" doc/.archive/
143+
144+
# View specific file
145+
cat doc/.archive/working-docs/FILENAME.md
146+
```
147+
148+
## Next Steps
149+
150+
- [x] Root directory cleaned (8 core files)
151+
- [x] Archive organized (64 files indexed)
152+
- [x] Content integrated into documentation
153+
- [x] All changes committed
154+
- [ ] Push to remote (optional - ready when needed)
155+
- [ ] Update any external links (if applicable)
156+
157+
---
158+
159+
**Completed:** December 2024
160+
**Total Files Archived:** 37 (64 total in archive)
161+
**Root Files Remaining:** 8 core project files
162+
**Archive Location:** `doc/.archive/`
163+
**Status:** ✅ COMPLETE

0 commit comments

Comments
 (0)