Toward Next-Generation Mental Health Applications: An Adaptive Approach to Idiographic Modelling by Leveraging Ontology-based Agentic AI
The PHOENIX engine conceptualises mental health support as a closed-loop workflow that iteratively optimizes the digital intervention proposal based on multi-modal data from previous collection cycles.
(Personalized Hierarchical Optimization Engine for Navigating Insightful eXplorations)
- 🏛️ Academic Context
- 🧭 PHOENIX Scope
- 🔁 End-to-End Stage Map
- 🐦🔥 PHOENIX Ontology
- 🏗️ Technical Architecture
- 🚀 Quick Setup
- 🗂️ Repository Structure
- 💻 Run from CLI
- 🖥️ Run from Frontend
- 📦 Outputs and Validation
- 📊 PHOENIX Engine Evaluation Framework
- ✅ Quality Assurance and CI/CD
- 📜 License
This research-grade software is being created for a Ghent University master's thesis that aims to enhance the clinical translation abilities of longitudinal mental health applications: toward an adaptive approach for idiographic modelling by using ontology-based multi-agentic workflows.
| Field | Value |
|---|---|
| Institution | Ghent University |
| Author | Stijn Van Severen |
| Supervisors | Geert Crombez, Annick De Paepe |
PHOENIX separates two concerns:
- Core engine flow: clinical/analytic decision flow from intake to iterative model carry-over.
- Research support flow: visualization, QA, and research reporting for validation and communication.
This separation keeps scientific validation transparent without mixing support tasks into core decision logic.
PHOENIX is a modular, multi-agent system that starts from a free-text mental-health complaint, builds an initial observation model, analyses idiographic time-series dynamics through the Hierarchical Updating Algorithm (HUA), proposes biopsychosocially-balanced treatment targets, generates a HAPA-grounded digital intervention, and packages iterative updates for the next cycle. Every generative actor stage is paired with a critic agent that issues a bounded PASS / REVISE decision on a weighted composite score, which gives the full pipeline an auditable trail without sacrificing generative flexibility.
Five sub-ontologies constrain all reasoning and output structure across the PHOENIX pipeline: (1) CRITERION (i.e., mental health problem space: DSM-5TR, ICD-10, RDoC-701, non-clinical wellbeing), (2) PREDICTOR (i.e., intervention solution space: BIO / PSYCHO / SOCIAL branches), (3) PERSON (i.e., stable individual-level attributes across 18 domains), (4) CONTEXT (i.e., dynamic situational states: internal and external environment), and (5) HAPA (i.e., behaviour change scaffold: motivation phase, volition phase, barriers taxonomy, coping strategy library). See src/backend/SystemComponents/PHOENIX_ontology/ for the full structured breakdown.
All stages are constrained by five stable ontologies that enforce structural guarantees across the full pipeline:
| Ontology | Role | Source |
|---|---|---|
| CRITERION | Operationalized mental health variables (DSM-5-TR, RDoC) | src/backend/SystemComponents/PHOENIX_ontology/separate/CRITERION/ |
| PREDICTOR | Hierarchically structured treatment-solution entities used to model actionable intervention pathways and candidate refinement space | src/backend/SystemComponents/PHOENIX_ontology/separate/PREDICTOR/ |
| PERSON | Individual characteristics (demographics, comorbidity, history) | src/backend/SystemComponents/PHOENIX_ontology/separate/PERSON/ |
| CONTEXT | Situational and environmental factors | src/backend/SystemComponents/PHOENIX_ontology/separate/CONTEXT/ |
| HAPA | Health Action Process Approach (barriers, coping, phases) | src/backend/SystemComponents/PHOENIX_ontology/separate/HAPA/ |
In the real integrated pipeline, all decision-making stages use live LLM reasoning in normal operation, but they do so with different scaffolds. Step 01 combines always-on complaint decomposition, a local LLM critic loop, hybrid ontology retrieval, and optional final leaf adjudication, while later stages use explicit actor-critic loops with bounded iterations and heuristic fallback only for deterministic or degraded runs.
| Integrated Step | Runtime Component | Live LLM Use | Critic Dimensions | Core Runtime Method |
|---|---|---|---|---|
| 01 | Complaint Operationalization Agent | Yes, always-on for free-text decomposition and local critic review; optional again for final leaf adjudication | schema_validity, coverage_grounding, atomicity_nonoverlap, granularity_fit, current_actionability | LLM decomposition + local LLM critic refinement + HTSSF retrieval (dense + BM25 + token overlap + fuzzy) |
| 02 | Initial Observation Model Constructor | Yes, real-time HyDE generation and structured model construction | predictor_grounding, criterion_continuity, ontology_strictness, evidence_quality | HyDE-based predictor RAG + actor-critic refinement |
| 03 | Treatment Target Identifier | Yes, real-time structured actor output when enabled | safety, domain_boundary, lineage_consistency | BFS candidate selector + idiographic-nomothetic fusion |
| 04 | Updated Observation Model Constructor | Yes, real-time structured actor output when enabled | safety, domain_boundary, lineage_consistency | BFS-guided hierarchical model update with ontology-constrained refinement |
| 05 | HAPA Intervention Mapper | Yes, real-time structured intervention generation when enabled | reasoning_quality, evidence_grounding, hapa_consistency, medical_safety | Barrier scoring: 0.60·predictor + 0.20·profile + 0.15·context + 0.05·complaint |
Runtime interpretation:
- Step 01 is not just retrieval: it always begins with LLM-based complaint decomposition, with no non-LLM fallback for that decomposition phase, and then runs a local structured LLM critic loop to review complaint coverage, granularity, overlap, and present-state actionability before mapping to ontology leaves.
- Step 01 still uses hybrid retrieval for ontology grounding, and can optionally run an additional LLM adjudication pass to choose the final leaf or return
UNMAPPED. - Steps 02, 03, 04, and 05 use live LLM calls in normal operation, with component-specific actor-critic prompts, bounded iterations, and heuristic fallback paths for deterministic or degraded runs.
- The intervention module is Step 05 in the actual integrated pipeline, even if some earlier summaries compressed it into a four-stage abstraction.
Optional DAG orchestrator (src/backend/orchestrator.py): for complex tasks, a flexible orchestrator creates DAG-based parallel/sequential execution plans — otherwise the pipeline runs sequentially (primary evaluation path).
Quantitative backbone bridging EMA data to adaptive model weighting:
- Readiness classifier — stationarity (ADF/KPSS), collinearity, effective sample size → tier selection (tv-gVAR / gVAR / GGM / correlation / descriptives)
- Network time-series analyst — kernel-smoothed VAR(1), L1-penalized stationary gVAR, partial correlations (Ledoit-Wolf shrinkage), time-varying GIF animations
- Momentary impact quantifier — leave-one-predictor-out MSE delta + coefficient magnitude composite
- BFS candidate selector —
score = 0.45·mapping + 0.25·HyDE + 0.20·idiographic_anchor + 0.10·domain_bonus
Adaptive idiographic-nomothetic weighting per cycle:
idiographic_weight = clamp(0.30 + 0.50 × readiness_score / 100)
nomothetic_weight = 1.0 - idiographic_weight
PHOENIX implements a breadth-first iterative algorithm across cycles:
- Cycle N produces: criterion leaf, initial model, pseudodata, HUA results, treatment targets, HAPA intervention
- Cycle N+1 seeds from Cycle N via a history ledger: impact scores →
idiographic_anchorin BFS; prior cycle scores modulatedomain_bonus;composite_score = 0.35·similarity + 0.25·impact[N] + 0.15·target_scores + 0.10·priority_scores + 0.15·quality_scores
git clone https://github.com/stvsever/ThesisMaster.git
cd MASTERPROEFpython3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txtCreate or update .env in repository root:
OPENROUTER_API_KEY=<your_openrouter_key>
OPENAI_BASE_URL=https://openrouter.ai/api/v1Runtime behavior:
OPENROUTER_API_KEYis primary.- Runtime mirrors it to
OPENAI_API_KEYfor backward-compatible scripts. - Default model is
gpt-5-nano(resolved asopenai/gpt-5-nanowhen routed via OpenRouter).
If you want to quickly validate the integrated pipeline on a single profile with minimal iterations, you can run the smoke test:
make pipeline-smokePHOENIX ships with a ready-to-use Docker configuration for reproducible execution without a local Python environment:
git clone https://github.com/stvsever/ThesisMaster.git
cd MASTERPROEF
# Optional for LLM-enabled runs; deterministic mode can skip this.
cat > .env <<'EOF'
OPENROUTER_API_KEY=<your_openrouter_key>
OPENAI_BASE_URL=https://openrouter.ai/api/v1
EOF
cd docker
docker compose up --buildThis starts the Flask frontend on http://127.0.0.1:5050. The setup bundles all dependencies, mounts pipeline outputs back to the host, and supports CLI runs through the phoenix-cli service. See docker/README.md for the full workflow.
A client-side graph creator (GitNexus) was used to generate a comprehensive knowledge graph of the entire codebase; its component interactions are provided below:
The main codebase is organized around src/ and evaluation/. Inside src/, the canonical runtime split is now src/frontend/ for the Flask application and src/backend/ for the engine, ontologies, shared runtime utilities, and architecture assets.
MASTERPROEF/
├── src/ # Canonical application source tree
│ ├── backend/ # Engine runtime, SystemComponents, utils, orchestrator, overview assets
│ ├── frontend/ # Flask app, UI routes, runtime workspace integration
│ ├── __init__.py # Package root for `src.frontend` and `src.backend`
│ └── README.md # Architecture overview for the `src/` tree
├── evaluation/ # Sequential scripts + integrated pipeline + QA/research
│ ├── sequential/ # Stage-wise run_step.py scripts (00..08)
│ ├── integrated_pipeline/ # run_pipeline.py and run_engine_pipeline.py
│ ├── survey_analysis/ # Double-blind LLM-as-judge evaluation (10 HCP cases, 5 parts, 38 dimensions)
│ └── quality_and_research/ # pytest suites, schema contracts, research reporting
├── docker/ # Dockerfile + docker-compose for reproducible deployment
├── .github/ # CI/CD workflows
├── pyproject.toml # Python package metadata and constraints
├── requirements.txt # Dependency baseline
└── README.md # Root documentation
The following command executes the full PHOENIX pipeline with default settings, processing the synthetic_v1 dataset through all stages and generating comprehensive outputs:
python evaluation/integrated_pipeline/run_pipeline.py --mode synthetic_v1The following command runs the pipeline on the synthetic_v1 dataset but limits the execution to a single profile matching the pattern pseudoprofile_FTC_ID001. This allows for focused testing and debugging on a specific case:
python evaluation/integrated_pipeline/run_pipeline.py --mode synthetic_v1 \
--pattern pseudoprofile_FTC_ID001 \
--max-profiles 1The following command executes the PHOENIX pipeline for 2 complete cycles, allowing you to observe how the system iteratively refines its outputs based on previous cycle data. The --profile-memory-window 3 flag enables the system to retain information from the last 3 profiles for informed decision-making in subsequent cycles:
python evaluation/integrated_pipeline/run_pipeline.py --mode synthetic_v1 \
--cycles 2 \
--profile-memory-window 3python evaluation/integrated_pipeline/run_pipeline.py --mode synthetic_v1 --disable-llmRuntime note:
- If a cycle is
readiness_alignedand only contemporaneous correlation analysis is feasible, PHOENIX now applies a correlation-baseline impact fallback so downstream Step-03/04/05 and communication stages still execute and persist outputs. - If Step-02 model generation fails (for example provider/network failure), PHOENIX now builds complaint-grounded fallback Step-02 artifacts directly from Step-01 operationalization output, instead of copying unrelated historical profile artifacts.
- For iterative cycles started via
--start-from-pseudodata, PHOENIX now resolvesinitial_model_runs_rootfrom the active run lineage (same run id) so Step-03/04 stay anchored to the current cycle history.
⚠️ The Flask frontend (src/frontend/) is currently kept out of this public repository while it is under active development. Once the UI layer reaches a stable, demonstration-ready state, it will be pushed alongside the backend. Until then, the section below documents the intended entry point and feature set, but the corresponding source files are not part of the tracked codebase. Reach out if you need early access for evaluation purposes.
Use the following command (once the frontend is published) to start the Flask UI:
python src/frontend/app.py
# or
python evaluation/integrated_pipeline/run_pipeline.py --uiOpen http://127.0.0.1:5050.
Planned frontend features:
- Intake for complaint/person/environment context
- Live component status and streaming logs
- One-click full end-to-end run from free-text complaint (with iterative cycle controls)
- Step-level run controls and advanced configuration toggles
- Wizard-style iterative execution: INTAKE → MODEL → DATA → ANALYSIS → INTERVENTION → MODEL (cycle N+1)
- Interactive Chart.js dashboard — all visualizations are dynamic (no static PNGs in UI)
- Canvas-based animated network visualization with per-frame scrubbing
- Session persistence and cohort batch execution
Integrated outputs are saved under:
evaluation/integrated_pipeline/runs/<run_id>/
Key artifacts to inspect:
00_operationalization/through10_research_reports/pipeline_summary.jsonllm_startup_health_check.json- Stage logs (
stage.log,stage_events.jsonl,stage_trace.json) - Profile-specific JSON/CSV outputs per step
- Profile-specific human-readable summaries:
07_hapa_digital_intervention/<profile_id>/step05_hapa_intervention.md08_treatment_translation_communication/<profile_id>/treatment_translation_communication.md
- Time-varying network animation:
04_time_series_analysis/<profile_id>/tv_network_animation.gif - Publication-ready PNGs:
09_impact_visualizations/<profile_id>/(for human healthcare expert comparison)
The evaluation framework assesses PHOENIX output quality across five clinical tasks using a double-blind LLM-as-judge design. For each of 10 clinical cases, PHOENIX and HCP outputs are independently rated on a bipolar −10 to +10 absolute quality scale across 38 dimensions by a google/gemini-3.1-flash-lite-preview judge in three separate runs — without knowledge of source identity — producing 2,340 quality ratings. Effects are quantified via linear mixed-effects models (quality_score ~ entity + (1|case) + (1|judge_run)), standardized as Cohen's dz, and tested for equivalence against a ±1.5-point margin.
The evaluation/survey_analysis/ directory implements a five-part double-blind evaluation of PHOENIX against 10 licensed HCPs (one per case). Both sources receive identical Qualtrics-derived inputs and complete the same five clinical tasks; outputs are rated anonymously by an LLM judge across 38 dimensions on a bipolar −10 to +10 scale.
Per-part effects are estimated with quality_score ~ entity_ec + (1|case_id) + (1|judge_run) (PHOENIX = +0.5, HCP = −0.5), standardized as Cohen's dz, and tested for equivalence against a ±1.5-point margin (TOST). A cross-part holistic synthesis and supplementary ICC, calibration, and sensitivity diagnostics are run automatically.
Run the full pipeline:
export OPENROUTER_API_KEY=...
python evaluation/survey_analysis/pipeline.py --mode real --judge openrouter --n-runs 3Results are saved under evaluation/survey_analysis/results/ with publication-ready figures and statistical reports.
Run locally:
make qa-unit
make qa-integration
make qa-smoke
make qa-allAutomated workflows:
.github/workflows/ci.yml.github/workflows/smoke_pipeline.yml
Schema/contract validation entrypoint:
evaluation/quality_and_research/quality_assurance/validate_contract_schemas.py
Contract validation: 7 JSON schemas enforce structural guarantees on every stage output: readiness_report, network_comparison_summary, momentary_impact, step03_target_selection, step04_updated_model, step05_hapa_intervention, pipeline_summary.
This project is licensed under GNU General Public License v3.0. See LICENSE.
What this means in practice:
- You may use, study, modify, and redistribute this code.
- If you distribute modified versions (or software that includes GPL-covered parts), you must:
- keep it under GPL-compatible terms,
- provide corresponding source code,
- preserve copyright and license notices,
- document meaningful changes.
- The software is provided without warranty.
For academic reuse, cite the thesis context appropriately and keep provenance of methodological changes explicit.
Caution
EU MDR / PRE-CLINICAL DISCLAIMER PHOENIX is a Clinical Decision Support System (CDSS) prototype designed for research purposes. It is NOT a certified medical device under the EU Medical Device Regulation (MDR 2017/745) or FDA guidelines. Do not use for primary diagnostic decisions. All outputs must be verified by a qualified clinician.



