PredomicsApp-Web — Roadmap

Last updated: 2026-02-18

High Impact

1. Export & Reports ✅

Priority: HIGH | Effort: Medium | Status: Done

Add ability to download results as CSV/HTML report from the Results tab.

CSV exports: coefficients, population, jury, generation tracking
HTML report: self-contained with embedded charts and metrics
Backend endpoints for CSV sections and full HTML report
Frontend: Export dropdown in Results tab header

2. Job Comparison View ✅

Priority: HIGH | Effort: Medium | Status: Done

Enhance the existing Comparative sub-tab with detailed side-by-side analysis.

Side-by-side metrics tables for 2+ selected jobs
Diff highlighting for changed parameters
Config diff viewer showing only differing parameters
Performance delta visualization

3. Real-time Job Progress ✅

Priority: HIGH | Effort: High | Status: Done

Live progress tracking during training.

ConsolePanel with ANSI-rendered log output and HTTP polling
Real-time progress bar: current generation / max_epochs
Generation, k-value, and language display
Minimizable console with status badge

4. Notebook Integration ✅

Priority: HIGH | Effort: Medium | Status: Done

Generate downloadable Python/R notebooks from completed job results.

Python notebook (.ipynb): loads data, runs gpredomicspy, displays results with matplotlib
R notebook (.Rmd): loads data with gpredomicsR, generates ggplot2 visualizations
Pre-filled with actual parameter values from completed jobs
Download buttons in Results tab export dropdown

Medium Impact

5. Onboarding Tour ✅

Priority: MEDIUM | Effort: Low | Status: Done

Interactive first-use walkthrough highlighting key features.

Lightweight modal-based tour (OnboardingTour.vue) — no external dependencies
6 steps: Welcome → Create project → Upload data → Configure → Launch → Results
"Don't show again" checkbox stored in localStorage
Reset tour option in Profile → Preferences
Auto-shows for first-time users after login

6. Browser Notifications ✅

Priority: MEDIUM | Effort: Low | Status: Done

Notify user when a long-running job completes or fails.

Uses browser Notification API with permission request on project dashboard load
Triggers on job status transition: running → completed/failed
Shows AUC and k in notification body for completed jobs
Enable/disable toggle in Profile → Preferences
Auto-close after 10 seconds, click-to-focus

7. Batch Runs ✅

Priority: MEDIUM | Effort: Medium | Status: Done

Launch multiple analysis jobs with parameter sweeps.

Batch Mode toggle in Parameters tab with sweep parameter grid builder
7 sweepable params: seed, algo, language, data_type, population_size, max_epochs, k_max
Cartesian product of sweep values (max 50 jobs per batch)
batch_id column on jobs with migration v10
Backend: POST /batch creates N jobs, GET /batches returns batch summaries with best AUC
Jobs named [Batch] param=value ... for easy identification
Real-time job count preview and max-50 validation

8. Dataset Tagging & Search ✅

Priority: MEDIUM | Effort: Low | Status: Done

Organize datasets with tags and enable search/filter.

Added tags: JSON field to Dataset model with migration v9
Tag CRUD endpoints (create, update, suggestions, filter)
Search bar and tag filter dropdown in Dataset Library
Clickable tag chips, inline tag editor with datalist autocomplete
Pre-defined tags: benchmark, clinical, metagenomic, 16S, shotgun, WGS, etc.

Landing Page

9. Animated Workflow Diagram ✅

Priority: LOW | Effort: Medium | Status: Done

Interactive CSS animation showing the Predomics pipeline on the landing page.

5-step pipeline: Data Input → Feature Selection → Evolutionary Search → Model Evaluation → Jury Voting
CSS keyframe fadeSlideIn animation with staggered delays
Arrow connectors between steps, responsive layout
Each step has icon, label, and description

10. Use Case Examples ✅

Priority: LOW | Effort: Low | Status: Done

Real-world use cases with results on the landing page.

4 use cases: Cirrhosis (Qin 2014), Cancer (Zeller 2014), Treatment Response (Gopalakrishnan 2018), Metabolic Disease (Karlsson 2013)
Each card has icon, description, key metrics (AUC, features, samples), and publication reference
Hover effects, responsive grid layout

Technical Debt

11. Test Coverage (55% → 86%) ✅

Priority: MEDIUM | Effort: High | Status: Done

Expanded test suite to cover critical paths — 86% coverage on testable backend code.

Backend: 207 tests (69 new) covering all 11 routers — datasets (94%), projects (97%), sharing (99%), admin (98%), auth (100%), export (93%), health (100%), analysis (73%), data_explore (90%), samples (97%)
Frontend: 59 tests (all new) — Vitest 4 + Vue Test Utils + jsdom: parameter definitions (13), notification utility (13), Pinia stores (14), Vue components (14), router config (5)
Edge cases: concurrent jobs, batch runs, error recovery, export helpers
Coverage config: pyproject.toml with concurrency = ["greenlet", "thread"] for accurate async tracking
Remaining uncovered: DB migrations (main.py, PostgreSQL-specific), subprocess worker (worker.py), gpredomicspy internals

12. API Documentation ✅

Priority: LOW | Effort: Low | Status: Done

Auto-generate OpenAPI documentation.

Fixed SPA catch-all to not intercept /docs, /redoc, /openapi.json paths
FastAPI Swagger UI available at /docs, ReDoc at /redoc
OpenAPI JSON schema at /openapi.json

13. Production Deployment Guide ✅

Priority: MEDIUM | Effort: Low | Status: Done

Comprehensive DEPLOYMENT.md covering all deployment scenarios.

Docker Compose single-server deployment with production overrides
Full environment variable reference table
NGINX reverse proxy configuration with security headers
SSL/TLS setup with Let's Encrypt and certbot
PostgreSQL configuration and managed database migration
Backup & restore procedures (database + files) with automated cron script
Kubernetes deployment manifests with Ingress and cert-manager
Health check and monitoring guidance
Troubleshooting section

Enhancements (Phase 2)

14. Database Backup & Import ✅

Priority: HIGH | Effort: Medium | Status: Done

Full system backup and restore via portable tar.gz archives.

Backend service: JSON export of all DB tables in FK dependency order
Archive includes: database JSON, dataset files, job results, admin defaults, manifest
Restore modes: replace (wipe + restore) or merge (skip existing records)
Admin UI: create backup, list/download/delete backups, upload & restore
5 new admin API endpoints

15. WebSocket Live Job Logs ✅

Priority: HIGH | Effort: Medium | Status: Done

Real-time log streaming via WebSocket with HTTP polling fallback.

WebSocket endpoint: /ws/jobs/{project_id}/{job_id}?token=JWT
Tail-f style log streaming with 0.5s refresh
JWT authentication via query parameter
Automatic fallback to HTTP polling if WebSocket fails
Status change broadcasts (running → completed/failed)

16. Dataset File Preview ✅

Priority: MEDIUM | Effort: Low | Status: Done

Preview dataset file contents without downloading.

Backend: CSV/TSV parsing with first N rows, column types, basic stats (min/max/mean/std)
Frontend modal with scrollable table, sticky headers, row numbers
Stats footer row, file metadata badges
Preview buttons in Dataset Library file rows

17. Performance Optimizations ✅

Priority: MEDIUM | Effort: Low | Status: Done

Improve response times and initial load performance.

In-memory TTL cache decorator for expensive backend endpoints
GZip response compression middleware (min 1KB)
Dynamic import for Plotly.js (~3MB) — loaded on demand in Results tab

18. Error Handling & UX Polish ✅

Priority: MEDIUM | Effort: Low | Status: Done

Structured error responses and user-facing notifications.

Backend: structured {error: {code, message}} JSON responses for all HTTP errors
Generic catch-all 500 handler with server-side logging
Toast notification system: composable + ToastContainer component
Axios response interceptor for global error display
Exponential backoff retry utility for transient failures

Phase 3: Security, Integrations & Data Management

19. Feature Importance Visualization ✅

Priority: HIGH | Effort: Low | Status: Done

Richer charts in the Best Model sub-tab for exploring feature contributions.

Coefficient direction chart: horizontal bars colored by sign (green=positive, red=negative)
Feature contribution waterfall: cumulative sorted contributions using Plotly waterfall trace
Per-sample contribution heatmap: button-triggered, computes coefficient × feature values matrix
All data sourced from existing results JSON — no backend changes needed

20. Project Templates ✅

Priority: MEDIUM | Effort: Low | Status: Done

Admin-managed parameter presets available to all users.

File-based JSON storage at data/templates.json (same pattern as admin defaults)
CRUD endpoints: GET/POST/PUT/DELETE + public GET (no auth)
Admin UI: create/delete templates from current default config
Parameters tab: "Load Template" dropdown applies preset config values

21. Audit Log ✅

Priority: HIGH | Effort: Medium | Status: Done

Track all user actions with timestamps, queryable by admins.

AuditLog model: user_id, action, resource_type, resource_id, details (JSON), ip_address
14 action constants: login, register, job.launch/delete, dataset.upload/delete, project.create/delete, share.create/revoke, admin operations
Instrumented across 6 routers (auth, analysis, datasets, projects, sharing, admin)
Admin UI: paginated audit log table with action filter
Migration v11

22. Password Reset & Email ✅

Priority: HIGH | Effort: Medium | Status: Done

Self-service password reset with optional SMTP email delivery.

PasswordResetToken model with bcrypt-hashed tokens and 1-hour expiry
email_verified field on User model (migration v12)
aiosmtplib integration (optional dep — graceful ImportError handling)
Dev mode: returns reset token directly when no SMTP configured
Frontend: ForgotPasswordView + ResetPasswordView with router guard
Admin: direct password reset for any user

23. API Key Management ✅

Priority: MEDIUM | Effort: Medium | Status: Done

Programmatic access via API keys as alternative to JWT Bearer tokens.

ApiKey model with bcrypt-hashed keys, 8-char prefix, last_used_at tracking
Keys shown only once on creation
Dual auth: get_current_user accepts both Authorization: Bearer and X-API-Key header
Frontend: create/list/revoke API keys in Profile view
Migration v13

24. Webhook Notifications ✅

Priority: MEDIUM | Effort: Medium | Status: Done

HTTP POST callbacks to external URLs when jobs complete or fail.

Webhook model with HMAC-SHA256 signing via secret
Delivery with configurable retries and exponential backoff (httpx)
CRUD + test endpoint (send test payload)
Fired from background job runner on completion/failure
Frontend: create/list/delete/test webhooks in Profile view
Migration v14

25. Dataset Versioning ✅

Priority: MEDIUM | Effort: Medium | Status: Done

Automatic snapshots on file changes with restore capability.

DatasetVersion model: version_number, files_snapshot (JSON), created_by, note
Auto-snapshot on file upload and file delete
Version history endpoint with restore to any previous version
Frontend: "History" button per dataset with version list and restore
Migration v15

26. Rate Limiting ✅

Priority: HIGH | Effort: Low | Status: Done

Per-user and per-IP rate limits using slowapi (in-memory, no Redis required).

User-or-IP key extraction from JWT, API key prefix, or client IP
Configurable limits: auth (10/min), API (100/min), upload (20/min), admin (30/min)
Limiter decorators on auth, upload, and admin endpoints
429 toast handling in frontend Axios interceptor
Global enable/disable via config setting

Phase 4: Analytics, Collaboration & Infrastructure

27. External Validation ✅

Priority: HIGH | Effort: Medium | Status: Done

Score new samples against trained models without re-training.

Shared prediction service (services/prediction.py): coefficient extraction, data_type transform, score computation, threshold classification
Upload X_valid.tsv + optional Y_valid.tsv via multipart form
Validation metrics: AUC, accuracy, sensitivity, specificity, confusion matrix
Per-sample prediction table with scores and classifications
Backend: POST /api/analysis/{id}/jobs/{jid}/validate endpoint
Frontend: ValidateModal in Best Model sub-tab with file upload and results display

28. Model Deployment API ✅

Priority: HIGH | Effort: Medium | Status: Done

Serve trained models as live prediction endpoints.

POST /api/predict/{job_id} — JSON body with {features: {name: [values]}, sample_names: [...]}
Returns scores, predicted_classes, threshold, matched/missing features
Reuses shared prediction.py service from F27
API key authentication (X-API-Key header) for programmatic access
Frontend: "Prediction API" section in Best Model sub-tab with endpoint URL, curl example, and copy button

29. Biomarker Discovery Report ✅

Priority: HIGH | Effort: Medium | Status: Done

Auto-generate publication-ready PDF summarizing analysis findings.

3-page PDF report using reportlab (services/pdf_report.py)
Page 1: Performance metrics (AUC, accuracy, sensitivity, specificity), jury summary
Page 2: Feature table with coefficients, taxonomy annotations, functional properties
Page 3: Configuration summary, generation tracking highlights
Backend: GET /api/export/{pid}/jobs/{jid}/pdf endpoint
Frontend: "PDF Biomarker Report" option in Export dropdown

30. Multi-Cohort Meta-Analysis ✅

Priority: HIGH | Effort: High | Status: Done

Compare models trained on different datasets for the same phenotype.

Backend: GET /api/meta-analysis/searchable-jobs — cross-project completed job search
Backend: POST /api/meta-analysis/compare — feature overlap, concordance, meta-AUC
Frontend: new "Meta-Analysis" top-level view with job picker (chip-based, 2-10 jobs)
Metrics comparison table with best-value highlighting
Feature overlap chart (horizontal bars by cohort count)
Concordance matrix: feature × job grid colored by coefficient sign (green/red/grey)
Meta-AUC card (weighted average across cohorts)
"Meta-Analysis" link in navbar

31. SHAP-Style Feature Explanations ✅

Priority: MEDIUM | Effort: Medium | Status: Done

Per-sample feature contribution breakdown for model interpretability.

useShapValues composable: computeShapMatrix() computes SHAP values (feature × coef × sample value)
Beeswarm plot: horizontal strip per feature, x=SHAP value, color=feature value (Viridis)
Force plot: waterfall for single sample showing cumulative feature contributions
Dependence plot: scatter of feature value vs SHAP value, colored by class
Feature importance ordering by mean |SHAP|
Entirely client-side using existing barcode-data API
"Feature Explanations" section in Best Model sub-tab with 3 switchable views

32. Project Comments & Activity Feed ✅

Priority: MEDIUM | Effort: Low | Status: Done

Threaded notes/discussion per project.

ProjectComment model: id, project_id, user_id, content (Text), created_at, updated_at
CRUD endpoints: create, list, update (author only), delete (author or project owner)
CommentsSidebar component: slide-out panel with comment list, add/edit/delete
User initials avatar, timestamps, Ctrl+Enter submit shortcut
"Notes" button in project dashboard header
Migration v17

33. Public Sharing Links ✅

Priority: MEDIUM | Effort: Low | Status: Done

Share results via read-only public links (no login required).

PublicShare model: id, project_id, token (64-char, unique indexed), created_by, expires_at, is_active
Authenticated endpoints: create, list, revoke links (project owner only)
Unauthenticated endpoints: GET /api/public/{token} for project info + jobs, GET /api/public/{token}/jobs/{jid}/results for full results
PublicShareModal: create links with expiry options (7/30/90 days or never), copy URL, revoke
PublicShareView: guest-accessible page with project summary, job cards, metrics grid, feature display
"Public Link" button in project dashboard header
Router guard allows both authenticated and unauthenticated access
Migration v18

34. Dashboard Overview ✅

Priority: MEDIUM | Effort: Low | Status: Done

Global view of all projects with summary statistics and recent activity.

Backend: GET /api/dashboard/ — aggregates counts (projects, datasets, running/completed/failed jobs, shared)
Active jobs section with status badges and links to project console
Recent completions mini-table (project name, AUC, k, date)
Activity feed from audit_logs with action icons, resource links, timeAgo formatting
Summary cards grid: Projects, Datasets, Running, Completed, Failed, Shared
"Dashboard" link in navbar (before Projects)

35. E2E Integration Tests ✅

Priority: MEDIUM | Effort: Medium | Status: Done

Automated end-to-end testing using Playwright.

playwright.config.mjs at repo root with dark theme, 1440×900 viewport
10 E2E test cases in tests/e2e/e2e.spec.mjs:
1. Landing page loads and shows brand
2. User registration flow
3. Login with credentials
4. Create project
5. Upload dataset files
6. Dashboard displays summary cards
7. Meta-analysis page accessible
8. Public share page loads for guests
9. Health API endpoint returns ok
10. Swagger docs accessible
Root package.json with test:e2e script and @playwright/test dependency
CI integration: e2e-test job in GitHub Actions after Docker build

36. CI/CD Pipeline & Docker Registry ✅

Priority: LOW | Effort: Low | Status: Done

Auto-deploy on tag push with container registry publishing.

.github/workflows/release.yml triggered on v* tags
Multi-repo checkout (predomicsapp, gpredomics, gpredomicspy)
Login to GHCR, build and push with docker/build-push-action
Tags: ghcr.io/{owner}/predomicsapp:latest + :{version}
GitHub Actions cache (type=gha) for faster rebuilds
OCI metadata labels in Dockerfile (title, description, source, license)

37. Internationalization (i18n) ✅

Priority: LOW | Effort: High | Status: Done

Multi-language support starting with French.

vue-i18n@9 integration with JSON locale files
frontend/src/i18n/ module with createI18n() setup
English (en.json) and French (fr.json) locale files (~100 strings each)
Namespaces: nav, home, login, dashboard, projects, results, meta, common
Language selector button (EN/FR toggle) in navbar next to theme toggle
localStorage.locale persistence across sessions
Navbar links translated via $t('nav.dashboard') etc.
Backend: Accept-Language header parsing in core/errors.py for translated error messages

Phase 5: Advanced Analytics & Population Mining

38. t-SNE & UMAP Ordination ✅

Priority: MEDIUM | Effort: Medium | Status: Done

Add t-SNE and UMAP as alternative dimensionality reduction methods alongside PCoA.

Unified /api/data-explore/{pid}/ordination endpoint with method param (pcoa, tsne, umap)
Backend: compute_tsne() and compute_umap() in data_analysis.py using precomputed distance matrices
Method-specific parameters: perplexity (t-SNE), n_neighbors/min_dist (UMAP)
Shared _load_and_prepare() helper with subsampling (max 1000 samples)
Dynamic axis labels: "PCo1 (X%)" for PCoA, "t-SNE 1" / "UMAP 1" for others
PERMANOVA and 95% confidence ellipses for all methods
Added to DataTab, ResultsTab (Population), and DataExploreTab
Dependencies: scikit-learn>=1.0, umap-learn>=0.5

39. CI-based FBM Selection (Blaise Method) ✅

Priority: MEDIUM | Effort: Low | Status: Done

Add the Blaise confidence interval method as alternative to standard Wald CI for FBM filtering.

Dropdown in Population and Co-Presence tabs to choose between Standard CI and Blaise CI
Blaise formula: threshold = r - (0.5/n + 1.96 * sqrt(r*(1-r)/n)) — adds continuity correction
Shared fbmMethod ref between Population and Co-Presence tabs
i18n: "Standard CI" / "Blaise CI" in EN/FR

40. Accuracy-Weighted Co-Presence Network ✅

Priority: MEDIUM | Effort: Medium | Status: Done

Enhance co-presence analysis with model fitness weighting (Shasha Cui internship, 2017).

Toggle checkbox in Co-Presence controls for accuracy-weighted mode
Per-node mean accuracy: meanAccuracy[feature] = sum(model.fit) / count
Per-edge accuracy-weighted counts: wObs = sum(model.fit for co-occurring pairs)
Weighted expected: wExp = (accSum_i * accSum_j) / totalAccSum
Weighted ratio: wRatio = wObs / wExp
Prevalence chart: secondary axis with mean accuracy diamonds
Network: node size by accuracy, edge width by weighted ratio, accuracy in hover
Stats table: "W. Obs" and "W. Ratio" columns when weighted mode is on
Hypergeometric test unchanged (stays on binary integer counts for statistical validity)

41. Model Basket ✅

Priority: MEDIUM | Effort: Medium | Status: Done

Bookmark individual models from any job's population into a persistent curated collection.

useModelBasket.js composable with localStorage persistence per project (max 50 models)
Star toggle (★/☆) in Population table to add/remove models
Basket sub-tab with count badge in Results tab nav bar
Bookmarked models table with expandable feature chips, remove button
Metrics comparison grouped bar chart (AUC, Accuracy, Sensitivity, Specificity per model)
Feature overlap horizontal stacked bar chart with positive/negative direction
Consensus features display (features common to all basket models)
Feature coefficient heatmap (features × models, blue-white-red colorscale)
Cross-job support: basket items store data snapshots, survive job deletion

42. Multiple Community Detection Algorithms ✅

Priority: MEDIUM | Effort: Low | Status: Done

Add dropdown to choose between community detection methods in the Ecosystem network.

Backend: _detect_communities(G, method, seed) dispatcher in coabundance.py
Supported algorithms: Louvain (default), Greedy modularity maximization, Label propagation
community_method parameter added to compute_coabundance_network() and cache key
Endpoint validation: rejects unknown methods with HTTP 400
Frontend: dropdown in EcosystemTab controls, auto-refetch on change
Graceful fallback to singleton communities on algorithm failure
i18n: "Community" / "Greedy modularity" / "Label propagation" in EN/FR

43. Module Niche Zoom ✅

Priority: MEDIUM | Effort: Medium | Status: Done

Select a network module and zoom into a higher-resolution sub-network for detailed niche analysis.

Zoom button per module in the sidebar (magnifying glass icon)
Extracts member species, re-calls /coabundance-network with features=members and halved thresholds
displayedNetwork computed property for seamless switching between full and zoomed views
Breadcrumb navigation bar showing module ID, member count, and "Back to full network" button
zoomOut() restores the full network; parameter changes auto-clear zoom state
Updated renderNetwork() to use displayedNetwork.value throughout
Stats bar, chart visibility, and taxonomy legend all react to zoom state
i18n: "Zoom into this module" / "Back to full network" in EN/FR

44. Aberrant Correlation Diagnostic ✅

Priority: MEDIUM | Effort: Medium | Status: Done

QC scatter plot comparing per-class Spearman correlations to detect prevalence filtering artifacts.

Backend: compute_aberrant_correlations() in data_analysis.py — computes full Spearman correlation matrices per class, extracts upper-triangle pairs
Prevalence filtering (default 30%), subsampling to 500 features if needed, max 5000 pairs
Returns {pairs: [{f1, f2, r0, r1}], n_features, n_pairs, feature_names}
New endpoint: GET /{project_id}/aberrant-correlations with min_prevalence_pct param
Frontend: new "Aberrant Correlation Diagnostic" sub-tab in DataExploreTab
Plotly scatter: x = Class 0 rho, y = Class 1 rho, continuous colorscale by deviation from diagonal
Dashed y=x reference line, equal aspect ratio axes [-1.05, 1.05]
i18n: aberrant correlation labels in EN/FR

45. Dual-Network Comparison ✅

Priority: HIGH | Effort: Medium | Status: Done

Side-by-side patient vs. control co-abundance ecosystems highlighting common and condition-specific interactions.

New endpoint: GET /{project_id}/dual-network — computes both class networks in one call
Edge comparison: identifies common edges (present in both), class-0-specific, and class-1-specific
Each edge annotated with shared boolean flag for coloring
Frontend: "Dual network" checkbox toggle in EcosystemTab controls
Side-by-side Plotly panels in a CSS grid (responsive: stacks on mobile)
Comparison stats header: common edge count + specific counts per class
Edge coloring: shared = gray (subtle), class-0-specific = teal, class-1-specific = orange
Auto-refetch on parameter changes when dual mode is active
i18n: "Dual network" / "Common edges" / "Controls" / "Patients" in EN/FR

46. Alluvial/Sankey Module Correspondence ✅

Priority: MEDIUM | Effort: Medium | Status: Done

Visualize how network modules reorganize between patient and control ecosystems via an alluvial/Sankey diagram.

Backend: module correspondence computation added to get_dual_network endpoint in data_explore.py
Species→module mapping per class, intersection of common species, flow counting between module pairs
sankey_links array of {source_module, target_module, value} in dual-network response
Frontend: Plotly Sankey trace in EcosystemTab.vue below dual-network panels
Node labels combine class label, module ID, and dominant phylum
Link colors: semi-transparent source module color (handles both rgb() and hex)
Rendered automatically when dual mode is active and sankey_links has data
i18n: "Module Correspondence" / "Correspondance des modules" in EN/FR

47. External Network Import ✅

Priority: MEDIUM | Effort: Medium | Status: Done

Upload and display external network JSON files (e.g. from SCAPIS) with FBM signature annotation.

Backend: 4 new endpoints in data_explore.py:
- POST /{project_id}/external-networks — upload JSON (max 10MB, validates nodes/edges arrays)
- GET /{project_id}/external-networks — list uploaded networks
- GET /{project_id}/external-networks/{network_id} — retrieve full network data
- DELETE /{project_id}/external-networks/{network_id} — delete (editor role)
Storage: data/projects/{project_id}/networks/{uuid}.json
Frontend: external network section in EcosystemTab with select dropdown, upload button, delete button
Rendering: uses fixed positions (x, y from JSON) if available, organic layout fallback
FBM annotation: species matching FBM population features highlighted with gold color, diamond shape, orange border
Edge styling: positive = solid gray, negative = dashed red, width scaled by |weight|
i18n: "External Network" / "Réseau externe" in EN/FR

48. Ecosystem-guided FBM Filtering ✅

Priority: MEDIUM | Effort: Medium | Status: Done

Filter the Family of Best Models by network module membership to keep only ecologically coherent models.

Backend: POST /{project_id}/fbm-module-filter endpoint in data_explore.py
Loads job results population, computes co-abundance network to get module assignments
Per-model metrics: module coverage (fraction in selected modules), module coherence (fraction in same module)
Filters models with ≥50% features in selected modules, sorted by coverage then fitness
Frontend: "FBM Module Filter" section in EcosystemTab with module checkboxes, apply/clear buttons
Results table with rank, fit, k, language, coverage bar, coherence, in-module count
CSV export of filtered models
Event integration: emits module-filter-applied to parent ResultsTab
Population table: blue highlight + "M" badge for filtered models
i18n: 12 keys in EN/FR (filter title, description, coverage, coherence, export, etc.)

49. Signature Zoo ✅

Priority: MEDIUM | Effort: High | Status: Done

Curated, searchable database of published biomarker signatures for cross-study comparison and reuse.

Backend: new signature_zoo.py router with 7 endpoints:
- GET /api/signature-zoo/ — list all signatures with optional disease/method/search filters
- GET /api/signature-zoo/compare — compare 2+ signatures (Jaccard overlap, performance, common features)
- GET /api/signature-zoo/{id} — get single signature
- POST /api/signature-zoo/ — create signature (auth required)
- PUT /api/signature-zoo/{id} — update (auth required)
- DELETE /api/signature-zoo/{id} — delete (admin only)
- POST /api/signature-zoo/import-from-job — create from completed job's best model
File-based JSON storage at data/signature_zoo.json with fcntl file locking
4 seed signatures: Cirrhosis (Qin 2014), CRC (Zeller 2014), Obesity (Le Chatelier 2013), T2D (Karlsson 2013)
Frontend: SignatureZooView.vue with card grid, filters (disease/method/text), detail modal, compare mode
Compare mode: Plotly performance bars, Jaccard overlap matrix, common features, feature presence chart
Route /signature-zoo in router.js, navbar link in App.vue
35 i18n keys in EN/FR

Phase 6: gpredomics v1.0.0 Integration & Regression

Updated: 2026-03-31

50. Regression Support ✅ gpredomics#15, gpredomics#79

Priority: HIGH | Effort: High | Status: Done

Data.y changed from Vec to Vec. 4 regression fitness functions (spearman, pearson, rmse, mutual_information). Regression-aware display, test metrics, CV fold splitting, voting skip.

51. 4 New Optimization Algorithms ✅ gpredomics#61, #62, #65, #54

Priority: HIGH | Effort: High | Status: Done

SA (Simulated Annealing), ILS (Iterated Local Search), LASSO/Elastic Net, ACO (Ant Colony Optimization). All integrated in web app.

52. MCMC Gibbs Variable Selection ✅ gpredomics#73, #70

Priority: HIGH | Effort: High | Status: Done

Joint feature+coefficient sampling replacing SBS. LASSO prescreen, parallel chains, golden section optimizer. MCMC from OOM/minutes to 5-11 seconds.

53. Code Audit & Fixes ✅ gpredomics#75

Priority: HIGH | Effort: Medium | Status: Done

12 critical + 14 medium issues fixed. HashMap→BTreeMap for determinism. ClassificationMetrics refactor. -280 lines dead code removed.

54. Metadata Upload & Variable Selection ✅ predomicsapp#1

Priority: HIGH | Effort: Medium | Status: Done

Upload metadata TSV, select numeric column as regression y, auto-switch to regression mode. Backend APIs for metadata column parsing and y extraction.

55. Feature Selection in UI ✅

Priority: MEDIUM | Effort: Low | Status: Done

Exposed prevalence, adj_pvalue, selection method in Parameters tab. Documented adaptive FDR relaxation.

56. Adaptive BH-FDR ✅

Priority: MEDIUM | Effort: Low | Status: Done

When strict FDR selects < 10 features, alpha relaxes progressively (0.05→0.1→0.2→0.5) with warnings. Fallback to top 10 by raw p-value.

57. Wetlab Protocol Dataset ✅ gpredomics#60

Priority: MEDIUM | Effort: Low | Status: Done

Paired study (459 subjects, 2 extraction protocols). 1,981 MSPs. Subject-level train/test split. Metadata with age, sex, BMI, Gram+/- counts, gene_count.

58. Samples/Data Separation ✅

Priority: MEDIUM | Effort: Low | Status: Done

Bundled demos in samples/ (baked in image). User workspace in data/ (persistent volume for K8s).

59. Documentation ✅

Priority: MEDIUM | Effort: Medium | Status: Done

17-page PDF documentation. Vignette tutorial. All 7 algorithm docs with references. Fully documented param.yaml.

Phase 7: Upcoming

60. Multi-user Workspace Management predomicsapp#4

Priority: HIGH | Effort: High | Status: Open

Job concurrency limits (semaphore), per-user disk quotas, job timeouts, dataset deduplication, admin dashboard.

61. Data Scanning Console Feedback predomicsapp#5

Priority: MEDIUM | Effort: Low | Status: Open

Show scanning progress during dataset load: feature count, class distribution, prevalence stats, warnings.

62. Optuna Hyperparameter Optimization gpredomics#77

Priority: MEDIUM | Effort: Medium | Status: Open

Bayesian hyperparameter tuning via Optuna in gpredomicspy. Search space for k_penalty, population_size, algorithm choice, etc.

63. Multiclass Classification (OVO/OVA) gpredomics#52

Priority: HIGH | Effort: High | Status: Open

One-vs-All and One-vs-One strategies for multi-class problems. Orchestrate K binary gpredomics runs.

64. Clinical Data Integration gpredomics#51

Priority: MEDIUM | Effort: Medium | Status: Open

Stacking, calibration, stratification of omics scores with clinical variables.

65. MCMC SBS Fix gpredomics#81

Priority: MEDIUM | Effort: Medium | Status: Open

SBS Bayesian evaluation produces inverted AUC after optimization changes.

66. Auto-discover Feature Weights gpredomics#78

Priority: LOW | Effort: Medium | Status: Open

Compare feature distributions between folds to auto-discover sampling weights.

67. More Heuristics gpredomics#63, #64, #66

Priority: LOW | Effort: Medium | Status: Open

PSO (Particle Swarm), EDA/UMDA, Bayesian Optimization.

68. Island Model GA gpredomics#33

Priority: LOW | Effort: High | Status: Open

Separate environments with periodic migration for diversity preservation.

FilesExpand file tree

ROADMAP.md

Latest commit

History

ROADMAP.md

File metadata and controls

PredomicsApp-Web — Roadmap

High Impact

1. Export & Reports ✅

2. Job Comparison View ✅

3. Real-time Job Progress ✅

4. Notebook Integration ✅

Medium Impact

5. Onboarding Tour ✅

6. Browser Notifications ✅

7. Batch Runs ✅

8. Dataset Tagging & Search ✅

Landing Page

9. Animated Workflow Diagram ✅

10. Use Case Examples ✅

Technical Debt

11. Test Coverage (55% → 86%) ✅

12. API Documentation ✅

13. Production Deployment Guide ✅

Enhancements (Phase 2)

14. Database Backup & Import ✅

15. WebSocket Live Job Logs ✅

16. Dataset File Preview ✅

17. Performance Optimizations ✅

18. Error Handling & UX Polish ✅

Phase 3: Security, Integrations & Data Management

19. Feature Importance Visualization ✅

20. Project Templates ✅

21. Audit Log ✅

22. Password Reset & Email ✅

23. API Key Management ✅

24. Webhook Notifications ✅

25. Dataset Versioning ✅

26. Rate Limiting ✅

Phase 4: Analytics, Collaboration & Infrastructure

27. External Validation ✅

28. Model Deployment API ✅

29. Biomarker Discovery Report ✅

30. Multi-Cohort Meta-Analysis ✅

31. SHAP-Style Feature Explanations ✅

32. Project Comments & Activity Feed ✅

33. Public Sharing Links ✅

34. Dashboard Overview ✅

35. E2E Integration Tests ✅

36. CI/CD Pipeline & Docker Registry ✅

37. Internationalization (i18n) ✅

Phase 5: Advanced Analytics & Population Mining

38. t-SNE & UMAP Ordination ✅

39. CI-based FBM Selection (Blaise Method) ✅

40. Accuracy-Weighted Co-Presence Network ✅

41. Model Basket ✅

42. Multiple Community Detection Algorithms ✅

43. Module Niche Zoom ✅

44. Aberrant Correlation Diagnostic ✅

45. Dual-Network Comparison ✅

46. Alluvial/Sankey Module Correspondence ✅

47. External Network Import ✅

48. Ecosystem-guided FBM Filtering ✅

49. Signature Zoo ✅

Phase 6: gpredomics v1.0.0 Integration & Regression

50. Regression Support ✅ gpredomics#15, gpredomics#79

51. 4 New Optimization Algorithms ✅ gpredomics#61, #62, #65, #54

52. MCMC Gibbs Variable Selection ✅ gpredomics#73, #70

53. Code Audit & Fixes ✅ gpredomics#75

54. Metadata Upload & Variable Selection ✅ predomicsapp#1

55. Feature Selection in UI ✅

56. Adaptive BH-FDR ✅

57. Wetlab Protocol Dataset ✅ gpredomics#60

58. Samples/Data Separation ✅

59. Documentation ✅

Phase 7: Upcoming

60. Multi-user Workspace Management predomicsapp#4

61. Data Scanning Console Feedback predomicsapp#5

62. Optuna Hyperparameter Optimization gpredomics#77

63. Multiclass Classification (OVO/OVA) gpredomics#52