Leyoda

AI-powered investor–startup matching platform for the European venture ecosystem.
Swipe-based discovery • Structured profiles • AI signal intelligence • Enterprise-grade infrastructure

Java 21 · Spring Boot 3.4 · Next.js 16 · React 19 · Python 3.12 · FastAPI · PostgreSQL 16 · Docker Compose


📋 Case Study — This repository is a technical case study. Source code is proprietary and not included. The documentation below showcases the architecture, engineering decisions, and technical depth of the platform.


Overview

Leyoda is a full-stack, production-grade platform that connects investors with early-stage startups across Europe. It combines a swipe-based discovery engine with structured startup profiles and real-time analytics to streamline the fundraising lifecycle.

The European early-stage investment landscape is fragmented and opaque — founders spend months cold-emailing investors with no signal on fit, while investors sift through thousands of unqualified decks. Leyoda replaces the noise with structured, card-based profiles where both sides evaluate fit through traction data, sectors, check sizes, and geography — then match with a single swipe.

What makes it technically interesting:

  • A proprietary AI intelligence pipeline (Signal Engine) that transforms university research papers into ranked, investment-grade startup concepts
  • Security-hardened two-phase authentication with OpenID Connect (LinkedIn) and intelligent redirection
  • Decoupled enterprise-grade blue-green deployments spanning multiple repositories with zero-downtime hotswapping and automated rollback
  • Industrial-grade verification framework featuring over 35,000 lines of test code, singleton testcontainers, and adversarial E2E validation
  • Three-tier input validation spanning frontend schemas, pre-submission guards, and backend annotations

Architecture

graph TB
    subgraph Internet
        Client["Browser / Mobile<br/>(Mechanical Luxury UI)"]
        LinkedIn["LinkedIn OIDC<br/>Provider"]
    end

    subgraph Reverse Proxy
        Nginx["Nginx<br/>TLS Termination<br/>Blue-Green Router"]
    end

    subgraph Application Layer
        FE["Frontend<br/>Next.js 16 · React 19<br/>Tailwind CSS 4"]
        BE["Backend<br/>Spring Boot 3.4 · Java 21<br/>19 REST Controllers"]
        SE["Signal Engine Service<br/>FastAPI · Python 3.12<br/>AI Pipeline"]
    end

    subgraph Data Layer
        PG["PostgreSQL 16<br/>+ PostGIS"]
        MIO["MinIO<br/>S3 Object Storage"]
        SQ["SQLite<br/>Pipeline State"]
    end

    subgraph Security Layer
        AV["ClamAV<br/>Antivirus Scanner"]
    end

    Client --> Nginx
    Client --> LinkedIn
    LinkedIn --> FE
    Nginx --> FE
    Nginx --> BE
    FE -- "36 BFF proxy routes" --> BE
    BE --> PG
    BE --> MIO
    BE --> AV
    BE -- "Synchronous Proxy" --> SE
    SE --> PG
    SE --> SQ

    style Nginx fill:#1A7F74,color:#fff
    style FE fill:#000000,color:#fff
    style BE fill:#6DB33F,color:#fff
    style SE fill:#009688,color:#fff
    style PG fill:#4169E1,color:#fff
    style MIO fill:#C72C48,color:#fff
    style AV fill:#394D54,color:#fff
    style SQ fill:#003B57,color:#fff

Service Dependency Chain

Services follow a strict healthcheck policy: nothing starts until its dependencies report healthy. Cross-service domain separation underpins Zero Trust machine identity for internal communication:

Frontend  →  Backend  →  Database (PostgreSQL)
                      →  MinIO (Object Storage)
                      →  ClamAV (Antivirus)
                      →  Signal Engine (AI Pipeline)
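
The ordering implied by the chain above can be sketched as a small dependency graph. This is a minimal illustration in Python, not the platform's Compose configuration; the lowercase service names and the `can_start` helper are assumptions made for the example.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services it waits on (taken from the chain above).
# The lowercase names are illustrative labels, not the real Compose service names.
DEPENDS_ON = {
    "frontend": {"backend"},
    "backend": {"postgres", "minio", "clamav", "signal-engine"},
    "postgres": set(),
    "minio": set(),
    "clamav": set(),
    "signal-engine": set(),
}


def startup_order(depends_on):
    """Return an order in which every service comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def can_start(service, healthy, depends_on=DEPENDS_ON):
    """A service may start only when all of its dependencies report healthy."""
    return depends_on[service] <= set(healthy)
```

With this graph, `startup_order(DEPENDS_ON)` always places the four backend dependencies before `backend`, and `backend` before `frontend`.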

Tech Stack

Backend

| Layer | Technology | Why |
| --- | --- | --- |
| Runtime | Java 21 | Modern LTS with virtual threads support |
| Framework | Spring Boot 3.4 | REST API, dependency injection, security, data access |
| ORM | Hibernate 6 + Spring Data JPA | Type-safe data access with spatial extensions |
| Database | PostgreSQL 16 + PostGIS | Relational storage with geo-spatial query support |
| Migrations | Flyway | 41 versioned, repeatable schema migrations |
| Auth | Spring Security + JWT (HS256) | Stateless auth via HttpOnly cookies; Hardened request matchers |
| OAuth | LinkedIn (OIDC) · X (Twitter OAuth 2.0) | Social sign-in with intelligent redirection and mandatory profile setup |
| Storage | MinIO (S3-compatible) | Owner-isolated binary asset management |
| Antivirus | ClamAV | File upload scanning before persistence |
| Rate Limiting | Bucket4j | In-memory token bucket algorithm |
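
For readers unfamiliar with the token bucket algorithm named in the rate-limiting row, here is a minimal sketch in Python. It is not Bucket4j's API (Bucket4j is a Java library); the class name, parameters, and the injectable `clock` are assumptions chosen to keep the example small and testable.

```python
import time


class TokenBucket:
    """In-memory token bucket: holds up to `capacity` tokens, refilled continuously."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)  # start full
        self._last = clock()

    def try_consume(self, tokens=1):
        """Take `tokens` from the bucket if available; return False when rate-limited."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False
```

A bucket with capacity 2 and a refill rate of 1 token per second allows a burst of two requests, then one request per second after that.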

Frontend

| Layer | Technology | Why |
| --- | --- | --- |
| Framework | Next.js 16 (App Router) | SSR, ISR, API routes, middleware |
| UI | React 19 | Server Components, concurrent features |
| Styling | Tailwind CSS 4 | Utility-first CSS with custom design tokens for "Mechanical Luxury" aesthetics |
| Components | shadcn/ui + Radix | Accessible, headless component primitives with zero-latency tactile feedback |
| Forms | React Hook Form + Zod | Performant forms with schema validation |
| Data Fetching | SWR | Stale-while-revalidate caching strategy |
| Linting | Biome | Unified lint and format (replaces ESLint + Prettier) |

Signal Engine (AI Pipeline)

| Layer | Technology | Why |
| --- | --- | --- |
| Runtime | Python 3.12 | AI/ML pipeline execution |
| API | FastAPI + Uvicorn | Async REST API for pipeline orchestration |
| Orchestration | Trajector | 6-stage structured forward-looking signal extraction |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) | 384-dim vectors for semantic clustering (CPU) |
| PDF Parsing | PyMuPDF · pypdf | Scientific paper text extraction |
| Pipeline State | SQLite (WAL mode) | Checkpoint/resume for long-running pipelines |
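
The checkpoint/resume idea in the last row can be sketched with Python's standard `sqlite3` module. This is an illustration under stated assumptions, not the Signal Engine's actual schema: the `checkpoints` table, the placeholder stage names, and the `handlers` mapping are all hypothetical, since the real pipeline stages are not disclosed.

```python
import sqlite3

# Placeholder names for the six stages; the real stage names are not public.
STAGES = [f"stage_{n}" for n in range(1, 7)]


def open_state(path):
    """Open the state database in WAL mode and make sure the checkpoint table exists."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "run_id TEXT NOT NULL, stage TEXT NOT NULL, PRIMARY KEY (run_id, stage))"
    )
    conn.commit()
    return conn


def run_pipeline(conn, run_id, handlers):
    """Run stages in order, skipping completed ones; return the stages executed now."""
    rows = conn.execute("SELECT stage FROM checkpoints WHERE run_id = ?", (run_id,))
    done = {row[0] for row in rows}
    executed = []
    for stage in STAGES:
        if stage in done:
            continue
        handlers[stage]()  # may raise; in that case no checkpoint is written
        conn.execute(
            "INSERT INTO checkpoints (run_id, stage) VALUES (?, ?)", (run_id, stage)
        )
        conn.commit()  # persist the checkpoint before moving on
        executed.append(stage)
    return executed
```

If a stage raises, calling `run_pipeline` again with the same `run_id` resumes at the failed stage instead of starting over.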

Infrastructure

| Layer | Technology | Why |
| --- | --- | --- |
| Orchestration | Docker Compose | Multi-service local and production environment |
| Reverse Proxy | Nginx | TLS termination, routing, blue-green upstream switching |
| Deployment | Blue-Green + Zero-Downtime | Atomic switchover with health-check gating across decoupled repos |
| Latency Mitigation | Warm Sleep | Unified patterns to mitigate container cold-start delays |

Services & Components

| Service | Responsibility |
| --- | --- |
| Backend (Spring Boot) | 19 REST controllers, 22 business services, JWT auth, rate limiting, file validation, unified proxy for Signal Engine |
| Frontend (Next.js) | 113 TSX components, 36 BFF proxy routes, SSR, App Router, design system |
| Signal Engine (FastAPI) | Extracted standalone service; runs the 6-stage AI pipeline, venture memos, calendar CRUD, and pipeline status endpoints |
| PostgreSQL + PostGIS | Relational data store with geo-spatial extensions, 41 Flyway migrations including autonomous table management |
| MinIO | S3-compatible object storage with owner-based access isolation |
| Nginx | TLS termination, reverse proxy, blue-green upstream routing |

Engineering Highlights

1. Industrial-Grade Verification Framework

The platform is backed by a suite of 209+ automated test files comprising over 35,000 lines of test code. Testing goes beyond standard unit coverage to include adversarial E2E verification (verify_flow.ps1) and contract testing for all components:

  • Singleton Testcontainers: The database and other external dependencies run as static singletons shared across tests, which sharply cuts context load times and supported a cadence of 500+ LOC/hr during active hardening.
  • Mandatory Schema Validation: Every payload crossing the frontend API, the backend proxy, and the Python Signal Engine is validated and bounds-checked, so malformed requests are rejected before they reach the execution context.
  • Contract & Gap Closure: A dedicated February 2026 test gap closure initiative hardened endpoints that involve complex multi-step state mutations (such as invited users joining companies).
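
The singleton container pattern from the first bullet can be illustrated in a few lines. The platform does this in Java with Testcontainers; the Python sketch below only shows the idea of starting an expensive dependency once and sharing it, and `FakeDatabaseContainer` and `shared_database` are hypothetical names.

```python
import functools


class FakeDatabaseContainer:
    """Stand-in for an expensive resource such as a containerised PostgreSQL."""

    starts = 0  # counts how many times any container was started

    def start(self):
        FakeDatabaseContainer.starts += 1
        return self


@functools.lru_cache(maxsize=None)
def shared_database():
    """Start the container on first use and hand the same instance to every later caller."""
    return FakeDatabaseContainer().start()
```

Every test that calls `shared_database()` receives the same started instance, so the startup cost is paid once per test run instead of once per test class.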

2. Security-Hardened Two-Phase Authentication

The platform integrates LinkedIn sign-in via OpenID Connect (OIDC) and bridges it to a mandatory internal Step 0 Profile Setup, all inside a zero-trust perimeter:

sequenceDiagram
    actor User as Investor/Founder
    participant BFF as Next.js BFF
    participant Auth as Spring Security Firewall
    participant OIDC as LinkedIn OIDC
    participant DB as PostgreSQL

    User->>BFF: Click "Continue with LinkedIn"
    BFF->>Auth: Request OAuth Redirect URI
    Auth-->>BFF: 302 Found (LinkedIn Auth URL)
    BFF-->>User: Redirect to LinkedIn
    User->>OIDC: Authenticate & Consent
    OIDC-->>Auth: Callback with Authorization Code
    Auth->>OIDC: Exchange Code for Access Token
    OIDC-->>Auth: ID Token & Profile Data
    
    rect rgb(20, 40, 20)
        Note over Auth,DB: Zero-Trust Security Perimeter
        Auth->>DB: Upsert User Profile
        Auth->>Auth: Sign HS256 Stateless JWT
    end
    
    Auth-->>BFF: Set-Cookie: JWT (HttpOnly, Secure, SameSite=Strict)
    
    alt Mandatory Profile Missing
        BFF-->>User: 302 Redirect to /onboarding (Step 0)
    else Profile Complete
        BFF-->>User: 302 Redirect to /dashboard
    end
  • Intelligent Redirection: The OAuth callback handler correctly routes new vs. returning users. If a user authenticates via LinkedIn but lacks mandatory profile details (like a profile picture or role), they are gated inside the Profile Setup screen.
  • Strict Request Matchers: The Spring Security firewall enforces strict filter ordering, rejecting unauthenticated traffic traversing Next.js BFF routes before processing user-specific controllers.
  • Zero-Latency UI: The "Mechanical Luxury" frontend design language ensures immediate tactile feedback. Authentication transitions happen optimistically to prevent loading stutters.
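
The token and redirect steps of the flow above can be sketched with the Python standard library. This is a simplified illustration, not the Spring Security implementation: the cookie name `JWT`, the profile field names `role` and `profile_picture`, and the helper functions are assumptions, and production code would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Build a compact HS256-signed JWT: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_jwt(token: str, secret: bytes, now=None) -> dict:
    """Check signature, algorithm, and expiry; return the claims or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and claims["exp"] <= current:
        raise ValueError("token expired")
    return claims


def session_cookie(token: str) -> str:
    """Set-Cookie value with the attributes shown in the diagram (cookie name assumed)."""
    return f"JWT={token}; HttpOnly; Secure; SameSite=Strict; Path=/"


def post_login_redirect(profile: dict) -> str:
    """Gate users with an incomplete profile into Step 0 (field names are assumed)."""
    required = ("role", "profile_picture")
    if any(not profile.get(field) for field in required):
        return "/onboarding"
    return "/dashboard"
```

Because the token is stateless, the backend can validate a request from the signature and `exp` claim alone, without a session lookup.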

3. Decoupled Enterprise-Grade Hotswap Deployments

The backend application and the Python-powered Signal Engine are decoupled into isolated CI/CD pipelines, yet both rely on enterprise-grade zero-downtime hotswapping:

  1. Independent Builds: The Signal Engine is built directly on the VPS by a dedicated GitHub Actions workflow so that its heavy PyTorch dependencies are handled efficiently.
  2. Container Hotswapping: The deploy script starts a new container via docker compose --profile signal-engine pull && docker compose --profile signal-engine up -d --no-deps.
  3. Rigorous Health Check: The pipeline polls the /health REST endpoint of the newly started instance for up to 20 cycles. If the instance drops the connection or faults due to data-layer permissions, the pipeline triggers an automated rollback.
  4. Traffic Transition: Once the new container is healthy, traffic switches to it through Docker's internal routing, which shields the Next.js frontend and the Spring backend from the transition.
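
The health-gated switchover in steps 2 to 4 can be sketched as follows. This is a hedged outline, not the real deploy script: the four callables, the `delay_seconds` default, and the choice to treat a dropped connection as a failed probe (instead of rolling back immediately) are assumptions. Only the limit of 20 cycles comes from the description above.

```python
import time


def hotswap(start_new, is_healthy, switch_traffic, rollback,
            max_attempts=20, delay_seconds=3.0, sleep=time.sleep):
    """Start a new instance, poll its health, then switch traffic or roll back.

    Returns True when traffic was switched and False when the rollback ran.
    """
    start_new()
    for attempt in range(1, max_attempts + 1):
        try:
            if is_healthy():
                switch_traffic()
                return True
        except OSError:
            # A refused or dropped connection counts as a failed probe.
            pass
        if attempt < max_attempts:
            sleep(delay_seconds)
    rollback()
    return False
```

In a real pipeline `is_healthy` would issue an HTTP request against `/health`; injecting it as a callable keeps the gating logic easy to test in isolation.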

4. Enterprise-Grade AI Signal Intelligence

The core AI Signal Intelligence pipeline operates as a proprietary black-box engine within the Leyoda ecosystem. It is seamlessly integrated as a standalone service that ingests complex unstructured data and outputs high-fidelity investment signals.

To protect the intellectual property and competitive advantage of the analysis methodologies, the internal architecture, extraction mechanisms, and specific pipeline stages are kept strictly confidential. The system communicates securely with the rest of the application via strict proxy structures while maintaining robust state management and fault tolerance.

5. BFF Proxy Architecture

The frontend proxies all backend requests through 36 Next.js API routes (/api/*), hiding the internal backend URL entirely:

Browser → Next.js BFF (/api/v1/*) → Spring Boot Backend (:8080/api/v1/*)
  • JWT tokens are stored in HttpOnly, Secure, SameSite cookies — invisible to client-side JavaScript.
  • Server-side cookie injection/extraction happens in the BFF layer.
  • The browser never learns the backend URL — complete API isolation, eliminating common XSS-based token theft vectors and simplifying CORS configuration.
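
The cookie extraction step in the BFF layer might look roughly like this. The sketch is in Python for consistency with the other examples, although the real BFF runs on Next.js. The internal address `http://backend:8080`, the cookie name `JWT`, and forwarding the token as a bearer header are assumptions; the backend may equally read the cookie directly.

```python
from http.cookies import SimpleCookie

BACKEND_BASE_URL = "http://backend:8080"  # assumed internal address, never sent to the browser
SESSION_COOKIE = "JWT"                    # assumed cookie name


def build_upstream_request(path, cookie_header):
    """Map a browser request under /api/v1/ onto the backend and inject the session token."""
    if not path.startswith("/api/v1/"):
        raise ValueError("not a proxied route")
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    headers = {}
    if SESSION_COOKIE in cookies:
        # The token leaves the HttpOnly cookie only on the server side.
        headers["Authorization"] = f"Bearer {cookies[SESSION_COOKIE].value}"
    return BACKEND_BASE_URL + path, headers
```

The browser only ever sees `/api/v1/...` on the frontend origin; the upstream URL and the extracted token exist solely inside the server-side handler.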

Security Design

| Threat | Mitigation |
| --- | --- |
| Token theft (XSS) | JWT stored in HttpOnly + Secure + SameSite cookies; BFF proxy hides tokens; SessionCreationPolicy.STATELESS |
| Machine Context Leaks | Zero Trust machine identity with cross-service domain separation |
| Malware uploads | ClamAV antivirus scan on every uploaded file before persistence to MinIO |
| Unauthorised access | Owner-based access control via strictly ordered Spring Security matchers (hasRole("ADMIN") vs authenticated()) |
| Cross-Site Scripting (XSS) | Strict Content-Security-Policy with explicit directives (default-src 'self', script-src 'self' 'unsafe-eval'), overriding browser defaults |
| Network Interception | HSTS (httpStrictTransportSecurity) with max age set to 31,536,000 seconds (1 year), applied across all subdomains |
| CSRF | SameSite cookie policy + CORS origin whitelisting (CorsConfigurationSource) |
| API enumeration | Backend URLs hidden behind Next.js BFF proxy; no direct browser→backend path |
| Password reset abuse | 32-byte SecureRandom tokens, 1-hour TTL, single active token per user |
| Invalid data injection | 3-tier validation: Zod (frontend) → pre-submission guard → JSR-303 (backend) |
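
As an illustration of the password reset row, the sketch below issues 32-byte random tokens with a 1-hour TTL and keeps a single active token per user. The backend uses Java's SecureRandom; this Python version uses the `secrets` module as the closest equivalent. The `ResetTokenStore` class, the in-memory dictionary, and storing only a SHA-256 digest of the token are assumptions that go beyond what the table states.

```python
import hashlib
import hmac
import secrets
import time

TOKEN_BYTES = 32      # from the table: 32-byte tokens
TTL_SECONDS = 3600    # from the table: 1-hour TTL


class ResetTokenStore:
    """Keeps at most one active reset token per user (in memory, for illustration)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._active = {}  # user_id -> (sha256 hex digest, expiry timestamp)

    def issue(self, user_id):
        """Create a new token; any earlier token for the same user stops working."""
        token = secrets.token_urlsafe(TOKEN_BYTES)
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        self._active[user_id] = (digest, self._clock() + TTL_SECONDS)
        return token

    def redeem(self, user_id, token):
        """Return True once for a valid, unexpired token, then invalidate it."""
        entry = self._active.get(user_id)
        if entry is None:
            return False
        digest, expires_at = entry
        if self._clock() >= expires_at:
            del self._active[user_id]
            return False
        presented = hashlib.sha256(token.encode("utf-8")).hexdigest()
        if not hmac.compare_digest(digest, presented):
            return False
        del self._active[user_id]
        return True
```

Issuing a second token overwrites the first, which is one simple way to enforce the single-active-token rule.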

Testing Strategy

| Category | Framework | Scope |
| --- | --- | --- |
| Unit Tests | JUnit 5 + Mockito | Service layer logic, JWT provider, rate limiting, file validation |
| Integration Tests | Spring Boot Test + Testcontainers | Full controller→service→DB flows against a real PostgreSQL instance run as a singleton container |
| Adversarial E2E | PowerShell | Custom verify_flow.ps1 script that stress-tests authentication edge cases |
| API Tests | pytest + FastAPI TestClient | Signal Engine endpoints, pipeline stages, health probes |
| Frontend Validation | Biome + TypeScript strict mode | Static analysis, lint, type-check, build validation |
| CI/CD | GitHub Actions | Backend tests (Testcontainers), frontend lint + type-check + build on push |

Project Scale

| Metric | Value |
| --- | --- |
| Application Services | 4 (Backend, Frontend, Signal Engine, Nginx) |
| Infrastructure Services | 3 (PostgreSQL + PostGIS, MinIO, ClamAV) |
| REST Controllers | 19 |
| Business Services | 22 |
| JPA Entities | 15 entities + 21 enums |
| Flyway Migrations | 41 |
| Frontend Components | 113 TSX files |
| BFF Proxy Routes | 36 |
| Signal Engine Modules | 62 Python files |
| Total Codebase Volume | 90,000+ lines of code across all subsystems |
| Backend (Java) | ~17,800 lines across 225 core component files |
| Frontend (TS/TSX) | ~53,200 lines across 269 source files |
| Signal Engine (Python) | ~16,800 lines across 62 execution modules |
| Database Properties | 90 SQL & Flyway configurations |
| Verification Fleet | 209+ automated test files |
| Test Code LOC | 35,000+ lines defining schema and flow boundaries |
| Docker Compose Profiles | Layered profiles mapped to local and VPS targets |
| CI/CD Workflows | Decoupled cross-repo deployment automation, one pipeline per service |

Author

Alexandru Cioc

  • GitHub: @WhitehatD
  • Location: Maastricht, Netherlands

Built with ❤️ for the European startup ecosystem

