Career Intelligence Platform. Not a job board. Not a chatbot.
Horizon builds a personalized, evidence-grounded career roadmap for every user, drawing on real job descriptions, real interview signals, and a self-improving knowledge graph that gets smarter with every request. It combines a living Neo4j career graph, parallel LLM synthesis over real web evidence, and a rigorous company-fit scoring engine into a single async platform.
Going live as a beta soon.
Career guidance is either too generic (LinkedIn Learning paths) or too expensive (career coaches). Neither is grounded in live market reality. Most AI career tools hallucinate paths, ignore the actual hiring bar, and give zero citations for their advice.
Horizon is built differently: every career path it generates is triangulated from real sources, every company fit score is computed against a live JD signal, and the system's knowledge compounds across users through a shared graph.
Resume Parsing — PDF ingested via PyMuPDF, converted to Markdown, then parsed by Gemini 2.5 Flash into a structured schema (education, skills, projects). Skills are immediately passed through a normalizer to canonicalize synonyms ("ReactJS" → "React", "Postgres" → "PostgreSQL").
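The canonicalization step can be sketched as a synonym map with order-preserving de-duplication. This is a minimal illustration, not the real normalizer; the mapping entries and function names are assumptions, with the actual table living in normalizer/normalizer.py.

```python
# Illustrative slice of a skill canonicalization map; the real
# normalizer's table is larger. Keys are lowercased variants.
CANONICAL = {
    "reactjs": "React",
    "react.js": "React",
    "postgres": "PostgreSQL",
    "postgresql": "PostgreSQL",
    "js": "JavaScript",
}

def normalize_skill(raw: str) -> str:
    """Canonicalize one raw skill string; unknown skills pass through trimmed."""
    key = raw.strip().lower()
    return CANONICAL.get(key, raw.strip())

def normalize_skills(skills: list[str]) -> list[str]:
    """Normalize and de-duplicate while preserving first-seen order."""
    seen: set[str] = set()
    out: list[str] = []
    for s in skills:
        canon = normalize_skill(s)
        if canon.lower() not in seen:
            seen.add(canon.lower())
            out.append(canon)
    return out
```

Running the parsed skill list through this immediately after extraction means every downstream consumer (graph edges, fit scoring) sees one canonical name per skill.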
MBTI Personality Assessment — a lightweight, database-backed questionnaire system. Questions are randomly sampled per MBTI dimension from MongoDB, presented to the user, and scored via normalized Likert scaling. The resulting personality type is stored on the user profile and used to weight path preferences.
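The scoring step can be sketched as follows. This is a hypothetical reconstruction: the answer shape, the polarity convention, and the recentring of 1-5 Likert values are assumptions layered on what the text describes (per-dimension sampling plus normalized Likert scoring).

```python
# Hypothetical sketch of normalized Likert scoring. Each answered
# question carries an MBTI dimension (e.g. "EI") and a polarity:
# +1 means agreement pushes toward the first letter, -1 toward the
# second. Likert values 1-5 are recentred to the range -2..+2.
DIMENSIONS = ["EI", "SN", "TF", "JP"]

def score_mbti(answers: list[dict]) -> str:
    """answers: [{"dimension": "EI", "polarity": 1, "value": 4}, ...]"""
    totals = {d: 0.0 for d in DIMENSIONS}
    counts = {d: 0 for d in DIMENSIONS}
    for a in answers:
        d = a["dimension"]
        totals[d] += a["polarity"] * (a["value"] - 3)  # recentre 1-5 to -2..+2
        counts[d] += 1
    letters = []
    for d in DIMENSIONS:
        norm = totals[d] / max(counts[d], 1)  # normalized per-dimension score
        letters.append(d[0] if norm >= 0 else d[1])
    return "".join(letters)
```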
User profiles are stored in MongoDB with full async access via Motor.
The flagship feature. Given a user's skill stack and preferences, Horizon builds a 5-path career roadmap with 4+ concrete stages each, every claim sourced from real web evidence.
Pipeline:
Skills → Graph Traversal → Tavily Evidence Fetch → Gemini Synthesis → Citation Resolution → Graph Learning
Step 1 — Graph-first archetype discovery
Before calling any LLM, Horizon queries the Neo4j career graph. It finds the top roles by weighted skill overlap (REQUIRES edge weights), then traverses TRANSITIONS_TO edges up to 15 hops to find the farthest reachable terminal role. This returns validated career trajectories from historical data — not LLM guesses.
If the graph has fewer than 5 trajectory matches (cold start), it falls back to Gemini for archetype generation.
Step 2 — Parallel evidence fetch
For each archetype, Tavily runs an advanced search constrained to high-signal domains:
reddit.com, news.ycombinator.com, teamblind.com, indiehackers.com,
linkedin.com, medium.com, github.com, netflixtechblog.com,
engineering.fb.com, openai.com/research
Up to 14 results per archetype are fetched in parallel (asyncio.gather). Each result is tagged with a SOURCE_REF_N identifier and injected into the synthesis prompt.
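The fan-out and tagging can be sketched like this. The search callable stands in for the real Tavily client (stubbed here so the tagging logic is visible), and the result shape is an assumption; SOURCE_REF_N numbering is per archetype in this sketch.

```python
import asyncio

# Sketch of the parallel evidence fetch. `search` stands in for the
# Tavily client call; each result is tagged SOURCE_REF_N so citations
# can later be resolved back to the originating URL.
async def fetch_evidence(archetypes, search, per_archetype=14):
    async def fetch_one(archetype):
        results = await search(archetype)
        return [
            {"ref": f"SOURCE_REF_{i}", "url": r["url"], "snippet": r["content"]}
            for i, r in enumerate(results[:per_archetype], start=1)
        ]
    # One task per archetype, all fetched concurrently.
    batches = await asyncio.gather(*(fetch_one(a) for a in archetypes))
    return dict(zip(archetypes, batches))
```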
Step 3 — Grounded synthesis
Gemini 2.5 Flash generates the full CareerTree JSON against strict rules:
- Every stage must be triangulated from ≥3 source references
- fit_score is a cold probability: it accounts for skill gaps and market reality, not encouragement
- eta_months is grounded in evidence patterns, not LLM priors
- top_opportunities must be real, named roles or programs
- observed_paths extracts actual career progressions seen in the evidence
The model returns structured output against a Pydantic schema with response_mime_type="application/json" — no parsing ambiguity.
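The consuming side of that contract can be sketched with a trimmed slice of the schema. The field names below are illustrative, not the real CareerTree definition; the point is that the model's JSON validates directly against Pydantic v2 with no regex post-processing.

```python
from pydantic import BaseModel

# Trimmed, illustrative slice of a CareerTree-style schema.
class Stage(BaseModel):
    title: str
    eta_months: int
    citations: list[str]  # SOURCE_REF_N tags, resolved to URLs later

class CareerPath(BaseModel):
    archetype: str
    fit_score: float
    stages: list[Stage]

class CareerTree(BaseModel):
    paths: list[CareerPath]

# A JSON payload shaped like a structured-output response validates
# in one call; malformed payloads raise instead of half-parsing.
raw_json = (
    '{"paths": [{"archetype": "Backend to Platform", "fit_score": 0.62,'
    ' "stages": [{"title": "Senior Backend Engineer", "eta_months": 18,'
    ' "citations": ["SOURCE_REF_2", "SOURCE_REF_5", "SOURCE_REF_9"]}]}]}'
)
tree = CareerTree.model_validate_json(raw_json)
```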
Step 4 — Citation resolution + graph learning
SOURCE_REF_N tags in stage citations are resolved back to real URLs. Then observed_paths (career sequences extracted from evidence, e.g. ["SWE Intern", "SWE II", "Senior SWE", "Staff SWE"]) are written back to Neo4j via evolve_paths. Every synthesis run makes future traversals smarter.
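The resolution step can be sketched as a tag-to-URL lookup. The ref_map shape and function name are assumptions; the map itself would be built during the evidence fetch.

```python
import re

# Sketch of citation resolution: SOURCE_REF_N tags emitted by the
# model are swapped for the URLs of the evidence they came from.
REF_PATTERN = re.compile(r"SOURCE_REF_(\d+)")

def resolve_citations(citations: list[str], ref_map: dict[str, str]) -> list[str]:
    """Replace each SOURCE_REF_N tag with its real URL; drop unknown tags."""
    resolved = []
    for tag in citations:
        if REF_PATTERN.fullmatch(tag) and tag in ref_map:
            resolved.append(ref_map[tag])
    return resolved
```

Dropping unknown tags rather than passing them through is the conservative choice: a citation that cannot be traced to a fetched source never reaches the user.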
Trees are cached in Redis for 24 hours (horizon:tree:v7:{user_id}).
Given a target role and a list of companies, Horizon generates a structured fit analysis card per company — in parallel.
JD Fetching — Gemini + Google Search grounding fetches live job descriptions from Greenhouse, Lever, and company career pages. Returns a structured {skills, resp} object. JDs are cached in Redis for 30 minutes keyed on (role, company, location).
Scoring Rubric
A (90-100): >80% stack match + production proof in target ecosystem
B (75-89): >50% match, bridgeable via sibling tech (React→Vue, Java→Kotlin)
C (60-74): <50% match, paradigm shift required, 3+ month ramp
D (<60): Core engineering pillars missing
Modifiers:
FAANG/unicorn experience → +5 pts
Level mismatch → hard cap 20
Ecosystem lock-in → hard cap 30
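The rubric and modifiers can be expressed directly in code. The grade bands, the +5 bonus, and both hard caps follow the tables above; the function names and flag arguments are illustrative, and the base score would come from upstream stack-match analysis.

```python
# Sketch of the scoring rubric. Bands and caps mirror the tables
# above; how the base score is computed is out of scope here.
def letter_grade(score: int) -> str:
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    return "D"

def apply_modifiers(base: int, *, faang_experience=False,
                    level_mismatch=False, ecosystem_lock_in=False) -> int:
    score = base + (5 if faang_experience else 0)  # FAANG/unicorn bonus
    if ecosystem_lock_in:
        score = min(score, 30)   # hard cap 30
    if level_mismatch:
        score = min(score, 20)   # hard cap 20
    return max(0, min(100, score))
```

Note that the caps are applied after the bonus, so a level mismatch dominates everything else: even a strong FAANG profile lands at D when it applies.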
Each AdvisoryCard includes: fit score, hiring bar difficulty, top 10 skill gaps strictly absent from the user's stack, a ≤10-word brutal verdict, 3-4 verb-first actionable steps with named technologies, and the single highest-leverage advisory insight.
Cards are generated in parallel via asyncio.gather. Every fresh JD fetch (cache miss) triggers a graph.evolve(role, skills) call — strengthening the role→skill graph.
The Neo4j graph is not a static knowledge base. It evolves continuously from two signal streams:
Signal 1 — JD fetches (discover pipeline)
On every cache miss, extracted JD skills are written as weighted REQUIRES edges from a Role node to Skill nodes. Edge weights increment on repeated observations — roles accumulate stronger skill associations over time.
MERGE (r:Role {name: toLower($role)})
WITH r
UNWIND $skills AS raw
MERGE (s:Skill {name: toLower(raw)})
MERGE (r)-[e:REQUIRES]->(s)
ON CREATE SET e.weight = 1.0, e.count = 1
ON MATCH SET e.count = e.count + 1, e.weight = e.weight + 0.1

Signal 2 — Career tree synthesis
After every tree generation, observed career progressions extracted from evidence are ingested as TRANSITIONS_TO edges between consecutive role pairs. The more synthesis runs, the richer the transition graph becomes.
UNWIND range(0, size($path) - 2) AS i
MERGE (r1:Role {name: toLower($path[i])})
MERGE (r2:Role {name: toLower($path[i + 1])})
MERGE (r1)-[t:TRANSITIONS_TO]->(r2)
ON CREATE SET t.count = 1
ON MATCH SET t.count = t.count + 1

Trajectory traversal (find_trajectories) uses weighted skill overlap to identify a starting role, then walks TRANSITIONS_TO edges up to 15 hops to find the terminal node, returning the full path as prior context for synthesis.
This is a compound flywheel: more users → more JD fetches + synthesis runs → denser graph → better trajectory priors → higher-quality career trees.
┌─────────────────────────────────┐
│ FastAPI Backend │
│ (fully async, uvicorn) │
└──────────┬──────────────┬───────┘
│ │
┌────────────────▼──┐ ┌────▼────────────────┐
│ Career Tree │ │ Advisory Cards │
│ tree.py │ │ discover.py │
└──────┬────────────┘ └──────────┬──────────┘
│ │
┌───────────▼──────────────────────────────▼───────────┐
│ neo_graph.py │
│ Neo4j | REQUIRES edges | TRANSITIONS_TO edges │
│ Graph-first retrieval | Continuous evolution │
└──────────────────────────────────────────────────────┘
│ │
┌───────────▼──────┐ ┌─────────────▼──────────┐
│ Tavily Search │ │ Gemini 2.5 Flash │
│ (multi-key) │ │ (structured output) │
└──────────────────┘ └────────────────────────┘
│
┌───────────▼──────────────────────────────────────────┐
│ Redis (aioredis) │
│ Tree cache 24h | Intel cache 7min | JD 30min │
└──────────────────────────────────────────────────────┘
│
┌───────────▼──────────────────────────────────────────┐
│ MongoDB (Motor async) │
│ User profiles | MBTI questions │
└──────────────────────────────────────────────────────┘
| Layer | Technology |
|---|---|
| API | FastAPI, Uvicorn |
| AI | Gemini 2.5 Flash, Gemini 2.5 Flash Lite |
| Web Search | Tavily (advanced, multi-key rotation) |
| Graph DB | Neo4j (async driver) |
| Primary DB | MongoDB (Motor async + PyMongo) |
| Cache | Redis (aioredis) |
| Resume Parsing | PyMuPDF, pymupdf4llm |
| Auth | JWT (HS256) + bcrypt |
| Validation | Pydantic v2 |
Graph before LLM. Archetype discovery queries Neo4j first. The LLM is a fallback, not the default. As the graph matures, cold-start LLM calls become progressively rarer and the system's prior knowledge grows denser.
Evidence over inference. Career paths are built from real Reddit threads, Blind posts, engineering blogs, and LinkedIn stories. Every citation in a generated tree resolves to an actual source URL — zero hallucinated advice.
Structured output everywhere. Every Gemini call uses response_mime_type="application/json" bound to a Pydantic schema, so responses either validate or fail loudly: no regex fallbacks, no output ambiguity at any layer.
Cache tiered by data volatility. Tree cache runs 24h (user profiles rarely change meaningfully), company intel at 7 minutes (market-sensitive), JDs at 30 minutes (semi-stable). Each layer is independently invalidated.
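The tiering can be sketched as a TTL table plus a get-or-compute helper. A real deployment would use Redis SETEX via aioredis; an in-memory dict stands in here so the tiering logic itself is visible, and the helper name is an assumption.

```python
import json
import time

# TTLs per tier, mirroring the volatility tiers described above.
TTL_SECONDS = {
    "tree": 24 * 3600,   # user profiles rarely change meaningfully
    "intel": 7 * 60,     # market-sensitive
    "jd": 30 * 60,       # semi-stable
}

_store: dict[str, tuple[float, str]] = {}  # stand-in for Redis

def get_or_set(tier: str, key: str, compute) -> dict:
    """Return the cached value for (tier, key), computing and storing on miss."""
    full_key = f"horizon:{tier}:{key}"
    hit = _store.get(full_key)
    if hit and hit[0] > time.monotonic():
        return json.loads(hit[1])
    value = compute()
    _store[full_key] = (time.monotonic() + TTL_SECONDS[tier], json.dumps(value))
    return value
```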
Cost observability from day one. Every Gemini call logs input/output token counts and cost in INR with per-operation attribution (fetch_jd, build_card, etc.). Built to run a credit-based product without financial blind spots.
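A log entry of that shape can be sketched as follows. The per-1K-token rupee rates below are placeholders, not real Gemini pricing; the point is the attribution shape: operation name, token counts, and per-call cost in INR.

```python
# Sketch of per-call cost attribution. RATE_INR_PER_1K values are
# assumed placeholders, not actual Gemini pricing.
RATE_INR_PER_1K = {"input": 0.01, "output": 0.04}

def log_cost(operation: str, input_tokens: int, output_tokens: int) -> dict:
    """Build one cost-log entry attributing spend to a named operation."""
    cost = ((input_tokens / 1000) * RATE_INR_PER_1K["input"]
            + (output_tokens / 1000) * RATE_INR_PER_1K["output"])
    return {
        "op": operation,           # e.g. "fetch_jd", "build_card"
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_inr": round(cost, 4),
    }
```

Summing cost_inr grouped by op gives the per-feature spend needed to price a credit-based product.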
Fully async, no blocking. FastAPI + Motor + aioredis + asyncio.gather throughout. All blocking I/O (Tavily, sync PyMongo) is isolated in asyncio.to_thread. The event loop never blocks.
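The isolation pattern can be sketched in a few lines. blocking_search is a stand-in for any synchronous client (Tavily, sync PyMongo); wrapping it in asyncio.to_thread moves it onto the default thread pool so the event loop keeps serving other requests.

```python
import asyncio
import time

def blocking_search(query: str) -> str:
    # Stand-in for a synchronous client call that would otherwise
    # stall the event loop.
    time.sleep(0.05)
    return f"results:{query}"

async def main() -> list[str]:
    # Both blocking calls run concurrently on worker threads while
    # the event loop stays free.
    return list(await asyncio.gather(
        asyncio.to_thread(blocking_search, "a"),
        asyncio.to_thread(blocking_search, "b"),
    ))
```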
horizon/
├── main.py # FastAPI app, routing, market intel
├── tree.py # Career tree generation pipeline
├── discover.py # Company advisory card engine
├── neo_graph.py # Neo4j graph — evolution + traversal
├── ops.py # JWT auth, MongoDB client, cost logging
└── onboarding/
├── models.py # Pydantic schemas (User, Profile, etc.)
├── user.py # MongoDB user CRUD + skill normalization
├── parse_resume.py # PDF → structured data via Gemini
├── mbti_questionnaire.py # MBTI sampling + scoring engine
└── normalizer/
└── normalizer.py # Skill canonicalization
| Method | Endpoint | Description |
|---|---|---|
| POST | /auth/register | Register + issue JWT |
| POST | /auth/login | Login + issue JWT |
| POST | /users/me/resume | Upload PDF, parse to structured profile |
| GET | /personality/questions | Fetch MBTI questionnaire |
| POST | /users/me/personality | Submit answers, compute MBTI type |
| POST | /discover/search | Generate company advisory cards |
| GET | /career/tree | Generate full 5-path career roadmap |
Active development. Live beta coming soon.
Horizon is infrastructure for career intelligence. The graph learns. The advice gets sharper. The market signal is always live.