🌐 Horizon

Career intelligence platform. Not a job board. Not a chatbot.

Horizon builds a personalized career roadmap for every user, grounded in real job descriptions, real interview signals, and a self-improving knowledge graph that gets smarter with every request. It combines a living Neo4j career graph, parallel LLM synthesis over real web evidence, and a rigorous company-fit scoring engine into a single async platform.

Going live as a beta soon.


❌ The Core Problem

Career guidance is either too generic (LinkedIn Learning paths) or too expensive (career coaches). Neither is grounded in live market reality. Most AI career tools hallucinate paths, ignore the actual hiring bar, and give zero citations for their advice.

Horizon is built differently: every career path it generates is triangulated from real sources, every company fit score is computed against a live JD signal, and the system's knowledge compounds across users through a shared graph.


⚙️ How It Works

📥 Onboarding Pipeline

Resume Parsing — PDF ingested via PyMuPDF, converted to Markdown, then parsed by Gemini 2.5 Flash into a structured schema (education, skills, projects). Skills are immediately passed through a normalizer to canonicalize synonyms ("ReactJS" → "React", "Postgres" → "PostgreSQL").
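As a rough illustration, the canonicalization step can be sketched as a lookup table. The mapping and function name here are assumptions for illustration, not the actual normalizer.py implementation:

```python
# Illustrative sketch of skill canonicalization. The table and function
# name are assumptions, not the real normalizer.py code.
CANONICAL = {
    "reactjs": "React",
    "react.js": "React",
    "postgres": "PostgreSQL",
    "postgresql": "PostgreSQL",
    "js": "JavaScript",
}

def normalize_skill(raw: str) -> str:
    """Map a raw resume skill string to its canonical name."""
    key = raw.strip().lower()
    # Unknown skills pass through unchanged rather than being dropped.
    return CANONICAL.get(key, raw.strip())
```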

MBTI Personality Assessment — a lightweight, database-backed questionnaire system. Questions are randomly sampled per MBTI dimension from MongoDB, presented to the user, and scored via normalized Likert scaling. The resulting personality type is stored on the user profile and used to weight path preferences.

User profiles are stored in MongoDB with full async access via Motor.


🌳 Career Tree Generation (tree.py)

The flagship feature. Given a user's skill stack and preferences, Horizon builds a 5-path career roadmap with 4+ concrete stages each, every claim sourced from real web evidence.

Pipeline:

Skills → Graph Traversal → Tavily Evidence Fetch → Gemini Synthesis → Citation Resolution → Graph Learning

Step 1 — Graph-first archetype discovery

Before calling any LLM, Horizon queries the Neo4j career graph. It finds the top roles by weighted skill overlap (REQUIRES edge weights), then traverses TRANSITIONS_TO edges up to 15 hops to find the farthest reachable terminal role. This returns validated career trajectories from historical data — not LLM guesses.

If the graph has fewer than 5 trajectory matches (cold start), it falls back to Gemini for archetype generation.

Step 2 — Parallel evidence fetch

For each archetype, Tavily runs an advanced search constrained to high-signal domains:

reddit.com, news.ycombinator.com, teamblind.com, indiehackers.com,
linkedin.com, medium.com, github.com, netflixtechblog.com,
engineering.fb.com, openai.com/research

Up to 14 results per archetype are fetched in parallel (asyncio.gather). Each result is tagged with a SOURCE_REF_N identifier and injected into the synthesis prompt.
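The fan-out can be sketched with asyncio.gather. Here fetch_evidence is a stand-in for the real Tavily call, and the SOURCE_REF numbering is illustrative:

```python
import asyncio

# Sketch of the parallel evidence fetch. fetch_evidence is a placeholder
# for Tavily's advanced search over the high-signal domains listed above.
async def fetch_evidence(archetype: str) -> list[dict]:
    await asyncio.sleep(0)  # simulated network I/O
    # Each result is tagged with a SOURCE_REF_N identifier for the prompt.
    return [{"archetype": archetype, "ref": f"SOURCE_REF_{i}"} for i in range(1, 4)]

async def fetch_all(archetypes: list[str]) -> list[list[dict]]:
    # One coroutine per archetype, all run concurrently.
    return await asyncio.gather(*(fetch_evidence(a) for a in archetypes))

results = asyncio.run(fetch_all(["Backend Engineer", "ML Engineer"]))
```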

Step 3 — Grounded synthesis

Gemini 2.5 Flash generates the full CareerTree JSON against strict rules:

  • Every stage must be triangulated from ≥3 source references
  • fit_score is a cold probability — accounts for skill gaps and market reality, not encouragement
  • eta_months is grounded in evidence patterns, not LLM priors
  • top_opportunities must be real, named roles or programs
  • observed_paths extracts actual career progressions seen in the evidence

The model returns structured output against a Pydantic schema with response_mime_type="application/json" — no parsing ambiguity.
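The schema-bound contract can be sketched with Pydantic v2. The field names below are an assumed subset of the real CareerTree schema, for illustration only:

```python
from pydantic import BaseModel

# Illustrative subset of a CareerTree-style schema (field names assumed).
class Stage(BaseModel):
    title: str
    eta_months: int
    citations: list[str]

class CareerPath(BaseModel):
    archetype: str
    fit_score: float
    stages: list[Stage]

class CareerTree(BaseModel):
    paths: list[CareerPath]

# With response_mime_type="application/json", the raw model output can be
# validated directly; a schema violation raises instead of passing silently.
raw = ('{"paths": [{"archetype": "Backend Engineer", "fit_score": 0.62, '
       '"stages": [{"title": "SWE II", "eta_months": 18, '
       '"citations": ["SOURCE_REF_1"]}]}]}')
tree = CareerTree.model_validate_json(raw)
```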

Step 4 — Citation resolution + graph learning

SOURCE_REF_N tags in stage citations are resolved back to real URLs. Then observed_paths (career sequences extracted from evidence, e.g. ["SWE Intern", "SWE II", "Senior SWE", "Staff SWE"]) are written back to Neo4j via evolve_paths. Every synthesis run makes future traversals smarter.
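The resolution step amounts to a token-for-URL substitution, which can be sketched as follows (function name and data are illustrative):

```python
import re

# Sketch of resolving SOURCE_REF_N tags back to the URLs they were
# assigned during the evidence fetch. Names here are assumptions.
def resolve_citations(text: str, sources: dict[int, str]) -> str:
    """Replace each SOURCE_REF_N token with its source URL."""
    return re.sub(
        r"SOURCE_REF_(\d+)",
        lambda m: sources.get(int(m.group(1)), m.group(0)),  # unknown refs kept as-is
        text,
    )

urls = {1: "https://news.ycombinator.com/item?id=1",
        2: "https://reddit.com/r/cscareerquestions/example"}
resolved = resolve_citations("Seen in SOURCE_REF_1 and SOURCE_REF_2", urls)
```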

Trees are cached in Redis for 24 hours (horizon:tree:v7:{user_id}).


🧾 Company Advisory Cards (discover.py)

Given a target role and a list of companies, Horizon generates a structured fit analysis card per company — in parallel.

JD Fetching — Gemini + Google Search grounding fetches live job descriptions from Greenhouse, Lever, and company career pages. Returns a structured {skills, resp} object. JDs are cached in Redis for 30 minutes keyed on (role, company, location).
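The keying scheme can be sketched like this. The tree cache key format (horizon:tree:v7:{user_id}) is documented above; the JD key below is an assumed analogue, not the actual format:

```python
# Sketch of a (role, company, location) cache key with a 30-minute TTL.
# The "horizon:jd:" prefix and slug format are assumptions.
JD_TTL_SECONDS = 30 * 60

def jd_cache_key(role: str, company: str, location: str) -> str:
    parts = (role, company, location)
    return "horizon:jd:" + ":".join(
        p.strip().lower().replace(" ", "-") for p in parts
    )

key = jd_cache_key("Backend Engineer", "Stripe", "Remote")
# With aioredis this would be used roughly as:
#   await redis.set(key, jd_json, ex=JD_TTL_SECONDS)
```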

Scoring Rubric

A (90-100):  >80% stack match + production proof in target ecosystem
B (75-89):   >50% match, bridgeable via sibling tech (React→Vue, Java→Kotlin)
C (60-74):   <50% match, paradigm shift required, 3+ month ramp
D (<60):     Core engineering pillars missing

Modifiers:
  FAANG/unicorn experience  →  +5 pts
  Level mismatch            →  hard cap 20
  Ecosystem lock-in         →  hard cap 30
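The rubric above can be sketched as a pure function. Band thresholds and modifiers come from the table; the band floors and parameter names are simplifying assumptions, not the real scoring engine:

```python
def advisory_score(match_pct: float, production_proof: bool = False,
                   bridgeable: bool = False, pillars_present: bool = True,
                   faang: bool = False, level_mismatch: bool = False,
                   ecosystem_lockin: bool = False) -> int:
    """Illustrative scoring per the rubric table; band floors are assumed."""
    if not pillars_present:
        score = 50                 # D band: core engineering pillars missing
    elif match_pct > 80 and production_proof:
        score = 90                 # A band
    elif match_pct > 50 and bridgeable:
        score = 75                 # B band
    else:
        score = 60                 # C band: paradigm shift, 3+ month ramp
    # Modifiers from the rubric.
    if faang:
        score += 5
    if level_mismatch:
        score = min(score, 20)     # hard cap
    if ecosystem_lockin:
        score = min(score, 30)     # hard cap
    return min(score, 100)
```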

Each AdvisoryCard includes: fit score, hiring bar difficulty, top 10 skill gaps strictly absent from the user's stack, a ≤10-word brutal verdict, 3-4 verb-first actionable steps with named technologies, and the single highest-leverage advisory insight.

Cards are generated in parallel via asyncio.gather. Every fresh JD fetch (cache miss) triggers a graph.evolve(role, skills) call — strengthening the role→skill graph.


🧠 The Self-Improving Career Graph (neo_graph.py)

The Neo4j graph is not a static knowledge base. It evolves continuously from two signal streams:

Signal 1 — JD fetches (discover pipeline)

On every cache miss, extracted JD skills are written as weighted REQUIRES edges from a Role node to Skill nodes. Edge weights increment on repeated observations — roles accumulate stronger skill associations over time.

MERGE (r:Role {name: toLower($role)})
MERGE (s:Skill {name: toLower(raw)})
MERGE (r)-[e:REQUIRES]->(s)
  ON CREATE SET e.weight = 1.0, e.count = 1
  ON MATCH  SET e.count = e.count + 1, e.weight = e.weight + 0.1

Signal 2 — Career tree synthesis

After every tree generation, observed career progressions extracted from evidence are ingested as TRANSITIONS_TO edges between consecutive role pairs. The more synthesis runs, the richer the transition graph becomes.

MERGE (r1:Role {name: toLower(path[i])})
MERGE (r2:Role {name: toLower(path[i+1])})
MERGE (r1)-[t:TRANSITIONS_TO]->(r2)
  ON MATCH SET t.count = t.count + 1

Trajectory traversal (find_trajectories) uses weighted skill overlap to identify a starting role, then walks TRANSITIONS_TO edges up to 15 hops to find the terminal node — returning the full path as prior context for synthesis.
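An in-memory sketch of that traversal: pick the start role by weighted skill overlap, then greedily walk the strongest transition edge up to 15 hops. The dict-based graph and function name are illustrative stand-ins for the Neo4j queries:

```python
# Toy data standing in for the Neo4j graph (names and weights assumed).
REQUIRES = {  # role -> {skill: REQUIRES edge weight}
    "backend engineer": {"python": 2.0, "postgresql": 1.5},
    "data engineer": {"python": 1.0, "spark": 2.0},
}
TRANSITIONS = {  # role -> most-observed next role (TRANSITIONS_TO)
    "backend engineer": "senior backend engineer",
    "senior backend engineer": "staff engineer",
}

def find_trajectory(user_skills: set[str], max_hops: int = 15) -> list[str]:
    # Start role = highest summed edge weight over the user's skills.
    start = max(
        REQUIRES,
        key=lambda r: sum(w for s, w in REQUIRES[r].items() if s in user_skills),
    )
    path, seen = [start], {start}
    for _ in range(max_hops):
        nxt = TRANSITIONS.get(path[-1])
        if nxt is None or nxt in seen:  # terminal node reached, or cycle guard
            break
        path.append(nxt)
        seen.add(nxt)
    return path

traj = find_trajectory({"python", "postgresql"})
```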

This is a compound flywheel: more users → more JD fetches + synthesis runs → denser graph → better trajectory priors → higher-quality career trees.


Architecture

                    ┌─────────────────────────────────┐
                    │           FastAPI Backend       │
                    │         (fully async, uvicorn)  │
                    └──────────┬──────────────┬───────┘
                               │              │
              ┌────────────────▼──┐      ┌────▼────────────────┐
              │   Career Tree     │      │  Advisory Cards     │
              │   tree.py         │      │  discover.py        │
              └──────┬────────────┘      └──────────┬──────────┘
                     │                              │
         ┌───────────▼──────────────────────────────▼───────────┐
         │                  neo_graph.py                        │
         │   Neo4j  |  REQUIRES edges |  TRANSITIONS_TO edges   │
         │   Graph-first retrieval |  Continuous evolution      │
         └──────────────────────────────────────────────────────┘
                     │                              │
         ┌───────────▼──────┐         ┌─────────────▼──────────┐
         │  Tavily Search   │         │   Gemini 2.5 Flash     │
         │  (multi-key)     │         │   (structured output)  │
         └──────────────────┘         └────────────────────────┘
                     │
         ┌───────────▼──────────────────────────────────────────┐
         │                Redis (aioredis)                      │
         │   Tree cache 24h  |  Intel cache 7min  |  JD 30min   │
         └──────────────────────────────────────────────────────┘
                     │
         ┌───────────▼──────────────────────────────────────────┐
         │              MongoDB (Motor async)                   │
         │         User profiles  |  MBTI questions             │
         └──────────────────────────────────────────────────────┘

Tech Stack

| Layer | Technology |
| --- | --- |
| API | FastAPI, Uvicorn |
| AI | Gemini 2.5 Flash, Gemini 2.5 Flash Lite |
| Web Search | Tavily (advanced, multi-key rotation) |
| Graph DB | Neo4j (async driver) |
| Primary DB | MongoDB (Motor async + PyMongo) |
| Cache | Redis (aioredis) |
| Resume Parsing | PyMuPDF, pymupdf4llm |
| Auth | JWT (HS256) + bcrypt |
| Validation | Pydantic v2 |

Key Design Decisions

Graph before LLM. Archetype discovery queries Neo4j first. The LLM is a fallback, not the default. As the graph matures, cold-start LLM calls become progressively rarer and the system's prior knowledge grows denser.

Evidence over inference. Career paths are built from real Reddit threads, Blind posts, engineering blogs, and LinkedIn stories. Every citation in a generated tree resolves to an actual source URL — zero hallucinated advice.

Structured output everywhere. Every Gemini call uses response_mime_type="application/json" bound to a Pydantic schema. The model cannot return malformed output — no regex fallbacks, no output ambiguity at any layer.

Cache tiered by data volatility. Tree cache runs 24h (user profiles rarely change meaningfully), company intel at 7 minutes (market-sensitive), JDs at 30 minutes (semi-stable). Each layer is independently invalidated.

Cost observability from day one. Every Gemini call logs input/output token counts and cost in INR with per-operation attribution (fetch_jd, build_card, etc.). Built to run a credit-based product without financial blind spots.

Fully async, no blocking. FastAPI + Motor + aioredis + asyncio.gather throughout. All blocking I/O (Tavily, sync PyMongo) is isolated in asyncio.to_thread. The event loop never blocks.
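The isolation pattern can be sketched in a few lines; blocking_search stands in for any sync client call (Tavily, sync PyMongo):

```python
import asyncio
import time

# Sketch of isolating blocking I/O so the event loop stays responsive.
def blocking_search(query: str) -> str:
    time.sleep(0.01)  # simulated blocking network call
    return f"results for {query}"

async def handler() -> str:
    # The sync call runs in a worker thread; the loop keeps serving
    # other requests in the meantime.
    return await asyncio.to_thread(blocking_search, "staff engineer path")

result = asyncio.run(handler())
```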


Project Structure

horizon/
├── main.py               # FastAPI app, routing, market intel
├── tree.py               # Career tree generation pipeline
├── discover.py           # Company advisory card engine
├── neo_graph.py          # Neo4j graph — evolution + traversal
├── ops.py                # JWT auth, MongoDB client, cost logging
└── onboarding/
    ├── models.py          # Pydantic schemas (User, Profile, etc.)
    ├── user.py            # MongoDB user CRUD + skill normalization
    ├── parse_resume.py    # PDF → structured data via Gemini
    ├── mbti_questionnaire.py  # MBTI sampling + scoring engine
    └── normalizer/
        └── normalizer.py  # Skill canonicalization

API Reference

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /auth/register | Register + issue JWT |
| POST | /auth/login | Login + issue JWT |
| POST | /users/me/resume | Upload PDF, parse to structured profile |
| GET | /personality/questions | Fetch MBTI questionnaire |
| POST | /users/me/personality | Submit answers, compute MBTI type |
| POST | /discover/search | Generate company advisory cards |
| GET | /career/tree | Generate full 5-path career roadmap |

Status

Active development. Live beta coming soon.

Rushikesh Yeole


Horizon is infrastructure for career intelligence. The graph learns. The advice gets sharper. The market signal is always live.
