Aditya060806/Redrob-AI

Staged Hybrid Ranking Engine (SHRE)

An intelligent, explainable AI recruiter

Rank the Top 100 Senior AI Engineers from a pool of 100k+ candidates — fast, accurate, and fully explainable.

Live demo: Staged Hybrid Ranking Engine (SHRE) — a Hugging Face Space by Aditya1002

SHRE — application landing view

This repository implements a hybrid architecture (Anomaly Pre-Filter → Enriched Feature Engineering → ML Ensemble → Learning-to-Rank) wrapped in a pure-Python CTAE fallback, so a shortlist is always produced. It extends a 78-feature base model with four targeted enhancements for deeper JD understanding, richer signal integration, and more accurate, explainable shortlists. Everything is open-source and adds no extra cost.

Built for: Data & AI Challenge — Intelligent Candidate Discovery. A system that doesn't just filter, but intelligently ranks: deep job understanding, contextual relevance beyond keywords, full signal integration, and a lightning-fast, expertly-ranked shortlist with grounded reasoning.

Live Demo

The engine is deployed and ready to use as a Hugging Face Space:

Staged Hybrid Ranking Engine (SHRE) — a Hugging Face Space by Aditya1002

Paste a job description (optional), upload a candidates.jsonl, and get a ranked shortlist with grounded reasoning plus downloadable CSV and XLSX outputs.

SHRE interface — overview and feature highlights
The application interface: pipeline overview, the four enhancements, and the candidate workflow.

At a glance

Task Rank & shortlist the Top 100 Senior AI Engineers from 100k+ profiles
Architecture 4-stage hybrid: anomaly filter → 93 features → ensemble → LambdaMART, with CTAE fallback
Semantic engine Multi-vector all-MiniLM-L6-v2 embeddings + FAISS (TF-IDF graceful fallback)
Held-out accuracy 90.7% validation / 88.0% test (93-feature LTR run)
Ranking quality NDCG@10 0.991, NDCG@100 0.997; Spearman 0.894 over the full held-out fold
Inference Loads saved models — no retraining — for fast scoring at pool scale
Deep JD understanding Paste any JD (--jd); it re-targets the experience gate + semantic fit
Reliability Automatic fallback chain: LTR → validated ensemble → pure-Python CTAE

The Challenge → How SHRE answers it

| Challenge requirement | How SHRE delivers it |
|---|---|
| Deep Job Understanding: interpret complex, nuanced JDs | JobDescription parser turns any raw JD into 3 semantic facets + an experience band (--jd), re-targeting both the Stage-1 gate and the semantic-fit signal |
| Contextual Relevance: see beyond keywords | Multi-vector transformer embeddings (skills / trajectory / full profile) matched against JD facets via FAISS + weighted fusion |
| Signal Integration: profile + career + behavioral signals | 93 dense features: 78 base + 5 anomaly + 5 behavioral (recruiter demand, OSS, reliability) + 5 semantic |
| The Output: fast, accurate, expertly-ranked shortlist | LambdaMART LTR fused with the ensemble; fast inference mode; top-100 with grounded, non-hallucinated reasoning |

The Four Enhancements

| # | Enhancement | What it does | Graceful degradation |
|---|---|---|---|
| 1 | Multi-Vector Semantic Layer | Separately embeds candidate skills, experience trajectory, and full profile with all-MiniLM-L6-v2 + FAISS retrieval, matches each against three JD facets, and fuses them with weighted similarity. | Falls back to a scikit-learn TF-IDF encoder if transformers/FAISS are unavailable. |
| 2 | LambdaMART / XGBoost-LTR head | An XGBRanker (rank:ndcg) stacked on the ensemble's class-probability meta-feature and fused with the ensemble ordering to optimize the full ranked list. | Falls back to the validated ensemble ordering. |
| 3 | Enhanced Honeypot / Anomaly Detection | A multi-signal pre-filter catching timeline overlaps, impossible skill durations, and synthetic-profile flags; its anomaly score also feeds the model. | Continuous score is always produced; never blocks the pipeline. |
| 4 | Behavioral Scoring Module | Distills under-utilized platform activity / recruiter-demand / OSS / reliability signals into interpretable sub-scores. | Neutral defaults for missing signals. |
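
The weighted fusion described for enhancement 1 can be pictured with a small pure-Python sketch. It is only an illustration: the weights, the one-to-one pairing of candidate views with JD facets, and the toy two-dimensional vectors are all assumptions, and the real layer uses all-MiniLM-L6-v2 embeddings retrieved through FAISS.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical weights: the repository's actual fusion weights are not stated here.
VIEW_WEIGHTS = {"skills": 0.4, "trajectory": 0.3, "profile": 0.3}

def semantic_fit(candidate_vecs, jd_vecs, weights=VIEW_WEIGHTS):
    """Weighted average of per-view cosine similarities (assumed one facet per view)."""
    total = sum(weights.values())
    fused = sum(w * cosine(candidate_vecs[k], jd_vecs[k]) for k, w in weights.items())
    return fused / total

# Toy vectors standing in for real sentence embeddings.
candidate = {"skills": [1.0, 0.0], "trajectory": [1.0, 1.0], "profile": [0.0, 1.0]}
jd = {"skills": [1.0, 0.0], "trajectory": [1.0, 0.0], "profile": [1.0, 0.0]}
fit = semantic_fit(candidate, jd)
print(round(fit, 3))
```

Keeping the three views separate means a candidate with the right skills but an unrelated trajectory is scored lower than one who matches on both, which a single whole-profile embedding would tend to blur.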

Base model → "Opus 4.8" (this repo)

| Capability | Base model | Opus 4.8 (this repo) |
|---|---|---|
| Features | 78 base | 93 (+anomaly +behavioral +semantic) |
| Relevance signal | Ensemble class-prob vote | Ensemble + LambdaMART LTR fusion |
| Job understanding | Hardcoded role | Parsed from any JD (--jd) |
| Semantic matching | None | Multi-vector transformer + FAISS |
| Behavioral signals | Partial (a few) | Full demand/OSS/reliability sub-scores |
| Anomaly handling | 1.05x / 1.5x heuristics | Multi-signal scored detector |
| Scoring at scale | Retrain each run | Fast inference (load saved models) |
| Macro-F1 (5-fold) | 0.731 | 0.794 |

Two engines, one guarantee

| | SHRE (primary) | CTAE (fallback) |
|---|---|---|
| Type | ML ensemble + LTR + semantics | Pure-Python rule engine |
| Dependencies | xgboost, lightgbm, catboost, transformers… | None (standard library only) |
| When it runs | Default | If any SHRE stage / import fails |
| Output | Identical CSV schema | Identical CSV schema |
| Purpose | Maximum ranking quality | Never fail to produce a shortlist |

Architecture Overview

The system processes candidate data through four stages:

  1. Stage 1 (Anomaly Pre-Filter): AnomalyDetector drops synthetic/honeypot profiles (timeline, skill, and synthetic anomalies), then gates on a JD-driven experience band (parsed from the supplied job description, or the default 5–9 target) and a minimum of 2 skill pillars.
  2. Stage 2 (Enriched Feature Engineering): Computes 93 dense signals = 78 base (career progression, domain specialization in RAG/LLMs/Vector DBs, company classification, platform interactions) + 5 anomaly + 5 behavioral + 5 multi-vector semantic features.
  3. Stage 3 (Ensemble + Learning-to-Rank): A Voting Ensemble (XGBoost + LightGBM + CatBoost) — trained with leakage-safe SMOTE inside CV — produces a class-probability score that, with the enriched features, feeds a LambdaMART (XGBoost rank:ndcg) head; the two are fused into the final ranking score.
  4. Stage 4 (Ranker & Reasoning): Sorts the pool and builds data-backed, non-hallucinated reasoning (citing semantic fit, behavioral signals, and anomaly checks) for each of the top 100. Emits the canonical submission.csv, an enriched submission_detailed.csv, and a formatted submission.xlsx.
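
As a rough illustration of the Stage 1 gate, the sketch below applies two of the anomaly checks (timeline overlap and impossible skill duration) followed by the experience band and the two-pillar minimum. The candidate field names and helper functions are hypothetical and do not reflect the actual AnomalyDetector API; the 3 to 15 year band is taken from the sample console log for the default role.

```python
from dataclasses import dataclass

@dataclass
class ExperienceBand:
    min_years: float
    max_years: float

def has_timeline_overlap(jobs):
    """jobs: list of (start_year, end_year). True if any two roles overlap in time."""
    ordered = sorted(jobs)
    return any(prev_end > next_start
               for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]))

def has_impossible_skill(skill_months, total_years):
    """True if any skill is claimed for longer than the whole career."""
    return any(months > total_years * 12 for months in skill_months.values())

def passes_stage1(candidate, band, min_pillars=2):
    """Hypothetical gate: anomaly checks, then experience band, then pillar count."""
    if has_timeline_overlap(candidate["jobs"]):
        return False
    if has_impossible_skill(candidate["skill_months"], candidate["years_experience"]):
        return False
    if not band.min_years <= candidate["years_experience"] <= band.max_years:
        return False
    return len(candidate["pillars"]) >= min_pillars

# Band from the sample log line "exp 3-15y (target 5-9y)".
DEFAULT_BAND = ExperienceBand(min_years=3, max_years=15)

good = {
    "jobs": [(2015.0, 2018.0), (2018.0, 2022.5)],
    "skill_months": {"python": 80},
    "years_experience": 7.5,
    "pillars": {"llm", "vector_db"},
}
overlapping = dict(good, jobs=[(2015.0, 2019.0), (2018.0, 2022.5)])
padded = dict(good, skill_months={"python": 200})  # 200 months > 7.5 yrs * 12 = 90
thin = dict(good, pillars={"llm"})

for name, cand in [("good", good), ("overlapping", overlapping),
                   ("padded", padded), ("thin", thin)]:
    print(name, passes_stage1(cand, DEFAULT_BAND))
```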

If any library or model load fails, the pipeline automatically falls back: LTR → validated ensemble → pure-Python CTAE ranker.

By default Stage 3 runs in fast inference mode — it loads the saved ensemble + LambdaMART artifacts and scores the pool with no retraining (the path that scales to a 100k+ pool). --train forces a full retrain, and --jd re-targets the role (see How to Run).


Pipeline Flow

flowchart TD
    A["candidates.jsonl<br/>(100k+ raw profiles)"] --> B

    subgraph S1["Stage 1 - Anomaly Pre-Filter"]
        B["AnomalyDetector<br/>timeline / skill / synthetic checks"] --> C{"Synthetic?<br/>or out of JD<br/>experience band?<br/>or &lt; 2 pillars?"}
    end
    C -- "drop" --> X["filtered out"]
    C -- "keep" --> D

    subgraph S2["Stage 2 - Feature Engineering (93 signals)"]
        D["78 base features"] --> E["+5 anomaly +5 behavioral<br/>+5 multi-vector semantic"]
    end

    JD["Job Description<br/>(--jd: text or file)"] -. "facets + experience band" .-> B
    JD -. "3 JD facets" .-> E

    E --> F

    subgraph S3["Stage 3 - Ensemble + Learning-to-Rank"]
        F["Voting Ensemble<br/>XGBoost + LightGBM + CatBoost<br/>(leakage-safe SMOTE in CV)"] --> G["ensemble score<br/>(meta-feature)"]
        G --> H["LambdaMART<br/>XGBRanker rank:ndcg"]
        H --> I["Rank Fusion<br/>0.6 ensemble + 0.4 LTR"]
    end

    I --> J

    subgraph S4["Stage 4 - Ranker & Reasoning"]
        J["Sort + grounded reasoning"] --> K["submission.csv (Top 100)<br/>submission_detailed.csv<br/>submission.xlsx + rankings_full.csv"]
    end

    F -. "on failure" .-> CTAE["CTAE fallback<br/>(pure-Python rule ranker)"]
    H -. "on failure" .-> F
    CTAE --> K

    classDef io fill:#0b3d2e,stroke:#10b981,color:#d1fae5;
    classDef jd fill:#3b1f4b,stroke:#a855f7,color:#f3e8ff;
    class A,K io;
    class JD jd;
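
The "Rank Fusion" node above blends the two scores as 0.6 ensemble + 0.4 LTR. A minimal sketch of that blend follows. The min-max normalisation step is an assumption (raw LambdaMART outputs are unbounded, so some rescaling is needed before mixing them with probabilities), and the function names are illustrative.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(ensemble_scores, ltr_scores, w_ensemble=0.6, w_ltr=0.4):
    """Weighted blend of normalised ensemble and LTR scores (weights from the flowchart)."""
    ens = min_max(ensemble_scores)
    ltr = min_max(ltr_scores)
    return [w_ensemble * e + w_ltr * l for e, l in zip(ens, ltr)]

ensemble = [0.9, 0.5, 0.1]   # class-probability style scores
ltr = [2.0, 3.0, -1.0]       # unbounded ranker margins
fused = fuse(ensemble, ltr)
order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
print([round(f, 2) for f in fused], order)
```

With these toy inputs the LTR head prefers the second candidate, but the heavier ensemble weight keeps the first candidate on top.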

Fallback reliability chain

flowchart LR
    LTR["LambdaMART LTR head"] -->|fails| ENS["Validated Voting Ensemble"]
    ENS -->|fails| CTAE["Pure-Python CTAE ranker"]
    LTR -.->|preferred| OUT(["Top-100 shortlist"])
    ENS -.-> OUT
    CTAE -.-> OUT
    classDef ok fill:#0b3d2e,stroke:#10b981,color:#d1fae5;
    class OUT ok;
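
One simple way to implement a chain like this is to try each ranker in order and fall through on any exception. The sketch below is illustrative only: the ranker functions are stand-ins that simulate failures, and none of the names or paths come from the repository.

```python
import logging

logger = logging.getLogger("shre.fallback")  # hypothetical logger name

def rank_with_fallback(pool, rankers):
    """Try each (name, ranker) in order; return (name, result) of the first success."""
    last_error = None
    for name, ranker in rankers:
        try:
            return name, ranker(pool)
        except Exception as exc:  # broad on purpose: any stage or import failure falls through
            logger.warning("%s failed (%s); falling back", name, exc)
            last_error = exc
    raise RuntimeError("all rankers failed") from last_error

def ltr_ranker(pool):
    raise ImportError("xgboost not installed")      # simulated failure

def ensemble_ranker(pool):
    raise FileNotFoundError("saved model missing")  # simulated failure

def ctae_ranker(pool):
    # Stand-in for the rule engine: order by years of experience.
    return sorted(pool, key=lambda c: c["years"], reverse=True)

pool = [{"id": "a", "years": 5}, {"id": "b", "years": 7}]
used, ranked = rank_with_fallback(
    pool, [("ltr", ltr_ranker), ("ensemble", ensemble_ranker), ("ctae", ctae_ranker)]
)
print(used, [c["id"] for c in ranked])
```

Because the last link depends only on the standard library, the chain as a whole can only fail if the rule engine itself raises.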

Deep Job Understanding (custom JD)

The target role is no longer hardcoded. src/shre/job_description.py parses any raw job description — text or file — into the three semantic facets the engine matches against, plus an experience band that re-targets the Stage-1 gate.

SHRE — paste a job description to re-target the ranking
Define the role: paste a job description to re-target the experience gate and the semantic-fit signal.

flowchart LR
    R["Raw JD text"] --> P["JobDescription parser<br/>(section + regex heuristics)"]
    P --> F1["required_skills facet"]
    P --> F2["ideal_experience facet"]
    P --> F3["role_mission facet"]
    P --> B["experience band<br/>min / max / target years"]
    F1 & F2 & F3 --> SEM["Multi-Vector Semantic Layer"]
    B --> GATE["Stage-1 experience gate"]

Example: a Staff ML Engineer (8–12 yrs) JD tightens the experience gate and re-anchors semantic fit, so a different shortlist surfaces than the default Founding Senior AI Engineer (5–9 yrs) role — without retraining.

The parsed role flows all the way through to the deliverable. The JD parser also extracts a role title, which (with the experience band) is threaded into Stage 4 so the ranked submission.xlsx subtitle and run logs name the actual role being hired for — e.g. "Staff Machine Learning Engineer · 8–12 yrs target" — instead of always reading "Founding Senior AI Engineer". If no title is detectable it falls back to a neutral "Custom Role (from JD)" rather than mislabelling the canonical role.

The supervised ensemble is still trained on labels for the founding-engineer role. A custom JD re-targets the JD-relative semantic-fit signal and the hard experience gate; the learned "higher fit implies higher relevance" relationship is what transfers across roles. This trade-off is documented honestly rather than hidden.
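
To make the "regex heuristics" idea concrete, here is a minimal sketch of pulling a target experience band out of raw JD text. The pattern and the function name are assumptions for illustration, not the contents of src/shre/job_description.py; the (5, 9) default mirrors the default target stated above.

```python
import re

# Matches ranges such as "8-12 yrs", "8–12 years" or "5 to 9 years".
_BAND_RE = re.compile(
    r"\b(\d{1,2})\s*(?:-|–|to)\s*(\d{1,2})\s*\+?\s*(?:years?|yrs?)",
    re.IGNORECASE,
)

def parse_target_band(jd_text, default=(5, 9)):
    """Return (min_years, max_years) from the first range found, else the default."""
    match = _BAND_RE.search(jd_text)
    if not match:
        return default
    low, high = int(match.group(1)), int(match.group(2))
    return (low, high) if low <= high else (high, low)

print(parse_target_band("Staff ML Engineer (8–12 yrs)"))
print(parse_target_band("We need 5 to 9 years of hands-on ML experience"))
print(parse_target_band("Senior engineer wanted"))
```

A real parser would also need to handle open-ended phrasing such as "8+ years", which this pattern deliberately leaves to the default.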


Fast Inference vs. Retraining

The training path re-fits the full XGB+LGBM+CatBoost ensemble and the LambdaMART head — great for refreshing the model, but the opposite of lightning-fast on a 100k pool.

| Mode | Trigger | What happens | Use when |
|---|---|---|---|
| Inference (default) | saved artifacts exist | loads ensemble + LTR + scaler/selector, scores the pool with no retraining | scoring a large/real pool fast |
| Train | --train flag or missing models | full leakage-safe retrain, then score + persist artifacts | features/labels changed |

# Fast inference (default): scores with the saved models, no retraining
python -m src.main data/candidates.jsonl output/submission.csv

# Force a full retrain
python -m src.main data/candidates.jsonl output/submission.csv --train

If inference fails (missing/incompatible artifacts), it automatically falls back to the full training path, then to CTAE.
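
The mode choice described in this section reduces to a small decision: train if asked to, or if any saved artifact is missing, and otherwise run inference. A hedged sketch follows; the artifact file names are hypothetical and the real contents of models/ may differ.

```python
import tempfile
from pathlib import Path

# Hypothetical artifact file names; the real names under models/ may differ.
REQUIRED_ARTIFACTS = ("ensemble.pkl", "ltr_head.json", "scaler.pkl")

def choose_mode(models_dir, force_train=False, required=REQUIRED_ARTIFACTS):
    """Return "train" if retraining is forced or any artifact is missing, else "inference"."""
    if force_train:
        return "train"
    root = Path(models_dir)
    missing = [name for name in required if not (root / name).is_file()]
    return "train" if missing else "inference"

# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    mode_without_models = choose_mode(tmp)
    for name in REQUIRED_ARTIFACTS:
        (Path(tmp) / name).write_bytes(b"")
    mode_with_models = choose_mode(tmp)
    mode_forced = choose_mode(tmp, force_train=True)

print(mode_without_models, mode_with_models, mode_forced)  # train inference train
```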


Installation

To set up the environment and install all dependencies:

pip install -r requirements.txt

Note on the semantic encoder. The transformer stack needs huggingface-hub < 1.0 (pinned in requirements.txt); newer hub releases break transformers/sentence-transformers and silently degrade the layer to TF-IDF. scikit-learn is pinned < 1.6 to match the bundled model artifacts and the XGBoost/LightGBM sklearn wrappers, and numpy is pinned < 2.0 because the saved pickles and scikit-learn 1.5.x were built against the numpy 1.x ABI.


How to Run

1. Primary Ranking Pipeline

Run the end-to-end pipeline to process candidates and output the final rankings:

python -m src.main data/candidates.jsonl output/submission.csv

Trying it out of the box. The full data/candidates.jsonl pool is not shipped (it's .gitignored). A ready-to-run 498-profile sample is bundled, so you can reproduce a complete run immediately:

python -m src.main data/candidates_demo.jsonl output/submission.csv

By default the pipeline runs in fast inference mode when trained artifacts exist in models/ — it loads the saved ensemble + LambdaMART head and scores the pool without any retraining, which is what makes large-pool ranking lightning-fast. To force a full retrain (e.g. after changing features or labels), add --train:

python -m src.main data/candidates.jsonl output/submission.csv --train

Deep Job Understanding (custom JD). The target role is no longer hardcoded. Pass any job description as raw text or a file with --jd; it is parsed into the three semantic facets (skills / experience / mission) and an experience band, which re-target the Stage-1 gate and the semantic-fit signal:

python -m src.main data/candidates.jsonl output/submission.csv --jd data/sample_jd.txt

If --jd is omitted, the canonical "Founding Senior AI Engineer" role (the role the 498 labels were judged for) is used, so behaviour is unchanged. When a custom JD is supplied, the parsed role title and target band also flow into the run logs and the submission.xlsx subtitle, so the deliverable names the exact role it ranked for.

2. Validation & Testing

Run the enhanced end-to-end test (modules, enrichment, LTR pipeline, CTAE fallback) and the ablation study:

python test_enhanced.py
python analysis/ablation_enhanced.py

The original base test suite is still available via python test_pipeline.py.

3. Interactive Sandbox Demo

Run the Streamlit application to upload candidate batches and interactively view profiles, scores, and rationales (with an optional JD text box):

streamlit run sandbox/app.py

SHRE — upload candidates and run the pipeline
Upload a candidates.jsonl batch; the pipeline filters, scores, and ranks the pool.

What you'll see — outputs

Every run writes the ranked shortlist as both an Excel workbook and CSVs to the output directory:

| File | Format | Columns / Sheets | Purpose |
|---|---|---|---|
| submission.xlsx | XLSX | Sheets: Top 100, Full Rankings, Summary | Primary ranked deliverable: formatted recommended shortlist |
| submission.csv | CSV | candidate_id, rank, score, reasoning | Canonical Top-100 (clean 4 columns) |
| submission_detailed.csv | CSV | …, semantic_fit, behavioral_score, anomaly_score, anomaly_flags, reasoning | Top-100 with the enriched signals exposed |
| rankings_full.csv | CSV | candidate_id, rank, score, reasoning | Every viable candidate, fully ranked |

The XLSX name is derived from the output path you pass — e.g. output/submission.csv produces output/submission.xlsx — and is generated on both the SHRE and CTAE paths. If openpyxl is unavailable the CSVs still write (the workbook is skipped gracefully).
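
A quick sanity check of the canonical four-column CSV can be written with the standard library alone. The validator below is a sketch, not part of the repository: it assumes ranks run 1..N without gaps and that scores never increase as rank increases, which matches the sample rows but is not stated as a guarantee.

```python
import csv
import io

EXPECTED_COLUMNS = ["candidate_id", "rank", "score", "reasoning"]

def validate_submission(csv_text, max_rows=100):
    """Return a list of human-readable problems; an empty list means the file looks fine."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]
    rows = list(reader)
    problems = []
    if len(rows) > max_rows:
        problems.append(f"{len(rows)} rows exceeds the limit of {max_rows}")
    prev_score = None
    for expected_rank, row in enumerate(rows, start=1):
        if int(row["rank"]) != expected_rank:
            problems.append(f"row {expected_rank}: rank is {row['rank']}")
        score = float(row["score"])
        if prev_score is not None and score > prev_score:
            problems.append(f"row {expected_rank}: score {score} exceeds previous {prev_score}")
        prev_score = score
    return problems

# Illustrative rows with made-up candidate IDs.
GOOD = '''candidate_id,rank,score,reasoning
CAND_A,1,1.000,"Example reasoning, with a comma"
CAND_B,2,0.596,"Another example"
'''
BAD = GOOD.replace("CAND_B,2,0.596", "CAND_B,3,1.200")

print(validate_submission(GOOD))
print(validate_submission(BAD))
```

To check a real run, read the file first, for example `validate_submission(open("output/submission.csv", encoding="utf-8").read())`.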

SHRE — summary metrics for the ranked shortlist
Summary metrics: candidates shortlisted, top score, and average semantic / behavioral scores.

SHRE — ranked shortlist with scores, signals, reasoning and downloads
Ranked shortlist with progress-bar scores, enrichment signals, grounded reasoning, and CSV / XLSX downloads.

The ranked XLSX deliverable (submission.xlsx)

A professionally-formatted Excel workbook built for reviewers:

Sheet Contents
Top 100 Recommended shortlist — rank · candidate_id · score · semantic_fit · behavioral_score · anomaly_score · anomaly_flags · reasoning
Full Rankings Every viable candidate, same columns
Summary Run statistics — top/mean score, avg semantic / behavioral / anomaly of the shortlist

Formatting: indigo title banner with role + timestamp + score definition, styled header, banded rows and borders, a red to amber to green colour scale on the score column, frozen header, an auto-filter for sort/filter, 4-decimal number formats, and a wrapped reasoning column.

Example console output (sample run, inference mode):

=== RUNNING SHRE (Enhanced ML Pipeline - 'Opus 4.8' Grade) ===
    JD[default] 'Founding Senior AI Engineer' exp 3-15y (target 5-9y); facets: skills=348 chars, mission=264 chars
Stage 1: Filtered 498 down to 293 viable candidates.
Stage 2: Extracted 93 enriched features.
Stage 3: Inference mode (scoring with saved models, no retraining).
  - Ranking head: LambdaMART fused with ensemble (inference)
Writing top 100 to output/submission.csv...
Writing ranked XLSX to output/submission.xlsx...  Done!

With a custom --jd, the first line instead names the parsed role, e.g. JD[file] 'Staff Machine Learning Engineer' exp 6-18y (target 8-12y); ….

Example shortlist rows (real output — reasoning is generated from actual profile data, never hallucinated):

| rank | candidate_id | score | reasoning (truncated) |
|---|---|---|---|
| 1 | CAND_0072688 | 1.000 | Data Scientist, 6.9 yrs at Niramai, specializing in vector search and RAG (Milvus); strong semantic alignment to the JD (esp. experience trajectory); very high recruiter responsiveness; high recruiter demand… |
| 2 | CAND_0044890 | 0.596 | AI Research Engineer, 5.0 yrs at Haptik, vector search & RAG (FAISS); strong semantic JD alignment; active GitHub presence; high recruiter demand. |
| 3 | CAND_0030061 | 0.409 | Data Analyst, 5.3 yrs at Ola, applied ML (Python); strong semantic JD alignment; active GitHub; reliable follow-through. |
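
Reasoning of this shape can be kept grounded by composing it only from fields that are actually present in the profile, so nothing is asserted without a backing value. The sketch below shows the idea; the field names, thresholds, and phrasing rules are hypothetical and are not the Stage 4 implementation.

```python
def build_reasoning(profile):
    """Compose a reasoning string using only signals present in the profile dict."""
    parts = []
    title = profile.get("title")
    years = profile.get("years")
    company = profile.get("company")
    if title and years is not None and company:
        parts.append(f"{title}, {years:.1f} yrs at {company}")
    elif title:
        parts.append(title)
    # Hypothetical 0.7 thresholds; a missing signal simply adds no clause.
    if profile.get("semantic_fit", 0.0) >= 0.7:
        parts.append("strong semantic JD alignment")
    if profile.get("github_active"):
        parts.append("active GitHub presence")
    if profile.get("recruiter_demand", 0.0) >= 0.7:
        parts.append("high recruiter demand")
    if not parts:
        return "No supporting signals available."
    return "; ".join(parts) + "."

# Made-up profile for illustration only.
example = {
    "title": "ML Engineer",
    "years": 6.0,
    "company": "ExampleCo",
    "semantic_fit": 0.8,
    "github_active": True,
    "recruiter_demand": 0.9,
}
print(build_reasoning(example))
print(build_reasoning({"title": "Data Analyst"}))
```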

Performance Summary

Feature ablation (5-fold) — the enhancements measurably help classification

Configuration Features Accuracy Macro-F1
Base (78 features) 78 0.833 0.731
+ Anomaly + Behavioral 88 0.845 0.766
+ Semantic (full) 93 0.866 0.794

Latest training run — 93 features, transformer semantics (models/metadata_ltr.json)

| Metric | Validation | Test (held-out) |
|---|---|---|
| Accuracy | 0.907 | 0.880 |
| Macro Precision | 0.883 | 0.823 |
| Macro Recall | 0.871 | 0.876 |
| Macro F1 | 0.874 | 0.834 |

| Ranking metric | Score |
|---|---|
| NDCG@10 (fused) | 0.991 |
| NDCG@100 (fused) | 0.997 |
| Pure LambdaMART NDCG@10 | 0.974 |
| Hard-slice NDCG@10 (relevance 1 vs 2) | 1.000 |
| Spearman (full held-out fold) | 0.894 |
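
For readers unfamiliar with the ranking metric, NDCG@k can be computed in a few lines. The sketch below uses the exponential-gain form (2^rel - 1), which is one common definition; whether the repository's evaluation uses exactly this variant is an assumption.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items, with exponential gain."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances_in_ranked_order, k):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = sorted(relevances_in_ranked_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(relevances_in_ranked_order, k) / ideal_dcg

# Relevance labels 0-3, listed in the order the model ranked the candidates.
perfect = [3, 2, 1, 0]
swapped_top = [2, 3, 1, 0]
print(ndcg_at_k(perfect, 4))
print(round(ndcg_at_k(swapped_top, 4), 3))
```

Because the gain is exponential and the discount logarithmic, mistakes near the top of the list cost far more than mistakes further down, which is why a near-ceiling NDCG can coexist with imperfect ordering in the middle of the list.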

Test-set confusion matrix (rows = true, cols = predicted; classes 0–3):

| | Pred 0 | Pred 1 | Pred 2 | Pred 3 |
|---|---|---|---|---|
| True 0 | 38 | 4 | 0 | 0 |
| True 1 | 0 | 14 | 1 | 0 |
| True 2 | 0 | 1 | 8 | 3 |
| True 3 | 0 | 0 | 0 | 6 |
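
The test-column metrics reported above can be re-derived from this confusion matrix. The short check below recomputes accuracy and the macro-averaged precision, recall, and F1 directly from the 75 test predictions; only the helper name is invented.

```python
# Rows = true class, columns = predicted class (classes 0-3), copied from the matrix above.
CONFUSION = [
    [38, 4, 0, 0],
    [0, 14, 1, 0],
    [0, 1, 8, 3],
    [0, 0, 0, 6],
]

def macro_metrics(matrix):
    """Accuracy plus unweighted per-class means of precision, recall and F1."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column sum
        actual = sum(matrix[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }

metrics = macro_metrics(CONFUSION)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.88, 'macro_precision': 0.823, 'macro_recall': 0.876, 'macro_f1': 0.834}
```

The result matches the Test (held-out) column, and the per-class view shows where the errors sit: Class 3 precision is 6/9 because three Class 2 profiles were promoted to Class 3.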

Honest reporting of ranking quality

Because the 498 labels are cleanly rule-separable, full-set NDCG sits near the ceiling and overstates ranking quality. We therefore also report two harder, more discriminating diagnostics on every training run:

  • Hard-slice NDCG@10 — restricted to borderline candidates (relevance 1 vs 2), removing the trivially-separable 0 and 3 classes.
  • Spearman rank correlation over the full held-out fold (≈ 0.89) — clearly sub-ceiling, and the most honest measure of how well the engine orders the confusable middle.

Models behind these results:

  • Primary Model: Voting Ensemble (XGBoost + LightGBM + CatBoost) + LambdaMART LTR head
  • Semantic Encoder: sentence-transformers/all-MiniLM-L6-v2 + FAISS (TF-IDF fallback)
  • Fallback Model: Rule-based CTAE Ranker (Pure Python, zero-dependency)

Reproduce: python test_enhanced.py (end-to-end) and python analysis/ablation_enhanced.py (ablation).


Scientific Validation Gallery

A 9-phase, leakage-free validation suite (analysis/) regenerates every figure from the phase summary artifacts in analysis_results/; no number is typed by hand. Highlights are below; the full write-up is in analysis_results/COMPETITION_REPORT.md.

Learning curve & dataset sufficiency

The validation accuracy plateaus at the full 498 samples — the core signal is captured, with only mild train/val overfitting mitigated by soft-voting.

Learning curves

Train data used Samples Train acc Val acc Val F1
20% 99 0.997 0.838 0.658
40% 199 1.000 0.809 0.714
60% 298 0.999 0.849 0.793
80% 398 0.998 0.857 0.793
100% 498 0.996 0.868 0.806

Feature importance & model comparison

Signal concentrates in profile depth (summary_length), skill depth (avg_skill_duration_months), and domain longevity (domain_x_years).

Gain-based feature importance Per-model comparison

Permutation importance Feature correlation heatmap

SHAP explainability

Global and local SHAP attributions explain why each candidate scores as it does — high ideal_years_score and domain_llm_score push toward ideal hire; long notice periods push down.

SHAP summary SHAP density

SHAP waterfall - best ranked SHAP waterfall - worst ranked

Ablation — models & feature groups

Combining all feature categories beats any single group; the ensemble soft-votes for lower variance.

Model ablation Feature-group ablation

Per-model comparison (5-fold) — XGBoost leads individually; the ensemble trades a hair of accuracy for stability:

Model Accuracy Precision Recall F1
XGBoost 0.855 0.780 0.803 0.787
LightGBM 0.839 0.760 0.763 0.756
CatBoost 0.837 0.771 0.812 0.782
Ensemble (soft-vote) 0.851 0.774 0.788 0.774

Feature-group ablation — every group adds signal; "All features" wins, and engagement-only is the weakest standalone (confirming behavioral signals help but aren't sufficient alone):

Feature group # Features Accuracy F1
All features 62 0.845 0.766
technical 31 0.767 0.643
other 7 0.757 0.637
experience 12 0.753 0.662
interaction 8 0.751 0.644
engagement 6 0.562 0.465

Stability (50 runs) · Honeypot defense · Ranking quality

  • Stability: Acc 85.9% ± 3.0%, Macro-F1 78.9% ± 4.3% over 10x5 repeated stratified cross-validation (50 runs); rated ACCEPTABLE (coefficient of variation of accuracy 3.5%).
  • Honeypot detection: 71.6% overall — 100% on structural anomalies (flat / impossible-skills / random-noise), weak on keyword-stuffing (handled by the Stage-1 rule filter, by design).
  • Ranking on holdout: NDCG@100 0.9591, Hit@5 / Hit@10 = 100%.

Honeypot detection by attack type (250 synthetic adversarial profiles):

Attack type Detected Verdict
Flat_Profile 100% Caught outright
Impossible_Skills 100% Caught outright
Random_Noise 100% Caught outright
Minimal_Profile 58% Partial (near Class-0/1 boundary)
Keyword_Stuffing 0% Missed (needs Stage-1 keyword-density cap)

Stability boxplots Honeypot analysis Ranking metrics

Error analysis

Only 7 of 75 held-out samples were misclassified (9.3% error, consistent with the 90.7% validation accuracy), concentrated on the Class 2↔3 ideal-hire boundary and on rich-but-shallow Class 0→1 profiles.

Error confusion matrix Error transition flow


Repository Structure

|-- requirements.txt            # Main project dependencies (pinned for reproducibility)
|-- submission_metadata.yaml    # Submission metadata
|-- README.md                   # This file
|-- src/
|   |-- main.py                 # Pipeline entry (inference by default; --train, --jd)
|   |-- shre/
|   |   |-- job_description.py       # Deep JD understanding: parse JD -> facets + exp band
|   |   |-- stage1_filter.py        # Anomaly pre-filter + JD-driven experience/pillar gates
|   |   |-- anomaly.py              # Feature 3: Enhanced honeypot/anomaly detection
|   |   |-- behavioral.py           # Feature 4: Behavioral scoring module
|   |   |-- semantic.py             # Feature 1: Multi-vector semantic layer (+ FAISS), JD-aware
|   |   |-- stage2_features.py      # 78 base features + enrichment pass (-> 93)
|   |   |-- stage3_ranking_validated.py  # Voting ensemble (leakage-safe SMOTE)
|   |   |-- stage3_ranking_ltr.py   # Feature 2: LambdaMART/XGBoost-LTR head (+ honest metrics)
|   |   |-- inference.py            # Fast inference-only scoring (no retraining)
|   |   |-- stage4_submit.py        # Ranked top-100 + enriched reasoning + XLSX export
|   |-- ctae/                   # Fallback rule-based engine
|   |-- common/                 # Config, data loader, validator, logging
|-- analysis/                   # 9-phase scientific validation suite (+ ablation_enhanced.py)
|-- analysis_results/           # Regenerated charts + COMPETITION_REPORT.md
|-- validation/                 # Independent validation harness
|-- labeling/                   # 498 labeled examples (combined_labels.json) + guide
|-- test_enhanced.py            # Enhanced end-to-end test (18 checks)
|-- models/                     # Trained models, scalers, selectors, LTR, encoder & metadata
|-- sandbox/                    # Streamlit web UI code
|-- data/                       # Candidate schema, samples, sample JD

Tech Stack

Layer Technology
Language Python 3.9+
Gradient boosting XGBoost · LightGBM · CatBoost (soft-voting ensemble)
Learning-to-Rank XGBoost XGBRanker (rank:ndcg / LambdaMART)
Semantics sentence-transformers/all-MiniLM-L6-v2 + FAISS (TF-IDF fallback)
Class balance imbalanced-learn SMOTE (inside CV folds)
Explainability SHAP, permutation importance
App / Demo Streamlit (deployed as a Hugging Face Space)
Output openpyxl (formatted ranked submission.xlsx) + CSV
Fallback Pure-Python CTAE rule engine (zero dependency)
AI models used GLM 5.2 · Claude

Limitations & Future Work

We report limitations openly — each is paired with a concrete mitigation path.

| # | Limitation | Why it matters | Mitigation |
|---|---|---|---|
| 1 | Keyword-stuffing susceptibility | The statistical model can be swayed by keyword-padded profiles that inflate apparent relevance. | Enforce a hard, rule-based keyword-density ceiling in the Stage-1 filter (before scoring). |
| 2 | Class-3 data scarcity | Only 38 labeled ideal-hire samples limit visibility into the top class, widening uncertainty at the very top of the ranking. | Run active-learning cycles to label 50+ candidates near the Class 2/3 boundary. |
| 3 | Single-role supervision | Labels were judged for the founding-engineer role. A custom --jd re-targets the semantic + gate signals, but the supervised relevance model itself stays role-anchored. | Collect labels per role family, or train a JD-conditioned ranker. |
| 4 | Small held-out set (75 samples) | The 90.7% point estimate carries a non-trivial confidence interval. | Trust the 50-run stability band (85.9% ± 3.0%) as the more reliable expectation; expand the held-out set as more labels arrive. |
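
To put a number on limitation 4, a Wilson score interval gives a rough sense of how wide the uncertainty is around 90.7% on 75 samples. The calculation below is an illustrative aside, not part of the repository's validation suite; it takes 68 correct out of 75, which is what 7 misclassifications imply.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

low, high = wilson_interval(68, 75)
print(f"{low:.3f} to {high:.3f}")
```

The interval spans roughly 0.82 to 0.95, wide enough that the 85.9% stability mean sits comfortably inside it, which supports treating the repeated-CV band as the safer expectation.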

In short: the engine is statistically sound, stable, and explainable today — with leakage control and a graceful fallback — and every known gap above has a clear, low-cost path forward.


Team

🛠️ Team Vandalizers

Intelligent Candidate Discovery & Ranking

Hugging Face Space

Member
👨‍💻 Aditya Pandey Pipeline, ML & deployment
👩‍💻 Palak Rai Team member
👨‍💻 Avik Srivastava Team member

Project links

Resource Where
🤗 Live demo Staged Hybrid Ranking Engine (SHRE) — a Hugging Face Space by Aditya1002
🐙 GitHub see submission_metadata.yaml
🧪 Sandbox Streamlit Space (link in submission_metadata.yaml)
▶️ Reproduce python -m src.main data/candidates.jsonl output/submission.csv
