Rank the Top 100 Senior AI Engineers from a pool of 100k+ candidates — fast, accurate, and fully explainable.
Live demo: Staged Hybrid Ranking Engine (SHRE) — a Hugging Face Space by Aditya1002
This repository implements a Hybrid Architecture (Anomaly Pre-Filter → Enriched Feature Engineering → ML Ensemble → Learning-to-Rank) with a pure-Python CTAE Fallback wrapper for absolute reliability. It extends a 78-feature base model with four targeted enhancements for deeper JD understanding, richer signal integration, and more accurate, explainable shortlists — all fully open-source and zero extra cost.
Built for: Data & AI Challenge — Intelligent Candidate Discovery. A system that doesn't just filter, but intelligently ranks: deep job understanding, contextual relevance beyond keywords, full signal integration, and a lightning-fast, expertly-ranked shortlist with grounded reasoning.
The engine is deployed and ready to use as a Hugging Face Space:
Staged Hybrid Ranking Engine (SHRE) — a Hugging Face Space by Aditya1002
Paste a job description (optional), upload a candidates.jsonl, and get a ranked shortlist with grounded reasoning plus downloadable CSV and XLSX outputs.
The application interface: pipeline overview, the four enhancements, and the candidate workflow.
| | |
|---|---|
| Task | Rank & shortlist the Top 100 Senior AI Engineers from 100k+ profiles |
| Architecture | 4-stage hybrid: anomaly filter → 93 features → ensemble → LambdaMART, with CTAE fallback |
| Semantic engine | Multi-vector all-MiniLM-L6-v2 embeddings + FAISS (TF-IDF graceful fallback) |
| Held-out accuracy | 90.7% validation / 88.0% test (93-feature LTR run) |
| Ranking quality | NDCG@10 0.991, NDCG@100 0.997; Spearman 0.894 on the full held-out fold |
| Inference | Loads saved models — no retraining — for fast scoring at pool scale |
| Deep JD understanding | Paste any JD (--jd); it re-targets the experience gate + semantic fit |
| Reliability | Automatic fallback chain: LTR → validated ensemble → pure-Python CTAE |
- Live Demo
- The Challenge → How SHRE answers it
- The Four Enhancements
- Architecture Overview
- Pipeline Flow
- Deep Job Understanding
- Fast Inference vs. Retraining
- Installation
- How to Run
- Performance Summary
- Scientific Validation Gallery
- Repository Structure
- Tech Stack
- Limitations & Future Work
- Team
| Challenge requirement | How SHRE delivers it |
|---|---|
| Deep Job Understanding — interpret complex, nuanced JDs | JobDescription parser turns any raw JD into 3 semantic facets + an experience band (--jd), re-targeting both the Stage-1 gate and the semantic-fit signal |
| Contextual Relevance — see beyond keywords | Multi-vector transformer embeddings (skills / trajectory / full profile) matched against JD facets via FAISS + weighted fusion |
| Signal Integration — profile + career + behavioral signals | 93 dense features: 78 base + 5 anomaly + 5 behavioral (recruiter demand, OSS, reliability) + 5 semantic |
| The Output — fast, accurate, expertly-ranked shortlist | LambdaMART LTR fused with the ensemble; fast inference mode; top-100 with grounded, non-hallucinated reasoning |
| # | Enhancement | What it does | Graceful degradation |
|---|---|---|---|
| 1 | Multi-Vector Semantic Layer | Separately embeds candidate skills, experience trajectory, and full profile with all-MiniLM-L6-v2 + FAISS retrieval, matches each against three JD facets, and fuses them with weighted similarity. | Falls back to a scikit-learn TF-IDF encoder if transformers/FAISS are unavailable. |
| 2 | LambdaMART / XGBoost-LTR head | An XGBRanker (rank:ndcg) stacked on the ensemble's class-probability meta-feature and fused with the ensemble ordering to optimize the full ranked list. | Falls back to the validated ensemble ordering. |
| 3 | Enhanced Honeypot / Anomaly Detection | A multi-signal pre-filter catching timeline overlaps, impossible skill durations, and synthetic-profile flags; its anomaly score also feeds the model. | Continuous score is always produced; never blocks the pipeline. |
| 4 | Behavioral Scoring Module | Distills under-utilized platform activity / recruiter-demand / OSS / reliability signals into interpretable sub-scores. | Neutral defaults for missing signals. |
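The weighted fusion in enhancement 1 can be pictured with a minimal, dependency-free sketch. The facet names, the pairing of each candidate vector with one JD facet, and the weights below are assumptions for illustration; the README does not state the actual values, and the real layer uses transformer embeddings rather than toy vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical fusion weights (not taken from the repository).
FACET_WEIGHTS = {"skills": 0.4, "trajectory": 0.3, "profile": 0.3}

def fused_semantic_fit(candidate_vecs, jd_vecs, weights=FACET_WEIGHTS):
    """Weighted average of per-facet cosine similarities."""
    total = sum(weights.values())
    weighted = sum(w * cosine(candidate_vecs[k], jd_vecs[k]) for k, w in weights.items())
    return weighted / total

# Toy 2-d vectors standing in for sentence embeddings.
candidate = {"skills": [1.0, 0.0], "trajectory": [0.0, 1.0], "profile": [1.0, 1.0]}
jd = {"skills": [1.0, 0.0], "trajectory": [1.0, 0.0], "profile": [1.0, 1.0]}
fit = fused_semantic_fit(candidate, jd)  # skills match, trajectory is orthogonal, profile matches
```

Keeping the three similarities separate before fusing also makes it possible to say which facet drove the match, which is what the reasoning strings rely on (for example "esp. experience trajectory").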
| Capability | Base model | Opus 4.8 (this repo) |
|---|---|---|
| Features | 78 base | 93 (+anomaly +behavioral +semantic) |
| Relevance signal | Ensemble class-prob vote | Ensemble + LambdaMART LTR fusion |
| Job understanding | Hardcoded role | Parsed from any JD (--jd) |
| Semantic matching | — | Multi-vector transformer + FAISS |
| Behavioral signals | Partial (a few) | Full demand/OSS/reliability sub-scores |
| Anomaly handling | 1.05x / 1.5x heuristics | Multi-signal scored detector |
| Scoring at scale | Retrain each run | Fast inference (load saved models) |
| Macro-F1 (5-fold) | 0.731 | 0.794 |
| | SHRE (primary) | CTAE (fallback) |
|---|---|---|
| Type | ML ensemble + LTR + semantics | Pure-Python rule engine |
| Dependencies | xgboost, lightgbm, catboost, transformers… | None (standard library only) |
| When it runs | Default | If any SHRE stage / import fails |
| Output | Identical CSV schema | Identical CSV schema |
| Purpose | Maximum ranking quality | Never fail to produce a shortlist |
The system processes candidate data through four stages:
- Stage 1 (Anomaly Pre-Filter): AnomalyDetector drops synthetic/honeypot profiles (timeline, skill, and synthetic anomalies), then gates on a JD-driven experience band (parsed from the supplied job description, or the default 5–9 target) and a minimum of 2 skill pillars.
- Stage 2 (Enriched Feature Engineering): Computes 93 dense signals = 78 base (career progression, domain specialization in RAG/LLMs/Vector DBs, company classification, platform interactions) + 5 anomaly + 5 behavioral + 5 multi-vector semantic features.
- Stage 3 (Ensemble + Learning-to-Rank): A Voting Ensemble (XGBoost + LightGBM + CatBoost), trained with leakage-safe SMOTE inside CV, produces a class-probability score that, with the enriched features, feeds a LambdaMART (XGBoost rank:ndcg) head; the two are fused into the final ranking score.
- Stage 4 (Ranker & Reasoning): Sorts the pool and builds data-backed, non-hallucinated reasoning (citing semantic fit, behavioral signals, and anomaly checks) for each of the top 100. Emits the canonical submission.csv, an enriched submission_detailed.csv, and a formatted submission.xlsx.
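As a rough illustration of the Stage 1 gate described above, the sketch below keeps a profile only if it is not flagged synthetic, sits inside the experience band, and covers at least 2 skill pillars. The field names (`is_synthetic`, `years_experience`, `skill_pillars`) are hypothetical, and the 3 to 15 year default bounds are borrowed from the sample run log; whether the real gate uses those bounds or the narrower target band is an assumption here.

```python
from dataclasses import dataclass

@dataclass
class ExperienceBand:
    # Assumed defaults, taken from the "exp 3-15y" line of the sample console output.
    min_years: float = 3.0
    max_years: float = 15.0

def passes_stage1(profile, band=ExperienceBand(), min_pillars=2):
    """Return True if a profile survives the (sketched) Stage 1 gate."""
    if profile.get("is_synthetic", False):
        return False  # anomaly detector flagged the profile
    years = profile.get("years_experience", 0.0)
    if not (band.min_years <= years <= band.max_years):
        return False  # outside the JD-driven experience band
    return len(set(profile.get("skill_pillars", []))) >= min_pillars

pool = [
    {"id": "a", "years_experience": 6.9, "skill_pillars": ["rag", "vector_db"]},
    {"id": "b", "years_experience": 1.0, "skill_pillars": ["rag", "llm"]},
    {"id": "c", "years_experience": 7.0, "skill_pillars": ["llm"]},
    {"id": "d", "years_experience": 8.0, "skill_pillars": ["rag", "llm"], "is_synthetic": True},
]
viable = [p["id"] for p in pool if passes_stage1(p)]
```

Passing a different `ExperienceBand` is how a custom JD would tighten or loosen the gate in this sketch, without touching the rest of the pipeline.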
If any library or model load fails, the pipeline automatically falls back: LTR → validated ensemble → pure-Python CTAE ranker.
By default Stage 3 runs in fast inference mode — it loads the saved ensemble + LambdaMART artifacts and scores the pool with no retraining (the path that scales to a 100k+ pool). --train forces a full retrain, and --jd re-targets the role (see How to Run).
flowchart TD
A["candidates.jsonl<br/>(100k+ raw profiles)"] --> B
subgraph S1["Stage 1 - Anomaly Pre-Filter"]
B["AnomalyDetector<br/>timeline / skill / synthetic checks"] --> C{"Synthetic?<br/>or out of JD<br/>experience band?<br/>or < 2 pillars?"}
end
C -- "drop" --> X["filtered out"]
C -- "keep" --> D
subgraph S2["Stage 2 - Feature Engineering (93 signals)"]
D["78 base features"] --> E["+5 anomaly +5 behavioral<br/>+5 multi-vector semantic"]
end
JD["Job Description<br/>(--jd: text or file)"] -. "facets + experience band" .-> B
JD -. "3 JD facets" .-> E
E --> F
subgraph S3["Stage 3 - Ensemble + Learning-to-Rank"]
F["Voting Ensemble<br/>XGBoost + LightGBM + CatBoost<br/>(leakage-safe SMOTE in CV)"] --> G["ensemble score<br/>(meta-feature)"]
G --> H["LambdaMART<br/>XGBRanker rank:ndcg"]
H --> I["Rank Fusion<br/>0.6 ensemble + 0.4 LTR"]
end
I --> J
subgraph S4["Stage 4 - Ranker & Reasoning"]
J["Sort + grounded reasoning"] --> K["submission.csv (Top 100)<br/>submission_detailed.csv<br/>submission.xlsx + rankings_full.csv"]
end
F -. "on failure" .-> CTAE["CTAE fallback<br/>(pure-Python rule ranker)"]
H -. "on failure" .-> F
CTAE --> K
classDef io fill:#0b3d2e,stroke:#10b981,color:#d1fae5;
classDef jd fill:#3b1f4b,stroke:#a855f7,color:#f3e8ff;
class A,K io;
class JD jd;
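The "Rank Fusion" node in the diagram above (0.6 ensemble + 0.4 LTR) can be sketched as a weighted sum. The min-max normalisation step is an assumption added here so that unbounded ranker scores and class probabilities share a scale; the repository may normalise differently.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def fuse(ensemble_scores, ltr_scores, w_ens=0.6, w_ltr=0.4):
    """Weighted sum of normalised ensemble and LTR scores (weights from the diagram)."""
    ens = min_max(ensemble_scores)
    ltr = min_max(ltr_scores)
    return [w_ens * e + w_ltr * l for e, l in zip(ens, ltr)]

ensemble = [0.90, 0.40, 0.10]   # class-probability scores
ltr = [2.0, 3.0, -1.0]          # raw ranker margins (unbounded)
fused = fuse(ensemble, ltr)
order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

In this toy case the LTR head prefers the second candidate, but the heavier ensemble weight keeps the first candidate on top.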
flowchart LR
LTR["LambdaMART LTR head"] -->|fails| ENS["Validated Voting Ensemble"]
ENS -->|fails| CTAE["Pure-Python CTAE ranker"]
LTR -.->|preferred| OUT(["Top-100 shortlist"])
ENS -.-> OUT
CTAE -.-> OUT
classDef ok fill:#0b3d2e,stroke:#10b981,color:#d1fae5;
class OUT ok;
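The fallback chain in the diagram above boils down to "try each ranker in order and keep the first result". The following is a minimal sketch of that pattern; the three ranker functions are stand-ins invented for the example, not the repository's actual entry points.

```python
import logging

logger = logging.getLogger("shre.fallback")

def rank_with_fallback(pool, rankers):
    """Try each (name, ranker) pair in order; return the first successful result."""
    last_error = None
    for name, ranker in rankers:
        try:
            return name, ranker(pool)
        except Exception as exc:  # any stage or import failure moves to the next tier
            logger.warning("%s failed (%s); falling back", name, exc)
            last_error = exc
    raise RuntimeError("all rankers failed") from last_error

# Hypothetical stand-ins for the three tiers.
def ltr_ranker(pool):
    raise ImportError("xgboost not installed")  # simulate a missing dependency

def ensemble_ranker(pool):
    return sorted(pool, key=lambda c: c["score"], reverse=True)

def ctae_ranker(pool):
    return sorted(pool, key=lambda c: c["years"], reverse=True)

pool = [{"id": "a", "score": 0.2, "years": 7}, {"id": "b", "score": 0.9, "years": 5}]
used, ranked = rank_with_fallback(
    pool, [("LTR", ltr_ranker), ("ensemble", ensemble_ranker), ("CTAE", ctae_ranker)]
)
```

Because every tier returns the same shape of result, the caller can write the same CSV schema regardless of which tier ended up producing the ranking.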
The target role is no longer hardcoded. src/shre/job_description.py parses any raw job description — text or file — into the three semantic facets the engine matches against, plus an experience band that re-targets the Stage-1 gate.
Define the role: paste a job description to re-target the experience gate and the semantic-fit signal.
flowchart LR
R["Raw JD text"] --> P["JobDescription parser<br/>(section + regex heuristics)"]
P --> F1["required_skills facet"]
P --> F2["ideal_experience facet"]
P --> F3["role_mission facet"]
P --> B["experience band<br/>min / max / target years"]
F1 & F2 & F3 --> SEM["Multi-Vector Semantic Layer"]
B --> GATE["Stage-1 experience gate"]
Example: a Staff ML Engineer (8–12 yrs) JD tightens the experience gate and re-anchors semantic fit, so a different shortlist surfaces than the default Founding Senior AI Engineer (5–9 yrs) role — without retraining.
The parsed role flows all the way through to the deliverable. The JD parser also extracts a role title, which (with the experience band) is threaded into Stage 4 so the ranked submission.xlsx subtitle and run logs name the actual role being hired for (e.g. "Staff Machine Learning Engineer · 8–12 yrs target") instead of always reading "Founding Senior AI Engineer". If no title is detectable it falls back to a neutral "Custom Role (from JD)" rather than mislabelling the canonical role.
The supervised ensemble is still trained on labels for the founding-engineer role. A custom JD re-targets the JD-relative semantic-fit signal and the hard experience gate; the learned "higher fit implies higher relevance" relationship is what transfers across roles. This trade-off is documented honestly rather than hidden.
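To make the "regex heuristics" idea concrete, here is a small sketch of pulling a target experience band out of raw JD text. The pattern and the function name `parse_target_band` are illustrative assumptions; the actual parser in src/shre/job_description.py may use different rules and also derives facets and a role title, which this sketch does not attempt.

```python
import re

# Matches ranges such as "8-12 yrs", "8–12 years" or "5 to 9 years".
_BAND = re.compile(
    r"(\d+)\s*(?:-|–|to)\s*(\d+)\s*\+?\s*(?:years|yrs|yr)\b",
    re.IGNORECASE,
)

def parse_target_band(jd_text, default=(5, 9)):
    """Return (low, high) target years from JD text, or the default band if none is found."""
    match = _BAND.search(jd_text)
    if not match:
        return default
    low, high = int(match.group(1)), int(match.group(2))
    return (low, high) if low <= high else (high, low)

staff_band = parse_target_band("Staff ML Engineer with 8–12 yrs of production ML experience")
worded_band = parse_target_band("We expect 5 to 9 years building LLM systems")
fallback_band = parse_target_band("Join our founding team")
```

Falling back to the default 5 to 9 band when nothing is detected mirrors the documented behaviour of leaving the canonical role unchanged when no JD is supplied.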
The training path re-fits the full XGB+LGBM+CatBoost ensemble and the LambdaMART head — great for refreshing the model, but the opposite of lightning-fast on a 100k pool.
| Mode | Trigger | What happens | Use when |
|---|---|---|---|
| Inference (default) | saved artifacts exist | loads ensemble + LTR + scaler/selector, scores the pool — no retraining | scoring a large/real pool fast |
| Train | --train flag or missing models | full leakage-safe retrain, then score + persist artifacts | features/labels changed |
# Fast inference (default): scores with the saved models, no retraining
python -m src.main data/candidates.jsonl output/submission.csv
# Force a full retrain
python -m src.main data/candidates.jsonl output/submission.csv --train
If inference fails (missing/incompatible artifacts), it automatically falls back to the full training path, then to CTAE.
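The mode table above amounts to a small decision: train when asked to or when artifacts are missing, otherwise run inference. A minimal sketch follows; the artifact file names are hypothetical, since the README only says that models/ holds the saved ensemble, LTR head, scaler and selector.

```python
import tempfile
from pathlib import Path

# Hypothetical artifact names, for illustration only.
REQUIRED_ARTIFACTS = ("ensemble.pkl", "ltr.pkl", "scaler.pkl", "selector.pkl")

def choose_mode(models_dir, force_train=False):
    """Return "train" if forced or any artifact is missing, else "inference"."""
    if force_train:
        return "train"
    missing = [name for name in REQUIRED_ARTIFACTS if not (Path(models_dir) / name).is_file()]
    return "train" if missing else "inference"

# Demonstrate the three cases against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    mode_empty = choose_mode(tmp)                      # nothing saved yet
    for name in REQUIRED_ARTIFACTS:
        (Path(tmp) / name).write_bytes(b"")
    mode_ready = choose_mode(tmp)                      # all artifacts present
    mode_forced = choose_mode(tmp, force_train=True)   # equivalent of --train
```

Checking for every artifact up front, rather than failing halfway through scoring, keeps the decision cheap; a load error on a corrupt file would still need the runtime fallback described above.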
To set up the environment and install all dependencies:
pip install -r requirements.txt
Note on the semantic encoder. The transformer stack needs huggingface-hub < 1.0 (pinned in requirements.txt); newer hub releases break transformers/sentence-transformers and silently degrade the layer to TF-IDF. scikit-learn is pinned < 1.6 to match the bundled model artifacts and the XGBoost/LightGBM sklearn wrappers, and numpy is pinned < 2.0 because the saved pickles and scikit-learn 1.5.x were built against the numpy 1.x ABI.
Run the end-to-end pipeline to process candidates and output the final rankings:
python -m src.main data/candidates.jsonl output/submission.csv
Trying it out of the box. The full data/candidates.jsonl pool is not shipped (it is git-ignored). A ready-to-run 498-profile sample is bundled, so you can reproduce a complete run immediately:
python -m src.main data/candidates_demo.jsonl output/submission.csv
By default the pipeline runs in fast inference mode when trained artifacts
exist in models/ — it loads the saved ensemble + LambdaMART head and scores
the pool without any retraining, which is what makes large-pool ranking
lightning-fast. To force a full retrain (e.g. after changing features or
labels), add --train:
python -m src.main data/candidates.jsonl output/submission.csv --train
Deep Job Understanding (custom JD). The target role is no longer hardcoded.
Pass any job description as raw text or a file with --jd; it is parsed into
the three semantic facets (skills / experience / mission) and an experience
band, which re-target the Stage-1 gate and the semantic-fit signal:
python -m src.main data/candidates.jsonl output/submission.csv --jd data/sample_jd.txt
If --jd is omitted, the canonical "Founding Senior AI Engineer" role (the role
the 498 labels were judged for) is used, so behaviour is unchanged. When a
custom JD is supplied, the parsed role title and target band also flow into
the run logs and the submission.xlsx subtitle, so the deliverable names the
exact role it ranked for.
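Since --jd accepts either raw text or a file, one plausible way to wire it is to treat the value as a path when such a file exists and as literal text otherwise. The sketch below is an assumption about how that could look with argparse; it is not the code in src/main.py, and `resolve_jd` is a hypothetical helper.

```python
import argparse
from pathlib import Path

def build_parser():
    """CLI shape taken from the commands in this README."""
    parser = argparse.ArgumentParser(prog="src.main")
    parser.add_argument("candidates", help="input candidates.jsonl")
    parser.add_argument("output", help="output CSV path")
    parser.add_argument("--train", action="store_true", help="force a full retrain")
    parser.add_argument("--jd", default=None, help="job description: raw text or a file path")
    return parser

def resolve_jd(value):
    """Return JD text: file contents if value names an existing file, else the raw string."""
    if value is None:
        return None  # caller falls back to the canonical role
    try:
        path = Path(value)
        if path.is_file():
            return path.read_text(encoding="utf-8")
    except OSError:
        pass  # e.g. the text is too long to be a valid path
    return value

args = build_parser().parse_args(
    ["data/candidates.jsonl", "output/submission.csv", "--jd", "Staff ML Engineer, 8-12 yrs"]
)
jd_text = resolve_jd(args.jd)
```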
Run the enhanced end-to-end test (modules, enrichment, LTR pipeline, CTAE fallback) and the ablation study:
python test_enhanced.py
python analysis/ablation_enhanced.py
The original base test suite is still available via python test_pipeline.py.
Run the Streamlit application to upload candidate batches and interactively view profiles, scores, and rationales (with an optional JD text box):
streamlit run sandbox/app.py
Upload a candidates.jsonl batch; the pipeline filters, scores, and ranks the pool.
Every run writes the ranked shortlist as both an Excel workbook and CSVs to the output directory:
| File | Format | Columns / Sheets | Purpose |
|---|---|---|---|
| submission.xlsx | XLSX | Sheets: Top 100, Full Rankings, Summary | The primary ranked deliverable: formatted recommended shortlist |
| submission.csv | CSV | candidate_id, rank, score, reasoning | Canonical Top-100 (clean 4 columns) |
| submission_detailed.csv | CSV | …, semantic_fit, behavioral_score, anomaly_score, anomaly_flags, reasoning | Top-100 with the enriched signals exposed |
| rankings_full.csv | CSV | candidate_id, rank, score, reasoning | Every viable candidate, fully ranked |
The XLSX name is derived from the output path you pass (e.g. output/submission.csv produces output/submission.xlsx) and is generated on both the SHRE and CTAE paths. If openpyxl is unavailable the CSVs still write (the workbook is skipped gracefully).
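Deriving the workbook name and skipping it gracefully can be sketched in a few lines. The helper names are hypothetical; only the behaviour (same path with an .xlsx suffix, CSVs unaffected when openpyxl is missing) comes from the description above.

```python
from pathlib import Path

def xlsx_path_for(csv_path):
    """Swap the output file's suffix for .xlsx, keeping directory and stem."""
    return Path(csv_path).with_suffix(".xlsx")

def openpyxl_available():
    """True if the optional workbook dependency can be imported."""
    try:
        import openpyxl  # noqa: F401
        return True
    except ImportError:
        return False

target = xlsx_path_for("output/submission.csv")
if openpyxl_available():
    print(f"would write workbook to {target}")
else:
    print("openpyxl missing: CSV outputs only")
```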
Summary metrics: candidates shortlisted, top score, and average semantic / behavioral scores.
Ranked shortlist with progress-bar scores, enrichment signals, grounded reasoning, and CSV / XLSX downloads.
A professionally-formatted Excel workbook built for reviewers:
| Sheet | Contents |
|---|---|
| Top 100 | Recommended shortlist — rank · candidate_id · score · semantic_fit · behavioral_score · anomaly_score · anomaly_flags · reasoning |
| Full Rankings | Every viable candidate, same columns |
| Summary | Run statistics — top/mean score, avg semantic / behavioral / anomaly of the shortlist |
Formatting: indigo title banner with role + timestamp + score definition, styled header, banded rows and borders, a red to amber to green colour scale on the score column, frozen header, an auto-filter for sort/filter, 4-decimal number formats, and a wrapped reasoning column.
Example console output (sample run, inference mode):
=== RUNNING SHRE (Enhanced ML Pipeline - 'Opus 4.8' Grade) ===
JD[default] 'Founding Senior AI Engineer' exp 3-15y (target 5-9y); facets: skills=348 chars, mission=264 chars
Stage 1: Filtered 498 down to 293 viable candidates.
Stage 2: Extracted 93 enriched features.
Stage 3: Inference mode (scoring with saved models, no retraining).
- Ranking head: LambdaMART fused with ensemble (inference)
Writing top 100 to output/submission.csv...
Writing ranked XLSX to output/submission.xlsx... Done!
With a custom --jd, the first line instead names the parsed role, e.g. JD[file] 'Staff Machine Learning Engineer' exp 6-18y (target 8-12y); ….
Example shortlist rows (real output — reasoning is generated from actual profile data, never hallucinated):
| rank | candidate_id | score | reasoning (truncated) |
|---|---|---|---|
| 1 | CAND_0072688 | 1.000 | Data Scientist, 6.9 yrs at Niramai, specializing in vector search and RAG (Milvus); strong semantic alignment to the JD (esp. experience trajectory); very high recruiter responsiveness; high recruiter demand… |
| 2 | CAND_0044890 | 0.596 | AI Research Engineer, 5.0 yrs at Haptik, vector search & RAG (FAISS); strong semantic JD alignment; active GitHub presence; high recruiter demand. |
| 3 | CAND_0030061 | 0.409 | Data Analyst, 5.3 yrs at Ola, applied ML (Python); strong semantic JD alignment; active GitHub; reliable follow-through. |
| Configuration | Features | Accuracy | Macro-F1 |
|---|---|---|---|
| Base (78 features) | 78 | 0.833 | 0.731 |
| + Anomaly + Behavioral | 88 | 0.845 | 0.766 |
| + Semantic (full) | 93 | 0.866 | 0.794 |
| Metric | Validation | Test (held-out) |
|---|---|---|
| Accuracy | 0.907 | 0.880 |
| Macro Precision | 0.883 | 0.823 |
| Macro Recall | 0.871 | 0.876 |
| Macro F1 | 0.874 | 0.834 |
| Ranking metric | Score |
|---|---|
| NDCG@10 (fused) | 0.991 |
| NDCG@100 (fused) | 0.997 |
| Pure LambdaMART NDCG@10 | 0.974 |
| Hard-slice NDCG@10 (relevance 1 vs 2) | 1.000 |
| Spearman (full held-out fold) | 0.894 |
Test-set confusion matrix (rows = true, cols = predicted; classes 0–3):
| | Pred 0 | Pred 1 | Pred 2 | Pred 3 |
|---|---|---|---|---|
| True 0 | 38 | 4 | 0 | 0 |
| True 1 | 0 | 14 | 1 | 0 |
| True 2 | 0 | 1 | 8 | 3 |
| True 3 | 0 | 0 | 0 | 6 |
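The held-out test metrics reported above can be recomputed directly from this confusion matrix. The short sketch below does that with plain Python; the function name is ours, but the matrix values are copied from the table.

```python
# Rows = true class, columns = predicted class (classes 0-3), from the table above.
matrix = [
    [38, 4, 0, 0],
    [0, 14, 1, 0],
    [0, 1, 8, 3],
    [0, 0, 0, 6],
]

def macro_metrics(cm):
    """Accuracy and macro-averaged precision/recall/F1 from a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }

metrics = {name: round(value, 3) for name, value in macro_metrics(matrix).items()}
print(metrics)
# {'accuracy': 0.88, 'macro_precision': 0.823, 'macro_recall': 0.876, 'macro_f1': 0.834}
```

The result matches the Test column of the metrics table: 66 of 75 samples are correct, and the weakest per-class figures are recall on Class 2 (8 of 12) and precision on Class 3 (6 of 9).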
Because the 498 labels are cleanly rule-separable, full-set NDCG is near-ceiling and overstates ranking quality. We therefore also report two harder, more discriminating diagnostics on every training run:
- Hard-slice NDCG@10 — restricted to borderline candidates (relevance 1 vs 2), removing the trivially-separable 0 and 3 classes.
- Spearman rank correlation over the full held-out fold (≈ 0.89) — clearly sub-ceiling, and the most honest measure of how well the engine orders the confusable middle.
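For readers who want to see what the two diagnostics measure, here is a dependency-free sketch of NDCG@k on a hard slice and of Spearman rank correlation. The exponential gain (2^rel - 1) and average-rank tie handling are assumptions; the repository's exact metric code may differ, and the scores and labels are toy data.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain with exponential gain (an assumed convention)."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances_in_ranked_order, k):
    ideal = dcg_at_k(sorted(relevances_in_ranked_order, reverse=True), k)
    return dcg_at_k(relevances_in_ranked_order, k) / ideal if ideal else 0.0

def hard_slice(scores, labels, keep=(1, 2)):
    """Labels of borderline candidates only, ordered by descending model score."""
    pairs = sorted(((s, l) for s, l in zip(scores, labels) if l in keep), key=lambda p: p[0], reverse=True)
    return [label for _, label in pairs]

def _average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.1]   # toy model scores
labels = [3, 2, 1, 2, 1, 0]               # toy relevance labels (0-3)
full_ndcg = ndcg_at_k([l for _, l in sorted(zip(scores, labels), reverse=True)], 10)
hard_ndcg = ndcg_at_k(hard_slice(scores, labels), 10)
rho = spearman(scores, labels)
```

In the toy data the single misordered pair sits between classes 1 and 2, so the hard-slice NDCG drops further below 1 than the full-set NDCG does: the easy 0 and 3 classes dilute the mistake in the full list.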
- Primary Model: Voting Ensemble (XGBoost + LightGBM + CatBoost) + LambdaMART LTR head
- Semantic Encoder: sentence-transformers/all-MiniLM-L6-v2 + FAISS (TF-IDF fallback)
- Fallback Model: Rule-based CTAE Ranker (Pure Python, zero-dependency)
Reproduce: python test_enhanced.py (end-to-end) and python analysis/ablation_enhanced.py (ablation).
A 9-phase, leakage-free validation suite (analysis/) regenerates every figure from the phase summary artifacts in analysis_results/ — no number is typed by hand. Highlights below; full write-up in analysis_results/COMPETITION_REPORT.md.
The validation accuracy plateaus at the full 498 samples — the core signal is captured, with only mild train/val overfitting mitigated by soft-voting.
| Train data used | Samples | Train acc | Val acc | Val F1 |
|---|---|---|---|---|
| 20% | 99 | 0.997 | 0.838 | 0.658 |
| 40% | 199 | 1.000 | 0.809 | 0.714 |
| 60% | 298 | 0.999 | 0.849 | 0.793 |
| 80% | 398 | 0.998 | 0.857 | 0.793 |
| 100% | 498 | 0.996 | 0.868 | 0.806 |
Signal concentrates in profile depth (summary_length), skill depth (avg_skill_duration_months), and domain longevity (domain_x_years).
Global and local SHAP attributions explain why each candidate scores as it does — high ideal_years_score and domain_llm_score push toward ideal hire; long notice periods push down.
Combining all feature categories beats any single group; the ensemble soft-votes for lower variance.
Per-model comparison (5-fold) — XGBoost leads individually; the ensemble trades a hair of accuracy for stability:
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| XGBoost | 0.855 | 0.780 | 0.803 | 0.787 |
| LightGBM | 0.839 | 0.760 | 0.763 | 0.756 |
| CatBoost | 0.837 | 0.771 | 0.812 | 0.782 |
| Ensemble (soft-vote) | 0.851 | 0.774 | 0.788 | 0.774 |
Feature-group ablation — every group adds signal; "All features" wins, and engagement-only is the weakest standalone (confirming behavioral signals help but aren't sufficient alone):
| Feature group | # Features | Accuracy | F1 |
|---|---|---|---|
| All features | 62 | 0.845 | 0.766 |
| technical | 31 | 0.767 | 0.643 |
| other | 7 | 0.757 | 0.637 |
| experience | 12 | 0.753 | 0.662 |
| interaction | 8 | 0.751 | 0.644 |
| engagement | 6 | 0.562 | 0.465 |
- Stability: Acc 85.9% ± 3.0%, Macro-F1 78.9% ± 4.3% over 10x5 repeated stratified CV; rated ACCEPTABLE (coefficient of variation 3.5%).
- Honeypot detection: 71.6% overall; 100% on structural anomalies (flat / impossible-skills / random-noise) and 0% on keyword-stuffing, which is left to a Stage-1 keyword-density rule by design rather than to the statistical detector.
- Ranking on holdout: NDCG@100 0.9591, Hit@5 / Hit@10 = 100%.
Honeypot detection by attack type (250 synthetic adversarial profiles):
| Attack type | Detected | Verdict |
|---|---|---|
| Flat_Profile | 100% | Caught outright |
| Impossible_Skills | 100% | Caught outright |
| Random_Noise | 100% | Caught outright |
| Minimal_Profile | 58% | Partial (near Class-0/1 boundary) |
| Keyword_Stuffing | 0% | Missed (needs Stage-1 keyword-density cap) |
Only 7 / 75 held-out samples misclassified (9.3% error), concentrated on the Class 2↔3 ideal-hire boundary and rich-but-shallow Class 0→1 profiles.
|-- requirements.txt # Main project dependencies (pinned for reproducibility)
|-- submission_metadata.yaml # Submission metadata
|-- README.md # This file
|-- src/
| |-- main.py # Pipeline entry (inference by default; --train, --jd)
| |-- shre/
| | |-- job_description.py # Deep JD understanding: parse JD -> facets + exp band
| | |-- stage1_filter.py # Anomaly pre-filter + JD-driven experience/pillar gates
| | |-- anomaly.py # Feature 3: Enhanced honeypot/anomaly detection
| | |-- behavioral.py # Feature 4: Behavioral scoring module
| | |-- semantic.py # Feature 1: Multi-vector semantic layer (+ FAISS), JD-aware
| | |-- stage2_features.py # 78 base features + enrichment pass (-> 93)
| | |-- stage3_ranking_validated.py # Voting ensemble (leakage-safe SMOTE)
| | |-- stage3_ranking_ltr.py # Feature 2: LambdaMART/XGBoost-LTR head (+ honest metrics)
| | |-- inference.py # Fast inference-only scoring (no retraining)
| | |-- stage4_submit.py # Ranked top-100 + enriched reasoning + XLSX export
| |-- ctae/ # Fallback rule-based engine
| |-- common/ # Config, data loader, validator, logging
|-- analysis/ # 9-phase scientific validation suite (+ ablation_enhanced.py)
|-- analysis_results/ # Regenerated charts + COMPETITION_REPORT.md
|-- validation/ # Independent validation harness
|-- labeling/ # 498 labeled examples (combined_labels.json) + guide
|-- test_enhanced.py # Enhanced end-to-end test (18 checks)
|-- models/ # Trained models, scalers, selectors, LTR, encoder & metadata
|-- sandbox/ # Streamlit web UI code
|-- data/ # Candidate schema, samples, sample JD
| Layer | Technology |
|---|---|
| Language | Python 3.9+ |
| Gradient boosting | XGBoost · LightGBM · CatBoost (soft-voting ensemble) |
| Learning-to-Rank | XGBoost XGBRanker (rank:ndcg / LambdaMART) |
| Semantics | sentence-transformers/all-MiniLM-L6-v2 + FAISS (TF-IDF fallback) |
| Class balance | imbalanced-learn SMOTE (inside CV folds) |
| Explainability | SHAP, permutation importance |
| App / Demo | Streamlit (deployed as a Hugging Face Space) |
| Output | openpyxl (formatted ranked submission.xlsx) + CSV |
| Fallback | Pure-Python CTAE rule engine (zero dependency) |
| AI models used | GLM 5.2 · Claude |
We report limitations openly — each is paired with a concrete mitigation path.
| # | Limitation | Why it matters | Mitigation |
|---|---|---|---|
| 1 | Keyword-stuffing susceptibility | The statistical model can be swayed by keyword-padded profiles that inflate apparent relevance. | Enforce a hard, rule-based keyword-density ceiling in the Stage-1 filter (before scoring). |
| 2 | Class-3 data scarcity | Only 38 labeled ideal-hire samples limit visibility into the top class, widening uncertainty at the very top of the ranking. | Run active-learning cycles to label 50+ candidates near the Class 2/3 boundary. |
| 3 | Single-role supervision | Labels were judged for the founding-engineer role. A custom --jd re-targets the semantic + gate signals, but the supervised relevance model itself stays role-anchored. | Collect labels per role family, or train a JD-conditioned ranker. |
| 4 | Small held-out set (75 samples) | The 90.7% point estimate carries a non-trivial confidence interval. | Trust the 50-run stability band (85.9% ± 3.0%) as the more reliable expectation; expand the held-out set as more labels arrive. |
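The keyword-density ceiling proposed for limitation 1 could look roughly like the sketch below. Everything in it is hypothetical: the tokenizer, the keyword set, and the 0.35 cap are illustrative choices, not values used by the repository.

```python
import re

def keyword_density(text, keywords):
    """Fraction of tokens in text that are JD keywords (keywords: lowercase single tokens)."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in keywords) / len(tokens)

def exceeds_density_cap(text, keywords, cap=0.35):
    """Flag profiles whose keyword share is implausibly high (cap is an assumed value)."""
    return keyword_density(text, keywords) > cap

jd_keywords = {"rag", "llm", "faiss", "vector"}
stuffed = "RAG LLM FAISS RAG LLM vector RAG LLM FAISS"
natural = "Built a RAG service with FAISS and shipped it to production for search teams"
flags = (exceeds_density_cap(stuffed, jd_keywords), exceeds_density_cap(natural, jd_keywords))
```

A hard rule like this runs before scoring, so a padded profile never reaches the statistical model; the cap would need tuning on real profiles to avoid penalising legitimately dense skill lists.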
In short: the engine is statistically sound, stable, and explainable today — with leakage control and a graceful fallback — and every known gap above has a clear, low-cost path forward.
| Member | |
|---|---|
| 👨💻 Aditya Pandey | Pipeline, ML & deployment |
| 👩💻 Palak Rai | Team member |
| 👨💻 Avik Srivastava | Team member |
Project links
| Resource | Where |
|---|---|
| 🤗 Live demo | Staged Hybrid Ranking Engine (SHRE) — a Hugging Face Space by Aditya1002 |
| 🐙 GitHub | see submission_metadata.yaml |
| 🧪 Sandbox | Streamlit Space (link in submission_metadata.yaml) |















