Redrob Ranker

Hybrid retrieval (dense + BM25) → engineered features → LightGBM LambdaRank trained on LLM-generated silver labels. All LLM/network work is offline; rank.py is CPU-only, no-network, and finishes within the 5 min / 16 GB budget.

See PLAN.md for the full architecture and rationale.

Setup

python -m venv .venv && source .venv/bin/activate     # Windows: .\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
# place the data at data/candidates.jsonl (100000 lines)

Build artifacts (offline, may exceed 5 min — network + LLM allowed here)

The offline scripts use the Anthropic SDK and need ANTHROPIC_API_KEY set. Run them as modules (relative imports):

python -m src.jd_parse        # JD -> artifacts/jd_rubric.json (+ requirement probes)
python -m src.embed           # -> embeddings.npy, faiss.index, bm25.pkl, ids.npy, jd_vectors.npy, features.parquet
python -m src.silver_labels   # LLM grades a sample -> artifacts/silver_labels.parquet
python -m src.train_ranker    # -> artifacts/ranker.txt (+ ranker_features.json)
python src/rank.py --candidates ./data/candidates.jsonl --out ./submission.csv  # first pass -> artifacts/top100.json
python -m src.reasoning       # grounded reasoning for the top 100 -> artifacts/reasoning.parquet

Produce the submission (the timed step organizers reproduce — ≤5 min, CPU, NO network)

python src/rank.py --candidates ./data/candidates.jsonl --out ./submission.csv

Validate before uploading

bash scripts/validate.sh ./submission.csv

Reproduce in Docker (matches Stage-3 environment)

docker build -t redrob-ranker .
docker run --rm --network none -m 16g -v "$PWD/data:/app/data" \
  redrob-ranker python src/rank.py --candidates ./data/candidates.jsonl --out ./submission.csv

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
artifacts		artifacts
data		data
eval		eval
scripts		scripts
src		src
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
PLAN.md		PLAN.md
README.docx		README.docx
README.md		README.md
candidate_schema.json		candidate_schema.json
conftest.py		conftest.py
job_description.docx		job_description.docx
redrob_signals_doc.docx		redrob_signals_doc.docx
requirements.txt		requirements.txt
sample_candidates.json		sample_candidates.json
sample_submission.csv		sample_submission.csv
submission_metadata.yaml		submission_metadata.yaml
submission_metadata_template.yaml		submission_metadata_template.yaml
submission_spec.docx		submission_spec.docx
validate_submission.py		validate_submission.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Redrob Ranker

Setup

Build artifacts (offline, may exceed 5 min — network + LLM allowed here)

Produce the submission (the timed step organizers reproduce — ≤5 min, CPU, NO network)

Validate before uploading

Reproduce in Docker (matches Stage-3 environment)

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Redrob Ranker

Setup

Build artifacts (offline, may exceed 5 min — network + LLM allowed here)

Produce the submission (the timed step organizers reproduce — ≤5 min, CPU, NO network)

Validate before uploading

Reproduce in Docker (matches Stage-3 environment)

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages