# 🤖 Intelligent Resume Ranking System Powered by Machine Learning & Big Data, with Test Evaluation using Apache Spark ✨
Features • Installation • Quick Start • Usage • Results
## 🤝 Team Members
- 🎓 Manikesh Kumar, 23BDS032
- 🎓 Amarjeet Raj, 23BDS006
- 🎓 Ojas Jogdand, 23BDS039
## 📑 Table of Contents

- ✨ Features
- 📦 Prerequisites
- 🔧 Installation
- 🚀 Quick Start
- 📖 Usage Guide
- 📊 Project Structure
- 🎯 Expected Outcomes
- 🔍 Cheat Detection
- 📝 Examples
- 🤝 Contributing
- 📄 License
## ✨ Features

- 🎯 **Resume Ranking**: Automatically rank resumes based on job description match
- 🧠 **ML-Powered Matching**: Uses machine learning to extract and compare features
- ⚡ **Distributed Processing**: Leverages Apache Spark for large-scale data processing
- 🔎 **Cheat Detection**: Identify suspicious quiz attempts with statistical analysis
- 📊 **Data Analytics**: Comprehensive analytics on website logs and user behavior
- 🎓 **Interactive Labs**: Jupyter Notebook labs for learning Apache Spark fundamentals
- 💾 **CSV Export**: Export ranked results and analytics to CSV format
## 📦 Prerequisites

Before you begin, ensure you have the following installed:

- 🐍 Python >= 3.8
- ☕ Java Development Kit (JDK) >= 8
- 📦 pip (Python package manager)
- 💻 Git
## 🔧 Installation

**1. Clone the repository:**

```bash
git clone https://github.com/DataScience-ArtificialIntelligence/Resume_Screening_Test_Evaluation.git
cd AI-Resume-Ranker
```

**2. Create and activate a virtual environment:**

Windows:

```bash
python -m venv venv
venv\Scripts\activate
```

macOS/Linux:

```bash
python3 -m venv venv
source venv/bin/activate
```

**3. Install dependencies and verify the setup:**

```bash
pip install -r requirements.txt
pip install pyspark findspark
python -c "import pyspark; print(f'PySpark {pyspark.__version__} installed successfully! ✅')"
```
## 🚀 Quick Start

**Rank resumes:**

```bash
python main.py \
  --resumes ./resumes \
  --jd job_description.txt \
  --top 5 \
  --output output/ranked_resumes.csv
```

**Evaluate tests:**

```bash
python evaluate_tests.py \
  --responses ./responses \
  --shortlist output/ranked_resumes.csv \
  --out output/test_results \
  --min_mcq 8 \
  --min_code 300
```
## 📖 Usage Guide

Create the following folder structure:

```
AI-Resume-Ranker/
├── resumes/                # Place resume files here (.pdf, .docx, .txt)
├── responses/              # JSON files of candidate test responses
├── output/                 # Generated output files
├── job_description.txt     # Job description file
└── ws-logs_filtered.csv    # Website logs for analysis
```
Create a `job_description.txt` file with the role details:

```
Role: Machine Learning Engineer

Responsibilities:
- Develop and deploy ML models
- Build data pipelines

Required: Python, Machine Learning, SQL
Preferred: TensorFlow, Docker, Kubernetes
```
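A shortlisting script might pull the `Required:` and `Preferred:` skill lists out of this file. A minimal sketch (the helper name `parse_requirements` is illustrative, not part of the project's API):

```python
def parse_requirements(text: str) -> dict:
    """Extract comma-separated skill lists from a job description.

    Looks for lines starting with 'Required:' or 'Preferred:',
    mirroring the job_description.txt template above.
    """
    skills = {}
    for line in text.splitlines():
        for key in ("Required", "Preferred"):
            prefix = key + ":"
            if line.startswith(prefix):
                skills[key.lower()] = [s.strip() for s in line[len(prefix):].split(",")]
    return skills

jd = """Role: Machine Learning Engineer
Required: Python, Machine Learning, SQL
Preferred: TensorFlow, Docker, Kubernetes
"""
print(parse_requirements(jd))
# {'required': ['Python', 'Machine Learning', 'SQL'],
#  'preferred': ['TensorFlow', 'Docker', 'Kubernetes']}
```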
Rank resumes:

```bash
python main.py --resumes ./resumes --jd job_description.txt --top 10
```
Evaluate tests:

```bash
python evaluate_tests.py \
  --responses ./responses \
  --shortlist output/ranked_resumes.csv \
  --out output/test_results
```
Open `resume.ipynb` in Jupyter Notebook for interactive Spark analytics:

```bash
jupyter notebook resume.ipynb
```
## 📊 Project Structure

```
AI-Resume-Ranker/
│
├── 📄 main.py                  # Main resume ranking script
├── 📄 evaluate_tests.py        # Test evaluation script
├── 📓 resume.ipynb             # Interactive Spark lab notebook
│
├── 📁 utils/
│   ├── extract_text.py         # Extract text from PDFs/DOCX
│   ├── extract_features.py     # Feature extraction from resumes
│   ├── ranker.py               # Ranking algorithm
│   └── test_evaluator.py       # Test evaluation logic
│
├── 📁 resumes/                 # Resume files (input)
├── 📁 responses/               # Test response files (input)
├── 📁 output/                  # Output results
│
├── 📄 requirements.txt         # Python dependencies
├── 📄 job_description.txt      # Job description template
├── 📄 ws-logs_filtered.csv     # Website logs data
│
├── 📄 README.md                # This file
└── 📄 LICENSE                  # License file
```
## 🎯 Expected Outcomes

Resume ranking produces a CSV file (`ranked_resumes.csv`) with:
- ✅ Candidate name and file path
- ✅ Overall ranking score (0-100)
- ✅ Skill match percentage
- ✅ Experience level match
- ✅ Ranking position
Example output:

```
name,file_path,score,rank
john_doe,./resumes/john_doe.pdf,95.5,1
jane_smith,./resumes/jane_smith.docx,87.3,2
bob_wilson,./resumes/bob_wilson.pdf,76.2,3
```
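The real scoring logic lives in `utils/ranker.py` and combines several ML-derived features. As a rough illustration of the idea only (a toy stand-in, not the project's actual algorithm), a skill-overlap score on a 0-100 scale could look like this:

```python
def skill_match_score(resume_skills, jd_skills):
    """Percentage of job-description skills found in the resume.

    Toy illustration: the project's ranker uses richer features
    (experience level, ML-extracted text features), not just overlap.
    """
    resume = {s.lower() for s in resume_skills}
    required = {s.lower() for s in jd_skills}
    if not required:
        return 0.0
    return round(100 * len(resume & required) / len(required), 1)

candidates = {
    "john_doe": ["Python", "SQL", "Machine Learning", "Docker"],
    "jane_smith": ["Python", "SQL"],
}
jd = ["Python", "Machine Learning", "SQL"]
ranking = sorted(candidates, key=lambda c: skill_match_score(candidates[c], jd),
                 reverse=True)
print(ranking)  # ['john_doe', 'jane_smith']
```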
Test evaluation produces three CSV files:
- selected_candidates.csv - 🎉 Candidates who passed
- rejected_candidates.csv - ❌ Candidates who were rejected
- all_ranked_candidates.csv - 📋 Complete ranking with scores
## 🔍 Cheat Detection

The system identifies suspicious patterns:
- ⏱ Unusually fast completion times
- 📊 Statistical anomalies (< 1/5 of average time)
- 👥 User behavior analysis
- 📈 Time spent on each problem
The `resume.ipynb` notebook includes advanced cheat-detection analytics:

```python
cheaters = identify_cheaters(quiz_logs, threshold=0.2)
```
Detects:
- 🏃 Suspiciously fast quiz completions
- 📝 Inadequate problem-solving time
- 🎯 Statistically improbable answer patterns
- 🔗 Collaborative behavior indicators
Output includes:

- List of flagged users
- Detailed timeline analysis
- Early-bird detection
- Fastest solvers per problem
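The statistical rule described above (flag completions under 1/5 of the average time) can be sketched in plain Python. This is a hedged simplification: the input shape is assumed for illustration, while the notebook works on Spark RDDs built from `ws-logs_filtered.csv`:

```python
def identify_cheaters(quiz_logs, threshold=0.2):
    """Flag users whose completion time is below threshold * average time.

    quiz_logs: list of (user, seconds_to_complete) pairs -- an assumed
    layout for this sketch, not the notebook's actual RDD schema.
    """
    times = [t for _, t in quiz_logs]
    avg = sum(times) / len(times)
    cutoff = threshold * avg          # threshold=0.2 => 1/5 of the average
    return [user for user, t in quiz_logs if t < cutoff]

logs = [("alice", 600), ("bob", 550), ("carol", 90), ("dave", 620)]
print(identify_cheaters(logs, threshold=0.2))  # ['carol']
```

With an average of 465 seconds, the cutoff is 93 seconds, so only carol (90 s) is flagged.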
## 📝 Examples

Basic ranking:

```bash
python main.py --resumes ./resumes --jd job_description.txt
```

Custom ranking:

```bash
python main.py \
  --resumes /path/to/resumes \
  --jd /path/to/job_desc.txt \
  --top 15 \
  --output results/my_rankings.csv
```
Test evaluation with custom thresholds:

```bash
python evaluate_tests.py \
  --responses ./responses \
  --shortlist output/ranked_resumes.csv \
  --out output/final_results \
  --min_mcq 5 \
  --min_code 200
```
Launch Jupyter and execute cells in `resume.ipynb`:

```python
from pyspark import SparkContext

sc = SparkContext("local", "Analytics")
logs_rdd = sc.textFile("ws-logs_filtered.csv")
```
The included Jupyter notebook (`resume.ipynb`) teaches:

1. **RDD Operations** 🎯
   - Creating RDDs from lists and files
   - Map, filter, and flatMap transformations
   - Reduce and aggregation operations
2. **Optimizations** ⚡
   - Lazy evaluation
   - Caching and persistence
   - Checkpointing
   - Lineage tracking
3. **Spark UI** 🖥
   - Job monitoring
   - Stage analysis
   - Storage management
4. **Real-World Analytics** 📊
   - Quiz log analysis
   - Cheat detection algorithms
   - Performance metrics
Python packages (see `requirements.txt`):

```
scikit-learn    # Machine learning
pandas          # Data manipulation
PyPDF2          # PDF processing
python-docx     # DOCX processing
numpy           # Numerical computing
pyspark         # Distributed computing
findspark       # Spark initialization
```
## 🤝 Contributing

We welcome contributions! 🎉

1. Fork the repository 🍴
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request 🔔
**PySpark installation issues:**

```bash
pip install pyspark --upgrade
python -m pip install findspark
```

**Java not found:**

- Download JDK from oracle.com
- Set the JAVA_HOME environment variable

**Permission errors (macOS/Linux):**

```bash
chmod +x main.py evaluate_tests.py
```

**Dependency conflicts:**

```bash
pip install -r requirements.txt --force-reinstall
```
- 📧 Email: 23bds032@iiitdwd.ac.in
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details. 📜
⭐ If you found this helpful, please give it a star! ⭐