Fraud Detection in Job Postings using Machine Learning: distinguishing fake job listings from real ones using classification models and text-based feature engineering on a Kaggle dataset of roughly 18,000 job posts.

srilekhatv/FraudDetection_JobPostings_ML


🕵️‍♀️ Fraud Detection in Job Postings

This project applies machine learning to detect fraudulent job postings in a dataset of ~18,000 listings (including 800+ confirmed scams). The goal is to protect job seekers by flagging suspicious postings based on text patterns and metadata.


📊 Dataset

  • Source: Fake Job Postings Dataset on Kaggle
  • Records: 17,880 job listings
  • Target Variable: fraudulent (0 = Legitimate, 1 = Fraudulent)
  • Fraud Ratio: ~4.5% of records (highly imbalanced dataset)
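
With a fraud ratio this low, accuracy alone says little about fraud detection. The sketch below works through the arithmetic for a trivial classifier that labels every posting as legitimate, using the record count and approximate ratio listed above. The function name `majority_baseline` is hypothetical and only serves the illustration.

```python
def majority_baseline(n_total: int, fraud_ratio: float) -> dict:
    """Score a classifier that predicts 'legitimate' for every posting."""
    n_fraud = round(n_total * fraud_ratio)
    n_legit = n_total - n_fraud
    return {
        "n_fraud": n_fraud,
        "n_legit": n_legit,
        # Every legitimate posting is classified correctly, every fraud is missed.
        "accuracy": n_legit / n_total,
        "fraud_recall": 0.0,
    }


baseline = majority_baseline(17_880, 0.045)
print(baseline)
```

Under these assumptions the do-nothing baseline already reaches roughly 95.5% accuracy while catching no fraud at all, which is why recall and F1 on the fraudulent class are the more informative numbers.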

๐Ÿ” Project Workflow

  1. Data Cleaning & Imputation

    • Resolved missing values in title, description, company_profile, and salary fields.
    • Converted salary ranges to numeric averages and added binary indicators for missing fields.
    • Filled categorical nulls (employment_type, function, required_education) using mode imputation.
  2. Feature Engineering

    • Extracted experience levels from required_experience.
    • Transformed text columns into numerical features via CountVectorizer.
    • Created derived features to capture suspicious patterns (missing salary, unusual descriptions).
  3. Modeling

    • Handled class imbalance using SMOTE.
    • Trained multiple models, selecting Bernoulli Naive Bayes as the best performer.
    • Split dataset into training/testing (80/20).
  4. Evaluation

    • Measured performance with precision, recall, F1-score, and accuracy.
    • Generated a confusion matrix and classification report.

📈 Results

  • Accuracy: 96.4%
  • F1-score (Fraudulent class): 0.95
  • Precision: 0.94 | Recall: 0.96
  • Outperformed the logistic regression baseline by 20% in precision on the fraudulent class.
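
As a quick consistency check on the reported figures, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall listed above; the function name `f1_from` is hypothetical.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = f1_from(0.94, 0.96)
print(round(f1, 2))  # 0.95
```

The result rounds to the 0.95 reported for the fraudulent class.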

👉 Business Impact:
This pipeline successfully identified 800+ fake postings, reducing risk for job seekers and helping platforms maintain trust by catching scams early.


📊 Key Insights

  • Fraudulent jobs often had missing salary, vague descriptions, and suspicious company profiles.
  • Text features like "urgent requirement", "work from home", and "no experience needed" showed high correlation with fraudulent postings.
  • Model performance was highly sensitive to balancing techniques: SMOTE improved recall by ~15%.
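
One simple way to probe phrase-level signals like the ones above is to compare the fraud rate among postings that contain a phrase with the rate among those that do not. The sketch below assumes the text lives in a `description` column and uses a handful of made-up rows. The helper `phrase_fraud_rates` is hypothetical and is not claimed to be how the notebook measured correlation.

```python
import pandas as pd


def phrase_fraud_rates(df, phrases, text_col="description", target="fraudulent"):
    """For each phrase, report how often postings with and without it are fraudulent."""
    text = df[text_col].fillna("").str.lower()
    rows = []
    for phrase in phrases:
        mask = text.str.contains(phrase, regex=False)
        rows.append({
            "phrase": phrase,
            "n_postings": int(mask.sum()),
            "fraud_rate_with": df.loc[mask, target].mean(),
            "fraud_rate_without": df.loc[~mask, target].mean(),
        })
    return pd.DataFrame(rows)


# Made-up rows for illustration only; the real file has ~18k postings.
toy = pd.DataFrame({
    "description": [
        "Urgent requirement, work from home, no experience needed",
        "Work from home data entry, urgent requirement",
        "Senior engineer for our payments team",
        "Registered nurse, night shift",
        "Remote support agent, work from home",
        None,
    ],
    "fraudulent": [1, 1, 0, 0, 0, 0],
})
rates = phrase_fraud_rates(toy, ["urgent requirement", "work from home"])
print(rates)
```

A large gap between the two rates suggests the phrase is worth keeping as a feature, although a phrase such as "work from home" also appears in legitimate postings, so it is better treated as one signal among several.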

📂 Repository Structure

  • fraud_detection.ipynb → Full notebook with code & outputs
  • fake_job_postings.csv → Dataset (Kaggle-sourced, ~18k rows)
  • README.md → Project documentation


🛠️ Tech Stack

  • Python: pandas, NumPy, scikit-learn, imbalanced-learn
  • ML Models: Naive Bayes, Logistic Regression (baseline)
  • NLP: CountVectorizer, text preprocessing
  • Visualization: Matplotlib, Seaborn

🚀 Future Enhancements

  • Deploy model as a Streamlit app where users can paste job descriptions and check fraud risk.
  • Integrate advanced NLP (TF-IDF, Word2Vec, BERT) for semantic context.
  • Build a real-time fraud monitoring dashboard for job boards.
