njoppi2/kaggle-titanic

Kaggle Titanic

End-to-end ML competition project for Kaggle Titanic survival prediction from tabular passenger data.

Snapshot

Titanic modeling workflow from raw data to submission and checks

Problem

Given train.csv and test.csv, predict Survived for unseen passengers while maintaining transparent preprocessing and a reproducible submission workflow.
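The input/output contract can be sketched with the standard library alone. The snippet below is a minimal illustration, not the repository's model: `predict_by_sex` is a hypothetical toy rule, `write_submission` is a hypothetical helper, and the in-memory CSV stands in for data/test.csv. The `PassengerId,Survived` header is the submission format Kaggle expects for this competition.

```python
import csv
import io


def predict_by_sex(row):
    """Toy rule (an assumption, not the repository's model): females survive."""
    return 1 if row["Sex"] == "female" else 0


def write_submission(test_rows, out_file, predict=predict_by_sex):
    """Write a header plus one PassengerId,Survived line per test row."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for row in test_rows:
        writer.writerow([row["PassengerId"], predict(row)])


# Tiny in-memory stand-in for data/test.csv (only a few of its columns).
sample_test_csv = "PassengerId,Pclass,Sex,Age\n892,3,male,34.5\n893,3,female,47\n"
rows = list(csv.DictReader(io.StringIO(sample_test_csv)))

buffer = io.StringIO()
write_submission(rows, buffer)
submission_text = buffer.getvalue()
```

Passing an open file handle instead of the `io.StringIO` buffer would write the same content to disk.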

Tech Stack

  • Python (notebook and script workflows)
  • Jupyter Notebook
  • XGBoost / classical ML preprocessing
  • GitHub Actions (validation checks)

Repository Layout

  • data/: competition train/test datasets
  • titanic_survival_NN.ipynb: main notebook (EDA, preprocessing, modeling)
  • xgboost.py: script-based model experimentation
  • solutions/: generated submission files
  • tests/: checks for generated output format/content

Quickstart

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run notebook:

jupyter notebook titanic_survival_NN.ipynb

Or run script experiment:

python xgboost.py

Reproducible CLI Baseline

Generate a deterministic baseline submission and CV report without opening notebooks:

python scripts/reproducible_baseline.py

Outputs:

  • solutions/cli_baseline_submission.csv
  • artifacts/cv_report.json
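To illustrate what "deterministic" can mean for a CV report, here is a minimal standard-library sketch. It is not the code in scripts/reproducible_baseline.py: the fold count, the seed, the report keys, and the per-sex majority model are all assumptions chosen for illustration. The point is that a fixed seed yields the same folds, and therefore the same report, on every run.

```python
import json
import random


def kfold_indices(n_rows, n_splits, seed):
    """Shuffle row indices with a fixed seed and deal them into n_splits folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_splits] for i in range(n_splits)]


def fit_majority_by_sex(train_rows):
    """Toy model (assumption): for each Sex value, predict the majority outcome."""
    counts = {}
    for row in train_rows:
        survived, total = counts.get(row["Sex"], (0, 0))
        counts[row["Sex"]] = (survived + row["Survived"], total + 1)
    return {sex: int(2 * s >= t) for sex, (s, t) in counts.items()}


def cv_report(rows, n_splits=5, seed=42):
    """Fit on all folds but one, score on the held-out fold, and summarize."""
    folds = kfold_indices(len(rows), n_splits, seed)
    fold_accuracy = []
    for k, held_out in enumerate(folds):
        train_rows = [rows[i] for j, fold in enumerate(folds) if j != k for i in fold]
        model = fit_majority_by_sex(train_rows)
        correct = sum(model.get(rows[i]["Sex"], 0) == rows[i]["Survived"] for i in held_out)
        fold_accuracy.append(correct / len(held_out))
    return {
        "n_splits": n_splits,
        "seed": seed,
        "fold_accuracy": fold_accuracy,
        "mean_accuracy": sum(fold_accuracy) / len(fold_accuracy),
    }


# Toy rows standing in for train.csv; real rows carry many more columns.
toy_rows = [{"Sex": "female", "Survived": 1}, {"Sex": "male", "Survived": 0}] * 5
report = cv_report(toy_rows)
report_json = json.dumps(report, indent=2, sort_keys=True)
```

Serializing with `sort_keys=True` keeps the JSON text byte-for-byte stable across runs, which makes the artifact easy to diff.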

Validation and CI

Local check:

python scripts/reproducible_baseline.py
python -m unittest discover -s tests -p "test_*.py"
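A submission-format check of the kind `unittest discover` would pick up might look like the sketch below. This is an assumption about the shape of such a test, not the contents of tests/: `submission_problems` and `SubmissionFormatTest` are hypothetical names, and the rules checked (exact header, integer and unique `PassengerId`, `Survived` in {0, 1}) follow the Kaggle submission format.

```python
import csv
import io
import unittest

EXPECTED_HEADER = ["PassengerId", "Survived"]


def submission_problems(csv_text):
    """Return a list of human-readable problems found in a submission CSV."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header must be PassengerId,Survived"]
    problems = []
    seen_ids = set()
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns")
            continue
        passenger_id, survived = row
        if not passenger_id.isdigit():
            problems.append(f"line {line_no}: PassengerId is not an integer")
        elif passenger_id in seen_ids:
            problems.append(f"line {line_no}: duplicate PassengerId")
        seen_ids.add(passenger_id)
        if survived not in ("0", "1"):
            problems.append(f"line {line_no}: Survived must be 0 or 1")
    return problems


class SubmissionFormatTest(unittest.TestCase):
    def test_sample_submission_is_valid(self):
        sample = "PassengerId,Survived\n892,0\n893,1\n"
        self.assertEqual(submission_problems(sample), [])
```

Returning a list of problems rather than raising on the first one lets a failing test report every malformed line at once.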

CI (.github/workflows/ci.yml) validates the Python syntax of xgboost.py and runs the solution-file tests.
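The exact command the workflow uses is not shown here. As a rough sketch of what a syntax-only check does, the hypothetical helper below compiles source text without executing it, which is the same idea behind `python -m py_compile <file>`.

```python
def syntax_error_in(source, filename="<string>"):
    """Return None if the source compiles, else a short error description."""
    try:
        # compile() parses and byte-compiles the code but never runs it.
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


ok_result = syntax_error_in("x = 1\n")
bad_result = syntax_error_in("def f(:\n    pass\n", filename="broken.py")
```

A check like this catches parse errors only; it does not import dependencies or run the script, so it stays fast and needs no dataset.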

Results

  • Best score in this repository: 0.78229 (Kaggle public leaderboard).
  • Includes notebook-first and script-based experimentation paths.
  • Includes automated checks for generated submission files.

Limitations

  • The main reproducibility path is still notebook-centered.
  • Hyperparameter search and CV reporting are limited.
  • No single CLI command reproduces the final submission end-to-end yet.

Roadmap

  • Add reproducible CLI pipeline for submission generation.
  • Add cross-validation report and feature-importance artifacts.
  • Add pinned environment lockfile for stronger reproducibility.

Contributing

See CONTRIBUTING.md.

About

Kaggle Titanic ML pipeline with feature engineering, training experiments, and submission checks.
