Titanic ML Lab (Beginner Friendly)

This project trains a RandomForestClassifier on the Titanic dataset and writes the test-set predictions to a submission.csv file.

Project Structure

- src/preprocess.py: data loading, cleaning, feature engineering, and preprocessing pipeline
- src/train.py: model training and validation accuracy report
- src/predict.py: test-set prediction and submission.csv creation
- notebooks/titanic_analysis.ipynb: beginner walkthrough notebook
- requirements.txt: required Python packages

Requirements

- Python 3.9+
- Titanic dataset files:
  - data/train.csv
  - data/test.csv

Install dependencies:

```bash
pip install -r requirements.txt
```

What This Project Does

- Reads train.csv and test.csv from data/
- Cleans the data:
  - fills missing Age and Fare with the median
  - fills missing Embarked with the most frequent value
- One-hot encodes the categorical columns (Sex, Embarked)
- Creates new features:
  - FamilySize = SibSp + Parch + 1
  - IsAlone = 1 if FamilySize is 1, else 0
- Trains a RandomForestClassifier
- Evaluates accuracy on a validation split
- Generates submission.csv with predictions for the test set
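The cleaning and feature-engineering steps above can be sketched in pandas as follows. This is a minimal illustration, not the code in src/preprocess.py: the function names (`fit_fill_values`, `clean_titanic`, `add_family_features`, `one_hot`) are hypothetical, and learning the fill values from the training data only, then reusing them for the test data, is an assumption about how the project avoids leaking test information.

```python
import pandas as pd


def fit_fill_values(train: pd.DataFrame) -> dict:
    """Learn fill values from the training data only (assumed approach)."""
    return {
        "Age": train["Age"].median(),
        "Fare": train["Fare"].median(),
        "Embarked": train["Embarked"].mode().iloc[0],  # most frequent port
    }


def clean_titanic(df: pd.DataFrame, fill_values: dict) -> pd.DataFrame:
    """Fill missing Age/Fare with the median and Embarked with the most frequent value."""
    return df.fillna(value=fill_values)


def add_family_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add FamilySize = SibSp + Parch + 1 and IsAlone (1 if FamilySize == 1, else 0)."""
    out = df.copy()
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)
    return out


def one_hot(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode Sex and Embarked into 0/1 indicator columns."""
    return pd.get_dummies(df, columns=["Sex", "Embarked"], dtype=int)


# Tiny synthetic example (not real passenger data).
train = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Age": [22.0, None, 26.0],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Fare": [7.25, 71.28, None],
    "Embarked": ["S", None, "S"],
})
fills = fit_fill_values(train)
features = one_hot(add_family_features(clean_titanic(train, fills)))
print(features)
```

One caveat with `pd.get_dummies`: if a category appears in only one of the train and test frames, the two frames end up with different columns. Fitting scikit-learn's `OneHotEncoder(handle_unknown="ignore")` on the training data keeps the columns consistent.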

Run

From the project root (run train.py first, because predict.py needs the saved model):

```bash
python src/train.py
python src/predict.py
```

After running both scripts, you should see:

- models/model.joblib
- submission.csv
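To confirm both runs succeeded, a small check like the one below can be run from the project root. It is a hypothetical helper (`missing_outputs` is not part of the project) that only tests whether the two files listed above exist.

```python
from pathlib import Path

# The two artifacts expected after train.py and predict.py have both run.
EXPECTED_OUTPUTS = ["models/model.joblib", "submission.csv"]


def missing_outputs(project_root: str = ".") -> list[str]:
    """Return the expected output paths that are not present as files."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]


missing = missing_outputs()
print("Missing:", ", ".join(missing) if missing else "none")
```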

Pipeline Flow

Training Flow (`python src/train.py`)

```mermaid
flowchart TD
    A["train main"] --> B["load data"]
    B --> C["read train csv and test csv"]
    C --> D["prepare train features"]
    D --> E["add family features"]
    E --> F["prepare X and y"]
    F --> G["train test split"]
    G --> H["build preprocessor"]
    H --> I["fit transform train"]
    I --> J["transform validation"]
    J --> K["fit random forest"]
    K --> L["predict validation"]
    L --> M["compute accuracy"]
    M --> N["save model with joblib"]
```
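The training flow above might look roughly like the following scikit-learn sketch. It is an illustration under assumptions, not the project's src/train.py: the feature lists (including Pclass), the 80/20 stratified split, `n_estimators=200`, `random_state=42`, and saving the model and preprocessor together as one dict in models/model.joblib are all assumptions, and `train_model`, `build_preprocessor`, and `add_family_features` are hypothetical names. The demo rows are synthetic.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed feature lists; the project's actual columns may differ.
NUMERIC = ["Pclass", "Age", "SibSp", "Parch", "Fare", "FamilySize", "IsAlone"]
CATEGORICAL = ["Sex", "Embarked"]


def add_family_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)
    return out


def build_preprocessor() -> ColumnTransformer:
    """Median-impute numeric columns; mode-impute, then one-hot encode categoricals."""
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])


def train_model(train_df: pd.DataFrame, model_path: str = "models/model.joblib") -> float:
    df = add_family_features(train_df)
    X, y = df[NUMERIC + CATEGORICAL], df["Survived"]
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    preprocessor = build_preprocessor()
    X_train_t = preprocessor.fit_transform(X_train)  # learn medians/categories from train split only
    X_val_t = preprocessor.transform(X_val)          # reuse them on the validation split
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train_t, y_train)
    accuracy = accuracy_score(y_val, model.predict(X_val_t))
    print(f"Validation accuracy: {accuracy:.3f}")
    # Assumed: save model and preprocessor together so prediction reuses the same transform.
    Path(model_path).parent.mkdir(parents=True, exist_ok=True)
    joblib.dump({"model": model, "preprocessor": preprocessor}, model_path)
    return accuracy


# Demo on synthetic rows (not real passengers); in the project this would be data/train.csv.
n = 50
demo = pd.DataFrame({
    "Pclass": [1 + i % 3 for i in range(n)],
    "Sex": ["female" if i % 2 == 0 else "male" for i in range(n)],
    "Age": [float("nan") if i % 7 == 0 else 18.0 + i for i in range(n)],
    "SibSp": [i % 3 for i in range(n)],
    "Parch": [i % 2 for i in range(n)],
    "Fare": [10.0 + 3 * i for i in range(n)],
    "Embarked": ["S", "C", "Q", "S", float("nan")] * (n // 5),
})
demo["Survived"] = (demo["Sex"] == "female").astype(int)
model_path = str(Path(tempfile.mkdtemp()) / "models" / "model.joblib")
accuracy = train_model(demo, model_path)
```

Fitting the preprocessor on the training split only and then calling `transform` on the validation split (steps I and J in the flowchart) keeps the validation accuracy honest, because the medians and most frequent values are never computed from validation rows.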

Prediction Flow (`python src/predict.py`)

```mermaid
flowchart TD
    A["predict main"] --> B{"model joblib exists"}
    B -- "no" --> C["raise file not found"]
    B -- "yes" --> D["load data"]
    D --> E["select test dataframe"]
    E --> F["load model and preprocessor"]
    F --> G["prepare test features"]
    G --> H["add family features"]
    H --> I["transform test features"]
    I --> J["predict survived"]
    J --> K["build submission dataframe"]
    K --> L["write submission csv"]
```
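The prediction flow can be sketched in the same spirit. This assumes the joblib file holds a dict with `model` and `preprocessor` keys and that submission.csv follows the Kaggle Titanic format (PassengerId, Survived); `predict_test`, `FEATURES`, and `add_family_features` are hypothetical names. The demo fits a throwaway model on Sex alone purely so the snippet runs without the project's real model.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Assumed to match the feature columns used at training time.
FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked", "FamilySize", "IsAlone"]


def add_family_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)
    return out


def predict_test(test_df, model_path="models/model.joblib", out_path="submission.csv"):
    path = Path(model_path)
    if not path.exists():
        # Mirrors the "raise file not found" branch: the model must be trained first.
        raise FileNotFoundError(f"{path} not found; run python src/train.py first")
    bundle = joblib.load(path)  # assumed layout: {"model": ..., "preprocessor": ...}
    X = add_family_features(test_df)[FEATURES]
    preds = bundle["model"].predict(bundle["preprocessor"].transform(X))
    submission = pd.DataFrame({
        "PassengerId": test_df["PassengerId"],
        "Survived": preds.astype(int),
    })
    submission.to_csv(out_path, index=False)
    return submission


# Demo only: synthetic rows and a stand-in bundle fitted on Sex alone, NOT the project's model.
demo = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Pclass": [3, 1, 2, 1],
    "Sex": ["male", "female", "male", "female"],
    "Age": [30.0, 40.0, 25.0, 35.0],
    "SibSp": [0, 1, 0, 0],
    "Parch": [0, 0, 1, 0],
    "Fare": [8.0, 80.0, 15.0, 60.0],
    "Embarked": ["S", "C", "Q", "S"],
})
pre = ColumnTransformer([("sex", OneHotEncoder(handle_unknown="ignore"), ["Sex"])])
X_demo = pre.fit_transform(add_family_features(demo)[FEATURES])
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_demo, [0, 1, 0, 1])
workdir = Path(tempfile.mkdtemp())
joblib.dump({"model": model, "preprocessor": pre}, workdir / "model.joblib")
submission = predict_test(demo, workdir / "model.joblib", workdir / "submission.csv")
print(submission)
```

Loading the preprocessor from the same file as the model means the test set goes through exactly the transform that was fitted during training, rather than a freshly fitted one.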

Notes for Beginners

Start with notebooks/titanic_analysis.ipynb if you want to understand each step interactively.
The scripts in src/ implement the same logic as reusable Python modules.
