🧠 Data Science Project Template

This template was built after reading an excellent article by khuyetran1401.
Rather than forking, I’ve rebuilt it myself to understand each component. The goal is simplicity, reproducibility, and maintainability for personal and experimental data science projects.

For industrial or enterprise-grade pipelines, tools like Kedro are still preferable.

🚀 Features and Roadmap

✅ Implemented

Automatically build repository structure
Create and build Conda environment
Enforce static typing and clean code with pre-commit
Run unit tests with pytest
Auto-lint notebooks using nbQA
Generate HTML documentation with pdoc

🧰 Tools Used

Tool	Purpose
Conda	Environment management
pre-commit	Code quality automation
pytest	Unit testing framework
mypy	Static type checking
black	Code formatting
flake8	Linting
nbQA	Code quality checks on Jupyter Notebooks
pdoc	Automatic documentation generator

🧱 Template Structure

.
├── config/                      # Conda and pipeline configuration
│   └── environment.yml          # Conda environment definition
├── data/                        # Data folders (local, untracked)
│   ├── 01_raw/
│   ├── 02_primary/
│   ├── 03_feature/
│   ├── 04_model_input/
│   ├── 05_model_output/
│   └── 06_reporting/
├── docs/                        # HTML and markdown documentation
├── models/                      # Saved models and config
├── notebooks/                   # Jupyter notebooks (experimental zone)
│   └── README.md                # Notebook structure and usage guide
├── scripts/                     # Helper scripts
│   └── setup_cli.py             # CLI for environment setup, tests, and docs
├── src/                         # Source code for data pipeline, modeling, etc.
│   └── main.py
├── tests/                       # Unit tests
│   └── test_main.py
├── .pre-commit-config.yaml      # Hooks for linting, formatting, typing
├── Makefile                     # Automation: setup, lint, test, docs
└── README.md
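
The numbered data layers in the tree above can also be recreated by hand, for example after cloning a project whose untracked `data/` folder is missing. This is a minimal sketch, not part of the template: the helper name `create_data_layers` is hypothetical, and only the folder names are taken from the tree.

```python
from pathlib import Path

# Layer names copied from the template structure above.
DATA_LAYERS = [
    "01_raw",
    "02_primary",
    "03_feature",
    "04_model_input",
    "05_model_output",
    "06_reporting",
]


def create_data_layers(project_root: Path) -> list[Path]:
    """Create data/<layer> under project_root and return the folders in order.

    Hypothetical helper for illustration; the template does not ship it.
    """
    created = []
    for layer in DATA_LAYERS:
        layer_dir = project_root / "data" / layer
        # exist_ok=True makes the call safe to repeat on an existing project.
        layer_dir.mkdir(parents=True, exist_ok=True)
        created.append(layer_dir)
    return created
```

Calling `create_data_layers(Path("."))` from the project root would create any missing layer folders and leave existing ones untouched.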

🛠 How to Use This Template

📦 Install Cookiecutter

pip install cookiecutter

🧪 Generate New Project

cookiecutter https://github.com/radema/datascience-personal-templates

⚙️ Set Up Your Environment

cd {{cookiecutter.directory_name}}
conda env create -f config/environment.yml
conda activate {{cookiecutter.environment_name}}
make setup

🧪 Run Tests

make test
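
The contents of `src/main.py` and `tests/test_main.py` are not shown here, so the following is only a sketch of the kind of test that pytest would collect. The function `normalize` is a hypothetical stand-in; in a real project it would live in `src/main.py` and be imported by the test module rather than defined next to the tests.

```python
def normalize(values: list[float]) -> list[float]:
    """Scale values linearly to [0, 1]; a constant list maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# pytest collects functions whose names start with "test_".
def test_normalize_scales_to_unit_range() -> None:
    assert normalize([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]


def test_normalize_handles_constant_input() -> None:
    assert normalize([3.0, 3.0]) == [0.0, 0.0]
```

Type hints on both the function and the tests keep the file compatible with the mypy hook listed under Tools Used.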

📚 Generate Documentation

make docs

⚡ Automate Setup via CLI

You can automate environment creation, setup, testing, and docs generation using the built-in CLI.

Example:

python scripts/setup_cli.py all

Or use individual steps:

python scripts/setup_cli.py create-env --env-name=my_env
python scripts/setup_cli.py activate --env-name=my_env
python scripts/setup_cli.py test
python scripts/setup_cli.py docs

You can also run:

make cli

to open the CLI menu directly.
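
The real `scripts/setup_cli.py` is not reproduced in this README. As a rough illustration of how such a CLI could map the steps above to commands, here is a sketch built on `argparse`. The function names `build_parser` and `plan_commands` are hypothetical, and the command lists are assumptions based on the `conda` and `make` calls shown earlier; the sketch only plans commands and does not run them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per setup step."""
    parser = argparse.ArgumentParser(prog="setup_cli.py")
    subparsers = parser.add_subparsers(dest="step", required=True)
    for step in ("create-env", "activate"):
        sub = subparsers.add_parser(step)
        # argparse exposes --env-name as args.env_name.
        sub.add_argument("--env-name", required=True)
    for step in ("test", "docs", "all"):
        subparsers.add_parser(step)
    return parser


def plan_commands(args: argparse.Namespace) -> list[list[str]]:
    """Return the commands a step would run, without executing them."""
    create = ["conda", "env", "create", "-f", "config/environment.yml"]
    if args.step == "create-env":
        return [create + ["-n", args.env_name]]
    if args.step == "activate":
        # Listed for completeness: a child process cannot activate an
        # environment in the shell that launched it.
        return [["conda", "activate", args.env_name]]
    if args.step in ("test", "docs"):
        return [["make", args.step]]
    # "all": assumed to chain creation, setup, tests, and docs.
    return [create, ["make", "setup"], ["make", "test"], ["make", "docs"]]


if __name__ == "__main__":
    for command in plan_commands(build_parser().parse_args()):
        print(" ".join(command))
```

Separating planning from execution keeps the mapping easy to unit test; a real implementation would pass each planned command to `subprocess.run`.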

📓 Notebook Guidelines

Notebooks are stored in notebooks/ and follow a numeric prefix convention:

01_data_exploration.ipynb
02_feature_engineering.ipynb
03_model_training.ipynb

Code quality is enforced in notebooks using nbQA integrated with pre-commit (black, flake8, mypy).
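
The numeric prefix convention can be checked mechanically. The sketch below assumes a two-digit prefix followed by a lowercase snake_case name, which is inferred from the three example filenames rather than stated as a rule; `misnamed_notebooks` is a hypothetical helper, not part of the template.

```python
import re

# Two digits, an underscore, a lowercase snake_case name, then ".ipynb".
# The exact pattern is an assumption inferred from the examples above.
NOTEBOOK_NAME = re.compile(r"\d{2}_[a-z0-9_]+\.ipynb")


def misnamed_notebooks(filenames: list[str]) -> list[str]:
    """Return the filenames that do not follow the numeric prefix convention."""
    return [name for name in filenames if not NOTEBOOK_NAME.fullmatch(name)]
```

For example, `misnamed_notebooks(["01_data_exploration.ipynb", "scratch.ipynb"])` returns `["scratch.ipynb"]`.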

📖 Resources

