
radema/datascience-personal-templates



🧠 Data Science Project Template

This template was built after reading this excellent article by khuyetran1401.
Rather than forking, I’ve rebuilt it myself to understand each component. The goal is simplicity, reproducibility, and maintainability for personal and experimental data science projects.

For industrial or enterprise-grade pipelines, tools like Kedro are still preferable.


🚀 Features and Roadmap

✅ Implemented

  • Automatically build repository structure
  • Create and build Conda environment
  • Enforce static typing and clean code with pre-commit
  • Run unit tests with pytest
  • Auto-lint notebooks using nbQA
  • Generate HTML documentation with pdoc

🧰 Tools Used

| Tool       | Purpose                                  |
|------------|------------------------------------------|
| Conda      | Environment management                   |
| pre-commit | Code quality automation                  |
| pytest     | Unit testing framework                   |
| mypy       | Static type checking                     |
| black      | Code formatting                          |
| flake8     | Linting                                  |
| nbQA       | Code quality checks on Jupyter notebooks |
| pdoc       | Automatic documentation generator        |

🧱 Template Structure

.
├── config/                      # Conda and pipeline configuration
│   └── environment.yml          # Conda environment definition
├── data/                        # Data folders (local, untracked)
│   ├── 01_raw/
│   ├── 02_primary/
│   ├── 03_feature/
│   ├── 04_model_input/
│   ├── 05_model_output/
│   └── 06_reporting/
├── docs/                        # HTML and markdown documentation
├── models/                      # Saved models and config
├── notebooks/                   # Jupyter notebooks (experimental zone)
│   └── README.md                # Notebook structure and usage guide
├── scripts/                     # Helper scripts
│   └── setup_cli.py             # CLI for environment setup, tests, and docs
├── src/                         # Source code for data pipeline, modeling, etc.
│   └── main.py
├── tests/                       # Unit tests
│   └── test_main.py
├── .pre-commit-config.yaml      # Hooks for linting, formatting, typing
├── Makefile                     # Automation: setup, lint, test, docs
└── README.md
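
To illustrate the data layout above, here is a minimal sketch of how the `data/` layer folders could be created programmatically. The function name `create_data_layers` is hypothetical and not part of the template; the template itself builds this structure through Cookiecutter.

```python
from pathlib import Path

# Data layer folders as listed in the template structure above.
DATA_LAYERS = [
    "01_raw",
    "02_primary",
    "03_feature",
    "04_model_input",
    "05_model_output",
    "06_reporting",
]


def create_data_layers(project_root: Path) -> list[Path]:
    """Create data/<layer> under project_root and return the folder paths.

    Hypothetical helper for illustration only.
    """
    created = []
    for layer in DATA_LAYERS:
        path = project_root / "data" / layer
        # exist_ok=True makes the call safe to repeat on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_data_layers(Path("."))` from the project root would create any missing layer folders and leave existing ones untouched.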

🛠 How to Use This Template

📦 Install Cookiecutter

pip install cookiecutter

🧪 Generate New Project

cookiecutter https://github.com/radema/datascience-personal-templates

⚙️ Set Up Your Environment

cd {{cookiecutter.repository-name}}
conda env create -f config/environment.yml
conda activate {{cookiecutter.environment_name}}
make setup

🧪 Run Tests

make test
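
For orientation, a test file such as `tests/test_main.py` could look like the sketch below. The function `clean_column_names` is hypothetical and defined inline to keep the example self-contained; in a real project it would be imported from `src/`, whose actual contents are not shown in this README.

```python
def clean_column_names(columns: list[str]) -> list[str]:
    """Hypothetical helper: trim, lowercase, and replace spaces with underscores."""
    return [name.strip().lower().replace(" ", "_") for name in columns]


def test_clean_column_names() -> None:
    # pytest collects functions whose names start with "test_".
    result = clean_column_names([" Sale Price", "Region "])
    assert result == ["sale_price", "region"]


def test_clean_column_names_empty() -> None:
    # An empty input should produce an empty output, not an error.
    assert clean_column_names([]) == []
```

The type annotations also give mypy something to check when the pre-commit hooks run.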

📚 Generate Documentation

make docs

⚡ Automate Setup via CLI

You can automate environment creation, setup, testing, and docs generation using the built-in CLI.

Example:

python scripts/setup_cli.py all

Or use individual steps:

python scripts/setup_cli.py create-env --env-name=my_env
python scripts/setup_cli.py activate --env-name=my_env
python scripts/setup_cli.py test
python scripts/setup_cli.py docs

You can also run:

make cli

to open the CLI menu directly.
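
The command layout above maps naturally onto `argparse` subcommands. The following is only a sketch of how such a CLI could be structured; it is not the actual content of `scripts/setup_cli.py`, and the function name `build_parser` is an assumption made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per setup step (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="setup_cli.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Subcommands that take an environment name.
    for name in ("create-env", "activate"):
        sub = subparsers.add_parser(name)
        sub.add_argument("--env-name", required=True)

    # Subcommands without extra options.
    for name in ("test", "docs", "all"):
        subparsers.add_parser(name)

    return parser


# Parse an explicit argument list, mirroring one of the commands shown above.
args = build_parser().parse_args(["create-env", "--env-name=my_env"])
print(args.command, args.env_name)
```

Each subcommand would then dispatch to the function that performs the step, for example by running the matching `make` target.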


📓 Notebook Guidelines

Notebooks are stored in notebooks/ and follow a numeric prefix convention:

01_data_exploration.ipynb
02_feature_engineering.ipynb
03_model_training.ipynb
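
The naming convention can be checked automatically. The sketch below assumes the pattern is a two-digit prefix followed by a snake_case description, which is inferred from the example names and not stated by the template; `is_valid_notebook_name` is a hypothetical helper.

```python
import re

# Two-digit prefix, underscore, snake_case description, .ipynb extension.
# The exact pattern is an assumption inferred from the example names.
NOTEBOOK_NAME = re.compile(r"^\d{2}_[a-z0-9]+(?:_[a-z0-9]+)*\.ipynb$")


def is_valid_notebook_name(filename: str) -> bool:
    """Return True if filename follows the numeric prefix convention."""
    return NOTEBOOK_NAME.match(filename) is not None
```

A check like this could be run over `notebooks/` in a test or a custom pre-commit hook to catch misnamed files early.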

Notebook code quality is enforced by nbQA, which runs black, flake8, and mypy on notebooks as part of the pre-commit hooks.


📖 Resources



About

Repository to track my templates for personal projects
