This template was built after reading this excellent article by khuyetran1401.
Rather than forking, I’ve rebuilt it myself to understand each component. The goal is simplicity, reproducibility, and maintainability for personal and experimental data science projects.
For industrial or enterprise-grade pipelines, tools like Kedro are still preferable.
- Automatically build repository structure
- Create and build Conda environment
- Enforce static typing and clean code with pre-commit
- Run unit tests with pytest
- Auto-lint notebooks using nbQA
- Generate HTML documentation with pdoc
| Tool | Purpose |
|---|---|
| Conda | Environment management |
| pre-commit | Code quality automation |
| pytest | Unit testing framework |
| mypy | Static type checking |
| black | Code formatting |
| flake8 | Linting |
| nbQA | Code quality checks on Jupyter Notebooks |
| pdoc | Automatic documentation generator |
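
As a rough illustration of how the typing and testing tools in the table fit together, the sketch below shows a fully annotated function (the kind of code mypy can check) next to two pytest-style tests. The `mean` function and the test names are hypothetical and are not taken from this template's `src/main.py` or `tests/test_main.py`.

```python
from __future__ import annotations


def mean(values: list[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


def test_mean_of_known_values() -> None:
    # pytest collects functions named test_* and reports failed assertions.
    assert mean([1.0, 2.0, 3.0]) == 2.0


def test_mean_rejects_empty_input() -> None:
    # Written without pytest.raises so the sketch has no third-party imports.
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
```

In a real project the function would live under `src/` and the tests under `tests/`, with the tests importing the function.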
.
├── config/ # Conda and pipeline configuration
│ └── environment.yml # Conda environment definition
├── data/ # Data folders (local, untracked)
│ ├── 01_raw/
│ ├── 02_primary/
│ ├── 03_feature/
│ ├── 04_model_input/
│ ├── 05_model_output/
│ └── 06_reporting/
├── docs/ # HTML and markdown documentation
├── models/ # Saved models and config
├── notebooks/ # Jupyter notebooks (experimental zone)
│ └── README.md # Notebook structure and usage guide
├── scripts/ # Scripts folder
│ └── setup_cli.py # Script to setup CLI
├── src/ # Source code for data pipeline, modeling, etc.
│ └── main.py
├── tests/ # Unit tests
│ └── test_main.py
├── .pre-commit-config.yaml # Hooks for linting, formatting, typing
├── Makefile # Automation: setup, lint, test, docs
└── README.md

pip install cookiecutter
cookiecutter https://github.com/radema/datascience-personal-templates
cd {{cookiecutter.repository-name}}
conda env create -f config/environment.yml
conda activate {{cookiecutter.environment_name}}
make setup
make test
make docs

You can automate environment creation, setup, testing, and docs generation using the built-in CLI.
Example:
python scripts/setup_cli.py all

Or use individual steps:
python scripts/setup_cli.py create-env --env-name=my_env
python scripts/setup_cli.py activate --env-name=my_env
python scripts/setup_cli.py test
python scripts/setup_cli.py docs

You can also run:

make cli

to open the CLI menu directly.
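
The subcommands above could be wired up roughly as follows. This is a minimal sketch, not the actual contents of `scripts/setup_cli.py`: the function names `build_parser` and `plan_commands`, the default environment name, and the mapping of each subcommand to `conda` or `make` calls are all assumptions. It only plans the commands and does not execute them.

```python
from __future__ import annotations

import argparse

# Path taken from the repository layout; everything else here is hypothetical.
ENV_FILE = "config/environment.yml"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per setup step."""
    parser = argparse.ArgumentParser(prog="setup_cli.py")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("all", "create-env", "activate"):
        step = sub.add_parser(name)
        step.add_argument("--env-name", default="my_env")
    sub.add_parser("test")
    sub.add_parser("docs")
    return parser


def plan_commands(args: argparse.Namespace) -> list[list[str]]:
    """Translate a parsed subcommand into the shell commands it would run."""
    if args.command == "test":
        return [["make", "test"]]
    if args.command == "docs":
        return [["make", "docs"]]
    if args.command == "activate":
        # A child process cannot activate an environment in the calling
        # shell, so this sketch plans nothing for this step.
        return []
    create = ["conda", "env", "create", "-f", ENV_FILE, "-n", args.env_name]
    if args.command == "create-env":
        return [create]
    # "all": assumed to chain environment creation with the Makefile targets.
    return [create, ["make", "setup"], ["make", "test"], ["make", "docs"]]
```

Calling `plan_commands(build_parser().parse_args(["create-env", "--env-name=my_env"]))` returns a single `conda env create` command; a real script would pass each planned command to `subprocess.run`.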
Notebooks are stored in notebooks/ and follow a numeric prefix convention:
01_data_exploration.ipynb
02_feature_engineering.ipynb
03_model_training.ipynb
Code quality is enforced in notebooks using nbQA integrated with pre-commit (black, flake8, mypy).
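
The numeric prefix convention can be checked with a few lines of Python. The sketch below assumes the rule is "two digits, an underscore, then a lowercase snake_case name", which is inferred from the three example filenames rather than stated by the template; `follows_convention` and `misnamed_notebooks` are hypothetical helpers.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed pattern: two digits, underscore, lowercase snake_case, .ipynb suffix.
NOTEBOOK_NAME = re.compile(r"^\d{2}_[a-z0-9_]+\.ipynb$")


def follows_convention(filename: str) -> bool:
    """Return True if the filename matches the assumed numeric prefix rule."""
    return NOTEBOOK_NAME.match(filename) is not None


def misnamed_notebooks(folder: str = "notebooks") -> list[str]:
    """List notebooks in the folder whose names break the convention."""
    return sorted(
        path.name
        for path in Path(folder).glob("*.ipynb")
        if not follows_convention(path.name)
    )
```

For example, `follows_convention("01_data_exploration.ipynb")` is true, while a name such as `scratch.ipynb` is rejected.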