The new structure I propose is this one.
my_project
│
├───.pytask
│
├───bld
│   └────...
│
├───data
│   └────...
│
├───src
│   └───my_project
│       ├────__init__.py
│       └────data_preparation.py
│
├───tasks
│   ├────config.py
│   └───data_preparation
│       └────task_data_preparation.py
│
└───pyproject.toml
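To make the layout above more concrete, here is a minimal sketch of the paths that `tasks/config.py` could derive from its own location. The helper `layout_paths`, the dictionary keys, and the checkout path `/home/user/my_project` are hypothetical and only serve as illustration; a real `config.py` would more likely start from `Path(__file__)`.

```python
from pathlib import PurePosixPath


def layout_paths(config_file: PurePosixPath) -> dict[str, PurePosixPath]:
    """Derive the project folders from the location of tasks/config.py."""
    # tasks/config.py sits one level below the project root.
    root = config_file.parent.parent
    return {
        "ROOT": root,
        "SRC": root / "src" / "my_project",
        "BLD": root / "bld",
        "DATA": root / "data",
    }


# Hypothetical checkout location, used only for illustration.
paths = layout_paths(PurePosixPath("/home/user/my_project/tasks/config.py"))
for name, path in paths.items():
    print(f"{name}: {path}")
```

Because `tasks` is never installed as a package, these paths always point into the project folder, whether `my_project` itself is installed normally or in editable mode.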
Is your feature request related to a problem?
The documentation contains a couple of sections where the project structure is explained.
All of them propose to structure the project using an `src` layout (good) where the tasks are within the project folder (bad).
Why is this bad?
You cannot use `pip install .` to install the project but must use the editable mode.
Why? If you use the normal installation, the paths `SRC` and `BLD` defined in `config.py` will be relative to the installed package path (like `/mambaforge/envs/my_project/lib/python-3.11/site-packages/my_project/`). It means the data is assumed to lie somewhere there.
Of course, you could add the data to your Python project via `MANIFEST.in`, but then the data would be copied over to the environment directory on every install, which can be very expensive.
The data should not be part of the application.
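The path problem can be illustrated with a small sketch. It assumes a `config.py` inside the package that sets `SRC` to its own folder and `BLD` to a `bld` folder two levels up; that definition, the helper `derive_paths`, and the checkout path `/home/user/my_project` are assumptions for illustration, not quotes from the documentation.

```python
from pathlib import PurePosixPath


def derive_paths(config_file: PurePosixPath) -> tuple[PurePosixPath, PurePosixPath]:
    """Return (SRC, BLD) as a package-level config.py would compute them."""
    # Assumed definitions: SRC is the folder containing config.py and
    # BLD is a "bld" folder two levels above it.
    src = config_file.parent
    bld = src.parent.parent / "bld"
    return src, bld


# Editable install: config.py is imported from the checkout (hypothetical path).
editable_src, editable_bld = derive_paths(
    PurePosixPath("/home/user/my_project/src/my_project/config.py")
)

# Normal install: config.py has been copied into site-packages.
installed_src, installed_bld = derive_paths(
    PurePosixPath(
        "/mambaforge/envs/my_project/lib/python-3.11/site-packages/my_project/config.py"
    )
)

print(editable_bld)   # points into the project folder
print(installed_bld)  # points into the environment directory
```

With the editable install, `BLD` lands next to the data in the project folder; with the normal install, it lands inside the environment, where no data exists.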
Describe the solution you'd like
The new structure I propose is the one shown in the tree above. It moves `tasks` out of `src`, just like `tests`, and it moves `data` out of `src`.
API breaking implications
None.
Describe alternatives you've considered
None.
Additional Context
Popular templates for data science projects also keep