Description
Is your feature request related to a problem?
The documentation contains a couple of sections where the project structure is explained.
- https://pytask-dev.readthedocs.io/en/stable/tutorials/set_up_a_project.html
- https://pytask-dev.readthedocs.io/en/stable/how_to_guides/bp_structure_of_a_research_project.html
- https://pytask-dev.readthedocs.io/en/stable/how_to_guides/bp_templates_and_projects.html
All of them propose a src layout (good) in which the tasks live inside the package folder, `src/my_project` (bad).
Why is this bad?
- You cannot use `pip install .` to install the project but must use the editable mode.

  Why? If you use the normal installation, the paths `SRC` and `BLD` defined in `config.py` will be relative to the installed package path (like `/mambaforge/envs/my_project/lib/python-3.11/site-packages/my_project/`). It means the data is assumed to lie somewhere there.

  Of course, you could add the data to your Python project via `MANIFEST.in`, but then the data would be copied over to the environment directory on every install, which can be very expensive.

- The data should not be part of the application.
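To make the first point concrete, here is a minimal sketch of how paths derived from the location of `config.py` move with the installed package. It assumes `SRC` is the folder containing `config.py` and `BLD` sits two levels above it; the helper `paths_from_config` and the checkout path `/home/user/my_project` are hypothetical and only serve as illustration.

```python
from pathlib import PurePosixPath


def paths_from_config(config_file: str) -> tuple[PurePosixPath, PurePosixPath]:
    """Derive SRC and BLD from the location of config.py (assumed pattern)."""
    # SRC is the package folder that contains config.py.
    src = PurePosixPath(config_file).parent
    # BLD is assumed to live next to the "src" folder, two levels above SRC.
    bld = src.parent.parent / "bld"
    return src, bld


# Editable install: config.py is imported from the (hypothetical) checkout.
editable = paths_from_config("/home/user/my_project/src/my_project/config.py")

# Regular install: config.py is imported from site-packages.
installed = paths_from_config(
    "/mambaforge/envs/my_project/lib/python-3.11/site-packages/my_project/config.py"
)

print(editable)
print(installed)
```

With the editable install, `BLD` points into the checkout. With the regular install, both paths end up inside the environment directory, where no data exists.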
Describe the solution you'd like
The new structure I propose is this one.
my_project
│
├───.pytask
│
├───bld
│   └────...
│
├───data
│   └────...
│
├───src
│   └───my_project
│       ├────__init__.py
│       └────data_preparation.py
│
├───tasks
│   ├────config.py
│   └───data_preparation
│       └────task_data_preparation.py
│
└───pyproject.toml
- Tasks are moved to a separate folder, `tasks`, just like tests.
- Data is moved to `data`, out of `src`.
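As a rough sketch of what this could mean for `tasks/config.py`: if the `tasks` folder is not part of the installed package, paths anchored at its location keep pointing into the project checkout regardless of how `my_project` itself is installed. The helper `project_paths`, the names `ROOT`, `DATA`, and `BLD`, and the checkout path are assumptions for illustration, not part of the proposal.

```python
from pathlib import PurePosixPath


def project_paths(config_file: str) -> dict[str, PurePosixPath]:
    """Anchor project paths at the parent of the tasks folder (assumed layout)."""
    # tasks/config.py sits one level below the project root.
    root = PurePosixPath(config_file).parent.parent
    return {
        "ROOT": root,
        "DATA": root / "data",  # raw data, outside of src
        "BLD": root / "bld",  # build artifacts
    }


# Hypothetical checkout location; inside tasks/config.py one would pass __file__.
paths = project_paths("/home/user/my_project/tasks/config.py")
print(paths["DATA"])
print(paths["BLD"])
```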
API breaking implications
None.
Describe alternatives you've considered
None.
Additional Context
Popular templates for data science projects also keep