Problem:
The entry point scripts sit at the repository root, inconsistent with the rest of the codebase. Two conflicting dependency files exist with no clear authority, and there are no defined boundaries between pipeline, classifier, extraction, IO and utility code. As a result, a new contributor cannot tell which file to run or which dependency file to use.
Tasks:
Move classify_extract.py and extract-from-txt.py into src/pipeline/
Rename extract-from-txt.py to extract_from_txt.py
Remove requirements.txt in favor of pyproject.toml as the single source of truth for dependencies
Reorganize the rest of src/ into classifier/, extraction/, io/ and utils/ subdirectories, alongside pipeline/
Update CONTRIBUTING.md to reflect the new structure
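The file moves and directory layout in the tasks above can be sketched as a small dry-run script. This is only an illustration under stated assumptions: the names `PLANNED_MOVES`, `PLANNED_DIRS`, `plan_restructure` and `apply_restructure` are hypothetical, the two scripts and requirements.txt are assumed to sit at the repository root, and `shutil.move` stands in for whatever move command the project actually uses. It does not decide which existing modules belong in classifier/, extraction/, io/ or utils/, and it does not touch CONTRIBUTING.md.

```python
from pathlib import Path
import shutil

# Hypothetical mapping taken from the task list: current path -> target path.
# Assumes both scripts currently live at the repository root.
PLANNED_MOVES = {
    "classify_extract.py": "src/pipeline/classify_extract.py",
    "extract-from-txt.py": "src/pipeline/extract_from_txt.py",
}

# Subdirectories the task list asks for under src/.
PLANNED_DIRS = ["pipeline", "classifier", "extraction", "io", "utils"]


def plan_restructure(repo_root):
    """Return (directories to create, file moves) without touching the disk."""
    root = Path(repo_root)
    dirs = [root / "src" / name for name in PLANNED_DIRS]
    moves = [(root / src, root / dst) for src, dst in PLANNED_MOVES.items()]
    return dirs, moves


def apply_restructure(repo_root, dry_run=True):
    """Print the plan; change the working tree only when dry_run is False."""
    root = Path(repo_root)
    dirs, moves = plan_restructure(root)
    for directory in dirs:
        print(f"mkdir  {directory}")
        if not dry_run:
            directory.mkdir(parents=True, exist_ok=True)
    for src, dst in moves:
        if not src.exists():
            print(f"skip   {src} (not found)")
            continue
        print(f"move   {src} -> {dst}")
        if not dry_run:
            shutil.move(str(src), str(dst))
    # requirements.txt goes away once pyproject.toml is the single source of truth.
    requirements = root / "requirements.txt"
    print(f"remove {requirements}")
    if not dry_run:
        requirements.unlink(missing_ok=True)
```

Calling `apply_restructure(".")` only prints the plan; passing `dry_run=False` performs it. In a real checkout the moves would more likely be done with `git mv` so that the renames are staged together with the change.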
Context:
A new contributor should be able to clone the repo, follow CONTRIBUTING.md and run the pipeline without hunting for the correct entry point or dependency file.
Source: CONTRIBUTING.md