Refactor Repository Structure for Maintainability #57

@raymondcen

Description

Problem:
The entry point scripts sit at the repository root, inconsistent with the rest of the codebase. Two dependency files (requirements.txt and pyproject.toml) conflict, and neither is designated as authoritative. There are no defined boundaries between pipeline, classifier, extraction, IO and utility code. As a result, a new contributor cannot tell which file to run or which dependency file to use.
Tasks:
Move classify_extract.py and extract-from-txt.py into src/pipeline/
Rename extract-from-txt.py to extract_from_txt.py
Remove requirements.txt in favor of pyproject.toml as the single source of truth for dependencies
Reorganize src/ into classifier/, extraction/, io/ and utils/ subdirectories
Update CONTRIBUTING.md to reflect the new structure
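
The file moves in the tasks above could be scripted roughly as follows. This is only a sketch under stated assumptions: the names `MOVES`, `PACKAGES` and `restructure` are hypothetical, only the two entry points named in this issue are mapped (other src/ modules are not listed here), and it assumes the dependencies in requirements.txt have already been copied into pyproject.toml before that file is deleted.

```python
from pathlib import Path
import shutil

# Old path -> new path, relative to the repo root, taken from the task list.
# Only the two entry points named in the issue are mapped (assumption: any
# other modules are sorted into the new packages by hand).
MOVES = {
    "classify_extract.py": "src/pipeline/classify_extract.py",
    "extract-from-txt.py": "src/pipeline/extract_from_txt.py",
}

# Subdirectories the issue asks for under src/.
PACKAGES = ["pipeline", "classifier", "extraction", "io", "utils"]


def restructure(root, dry_run=True):
    """Return the (source, destination) pairs found under root.

    With dry_run=False, also create the package directories, move the
    files, and delete requirements.txt.
    """
    root = Path(root)
    planned = []
    for old, new in MOVES.items():
        src, dst = root / old, root / new
        if src.exists():
            planned.append((src, dst))

    if not dry_run:
        for name in PACKAGES:
            (root / "src" / name).mkdir(parents=True, exist_ok=True)
        for src, dst in planned:
            shutil.move(str(src), str(dst))
        # Assumes pyproject.toml already lists every dependency.
        req = root / "requirements.txt"
        if req.exists():
            req.unlink()

    return planned
```

Calling `restructure(".")` first prints nothing and changes nothing; it only returns the planned moves for review. In a real checkout, `git mv` is probably the better way to perform the moves so that file history is preserved.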

Context:
A new contributor should be able to clone the repo, follow CONTRIBUTING.md and run the pipeline without hunting for the correct entry point or dependency file. Source: CONTRIBUTING.md
