Summary
An umbrella issue to group the refactoring issues.
Problems
- tools/ directory is used for everything
  - 20 files with no organisation
  - Mixes domain logic, infrastructure, utilities, and application code
  - Makes the codebase hard to navigate, maintain, and develop
  - Contains a lot of SQLite code even though we don't support SQLite
- cli/ should be thin
  - Should only parse arguments and call services
  - Currently contains significant business logic
  - The presentation layer is doing application work
- No clear architecture
  - Missing the usual layers of a layered architecture (domain, application, infrastructure)
  - Tight coupling between modules
  - Mixed abstraction levels
  - Configuration mixed with code
- Poor documentation
  - No architecture documentation
  - Minimal inline documentation
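To make the "cli/ should be thin" point concrete, here is a minimal sketch of a CLI that only parses arguments and delegates to a service. It uses the standard library's argparse purely for illustration; the flag names, `MappingRequest`, and `run_mapping` are hypothetical and not taken from the current codebase.

```python
import argparse
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence


@dataclass(frozen=True)
class MappingRequest:
    """Hypothetical value object passed from the CLI to the application layer."""

    rules_file: Path
    input_dir: Path
    output_dir: Path


def run_mapping(request: MappingRequest) -> int:
    """Hypothetical application service; business logic would live here, not in cli/."""
    # Placeholder: a real service would load rules, read sources, and write output.
    return 0


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line interface only; no processing happens here."""
    parser = argparse.ArgumentParser(prog="carrottransform")
    parser.add_argument("--rules-file", type=Path, required=True)
    parser.add_argument("--input-dir", type=Path, required=True)
    parser.add_argument("--output-dir", type=Path, required=True)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse arguments, build a request, and hand it to the service."""
    args = build_parser().parse_args(argv)
    request = MappingRequest(args.rules_file, args.input_dir, args.output_dir)
    return run_mapping(request)
```

With this shape the CLI can be tested by calling `main([...])` with a list of arguments, and the service can be tested without going through argument parsing at all.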
Structure
A plausible structure:
carrottransform/
├── domain/                 # Business logic & models
│   ├── omop/               # OMOP CDM domain
│   ├── mapping/            # Mapping rules & concepts
│   └── person/             # Person domain logic
│
├── application/            # Application services
│   ├── orchestrator.py
│   ├── processor.py
│   └── record_builder.py
│
├── infrastructure/         # External concerns
│   ├── storage/            # File I/O, database, S3
│   └── sources/            # Data source adapters
│
└── shared/                 # Shared utilities
    ├── utils/              # Pure utilities
    └── logging/            # Logging setup
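As a rough illustration of how the layers could depend on each other, the sketch below puts a domain model, an application service, and an infrastructure adapter in one file. Everything in it is an assumption made for the example: `Person`, `is_valid`, `PersonSource`, `CsvPersonSource`, and the column names are hypothetical and do not describe existing code.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, Protocol, TextIO


# domain/: plain data and rules, no I/O
@dataclass(frozen=True)
class Person:
    person_id: str
    birth_year: int


def is_valid(person: Person) -> bool:
    """Hypothetical domain rule: a person needs a non-empty identifier."""
    return bool(person.person_id)


# application/: orchestrates domain objects through an abstract port
class PersonSource(Protocol):
    def read(self) -> Iterator[Person]: ...


def count_valid_people(source: PersonSource) -> int:
    """Works with any object that satisfies PersonSource."""
    return sum(1 for person in source.read() if is_valid(person))


# infrastructure/: concrete adapter for one kind of data source
class CsvPersonSource:
    def __init__(self, handle: TextIO) -> None:
        self._handle = handle

    def read(self) -> Iterator[Person]:
        for row in csv.DictReader(self._handle):
            yield Person(person_id=row["person_id"], birth_year=int(row["birth_year"]))
```

The design choice being sketched is that application code only knows about the `PersonSource` protocol, so swapping CSV files for a database or S3 would mean adding another adapter under infrastructure/ rather than editing the service.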
Tests
Tests should mirror the new structure:
tests/
- unit/
  - domain/
  - application/
  - infrastructure/
- integration/
- fixtures/
  - data/
    - {test-case-name}/
      - source/   # .csv files
      - output/   # .tsv files
      - v1.json   # if available
      - v2.json   # if available, maybe expect only one?

- Don't test Docker stuff
- Move external dependencies (e.g. db) to services
- Fix the markers
- Document what we are actually testing and how it all works
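One way the fixtures/data/{test-case-name}/ layout could be consumed is a small discovery helper that turns each case directory into a record. This is only a sketch under assumptions: `FixtureCase` and `discover_cases` are hypothetical names, and preferring v2.json over v1.json when both exist is a guess, since the question of expecting only one is still open.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass(frozen=True)
class FixtureCase:
    """Hypothetical description of one directory under fixtures/data/."""

    name: str
    sources: List[Path]    # source/*.csv
    expected: List[Path]   # output/*.tsv
    rules: Optional[Path]  # v2.json or v1.json, if either exists


def discover_cases(data_root: Path) -> List[FixtureCase]:
    """Collect one FixtureCase per sub-directory of data_root, sorted by name."""
    cases = []
    for case_dir in sorted(p for p in data_root.iterdir() if p.is_dir()):
        # Assumption: when both rule files exist, v2.json wins.
        rules = next(
            (case_dir / name for name in ("v2.json", "v1.json") if (case_dir / name).is_file()),
            None,
        )
        cases.append(
            FixtureCase(
                name=case_dir.name,
                sources=sorted((case_dir / "source").glob("*.csv")),
                expected=sorted((case_dir / "output").glob("*.tsv")),
                rules=rules,
            )
        )
    return cases
```

The resulting list could then be fed to `pytest.mark.parametrize`, using each case's `name` as the test id, so that adding a new test case only requires adding a directory.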