| 1 | +# AGENTS.md - AI Coding Assistant Guide |
| 2 | + |
| 3 | +This document provides context for AI coding assistants (Claude, GPT, Copilot, Cursor, etc.) working with the `ryandata-address-utils` codebase. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## Codebase Overview |
| 8 | + |
| 9 | +**Package:** `ryandata-address-utils` |
| 10 | +**Purpose:** US address parsing library with Pydantic models, validation, and pandas integration |
| 11 | +**Python Version:** 3.12 through 3.13 |
| 12 | +**License:** MIT |
| 13 | + |
| 14 | +### Architecture Patterns |
| 15 | + |
| 16 | +- **Facade Pattern:** `AddressService` provides a unified interface to parsers, validators, and data sources |
| 17 | +- **Protocol-based Interfaces:** Uses Python `Protocol` classes instead of ABCs for loose coupling |
| 18 | +- **Factory Pattern:** `ParserFactory`, `DataSourceFactory` for extensible component creation |
| 19 | +- **Composite Pattern:** `CompositeValidator` chains multiple validators |
| 20 | +- **Builder Pattern:** `AddressBuilder` for fluent address construction |
| 21 | + |
| 22 | +### Key Dependencies |
| 23 | + |
| 24 | +- `pydantic>=2.0.0` - Data validation and serialization |
| 25 | +- `usaddress>=0.5.16` - Default US address parser backend |
| 26 | +- `abstract-validation-base` - ProcessLog system for transformation tracking |
| 27 | +- `typer` + `trogon` - CLI with interactive TUI |
| 28 | + |
| 29 | +--- |
| 30 | + |
| 31 | +## Key Files |
| 32 | + |
| 33 | +| File | Purpose | |
| 34 | +|------|---------| |
| 35 | +| `src/ryandata_address_utils/__init__.py` | Public API exports - check here for available symbols | |
| 36 | +| `src/ryandata_address_utils/service.py` | `AddressService` facade - main entry point | |
| 37 | +| `src/ryandata_address_utils/models/address.py` | `Address` Pydantic model with 26+ fields | |
| 38 | +| `src/ryandata_address_utils/models/results.py` | `ParseResult`, `ZipInfo` dataclasses | |
| 39 | +| `src/ryandata_address_utils/protocols.py` | Protocol definitions for extensibility | |
| 40 | +| `src/ryandata_address_utils/parsers/` | Parser implementations (usaddress, libpostal) | |
| 41 | +| `src/ryandata_address_utils/validation/` | Validators (ZIP, state, composite) | |
| 42 | +| `src/ryandata_address_utils/data/` | Data sources (CSV-backed ZIP database) | |
| 43 | +| `src/ryandata_address_utils/core/` | Shared utilities (formatters, tracking, errors) | |
| 44 | + |
| 45 | +--- |
| 46 | + |
| 47 | +## Coding Conventions |
| 48 | + |
| 49 | +### Style & Linting |
| 50 | + |
| 51 | +- **Formatter:** Ruff (`ruff format`) |
| 52 | +- **Linter:** Ruff with `E, F, I, UP, B, SIM` rule sets |
| 53 | +- **Type Checker:** MyPy in strict mode (`disallow_untyped_defs = true`) |
| 54 | +- **Line Length:** 100 characters |
| 55 | + |
| 56 | +### Pydantic Models |
| 57 | + |
| 58 | +```python |
| 59 | +# Use Field() with descriptions for all model fields |
| 60 | +field_name: str | None = Field( |
| 61 | +    default=None, |
| 62 | +    description="Clear description of the field", |
| 63 | +    validation_alias=AliasChoices("FieldName", "alias"), |
| 64 | +) |
| 65 | +``` |
| 66 | + |
| 67 | +### Protocol-based Design |
| 68 | + |
| 69 | +```python |
| 70 | +# Define protocols in protocols.py |
| 71 | +class ValidatorProtocol(Protocol): |
| 72 | +    def validate(self, address: Address) -> ValidationResult: ... |
| 73 | + |
| 74 | +# Implementations satisfy protocols implicitly |
| 75 | +class ZipCodeValidator: |
| 76 | +    def validate(self, address: Address) -> ValidationResult: |
| 77 | +        ...  # Implementation |
| 78 | +``` |
| 79 | + |
| 80 | +### ProcessLog for Transformations |
| 81 | + |
| 82 | +```python |
| 83 | +# Models inherit from RyanDataValidationBase which provides process_log |
| 84 | +address.add_cleaning_process( |
| 85 | +    field="StateName", |
| 86 | +    original_value="Texas", |
| 87 | +    new_value="TX", |
| 88 | +    reason="Normalized state name to abbreviation", |
| 89 | +) |
| 90 | +``` |
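To illustrate what recording a transformation can look like, here is a rough sketch under stated assumptions. `CleaningEntry`, `TrackedAddress`, `STATE_ABBREVIATIONS`, and `normalize_state` are hypothetical names; the real `process_log` comes from `abstract-validation-base` and may store entries differently. Only the `add_cleaning_process` keyword arguments mirror the call shown above.

```python
from dataclasses import dataclass, field as dc_field


@dataclass(frozen=True)
class CleaningEntry:
    """Hypothetical log record for one field transformation."""

    field_name: str
    original_value: str
    new_value: str
    reason: str


@dataclass
class TrackedAddress:
    """Hypothetical stand-in for a model that carries a process log."""

    state: str
    process_log: list[CleaningEntry] = dc_field(default_factory=list)

    def add_cleaning_process(
        self, field: str, original_value: str, new_value: str, reason: str
    ) -> None:
        self.process_log.append(CleaningEntry(field, original_value, new_value, reason))


# Illustrative subset only, not the package's lookup table.
STATE_ABBREVIATIONS = {"texas": "TX", "california": "CA"}


def normalize_state(address: TrackedAddress) -> None:
    """Replace a full state name with its abbreviation and log the change."""
    abbreviation = STATE_ABBREVIATIONS.get(address.state.strip().lower())
    if abbreviation is None or abbreviation == address.state:
        return  # Nothing to change, so nothing is logged.
    address.add_cleaning_process(
        field="StateName",
        original_value=address.state,
        new_value=abbreviation,
        reason="Normalized state name to abbreviation",
    )
    address.state = abbreviation
```

Logging before overwriting the value keeps the original available for auditing, and skipping the log when nothing changes keeps repeated normalization idempotent.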
| 91 | + |
| 92 | +### Error Handling |
| 93 | + |
| 94 | +- Use `RyanDataAddressError` for address-specific errors |
| 95 | +- Use `RyanDataValidationError` for validation failures |
| 96 | +- All errors include package context via `PACKAGE_NAME` |
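One way errors could carry package context is sketched below. This is an assumption for illustration only: the class names are hypothetical, and the actual hierarchy, constructor signatures, and message format of `RyanDataAddressError` and `RyanDataValidationError` may differ.

```python
PACKAGE_NAME = "ryandata-address-utils"


class SketchAddressError(Exception):
    """Hypothetical base error that prefixes messages with the package name."""

    def __init__(self, message: str) -> None:
        self.package = PACKAGE_NAME
        super().__init__(f"[{PACKAGE_NAME}] {message}")


class SketchValidationError(SketchAddressError):
    """Hypothetical validation failure carrying the individual error strings."""

    def __init__(self, errors: list[str]) -> None:
        self.errors = list(errors)
        super().__init__("; ".join(self.errors))
```

A shared base class lets callers catch every package error with one `except` clause while still distinguishing validation failures when needed.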
| 97 | + |
| 98 | +--- |
| 99 | + |
| 100 | +## Common Tasks |
| 101 | + |
| 102 | +### Adding a New Validator |
| 103 | + |
| 104 | +1. Create class in `validation/validators.py` |
| 105 | +2. Implement `ValidatorProtocol` (must have `validate(address) -> ValidationResult`) |
| 106 | +3. Register with `CompositeValidator` if needed |
| 107 | + |
| 108 | +```python |
| 109 | +class MyValidator: |
| 110 | +    def validate(self, address: Address) -> ValidationResult: |
| 111 | +        errors = [] |
| 112 | +        # validation logic |
| 113 | +        return ValidationResult(is_valid=len(errors) == 0, errors=errors) |
| 114 | +``` |
| 115 | + |
| 116 | +### Adding a New Parser Backend |
| 117 | + |
| 118 | +1. Create class in `parsers/` implementing `AddressParserProtocol` |
| 119 | +2. Register with `ParserFactory.register("name", MyParser)` |
| 120 | +3. Use via `AddressService(parser=ParserFactory.create("name"))` |
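The register-then-create flow in the steps above can be sketched with a small registry. Everything here is hypothetical: `SketchParserFactory`, `ParserLike`, and `CommaSplitParser` are illustrative names, and the `parse` signature is an assumption rather than the real `AddressParserProtocol`.

```python
from typing import Callable, Protocol


class ParserLike(Protocol):
    """Assumed parser shape; the real protocol may differ."""

    def parse(self, raw: str) -> dict[str, str]: ...


class SketchParserFactory:
    """Maps a backend name to a zero-argument callable that builds a parser."""

    _registry: dict[str, Callable[[], ParserLike]] = {}

    @classmethod
    def register(cls, name: str, parser_cls: Callable[[], ParserLike]) -> None:
        cls._registry[name] = parser_cls

    @classmethod
    def create(cls, name: str) -> ParserLike:
        if name not in cls._registry:
            raise ValueError(f"Unknown parser: {name!r}")
        return cls._registry[name]()


class CommaSplitParser:
    """Toy backend: everything before the first comma is the street."""

    def parse(self, raw: str) -> dict[str, str]:
        street, _, city = raw.partition(",")
        return {"street": street.strip(), "city": city.strip()}


SketchParserFactory.register("comma", CommaSplitParser)
parser = SketchParserFactory.create("comma")
```

Keeping backends behind a name lookup means callers select a parser by configuration string, without importing the concrete class.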
| 121 | + |
| 122 | +### Extending the Address Model |
| 123 | + |
| 124 | +1. Add field to `models/address.py` with proper `Field()` definition |
| 125 | +2. Add to `ADDRESS_FIELDS` enum if needed for iteration |
| 126 | +3. Update `AddressFormatter` if field affects full address computation |
| 127 | + |
| 128 | +### Working with Pandas |
| 129 | + |
| 130 | +```python |
| 131 | +service = AddressService() |
| 132 | +df = service.parse_dataframe(df, "address_column", prefix="addr_") |
| 133 | +# Returns DataFrame with addr_StreetName, addr_ZipCode, etc. |
| 134 | +``` |
| 135 | + |
| 136 | +--- |
| 137 | + |
| 138 | +## Testing |
| 139 | + |
| 140 | +- **Framework:** pytest with pytest-cov |
| 141 | +- **Test Location:** `tests/` directory |
| 142 | +- **Run Tests:** `uv run pytest` |
| 143 | +- **Coverage:** 80%+ target |
| 144 | + |
| 145 | +```bash |
| 146 | +uv run pytest # Run all tests |
| 147 | +uv run pytest -v # Verbose output |
| 148 | +uv run pytest --cov=src # With coverage |
| 149 | +uv run pytest -k "test_parse" # Run specific tests |
| 150 | +``` |
| 151 | + |
| 152 | +--- |
| 153 | + |
| 154 | +## Development Commands |
| 155 | + |
| 156 | +```bash |
| 157 | +uv sync # Install dependencies |
| 158 | +uv run pytest # Run tests |
| 159 | +uv run ruff check src/ # Lint |
| 160 | +uv run ruff format src/ # Format |
| 161 | +uv run mypy src/ # Type check |
| 162 | +uv run ryandata-address-utils-setup # Setup libpostal (optional) |
| 163 | +``` |
| 164 | + |
| 165 | +--- |
| 166 | + |
| 167 | +## Architecture Reference |
| 168 | + |
| 169 | +See `docs/ARCHITECTURE.md` for: |
| 170 | +- Detailed data flow diagrams |
| 171 | +- SOLID principles applied |
| 172 | +- DRY improvements made |
| 173 | +- Full package structure |
| 174 | + |
| 175 | +--- |
| 176 | + |
| 177 | +## Important Notes for AI Assistants |
| 178 | + |
| 179 | +1. **Do not redesign the architecture** without explicit approval - stick to incremental changes |
| 180 | +2. **Use existing patterns** - follow the protocol/factory patterns already in place |
| 181 | +3. **ProcessLog is preferred** over the legacy `CleaningTracker` for new code |
| 182 | +4. **Check `__init__.py`** for the public API before suggesting imports |
| 183 | +5. **Run `uv run pytest`** to verify changes don't break existing tests |
| 184 | +6. **Cursor-specific workflows** are documented in `.cursor/agents.md` |