
Commit 36b5337

Authored and committed by github-actions
docs: Add AGENTS.md and llms.txt for AI discoverability, update README
- Add AGENTS.md with codebase overview, conventions, and common tasks for AI assistants
- Add llms.txt with structured library context for LLMs
- Update README.md with Python 3.12+ badge, ProcessLog system, transformation tracking section
- Bump version to 0.7.1
1 parent 6c03b62 commit 36b5337

6 files changed

Lines changed: 323 additions & 5 deletions


AGENTS.md

Lines changed: 184 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,184 @@
1+
# AGENTS.md - AI Coding Assistant Guide
2+
3+
This document provides context for AI coding assistants (Claude, GPT, Copilot, Cursor, etc.) working with the `ryandata-address-utils` codebase.
4+
5+
---
6+
7+
## Codebase Overview
8+
9+
**Package:** `ryandata-address-utils`
10+
**Purpose:** US address parsing library with Pydantic models, validation, and pandas integration
11+
**Python Version:** 3.12+ required (<=3.13)
12+
**License:** MIT
13+
14+
### Architecture Patterns
15+
16+
- **Facade Pattern:** `AddressService` provides a unified interface to parsers, validators, and data sources
17+
- **Protocol-based Interfaces:** Uses Python `Protocol` classes instead of ABCs for loose coupling
18+
- **Factory Pattern:** `ParserFactory`, `DataSourceFactory` for extensible component creation
19+
- **Composite Pattern:** `CompositeValidator` chains multiple validators
20+
- **Builder Pattern:** `AddressBuilder` for fluent address construction
21+
22+
### Key Dependencies
23+
24+
- `pydantic>=2.0.0` - Data validation and serialization
25+
- `usaddress>=0.5.16` - Default US address parser backend
26+
- `abstract-validation-base` - ProcessLog system for transformation tracking
27+
- `typer` + `trogon` - CLI with interactive TUI
28+
29+
---
30+
31+
## Key Files
32+
33+
| File | Purpose |
|------|---------|
| `src/ryandata_address_utils/__init__.py` | Public API exports - check here for available symbols |
| `src/ryandata_address_utils/service.py` | `AddressService` facade - main entry point |
| `src/ryandata_address_utils/models/address.py` | `Address` Pydantic model with 26+ fields |
| `src/ryandata_address_utils/models/results.py` | `ParseResult`, `ZipInfo` dataclasses |
| `src/ryandata_address_utils/protocols.py` | Protocol definitions for extensibility |
| `src/ryandata_address_utils/parsers/` | Parser implementations (usaddress, libpostal) |
| `src/ryandata_address_utils/validation/` | Validators (ZIP, state, composite) |
| `src/ryandata_address_utils/data/` | Data sources (CSV-backed ZIP database) |
| `src/ryandata_address_utils/core/` | Shared utilities (formatters, tracking, errors) |
44+
45+
---
46+
47+
## Coding Conventions
48+
49+
### Style & Linting
50+
51+
- **Formatter:** Ruff (`ruff format`)
52+
- **Linter:** Ruff with `E, F, I, UP, B, SIM` rule sets
53+
- **Type Checker:** MyPy in strict mode (`disallow_untyped_defs = true`)
54+
- **Line Length:** 100 characters
55+
56+
### Pydantic Models
57+
58+
```python
# Use Field() with descriptions for all model fields
field_name: str | None = Field(
    default=None,
    description="Clear description of the field",
    validation_alias=AliasChoices("FieldName", "alias"),
)
```
66+
67+
### Protocol-based Design
68+
69+
```python
# Define protocols in protocols.py
class ValidatorProtocol(Protocol):
    def validate(self, address: Address) -> ValidationResult: ...

# Implementations satisfy protocols implicitly
class ZipCodeValidator:
    def validate(self, address: Address) -> ValidationResult:
        ...  # Implementation
```
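As a rough illustration of what "satisfy protocols implicitly" means, here is a minimal self-contained sketch. The `Address` and `ValidationResult` classes below are simplified, hypothetical stand-ins rather than the library's real models, and the five-digit check is only an assumed example rule; only the shape of `ValidatorProtocol` follows the snippet above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class Address:  # hypothetical stand-in for the real Pydantic model
    ZipCode: str | None = None


@dataclass
class ValidationResult:  # hypothetical stand-in
    is_valid: bool
    errors: list[str] = field(default_factory=list)


@runtime_checkable
class ValidatorProtocol(Protocol):
    def validate(self, address: Address) -> ValidationResult: ...


class ZipCodeValidator:
    """Does not inherit from ValidatorProtocol; a matching method is enough."""

    def validate(self, address: Address) -> ValidationResult:
        zip_code = address.ZipCode or ""
        # Assumed rule for illustration: exactly five digits.
        if len(zip_code) == 5 and zip_code.isdigit():
            return ValidationResult(is_valid=True)
        return ValidationResult(is_valid=False, errors=[f"Invalid ZIP: {zip_code!r}"])


def run(validator: ValidatorProtocol, address: Address) -> ValidationResult:
    # Accepts any object whose structure matches the protocol.
    return validator.validate(address)
```

A static type checker accepts `ZipCodeValidator` wherever `ValidatorProtocol` is expected because the method signatures match; `@runtime_checkable` additionally allows an `isinstance` check, which only verifies that the method exists.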
79+
80+
### ProcessLog for Transformations
81+
82+
```python
# Models inherit from RyanDataValidationBase which provides process_log
address.add_cleaning_process(
    field="StateName",
    original_value="Texas",
    new_value="TX",
    reason="Normalized state name to abbreviation",
)
```
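To make the idea of a transformation log concrete, the following sketch shows one way such an audit trail could work. `CleaningEntry`, `TrackedAddress`, and `normalize_state` are hypothetical names invented for this example; they are not the `abstract-validation-base` API. Only the keyword arguments of `add_cleaning_process` mirror the call shown above.

```python
from __future__ import annotations

from dataclasses import dataclass
from dataclasses import field as dc_field


@dataclass(frozen=True)
class CleaningEntry:  # hypothetical record type
    field: str
    original_value: str
    new_value: str
    reason: str


@dataclass
class TrackedAddress:  # hypothetical model with an attached log
    StateName: str
    process_log: list[CleaningEntry] = dc_field(default_factory=list)

    def add_cleaning_process(
        self, field: str, original_value: str, new_value: str, reason: str
    ) -> None:
        # Record what changed and why, without altering the model itself.
        self.process_log.append(CleaningEntry(field, original_value, new_value, reason))


def normalize_state(address: TrackedAddress) -> None:
    abbreviations = {"texas": "TX"}  # tiny illustrative lookup table
    abbreviation = abbreviations.get(address.StateName.lower())
    if abbreviation and abbreviation != address.StateName:
        address.add_cleaning_process(
            field="StateName",
            original_value=address.StateName,
            new_value=abbreviation,
            reason="Normalized state name to abbreviation",
        )
        address.StateName = abbreviation


address = TrackedAddress(StateName="Texas")
normalize_state(address)
```

Logging before mutating keeps the original value available for the entry, so the log can later explain every difference between the raw input and the final model.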
91+
92+
### Error Handling
93+
94+
- Use `RyanDataAddressError` for address-specific errors
95+
- Use `RyanDataValidationError` for validation failures
96+
- All errors include package context via `PACKAGE_NAME`
97+
98+
---
99+
100+
## Common Tasks
101+
102+
### Adding a New Validator
103+
104+
1. Create class in `validation/validators.py`
105+
2. Implement `ValidatorProtocol` (must have `validate(address) -> ValidationResult`)
106+
3. Register with `CompositeValidator` if needed
107+
108+
```python
class MyValidator:
    def validate(self, address: Address) -> ValidationResult:
        errors = []
        # validation logic
        return ValidationResult(is_valid=len(errors) == 0, errors=errors)
```
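Step 3 relies on the composite pattern. The sketch below shows how a composite could chain several validators and merge their errors; it is an assumption about the general technique, not the library's actual `CompositeValidator`, and every class in it (`Address`, `ValidationResult`, `ZipLengthValidator`, `StateLengthValidator`, `SimpleCompositeValidator`) is a hypothetical stand-in.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Protocol


@dataclass
class Address:  # hypothetical stand-in
    ZipCode: str = ""
    StateName: str = ""


@dataclass
class ValidationResult:  # hypothetical stand-in
    is_valid: bool
    errors: list[str] = field(default_factory=list)


class Validator(Protocol):
    def validate(self, address: Address) -> ValidationResult: ...


class ZipLengthValidator:
    def validate(self, address: Address) -> ValidationResult:
        errors = [] if len(address.ZipCode) == 5 else ["ZIP must have 5 characters"]
        return ValidationResult(is_valid=not errors, errors=errors)


class StateLengthValidator:
    def validate(self, address: Address) -> ValidationResult:
        errors = [] if len(address.StateName) == 2 else ["State must be a 2-letter code"]
        return ValidationResult(is_valid=not errors, errors=errors)


class SimpleCompositeValidator:
    """Runs every child validator and collects all errors."""

    def __init__(self, validators: Iterable[Validator]) -> None:
        self._validators = list(validators)

    def validate(self, address: Address) -> ValidationResult:
        errors: list[str] = []
        for validator in self._validators:
            errors.extend(validator.validate(address).errors)
        return ValidationResult(is_valid=not errors, errors=errors)


composite = SimpleCompositeValidator([ZipLengthValidator(), StateLengthValidator()])
```

Because the composite exposes the same `validate` method as its children, callers can treat a single validator and a chain of validators interchangeably.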
115+
116+
### Adding a New Parser Backend
117+
118+
1. Create class in `parsers/` implementing `AddressParserProtocol`
119+
2. Register with `ParserFactory.register("name", MyParser)`
120+
3. Use via `AddressService(parser=ParserFactory.create("name"))`
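The register-then-create flow in these steps can be sketched as a small registry. This is an illustrative assumption about how such a factory might look, not the real `ParserFactory`; `SimpleParserFactory`, `ParserProtocol`, and `CommaSplitParser` are hypothetical names, and the `parse` return type is invented for the example.

```python
from __future__ import annotations

from typing import Callable, Protocol


class ParserProtocol(Protocol):  # hypothetical; the real protocol may differ
    def parse(self, raw: str) -> dict[str, str]: ...


class SimpleParserFactory:
    """Maps a backend name to a callable that builds a parser."""

    _registry: dict[str, Callable[[], ParserProtocol]] = {}

    @classmethod
    def register(cls, name: str, parser_cls: Callable[[], ParserProtocol]) -> None:
        cls._registry[name] = parser_cls

    @classmethod
    def create(cls, name: str) -> ParserProtocol:
        try:
            parser_cls = cls._registry[name]
        except KeyError:
            raise ValueError(f"Unknown parser backend: {name!r}") from None
        return parser_cls()


class CommaSplitParser:
    """Toy backend: splits on commas and numbers the parts."""

    def parse(self, raw: str) -> dict[str, str]:
        parts = [part.strip() for part in raw.split(",")]
        return {f"part_{index}": part for index, part in enumerate(parts)}


SimpleParserFactory.register("comma", CommaSplitParser)
parser = SimpleParserFactory.create("comma")
```

Keeping the registry keyed by name means new backends can be added without editing the code that consumes parsers.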
121+
122+
### Extending the Address Model
123+
124+
1. Add field to `models/address.py` with proper `Field()` definition
125+
2. Add to `ADDRESS_FIELDS` enum if needed for iteration
126+
3. Update `AddressFormatter` if field affects full address computation
127+
128+
### Working with Pandas
129+
130+
```python
service = AddressService()
df = service.parse_dataframe(df, "address_column", prefix="addr_")
# Returns DataFrame with addr_StreetName, addr_ZipCode, etc.
```
135+
136+
---
137+
138+
## Testing
139+
140+
- **Framework:** pytest with pytest-cov
141+
- **Test Location:** `tests/` directory
142+
- **Run Tests:** `uv run pytest`
143+
- **Coverage:** Target 80%+ coverage
144+
145+
```bash
uv run pytest                  # Run all tests
uv run pytest -v               # Verbose output
uv run pytest --cov=src        # With coverage
uv run pytest -k "test_parse"  # Run specific tests
```
151+
152+
---
153+
154+
## Development Commands
155+
156+
```bash
uv sync                              # Install dependencies
uv run pytest                        # Run tests
uv run ruff check src/               # Lint
uv run ruff format src/              # Format
uv run mypy src/                     # Type check
uv run ryandata-address-utils-setup  # Setup libpostal (optional)
```
164+
165+
---
166+
167+
## Architecture Reference
168+
169+
See `docs/ARCHITECTURE.md` for:
170+
- Detailed data flow diagrams
171+
- SOLID principles applied
172+
- DRY improvements made
173+
- Full package structure
174+
175+
---
176+
177+
## Important Notes for AI Assistants
178+
179+
1. **Do not redesign architecture** without explicit approval - stick to incremental changes
180+
2. **Use existing patterns** - follow the protocol/factory patterns already in place
181+
3. **ProcessLog is preferred** over legacy `CleaningTracker` for new code
182+
4. **Check `__init__.py`** for the public API before suggesting imports
183+
5. **Run `uv run pytest`** to verify changes don't break existing tests
184+
6. **Cursor-specific workflows** are documented in `.cursor/agents.md`

README.md

Lines changed: 25 additions & 2 deletions
@@ -4,19 +4,20 @@
44
[![Ruff](https://github.com/Abstract-Data/RyanData-Address-Utils/actions/workflows/lint.yml/badge.svg)](https://github.com/Abstract-Data/RyanData-Address-Utils/actions/workflows/lint.yml)
55
[![MyPy](https://github.com/Abstract-Data/RyanData-Address-Utils/actions/workflows/typecheck.yml/badge.svg)](https://github.com/Abstract-Data/RyanData-Address-Utils/actions/workflows/typecheck.yml)
66
[![codecov](https://codecov.io/gh/Abstract-Data/RyanData-Address-Utils/graph/badge.svg?token=75LQK4KJTZ)](https://codecov.io/gh/Abstract-Data/RyanData-Address-Utils)
7-
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
7+
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
88
[![uv](https://img.shields.io/badge/packaging-uv-9055ff.svg)](https://github.com/astral-sh/uv)
99
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
1010

1111
Parse and validate US addresses with Pydantic models, ZIP/state validation, pandas integration, and semantic-release powered CI.
1212

1313
## Highlights
1414

15-
- Structured parsing of US addresses into 26 components with Pydantic models
15+
- Structured parsing of US addresses into 26+ components with Pydantic models
1616
- ZIP and state validation backed by authoritative datasets
1717
- Pandas-friendly parsing for batch workloads
1818
- Custom errors (`RyanDataAddressError`, `RyanDataValidationError`) with package context
1919
- Builder API for programmatic address construction
20+
- ProcessLog system for transformation auditing (via `abstract-validation-base`)
2021
- Semantic-release CI for automated tagging and releases
2122

2223
## Install
@@ -97,6 +98,24 @@ address = (
9798
)
9899
```
99100

101+
## Transformation tracking
102+
103+
Track what normalizations and cleanings were applied during parsing:
104+
105+
```python
from ryandata_address_utils import AddressService

service = AddressService()
result = service.parse("123 main st, austin texas 78749")

# Aggregate logs from parsing and model transformations
for entry in result.aggregate_logs():
    print(f"{entry['field']}: {entry['message']}")
# Example output:
# StateName: Normalized state name from full name to abbreviation
# ZipCode: ZIP code parsed and validated
```
118+
100119
## Workflow at a glance
101120

102121
```mermaid
@@ -113,11 +132,15 @@ flowchart LR
113132
- `parse(...)`: convenience wrapper returning `ParseResult`
114133
- ZIP utilities: `get_city_state_from_zip`, `get_zip_info`, `is_valid_zip`, `is_valid_state`, `normalize_state`
115134
- Builder: `AddressBuilder` for programmatic address construction
135+
- Audit trail: `ProcessLog`, `ProcessEntry` for tracking transformations
136+
- Validation base: `ValidationBase`, `RyanDataValidationBase` for model mixins
116137

117138
## Documentation
118139

119140
- **[Architecture Overview](docs/ARCHITECTURE.md)** - Package structure, data flow diagrams, design patterns, and SOLID/DRY principles applied
120141
- **[Diagrams](docs/diagrams.md)** - Visual references for the codebase
142+
- **[Changelog](CHANGELOG.md)** - Version history and release notes
143+
- **[AI Agent Guide](AGENTS.md)** - Guidance for AI coding assistants
121144

122145
## Development (uv)
123146

llms.txt

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
1+
# ryandata-address-utils
2+
3+
> US address parsing library with Pydantic models, validation, and pandas integration
4+
5+
## Quick Facts
6+
7+
- Package: ryandata-address-utils
8+
- Version: 0.7.1
9+
- Python: >=3.12.8, <=3.13
10+
- License: MIT
11+
- Repository: https://github.com/Abstract-Data/RyanData-Address-Utils
12+
13+
## What This Library Does
14+
15+
Parse, validate, and normalize US addresses into structured components. Features include:
16+
- 26+ address components (street number, name, city, state, ZIP, unit, etc.)
17+
- ZIP code and state validation against authoritative datasets
18+
- Pandas DataFrame integration for batch processing
19+
- International address support via libpostal (optional)
20+
- Transformation tracking with ProcessLog audit trails
21+
22+
## Main Entry Points
23+
24+
```python
from ryandata_address_utils import AddressService, parse

# Simple parsing
result = parse("123 Main St, Austin TX 78749")
if result.is_valid:
    print(result.address.ZipCode)  # "78749"

# Service-based parsing
service = AddressService()
result = service.parse("456 Oak Ave, Dallas TX 75201")

# Pandas integration
df = service.parse_dataframe(df, "address_column")
```
39+
40+
## Key Classes
41+
42+
- AddressService: Main facade for parsing operations
43+
- Address: Pydantic model with all address components
44+
- ParseResult: Container for parsed address + validation + logs
45+
- AddressBuilder: Fluent API for programmatic address construction
46+
- ProcessLog: Audit trail for transformations
47+
48+
## Installation
49+
50+
```bash
# Using uv (recommended)
uv add git+https://github.com/Abstract-Data/RyanData-Address-Utils.git

# With pandas support
uv add "ryandata-address-utils[pandas] @ git+https://github.com/Abstract-Data/RyanData-Address-Utils.git"
```
57+
58+
## Documentation Files
59+
60+
- README.md: Installation, quick start, usage examples
61+
- docs/ARCHITECTURE.md: Design patterns, data flow, SOLID principles
62+
- AGENTS.md: AI coding assistant guidance
63+
- CHANGELOG.md: Version history and changes
64+
- .cursor/agents.md: Cursor-specific agent workflows
65+
66+
## Architecture
67+
68+
The library uses:
69+
- Facade pattern (AddressService)
70+
- Protocol-based interfaces for extensibility
71+
- Factory pattern for parsers and data sources
72+
- Composite pattern for validators
73+
- Builder pattern for address construction
74+
75+
## Common Operations
76+
77+
### Parse a single address
78+
```python
result = parse("123 Main St, Austin TX 78749")
```
81+
82+
### Parse with automatic US/international detection
83+
```python
result = service.parse_auto("10 Downing Street, London, UK")
```
86+
87+
### Validate ZIP codes
88+
```python
from ryandata_address_utils import is_valid_zip, get_zip_info
is_valid_zip("78749")  # True
info = get_zip_info("78749")  # ZipInfo with city, state, county
```
93+
94+
### Build addresses programmatically
95+
```python
from ryandata_address_utils import AddressBuilder
address = (
    AddressBuilder()
    .with_street_number("123")
    .with_street_name("Main")
    .with_city("Austin")
    .with_state("TX")
    .with_zip("78749")
    .build()
)
```
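For readers unfamiliar with the builder pattern, the chaining above works because each `with_*` method returns the builder itself. The sketch below demonstrates that mechanism with hypothetical `SimpleAddress` and `SimpleAddressBuilder` classes; it is not the library's `AddressBuilder`, and its field names are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleAddress:  # hypothetical result type
    street_number: str | None = None
    street_name: str | None = None
    city: str | None = None


class SimpleAddressBuilder:
    """Collects values step by step, then creates the immutable result."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def with_street_number(self, value: str) -> SimpleAddressBuilder:
        self._values["street_number"] = value
        return self  # returning self is what enables method chaining

    def with_street_name(self, value: str) -> SimpleAddressBuilder:
        self._values["street_name"] = value
        return self

    def with_city(self, value: str) -> SimpleAddressBuilder:
        self._values["city"] = value
        return self

    def build(self) -> SimpleAddress:
        return SimpleAddress(**self._values)


built = (
    SimpleAddressBuilder()
    .with_street_number("123")
    .with_street_name("Main")
    .with_city("Austin")
    .build()
)
```

Fields that were never set keep their `None` defaults, so partially specified addresses can still be built.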
107+
108+
## Contact
109+
110+
- Issues: https://github.com/Abstract-Data/RyanData-Address-Utils/issues
111+
- Author: Abstract-Data (dev@abstractdata.io)

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
[project]
22
name = "ryandata-address-utils"
3-
version = "0.7.0"
3+
version = "0.7.1"
44
description = "A Python address parser using usaddress with Pydantic models, validation, and extensible architecture"
55
readme = "README.md"
66
license = { text = "MIT" }

src/ryandata_address_utils/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -104,7 +104,7 @@
104104
)
105105
from ryandata_address_utils.validation.base import RyanDataValidationBase, ValidationBase
106106

107-
__version__ = "0.7.0"
107+
__version__ = "0.7.1"
108108
__package_name__ = "ryandata-address-utils"
109109

110110
__all__ = [

uv.lock

Lines changed: 1 addition & 1 deletion
