A FastAPI-based service for managing trash pickup schedules with crowdsourced data.
Ready to deploy to production? TrashAlert supports modern cloud deployment:
- API: Deploy to Railway with managed PostgreSQL
- Admin Dashboard: Deploy to Vercel
- CI/CD: Automated testing and deployment with GitHub Actions
- Monitoring: Built-in uptime monitoring and health checks
Quick Deploy: Run ./scripts/deploy-production.sh
📚 Full Guide: See PRODUCTION_DEPLOYMENT.md for detailed instructions.
- POST /report: Submit crowdsourced trash pickup reports
- GET /lookup: Look up trash pickup schedules for an address
- Consensus Algorithm: Automatically verifies crowdsourced data once an address has ≥3 reports with ≥67% agreement
- Multi-source Data: Merges crowdsourced and official data with intelligent prioritization
pip install -r requirements.txt
python init_db.py
This creates a SQLite database with sample addresses.
uvicorn app.main:app --reload
The API will be available at http://localhost:8000
In a new terminal:
python test_api.py
The easiest way to run TrashAlert in production is using Docker. This provides a complete deployment with the API, worker services, and an nginx reverse proxy.
make docker-up
This command will:
- Build Docker images for the API and worker services
- Initialize the SQLite database automatically
- Start the FastAPI application
- Start nginx as a reverse proxy
- Start a worker container for scheduled pipeline tasks
The API will be available at http://localhost (via nginx on port 80).
# Start all containers
make docker-up
# Stop all containers
make docker-down
# View logs from all containers
make docker-logs
# Check container status
make docker-ps
# Rebuild containers from scratch
make docker-rebuild
If you prefer to use docker-compose directly:
# Build and start all services
docker-compose up --build -d
# View logs
docker-compose logs -f
# Stop all services
docker-compose down
# Check container status
docker-compose ps
The Docker deployment includes:
- api: FastAPI application (exposed via nginx)
- worker: Background worker for scheduled pipeline tasks
- nginx: Reverse proxy (ports 80/443)
- certbot: Automatic SSL certificate renewal (optional)
Data is persisted in Docker volumes:
- ./data: SQLite database
- ./logs: Application logs
Once containers are running, test the API:
# Health check
curl http://localhost/health
# Lookup endpoint
curl "http://localhost/lookup?address=1122%20Palmview%20Ave,%20El%20Centro,%20CA"
# Submit a report
curl -X POST http://localhost/report \
-H "Content-Type: application/json" \
-d '{
"address": "1122 Palmview Ave, El Centro, CA",
"trash_day": "WED",
"recycling_day": "FRI"
}'
To run pipeline tasks in the worker container:
# Execute a command in the worker container
docker-compose exec worker python scripts/data_collection/schedule_runner.py --all
# Run the full pipeline
docker-compose exec worker python scripts/run_full_pipeline.py --city "El Centro"
Currently using SQLite for simplicity. To switch to PostgreSQL:
- Add a PostgreSQL service to docker-compose.yml
- Update app/database.py with a PostgreSQL connection string
- Install psycopg2-binary in requirements.txt
- Update the SQLALCHEMY_DATABASE_URL environment variable
Example PostgreSQL service:
postgres:
image: postgres:15-alpine
environment:
POSTGRES_DB: trashalert
POSTGRES_USER: trashalert
POSTGRES_PASSWORD: ${DB_PASSWORD}
volumes:
- postgres-data:/var/lib/postgresql/data
Submit a crowdsourced trash pickup report.
Request:
{
"address": "1122 Palmview Ave, El Centro, CA",
"trash_day": "WED",
"recycling_day": "FRI",
"green_day": null,
"user_hash": "optional-stable-id"
}
Response:
{
"success": true,
"message": "Report submitted successfully",
"address_id": 1,
"normalized_address": "1122 PALMVIEW AVE, EL CENTRO, CA",
"consensus": {
"trash_day": "WED",
"recycling_day": "FRI",
"green_day": null,
"reports_count": 3,
"trash_agreement_ratio": 1.0,
"recycling_agreement_ratio": 1.0,
"green_agreement_ratio": 0.0,
"is_verified": true
}
}
Look up the trash pickup schedule for an address.
Request:
GET /lookup?address=1122 Palmview Ave, El Centro, CA
Response:
{
"address": "1122 Palmview Ave, El Centro, CA",
"normalized_address": "1122 PALMVIEW AVE, EL CENTRO, CA",
"trash_day": "WED",
"recycling_day": "FRI",
"green_day": null,
"source": "CROWD_VERIFIED",
"consensus_reports_count": 3,
"consensus_agreement_ratio": 1.0,
"lat": 32.792,
"lon": -115.563
}
Source Priority:
- CROWD_VERIFIED - Crowdsourced data with ≥3 reports and ≥67% agreement
- OFFICIAL - Official GIS/government data
- UNKNOWN - No data available
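This prioritization could be expressed roughly as follows. An illustrative sketch only: `resolve_schedule` and its inputs are hypothetical names, not the service's actual implementation.

```python
def resolve_schedule(consensus, official):
    """Pick the best available schedule, preferring verified crowd data.

    `consensus` and `official` are dicts of pickup-day fields (or None),
    shaped like the /lookup response above.
    """
    if consensus and consensus.get("is_verified"):
        return {**consensus, "source": "CROWD_VERIFIED"}
    if official:
        return {**official, "source": "OFFICIAL"}
    return {"source": "UNKNOWN"}
```

The key point is the order of the checks: verified crowd data wins even when official data exists, matching the priority list above.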
Get database statistics.
Response:
{
"total_addresses": 3,
"total_reports": 10,
"total_consensus": 2,
"verified_consensus": 2
}
- Stores normalized addresses with coordinates
- Contains official pickup schedules (from GIS/rules)
- Individual user-submitted reports
- Tracks user_hash to prevent spam
- Aggregated consensus from multiple reports
- Automatically calculated and verified
- Verification requires:
- At least 3 reports
- At least 67% agreement ratio
The consensus algorithm:
- Aggregates all reports for an address
- Finds the most common value for each pickup day type
- Calculates agreement ratios (% of reports agreeing with consensus)
- Marks as verified if:
- Total reports ≥ 3
- Average agreement ratio ≥ 0.67
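The steps above can be sketched in Python. This is an illustration of the stated rules, not the service's actual code; the field names mirror the /report response, and averaging all three agreement ratios (with an absent day counting as 0.0) is an assumption consistent with the sample response shown earlier.

```python
from collections import Counter

MIN_REPORTS = 3        # verification threshold from the rules above
MIN_AGREEMENT = 2 / 3  # "≥67%" agreement, taken as two-thirds

def compute_consensus(reports):
    """Aggregate all reports for one address into a consensus record."""
    result = {"reports_count": len(reports)}
    ratios = []
    for field in ("trash_day", "recycling_day", "green_day"):
        values = [r[field] for r in reports if r.get(field)]
        if values:
            # most common reported value wins for this field
            day, votes = Counter(values).most_common(1)[0]
            ratio = votes / len(reports)
        else:
            day, ratio = None, 0.0
        result[field] = day
        result[field + "_agreement_ratio"] = ratio
        ratios.append(ratio)
    avg = sum(ratios) / len(ratios)
    result["is_verified"] = len(reports) >= MIN_REPORTS and avg >= MIN_AGREEMENT
    return result
```

With three identical reports carrying trash and recycling days but no green day, the ratios are 1.0, 1.0, and 0.0, averaging exactly two-thirds, which is why the sample response above is marked verified.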
# Report 1
curl -X POST http://localhost:8000/report \
-H "Content-Type: application/json" \
-d '{
"address": "1122 Palmview Ave, El Centro, CA",
"trash_day": "WED",
"recycling_day": "FRI",
"user_hash": "user_001"
}'
# Report 2
curl -X POST http://localhost:8000/report \
-H "Content-Type: application/json" \
-d '{
"address": "1122 Palmview Ave, El Centro, CA",
"trash_day": "WED",
"recycling_day": "FRI",
"user_hash": "user_002"
}'
# Report 3 (reaches verification threshold)
curl -X POST http://localhost:8000/report \
-H "Content-Type: application/json" \
-d '{
"address": "1122 Palmview Ave, El Centro, CA",
"trash_day": "WED",
"recycling_day": "FRI",
"user_hash": "user_003"
}'
curl "http://localhost:8000/lookup?address=1122%20Palmview%20Ave,%20El%20Centro,%20CA"
TrashAlert/
├── app/
│ ├── __init__.py
│ ├── main.py # FastAPI application
│ ├── models.py # Database models
│ ├── schemas.py # Pydantic schemas
│ ├── database.py # Database connection
│ └── utils.py # Utility functions
├── init_db.py # Database initialization
├── test_api.py # Test/demo script
├── requirements.txt # Python dependencies
└── README.md # This file
FastAPI provides automatic interactive documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
The application uses SQLite by default (trashalert.db). To use PostgreSQL or another database, modify app/database.py.
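As a sketch of what that modification might look like, app/database.py could resolve the connection string from the SQLALCHEMY_DATABASE_URL environment variable used in the Docker setup. The helper names here are hypothetical, not the module's actual contents.

```python
import os

def database_url() -> str:
    """Resolve the SQLAlchemy URL, defaulting to the bundled SQLite file."""
    return os.environ.get("SQLALCHEMY_DATABASE_URL", "sqlite:///./trashalert.db")

def connect_args(url: str) -> dict:
    # SQLite needs this flag because FastAPI may use a session
    # from a different thread than the one that created it.
    return {"check_same_thread": False} if url.startswith("sqlite") else {}
```

The resolved URL and connect args would then be passed to sqlalchemy.create_engine(); switching to PostgreSQL becomes a matter of setting the environment variable.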
MIT
A scalable data pipeline for collecting and processing trash pickup information across multiple cities.
This pipeline collects address data from OpenStreetMap and processes it for use in the TrashAlert system. It's designed to scale from a single city to hundreds of cities across the United States.
The pipeline consists of several modular scripts that work together:
- fetch_city_boundaries.py - Fetches city boundaries from OpenStreetMap
- build_subdivisions.py - Builds subdivision/neighborhood data for each city
- fetch_addresses_osm.py - Fetches address data from OpenStreetMap
- sample_addresses_per_city.py - Samples addresses per city (up to configurable limit)
- run_full_pipeline.py - Orchestrates the entire pipeline
- bulk_import_pipeline.py - Multi-city bulk importer with checkpoints and monitoring (NEW!)
The bulk import pipeline is a production-ready system for importing OSM data across multiple cities automatically:
Features:
- ✅ Automatic multi-city processing from cities.yaml
- ✅ Rate limiting and retry logic for Overpass API
- ✅ Resumable checkpoints for error recovery
- ✅ Database-backed progress tracking
- ✅ Real-time dashboard for monitoring
- ✅ Comprehensive error handling
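The rate-limiting and retry behavior might be implemented along these lines. A sketch under assumptions: the public Overpass endpoint and the exponential backoff schedule are illustrative, not the pipeline's actual values.

```python
import time
import urllib.error
import urllib.request

OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # public Overpass endpoint

def backoff_delays(retries=4, base=5.0):
    """Exponential backoff schedule between retry attempts."""
    return [base * 2 ** i for i in range(retries - 1)]

def fetch_overpass(query: str, retries: int = 4, base: float = 5.0) -> bytes:
    """POST an Overpass QL query, sleeping between failed attempts."""
    delays = backoff_delays(retries, base)
    for attempt in range(retries):
        try:
            req = urllib.request.Request(OVERPASS_URL, data=query.encode())
            with urllib.request.urlopen(req, timeout=180) as resp:
                return resp.read()
        except urllib.error.URLError:
            if attempt == retries - 1:
                raise  # exhausted retries; surface the error
            time.sleep(delays[attempt])  # 5s, 10s, 20s...
```

Combined with the database-backed checkpoints, a failed city can simply be retried on the next run without re-fetching cities that already succeeded.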
Quick Start:
# 1. Run database migration (one time only)
python scripts/migrate_pipeline_tables.py
# 2. Import all cities
python scripts/bulk_import_pipeline.py --all
# 3. Monitor progress at:
#    http://localhost:8000/pipeline-status.html
For detailed documentation, see: docs/BULK_IMPORT_PIPELINE.md
Cities are configured in config/cities.yaml. Each city entry includes:
- name: City Name
state: State Name
state_abbr: XX
country: USA
has_official_pickup_zones: true/false
pickup_zone_data_source: "URL or note"
notes: "Additional context"
Process all cities:
python scripts/run_full_pipeline.py --all
Process a specific city:
python scripts/run_full_pipeline.py --city "Brawley, California"
# or
python scripts/run_full_pipeline.py --city "Brawley"
Process all cities in a state:
python scripts/run_full_pipeline.py --state CA
# or
python scripts/run_full_pipeline.py --state California
Skip certain pipeline steps:
python scripts/run_full_pipeline.py --city "Brawley" \
  --skip-boundaries --skip-subdivisions
Each script can be run independently with the same filtering options:
Sample addresses:
# All cities
python scripts/sample_addresses_per_city.py
# One city
python scripts/sample_addresses_per_city.py --only "Brawley, California"
# One state
python scripts/sample_addresses_per_city.py --state CA
# Custom sample size
python scripts/sample_addresses_per_city.py --only "Brawley" --max-per-city 100
Fetch boundaries:
python scripts/fetch_city_boundaries.py --only "San Diego, California"
Build subdivisions:
python scripts/build_subdivisions.py --state CA
Fetch addresses:
python scripts/fetch_addresses_osm.py --only "Brawley"
To add a new city:
- Edit config/cities.yaml and add a new city entry
- Run the pipeline for that city:
python scripts/run_full_pipeline.py --city "New City, State"
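The --city/--state filtering against cities.yaml entries could be sketched like this. Illustrative only: select_cities is a hypothetical helper, and the matching rules (prefix match on the city name, case-insensitive state name or abbreviation) are assumptions, not the scripts' exact behavior. Loading would use yaml.safe_load from PyYAML, which is among the pipeline's dependencies.

```python
def select_cities(cities, city=None, state=None):
    """Filter parsed cities.yaml entries the way --city / --state might.

    `cities` is a list of dicts with the fields shown in the example entry
    above (name, state, state_abbr, ...).
    """
    selected = []
    for c in cities:
        # "Brawley" and "Brawley, California" should both match name "Brawley"
        if city and not city.lower().startswith(c["name"].lower()):
            continue
        # accept either the full state name or the abbreviation
        if state and state.lower() not in (c["state"].lower(), c["state_abbr"].lower()):
            continue
        selected.append(c)
    return selected
```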
The pipeline generates the following data:
- data/boundaries/*.geojson - City boundary GeoJSON files
- data/subdivisions/*.json - Subdivision/neighborhood data
- data/addresses_osm_raw.csv - Raw address data from OpenStreetMap
- data/addresses_sampled_50_per_city.csv - Sampled addresses (default: 50 per city)
pip install pyyaml requests pandas
# Run the complete pipeline for Brawley
python scripts/run_full_pipeline.py --city "Brawley"
# Output shows:
# - Cities processed: Brawley, CA
# - Steps completed: 4
# - Data statistics per city
# - Summary with timing information
Planned enhancements:
- Normalization step for address standardization
- Database loading functionality
- Integration with official city pickup zone data
- Support for international cities (currently US-only)
Crowdsourced trash collection day lookup for California cities
TrashAlert is a pilot project to help residents quickly find their trash collection day through community-driven data. Instead of navigating complex city websites or calling municipal offices, users can look up their address and see when trash is collected based on real observations from their neighbors.
Build a reliable, crowdsourced trash schedule database for San Diego and Imperial Valley cities, demonstrating that community data can be more accurate and up-to-date than official sources.
Current Phase: Data Pipeline Development (Phase 1)
- ✅ OpenStreetMap address extraction
- ✅ Address sampling script (50 per city)
- ✅ Sample data generation for testing
- ⏳ Database schema design
- ⏳ API development
- ⏳ Web interface
- Overview
- Architecture
- Data Flow
- Setup Instructions
- Usage
- Project Structure
- Documentation
- Contributing
- License
Finding your trash collection day is harder than it should be:
- City websites are confusing or outdated
- Schedules vary by neighborhood/subdivision
- Route changes aren't communicated well
- New residents don't know where to look
TrashAlert uses crowdsourcing to build a reliable schedule database:
- Users report when they observe trash collection
- System calculates consensus from multiple reports
- Confidence scores indicate reliability
- Self-correcting as more data comes in
- 📍 Address-based lookup: Enter your address, get your schedule
- 👥 Crowdsourced data: Community observations, not outdated records
- 🎯 Confidence scores: Know how reliable each schedule is
- 🔄 Self-updating: Automatically adapts to schedule changes
- 🗺️ Geographic sampling: Ensures coverage across subdivisions
City Config → City Boundaries → OSM Query → Address Extraction
↓
Subdivision Detection
↓
Sampling (50 per city)
↓
Database
↓
API Endpoints
↓
Web Interface
↓
User Reports
↓
Consensus Calculation
↓
Updated Schedules
- Data Collection Layer
- City boundary definitions
- OpenStreetMap address queries
- Subdivision detection
- Geographic sampling
- Database Layer
- PostgreSQL with PostGIS
- Cities, addresses, reports, schedules
- Spatial indexing for location queries
- API Layer (planned)
- RESTful API with FastAPI/Flask
- Address lookup endpoints
- Report submission
- Schedule queries
- Crowdsourcing Engine (planned)
- Consensus algorithm
- Confidence scoring
- Conflict detection
- Quality metrics
- Client Interface (planned)
- Web application
- Mobile app (future)
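For the spatial indexing mentioned above, the addresses table would typically carry a PostGIS geometry column with a GiST index. An illustrative DDL fragment a migration might run; the table and column names are assumptions based on the schema described in this README.

```python
# DDL a migration might execute (names are assumptions, not the real schema).
SPATIAL_INDEX_DDL = """
ALTER TABLE addresses ADD COLUMN IF NOT EXISTS geom geometry(Point, 4326);
UPDATE addresses SET geom = ST_SetSRID(ST_MakePoint(lon, lat), 4326);
CREATE INDEX IF NOT EXISTS idx_addresses_geom ON addresses USING GIST (geom);
"""

# Nearest-address lookup using the GiST index's <-> KNN distance operator.
NEAREST_ADDRESS_SQL = """
SELECT id, house_number, street
FROM addresses
ORDER BY geom <-> ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)
LIMIT 1;
"""
```

The `<->` operator lets PostgreSQL answer "closest address to this point" directly from the index rather than scanning every row.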
For detailed architecture, see docs/architecture.md.
1. City Configuration
├─ Define city name and boundaries
└─ Load boundary GeoJSON
2. OSM Address Extraction
├─ Query Overpass API with city boundary
├─ Extract: house_number, street, subdivision, lat, lon
└─ Save to addresses_osm_raw.csv
3. Address Sampling
├─ Load raw addresses
├─ Remove null coordinates and duplicates
├─ Sample up to 50 per city (stratified by subdivision)
└─ Save to addresses_sampled_50_per_city.csv
4. Database Ingestion (planned)
├─ Load sampled addresses
├─ Geocode and validate
└─ Insert into addresses table
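The Overpass query in step 2 might look like the following. A sketch only: the area filter and output form are plausible Overpass QL for pulling addressed nodes and ways inside a named city boundary, not necessarily the pipeline's exact query.

```python
# Overpass QL: all nodes/ways with address tags inside a named city area.
OVERPASS_QUERY = """
[out:json][timeout:180];
area["name"="El Centro"]["boundary"="administrative"]->.city;
(
  node["addr:housenumber"](area.city);
  way["addr:housenumber"](area.city);
);
out center;
"""
```

`out center;` returns a single representative coordinate for ways, which maps directly onto the lat/lon columns of addresses_osm_raw.csv.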
1. User Lookup
User enters address → Search API → Return consensus schedule + confidence
2. User Report
User observes collection → Submit report → Store in database →
Recalculate consensus → Update schedule
3. Consensus Calculation
Collect all reports for address → Filter by recency →
Calculate weighted scores → Determine consensus day →
Compute confidence score → Update trash_schedules table
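The recency filtering and weighting in step 3 could look roughly like this. A sketch under assumptions: the 180-day window, the 60-day half-life decay, and restricting the example to trash_day are all illustrative choices, not the planned engine's actual parameters.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def weighted_consensus(reports, now=None, window_days=180, half_life_days=60):
    """Pick a consensus day, weighting recent reports more heavily.

    Each report is a dict with "trash_day" and an aware "reported_at" datetime.
    Returns (day, confidence) where confidence is the winner's share of weight.
    """
    now = now or datetime.now(timezone.utc)
    scores = defaultdict(float)
    for r in reports:
        age = (now - r["reported_at"]).days
        if age > window_days:  # recency filter: drop stale reports
            continue
        scores[r["trash_day"]] += 0.5 ** (age / half_life_days)  # decay weight
    if not scores:
        return None, 0.0
    day = max(scores, key=scores.get)
    return day, scores[day] / sum(scores.values())
```

Because old reports decay rather than vote at full strength, a route change self-corrects: a handful of fresh observations soon outweighs months-old ones.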
For detailed data flow, see docs/architecture.md.
- Python: 3.11 or higher
- Git: For version control
- PostgreSQL: 14+ with PostGIS extension (for production)
- pip: Python package manager
- Clone the repository:
git clone https://github.com/yourusername/TrashAlert.git
cd TrashAlert
- Create a virtual environment:
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies (when requirements.txt is created):
pip install -r requirements.txt
Current dependencies (to be added to requirements.txt):
- pandas - Data manipulation
- geopandas - Geospatial data processing
- shapely - Geometric operations
- requests - HTTP requests for OSM API
- sqlalchemy - Database ORM (future)
- psycopg2-binary - PostgreSQL adapter (future)
- fastapi - API framework (future)
- uvicorn - ASGI server (future)
- Set up environment variables (future):
cp .env.example .env
# Edit .env with your configuration
- Initialize the database (future):
# Create database
createdb trashalert
# Run migrations
alembic upgrade head
Configuration files will be in config/:
- cities.json - City definitions and boundaries
- database.yml - Database connection settings
- api.yml - API configuration
Creates realistic test data for development:
python scripts/create_sample_raw_data.py
Output: data/addresses_osm_raw.csv
- Generates 445 sample addresses across 6 cities
- Includes subdivisions for San Diego
- Adds test cases: duplicates, null coordinates
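A stdlib-only sketch of generating rows with the deliberate defects noted above; the real script's fields and counts may differ.

```python
import csv
import random

def make_sample_rows(n=10, seed=42):
    """Generate fake address rows, then inject the two defect test cases."""
    random.seed(seed)
    rows = [
        {
            "city": "El Centro",
            "house_number": str(100 + i),
            "street": "Palmview Ave",
            "lat": round(32.79 + random.uniform(-0.01, 0.01), 5),
            "lon": round(-115.56 + random.uniform(-0.01, 0.01), 5),
        }
        for i in range(n)
    ]
    rows.append(dict(rows[0]))                      # exact-duplicate test case
    rows.append({**rows[1], "lat": "", "lon": ""})  # null-coordinate test case
    return rows

def write_csv(path, rows):
    """Write the rows in the same shape as addresses_osm_raw.csv."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```

Seeding the generator makes the defects reproducible, so the sampling script's dedup and null-coordinate handling can be verified against known inputs.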
Samples up to 50 addresses per city with geographic distribution:
python scripts/sample_addresses_per_city.py
Input: data/addresses_osm_raw.csv
Output: data/addresses_sampled_50_per_city.csv
Features:
- Stratified sampling across subdivisions
- Removes duplicates and null coordinates
- Ensures geographic diversity
- Logs sampling statistics
Example Output:
2025-11-16 10:00:00 - INFO - Loading raw addresses from data/addresses_osm_raw.csv
2025-11-16 10:00:00 - INFO - Loaded 445 addresses from 6 cities
2025-11-16 10:00:00 - INFO - San Diego: sampled 50 from 200 addresses across 8 subdivisions
2025-11-16 10:00:00 - INFO - El Centro: sampled 50 from 80 addresses across 5 subdivisions
2025-11-16 10:00:00 - INFO - Calexico: sampled 50 from 65 addresses (no subdivisions)
2025-11-16 10:00:00 - INFO - ✓ Done! Sampled dataset saved to data/addresses_sampled_50_per_city.csv
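The stratified sampling could be sketched like this. Illustrative only: the real script also deduplicates, drops null coordinates, and logs statistics, and its allocation strategy may differ from the simple round-robin shown here.

```python
import random
from collections import defaultdict

def stratified_sample(addresses, max_per_city=50, seed=0):
    """Sample up to max_per_city addresses, spread across subdivisions."""
    rng = random.Random(seed)
    by_subdivision = defaultdict(list)
    for addr in addresses:
        # addresses without a subdivision all share one stratum
        by_subdivision[addr.get("subdivision") or "_none"].append(addr)
    groups = list(by_subdivision.values())
    for g in groups:
        rng.shuffle(g)
    # round-robin across subdivisions until the quota is filled
    sampled = []
    i = 0
    while len(sampled) < max_per_city and any(groups):
        g = groups[i % len(groups)]
        if g:
            sampled.append(g.pop())
        i += 1
    return sampled
```

Round-robin draws keep any one large subdivision from dominating the sample, which is what produces the geographic diversity the log output above reports.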
# Add city to config/cities.json
python scripts/add_city.py --name "Carlsbad" --state "CA"
# Fetch addresses from OSM
python scripts/fetch_osm_addresses.py --city "Carlsbad"
# Sample addresses
python scripts/sample_addresses.py --city "Carlsbad"
# Load into database
python scripts/load_to_db.py --city "Carlsbad"
# Development server
uvicorn app.main:app --reload --port 8000
# Production server
gunicorn app.main:app -w 4 -k uvicorn.workers.UvicornWorker
API will be available at: http://localhost:8000
API documentation: http://localhost:8000/docs
# Run all tests
pytest
# Run with coverage
pytest --cov=app --cov-report=html
# Run specific test file
pytest tests/test_consensus.py
TrashAlert/
├── README.md # This file
├── .gitignore # Git ignore rules
├── requirements.txt # Python dependencies (to be created)
├── setup.py # Package setup (to be created)
│
├── data/ # Data files
│ ├── addresses_osm_raw.csv # Raw OSM address data
│ └── addresses_sampled_50_per_city.csv # Sampled addresses
│
├── scripts/ # Data processing scripts
│ ├── create_sample_raw_data.py # Generate test data
│ ├── sample_addresses_per_city.py # Sample addresses
│ ├── fetch_osm_addresses.py # Fetch from OSM (future)
│ ├── load_to_db.py # Load to database (future)
│ └── add_city.py # Add new city (future)
│
├── app/ # Application code (future)
│ ├── __init__.py
│ ├── main.py # FastAPI app entry point
│ ├── config.py # Configuration management
│ ├── database.py # Database connection
│ │
│ ├── models/ # SQLAlchemy models
│ │ ├── __init__.py
│ │ ├── city.py
│ │ ├── address.py
│ │ ├── report.py
│ │ └── schedule.py
│ │
│ ├── api/ # API routes
│ │ ├── __init__.py
│ │ ├── addresses.py
│ │ ├── reports.py
│ │ └── schedules.py
│ │
│ ├── services/ # Business logic
│ │ ├── __init__.py
│ │ ├── consensus.py # Consensus algorithm
│ │ ├── geocoding.py # Address geocoding
│ │ └── validation.py # Data validation
│ │
│ └── utils/ # Utility functions
│ ├── __init__.py
│ └── geo.py # Geospatial helpers
│
├── tests/ # Test suite (future)
│ ├── __init__.py
│ ├── test_consensus.py
│ ├── test_api.py
│ └── test_models.py
│
├── docs/ # Documentation
│ ├── architecture.md # System architecture
│ ├── data_model.md # Database schema
│ ├── crowdsourcing.md # Consensus algorithm
│ └── api.md # API documentation (future)
│
├── config/ # Configuration files (future)
│ ├── cities.json # City definitions
│ ├── database.yml # Database config
│ └── api.yml # API config
│
├── migrations/ # Database migrations (future)
│ └── alembic/ # Alembic migration files
│
└── web/ # Frontend (future)
├── public/
├── src/
└── package.json
- Architecture Overview: System design and component details
- Data Model: Database schema, tables, and relationships
- Crowdsourcing Logic: Consensus algorithm and quality metrics
- API Reference: Endpoint documentation with examples
- Deployment Guide: Production deployment instructions
- Contributing Guide: How to contribute to the project
- User Guide: How to use the web interface
Manual testing with sample data:
# Generate sample data
python scripts/create_sample_raw_data.py
# Run sampling script
python scripts/sample_addresses_per_city.py
# Verify output
head -20 data/addresses_sampled_50_per_city.csv
Automated test suite:
# Unit tests
pytest tests/test_consensus.py
pytest tests/test_models.py
# Integration tests
pytest tests/test_api.py
# End-to-end tests
pytest tests/test_e2e.py
- Sample Data: Generated test data for development
- 6 cities: San Diego, El Centro, Calexico, Brawley, Imperial, Holtville
- 445 addresses total (200 in San Diego, varying amounts in others)
- Includes subdivisions where applicable
- OpenStreetMap: Real address data via Overpass API
- City Boundaries: GeoJSON from OpenStreetMap or city open data portals
- Official Schedules: Where available from city websites
- User Reports: Crowdsourced observations
- Python 3.11+: Core language
- Pandas: Data processing
- Standard Library: CSV handling, logging
Backend:
- FastAPI: Modern Python web framework
- SQLAlchemy: ORM for database operations
- PostgreSQL: Database with PostGIS extension
- Alembic: Database migrations
- Pydantic: Data validation
Data Processing:
- GeoPandas: Geospatial data analysis
- Shapely: Geometric operations
- Requests: HTTP client for OSM API
Frontend (future):
- React: UI framework
- Leaflet: Interactive maps
- Tailwind CSS: Styling
Infrastructure:
- Docker: Containerization
- Nginx: Reverse proxy
- GitHub Actions: CI/CD
- Sample data generation
- Address sampling script
- Database schema implementation
- OSM data fetching script
- Data ingestion pipeline
- FastAPI project setup
- Database models (SQLAlchemy)
- CRUD operations
- Consensus algorithm implementation
- API endpoints
- React app setup
- Address search UI
- Schedule display
- Report submission form
- Confidence indicators
- Deploy to cloud platform
- Load real OSM data for pilot cities
- User testing
- Bug fixes and refinements
- Documentation updates
- Add more California cities
- Mobile app (React Native)
- Advanced features (reminders, etc.)
- Integration with official city APIs
Contributions are welcome! This is an early-stage project with lots of opportunities to help.
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature
- Make your changes
- Add tests (when test suite is set up)
- Commit: git commit -m "Add your feature"
- Push: git push origin feature/your-feature
- Open a Pull Request
- Database schema refinement
- API development
- Frontend development
- Testing and quality assurance
- Documentation improvements
- Data collection for new cities
[To be determined - recommend MIT or Apache 2.0]
- Project Lead: [Your Name]
- Email: [your-email@example.com]
- GitHub Issues: https://github.com/yourusername/TrashAlert/issues
- OpenStreetMap: For providing free, open address data
- PostGIS: For powerful geospatial database capabilities
- FastAPI: For the excellent Python web framework
- Community Contributors: Everyone who reports trash days!
Note: This is a pilot project. Schedules may not be 100% accurate. Always verify with your local waste management provider for official information.