| 1 | +# Domus Crawler |
| 2 | + |
| 3 | +A dedicated CLI tool, driven by `@domus/sync`, that orchestrates data synchronization between the central DUK system and a local parish database. It is designed to run on a mini PC within the parish network. |
| 4 | + |
| 5 | +## 🚀 Overview |
| 6 | + |
| 7 | +The crawler performs four main tasks: |
| 8 | +1. **Seed**: Populates local regional data (Provinces, Regencies, Districts, Villages) from BPS sources. |
| 9 | +2. **Scrape**: Fetches reference data and parishioner records from DUK into a staging area. |
| 10 | +3. **Transform**: Processes staged data into production-ready tables in the local database. |
| 11 | +4. **Sync-Back**: Pushes local updates (made in Domus) back to the DUK system. |
| 12 | + |
| 13 | +## ⚙️ Configuration |
| 14 | + |
| 15 | +The crawler requires the following environment variables. Create a `.env` file in this directory, or pass the variables to Docker with `--env-file`. |
| 16 | + |
| 17 | +| Variable | Description | Example | |
| 18 | +| :--- | :--- | :--- | |
| 19 | +| `DATABASE_URL` | Local PostgreSQL connection string. | `postgresql://user:pass@localhost:5432/domus` | |
| 20 | +| `SYNC_TARGET_URL` | Base URL of the DUK system. | `https://duk-target.com` | |
| 21 | +| `SYNC_USERNAME` | Username for DUK system login. | `parish_admin` | |
| 22 | +| `SYNC_PASSWORD` | Password for DUK system login. | `********` | |
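
A missing variable is easiest to diagnose when it is reported before any stage starts. The following is a minimal sketch of such a check, not the crawler's actual startup code: `readConfig` and `CrawlerConfig` are hypothetical names, and only the four variable names come from the table above.

```typescript
// Variable names taken from the configuration table.
const REQUIRED_VARS = [
  "DATABASE_URL",
  "SYNC_TARGET_URL",
  "SYNC_USERNAME",
  "SYNC_PASSWORD",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];
type CrawlerConfig = Record<RequiredVar, string>;

// Hypothetical helper (not part of @domus/crawler): collects every missing
// or blank variable first, so a single error reports all of them at once.
function readConfig(env: Record<string, string | undefined>): CrawlerConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  // The URL constructor throws a TypeError on a malformed SYNC_TARGET_URL.
  new URL(env.SYNC_TARGET_URL as string);

  return Object.fromEntries(
    REQUIRED_VARS.map((name) => [name, env[name] as string]),
  ) as CrawlerConfig;
}
```

In a Node.js entry point this could be called once as `readConfig(process.env)`, so that a misconfigured `.env` file fails fast instead of partway through a crawl.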
| 23 | + |
| 24 | +## 🛠️ Usage |
| 25 | + |
| 26 | +### Local Development |
| 27 | + |
| 28 | +Run the following commands from the monorepo root: |
| 29 | + |
| 30 | +```bash |
| 31 | +# Show help |
| 32 | +pnpm --filter @domus/crawler start help |
| 33 | + |
| 34 | +# Run full pipeline |
| 35 | +pnpm --filter @domus/crawler start crawl |
| 36 | + |
| 37 | +# Individual commands |
| 38 | +pnpm --filter @domus/crawler start seed |
| 39 | +pnpm --filter @domus/crawler start scrape |
| 40 | +pnpm --filter @domus/crawler start transform |
| 41 | +pnpm --filter @domus/crawler start syncback |
| 42 | +``` |
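
To illustrate how the commands above relate to the four tasks in the overview, here is a rough sketch of a command resolver. It is an illustration only: `resolveStages` is a hypothetical helper, and the mapping of `crawl` to all four stages in overview order is an assumption that the source does not confirm.

```typescript
// Command names taken from the usage examples above.
const COMMANDS = ["help", "crawl", "seed", "scrape", "transform", "syncback"] as const;
type Command = (typeof COMMANDS)[number];
type Stage = Exclude<Command, "help" | "crawl">;

// Assumption: `crawl` runs every stage, in the order listed in the overview.
const FULL_PIPELINE: readonly Stage[] = ["seed", "scrape", "transform", "syncback"];

function isCommand(value: string): value is Command {
  return (COMMANDS as readonly string[]).includes(value);
}

// Hypothetical helper: maps the first CLI argument to the ordered list of
// stages to run. An empty list stands for "print help and exit".
function resolveStages(arg: string | undefined): readonly Stage[] {
  if (arg === undefined) return [];
  if (!isCommand(arg)) throw new Error(`Unknown command: ${arg}`);
  if (arg === "help") return [];
  if (arg === "crawl") return FULL_PIPELINE;
  return [arg]; // a single-stage command such as `seed`
}
```

Rejecting unknown commands up front keeps a typo such as `sync-back` from silently doing nothing on an unattended mini PC.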
| 43 | + |
| 44 | +### Docker Deployment |
| 45 | + |
| 46 | +For production deployment on a mini PC: |
| 47 | + |
| 48 | +1. **Build the image** (run from monorepo root): |
| 49 | + ```bash |
| 50 | + docker build -t domus-crawler -f apps/crawler/Dockerfile . |
| 51 | + ``` |
| 52 | + |
| 53 | +2. **Run the container**: |
| 54 | + ```bash |
| 55 | + docker run --rm --env-file .env domus-crawler [command] |
| 56 | + ``` |
| 57 | + |
| 58 | + *Example: Run full crawl* |
| 59 | + ```bash |
| 60 | + docker run --rm --env-file .env domus-crawler crawl |
| 61 | + ``` |
| 62 | + |
| 63 | +## 🏗️ Docker Details |
| 64 | + |
| 65 | +The Dockerfile uses a multi-stage build: |
| 66 | +- **Base**: Installs all monorepo dependencies. |
| 67 | +- **Run**: Uses `tsx` to execute the TypeScript source directly, so the image needs no separate transpilation step and stays compatible with the monorepo's ESM structure. |
| 68 | + |
| 69 | +--- |
| 70 | + |
| 71 | +Built with ❤️ for Kristus Raja Barong Tongkok Parish. |