The Karnataka land records site (landrecords.karnataka.gov.in) blocks all cloud datacenter IPs.
- Railway, Vercel, AWS, Cloudflare Workers (as origin) — all blocked with HTTP 522/504
- Regular user devices (Indian ISPs, mobile networks) — NOT blocked
The correct approach: run PDF extraction client-side in the user's browser.
- Browser makes ASP.NET form POSTs through the Cloudflare Worker (for CORS headers only)
- CF Worker: `https://geodocs-proxy.harshag954.workers.dev`
- Client-side extractor: `lib/clientPdfExtractor.ts`
- Entry point: `lib/api.ts` → `fetchPdfUrl()` tries client-side first, falls back to server API
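
The client-first, server-fallback order described above can be sketched as follows. This is a minimal illustration, not the actual contents of `lib/api.ts`: the names `PdfUrlStrategy` and `fetchPdfUrlWithFallback` are hypothetical, and the real `fetchPdfUrl()` signature is not shown in these notes.

```typescript
// Hypothetical shape: a strategy takes a village query and resolves to a PDF URL.
type PdfUrlStrategy = (query: string) => Promise<string>;

// Try the client-side extractor first; only if it throws, try the server API.
async function fetchPdfUrlWithFallback(
  query: string,
  clientSide: PdfUrlStrategy,
  serverApi: PdfUrlStrategy,
): Promise<string> {
  try {
    return await clientSide(query);
  } catch {
    // The server path is expected to fail when the server runs on a
    // cloud-blocked IP, so a rejection here propagates to the caller.
    return serverApi(query);
  }
}
```

In the app, `clientSide` would presumably wrap the extractor in `lib/clientPdfExtractor.ts` and `serverApi` would call `/api/get-pdf-url`. Passing both in as functions keeps the ordering logic testable without network access.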
Do NOT try to reach the Karnataka site from any server/cloud function. It will always fail.
User Browser
  └─> lib/api.ts → fetchPdfUrl()
        └─> lib/clientPdfExtractor.ts (client-side, primary)
              └─> fetch() to geodocs-proxy.harshag954.workers.dev (CF Worker, CORS only)
                    └─> landrecords.karnataka.gov.in (Karnataka site)
        └─> /api/get-pdf-url (server fallback, will fail for cloud-blocked IPs)
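
To make "CORS only" concrete, here is a rough sketch of the header logic a proxy of this kind needs. The real Worker lives in `cf-proxy/` and may differ. `ALLOWED_ORIGINS` and `corsHeadersFor` are hypothetical names, and restricting access to the Railway origin is an assumption (the deployed Worker might allow any origin).

```typescript
// Assumed allow-list: the origin that serves the Next.js static export.
const ALLOWED_ORIGINS: string[] = ["https://geodocs-production.up.railway.app"];

// Return the CORS headers to attach to a proxied response, or an empty
// object when the request's Origin header is missing or not allowed.
function corsHeadersFor(origin: string | null): Record<string, string> {
  if (origin === null || !ALLOWED_ORIGINS.includes(origin)) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": origin,
    // The extractor issues GETs and form POSTs; OPTIONS covers preflight.
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
    // Tell caches that the response varies with the request's Origin.
    "Vary": "Origin",
  };
}
```

A Worker would copy these headers onto the upstream response before returning it to the browser, and answer `OPTIONS` preflight requests with the same headers and an empty body.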
- GET `/service3/` → grab hidden form fields (`__VIEWSTATE` etc.)
- POST district selection (`__EVENTTARGET: ddl_district`)
- POST taluk selection (`__EVENTTARGET: ddl_taluk`)
- POST hobli selection (`__EVENTTARGET: ddl_hobli`)
- POST village search (`btnSearch: Search`)
- Parse `window.open('FileDownload.aspx?file=...')` from response HTML
- Return full `https://landrecords.karnataka.gov.in/service3/FileDownload.aspx?file=...` URL
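
The parsing side of the steps above can be sketched with three small pure helpers. This is an illustration under assumptions, not the code in `lib/clientPdfExtractor.ts`. The helper names are hypothetical, and the hidden-input regex assumes the usual WebForms attribute order (`type`, `name`, `id`, `value`). It also assumes each dropdown's selected value is posted under a field named after the control (for example `ddl_district`).

```typescript
const SERVICE_BASE = "https://landrecords.karnataka.gov.in/service3/";

// Collect ASP.NET hidden inputs (__VIEWSTATE and friends) from a page.
function extractHiddenFields(html: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const pattern =
    /<input[^>]*type="hidden"[^>]*name="([^"]+)"[^>]*value="([^"]*)"/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    fields[match[1]] = match[2];
  }
  return fields;
}

// Build the form-encoded body for one postback step. `selections` holds the
// dropdown values chosen so far; its field names are assumptions.
function buildPostbackBody(
  hidden: Record<string, string>,
  eventTarget: string,
  selections: Record<string, string>,
): string {
  const body = new URLSearchParams({ ...hidden, ...selections });
  body.set("__EVENTTARGET", eventTarget);
  return body.toString();
}

// Find the window.open('FileDownload.aspx?file=...') call in the search
// response and turn the relative path into a full URL.
function extractPdfUrl(html: string): string | null {
  const match = /window\.open\('(FileDownload\.aspx\?file=[^']+)'/.exec(html);
  return match ? SERVICE_BASE + match[1] : null;
}
```

Each POST response carries fresh hidden fields, so a real flow would call `extractHiddenFields` again on every response before building the next body. The final search step sends `btnSearch: Search` as a regular form field rather than through `__EVENTTARGET`.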
- Railway: `https://geodocs-production.up.railway.app` (hosts Next.js static export + Express API server)
- Cloudflare Worker: `https://geodocs-proxy.harshag954.workers.dev` (CORS proxy only, deployed via `cf-proxy/`)
- Railway env vars: `CF_PROXY_URL`, `NEXT_PUBLIC_CF_PROXY_URL` (both set)
- No `PDF_BACKEND_URL` or `PROXY_URL`: both were removed (dead ngrok, broken Bright Data)
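
As a small illustration of why both proxy variables are set, the sketch below picks the right one for each side. `resolveProxyUrl` and `ProxyEnv` are hypothetical names, and the project may read these variables differently.

```typescript
type ProxyEnv = Record<string, string | undefined>;

// Pick the proxy URL for the current runtime. Next.js only exposes
// NEXT_PUBLIC_-prefixed variables to browser bundles, so the client side
// reads NEXT_PUBLIC_CF_PROXY_URL while the Express server reads CF_PROXY_URL.
function resolveProxyUrl(env: ProxyEnv, runningInBrowser: boolean): string {
  const name = runningInBrowser ? "NEXT_PUBLIC_CF_PROXY_URL" : "CF_PROXY_URL";
  const value = env[name];
  if (!value) {
    throw new Error(`${name} is not set`);
  }
  // Strip trailing slashes so callers can append paths consistently.
  return value.replace(/\/+$/, "");
}
```

In browser code, Next.js inlines only literal `process.env.NEXT_PUBLIC_...` references at build time, so the caller would pass an explicit object such as `{ NEXT_PUBLIC_CF_PROXY_URL: process.env.NEXT_PUBLIC_CF_PROXY_URL }` rather than `process.env` itself.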