Summary
Add support for routing read requests to different backend storage providers based on the client's geographic location or credentials. This would allow a single virtual bucket (e.g., statistical) to be served from the nearest replica, reducing latency and improving download speeds for users worldwide.
Motivation
For a data portal like dataforcanada.org, datasets are accessed globally. Currently, each virtual bucket maps to exactly one backend. A client in Tokyo downloading from a Source Cooperative bucket in us-west-2 experiences high latency. With geo-routing, that same request would be served from a Tigris bucket in Tokyo or an R2 bucket in Asia-Pacific.
Proposed Behavior
A new configuration section would map (bucket, country/region) pairs to alternative backend configs. When a read request arrives, the proxy checks the client's location (available via Cloudflare's cf.country on Workers) and routes to the nearest replica. Writes would continue to go to the primary backend.
flowchart TD
JP["Client in Japan 🇯🇵"] -->|"GET /statistical/data.parquet"| Worker["CF Worker - s3.dataforcanada.org"]
CA["Client in Canada 🇨🇦"] -->|"GET /statistical/data.parquet"| Worker
DE["Client in Germany 🇩🇪"] -->|"GET /statistical/data.parquet"| Worker
AU["Client in Australia 🇦🇺"] -->|"GET /statistical/data.parquet"| Worker
PL["Client in Poland 🇵🇱"] -->|"GET /statistical/data.parquet"| Worker
Worker --> GeoMW{"Geo-Router - Middleware"}
GeoMW -->|"🇯🇵 JP → Tigris Tokyo (nrt)"| Tigris["Tigris Data - Tokyo"]
GeoMW -->|"🇨🇦 CA → Primary"| SC["Source Cooperative - us-west-2"]
GeoMW -->|"🇩🇪 DE → R2 Western Europe (weur)"| R2WEUR["Cloudflare R2 - Western Europe"]
GeoMW -->|"🇵🇱 PL → R2 Eastern Europe (eeur)"| R2EEUR["Cloudflare R2 - Eastern Europe"]
GeoMW -->|"🇦🇺 AU → R2 Oceania (oc)"| R2OC["Cloudflare R2 - Oceania"]
Example Configuration
# Primary bucket (default — North America)
[[buckets]]
name = "statistical"
backend_type = "s3"
anonymous_access = true
backend_prefix = "dataforcanada/d4c-datapkg-statistical/"

[buckets.backend_options]
bucket_name = "us-west-2.opendata.source.coop"
endpoint = "https://s3.us-west-2.amazonaws.com"
region = "us-west-2"
skip_signature = "true"

# Geo overrides for the "statistical" bucket
[geo_routing.statistical]

# Japan → Tigris Tokyo
[geo_routing.statistical.JP]
backend_type = "s3"
endpoint = "https://t3.storage.dev"
bucket_name = "d4c-datapkg-statistical"
region = "nrt"

# Asia-Pacific (fallback for other APAC countries) → R2 Asia-Pacific
[geo_routing.statistical.apac]
backend_type = "s3"
endpoint = "https://<account>.r2.cloudflarestorage.com"
bucket_name = "d4c-datapkg-statistical-apac"
region = "apac"

# Western Europe → R2 Western Europe
[geo_routing.statistical.weur]
backend_type = "s3"
endpoint = "https://<account>.r2.cloudflarestorage.com"
bucket_name = "d4c-datapkg-statistical-weur"
region = "weur"

# Eastern Europe → R2 Eastern Europe
[geo_routing.statistical.eeur]
backend_type = "s3"
endpoint = "https://<account>.r2.cloudflarestorage.com"
bucket_name = "d4c-datapkg-statistical-eeur"
region = "eeur"

# Oceania → R2 Oceania
[geo_routing.statistical.oc]
backend_type = "s3"
endpoint = "https://<account>.r2.cloudflarestorage.com"
bucket_name = "d4c-datapkg-statistical-oc"
region = "oc"
Resolution Priority
When a request arrives from a specific country, the geo-router should resolve in this order:
1. Exact country match: e.g., JP → Tigris Tokyo
2. Region match: e.g., other APAC countries → R2 Asia-Pacific
3. Primary backend: fallback to the default bucket config (Source Cooperative)
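The resolution order above could be sketched roughly as follows. This is an illustration under assumptions, not a proposed implementation: the COUNTRY_TO_REGION table, the PRIMARY sentinel, and the resolve_backend function are all hypothetical, and the proposal does not yet say how countries would be grouped into regions.

```python
from typing import Optional

# Hypothetical country-to-region table; the proposal does not define one.
COUNTRY_TO_REGION = {
    "JP": "apac", "KR": "apac", "SG": "apac",
    "DE": "weur", "FR": "weur",
    "PL": "eeur",
    "AU": "oc", "NZ": "oc",
}

# Hypothetical sentinel meaning "use the default bucket config".
PRIMARY = "primary"


def resolve_backend(country: Optional[str], overrides: dict) -> str:
    """Return the geo_routing key to use, or PRIMARY if none applies."""
    if country:
        code = country.upper()
        # 1. Exact country match.
        if code in overrides:
            return code
        # 2. Region match, only if that region has an override configured.
        region = COUNTRY_TO_REGION.get(code)
        if region is not None and region in overrides:
            return region
    # 3. Primary backend (also used when the country is unknown).
    return PRIMARY


# Keys mirror the example configuration; the values are irrelevant here.
overrides = {"JP": {}, "apac": {}, "weur": {}, "eeur": {}, "oc": {}}

for country in ("JP", "KR", "DE", "PL", "AU", "CA", None):
    print(country, "->", resolve_backend(country, overrides))
```

Returning the primary whenever the country is missing or unmapped keeps the router safe by default: a request is only redirected when an override explicitly covers it.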
Scope
Reads only (GET, HEAD, LIST) — writes should always go to the primary backend
Cloudflare Workers — the cf.country field is readily available on every request
Server runtime — could use GeoIP lookup on source_ip as a future extension
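The reads-only rule could be enforced with a small guard in front of the geo-router. The sketch below is an assumption about how that might look: READ_METHODS, is_geo_routable, and select_target are hypothetical names. In the S3 API a LIST is issued as a GET against the bucket, so checking for GET and HEAD covers all three read operations.

```python
# HTTP methods treated as reads. S3 LIST requests are GETs on the bucket,
# so they are covered by "GET".
READ_METHODS = frozenset({"GET", "HEAD"})


def is_geo_routable(method: str) -> bool:
    """True if the request is a read and may be served from a replica."""
    return method.upper() in READ_METHODS


def select_target(method: str, replica: str, primary: str) -> str:
    """Send reads to the resolved replica and everything else to the primary."""
    return replica if is_geo_routable(method) else primary


for method in ("GET", "HEAD", "PUT", "DELETE"):
    print(method, "->", select_target(method, "replica", "primary"))
```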