Motivation
This proposal is primarily motivated by the challenges discussed in The Storm Events Database Explorer (Great Data Products). As highlighted by Brad Andrich and Quinn Kiter, understanding user needs currently requires intensive manual effort, such as the 20-question surveys used to identify workflows for meteorologists and emergency managers [00:14:36]. Furthermore, there is a critical need to serve "underserved" personas—such as local government planners—who require high-value geospatial insights but lack the "army of data scientists" or SQL expertise to interrogate raw datasets [00:17:02]. By building analytics directly into each data product, Source Cooperative can automate this discovery process for publishers and lower the barrier for non-technical users.
Description of Feature
Add built-in analytics for geospatial data repositories that automatically analyze server access logs and generate insights on the most-viewed geospatial areas (e.g., by converting PMTiles range requests into Z/X/Y tile requests and their geospatial extents, or by converting GeoParquet spatial filter patterns into bounding-box extents) and on user locations (city-level, derived from client IPs via GeoIP lookups). Additionally, the User-Agent header could provide more granular information about end-user tooling (e.g., QGIS) so that Source Cooperative can prioritize work accordingly.
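To make the tile-to-extent step concrete: mapping a PMTiles byte range back to a tile ID requires decoding the archive's directory (tile IDs are Hilbert-ordered in the PMTiles spec), but once a Z/X/Y index is recovered, its geographic extent follows from standard Web Mercator (slippy-map) tiling math. A minimal sketch of that last step, using the standard formulas:

```python
import math

def tile_to_bbox(z: int, x: int, y: int) -> tuple:
    """Convert a Z/X/Y slippy-map tile index to its WGS84 bounding box
    as (min_lon, min_lat, max_lon, max_lat)."""
    n = 2 ** z  # number of tiles per axis at this zoom level

    def lon(xi):
        return xi / n * 360.0 - 180.0

    def lat(yi):
        # Inverse Web Mercator projection for a tile row boundary.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * yi / n))))

    # Tile y grows southward, so row y is the top (max) latitude
    # and row y + 1 is the bottom (min) latitude.
    return (lon(x), lat(y + 1), lon(x + 1), lat(y))
```

For example, `tile_to_bbox(0, 0, 0)` covers the full Web Mercator extent (longitudes -180 to 180, latitudes roughly -85.05 to 85.05).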
The implementation could include privacy controls that respect Do Not Track (DNT) headers by skipping enrichment and anonymizing IPs when DNT=1. Aggregated output metrics (e.g., view counts per tile extent, spatial query bounds, or city) would be sent to a queryable backend for visualization in tools like Grafana GeoMap panels.
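The DNT handling described above can be sketched in a few lines. This is an illustrative outline, not a finished design: the truncation widths (/24 for IPv4, /48 for IPv6) are common anonymization choices rather than a requirement, and `geo_lookup` is a stub standing in for a real GeoIP city lookup (e.g., a MaxMind database):

```python
import ipaddress

def geo_lookup(ip: str) -> str:
    """Stub standing in for a real GeoIP city lookup (e.g., MaxMind GeoLite2)."""
    return "Unknown"

def anonymize_ip(ip: str) -> str:
    """Zero the host portion of an address: keep a /24 prefix for IPv4
    and a /48 prefix for IPv6, discarding the rest."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(net.network_address)

def process_request(headers: dict, client_ip: str) -> dict:
    """Honor DNT=1 by skipping city enrichment and storing only a
    truncated IP; otherwise enrich normally."""
    if headers.get("DNT") == "1":
        return {"ip": anonymize_ip(client_ip), "city": None}
    return {"ip": client_ip, "city": geo_lookup(client_ip)}
```

For instance, `process_request({"DNT": "1"}, "203.0.113.45")` records only `203.0.113.0` with no city.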
Technical Implementation
The system could leverage OpenTelemetry Collector for log ingestion and Vector for parsing, enrichment, and aggregation as potential open-source components.
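Whichever collector is chosen, the core transform is small: parse each access-log line, then count requests per key of interest. A Python sketch of that parse-and-aggregate step, assuming Common Log Format-style entries and hypothetical repository paths (in practice this logic would live in a Vector or OpenTelemetry Collector transform):

```python
import re
from collections import Counter

# Common Log Format-style pattern: client IP, timestamp, method, path, status, bytes.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "GET (?P<path>\S+) HTTP/\d\.\d" \d+ \d+'
)

def parse_line(line: str):
    """Return a dict of log fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

def aggregate(lines):
    """Count requests per path; keys per tile extent or per city would be
    derived the same way after enrichment."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec:
            counts[rec["path"]] += 1
    return counts
```

Feeding this two requests for `/repo/data.pmtiles` and one for `/repo/events.parquet` yields counts of 2 and 1 respectively.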
If executed effectively, these analytics could be exported as a native GeoParquet file for each data product. This would allow the feature to be expanded to the wider community, enabling partners to aggregate usage data cross-platform to better serve users, acknowledging that Source Cooperative is one of several initiatives serving the geospatial ecosystem.
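As a sketch of what that export might contain: each aggregated bounding box becomes a polygon geometry plus a view count. The column names (`geometry`, `views`) below are illustrative assumptions, and the WKT strings stand in for the geometry encoding; rows shaped like this could then be written out with a library such as geopandas (`GeoDataFrame.to_parquet`), which handles the GeoParquet metadata:

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat) -> str:
    """Encode a bounding box as a closed WKT polygon ring
    (first corner repeated at the end to close the ring)."""
    return (
        f"POLYGON(({min_lon} {min_lat}, {max_lon} {min_lat}, "
        f"{max_lon} {max_lat}, {min_lon} {max_lat}, {min_lon} {min_lat}))"
    )

def to_export_rows(tile_counts: dict) -> list:
    """tile_counts maps (min_lon, min_lat, max_lon, max_lat) -> view count.
    Returns one row per extent, ready to hand to a GeoParquet writer."""
    return [
        {"geometry": bbox_to_wkt(*bbox), "views": n}
        for bbox, n in sorted(tile_counts.items())
    ]
```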
Value to Source Cooperative
1. Actionable Insights for Publishers
This feature transforms Source from a static data hosting utility into an intelligent platform that reveals usage patterns without requiring publishers to conduct manual user surveys [00:14:18]. Publishers gain instant feedback on:
- Geospatial Hotspots: Identifying which specific regions (e.g., specific waterways, urban zones, or weather corridors) are being accessed most frequently.
- User Geography: Understanding city-level demand from sectors like meteorology and emergency management.
- Strategic Optimization: Enabling data prioritization and regional server optimizations based on real-world demand.
2. Community & Ecosystem Growth
By providing these insights as GeoParquet files, Source Cooperative facilitates a "network effect" for open data:
- Serving the Technical Gap: As discussed in the motivator video, many users (like mid-sized city planners) need the data to "just work" [00:17:02]. Analytics help publishers tailor their data structures (like pre-calculated GeoParquet files) to these specific needs.
- Cross-Initiative Aggregation: Partners can combine Source Cooperative usage data with metrics from other platforms to form a holistic view of global geospatial data demand.
- Standardization: Using GeoParquet for analytics metadata reinforces modern standards, making it easier for users to analyze "data about the data" using tools like DuckDB or Apache Sedona.
3. Platform Differentiation & Trust
For the cooperative, this increases retention by providing high-value analytics as a core utility, differentiating Source from basic S3-like storage providers. It aligns with the mission of making planetary data accessible while fostering community trust through transparent, privacy-respecting aggregation where individual logs are not stored, only the high-level spatial trends.
Let me know if you need any further clarification on the proposed feature; I think this could be a really valuable addition.