Description
The analytics middleware records every HTTP request as a page view in the database. This means search engine crawlers (Googlebot, Bingbot, etc.) hitting endpoints like `/robots.txt`, `/sitemap.xml`, `/feed.xml`, and content pages inflate view counts with non-human traffic.
Problems
- Non-content endpoints tracked: `/robots.txt`, `/sitemap.xml`, `/feed.xml`, `/health`, `/favicon.ico`, and `/pygments.css` all generate analytics rows despite not being user-facing page views
- Bot traffic not filtered: Crawler User-Agents are counted as regular views, skewing admin dashboard metrics
- Unnecessary DB writes: Every bot hit creates a database row, adding write load with no analytical value
Possible Approaches
- Filter by `Content-Type` — only track responses with a `text/html` content type
- Filter by path — exclude known non-content paths (`/robots.txt`, `/sitemap.xml`, `/feed.xml`, `/health`, etc.)
- Filter by User-Agent — detect common bot UA strings and skip tracking
- Combination of the above
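The path and `Content-Type` filters above could be combined in a single predicate that the middleware consults before writing a row. A minimal sketch, assuming a helper of this shape exists (the names `should_track` and `EXCLUDED_PATHS` are illustrative, not taken from `main.py`):

```python
# Paths that never count as page views, per the list in this issue.
EXCLUDED_PATHS = {
    "/robots.txt", "/sitemap.xml", "/feed.xml",
    "/health", "/favicon.ico", "/pygments.css",
}

def should_track(path: str, content_type: str) -> bool:
    """Return True only for responses worth recording as page views."""
    if path in EXCLUDED_PATHS:
        return False
    # Content-Type often carries a charset suffix, e.g. "text/html; charset=utf-8",
    # so compare only the media type itself.
    return content_type.split(";")[0].strip() == "text/html"
```

With this check, a bot fetching `/feed.xml` (an XML response) or `/robots.txt` produces no DB write, while a real article page served as `text/html` is still counted.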
Implementation Notes
- Analytics middleware is in `main.py`
- Simplest first pass: only track `text/html` responses, which covers all real page views and excludes XML, JSON, CSS, and plain-text endpoints
- Bot filtering by User-Agent could be a follow-up enhancement
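If the User-Agent follow-up is picked up later, a simple substring check covers the common crawlers. A sketch under stated assumptions — the marker list below is a small illustrative sample, not an exhaustive bot registry:

```python
# Lowercase substrings found in common crawler User-Agent strings.
# Sample only; a production filter would use a maintained list.
BOT_MARKERS = (
    "googlebot", "bingbot", "duckduckbot", "yandexbot",
    "baiduspider", "crawler", "spider",
)

def is_bot(user_agent: str) -> bool:
    """Return True if the User-Agent looks like a known crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)
```

Substring matching is deliberately coarse: it trades a few false positives for never having to parse UA grammar, which is good enough for keeping dashboard metrics honest.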
— Claude