fix(stats-proxy): exact-match post pages to stop sibling-permlink overmatch #944
…rmatch The page filter used substring `contains`, so `/@alice/foo` also matched longer siblings like `/@alice/foo-2` (Hive mints prefix-sibling permlinks on reposts), inflating a post's stats. Full-permlink lookups now use an end-anchored `matches` regex (escaped path + `$`): Plausible's `matches` maps to ClickHouse multiMatchAny (unanchored), so this still catches every recorded shape (bare, community, tag) ending in the canonical /@author/permlink while excluding longer siblings. Trailing-slash URLs (profile insights `/@user/`) keep `contains` so per-user breakdowns still work.
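The difference between the two filters can be reproduced locally. The sketch below is only an approximation: it uses JavaScript's `RegExp` as a stand-in for ClickHouse `multiMatchAny` (both search unanchored, but the engines are not identical), and `escapeRegExp` and the sample paths are hypothetical, not taken from the PR diff.

```typescript
// Hypothetical helper: escape regex metacharacters so the path matches literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Illustrative recorded pathnames: three shapes of one post plus two longer siblings.
const recorded: string[] = [
  "/@alice/foo",
  "/hive-123/@alice/foo",
  "/tag/@alice/foo",
  "/@alice/foo-2",
  "/@alice/foobar",
];

const page = "/@alice/foo";

// Old behavior: a substring match also picks up the longer siblings.
const bySubstring = recorded.filter((p) => p.includes(page));

// New behavior: unanchored search for the escaped path followed by `$`,
// so anything may precede the path but nothing may follow it.
const anchored = new RegExp(escapeRegExp(page) + "$");
const byAnchoredRegex = recorded.filter((p) => anchored.test(p));

console.log(bySubstring.length); // 5: every sample path contains "/@alice/foo"
console.log(byAnchoredRegex); // the bare, community, and tag shapes only
```

Leaving the start unanchored is what lets the community and tag prefixes through; only the end needs pinning to exclude `-2` style siblings.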
Greptile Summary
This PR fixes a stats overmatch bug where a post's page filter (
Confidence Score: 5/5
Safe to merge: the change is tightly scoped to the filter-building logic and correctly handles all documented URL shapes without regressing profile or post queries. The fix is well-reasoned and validated against the exact ClickHouse multiMatchAny behavior described in the PR. The trailing-slash contains path preserves existing profile-insights behavior; the matches path correctly end-anchors post permlinks. The escapeRegExp implementation covers all RE2/PCRE metacharacters, and the query/fragment strip prevents silent empty-result regressions. No error-handling regressions; existing timeout/502 guards are untouched. No files require special attention.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[POST /api/stats] --> B["decode + strip query/fragment → page"]
B --> C{ends with slash?}
C -- Yes --> D["contains filter\ne.g. /@user/ pulls all user pages"]
C -- No --> E["escapeRegExp + dollar sign anchor\ne.g. /@alice/foo becomes regex"]
E --> F["multiMatchAny in Plausible\nMatches bare, community, tag paths\nExcludes /@alice/foo-2 and siblings"]
D --> G[Plausible API query]
F --> G
G --> H[Return stats to client]
Reviews (2): Last reviewed commit: "fix(stats-proxy): strip query string/fra..."
Plausible stores the pathname only, so a path carrying a query string or fragment (e.g. a comment permalink's #@author/permlink) would never match. Strip it before building the filter.
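A minimal sketch of that strip step, assuming the proxy receives the page as a plain string. The function name `stripQueryAndFragment` is hypothetical, and URL decoding (which the flowchart mentions) is left out here.

```typescript
// Hypothetical helper: keep only the pathname part of a client-supplied page value.
function stripQueryAndFragment(page: string): string {
  // Index of the first "?" or "#", whichever comes first; -1 if neither is present.
  const cut = page.search(/[?#]/);
  return cut === -1 ? page : page.slice(0, cut);
}

// A comment permalink carries the reply in the fragment; only the post path is kept.
console.log(stripQueryAndFragment("/@alice/foo#@bob/re-foo")); // "/@alice/foo"
console.log(stripQueryAndFragment("/@alice/foo?ref=feed")); // "/@alice/foo"
console.log(stripQueryAndFragment("/@alice/")); // "/@alice/" (unchanged)
```

Stripping before the filter is built matters for the end-anchored case in particular: a trailing `?ref=feed` would otherwise end up inside the escaped pattern and never match a stored pathname.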
Problem
The shared `/api/stats` proxy filtered pages with a substring `contains`, so a post's stats query (`/@alice/foo`) also matched longer sibling permlinks like `/@alice/foo-2`. Hive mints prefix-sibling permlinks (reposting the same title appends `-2`, `-3`, …), so affected posts showed inflated view/device counts. Flagged by greptile/codex/coderabbit on the stats PRs (#943, ecency-mobile#3251).

Fix
Switch full-permlink lookups from `contains` to an end-anchored `matches` regex:
- `matches` maps to ClickHouse `multiMatchAny` (unanchored), so an escaped path + `$` matches every recorded page shape that ends in the canonical `/@author/permlink` (bare, community `/hive-123/@…`, tag `/tag/@…`) while excluding longer siblings (`/@a/p` no longer matches `/@a/p-2` or `/@a/foobar`).
- The path is regex-escaped, so `peak.snaps` matches literally (`.` can't match an arbitrary char).
- Trailing-slash URLs keep `contains`: profile insights sends `/@user/` to pull all pages under a user, which must stay a prefix match.

Validated the exact `multiMatchAny(pathname, ['…$'])` behavior for each path shape (bare/community/tag matched; `-2`/`foobar` siblings excluded; escaped-dot precision) before shipping.

Fixes the overmatch for web (entry-stats) and mobile at once, since both go through this route.
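The operator choice described above could look roughly like the following. This is a sketch, not the route's actual code: `buildPageFilter` and `escapeRegExp` are hypothetical names, and the `[operator, dimension, values]` tuple assumes a Plausible Stats API v2 style filter, which the PR text does not confirm.

```typescript
// Assumed filter shape: [operator, dimension, values].
type PageFilter = ["contains" | "matches", "event:page", string[]];

// Hypothetical helper: escape regex metacharacters so the path matches literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder: pick the operator from the shape of the requested page.
function buildPageFilter(page: string): PageFilter {
  if (page.endsWith("/")) {
    // Profile insights: "/@user/" must keep matching every page under that user.
    return ["contains", "event:page", [page]];
  }
  // Full permlink: escaped path plus "$" so longer siblings are excluded.
  return ["matches", "event:page", [escapeRegExp(page) + "$"]];
}

console.log(buildPageFilter("/@user/"));
console.log(buildPageFilter("/@alice/peak.snaps"));
```

Keying the decision on the trailing slash keeps the change local to filter construction, so callers on web and mobile would not need to send anything new.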