
fix(stats-proxy): exact-match post pages to stop sibling-permlink overmatch #944

Merged
feruzm merged 2 commits into develop from bugfix/plausible-stats-exact-match
Jun 11, 2026


@feruzm (Member) commented Jun 11, 2026

Problem

The shared /api/stats proxy filtered pages with a substring contains, so a post's stats query (/@alice/foo) also matched longer sibling permlinks like /@alice/foo-2. Hive mints prefix-sibling permlinks (reposting the same title appends -2, -3, …), so affected posts showed inflated view/device counts. Flagged by greptile/codex/coderabbit on the stats PRs (#943, ecency-mobile#3251).
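The overmatch can be reproduced with a plain substring check. The snippet below is only an illustration: the recorded pathnames are hypothetical, and `String.prototype.includes` stands in for the `contains` filter rather than calling the real stats backend.

```typescript
// Hypothetical pathnames as they might be recorded for one author.
const recorded: string[] = [
  "/@alice/foo",          // the post we want stats for
  "/hive-123/@alice/foo", // same post, community URL shape
  "/@alice/foo-2",        // sibling permlink minted on a repost
  "/@alice/foobar",       // unrelated post sharing the prefix
];

// Stand-in for a substring `contains` filter.
const containsMatch = (pathname: string, needle: string): boolean =>
  pathname.includes(needle);

// Every entry contains "/@alice/foo", so the siblings are counted too.
const overmatched = recorded.filter((p) => containsMatch(p, "/@alice/foo"));
console.log(overmatched.length); // 4
```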

Fix

Switch full-permlink lookups from contains to an end-anchored matches regex:

  • matches maps to ClickHouse multiMatchAny (unanchored), so an escaped path + $ matches every recorded page shape that ends in the canonical /@author/permlink — bare, community (/hive-123/@…), tag (/tag/@…) — while excluding longer siblings (/@a/p no longer matches /@a/p-2 or /@a/foobar).
  • The path is regex-escaped so an author like peak.snaps matches literally (the . cannot match an arbitrary character).
  • Trailing-slash URLs keep contains — profile insights sends /@user/ to pull all pages under a user, which must stay a prefix match.
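The branching described in these bullets can be sketched roughly as follows. This is not the code from route.ts: `buildPageFilter` and the `PageFilter` tuple shape (operator, dimension, clauses) are assumptions made for illustration, and `escapeRegExp` here is the common metacharacter-escaping idiom, which may differ from the helper in the PR.

```typescript
// Assumed filter shape: [operator, dimension, clauses]. Hypothetical type.
type PageFilter = ["contains" | "matches", "event:page", string[]];

// Escape regex metacharacters so the path is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: choose the operator from the shape of the path.
function buildPageFilter(page: string): PageFilter {
  if (page.endsWith("/")) {
    // Profile insights send "/@user/" and need every page under the user.
    return ["contains", "event:page", [page]];
  }
  // Full permlink: escaped path plus "$" so longer siblings are excluded.
  return ["matches", "event:page", [escapeRegExp(page) + "$"]];
}

console.log(buildPageFilter("/@alice/"));
console.log(buildPageFilter("/@peak.snaps/foo"));
```

Keying the branch on the trailing slash keeps the existing caller contract: callers that already send `/@user/` need no change.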

Validated the exact multiMatchAny(pathname, ['…$']) behavior for each path shape (bare/community/tag matched; -2/foobar siblings excluded; escaped-dot precision) before shipping.
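A local approximation of that validation is sketched below. It uses JavaScript's `RegExp` as a stand-in for an unanchored regex search, which is an assumption: it is not the ClickHouse engine, so it only illustrates the intended matching rules. The sample paths and the `unanchoredMatch` helper are hypothetical.

```typescript
// Stand-in for an unanchored regex search; NOT the ClickHouse engine.
function unanchoredMatch(pathname: string, pattern: string): boolean {
  return new RegExp(pattern).test(pathname);
}

const postPattern = "/@alice/foo$";           // escaped path + "$"
const dottedPattern = "/@peak\\.snaps/post$"; // "." escaped to match literally

// [pathname, pattern, expected result]
const checks: Array<[string, string, boolean]> = [
  ["/@alice/foo", postPattern, true],            // bare
  ["/hive-123/@alice/foo", postPattern, true],   // community
  ["/tag/@alice/foo", postPattern, true],        // tag
  ["/@alice/foo-2", postPattern, false],         // sibling excluded
  ["/@alice/foobar", postPattern, false],        // longer permlink excluded
  ["/@peak.snaps/post", dottedPattern, true],    // literal dot matches
  ["/@peakXsnaps/post", dottedPattern, false],   // dot is not a wildcard
];

// Collect any case whose result differs from the expectation.
const failures = checks.filter(
  ([path, pattern, expected]) => unanchoredMatch(path, pattern) !== expected,
);
console.log(failures.length === 0 ? "all cases behave as described" : failures);
```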

Fixes the overmatch for web (entry-stats) and mobile at once, since both go through this route.

…rmatch

The page filter used substring `contains`, so `/@alice/foo` also matched longer
siblings like `/@alice/foo-2` (Hive mints prefix-sibling permlinks on reposts),
inflating a post's stats. Full-permlink lookups now use an end-anchored `matches`
regex (escaped path + `$`): Plausible's `matches` maps to ClickHouse multiMatchAny
(unanchored), so this still catches every recorded shape (bare, community, tag)
ending in the canonical /@author/permlink while excluding longer siblings.
Trailing-slash URLs (profile insights `/@user/`) keep `contains` so per-user
breakdowns still work.
@coderabbitai (Bot, Contributor) commented Jun 11, 2026

Warning

Review limit reached

@feruzm, we couldn't start this review because you've reached your PR review rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6798976a-cd3c-4506-b2cf-bea9543df8e2

📥 Commits

Reviewing files that changed from the base of the PR and between 8b6cce6 and e2620b1.

📒 Files selected for processing (1)
  • apps/web/src/app/api/stats/route.ts

@greptile-apps (Bot) commented Jun 11, 2026

Greptile Summary

This PR fixes a stats overmatch bug where a post's page filter (/@alice/foo) incorrectly included sibling permlinks (/@alice/foo-2) because Plausible's contains operator is a substring match. The fix switches full-permlink lookups to an end-anchored matches regex while keeping contains for trailing-slash profile queries; it also strips query strings and fragments from the path before building the filter.

  • Overmatch fix: Non-trailing-slash paths now use escapeRegExp(page) + "$", so the pattern that Plausible's matches operator passes to ClickHouse multiMatchAny is anchored at the end, matching bare/community/tag URL shapes that end in /@author/permlink while excluding longer siblings.
  • Regex safety: escapeRegExp ensures special characters in author names (e.g. peak.snaps) are matched literally.
  • Query/fragment stripping: split(/[?#]/)[0] removes any query or fragment from the decoded path before it is used in either filter branch, preventing empty results for paths that carry extra parameters.
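The query/fragment stripping can be pictured with a small helper. This is a sketch under assumptions: `normalizePage` is a hypothetical name, and `decodeURIComponent` is assumed as the decoding step, since the summary only says the path is decoded before the split.

```typescript
// Hypothetical helper: decode the incoming value, then keep only the pathname.
function normalizePage(raw: string): string {
  const decoded = decodeURIComponent(raw); // assumed decoding step
  // Everything from the first "?" or "#" onward is dropped.
  const [pathname = ""] = decoded.split(/[?#]/);
  return pathname;
}

console.log(normalizePage("/@alice/foo?ref=feed"));     // "/@alice/foo"
console.log(normalizePage("/@alice/foo#@bob/re-foo"));  // "/@alice/foo"
console.log(normalizePage("%2F%40alice%2Ffoo%3Fa%3D1")); // "/@alice/foo"
```

Stripping after decoding matters in this sketch because an encoded `?` (`%3F`) only becomes visible to the split once the value has been decoded.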

Confidence Score: 5/5

Safe to merge — the change is tightly scoped to the filter-building logic and correctly handles all documented URL shapes without regressing profile or post queries.

The fix is well-reasoned and validated against the exact ClickHouse multiMatchAny behavior described in the PR. The trailing-slash contains path preserves existing profile-insights behavior; the matches path correctly end-anchors post permlinks. The escapeRegExp implementation covers all RE2/PCRE metacharacters, and the query/fragment strip prevents silent empty-result regressions. No error-handling regressions; existing timeout/502 guards are untouched.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/web/src/app/api/stats/route.ts | Adds escapeRegExp helper and switches non-trailing-slash page paths from substring contains to end-anchored matches regex, fixing sibling-permlink overmatch; also strips query/fragment before building the filter. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[POST /api/stats] --> B["decode + strip query/fragment → page"]
    B --> C{ends with slash?}
    C -- Yes --> D["contains filter\ne.g. /@user/ pulls all user pages"]
    C -- No --> E["escapeRegExp + dollar sign anchor\ne.g. /@alice/foo becomes regex"]
    E --> F["multiMatchAny in Plausible\nMatches bare, community, tag paths\nExcludes /@alice/foo-2 and siblings"]
    D --> G[Plausible API query]
    F --> G
    G --> H[Return stats to client]

Reviews (2): Last reviewed commit: "fix(stats-proxy): strip query string/fra..."

Comment thread apps/web/src/app/api/stats/route.ts
Plausible stores the pathname only, so a path carrying a query string or fragment
(e.g. a comment permalink's #@author/permlink) would never match. Strip it before
building the filter.
@feruzm merged commit e6c12f9 into develop on Jun 11, 2026
5 checks passed
@feruzm deleted the bugfix/plausible-stats-exact-match branch on June 11, 2026 20:21