Open
Labels: bug (Something isn't working)
Description
Userscript
Problem:
The time-to-read userscript currently activates on many websites that don't contain article content or long-form text. This causes unnecessary script execution and occasionally displays reading time indicators on inappropriate pages.
Potential Causes:
After looking through the code, I've found several factors contributing to this issue:
- Overly Broad URL Matching:
  - Using `@match *://*/*` in the userscript metadata means the script loads on every website
  - While the script does have exclusion logic, it only blocks specific URL patterns rather than identifying article sites
- Generic Content Selectors:
  - Fallback selectors like `main`, `.content`, and generic `article` tags match elements on many non-article websites
  - The "catch-all" pattern `{ domain: "*", contentSelector: "main" }` is particularly problematic as nearly all websites have a `<main>` element
- Liberal Text Block Detection:
  - The `findLargestTextBlock()` function selects any block with sufficient text density
  - Many sites with long navigation menus, documentation, or code examples can trigger this detection
- Minimal Word Count Threshold:
  - The current `MIN_WORD_COUNT` of 100 is too low to reliably identify article content
  - Many non-article pages (product listings, documentation indexes) can exceed this threshold
Steps to reproduce:
- Visit sites like GitHub repository pages, documentation sites, or e-commerce product listings
- Observe that the reading time indicator appears despite no article content being present
- Specific examples:
  - Shopping cart pages on e-commerce sites
  - GitHub repository main pages
  - API documentation home pages
  - Social media feed pages
Proposed Solutions:
- Improved Site Detection:
  - Implement a more robust content detection algorithm
  - Consider checking meta tags (e.g., `<meta property="og:type" content="article">`)
  - Look for article schema markup (`itemtype="http://schema.org/Article"`)
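A rough sketch of the meta tag and schema markup check described above. `hasArticleMarkup` is a hypothetical helper, not something in the current script, and accepting `NewsArticle` and `BlogPosting` alongside `Article` is my assumption about which schema.org types should count.

```javascript
// Hypothetical helper (not part of the current script).
// Returns true when the page declares itself an article through
// Open Graph metadata or schema.org microdata.
const ARTICLE_ITEMTYPE = /^https?:\/\/schema\.org\/(Article|NewsArticle|BlogPosting)$/i;

function hasArticleMarkup(doc) {
  // Open Graph: <meta property="og:type" content="article">
  const ogType = doc.querySelector('meta[property="og:type"]');
  if (ogType) {
    const content = (ogType.getAttribute('content') || '').trim().toLowerCase();
    if (content === 'article') {
      return true;
    }
  }

  // Microdata: itemtype="http://schema.org/Article".
  // An itemtype attribute may hold several space-separated URLs.
  for (const el of doc.querySelectorAll('[itemtype]')) {
    const types = (el.getAttribute('itemtype') || '').trim().split(/\s+/);
    if (types.some((t) => ARTICLE_ITEMTYPE.test(t))) {
      return true;
    }
  }
  return false;
}
```

In the userscript this could be called as `hasArticleMarkup(document)` before any selector matching runs. Pages without either marker would still need the fallback heuristics, since many blogs ship neither.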
- More Specific URL Matching:
  - Limit script execution to known content sites (news, blogs)
  - Add more excluded URL patterns for common non-article paths
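To illustrate the URL matching idea, here is a minimal sketch with two parts: a narrowed metadata header and a runtime path check. The domains, the excluded paths, and the names `EXCLUDED_PATH_PATTERNS` and `isExcludedPath` are all placeholders I made up for illustration; the real lists would have to come from the sites people actually use the script on.

```javascript
// ==UserScript==
// (other metadata keys unchanged)
// Illustrative only: replace the catch-all @match with known content sites.
// @match    *://*.medium.com/*
// @match    *://*.substack.com/*
// Illustrative only: skip sites that are known not to host articles.
// @exclude  *://github.com/*
// ==/UserScript==

// Hypothetical runtime guard for common non-article paths.
// Each pattern matches a whole first path segment, so "/cart" and
// "/cart/items" are excluded but "/cartoons/..." is not.
const EXCLUDED_PATH_PATTERNS = [
  /^\/cart(\/|$)/i,
  /^\/checkout(\/|$)/i,
  /^\/login(\/|$)/i,
  /^\/search(\/|$)/i,
  /^\/settings(\/|$)/i,
];

function isExcludedPath(pathname) {
  return EXCLUDED_PATH_PATTERNS.some((re) => re.test(pathname));
}
```

The script could then bail out early with something like `if (isExcludedPath(location.pathname)) return;`. Keeping the path list in code rather than in `@exclude` lines makes it easier to test, at the cost of the script still being injected on those pages.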
- Better Content Heuristics:
  - Assess text-to-HTML ratio in candidate content blocks
  - Check for article-specific patterns (date published, author byline, etc.)
  - Increase `MIN_WORD_COUNT` to a more selective threshold (e.g., 250-300)
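The word count and text-to-HTML ratio checks could be combined roughly as follows. This is a sketch under assumptions: `MIN_TEXT_TO_HTML_RATIO` and its value of 0.25 are guesses that would need tuning against real pages, and `passesContentHeuristics` is a hypothetical name. The byline and publish-date checks are not covered here.

```javascript
// Raised from 100, per the 250-300 range suggested in this issue.
const MIN_WORD_COUNT = 250;
// Assumed value, not measured: share of the block's markup that is visible text.
const MIN_TEXT_TO_HTML_RATIO = 0.25;

// Counts whitespace-separated words; an empty or blank string counts as 0.
function countWords(text) {
  const words = text.trim().split(/\s+/);
  return words[0] === '' ? 0 : words.length;
}

// Ratio of visible text length to markup length for one candidate block.
function textToHtmlRatio(text, html) {
  return html.length === 0 ? 0 : text.length / html.length;
}

// A block qualifies only if it is both long enough and mostly prose.
function passesContentHeuristics(text, html) {
  return (
    countWords(text) >= MIN_WORD_COUNT &&
    textToHtmlRatio(text, html) >= MIN_TEXT_TO_HTML_RATIO
  );
}
```

For a candidate element `el`, the call would be `passesContentHeuristics(el.textContent, el.innerHTML)`. Link-heavy blocks such as navigation menus tend to have a low ratio because every word is wrapped in markup, which is the case the 100-word threshold alone lets through.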
- User Configuration:
  - Add a whitelist feature where users can specify which sites should run the script
  - Implement a "training mode" where users can manually select article blocks
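For the whitelist idea, the matching logic itself is small. A minimal sketch, where `isHostAllowed` is a hypothetical helper and the list is passed in as a plain array of domains so the function stays independent of how the list is stored:

```javascript
// Hypothetical helper (not part of the current script).
// Returns true when hostname equals a listed domain or is a subdomain of it.
function isHostAllowed(hostname, allowlist) {
  const host = hostname.toLowerCase();
  return allowlist.some((entry) => {
    const domain = entry.trim().toLowerCase();
    // The leading dot prevents "notexample.com" from matching "example.com".
    return domain !== '' && (host === domain || host.endsWith('.' + domain));
  });
}
```

Usage would look like `isHostAllowed(location.hostname, userSites)`. Where `userSites` comes from is an open question; one option is the script manager's storage (for example `GM_getValue` with the matching `@grant`), assuming the managers we target support it. The training mode would need a separate design.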