[BUG]: time-to-read userscript activates on non-article websites #13

## Userscript

- [time-to-read.js](https://github.com/bittricky/userscripts/blob/main/time-to-read/time-to-read.js)

## Problem:

The `time-to-read` userscript currently activates on many websites that don't contain article content or long-form text. This causes unnecessary script execution and occasionally displays reading time indicators on inappropriate pages.

## Potential Causes:

After looking through the code, I've found several factors contributing to this issue:

1. **Overly Broad URL Matching**: 
   - Using `@match *://*/*` in the userscript metadata means the script loads on every website
   - While the script does have exclusion logic, it only blocks specific URL patterns rather than identifying article sites

2. **Generic Content Selectors**:
   - Fallback selectors like `main`, `.content`, and generic `article` tags match elements on many non-article websites
   - The catch-all rule `{ domain: "*", contentSelector: "main" }` is particularly problematic: it applies to every host, and many non-article sites wrap their page body in a `<main>` element

3. **Liberal Text Block Detection**:
   - The `findLargestTextBlock()` function selects any block with sufficient text density
   - Many sites with long navigation menus, documentation, or code examples can trigger this detection

4. **Minimal Word Count Threshold**:
   - The current `MIN_WORD_COUNT` of 100 is too low to reliably identify article content
   - Many non-article pages (product listings, documentation indexes) can exceed this threshold
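
To illustrate how causes 2 and 4 combine, here is a minimal sketch. It is not the script's actual code: only the `"*"` rule and the threshold of 100 are taken from this issue, while `pickRule`, `countWords`, `SITE_RULES`, and the `example-blog.com` entry are hypothetical names used for illustration.

```javascript
// Sketch under assumptions: only the "*" rule and the threshold of 100 come
// from the issue text; the other rule and both helper names are hypothetical.
const MIN_WORD_COUNT = 100;

const SITE_RULES = [
  { domain: "example-blog.com", contentSelector: "article" }, // hypothetical entry
  { domain: "*", contentSelector: "main" },                   // catch-all from the issue
];

// Return the first rule whose domain matches the hostname.
// Because of the "*" entry, every hostname ends up with a rule.
function pickRule(hostname, rules = SITE_RULES) {
  return rules.find(
    (rule) =>
      rule.domain === "*" ||
      hostname === rule.domain ||
      hostname.endsWith("." + rule.domain)
  );
}

// Count whitespace-separated words in a string.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// A product listing: 40 short card titles, no article prose at all.
const listingText = Array.from(
  { length: 40 },
  (_, i) => `Wireless Headphones Model ${i + 1}`
).join("\n");

const rule = pickRule("shop.example.com");
const wouldActivate =
  rule !== undefined && countWords(listingText) >= MIN_WORD_COUNT;
```

In this sketch the shop hostname falls through to the catch-all rule, and the 40 four-word card titles add up to 160 words, so `wouldActivate` is `true` even though the page contains no prose.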

## Steps to reproduce:

1. Visit sites like GitHub repository pages, documentation sites, or e-commerce product listings
2. Observe that the reading time indicator appears despite no article content being present
3. Specific examples:
   - Shopping cart pages on e-commerce sites
   - GitHub repository main pages
   - API documentation home pages
   - Social media feed pages

## Proposed Solutions:

1. **Improved Site Detection**:
   - Implement a more robust content detection algorithm
   - Consider checking meta tags (e.g., `<meta property="og:type" content="article">`)
   - Look for article schema markup (`itemtype="http://schema.org/Article"`)

2. **More Specific URL Matching**:
   - Limit script execution to known content sites (news, blogs)
   - Add more excluded URL patterns for common non-article paths

3. **Better Content Heuristics**:
   - Assess text-to-HTML ratio in candidate content blocks
   - Check for article-specific patterns (date published, author byline, etc.)
   - Increase `MIN_WORD_COUNT` to a more selective threshold (e.g., 250-300)

4. **User Configuration**:
   - Add a whitelist feature where users can specify which sites should run the script
   - Implement a "training mode" where users can manually select article blocks
