reddit-ner

Extract named entities from Reddit posts using GLiNER (zero-shot NER) and match them against the Qloo API.

Features

  • Fetches posts from any public subreddit (no API credentials required)
  • Extracts entities using GLiNER zero-shot NER (no fine-tuning needed)
  • Entity types aligned with Qloo's taxonomy: artist, brand, movie, place, etc.
  • Deduplicates and aggregates entities across posts
  • Optional Qloo API lookup to match entities to Qloo's database
  • Outputs results as a formatted table or as JSON

Installation

Requires uv and Python 3.11+.

git clone https://github.com/Qloo/reddit-ner
cd reddit-ner
uv sync

Usage

# Basic usage - analyze r/technology
uv run reddit-ner technology

# Specify number of posts
uv run reddit-ner music -n 20

# JSON output
uv run reddit-ner gaming -f json

# Different sort order
uv run reddit-ner news -s new

# Lower threshold for more entities
uv run reddit-ner movies -t 0.3

# Look up entities in Qloo API
uv run reddit-ner technology --qloo

Options

reddit-ner <subreddit> [OPTIONS]

Arguments:
  subreddit             Subreddit to analyze (without r/ prefix)

Options:
  -n, --limit           Number of posts to fetch (default: 10)
  -f, --format          Output format: table or json (default: table)
  -t, --threshold       Confidence threshold 0-1 (default: 0.5)
  -s, --sort            Sort order: hot, new, top, rising (default: hot)
  --qloo                Look up discovered entities in Qloo API
  -h, --help            Show help
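
The options above map naturally onto a standard argument parser. The following is a minimal sketch using Python's `argparse`, not the project's actual CLI code; the `build_parser` name is hypothetical, and the real tool may be built on a different CLI library. The flags, choices, and defaults are taken from the listing above (`-h, --help` is added by `argparse` automatically).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented reddit-ner options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="reddit-ner",
        description="Extract named entities from Reddit posts.",
    )
    parser.add_argument("subreddit", help="Subreddit to analyze (without r/ prefix)")
    parser.add_argument("-n", "--limit", type=int, default=10,
                        help="Number of posts to fetch")
    parser.add_argument("-f", "--format", choices=["table", "json"], default="table",
                        help="Output format")
    parser.add_argument("-t", "--threshold", type=float, default=0.5,
                        help="Confidence threshold between 0 and 1")
    parser.add_argument("-s", "--sort", choices=["hot", "new", "top", "rising"],
                        default="hot", help="Sort order")
    parser.add_argument("--qloo", action="store_true",
                        help="Look up discovered entities in the Qloo API")
    return parser


# Equivalent to: reddit-ner music -n 20 -t 0.3
args = build_parser().parse_args(["music", "-n", "20", "-t", "0.3"])
print(args.subreddit, args.limit, args.threshold, args.format, args.sort, args.qloo)
```

Restricting `--format` and `--sort` with `choices` means an unsupported value is rejected at parse time, before any posts are fetched.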

Example Output

$ uv run reddit-ner technology -n 5

Subreddit: r/technology
Posts analyzed: 5

          Entities Found
┏━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┓
┃ TYPE    ┃ ENTITY       ┃ COUNT ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━┩
│ brand   │ TikTok       │     3 │
│ person  │ Elon Musk    │     2 │
│ place   │ California   │     1 │
└─────────┴──────────────┴───────┘

Total entities: 6

Qloo API Integration

With the --qloo flag, entities are matched against Qloo's database:

$ uv run reddit-ner technology --qloo

...

              Qloo API Matches
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━┓
┃ STATUS ┃ QUERY      ┃ TYPE   ┃ QLOO MATCH   ┃ ID          ┃  POP ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━┩
│   ✓    │ TikTok     │ brand  │ TikTok       │ 98D268D6... │ 100% │
│   ✓    │ Elon Musk  │ person │ Elon Musk    │ A1B2C3D4... │  99% │
│   –    │ California │ place  │ No match     │             │      │
└────────┴────────────┴────────┴──────────────┴─────────────┴──────┘

Qloo matches: 2/3

To use the Qloo API, set your API key in .env:

cp .env.example .env
# Edit .env and add your QLOO_API_KEY

Entity Types

Entity types are aligned with Qloo's taxonomy:

  • artist - Taylor Swift, Kendrick Lamar
  • book - The Great Gatsby, Dune
  • brand - Nike, Apple, Samsung
  • destination - Paris, Tokyo, New York
  • movie - Inception, The Godfather
  • person - Elon Musk, Oprah
  • place - Chipotle, Marriott, Central Park
  • podcast - Serial, Joe Rogan Experience
  • tv show - Breaking Bad, The Office
  • video game - Minecraft, Zelda
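
Because GLiNER is zero-shot, the types above can be passed to the model directly as label strings. The sketch below shows roughly what that looks like with the `gliner` package; the `load_model` and `extract_entities` helper names and the `urchade/gliner_medium-v2.1` checkpoint are assumptions for illustration, and the project may use a different checkpoint or wrapper.

```python
# Labels copied from the entity type list above.
QLOO_LABELS = [
    "artist", "book", "brand", "destination", "movie",
    "person", "place", "podcast", "tv show", "video game",
]


def load_model(model_name: str = "urchade/gliner_medium-v2.1"):
    """Load a GLiNER checkpoint (the default name here is an assumption)."""
    # Imported lazily so the label list is usable without gliner installed.
    from gliner import GLiNER

    return GLiNER.from_pretrained(model_name)


def extract_entities(model, text: str, threshold: float = 0.5) -> list[dict]:
    """Run zero-shot NER over text using the Qloo-aligned labels."""
    # Each prediction is a dict with keys such as "text", "label" and "score".
    return model.predict_entities(text, QLOO_LABELS, threshold=threshold)
```

Typical use would be `extract_entities(load_model(), "Nike signed Taylor Swift", threshold=0.3)`; loading the model once and reusing it across posts avoids repeating the expensive checkpoint load.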

Development

# Install with dev dependencies
uv sync --extra dev

# Run tests
uv run pytest tests/ -v

# Run a specific test
uv run pytest tests/test_ner_engine.py -v

How It Works

  1. Fetches posts from Reddit's public JSON API (reddit.com/r/{sub}.json)
  2. Cleans Reddit markdown formatting
  3. Runs GLiNER zero-shot NER to extract entities
  4. Aggregates and deduplicates across posts
  5. Optionally looks up entities in the Qloo API
  6. Displays results
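
Steps 1, 2, and 4 above can be sketched with the standard library alone. This is an illustrative outline, not the project's implementation: the function names, the `User-Agent` string, the `/{sort}.json?limit=N` URL form, the markdown-cleaning rules, and the choice to deduplicate case-insensitively while counting every mention are all assumptions.

```python
import json
import re
import urllib.request
from collections import Counter


def listing_url(subreddit: str, sort: str = "hot", limit: int = 10) -> str:
    """Build the public JSON listing URL for a subreddit."""
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?limit={limit}"


def fetch_posts(subreddit: str, sort: str = "hot", limit: int = 10) -> list[str]:
    """Step 1: fetch post titles and bodies without API credentials."""
    request = urllib.request.Request(
        listing_url(subreddit, sort, limit),
        headers={"User-Agent": "reddit-ner-sketch/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    posts = []
    for child in payload["data"]["children"]:
        data = child["data"]
        posts.append(f"{data.get('title', '')}\n{data.get('selftext', '')}".strip())
    return posts


def clean_markdown(text: str) -> str:
    """Step 2: strip common Reddit markdown so the NER model sees plain text."""
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)  # [label](url) -> label
    text = re.sub(r"[*~`]+", "", text)                     # emphasis and code markers
    text = re.sub(r"(?m)^\s*[>#]+\s*", "", text)           # quote and heading prefixes
    return re.sub(r"\s+", " ", text).strip()


def aggregate(entities_per_post: list[list[dict]]) -> list[tuple[str, str, int]]:
    """Step 4: merge entities across posts into (type, entity, count) rows."""
    counts: Counter = Counter()
    display: dict[tuple[str, str], str] = {}
    for entities in entities_per_post:
        for entity in entities:
            # Case-insensitive key, so "TikTok" and "tiktok" count as one entity.
            key = (entity["label"], entity["text"].casefold())
            counts[key] += 1
            display.setdefault(key, entity["text"])  # keep the first spelling seen
    return [(label, display[(label, norm)], n)
            for (label, norm), n in counts.most_common()]
```

The `aggregate` function expects one list of entity dicts per post, each with `"label"` and `"text"` keys, which matches the shape GLiNER predictions take.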

License

MIT
