[repo-monitor] Medium: Interest categories systematically mislabeled — off-by-one in tab sequence #4

@Liohtml

Summary

In extract_interests, the JS scraper runs before the correct tab is activated. Each iteration therefore scrapes the tab activated by the previous iteration, and every interest record is labeled with a category that does not match the tab it came from.

Location

  • File: src/scrapers/person.rs
  • Line(s): 434–498

Severity

Medium

Details

The loop iterates over ["companies", "groups", "schools", ...]. In each iteration it (1) runs the JS scraper on the currently active tab, then (2) clicks the tab for the current category, which only takes effect for the next iteration's scrape. As a result, items from the "companies" tab are labeled "groups", items from the "groups" tab are labeled "schools", and so on. The first iteration scrapes whichever tab happens to be active before any click, so the source of its records is undefined.
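The mislabeling can be reproduced without a browser. The sketch below is not the code from src/scrapers/person.rs; `FakePage`, `click_tab`, `scrape`, and `extract_buggy` are hypothetical names that only model the scrape-then-click order described above.

```rust
/// Hypothetical stand-in for the browser page: it only tracks the active tab.
struct FakePage {
    active_tab: &'static str,
}

impl FakePage {
    /// Simulates clicking a category tab.
    fn click_tab(&mut self, category: &'static str) {
        self.active_tab = category;
    }

    /// Simulates the JS scraper: returns one item named after the tab
    /// it was actually read from.
    fn scrape(&self) -> String {
        format!("item-from-{}", self.active_tab)
    }
}

/// Models the reported order: scrape first, click afterwards.
/// Returns (label, item) pairs.
fn extract_buggy(
    page: &mut FakePage,
    categories: &[&'static str],
) -> Vec<(&'static str, String)> {
    let mut out = Vec::new();
    for &category in categories {
        // Scrapes whatever tab is active right now (set by the previous pass).
        out.push((category, page.scrape()));
        // The click only affects the scrape in the next pass.
        page.click_tab(category);
    }
    out
}
```

Starting from a page whose active tab is "default" and iterating ["companies", "groups", "schools"], this model pairs the label "groups" with the item read from the "companies" tab, and the label "companies" with the item read from the default tab.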

Suggested Fix

Reverse the order — click the tab first, sleep, then scrape:

for category in categories {
    // 1. click tab for this category
    // 2. sleep for page to load
    // 3. run JS scraper and label results with this category
}
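A slightly fuller sketch of the same ordering, under stated assumptions: the `TabPage` trait, the `Interest` struct, `extract_interests_fixed`, and the `settle` parameter are hypothetical and stand in for whatever browser handle and record type the real scraper uses. The sketch uses a blocking sleep for simplicity; the real code may be async.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical abstraction over the browser session.
trait TabPage {
    fn click_tab(&mut self, category: &str);
    fn scrape_items(&self) -> Vec<String>;
}

/// Hypothetical record type: one scraped interest with its category label.
struct Interest {
    category: String,
    name: String,
}

/// Click first, wait, then scrape, so the label matches the active tab.
fn extract_interests_fixed<P: TabPage>(
    page: &mut P,
    categories: &[&str],
    settle: Duration,
) -> Vec<Interest> {
    let mut interests = Vec::new();
    for &category in categories {
        // 1. Activate the tab for this category.
        page.click_tab(category);
        // 2. Give the page time to load the tab's content.
        sleep(settle);
        // 3. Scrape and label with the category that was just activated.
        for name in page.scrape_items() {
            interests.push(Interest {
                category: category.to_string(),
                name,
            });
        }
    }
    interests
}

/// In-memory fake used only to exercise the ordering.
struct FakePage {
    active: String,
}

impl TabPage for FakePage {
    fn click_tab(&mut self, category: &str) {
        self.active = category.to_string();
    }

    fn scrape_items(&self) -> Vec<String> {
        vec![format!("item-from-{}", self.active)]
    }
}
```

With the fake page, every record's `category` matches the tab its `name` was read from, including the first iteration, because no scrape happens before a click.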

Automated finding by repo-monitor
