[repo-monitor] High: URL validation is substring-only — SSRF to arbitrary hosts #2

@Liohtml

Summary

Each scraper validates LinkedIn URLs with a bare .contains("/in/") substring check and never inspects the hostname, so a URL on any host passes and the scraper can be pointed at arbitrary hosts (SSRF).

Location

  • File: src/scrapers/person.rs (and company.rs, job.rs, company_posts.rs)
  • Line(s): 21

Severity

High

Details

A URL such as "https://evil.com/in/victim" passes validation, and the browser navigates to evil.com. The scraper then builds sub-page URLs with format!("{}/details/experience/", profile_url...), which sends further authenticated requests to the same external host. Scraped data and cookies could be exposed to that host.
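The behaviour described above can be reproduced with the standard library alone. The following is a minimal sketch, not the project's actual code: `passes_substring_check` and `experience_url` are hypothetical stand-ins, and whatever the real scraper does to `profile_url` before formatting is elided in this report and not modelled here.

```rust
/// Hypothetical stand-in for the reported check: accepts any string that
/// contains "/in/", without looking at the host.
fn passes_substring_check(url: &str) -> bool {
    url.contains("/in/")
}

/// Hypothetical stand-in for the sub-page construction. The real code's
/// handling of `profile_url` is not shown in the report, so the value is
/// used unchanged here.
fn experience_url(profile_url: &str) -> String {
    format!("{}/details/experience/", profile_url)
}

fn main() {
    let legit = "https://www.linkedin.com/in/someone";
    let hostile = "https://evil.com/in/victim";

    // Both URLs pass, because the host is never inspected.
    assert!(passes_substring_check(legit));
    assert!(passes_substring_check(hostile));

    // The derived sub-page URL stays on the attacker-chosen host.
    println!("{}", experience_url(hostile));
}
```

Because the sub-page URL is derived from the unvalidated input, every follow-up navigation inherits the attacker-chosen host.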

Suggested Fix

Parse the URL and validate the hostname:

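use url::Url;

// Parse first so the check applies to the real host, not to a substring of the raw string.
let parsed = Url::parse(linkedin_url).map_err(|_| ScraperError::InvalidUrl(...))?;
// Accept only the exact LinkedIn hosts; any other host, including look-alike domains, is rejected.
if !matches!(parsed.host_str(), Some("www.linkedin.com" | "linkedin.com")) {
    return Err(ScraperError::InvalidUrl("URL must be on linkedin.com".to_string()));
}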

Automated finding by repo-monitor
