Summary
Each scraper validates LinkedIn URLs with a simple .contains("/in/") substring check that does not validate the hostname, allowing SSRF to arbitrary hosts.
Location
- File: src/scrapers/person.rs (and company.rs, job.rs, company_posts.rs)
- Line(s): 21
Severity
High
Details
A URL like "https://evil.com/in/victim" passes validation and causes the browser to navigate to evil.com. The scraper then constructs sub-page URLs via format!("{}/details/experience/", profile_url...), driving further authenticated requests to the same external host. Scraped data and cookies could be exposed.
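The behaviour described above can be reproduced with a small std-only sketch. The function names `passes_substring_check` and `experience_url` are hypothetical stand-ins, not the actual code in src/scrapers/person.rs, and the trailing-slash trimming is an assumption about how the sub-page URL is built.

```rust
/// Hypothetical stand-in for the current validation: a bare substring match.
fn passes_substring_check(url: &str) -> bool {
    url.contains("/in/")
}

/// Hypothetical stand-in for the sub-page construction mentioned in the report.
/// Trimming the trailing slash is an assumption made for this sketch.
fn experience_url(profile_url: &str) -> String {
    format!("{}/details/experience/", profile_url.trim_end_matches('/'))
}

fn main() {
    let attacker_url = "https://evil.com/in/victim";

    // The substring check never looks at the host, so this URL is accepted.
    assert!(passes_substring_check(attacker_url));

    // Every derived sub-page URL inherits the attacker-controlled host.
    let sub_page = experience_url(attacker_url);
    assert!(sub_page.starts_with("https://evil.com/"));
    println!("{sub_page}");
}
```

Because each sub-page URL is derived from the unvalidated input, one accepted URL is enough to direct all later navigation to the same external host.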
Suggested Fix
Parse the URL and validate the hostname:
use url::Url;

let parsed = Url::parse(linkedin_url).map_err(|_| ScraperError::InvalidUrl(...))?;
if parsed.host_str() != Some("www.linkedin.com") && parsed.host_str() != Some("linkedin.com") {
    return Err(ScraperError::InvalidUrl("URL must be on linkedin.com".to_string()));
}
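Since the same check is needed in person.rs, company.rs, job.rs and company_posts.rs, one option is to keep the host comparison in a single shared helper. The sketch below is std-only and assumes the caller passes in the result of `Url::host_str()`. The names `ALLOWED_HOSTS` and `is_linkedin_host` are hypothetical, and the two-entry allowlist is taken from the suggested fix rather than from the repository.

```rust
/// Assumed allowlist, copied from the suggested fix in this report.
const ALLOWED_HOSTS: [&str; 2] = ["www.linkedin.com", "linkedin.com"];

/// Returns true only when `host` exactly matches an allowed name.
/// `host` is intended to be the value returned by `Url::host_str()`.
fn is_linkedin_host(host: Option<&str>) -> bool {
    match host {
        Some(h) => ALLOWED_HOSTS
            .iter()
            .any(|&allowed| h.eq_ignore_ascii_case(allowed)),
        None => false,
    }
}

fn main() {
    assert!(is_linkedin_host(Some("www.linkedin.com")));
    assert!(is_linkedin_host(Some("linkedin.com")));

    // Exact matching rejects look-alike hosts that merely contain the name.
    assert!(!is_linkedin_host(Some("evil.com")));
    assert!(!is_linkedin_host(Some("linkedin.com.evil.com")));

    // URLs without a host are rejected as well.
    assert!(!is_linkedin_host(None));
}
```

Comparing the whole host string, rather than using a suffix or substring test, is what keeps hosts such as linkedin.com.evil.com out.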
Automated finding by repo-monitor