A intelligent web application that extracts structured job data from URLs using LLMs, with smart caching and deduplication.
- Drag & Drop URLs File or Single URL Input - Upload
.txtfiles containing URLs or process individual URLs one at a time - Smart Caching - Cached jobs aren't scraped again unless refreshed
- Interactive UI - In-place job card expansion with full details
- Responsive Design - Works on desktop and mobile devices
- Python 3.9+
- Playwright (for browser automation)
-
Clone & Setup
git clone https://github.com/caiocrocha/SkillsRadar.git cd SkillsRadar python -m venv .venv source .venv/bin/activate # On Windows: .venv\Scripts\activate pip install -r requirements.txt
-
Configure Environment
cp .env.example .env # Edit .env with your API keys -
Install Playwright Browsers
playwright install
-
Run Development Server
uvicorn app.main:app --reload
Visit
http://localhost:8000
SkillsRadar/
├── app/ # FastAPI
│ ├── main.py
│ ├── routes.py
│ └── settings.py
├── scraper/ # Web
│ ├── llm.py # Prompt and LLM Instantiation
│ ├── nodes.py # LangGraph
│ ├── graph.py # LangGraph
│ └── extract.py # LLM Call and JSON Validator
├── models/ # Pydantic / JSON Schema
├── normalization/ # Data
├── pipelines/ # Processing
├── frontend/ # Web UI
│ ├── index.html
│ ├── app.js
│ └── styles.css
├── data/ # Cached data
├── requirements.txt # Python
└── README.md
| Method | Endpoint | Description |
|---|---|---|
| POST | /upload |
Upload .txt file with URLs |
| POST | /process-urls |
Process URLs with cache checking |
| POST | /refresh-with-urls |
Force refresh (ignore cache) |
| POST | /clear-cache |
Delete all cached data |
- Cache Location:
data/cache.json - Structure:
files: Map of file hashes to URL listsall_urls: Map of URLs to their job data
- Persistence: Automatically saved after each operation
- File Upload or URL Input: Backend checks if URLs exist in cache
- Refresh: Forces re-scraping regardless of cache
- Frontend: Filters duplicate URLs by
source_urlbefore display
HF_API_KEY=your_huggingface_key
See .env.example for all options.
Built with: FastAPI • LangGraph • LLMs • Playwright • Vanilla JavaScript