🔍 Skills Radar

An intelligent web application that extracts structured job data from URLs using LLMs, with smart caching and deduplication.

✨ Features

  • Drag & Drop or Single URL Input - Upload a .txt file of URLs, or process individual URLs one at a time
  • Smart Caching - Cached jobs aren't scraped again unless refreshed
  • Interactive UI - In-place job card expansion with full details
  • Responsive Design - Works on desktop and mobile devices

🚀 Quick Start

Prerequisites

  • Python 3.9+
  • Playwright (for browser automation)

Local Development

  1. Clone & Setup

    git clone https://github.com/caiocrocha/SkillsRadar.git
    cd SkillsRadar
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    pip install -r requirements.txt
  2. Configure Environment

    cp .env.example .env
    # Edit .env with your API keys
  3. Install Playwright Browsers

    playwright install
  4. Run Development Server

    uvicorn app.main:app --reload

    Visit http://localhost:8000

📁 Project Structure

SkillsRadar/
├── app/                 # FastAPI application
│   ├── main.py          # App entry point
│   ├── routes.py        # API routes
│   └── settings.py      # App settings
├── scraper/             # Web scraping
│   ├── llm.py           # Prompt and LLM instantiation
│   ├── nodes.py         # LangGraph nodes
│   ├── graph.py         # LangGraph graph definition
│   └── extract.py       # LLM call and JSON validation
├── models/              # Pydantic models / JSON schema
├── normalization/       # Data normalization
├── pipelines/           # Processing pipelines
├── frontend/            # Web UI
│   ├── index.html
│   ├── app.js
│   └── styles.css
├── data/                # Cached data
├── requirements.txt     # Python dependencies
└── README.md

🔌 API Endpoints

| Method | Endpoint           | Description                      |
|--------|--------------------|----------------------------------|
| POST   | /upload            | Upload .txt file with URLs       |
| POST   | /process-urls      | Process URLs with cache checking |
| POST   | /refresh-with-urls | Force refresh (ignore cache)     |
| POST   | /clear-cache       | Delete all cached data           |
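As a sketch of how a client might call the processing endpoint with only the standard library (the `{"urls": [...]}` body shape is an assumption; check `app/routes.py` for the actual request model):

```python
import json
import urllib.request

# Hypothetical payload shape: a list of job-posting URLs to process.
payload = json.dumps({"urls": ["https://example.com/jobs/123"]}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8000/process-urls",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the dev server running, send it with:
#   response = urllib.request.urlopen(req)
print(req.get_method(), req.full_url)
```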

💾 Data Storage

  • Cache Location: data/cache.json
  • Structure:
    • files: Map of file hashes to URL lists
    • all_urls: Map of URLs to their job data
  • Persistence: Automatically saved after each operation
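A minimal sketch of inspecting the cache described above; the top-level keys (`files`, `all_urls`) come from this README, while the defaults for a missing cache file are assumptions:

```python
import json
from pathlib import Path

cache_path = Path("data/cache.json")

# Assumed empty-cache shape when no cache has been written yet.
cache = {"files": {}, "all_urls": {}}
if cache_path.exists():
    cache = json.loads(cache_path.read_text())

print(f"{len(cache['files'])} uploaded files, {len(cache['all_urls'])} cached URLs")
```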

🔄 How Deduplication Works

  1. File Upload or URL Input: Backend checks if URLs exist in cache
  2. Refresh: Forces re-scraping regardless of cache
  3. Frontend: Filters duplicate URLs by source_url before display
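The frontend's dedup step (step 3) can be sketched in Python as "keep the first job seen per `source_url`"; the `source_url` field name comes from this README, the other fields are illustrative:

```python
def dedupe_jobs(jobs):
    """Keep only the first job encountered for each source_url."""
    seen = set()
    unique = []
    for job in jobs:
        url = job.get("source_url")
        if url in seen:
            continue
        seen.add(url)
        unique.append(job)
    return unique

jobs = [
    {"source_url": "https://a.example/1", "title": "Data Engineer"},
    {"source_url": "https://a.example/1", "title": "Data Engineer (dup)"},
    {"source_url": "https://b.example/2", "title": "ML Engineer"},
]
print(len(dedupe_jobs(jobs)))  # 2
```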

🛠️ Configuration

Environment Variables

HF_API_KEY=your_huggingface_key

See .env.example for all options.


Built with: FastAPI • LangGraph • LLMs • Playwright • Vanilla JavaScript

About

LLM-powered pipeline for extracting and visualizing job market skill trends from web job postings.
