This project is an intelligent agent system capable of crawling the web, scraping specific content (including dynamic pages such as eMAG), applying semantic filters, and optionally interacting with AI-based tools for data processing or enrichment. The architecture is modular, separating crawling, scraping, decision logic, and tool execution.
- Language: Python 3
- Core Libraries: `asyncio`, `aiohttp`, `json`, `re`, `uuid`, `os`, `redis`, `playwright`
- Main Components:
  - `agent/`: logic for decision-making and intelligent behavior
  - `crawler/`: static and dynamic web scraping plus Wikipedia crawling
  - `main.py`: entry point that integrates the agent, the crawler, and the tool functions
```
Agent/
├── agent/
│ ├── __init__.py
│ ├── agent.py # The intelligent agent core
│ ├── tool_funcs.py # Tool functions the agent can invoke
│ └── tools.py # Mapping of tools
├── crawler/
│ ├── __init__.py
│ ├── crawl.py # Wikipedia crawler with depth control + Redis URL caching
│ ├── scrap.py # Web scraping for weather and structured data
│ └── dynamic_scraper.py # Playwright-powered dynamic content scraper (eMAG support)
├── data/
│ ├── filter.json # Subject keywords for relevance filtering
│ ├── linksJson.json # Crawled Wikipedia links
│ ├── product.json # Result storage from scraping
│ └── test.json # Output for debug/test
└── main.py              # Main integration script
```
- `main.py` loads and initializes the agent.
- It sends an initial prompt to the agent (`"weather in London"` by default).
- The agent parses the prompt, checks whether a tool is needed, and calls it.
- `agent.py` contains the core LLM-based agent logic.
- It chooses tools using a mapping (the `tools` dict).
- It executes tool functions and formats the results as agent responses.
Callable tools include:
- `crawlSubjectFunc(subject: str, depth: int)` – Wikipedia crawler with Redis caching.
- `scrapWeatherFunc(location: str)` – web scraper for weather.
- `calculateFunc(expression: str)` – mathematical expression evaluator.
- `scrapeDynamicProduct(link: str)` – uses Playwright to extract product name, price, and specs from eMAG.
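A sketch of how `tools.py` might wire that mapping together (the dictionary keys are illustrative assumptions; only the function names come from the list above):

```python
# Sketch of the tool mapping in agent/tools.py; the key names are
# illustrative assumptions, the function names come from the list above.
from agent.tool_funcs import (
    calculateFunc,
    crawlSubjectFunc,
    scrapWeatherFunc,
    scrapeDynamicProduct,
)

tools = {
    "calculate": calculateFunc,
    "crawl_subject": crawlSubjectFunc,
    "scrap_weather": scrapWeatherFunc,
    "scrape_dynamic_product": scrapeDynamicProduct,
}
```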
- `scrap.py` performs static scraping using `aiohttp` and regex/HTML parsing.
- It is designed for structured information such as weather data.
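As a self-contained illustration of that static-scraping style (the `wttr.in` endpoint and the regex are stand-ins, not the project's actual source):

```python
# Illustrative static scrape with aiohttp + a regex pull, in the spirit of
# crawler/scrap.py. The wttr.in endpoint and the pattern are stand-ins.
import asyncio
import re
from typing import Optional

import aiohttp

async def scrap_weather(location: str) -> Optional[str]:
    url = f"https://wttr.in/{location}?format=3"  # one-line weather report
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            text = await resp.text()
    match = re.search(r"[+-]?\d+°C", text)  # e.g. "+11°C"
    return match.group(0) if match else None

if __name__ == "__main__":
    print(asyncio.run(scrap_weather("Bucharest")))
```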
- `crawl.py` controls crawling depth and applies filters from `filter.json`.
- It uses Redis to cache visited URLs, preventing redundant work and improving performance.
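A minimal sketch of that caching idea, assuming a local Redis instance and a set key named `visited_urls` (both are assumptions, not taken from the code):

```python
# Sketch of the visited-URL cache pattern used by crawler/crawl.py; the
# key name "visited_urls" and the local connection are assumptions.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def already_visited(url: str) -> bool:
    # SADD returns 0 when the member was already present, so a single
    # round-trip both records the URL and tests for a prior visit.
    return r.sadd("visited_urls", url) == 0

def crawl(url: str, depth: int) -> None:
    if depth <= 0 or already_visited(url):
        return
    # ...fetch the page, apply the filter.json keywords, then recurse
    # into each extracted Wikipedia link with depth - 1.
```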
- `dynamic_scraper.py` does dynamic, headless browser-based scraping using Playwright.
- It extracts live data (name, price, description) from pages such as eMAG.
- It supports product scraping for dynamic content, including `div`-rendered prices and JS-loaded specs.
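A hedged sketch of that flow with Playwright's async API (the CSS selectors below are assumptions; eMAG's real markup may differ):

```python
# Sketch of dynamic scraping in the spirit of crawler/dynamic_scraper.py.
# The selectors are illustrative assumptions, not eMAG's actual markup.
import asyncio
from playwright.async_api import async_playwright

async def scrape_dynamic_product(link: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(link, wait_until="networkidle")
        name = await page.text_content("h1")               # assumed selector
        price = await page.text_content(".product-price")  # assumed selector
        await browser.close()
        return {"name": name, "price": price, "link": link}

if __name__ == "__main__":
    result = asyncio.run(scrape_dynamic_product("https://www.emag.ro/..."))
    print(result)
```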
- `filter.json`: semantic filtering keywords.
- `linksJson.json`: crawled Wikipedia URLs.
- `product.json`: scraped data from product pages (eMAG etc.).
- `test.json`: output for debugging and testing.
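For illustration, a short sketch of how the crawler might load and apply the keywords in `filter.json` (the file layout, a top-level `"keywords"` list, is an assumption):

```python
# Hypothetical sketch: applying filter.json's subject keywords to a page
# title. The {"keywords": [...]} layout is an assumption for illustration.
import json

with open("data/filter.json", encoding="utf-8") as f:
    keywords = json.load(f).get("keywords", [])

def is_relevant(title: str) -> bool:
    """Keep a link only if its title mentions one of the subject keywords."""
    title_lower = title.lower()
    return any(kw.lower() in title_lower for kw in keywords)
```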
```bash
# Clone the repository
git clone https://github.com/AMihneaa/Agent.git
cd Agent

# Install dependencies
pip install aiohttp redis playwright

# For Playwright (first-time setup)
playwright install

# Run the agent
python main.py
```

Sample prompts in `main.py`:

```python
run_agent("calculate 2 + 2")
run_agent("scrap weather in Bucharest")
run_agent("crawl subject Artificial Intelligence depth 2")
run_agent("scrap emag for top 10 laptops with 16 GB Ram and Intel i7")- The agent parses user prompts and checks for tool calls.
- If a tool is needed, it dynamically invokes Python functions (e.g., scrape weather, crawl Wikipedia).
- The crawler uses Redis to store and check previously visited links, which avoids redundant fetches.
- Playwright is used for dynamic page scraping, useful for modern e-commerce websites.
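For intuition, a deliberately simplified, hypothetical view of that prompt-to-tool dispatch (the real selection in `agent/agent.py` is LLM-driven, not prefix matching):

```python
# Illustrative, hypothetical dispatch; the real agent decides via the LLM,
# not via string prefixes. Import paths are assumptions.
from agent.tool_funcs import calculateFunc, scrapWeatherFunc

def run_agent(prompt: str) -> str:
    if prompt.startswith("calculate "):
        return str(calculateFunc(prompt.removeprefix("calculate ")))
    if prompt.startswith("scrap weather in "):
        return str(scrapWeatherFunc(prompt.removeprefix("scrap weather in ")))
    return "No matching tool for this prompt."
```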
- Add tool selection via a local LLM (e.g., DeepSeek or OpenAI via API).
- Automatic product classification based on content.
- Integration with vector databases for semantic memory.
- Embed product summaries using LLM.
This project is licensed under the MIT License. See the LICENSE file for details.
Built by Mihnea – for intelligent automation and web interaction with Python.