Warning
UNDER DEVELOPMENT: This project is evolving constantly; some features may be missing or being refactored. If you encounter issues or gaps in the documentation, they are likely due to the ongoing development process.
This is the evolution of my previous Amazon scraper. It has transitioned from a simple script into a robust backend API built with FastAPI. The system now features a professional data persistence layer using PostgreSQL, allowing for historical price tracking and structured data management.
Using the SQLAlchemy ORM, the application maps web-scraped data directly into a relational database, providing a scalable foundation for any future frontend or mobile application.
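As a rough sketch of what such a mapping can look like (the table and column names below are invented for illustration, not taken from models.py, and SQLite stands in for PostgreSQL so the snippet runs standalone):

```python
from datetime import datetime, timezone

from sqlalchemy import Column, DateTime, Float, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Product(Base):
    """Hypothetical mapped table; the real models.py may differ."""
    __tablename__ = "products"

    id = Column(Integer, primary_key=True)
    search_term = Column(String, nullable=False)   # what the user searched for
    title = Column(String, nullable=False)         # scraped product title
    price = Column(Float)                          # normalized numeric price
    scraped_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))


# SQLite in place of PostgreSQL, only so the example is self-contained.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
SessionLocal = sessionmaker(bind=engine)

with SessionLocal() as session:
    session.add(Product(search_term="ssd", title="1TB NVMe SSD", price=89.90))
    session.commit()
    cheapest = session.query(Product).order_by(Product.price).first()
```

Because every row carries a search term and timestamp, price history for a given term is a single ordered query away.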
From a single-file script to a Layered Architecture:
- API Layer (FastAPI): Handles requests and orchestrates the workflow.
- Worker Layer (Selenium): Automates the browser to collect real-time data.
- Persistence Layer (PostgreSQL): Safely stores every search result for historical analysis.
- ORM Layer (SQLAlchemy): Bridges the gap between Python objects and SQL tables.
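The hand-off between these layers can be sketched in framework-free Python (every class and method name here is hypothetical, chosen only to illustrate how each layer owns one responsibility):

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Product:                      # what the ORM layer would map to a table
    title: str
    price: float
    scraped_at: datetime


class ScraperWorker:                # Worker layer (stands in for Selenium)
    def search(self, term: str) -> list[Product]:
        # A real implementation would drive a browser; here we fake one hit.
        now = datetime.now(timezone.utc)
        return [Product(title=f"{term} sample item", price=19.99, scraped_at=now)]


class ProductRepository:            # Persistence layer (stands in for PostgreSQL)
    def __init__(self) -> None:
        self._rows: list[Product] = []

    def save_all(self, products: list[Product]) -> None:
        self._rows.extend(products)

    def history(self) -> list[Product]:
        return list(self._rows)


class SearchService:                # Orchestration behind the API layer
    def __init__(self, worker: ScraperWorker, repo: ProductRepository) -> None:
        self.worker = worker
        self.repo = repo

    def run(self, term: str) -> list[Product]:
        products = self.worker.search(term)
        self.repo.save_all(products)   # every search result is persisted
        return products
```

A FastAPI route would then only call `SearchService.run`, keeping browser automation and SQL out of the request handler.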
- FastAPI: High-performance web framework for building APIs.
- Uvicorn: ASGI server for production-ready performance.
- Dependency Injection: Used for managing database sessions efficiently.
- PostgreSQL: Professional-grade relational database.
- SQLAlchemy: Powerful ORM for database models and queries.
- Relational Mapping: Structured tables for products, terms, and timestamps.
- Selenium WebDriver: Advanced browser automation.
- Smart Waiting: Implemented `WebDriverWait` to handle Amazon's dynamic loading.
- Data Normalization: Automatic cleaning of currency strings into `Float` values.
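The normalization step can be illustrated with a small helper (a hypothetical name and sketch, not the repo's actual cleaning logic) that strips the currency symbol and thousands separators before casting to a float:

```python
import re
from typing import Optional


def parse_price(raw: str) -> Optional[float]:
    """Turn a scraped currency string like '$1,299.99' into a float.

    Illustrative helper: assumes US-style formatting ('.' as the decimal
    separator, ',' as the thousands separator). Returns None when the
    string contains no digits, e.g. an 'unavailable' placeholder.
    """
    cleaned = re.sub(r"[^\d.,]", "", raw)   # drop '$', spaces, letters
    cleaned = cleaned.replace(",", "")      # drop thousands separators
    if not re.search(r"\d", cleaned):
        return None
    return float(cleaned)
```

Returning `None` instead of raising keeps the scraper resilient when a listing has no visible price.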
├── main.py # API Routes and App Initialization
├── models.py # Database Tables Definitions (SQLAlchemy)
├── database.py # Connection and Session Configuration
├── crud.py # Create, Read, Update, Delete Logic
├── scrap.py # Selenium Scraper Class
└── requirements.txt # Project Dependencies
Developed by Vinicius Santos – Tech