A sophisticated document management system built with FastAPI that provides automated document classification, processing, and storage capabilities. The system supports various document formats, implements user authentication, and offers a RESTful API interface for document operations.
Hardware requirements:
- CPU: 2 cores
- RAM: 4GB
- Storage: 10GB available space

Software requirements:
- Python 3.8 or higher
- PostgreSQL 12 or higher
- Tesseract OCR engine
- Operating system: Windows 10/11, Ubuntu 20.04+, or macOS 10.15+
Key dependencies include:
- FastAPI
- SQLAlchemy
- PyTesseract
- python-docx
- PyPDF2
- Pillow
- scikit-learn

The full list is available in requirements.txt.
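This README does not say which library handles which format. As a rough, hypothetical illustration of how these dependencies might divide the work (PyPDF2 for PDFs, python-docx for Word .docx files, Pillow plus PyTesseract for scanned images), the sketch below routes a filename to an extractor by its extension. The `EXTRACTORS` table and `pick_extractor` function are assumptions, not the project's code.

```python
from pathlib import Path

# HYPOTHETICAL mapping from file extension to the library expected to handle it;
# the project's real routing lives in its services and is not shown in this README.
EXTRACTORS = {
    ".pdf": "PyPDF2",
    ".docx": "python-docx",
    ".png": "Pillow + PyTesseract",
    ".jpg": "Pillow + PyTesseract",
    ".jpeg": "Pillow + PyTesseract",
    ".tif": "Pillow + PyTesseract",
    ".tiff": "Pillow + PyTesseract",
}


def pick_extractor(filename: str) -> str:
    """Return the library name expected to extract text from filename."""
    suffix = Path(filename).suffix.lower()  # case-insensitive: "Report.PDF" -> ".pdf"
    try:
        return EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported document format: {suffix or '(none)'}") from None
```

Keying on the extension keeps the sketch simple; a sturdier service would also inspect the file's content type, since extensions can be wrong.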
Check the installed Python version (it should be 3.8 or higher):
python --version

On Ubuntu, install Python, PostgreSQL, and Tesseract:
sudo apt update
sudo apt install python3.8 python3.8-venv python3-pip
sudo apt install postgresql postgresql-contrib
sudo apt install tesseract-ocr
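After installing the packages, a short standard-library script can confirm that Python is recent enough and that the psql and tesseract executables are on the PATH. This is a minimal sketch; `check_prerequisites` is a hypothetical helper and is not the project's check_drivers.py, whose checks are not documented here.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Report whether the interpreter and external tools listed above are available."""
    return {
        # Compare only (major, minor) against the required minimum
        "python": sys.version_info[:2] >= min_python,
        # psql ships with the PostgreSQL client packages
        "psql": shutil.which("psql") is not None,
        # PyTesseract calls the tesseract binary, so it must be installed separately
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:10} {'OK' if ok else 'MISSING'}")
```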
- Clone the repository:
  git clone [repository-url]
  cd DocumentManagement
- Create and activate a virtual environment:
  python -m venv venv
  On Windows:
  venv\Scripts\activate
  On Linux/macOS:
  source venv/bin/activate
- Install the dependencies:
  pip install -r requirements.txt
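- Create the environment file (on Windows, create .env with any text editor instead of touch):
  touch .env
  Then add the following settings, replacing the placeholder values:
  DATABASE_URL=postgresql://user:password@localhost:5432/db_name
  SECRET_KEY=your_secret_key
  ALGORITHM=HS256
  ACCESS_TOKEN_EXPIRE_MINUTES=30
  TESSERACT_PATH=/path/to/tesseract
  # Windows example: TESSERACT_PATH=C:\Program Files\Tesseract-OCR\tesseract.exe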
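- Initialize the database. Open a psql session as the postgres user, create the database, and exit:
  psql -U postgres
  CREATE DATABASE db_name;
  \q
  Then apply the database migrations:
  alembic upgrade head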
- Verify installation:
python check_drivers.py
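The activation commands in these steps only affect the current shell session. To double-check from Python that the virtual environment is active (for example, before running check_drivers.py or uvicorn), compare `sys.prefix` with `sys.base_prefix`: the two differ inside an environment created with `venv`. The helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run the activation command first.")
```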
Project structure:

DocumentManagement/
├── app/
│   ├── api/
│   │   └── v1/
│   │       ├── api.py                   # API router configuration
│   │       ├── documents.py             # Document endpoints
│   │       ├── document_types.py        # Document type endpoints
│   │       └── users.py                 # User management endpoints
│   ├── core/
│   │   └── config.py                    # Application settings
│   ├── db/
│   │   └── database.py                  # Database configuration
│   ├── models/
│   │   └── models.py                    # Database models
│   ├── schemas/
│   │   └── schemas.py                   # Data validation schemas
│   ├── services/
│   │   ├── document_service.py          # Document processing
│   │   ├── document_type_service.py     # Type management
│   │   └── user_service.py              # User operations
│   └── main.py                          # Application entry point
├── check_drivers.py                     # Installation verification
├── requirements.txt                     # Dependencies
└── Starting the application.txt         # Startup guide
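The tree lists app/core/config.py as the home of the application settings, but its contents are not shown in this README. As a hypothetical sketch only (the real module may well use a library such as pydantic instead), the .env values from the installation steps could be parsed into a typed settings object with the standard library. `parse_env`, `Settings`, and `load_settings` are illustrative names.

```python
from dataclasses import dataclass
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines; blank lines and lines starting with # are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass
class Settings:
    database_url: str
    secret_key: str
    algorithm: str = "HS256"
    access_token_expire_minutes: int = 30
    tesseract_path: str = ""

    @classmethod
    def from_env(cls, env: dict) -> "Settings":
        # DATABASE_URL and SECRET_KEY are required (KeyError if missing);
        # the others fall back to the example values shown in the .env template
        return cls(
            database_url=env["DATABASE_URL"],
            secret_key=env["SECRET_KEY"],
            algorithm=env.get("ALGORITHM", "HS256"),
            access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
            tesseract_path=env.get("TESSERACT_PATH", ""),
        )


def load_settings(path: str = ".env") -> Settings:
    """Read the .env file at path and build a Settings object from it."""
    return Settings.from_env(parse_env(Path(path).read_text(encoding="utf-8")))
```

Failing fast on a missing DATABASE_URL or SECRET_KEY surfaces configuration mistakes at startup rather than on the first database call or login.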
- Ensure the virtual environment is activated:
  On Windows:
  venv\Scripts\activate
  On Linux/macOS:
  source venv/bin/activate
- Start the application (--reload restarts the server when code changes and is intended for development):
  uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
- Verify that the application is running:
  - Interactive API documentation (Swagger UI): http://localhost:8000/docs
  - Alternative API documentation (ReDoc): http://localhost:8000/redoc
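By default, FastAPI also serves the machine-readable schema behind these pages at /openapi.json (unless the application overrides `openapi_url`). The sketch below lists every route the running server exposes, which is a quick way to confirm that the API endpoints are registered. `list_routes` is a hypothetical helper; its `fetch` parameter exists only so it can be exercised without a live server.

```python
import json
import urllib.request
from typing import Callable, List, Optional

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def _http_get(url: str) -> str:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def list_routes(base_url: str = "http://localhost:8000",
                fetch: Optional[Callable[[str], str]] = None) -> List[str]:
    """Return sorted "METHOD path" strings from the server's OpenAPI schema."""
    fetch = fetch or _http_get
    schema = json.loads(fetch(base_url.rstrip("/") + "/openapi.json"))
    routes = []
    for path, item in schema.get("paths", {}).items():
        # Path items may also hold non-operation keys such as "parameters"
        for method in item:
            if method in HTTP_METHODS:
                routes.append(f"{method.upper()} {path}")
    return sorted(routes)


if __name__ == "__main__":
    for route in list_routes():
        print(route)
```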
Authentication endpoints:
- POST /api/v1/users/login: User authentication
- POST /api/v1/users/register: User registration
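The .env file configures token signing with SECRET_KEY, ALGORITHM=HS256, and ACCESS_TOKEN_EXPIRE_MINUTES, which suggests that the login endpoint returns a JSON Web Token. The README does not show the token's claims or which JWT library the project uses, so the following is only a standard-library sketch of HS256 signing and verification as described in RFC 7519 (JWT) and RFC 7518 (HS256), assuming `sub` (subject) and `exp` (expiry) claims. In application code, a maintained library such as PyJWT or python-jose is the usual choice.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    # JWTs use base64url without "=" padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding stripped during encoding
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret_key: str) -> bytes:
    return hmac.new(secret_key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_access_token(subject: str, secret_key: str, expire_minutes: int = 30,
                        now: Optional[int] = None) -> str:
    """Return an HS256-signed JWT whose exp claim lies expire_minutes in the future."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": now + expire_minutes * 60}  # ASSUMED claim names
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url_encode(_sign(signing_input, secret_key))}"


def decode_access_token(token: str, secret_key: str, now: Optional[int] = None) -> dict:
    """Verify algorithm, signature, and expiry, then return the payload; raise ValueError otherwise."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret_key)
    # Constant-time comparison avoids leaking how many signature bytes matched
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    now = int(time.time()) if now is None else now
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return payload
```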
Document endpoints:
- POST /api/v1/documents/: Upload a new document
- GET /api/v1/documents/: List documents
- GET /api/v1/documents/{id}: Get document details
- PUT /api/v1/documents/{id}: Update a document
- DELETE /api/v1/documents/{id}: Delete a document
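A client can call the upload endpoint with a multipart/form-data request. The sketch below builds such a request using only the standard library. The form field name `file` and the `Authorization: Bearer` header are assumptions about how the endpoint is defined; check the actual field name in the interactive documentation at http://localhost:8000/docs. `build_upload_request` is a hypothetical helper.

```python
import urllib.request
import uuid


def build_upload_request(filename: str, content: bytes, token: str,
                         base_url: str = "http://localhost:8000",
                         content_type: str = "application/octet-stream") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /api/v1/documents/."""
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to occur in the payload
    body = b"".join([
        f"--{boundary}\r\n".encode("utf-8"),
        # ASSUMPTION: the endpoint expects the upload in a form field named "file"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode("utf-8"),
        f"Content-Type: {content_type}\r\n\r\n".encode("utf-8"),
        content,
        f"\r\n--{boundary}--\r\n".encode("utf-8"),
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/documents/",
        data=body,
        method="POST",
        headers={
            # ASSUMPTION: the token from /api/v1/users/login is sent as a bearer token
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass it to `urllib.request.urlopen`.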
Document type endpoints:
- POST /api/v1/document-types/: Create a document type
- GET /api/v1/document-types/: List document types
- GET /api/v1/document-types/{id}: Get document type details
- PUT /api/v1/document-types/{id}: Update a document type
- DELETE /api/v1/document-types/{id}: Delete a document type
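Document types are presumably created and updated with JSON request bodies. The fields `name` and `description` below are hypothetical; the real schema is defined in app/schemas/schemas.py and shown at /docs. `build_json_request` is an illustrative helper that assumes the same bearer-token authentication as an authenticated client.

```python
import json
import urllib.request
from typing import Optional


def build_json_request(method: str, path: str, token: str, payload: Optional[dict] = None,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request against the API."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    # ASSUMPTION: endpoints accept the login token as a bearer token
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url.rstrip("/") + path, data=data, method=method, headers=headers)


# HYPOTHETICAL payload: "name" and "description" are illustrative field names only
create_type = build_json_request(
    "POST", "/api/v1/document-types/", "your_token",
    {"name": "Invoice", "description": "Supplier invoices"},
)
delete_type = build_json_request("DELETE", "/api/v1/document-types/3", "your_token")
```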
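Database connection errors: confirm that the PostgreSQL service is running.
- Windows: open the Services console and look for the PostgreSQL service:
  services.msc
- Linux:
  sudo systemctl status postgresql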
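If the service is running but the application still cannot connect, check that something is listening on the host and port given in DATABASE_URL. This standard-library sketch opens a plain TCP connection; it does not validate the username, password, or database name, so a True result only rules out network and service problems. `postgres_reachable` is a hypothetical helper.

```python
import socket
from urllib.parse import urlsplit


def postgres_reachable(database_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to DATABASE_URL's host and port succeeds."""
    parts = urlsplit(database_url)
    host = parts.hostname or "localhost"
    port = parts.port or 5432  # PostgreSQL's default port
    try:
        # Only proves that something is listening; credentials are not checked
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `postgres_reachable("postgresql://user:password@localhost:5432/db_name")` returns False when nothing is listening on port 5432.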
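Tesseract OCR issues:
- Confirm that Tesseract is installed and on your PATH:
  tesseract --version
- If it is installed but cannot be found, set its full path in .env:
  TESSERACT_PATH=/path/to/tesseract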
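One plausible lookup order, sketched below, is to use TESSERACT_PATH when it is set and otherwise search the PATH. `resolve_tesseract` is a hypothetical helper, and reading `os.environ` assumes the .env values have been loaded into the environment. PyTesseract does expose `pytesseract.pytesseract.tesseract_cmd` for pointing it at a specific binary, but how this project wires the setting is not shown in this README.

```python
import os
import shutil
from typing import Optional


def resolve_tesseract(configured: Optional[str] = None) -> Optional[str]:
    """Return a tesseract executable path, or None if none can be found."""
    if configured is None:
        configured = os.environ.get("TESSERACT_PATH", "")
    if configured:
        # An explicit setting wins, but only if the file actually exists
        return configured if os.path.isfile(configured) else None
    # Otherwise fall back to searching the PATH
    return shutil.which("tesseract")
```

Returning None for a configured path that does not exist, rather than silently falling back to the PATH, makes a typo in TESSERACT_PATH visible.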
Python package conflicts:
- Reinstall the dependencies:
  pip uninstall -r requirements.txt -y
  pip install -r requirements.txt
- Application logs are stored in logs/app.log.
- Database logs are in PostgreSQL's default logging location.
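This README does not show how application logging is configured. A minimal standard-library sketch that writes to logs/app.log, with size-based rotation so the file cannot grow without bound, could look like the following. The logger name `app`, the format string, and the rotation limits are assumptions, and `setup_logging` is a hypothetical helper.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def setup_logging(log_file: str = "logs/app.log", level: int = logging.INFO) -> logging.Logger:
    """Attach a size-rotated file handler to the "app" logger; call once at startup."""
    path = Path(log_file)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the logs/ directory if missing
    # ASSUMED limits: rotate at about 5 MB and keep three old files
    handler = RotatingFileHandler(path, maxBytes=5_000_000, backupCount=3, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger = logging.getLogger("app")
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger
```

Because child loggers propagate to their parents, a module that calls `logging.getLogger("app.services")` would also write to this file.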
Marco Alejandro Santiago (msantiago@excelendeavormedia.com)