| 1 | +# DocPilot |
| 2 | + |
| 3 | +**AI-powered contract review and extraction platform.** |
| 4 | + |
| 17 | + |
| 18 | +## What It Does |
| 19 | + |
| 20 | +Upload PDF contracts (NDAs, service agreements, employment contracts) and DocPilot automatically classifies the document type, extracts key fields such as parties, dates, and payment terms using GPT-4o, and flags risky clauses with plain-English explanations. Compare two contracts side by side to spot differences at a glance. |
| 21 | + |
| 22 | +## Key Features |
| 23 | + |
| 24 | +- **AI-Powered Extraction** — Classifies document type and extracts structured fields using GPT-4o with JSON mode and Pydantic validation |
| 25 | +- **Risk Analysis** — Identifies risky clauses (non-compete, liability, termination) with risk levels and plain-English explanations |
| 26 | +- **Contract Comparison** — Side-by-side field-level diff between any two contracts with match/difference/missing indicators |
| 27 | +- **Real-Time Progress** — Server-Sent Events stream processing updates to the browser as each pipeline step completes |
| 28 | +- **Team Workspaces** — Multi-tenant architecture with role-based access (owner, admin, member) and team invitations |
| 29 | +- **Async Pipeline** — Celery workers process documents in the background so the API stays responsive |
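
To make the match/difference/missing indicators from the Contract Comparison feature concrete, here is a minimal sketch of a field-level diff between two extraction results. The function name `compare_fields`, the field names, and the sample values are hypothetical and are not taken from the DocPilot codebase; the actual comparison service may normalize values or handle nested fields differently.

```python
from typing import Any, Dict


def compare_fields(left: Dict[str, Any], right: Dict[str, Any]) -> Dict[str, str]:
    """Label every field seen in either contract.

    'match'      -> present in both with equal values
    'difference' -> present in both with different values
    'missing'    -> present in only one of the two contracts
    """
    report: Dict[str, str] = {}
    for name in sorted(left.keys() | right.keys()):
        if name not in left or name not in right:
            report[name] = "missing"
        elif left[name] == right[name]:
            report[name] = "match"
        else:
            report[name] = "difference"
    return report


# Hypothetical extraction results for two NDAs.
nda_a = {"parties": ["Acme", "Globex"], "effective_date": "2024-01-01", "term_months": 12}
nda_b = {"parties": ["Acme", "Globex"], "effective_date": "2024-03-15", "governing_law": "Delaware"}

report = compare_fields(nda_a, nda_b)
for field, status in report.items():
    print(f"{field}: {status}")
```

Iterating over the union of keys means a field extracted from only one document is still reported instead of being silently dropped.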
| 30 | + |
| 31 | +## Architecture |
| 32 | + |
| 33 | +```mermaid |
| 34 | +graph LR |
| 35 | + Browser -->|REST API| Next["Next.js Frontend"] |
| 36 | + Next -->|HTTP| API["FastAPI"] |
| 37 | + API -->|async queries| DB[(PostgreSQL)] |
| 38 | + API -->|dispatch task| Worker["Celery Worker"] |
| 39 | + Worker -->|structured output| OpenAI["OpenAI GPT-4o"] |
| 40 | + Worker -->|read/write| DB |
| 41 | + Worker -->|publish progress| Redis[(Redis)] |
| 42 | + API -->|SSE stream| Redis |
| 43 | +``` |
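
The progress path in the diagram (worker publishes to Redis, API streams to the browser) ends in Server-Sent Events, which use a simple text wire format: optional `event:` line, a `data:` line, and a blank line to end the message. The sketch below shows only that framing step. The helper name `format_sse` and the payload keys (`document_id`, `step`, `status`) are assumptions for illustration, not DocPilot's actual message schema.

```python
import json
from typing import Any, Dict, Optional


def format_sse(data: Dict[str, Any], event: Optional[str] = None) -> str:
    """Serialize one progress update as a Server-Sent Events message."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data: line is enough.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


# Hypothetical progress payload for one pipeline step.
frame = format_sse(
    {"document_id": "doc-123", "step": "extract_fields", "status": "done"},
    event="progress",
)
print(frame, end="")
```

In the browser, an `EventSource` listener registered for the `progress` event type would receive the JSON string in `event.data`.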
| 44 | + |
| 45 | +## Tech Stack |
| 46 | + |
| 47 | +| Layer | Technology | |
| 48 | +|-------|-----------| |
| 49 | +| **Frontend** | Next.js 15, TypeScript, Tailwind CSS, shadcn/ui | |
| 50 | +| **Backend** | Python 3.12, FastAPI, SQLAlchemy (async), Alembic | |
| 51 | +| **AI** | OpenAI GPT-4o, structured JSON output, Pydantic validation | |
| 52 | +| **Queue** | Celery, Redis | |
| 53 | +| **Database** | PostgreSQL 16 | |
| 54 | +| **DevOps** | Docker Compose, GitHub Actions CI | |
| 55 | + |
| 56 | +## Getting Started |
| 57 | + |
| 58 | +### Prerequisites |
| 59 | + |
| 60 | +- [Docker](https://docs.docker.com/get-docker/) and Docker Compose |
| 61 | +- [Node.js 20+](https://nodejs.org/) (for frontend development) |
| 62 | +- An [OpenAI API key](https://platform.openai.com/api-keys) |
| 63 | + |
| 64 | +### Setup |
| 65 | + |
| 66 | +```bash |
| 67 | +# Clone the repo |
| 68 | +git clone https://github.com/your-username/docpilot.git |
| 69 | +cd docpilot |
| 70 | + |
| 71 | +# Copy env file and add your OpenAI API key |
| 72 | +cp .env.example .env |
| 73 | +# Edit .env and set OPENAI_API_KEY=sk-your-key-here |
| 74 | + |
| 75 | +# Start the backend services (API + worker + Postgres + Redis) |
| 76 | +docker compose up -d |
| 77 | + |
| 78 | +# Run database migrations |
| 79 | +docker compose run --rm api alembic upgrade head |
| 80 | + |
| 81 | +# Install frontend dependencies and start the dev server |
| 82 | +cd apps/web |
| 83 | +npm install |
| 84 | +npm run dev |
| 85 | +``` |
| 86 | + |
| 87 | +Open [http://localhost:3000](http://localhost:3000) and create an account to get started. |
| 88 | + |
| 89 | +> **Note:** The API runs on port 8001 by default. Swagger docs are available at [http://localhost:8001/docs](http://localhost:8001/docs). |
| 90 | + |
| 91 | +## Project Structure |
| 92 | + |
| 93 | +``` |
| 94 | +docpilot/ |
| 95 | +├── apps/ |
| 96 | +│   ├── api/                      # FastAPI backend |
| 97 | +│   │   ├── app/ |
| 98 | +│   │   │   ├── main.py           # App entry point + CORS |
| 99 | +│   │   │   ├── config.py         # Pydantic Settings (env vars) |
| 100 | +│   │   │   ├── database.py       # Async SQLAlchemy engine |
| 101 | +│   │   │   ├── models/           # ORM models (User, Document, Extraction, ...) |
| 102 | +│   │   │   ├── schemas/          # Pydantic request/response schemas |
| 103 | +│   │   │   ├── routers/          # API endpoints (auth, documents, compare, teams) |
| 104 | +│   │   │   ├── services/         # Business logic (extraction pipeline, compare) |
| 105 | +│   │   │   ├── tasks/            # Celery background tasks |
| 106 | +│   │   │   ├── prompts/          # LLM prompt templates per doc type |
| 107 | +│   │   │   └── utils/            # PDF parser, text chunker, LLM client |
| 108 | +│   │   ├── alembic/              # Database migrations |
| 109 | +│   │   ├── Dockerfile |
| 110 | +│   │   └── requirements.txt |
| 111 | +│   │ |
| 112 | +│   └── web/                      # Next.js frontend |
| 113 | +│       ├── src/ |
| 114 | +│       │   ├── app/              # App Router pages |
| 115 | +│       │   ├── components/       # UI components (shadcn + custom) |
| 116 | +│       │   ├── hooks/            # React hooks (SSE, documents, auth) |
| 117 | +│       │   ├── lib/              # API client, auth utils |
| 118 | +│       │   └── types/            # TypeScript interfaces |
| 119 | +│       ├── Dockerfile |
| 120 | +│       └── package.json |
| 121 | +│ |
| 122 | +├── docker-compose.yml            # Local dev (with hot reload) |
| 123 | +├── docker-compose.prod.yml       # Production |
| 124 | +├── .github/workflows/ci.yml      # CI pipeline |
| 125 | +└── .env.example                  # Environment variables template |
| 126 | +``` |
| 127 | + |
| 128 | +## API Documentation |
| 129 | + |
| 130 | +With the backend running, interactive API docs are available at: |
| 131 | + |
| 132 | +- **Swagger UI:** [http://localhost:8001/docs](http://localhost:8001/docs) |
| 133 | +- **ReDoc:** [http://localhost:8001/redoc](http://localhost:8001/redoc) |
| 134 | + |
| 135 | +## Resume Bullets |
| 136 | + |
| 137 | +> For anyone using this as a portfolio piece — here's how to talk about it: |
| 138 | + |
| 139 | +- Built an AI-powered contract review platform that extracts key fields and flags risky clauses using GPT-4o with structured JSON output and Pydantic validation |
| 140 | +- Designed an async document processing pipeline using Celery and Redis with real-time progress streaming via Server-Sent Events |
| 141 | +- Implemented side-by-side contract comparison with field-level diffing across different contract types |
| 142 | +- Built a multi-tenant architecture with JWT authentication, role-based access control, and team workspaces |
| 143 | + |
| 144 | +## License |
| 145 | + |
| 146 | +[MIT](LICENSE) |