PDF-QA

PDF Query and Slack Integration

This project is a Python tool that extracts text from PDF documents, answers user questions about their contents with OpenAI's language models, and posts the answers to a specified Slack channel.

Demo

Features

PDF Processing: Extracts text from PDF documents and splits it into manageable chunks.
Natural Language Queries: Users can ask questions related to the content of the PDF.
OpenAI Integration: Utilizes OpenAI's models for generating responses and embeddings.
Confidence Handling: Applies special handling to responses whose confidence falls below a configurable threshold (see confidence_threshold).
Exact Match Response: When a query matches text in the PDF exactly, returns that text verbatim, using a greedy token-generation strategy.
Slack Notifications: Posts responses directly to a specified Slack channel.
Error Handling and Logging: Includes robust error handling, retry logic, and detailed logging.
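The project's own splitter is not shown in this README. As a minimal sketch, assuming character-based chunking (which matches chunk_overlap being counted in characters under Configuration), a sliding window with the documented defaults chunk_size=500 and chunk_overlap=100 could look like this. The function name `split_into_chunks` is hypothetical.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text, so no tail-only chunk is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, as long as it is shorter than chunk_overlap.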
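The README does not describe how its retry logic works. A common approach, shown here only as a sketch, is exponential backoff with jitter around any network call (OpenAI or Slack). The helper `with_retries` and its parameters are hypothetical; `sleep` is injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying failures with exponential backoff and a little jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: re-raise the last error for the caller to log
            # Wait base_delay * 2**(attempt - 1) seconds, plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable: the loop always returns or raises")
```

For example, a Slack post could be wrapped as `with_retries(lambda: client.chat_postMessage(channel=slack_channel, text=answer))`, where `client` is a `slack_sdk.WebClient` created with the slack_token; `answer` is a placeholder for the generated response.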

Requirements

Python 3.x
Libraries:
- openai
- slack_sdk
- scikit-learn (imported as sklearn)
- PyPDF2

You can install the required libraries using:

pip install -r requirements.txt

Configuration

Before running the application, configure the following parameters, either in the configuration file or as command-line arguments:

pdf_path: Path to the PDF document to process.
questions: Comma-separated list of questions to ask.
api_key: Your OpenAI API key.
slack_token: Slack API token used to send messages.
slack_channel: ID of the Slack channel to post messages to.
model (optional): Model used to generate responses (default=gpt-4o-mini).
embed (optional): Whether to embed the PDF chunks and retrieve the most relevant ones by cosine similarity, which is faster and more cost-efficient (default=true).
embed_model (optional): Embedding model to use (default=text-embedding-3-small).
chunk_size (optional): Size of each chunk when splitting the PDF (default=500).
chunk_overlap (optional): Number of characters shared by consecutive chunks (default=100).
confidence_threshold (optional): Confidence threshold for OpenAI responses (default=-1.5; can be tuned).
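When embed is enabled, retrieval presumably ranks chunks by the cosine similarity between the question's embedding and each chunk's embedding. The dependency-free sketch below shows that ranking step; the toy vectors in the tests stand in for real embeddings (which would come from embed_model), and `top_k_chunks` and `k` are hypothetical names, not the project's API.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 3,
) -> List[str]:
    """Return the k chunks whose embeddings are most similar to the query embedding."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,  # highest similarity first; ties keep their original order
    )
    return [chunk for chunk, _ in scored[:k]]
```

Only the top-ranked chunks then need to be sent to the model, which is where the speed and cost savings come from. With scikit-learn installed, `sklearn.metrics.pairwise.cosine_similarity([query_vec], chunk_vecs)` computes the same scores for all chunks at once.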
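The README does not say how confidence is scored, but a negative default such as -1.5 suggests a log-probability scale. Assuming confidence is the mean per-token log probability of the answer, the check could be sketched as below. The OpenAI Chat Completions API returns per-token values when called with `logprobs=True` (read from `response.choices[0].logprobs.content[i].logprob`); the helper names here are hypothetical.

```python
import math
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Average per-token log probability of a generated answer (0.0 means fully certain)."""
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    return sum(token_logprobs) / len(token_logprobs)


def geometric_mean_prob(token_logprobs: Sequence[float]) -> float:
    """exp(mean log p) is the geometric mean of the token probabilities."""
    return math.exp(mean_logprob(token_logprobs))


def is_low_confidence(token_logprobs: Sequence[float], threshold: float = -1.5) -> bool:
    """Flag an answer whose mean token log probability falls below the threshold."""
    return mean_logprob(token_logprobs) < threshold
```

Under this assumption, a threshold of -1.5 corresponds to a geometric-mean token probability of exp(-1.5), roughly 0.22. Raising the threshold toward 0 flags more answers as low-confidence.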

Usage

Clone the repository:

git clone https://github.com/yourusername/PDF-QA.git
cd PDF-QA

Run the script with the desired parameters:

python main.py --questions "Comma-separated list of questions here" --pdf_path "path/to/pdf"
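The command above passes only --questions and --pdf_path, so the remaining values presumably come from the configuration file. As a sketch, assuming every configuration parameter is also exposed as a flag with the same name (only --questions and --pdf_path are confirmed by the usage line), an argparse definition could look like this. `build_parser`, `str_to_bool`, and `parse_questions` are hypothetical helpers.

```python
import argparse
from typing import List


def str_to_bool(value: str) -> bool:
    """Parse 'true'/'false' strings; argparse's type=bool would treat any non-empty string as True."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Answer questions about a PDF and post the answers to Slack.")
    parser.add_argument("--pdf_path", required=True, help="Path to the PDF document to process.")
    parser.add_argument("--questions", required=True, help="Comma-separated list of questions.")
    # Secrets are optional here because they may come from the configuration file instead.
    parser.add_argument("--api_key", help="OpenAI API key.")
    parser.add_argument("--slack_token", help="Slack API token.")
    parser.add_argument("--slack_channel", help="Slack channel ID.")
    # Defaults mirror the values listed under Configuration.
    parser.add_argument("--model", default="gpt-4o-mini")
    parser.add_argument("--embed", type=str_to_bool, default=True)
    parser.add_argument("--embed_model", default="text-embedding-3-small")
    parser.add_argument("--chunk_size", type=int, default=500)
    parser.add_argument("--chunk_overlap", type=int, default=100)
    parser.add_argument("--confidence_threshold", type=float, default=-1.5)
    return parser


def parse_questions(raw: str) -> List[str]:
    """Split the comma-separated --questions value, dropping blanks and surrounding whitespace."""
    return [q.strip() for q in raw.split(",") if q.strip()]
```

Because questions are comma-separated, a single question cannot itself contain a comma under this scheme.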

Logging

Logs are written both to the console and to a log file. Set the logging level in main.py to suit your debugging or monitoring needs.
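The actual setup in main.py is not shown here. A minimal sketch of console-plus-file logging with Python's standard logging module follows; the logger name pdf_qa and the file name app.log are hypothetical. The `level` argument is the knob the paragraph above refers to (for example, logging.DEBUG while debugging, logging.INFO for monitoring).

```python
import logging
import sys


def setup_logging(log_file: str = "app.log", level: int = logging.INFO) -> logging.Logger:
    """Send log records to both the console (stderr) and a file, using one shared format."""
    logger = logging.getLogger("pdf_qa")
    logger.setLevel(level)
    # Remove and close handlers from any earlier call so records are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    for handler in (logging.StreamHandler(sys.stderr), logging.FileHandler(log_file, encoding="utf-8")):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```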
