SrishtiGautam/PDF-QA

PDF-QA

PDF Query and Slack Integration

This project is a Python-based tool that extracts text from PDF documents, answers user questions about their contents using OpenAI's language models, and posts the responses to a specified Slack channel.

Features

  • PDF Processing: Extracts text from PDF documents and splits it into manageable chunks.
  • Natural Language Queries: Users can ask questions related to the content of the PDF.
  • OpenAI Integration: Utilizes OpenAI's models for generating responses and embeddings.
  • Confidence Handling: Implements logic to handle low-confidence responses.
  • Exact Match Response: Returns exact matches from the PDF when a query matches the text exactly, using a greedy token-generation strategy.
  • Slack Notifications: Posts responses directly to a specified Slack channel.
  • Error Handling and Logging: Includes robust error handling, retry logic, and detailed logging.
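
The confidence handling mentioned above could look roughly like the following sketch. It assumes, which this README does not state, that confidence is measured as the mean log-probability of the generated tokens and compared with a threshold (the Configuration section lists a default of -1.5). The function names and the fallback message are hypothetical.

```python
from statistics import mean

# Hypothetical fallback text; the project's actual message is not documented here.
FALLBACK = "I could not find a confident answer in the document."


def is_confident(token_logprobs, threshold=-1.5):
    """Return True when the mean token log-probability meets the threshold.

    Assumption: confidence is the average log-probability of the generated
    tokens. Values closer to 0 indicate higher confidence.
    """
    if not token_logprobs:
        return False  # no tokens generated: treat as low confidence
    return mean(token_logprobs) >= threshold


def answer_or_fallback(answer, token_logprobs, threshold=-1.5):
    """Return the answer if it is confident enough, otherwise the fallback."""
    if is_confident(token_logprobs, threshold):
        return answer
    return FALLBACK
```

Raising the threshold toward 0 makes the tool fall back more often; lowering it accepts less certain answers.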

Requirements

  • Python 3.x
  • Libraries:
    • openai
    • slack_sdk
    • scikit-learn (imported as sklearn)
    • PyPDF2

You can install the required libraries using:

pip install -r requirements.txt

Configuration

Before running the application, set the following parameters in your configuration file or as command-line arguments:

  • pdf_path: Path to the PDF document to process.
  • questions: Comma-separated list of questions to ask.
  • api_key: Your OpenAI API key.
  • slack_token: Slack API token for sending messages.
  • slack_channel: Slack channel ID to post the messages.
  • model (optional): Model used to generate responses (default: gpt-4o-mini).
  • embed (optional): Whether to embed the PDF chunks and retrieve them by cosine similarity, which makes retrieval faster and cheaper (default: true).
  • embed_model (optional): Embedding model to use (default: text-embedding-3-small).
  • chunk_size (optional): Size of each chunk when splitting the PDF (default: 500).
  • chunk_overlap (optional): Number of overlapping characters between consecutive chunks (default: 100).
  • confidence_threshold (optional): Confidence threshold for OpenAI responses (default: -1.5; can be tuned).
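
To illustrate how the chunking and embedding parameters interact, here is a minimal, dependency-free sketch. It assumes that chunk_size is measured in characters (as chunk_overlap is) and that chunk embeddings are already available as lists of floats. The function names are hypothetical, and the project itself may rely on scikit-learn for the similarity step rather than the plain-Python version shown here.

```python
import math


def split_text(text, chunk_size=500, chunk_overlap=100):
    """Split text into character chunks; neighbours share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose embeddings are most similar to the query embedding."""
    ranked = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]
```

With the defaults, each chunk starts 400 characters after the previous one, so a sentence that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.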

Usage

  1. Clone the repository:

    git clone https://github.com/SrishtiGautam/PDF-QA.git
    cd PDF-QA
  2. Run the script with the desired parameters:

    python main.py --questions "Comma-separated list of questions here" --pdf_path "path/to/pdf"
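
For orientation, the command-line interface could be parsed along the lines of the sketch below. Only --questions and --pdf_path appear in the command above; the remaining flag names are assumptions that mirror the parameter names and defaults in the Configuration section, and build_parser and parse_questions are hypothetical helpers, not necessarily what main.py defines.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented parameters (flag names partly assumed)."""
    parser = argparse.ArgumentParser(
        description="Answer questions about a PDF and post the answers to Slack."
    )
    # These two flags appear in the usage command.
    parser.add_argument("--pdf_path", required=True, help="Path to the PDF document")
    parser.add_argument("--questions", required=True, help="Comma-separated questions")
    # Assumed flags, named after the optional configuration parameters.
    parser.add_argument("--model", default="gpt-4o-mini")
    parser.add_argument("--embed_model", default="text-embedding-3-small")
    parser.add_argument("--chunk_size", type=int, default=500)
    parser.add_argument("--chunk_overlap", type=int, default=100)
    parser.add_argument("--confidence_threshold", type=float, default=-1.5)
    return parser


def parse_questions(raw):
    """Split the comma-separated question string and drop empty entries."""
    return [q.strip() for q in raw.split(",") if q.strip()]


# Example with an explicit argument list instead of sys.argv:
args = build_parser().parse_args(
    ["--pdf_path", "path/to/pdf", "--questions", "What is X?, Who wrote it?"]
)
questions = parse_questions(args.questions)
```

Because questions are separated by commas, a question that itself contains a comma would be split in two under this scheme.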

Logging

Logs are written both to the console and to a log file. Set the logging level in main.py to suit your needs, for example a more verbose level for debugging or a quieter one for monitoring.
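
A setup that logs to both destinations might resemble this sketch, which uses only Python's standard logging module. The logger name, log file name, and message format are assumptions for illustration; check main.py for the actual values.

```python
import logging


def setup_logging(log_file="pdf_qa.log", level=logging.INFO):
    """Configure a logger that writes to the console and to log_file.

    The logger name and default file name are hypothetical.
    """
    logger = logging.getLogger("pdf_qa")
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate output if called more than once
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    for handler in (logging.StreamHandler(), logging.FileHandler(log_file)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Passing level=logging.DEBUG gives detailed output while troubleshooting; logging.WARNING keeps routine runs quiet.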
