A Streamlit application designed to process scanned handwritten answer sheets (PDF format), extract answers using the Qwen-VL model, merge multi-page answers, and provide AI-powered assessment and chat functionalities using Google Gemini.
(Add a screenshot of the running application here)
- PDF Upload: Upload multi-page PDF answer sheets.
- Image Conversion: Converts PDF pages to images using PyMuPDF.
- Advanced OCR: Utilizes the `Qwen/Qwen2.5-VL-7B-Instruct` model for OCR, specifically tailored to extract structured answer data (number + text) based on predefined layout rules (delimiters, number boxes).
- JSON Output: The OCR process generates structured JSON output per page.
- Answer Merging: Intelligently merges answer text that spans multiple pages based on "Continuation" markers identified during OCR.
- Verification Tab: Allows users to view the original image and the raw/parsed OCR output for each page.
- AI Assessment: Uses Google Gemini (`gemini-1.5-flash`) to assess the quality, clarity, and coherence of the extracted answer text.
- AI Chat Assistant: Provides a chat interface powered by Google Gemini (`gemini-1.5-pro`) for asking questions about the extracted content or assessments.
- GPU Accelerated: Leverages GPU for faster Qwen-VL model inference (`torch`, `accelerate`).
- Memory Optimization: Uses `float16` precision for the Qwen model to reduce memory footprint.
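
As an illustration of the Answer Merging feature, here is a minimal sketch of how continuation fragments might be folded into the preceding answer. It is not the code in `main.py`. It assumes each page's OCR JSON is a list of objects with `number` and `text` keys, and that a continuation fragment carries the literal string `Continuation` in its `number` field. The schema and the function name `merge_answers` are assumptions made for this example.

```python
from typing import Dict, List, Optional

# Assumed marker value: the README only says OCR flags "Continuation" text;
# the exact field and value used in main.py may differ.
CONTINUATION = "Continuation"


def merge_answers(pages: List[List[Dict[str, str]]]) -> Dict[str, str]:
    """Merge per-page OCR entries into one text per answer number."""
    merged: Dict[str, str] = {}
    last_number: Optional[str] = None  # most recent real answer number seen
    for page in pages:
        for entry in page:
            number = entry.get("number", "").strip()
            text = entry.get("text", "").strip()
            if number == CONTINUATION:
                if last_number is None:
                    # Nothing to attach to (first page starts mid-answer): skip.
                    continue
                number = last_number
            if not number:
                continue  # ignore entries without an answer number
            if number in merged:
                merged[number] = f"{merged[number]} {text}".strip()
            else:
                merged[number] = text
            last_number = number
    return merged


pages = [
    [{"number": "1", "text": "Photosynthesis converts light"}],
    [
        {"number": "Continuation", "text": "into chemical energy."},
        {"number": "2", "text": "Mitochondria produce ATP."},
    ],
]
print(merge_answers(pages))
```

Tracking only the last seen answer number keeps the merge a single pass over the pages, which works as long as pages are processed in their original order.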
- Python: 3.9+
- pip: Package installer for Python.
- Git: (Optional) For cloning the repository.
- NVIDIA GPU: Required for running the Qwen-VL model efficiently.
- CUDA Toolkit & cuDNN: Compatible versions installed for your NVIDIA driver and PyTorch.
- Google Gemini API Key: You need an API key from Google AI Studio.
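
Before installing, it can help to confirm which of these prerequisites are already in place. The script below is a rough sketch, not part of the project: the function name `check_environment` is hypothetical, and the API key check assumes the key is supplied through a `GOOGLE_API_KEY` environment variable. PyTorch is imported only if it is already installed, so the script can run on a fresh machine.

```python
import importlib.util
import os
import sys


def check_environment() -> dict:
    """Report which prerequisites appear to be satisfied."""
    report = {
        "python_ok": sys.version_info >= (3, 9),
        "api_key_set": bool(os.getenv("GOOGLE_API_KEY")),
        "cuda_available": None,  # None means PyTorch is not installed yet
    }
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the script runs before installation

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```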
- Clone the Repository (Optional):

      git clone <your-repo-url>
      cd <your-repo-directory>

  Alternatively, just place `main.py` and `requirements.txt` in a directory.
- Create a Virtual Environment (Recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`

- Install Dependencies:

      pip install -r requirements.txt

  Note: Installing PyTorch might take time and depends on your CUDA setup. Ensure you have a compatible version.
- Configure API Key:

  ⚠️ Security Warning: The current code hardcodes the Google Gemini API key. This is highly insecure for shared or deployed applications.

  - Recommended Method (Streamlit Secrets):
    - Create a directory `.streamlit` in your project folder.
    - Inside `.streamlit`, create a file named `secrets.toml`.
    - Add your API key to `secrets.toml`:

          # .streamlit/secrets.toml
          GOOGLE_API_KEY="AIzaSy..."

    - Modify `main.py` to load the key using `st.secrets`:

          # Replace the hardcoded key section in main.py.
          # Assumes `import streamlit as st` and
          # `import google.generativeai as genai` at the top of the file.
          try:
              # Attempt to load from secrets first
              api_key = st.secrets["GOOGLE_API_KEY"]
          except Exception:
              # Fallback or error (remove hardcoded fallback for production)
              st.error("Google API Key not found in Streamlit secrets (.streamlit/secrets.toml)")
              api_key = None  # Or use the hardcoded one for local testing ONLY if necessary

          if api_key:
              try:
                  genai.configure(api_key=api_key)
                  # list_models() returns a lazy iterator, so fetch one item
                  # to make sure a request is actually sent.
                  next(iter(genai.list_models()), None)  # Test configuration
                  st.session_state.api_key_configured = True
              except Exception as e:
                  st.error(f"Gemini API configuration failed: {e}", icon="❌")
                  st.session_state.api_key_configured = False
          else:
              st.session_state.api_key_configured = False
          # Remove the global HARDCODED_API_KEY variable and its usage

  - Alternative (Environment Variables): Set an environment variable `GOOGLE_API_KEY` and load it in Python using `os.getenv("GOOGLE_API_KEY")`.
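
The two methods can also be combined so that the app checks Streamlit secrets first and falls back to the environment variable. The helper below is a sketch under that assumption. `resolve_api_key` is a hypothetical name, and it takes plain mapping objects so it can be tried without Streamlit installed. In `main.py` it would presumably be called as `resolve_api_key(st.secrets)`.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    secrets: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
    name: str = "GOOGLE_API_KEY",
) -> Optional[str]:
    """Return the key from secrets if present, else from the environment."""
    try:
        key = secrets[name]
    except Exception:
        # Covers a missing key; st.secrets may also raise when no
        # secrets.toml file exists at all.
        key = None
    # Empty strings are treated the same as a missing key.
    return key or environ.get(name) or None


# Stand-in mappings for illustration; no real key is involved.
print(resolve_api_key({}, {"GOOGLE_API_KEY": "from-env"}))
```

Returning `None` instead of raising leaves it to the caller to decide how to report a missing key, for example with `st.error`.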
- Ensure your virtual environment is activated.
- Make sure the API key is configured (preferably using secrets).
- Run the Streamlit app:

      streamlit run main.py

- Open your web browser and navigate to the local URL provided by Streamlit (usually `http://localhost:8501`).
You can run this application inside a Docker container, leveraging GPU acceleration via the NVIDIA Container Toolkit.
- Docker: Install Docker Desktop or Docker Engine.
- NVIDIA Container Toolkit: Install this to enable GPU access within Docker containers. Installation Guide

Build the image:

    docker build -t uniarch-ocr-assessor .

Run the container with GPU access, supplying the API key in one of two ways:

- Using Streamlit Secrets: Mount your `.streamlit` directory into the container.

      docker run --gpus all -p 8501:8501 \
        -v ./.streamlit:/app/.streamlit \
        uniarch-ocr-assessor

- Using Environment Variables: Pass the API key as an environment variable.

      docker run --gpus all -p 8501:8501 \
        -e GOOGLE_API_KEY="AIzaSy..." \
        uniarch-ocr-assessor

  (Remember to modify `main.py` to read the key from `os.getenv("GOOGLE_API_KEY")` if using this method.)

Access the application at http://localhost:8501 in your browser.
- Streamlit: Web application framework.
- Qwen-VL (Transformers): Vision-Language Model for OCR.
- Google Gemini (google-generativeai): AI model for assessment and chat.
- PyTorch: Deep learning framework (backend for Transformers).
- PyMuPDF (fitz): PDF parsing and image conversion.
- Pillow (PIL): Image manipulation.
- API Keys: Google Gemini API key (handle securely!).
- Models:
  - OCR: `Qwen/Qwen2.5-VL-7B-Instruct`
  - Assessment: `gemini-1.5-flash`
  - Chat: `gemini-1.5-pro`
  - These are hardcoded in `main.py` but could be made configurable.
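
One possible way to make the model names configurable is sketched below. It keeps the three names listed above as defaults and lets environment variables override them. The class `ModelConfig` and the variable names `OCR_MODEL`, `ASSESSMENT_MODEL`, and `CHAT_MODEL` are hypothetical; nothing in `main.py` is known to use them.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ModelConfig:
    # Defaults mirror the model names currently hardcoded in main.py.
    ocr_model: str = "Qwen/Qwen2.5-VL-7B-Instruct"
    assessment_model: str = "gemini-1.5-flash"
    chat_model: str = "gemini-1.5-pro"

    @classmethod
    def from_env(cls, environ: Mapping[str, str] = os.environ) -> "ModelConfig":
        """Build a config, letting environment variables override defaults."""
        defaults = cls()
        return cls(
            ocr_model=environ.get("OCR_MODEL", defaults.ocr_model),
            assessment_model=environ.get("ASSESSMENT_MODEL", defaults.assessment_model),
            chat_model=environ.get("CHAT_MODEL", defaults.chat_model),
        )


# Example: override only the chat model, keep the other defaults.
config = ModelConfig.from_env({"CHAT_MODEL": "gemini-1.5-flash"})
print(config)
```

With Docker, such overrides could be passed using additional `-e` flags in the same way as the API key.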