scholarlensAI-BE

Research Paper Reading Assistant Backend API Server

Overview

  • Uses Upstage Document Parse and Solar LLM to parse, summarize, and translate research papers, answer questions about them, and generate highlights.
  • Defines HTTP endpoints in the routers layer and handles Upstage API integration and domain logic in the services layer.

Tech Stack

| Category | Technology |
| --- | --- |
| Framework | FastAPI |
| Server | Uvicorn (ASGI) |
| Validation | Pydantic v2 |
| AI/ML | Upstage Document Parse, Solar LLM |
| Deployment | Docker |

Key Features

1. Document Processing (DocumentParserService)

  • PDF Upload and Parsing: High-precision PDF parsing using Upstage Document Parse API
  • Structured Data Extraction: Document structure analysis in HTML and Text formats, automatic document section recognition (Introduction, Methods, Results, etc.)
  • Large Document Support: Files up to 50MB, with automatic selection between synchronous (up to 100 pages) and asynchronous (up to 1000 pages) parsing
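
The limits above can be read as a simple decision rule. The following is a minimal sketch, not the repository's implementation: `choose_parse_mode` and the constants are hypothetical names, and it assumes that the limits are inclusive, that 50MB means 50 * 1024 * 1024 bytes, and that the page count is known before parsing starts.

```python
MAX_FILE_BYTES = 50 * 1024 * 1024  # 50MB upload limit (binary megabytes are an assumption)
SYNC_MAX_PAGES = 100               # synchronous parsing limit listed above
ASYNC_MAX_PAGES = 1000             # asynchronous parsing limit listed above


def choose_parse_mode(file_bytes: int, page_count: int) -> str:
    """Return "sync" or "async" for a document, or raise ValueError if it is out of range.

    Hypothetical helper: every limit is treated as inclusive, which is an assumption.
    """
    if file_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 50MB limit")
    if page_count < 1 or page_count > ASYNC_MAX_PAGES:
        raise ValueError("page count must be between 1 and 1000")
    return "sync" if page_count <= SYNC_MAX_PAGES else "async"
```

Under these assumptions a 12-page paper takes the synchronous path, and a 101-page document switches to the asynchronous one.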

2. AI-Based Analysis

  • Automatic Section Summaries: Recognition and summarization of Introduction, Methods, Results, Discussion, Conclusion
  • Interactive Q&A: Solar LLM-based paper content Q&A
  • Key Point Extraction: Automatic extraction of main content from each section
  • Automatic Highlighting: AI automatically detects and highlights key sentences and important content in papers
  • Section Importance Analysis: Automatic identification of the most important parts in each section

Quick Start

Requirements

1. Installation

# Clone repository
git clone https://github.com/ScholarLensAI/scholarlensAI-BE.git
cd scholarlensAI-BE

2. Environment Variable Setup

Create .env file:

cp .env.example .env

.env file example (replace UPSTAGE_API_KEY and the other values with your own):

UPSTAGE_API_KEY=up_your_api_key_here
UPSTAGE_BASE_URL=https://api.upstage.ai/v1
LOG_LEVEL=INFO
DEBUG=False

| Environment Variable | Required | Default | Description |
| --- | --- | --- | --- |
| UPSTAGE_API_KEY | | - | Upstage API Key |
| UPSTAGE_BASE_URL | | https://api.upstage.ai/v1 | API Base URL |
| LOG_LEVEL | | INFO | Log Level (DEBUG, INFO, WARNING, ERROR) |
| DEBUG | | False | FastAPI Debug Mode |

⚠️ Security Notice: Never commit API keys to the repository. Manage them through .env files or environment variables.
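
To illustrate how these variables and their defaults might be consumed, here is a minimal standard-library sketch. It is not the project's config.py (whose contents are not shown here): `Settings` and `load_settings` are hypothetical names, and the set of strings accepted as "true" for DEBUG is an assumption chosen so that both `False` and `0` disable debug mode.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    upstage_api_key: str
    upstage_base_url: str
    log_level: str
    debug: bool

    @property
    def api_key_present(self) -> bool:
        # Mirrors the idea of a "key presence" indicator without exposing the key itself.
        return bool(self.upstage_api_key)


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        upstage_api_key=env.get("UPSTAGE_API_KEY", ""),
        upstage_base_url=env.get("UPSTAGE_BASE_URL", "https://api.upstage.ai/v1"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        # Assumption: these spellings count as true; anything else is false.
        debug=env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes", "on"},
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to exercise in isolation, for example `load_settings({"UPSTAGE_API_KEY": "up_test"})`.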

3. Run Server

📌 After running, you can access the API at:

  • Backend → http://localhost:8000
  • Swagger UI → http://localhost:8000/docs

Option 1: Docker-based Execution (Recommended)

If Docker is installed, you can run the server immediately without additional setup.

# Build image
docker build -t scholarlens-backend .

# Run container
docker run \
  --name scholarlens \
  -p 8000:8000 \
  -e UPSTAGE_API_KEY="your_api_key_here" \
  -e UPSTAGE_BASE_URL="https://api.upstage.ai/v1" \
  -e LOG_LEVEL="INFO" \
  -e DEBUG="0" \
  scholarlens-backend

Option 2: Local Development Environment

Use this for FastAPI development and debugging.

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create .env file and set API key
cp .env.example .env
# Enter UPSTAGE_API_KEY in .env file

# Run server (choose one)
python3 main.py                                    # Basic execution
uvicorn main:app --reload --host 0.0.0.0 --port 8000  # Hot reload

Project Structure

scholarlensAI-BE/
├── models/
│   └── schemas.py                      # Pydantic schema definitions
│
├── routers/                            # API endpoints
│   ├── chat.py                         # Chat API
│   ├── highlights.py                   # Highlights API
│   ├── summary.py                      # Summary API
│   └── translation.py                  # Translation API
│
├── services/                           # Business logic (Service layer)
│   ├── document_service.py             # Document processing service
│   ├── heading_config.py
│   ├── highlight_service.py            # Highlight service
│   ├── summary_service.py              # Summary service
│   └── upstage_service.py              # Upstage API client
│
├── utils/                              # Utility functions
│   └── helpers.py                      # Common helper functions
│
├── main.py                             # FastAPI application
├── config.py                           # Configuration and environment variable management
├── requirements.txt                    # Python dependencies
├── test.py                             # Endpoint testing script
├── Dockerfile                          # Docker image configuration
├── .env.example                        # Environment variable template
└── .gitignore

API Endpoint Summary

This project follows RESTful style and is organized into summary/translation/chat/highlight domains by functionality.

| Method | Path | Description |
| --- | --- | --- |
| GET | / | Server status/documentation link |
| GET | /health | Health check and API key presence indicator |
| POST | /api/summary/upload | Upload PDF and start parsing (returns document ID) |
| GET | /api/summary/sections/{document_id} | Query parsed section/heading index |
| GET | /api/summary/generate/{document_id} | Generate paper section summaries |
| POST | /api/summary/section | Summarize specific section (document_id, section_name form data) |
| POST | /api/translation/translate | Translate text or section (JSON: text or document_id+section) |
| POST | /api/translation/translate-section | Section translation based on query parameters |
| GET | /api/translation/languages | List supported languages |
| POST | /api/chat/message | Document context-based Q&A |
| GET | /api/chat/history/{document_id} | (Placeholder) Query chat history |
| DELETE | /api/chat/history/{document_id} | (Placeholder) Delete chat history |
| GET | /api/highlights/{document_id} | Query highlight areas in document |

Request/Response Examples

Document Upload

curl -F "file=@paper.pdf" http://localhost:8000/api/summary/upload

Section Summary

curl -X POST \
  -F "document_id=..." \
  -F "section_name=Introduction" \
  http://localhost:8000/api/summary/section

Translation

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello", "source_language": "en", "target_language": "ko"}' \
  http://localhost:8000/api/translation/translate
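
For callers that prefer Python over curl, the translation request above could be built roughly as follows. This is a sketch using only the standard library: `build_translate_request` and `send` are hypothetical helper names, and because the response schema is not documented in this README, the response body is returned as raw text.

```python
import json
import urllib.request


def build_translate_request(text, source_language="en", target_language="ko",
                            base_url="http://localhost:8000"):
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "text": text,
        "source_language": source_language,
        "target_language": target_language,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/translation/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and return the raw response body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(send(build_translate_request("Hello")))` performs the same call as the curl command.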

Testing

Automated Testing (test.py)

test.py calls the major endpoints in sequence.

  • Prints each JSON response to the console
  • Requires a running server (for example, uvicorn main:app --reload)
  • Terminates immediately on a request or response error

Usage

| Option | Description | Example |
| --- | --- | --- |
| --base-url | API server address (default: http://localhost:8000) | --base-url http://192.168.1.100:8000 |
| --pdf | PDF file path to test | --pdf ./paper.pdf |
| --document-id | Reuse existing document ID | --document-id 550e8400-... |
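
The options above map naturally onto a standard `argparse` parser. The sketch below is an assumption about how such a command-line interface could be declared, not a copy of test.py; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented options with the documented default for --base-url."""
    parser = argparse.ArgumentParser(description="Sequential endpoint test (sketch)")
    parser.add_argument("--base-url", default="http://localhost:8000",
                        help="API server address")
    parser.add_argument("--pdf", default=None,
                        help="PDF file path to upload and test")
    parser.add_argument("--document-id", default=None,
                        help="Reuse an existing document ID instead of uploading")
    return parser
```

argparse converts the dashes to underscores, so the parsed values are available as `args.base_url`, `args.pdf`, and `args.document_id`.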

Execution Examples

# 1. Health check only (verify server connection)
python3 test.py

# 2. Upload PDF on local server and test all endpoints
python3 test.py --pdf path/to/paper.pdf

# 3. Upload PDF on remote server and test all endpoints
python3 test.py --base-url http://192.168.1.100:8000 --pdf path/to/paper.pdf

# 4. Test with existing document_id (skip upload)
python3 test.py --document-id <doc_id> --base-url http://localhost:8000

In upload mode, verify that the file path is correct first; the script terminates immediately if the path is invalid.

Test Flow

  1. /health - Verify server status
  2. POST /api/summary/upload - Upload PDF (with --pdf option)
  3. GET /api/summary/sections/{id} - Query sections
  4. GET /api/summary/generate/{id} - Full summary
  5. POST /api/summary/section - Section summary
  6. POST /api/translation/translate - Text translation
  7. POST /api/translation/translate-section - Section translation
  8. POST /api/chat/message - Q&A
  9. GET /api/highlights/{id} - Highlights
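
The flow above can be sketched as a function that lists the calls for a given run. This is an illustration only: `plan_steps` is a hypothetical helper, the `{id}` placeholder stands in for the document ID that only exists once the upload response arrives, and the behavior when both a PDF and a document ID are supplied is an assumption.

```python
def plan_steps(document_id=None, pdf_path=None):
    """Return the ordered (method, path) calls for one test run."""
    steps = [("GET", "/health")]
    if pdf_path is not None:
        steps.append(("POST", "/api/summary/upload"))
        # Placeholder until the upload response supplies the real ID.
        document_id = document_id or "{id}"
    if document_id is None:
        return steps  # no PDF and no ID: health check only
    steps += [
        ("GET", f"/api/summary/sections/{document_id}"),
        ("GET", f"/api/summary/generate/{document_id}"),
        ("POST", "/api/summary/section"),
        ("POST", "/api/translation/translate"),
        ("POST", "/api/translation/translate-section"),
        ("POST", "/api/chat/message"),
        ("GET", f"/api/highlights/{document_id}"),
    ]
    return steps
```

Calling `plan_steps()` with no arguments yields only the health check, which matches the first execution example.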

Manual Verification Points

  • Check Swagger UI at http://localhost:8000/docs
  • After upload, verify /api/summary/sections/{document_id} response includes sections/headings
  • Verify no errors when calling translation/chat/highlights

Troubleshooting

  • API Key Error: Verify that the correct UPSTAGE_API_KEY is set in .env
  • Upload Failure: Check that the file has a .pdf extension and is under 50MB
  • Parsing Delay: Large documents are parsed asynchronously and may take a while
  • CORS Issues: Check the CORS_ORIGINS setting in config.py and restart the server
