
DeepSeek-OCR Studio

Project Overview

This project is a multimodal document-parsing tool built on DeepSeek-OCR, with a Next.js frontend and a FastAPI backend.

Designed for Nvidia DGX Spark: This repository is optimized for the Nvidia DGX Spark environment. Docker builds use the NVIDIA NGC PyTorch 25.12 base image (PyTorch built with CUDA 13.1). On GB10 (sm_121) you may still see a PyTorch warning about supported CUDA capability; this is expected and does not necessarily mean GPU inference will fail.

DeepSeek OCR Demo

The tool efficiently processes PDF documents and images, providing Optical Character Recognition (OCR) with multi-language text recognition, table parsing, chart analysis, and more.

Key Features

  • Multi-format Document Parsing: Supports uploading and parsing documents in various formats such as PDF and images
  • Intelligent OCR Recognition: Based on the DeepSeek-OCR model, providing high-precision text recognition
  • Layout Analysis: Intelligently recognizes document layout structure and accurately extracts content layout
  • Multi-language Support: Supports text recognition in multiple languages including Chinese and English
  • Table & Chart Parsing: Professional table recognition and chart data extraction functionality
  • Professional Domain Drawing Recognition: Supports semantic recognition of various professional domain drawings
  • Data Visualization: Supports reverse parsing of data analysis visualization charts
  • Markdown Conversion: Converts PDF content to structured Markdown format

Project Demo

PDF Document Parsing - supports complex content including images and tables

  • Multi-language Text Parsing
  • Chart & Table Parsing
  • Professional Domain Drawing Recognition (CAD, flowcharts, decorative drawings)
  • CAD Drawing Semantic Recognition
  • Data Visualization Chart Reverse Parsing

Usage Guide

System Requirements

Important Notice:

  • Platform: Nvidia DGX Spark (optimized) / Linux
  • GPU Requirements: >= 7 GB VRAM (16-24 GB recommended for large images or multi-page PDFs)
  • Compatibility Note: Uses NVIDIA NGC PyTorch 25.12 (CUDA 13.1). Some builds may still warn on GB10 (sm_121).
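The 7 GB minimum can be verified up front. A minimal Python sketch that parses the output of `nvidia-smi --query-gpu=memory.total --format=csv,noheader` (those are standard `nvidia-smi` query flags; the helper names here are illustrative, not part of this project):

```python
import subprocess

MIN_VRAM_MIB = 7 * 1024  # the 7 GB minimum stated above

def vram_sufficient(smi_line: str, minimum_mib: int = MIN_VRAM_MIB) -> bool:
    """Parse one line of nvidia-smi CSV output, e.g. '16384 MiB',
    and compare the total VRAM against the minimum."""
    total_mib = int(smi_line.strip().split()[0])
    return total_mib >= minimum_mib

def check_gpus() -> bool:
    """Query all visible GPUs and require each to meet the minimum."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return all(vram_sufficient(line) for line in out.splitlines() if line.strip())
```

Calling `check_gpus()` on the target machine returns False if any visible GPU falls below the 7 GB floor.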

Quick Start

This repository is designed for a one-step setup on Nvidia DGX Spark.

Choose one of the following methods:

Method                 Best For                                   Setup Time
Docker (Recommended)   Production, Nvidia DGX Spark, easy setup   ~10 min
Native Script          Development, custom setup                  ~20 min
Manual Installation    Full control                               ~30 min

Method 1: Docker (Recommended)

Docker provides the easiest setup with all dependencies pre-configured, specifically tailored for Nvidia DGX Spark with NVIDIA NGC PyTorch 25.12 (CUDA 13.1).

Prerequisites:

  • Docker Engine with the NVIDIA Container Toolkit (required for --gpus all)

Quick Start:

# 1. Download model weights
pip install modelscope
mkdir -p ./deepseek-ocr
modelscope download --model deepseek-ai/DeepSeek-OCR --local_dir ./deepseek-ocr

# 2. Build and run (Optimized for Nvidia DGX Spark)
# Use --network=host if you have DNS issues
docker build --network=host -t deepseek-ocr-web .
docker run -d --gpus all \
  -p 8002:8002 -p 3001:3000 \
  -v ./deepseek-ocr:/app/deepseek-ocr:ro \
  -v ./workspace:/app/workspace \
  --restart unless-stopped \
  --name deepseek-ocr-web \
  deepseek-ocr-web

# 3. Access the application
# Frontend: http://localhost:3001 (or http://<tailscale-ip>:3001)
# Backend:  http://localhost:8002

For detailed Docker documentation including development mode, troubleshooting, and configuration options, see DOCKER.md.


Method 2: Native Script

One-click setup for native installation (requires Conda).

# Install dependencies and download model
bash install.sh

# Start services
bash start.sh

Access:

  • Frontend: http://localhost:3001
  • Backend:  http://localhost:8002


Method 3: Manual Installation

For full control over the installation process.

Step 1: Download Model Weights

pip install modelscope
mkdir -p ./deepseek-ocr
modelscope download --model deepseek-ai/DeepSeek-OCR --local_dir ./deepseek-ocr
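After the download, a quick sanity check can confirm the weights landed where expected. This sketch assumes the usual Hugging Face-style layout (a `config.json` plus `.safetensors` or `.bin` weight files); treat the exact file list as an assumption, not a guarantee about DeepSeek-OCR's packaging:

```python
from pathlib import Path

def model_dir_ok(model_dir: str) -> bool:
    """Sanity-check a downloaded model directory: it must exist and
    contain a config plus at least one weights file (assumed HF layout)."""
    d = Path(model_dir)
    if not d.is_dir():
        return False
    has_config = (d / "config.json").is_file()
    has_weights = any(d.glob("*.safetensors")) or any(d.glob("*.bin"))
    return has_config and has_weights
```

For example, `model_dir_ok("./deepseek-ocr")` should return True once the modelscope download above completes.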

Step 2: Setup Environment

# Create Conda environment
conda create -n deepseek-ocr -c conda-forge python=3.12 nodejs=22 -y
conda activate deepseek-ocr

# Install PyTorch
pip install torch torchvision torchaudio

# Install dependencies
pip install -r requirements.txt

# Optional: Install flash-attn for acceleration
pip install flash-attn --no-build-isolation

Step 3: Configure Environment

Create .env file in project root:

MODEL_PATH=/path/to/deepseek-ocr
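The backend reads MODEL_PATH from this file at startup. As a rough illustration of what that amounts to, here is a stdlib-only sketch of parsing such a .env line; the actual backend may well use python-dotenv or pydantic-settings instead, and the fallback default here is our assumption:

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def model_path(env_file: str = ".env") -> Path:
    """Resolve MODEL_PATH, falling back to ./deepseek-ocr (assumed default)."""
    raw = load_env(env_file).get("MODEL_PATH", "./deepseek-ocr")
    return Path(raw).expanduser()
```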

Step 4: Start Services

# Terminal 1: Backend
cd backend
uvicorn main:app --host 0.0.0.0 --port 8002 --reload

# Terminal 2: Frontend
cd frontend
npm install
# Use 3001 for easy remote access (e.g. via Tailscale)
npm run dev -- --hostname 0.0.0.0 --port 3001
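Model loading can take a while, so scripts driving the two services may want to wait for the backend before issuing requests. A small readiness check using only the Python standard library (port 8002 matches the uvicorn command above; the helper name is ours):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds, or give up
    after `timeout` seconds. Returns True once the port accepts."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```

For example, `wait_for_port("127.0.0.1", 8002)` before opening http://localhost:3001 avoids racing the backend startup.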

File Locations

Data            Location             Description
Uploaded Files  workspace/uploads/   Original PDFs and images
OCR Results     workspace/results/   Markdown output, annotated images
Job History     workspace/logs/      Task status and metadata
Model Weights   deepseek-ocr/        DeepSeek-OCR model files
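The layout above maps naturally onto a small path helper. A sketch for scripts that post-process results (the directory names come from the table; the per-job naming scheme is a hypothetical illustration, not the project's actual scheme):

```python
from pathlib import Path

WORKSPACE = Path("workspace")

def job_paths(job_id: str) -> dict[str, Path]:
    """Build per-job locations under the workspace layout shown above.
    The job_id-based naming is illustrative only."""
    return {
        "upload": WORKSPACE / "uploads" / job_id,
        "result": WORKSPACE / "results" / job_id,
        "log": WORKSPACE / "logs" / f"{job_id}.json",
    }
```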

Documentation

  • DOCKER.md - Docker deployment guide, development mode, troubleshooting

Contributing

We welcome contributions via GitHub pull requests or issues. All forms of contribution are appreciated: feature improvements, bug fixes, and documentation.

Technical Communication

Scan the QR code below to add our assistant and reply "DeepSeekOCR" to join the technical communication group.

Technical Communication Group QR Code
