✨ File preview: Add file preview backend service #2642

Open
Stockton11 wants to merge 4 commits into develop from zwb/file_preview
Conversation

@Stockton11

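1. Add the file preview endpoint /preview/{object_name:path}.
2. Add the file preview logic.
Office files: LibreOffice converts the Office file to PDF, and the converted PDF is cached back to MinIO with a 7-day expiry. (Refactored so that LibreOffice runs in the data_process container and the backend calls it over HTTP.)
Other files: the file stream is returned directly.
3. Add the corresponding unit test files.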
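
For readers skimming the description, the preview flow above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the `preview` function, the `preview_cache/` prefix, and the dict standing in for MinIO are all assumptions made for the example, and the 7-day expiry is left to the storage lifecycle rule rather than modelled here.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, Tuple

# Extensions treated as Office documents (as listed in the review overview).
OFFICE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}

# Hypothetical cache prefix; the real key layout is not shown in the PR text.
PREVIEW_CACHE_PREFIX = "preview_cache/"


def is_office_file(object_name: str) -> bool:
    """Return True when the object name has an Office extension."""
    return PurePosixPath(object_name).suffix.lower() in OFFICE_EXTENSIONS


def cached_pdf_key(object_name: str) -> str:
    """Build the (assumed) storage key of the cached PDF for an object."""
    return f"{PREVIEW_CACHE_PREFIX}{object_name}.pdf"


def preview(
    object_name: str,
    storage: Dict[str, bytes],          # stand-in for the object store
    convert: Callable[[bytes], bytes],  # stand-in for the conversion call
) -> Tuple[str, bytes]:
    """Return (served_key, data) for a preview request."""
    if object_name not in storage:
        raise FileNotFoundError(object_name)

    # Non-Office files are served as-is.
    if not is_office_file(object_name):
        return object_name, storage[object_name]

    # Office files: reuse the cached PDF if present, otherwise convert
    # once and write the result back to storage.
    key = cached_pdf_key(object_name)
    if key not in storage:
        storage[key] = convert(storage[object_name])
    return key, storage[key]
```

A second request for the same Office file hits the cached key and skips the conversion, which is the behaviour the description asks for.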

Copilot AI review requested due to automatic review settings March 6, 2026 09:20
@Stockton11 Stockton11 review requested due to automatic review settings March 6, 2026 09:21
@codecov

codecov bot commented Mar 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Copilot AI review requested due to automatic review settings March 6, 2026 09:57
Copilot AI left a comment

Pull request overview

This PR adds a file preview backend service allowing users to preview documents inline in the browser. Office documents (.doc, .docx, .xls, .xlsx, .ppt, .pptx) are converted to PDF using LibreOffice running in the data_process container, with the converted PDFs cached in MinIO for 7 days. Other file types (PDF, images, text) are served as-is.
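
As a rough illustration of the conversion step, a headless LibreOffice call can look like the sketch below. It assumes the `soffice` binary is on `PATH`; the function names and the 120-second timeout are placeholders, and the sketch is synchronous, whereas the wrapper in this PR is described as async.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def build_convert_command(
    input_path: Path, output_dir: Path, binary: str = "soffice"
) -> List[str]:
    """Build the argv for a headless LibreOffice PDF conversion."""
    return [
        binary,
        "--headless",              # run without a GUI
        "--convert-to", "pdf",     # target format
        "--outdir", str(output_dir),
        str(input_path),
    ]


def convert_to_pdf(input_path: Path, timeout_seconds: float = 120.0) -> bytes:
    """Convert one Office file and return the PDF bytes.

    Assumes LibreOffice is installed; raises if the process fails,
    times out, or produces no output file.
    """
    with tempfile.TemporaryDirectory() as out_dir:
        cmd = build_convert_command(input_path, Path(out_dir))
        subprocess.run(cmd, check=True, timeout=timeout_seconds, capture_output=True)
        # LibreOffice names the output after the input file's stem.
        pdf_path = Path(out_dir) / f"{input_path.stem}.pdf"
        if not pdf_path.exists():
            raise RuntimeError(f"LibreOffice produced no PDF for {input_path}")
        return pdf_path.read_bytes()
```

Keeping the argv builder separate from the subprocess call makes the command testable without LibreOffice installed.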

Changes:

  • Adds /preview/{object_name:path} endpoint to the file management API, with inline Content-Disposition headers, caching, and ETag support
  • Adds copy_file and file_exists operations to all three storage layers (SDK, client.py, attachment_db.py), plus a /tasks/convert_to_pdf endpoint to data_process_app.py and convert_office_to_pdf_impl to data_process_service.py
  • Adds convert_office_to_pdf utility in file_management_utils.py, CJK font support in the data-process Dockerfile, MinIO ILM lifecycle rules for auto-expiry of preview cache, and comprehensive unit tests for all new functionality
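
To make the inline Content-Disposition point concrete, here is a minimal sketch of a header builder with an `inline` switch. The function name `content_disposition` and its signature are made up for this example and are not the signature of `build_content_disposition_header` in the PR; the `filename*` form follows the RFC 6266 / RFC 5987 convention for non-ASCII names.

```python
from urllib.parse import quote


def content_disposition(filename: str, inline: bool = False) -> str:
    """Build a Content-Disposition value for download or inline preview."""
    disposition = "inline" if inline else "attachment"

    # ASCII fallback for older clients: non-ASCII characters become "?",
    # and characters that would break the quoted string are replaced.
    ascii_name = (
        filename.encode("ascii", "replace")
        .decode("ascii")
        .replace('"', "_")
        .replace("\\", "_")
    )

    # Percent-encoded UTF-8 name for clients that understand filename*.
    encoded_name = quote(filename, safe="")

    return (
        f'{disposition}; filename="{ascii_name}"; '
        f"filename*=UTF-8''{encoded_name}"
    )
```

With `inline=True` the browser is asked to render the file in place, which is what a preview endpoint wants; the default keeps the download behaviour.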

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| backend/apps/file_management_app.py | Adds /preview/{object_name:path} GET endpoint with inline disposition and error handling; extends build_content_disposition_header with inline parameter |
| backend/apps/data_process_app.py | Adds /tasks/convert_to_pdf POST endpoint to trigger Office-to-PDF conversion |
| backend/services/file_management_service.py | Adds preview_file_impl, _get_cached_pdf_stream, _convert_office_to_cached_pdf with per-file locking for duplicate-conversion prevention |
| backend/services/data_process_service.py | Adds convert_office_to_pdf_impl pipeline (download→convert→upload→validate→cleanup) with concurrency semaphore |
| backend/utils/file_management_utils.py | Adds async convert_office_to_pdf wrapper around LibreOffice subprocess |
| backend/database/attachment_db.py | Adds file_exists and copy_file functions; adds .md MIME type |
| backend/database/client.py | Adds file_exists and copy_file to MinioClient |
| sdk/nexent/storage/minio.py | Adds copy_file to MinIOStorageClient |
| sdk/nexent/storage/storage_client_base.py | Adds copy_file abstract method to StorageClient ABC |
| backend/consts/const.py | Adds FILE_PREVIEW_SIZE_LIMIT, MAX_CONCURRENT_CONVERSIONS, OFFICE_MIME_TYPES |
| backend/consts/exceptions.py | Adds OfficeConversionException, UnsupportedFileTypeException, FileTooLargeException |
| docker/docker-compose.yml, docker/docker-compose.prod.yml | Adds MinIO ILM lifecycle rule to expire preview cache after 7 days |
| make/data_process/Dockerfile | Adds CJK font support for LibreOffice |
| .github/workflows/auto-unit-test.yml | Installs LibreOffice in CI for tests |
| test/sdk/storage/test_minio.py | Tests for copy_file in MinIOStorageClient |
| test/backend/database/test_attachment_db.py | Tests for file_exists and copy_file in attachment_db |
| test/backend/database/test_client.py | Tests for file_exists and copy_file in MinioClient |
| test/backend/services/test_file_management_service.py | Tests for preview_file_impl, _get_cached_pdf_stream, _convert_office_to_cached_pdf |
| test/backend/services/test_data_process_service.py | Tests for convert_office_to_pdf_impl pipeline |
| test/backend/utils/test_file_management_utils.py | Tests for convert_office_to_pdf utility |
| test/backend/app/test_file_management_app.py | Tests for preview_file endpoint and build_content_disposition_header |
| test/backend/app/test_data_process_app.py | Tests for /tasks/convert_to_pdf endpoint |
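
The per-file locking and the concurrency semaphore mentioned in the summary can be illustrated with a small asyncio sketch. This is an assumption-laden example, not the PR's implementation: `ConversionCoordinator` is a hypothetical class, the in-memory dict stands in for the MinIO cache, the value `2` for `MAX_CONCURRENT_CONVERSIONS` is a placeholder, and in the PR the lock and the semaphore appear to live in different services rather than in one object.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict

MAX_CONCURRENT_CONVERSIONS = 2  # placeholder; the real value is not shown in the PR text


class ConversionCoordinator:
    """Deduplicate conversions per file and cap total concurrency."""

    def __init__(self, max_concurrent: int = MAX_CONCURRENT_CONVERSIONS) -> None:
        # Limits how many conversions run at the same time overall.
        self._semaphore = asyncio.Semaphore(max_concurrent)
        # One lock per object name, created lazily on first use.
        self._locks: Dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        # In-memory stand-in for the cached PDFs.
        self._cache: Dict[str, bytes] = {}

    async def get_pdf(
        self, object_name: str, convert: Callable[[str], Awaitable[bytes]]
    ) -> bytes:
        # Requests for the same file queue up behind one lock, so only
        # the first one converts; the rest find the cached result.
        async with self._locks[object_name]:
            cached = self._cache.get(object_name)
            if cached is not None:
                return cached
            async with self._semaphore:
                pdf = await convert(object_name)
            self._cache[object_name] = pdf
            return pdf
```

One thing a real implementation would need to handle is that the lock dictionary in this sketch grows without bound; evicting idle locks is left out here for brevity.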
