diff --git a/.gitignore b/.gitignore
index 2bb4caa..b2653b2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -85,6 +85,11 @@ test-*.webp
test_*.py
*_test.py
scripts/test_*.py
+!fastapi_app/tests/test_*.py
+!fastapi_app/tests/*_test.py
+
+# Frontend test artifacts
+frontend/test-results/
# Jupyter
.ipynb_checkpoints/
diff --git a/README.md b/README.md
index 2ad3e5a..ae35666 100644
--- a/README.md
+++ b/README.md
@@ -1,305 +1,522 @@
-
-
-
+
-# Open-NotebookLM / ThinkFlow
-
-Open-NotebookLM 是一个面向论文阅读、产品调研、课程学习和团队汇报的 AI 知识工作台。前端产品形态叫 ThinkFlow,它把资料导入、基于来源的问答、文本/图片多模态检索、知识沉淀和多形态产出放进同一个笔记本里,让一次资料处理可以持续演化为摘要、梳理文档、报告、导图、PPT、播客、卡片和测验。
-
-项目由 FastAPI 后端、React/Vite 前端和本地文件工作区组成。后端负责来源管理、文档处理、文本与视觉向量索引、知识库记录、LLM/VLM 调用、TTS、PPT/报告等产出编排;前端提供三栏式知识工作台、对话区、图片附件、PDF 图片图库、文档工作台和产出预览。
-
-> 本 README 的截图来自 2026-06-01 的本地 Playwright 走查,截图材料是演示来源,不包含账号密码。
-
-
-
-## 项目能做什么
-
-Open-NotebookLM 解决的是“资料进来以后如何持续加工成可复用成果”的问题。它不是只提供一次性聊天窗口,而是把来源、对话、沉淀和产出组织成一个可回溯的闭环。
-
-- **统一管理来源**:支持上传文件、粘贴文本、导入网页,也可以通过搜索和深度研究补充外部资料。
-- **围绕来源问答**:对话区基于已选来源回答问题,适合论文精读、竞品调研、课程复习和材料梳理。
-- **VLM 多模态检索**:对话区可在文本检索和 VLM 检索之间切换,支持粘贴/附加图片,用图片问题或视觉线索检索 PDF 页面图、插图和图片来源。
-- **PDF 图片索引与图库**:PDF 入库后可以重建图片索引、提取图片并在来源侧查看,VLM 模式会优先利用视觉索引和多模态 embedding。
-- **音视频来源处理**:音频、视频和图片可以作为来源进入工作区,后端会尽量转写、OCR 或调用 VLM 生成可检索内容。
-- **把有价值内容沉淀下来**:重要回答可以进入 Summary、梳理文档或产出指导,避免聊天内容一次性消失。
-- **保留对话上下文**:支持新建对话、查看历史对话,并保存每轮对话绑定的来源、活跃文档和产出工作区状态。
-- **生成多种结果**:基于来源快照生成报告、思维导图、PPT、播客、学习卡片和测验。
-- **保留产出依据**:产出会锁定当次来源、梳理文档和产出指导,方便后续追溯与重生成。
-
-## 核心工作流
-
-ThinkFlow 的主界面采用三栏布局。左侧管理来源、对话历史和已生成产出;中间是基于来源的主对话;右侧是知识资产和产出工作台。
+

-
-
-典型流程如下:
-
-1. **建立笔记本**:每个笔记本对应一次研究、课程、产品调研或汇报任务。
-2. **导入来源**:把 PDF、Markdown、网页、访谈纪要或粘贴文本统一登记为来源。
-3. **基于来源对话**:围绕选中的来源提问,逐步形成可验证的理解;需要读图、看 PDF 页面图或上传截图时,可切换到 VLM 模式。
-4. **沉淀知识资产**:把关键结论保存到 Summary、梳理文档和产出指导。
-5. **生成结果**:选择报告、导图、PPT、播客、卡片或测验,基于锁定的来源快照生成成果。
-
-## 功能导览
-
-### 1. 来源与对话
-
-来源是整个系统的第一优先级。用户可以在左侧栏选择参与当前对话和产出的材料,中间对话区会围绕这些来源回答问题。回答旁边提供沉淀入口,方便把单条回答、一轮问答或多条消息推送到右侧知识资产。
-
-
-
-### 2. VLM 多模态检索与图片附件
-
-最新合并的多模态检索能力把普通文本 RAG 扩展为文本、图片和 PDF 页面图的联合检索。中间对话区可以一键切换“文本 / VLM”模式;在 VLM 模式下,用户可以直接粘贴图片、附加本地图片,或用文字问题检索 PDF 中抽取出的图片、页面截图和图表。后端会使用 `VISUAL_EMBEDDING_*` 配置构建视觉索引;如果没有单独配置视觉 embedding,也会按配置回退到普通 embedding 服务。
-
-这一能力适合处理论文图表、PPT 截图、产品页面截图、实验结果图和带大量插图的 PDF。回答中可以返回检索到的图片线索,帮助用户从“图片证据”回到原始来源。
+# Open-NotebookLM / ThinkFlow
-
+[Python](https://www.python.org/)
+[FastAPI](https://fastapi.tiangolo.com/)
+[React](https://react.dev/)
+[Vite](https://vitejs.dev/)
+[License: Apache 2.0](LICENSE)
+[GitHub](https://github.com/OpenDCAI/ThinkFlow)
-### 3. PDF 图片索引、图库与多格式来源
+中文 | [English](README_EN.md)
-PDF 来源除了正文解析外,还支持提取页面图和内嵌图片,并在左侧来源区提供图片索引重建、PDF 图片查看等入口。对于图片来源,系统可以调用 VLM 做 OCR/描述;对于音频和视频来源,后端会尽量转写成可检索文本,让访谈、演示视频、课程录音也能进入同一个知识库。
+✨ **面向论文阅读、产品调研、课程学习和团队汇报的 AI 知识工作台:从资料导入、来源问答、多模态检索,到文档沉淀、报告、导图、PPT、播客、卡片和测验生成** ✨
-这类多格式来源会统一沉淀到笔记本目录下,并参与后续对话、文档沉淀和报告/PPT 等产出。
+| 📚 **基于来源问答** | 🧠 **多模态检索** | 📝 **知识工作台** | 🎬 **多形态产出** |
-### 4. 对话历史与工作区状态
+
-ThinkFlow 现在支持更完整的多轮对话工作区。用户可以新建对话、查看历史对话,并在每个对话中保存当前选择的来源、绑定文档、活跃文档和相关产出状态。这样同一个笔记本可以容纳多个研究分支,例如“论文方法细读”“实验复现问题”“汇报大纲讨论”,每个分支都能保留自己的上下文。
+
+
+
+
+
+
+
+
+
-
+
+
-### 5. Summary 卡片
+

-Summary 用来保存从对话和资料中提炼出的关键结论。它适合承载“我已经确认过的要点”,也可以进一步重算为总 Summary,作为后续产出的背景材料。
+
-
+## 📑 目录
-### 6. 梳理文档
+- [✨ 核心功能](#-核心功能)
+- [🔁 工作流](#-工作流)
+- [📸 功能展示](#-功能展示)
+- [🚀 快速启动](#-快速启动)
+- [⚙️ 配置说明](#️-配置说明)
+- [📂 项目结构](#-项目结构)
+- [🧪 开发命令](#-开发命令)
+- [📦 本地数据位置](#-本地数据位置)
+- [🗺️ 路线图](#️-路线图)
+- [📚 更多文档](#-更多文档)
+
+## ✨ 核心功能
+
+> ThinkFlow 把一个笔记本变成可追踪的知识生产闭环:来源进入笔记本,对话形成理解,确认过的内容被沉淀,最终产出从锁定上下文中生成。
+
+- **📚 统一来源接入**:支持上传文件、粘贴文本、导入网页、搜索和深度研究,把材料统一放进一个笔记本。
+- **💬 基于来源问答**:围绕已选来源提问,保留引用、来源映射和多轮上下文。
+- **🧠 VLM 多模态检索**:在文本模式和 VLM 模式之间切换,支持图片附件、粘贴图片、PDF 页面图和图表证据检索。
+- **🖼️ PDF 图片索引与图库**:可重建 PDF 图片索引、查看抽取图片,并让视觉证据参与检索和后续产出。
+- **📝 知识工作台**:把有价值的回答保存为 Summary 卡片、可编辑文档和产出指导,避免知识散落在聊天里。
+- **📌 对话状态保留**:按对话保存已选来源、绑定文档、活跃文档和产出上下文。
+- **📄 报告生成**:基于来源、梳理文档和产出指导生成报告草稿。
+- **🗺️ 思维导图生成**:把材料整理成层级结构,支持预览和导出。
+- **🎞️ PPT 工作流**:先生成大纲,再逐页生成和确认演示内容。
+- **🎧 播客生成**:把来源内容转成脚本和可播放音频。
+- **🧩 学习产出**:生成学习卡片和测验,适合课程复习、团队培训和知识验收。
+- **🎬 视频生成**:基于资料和脚本生成分镜、口播稿和视频结果。
+
+---
+
+## 🔁 工作流
+
+
+
+| 1. 导入 | 2. 提问 | 3. 沉淀 | 4. 约束 | 5. 生成 |
+| --- | --- | --- | --- | --- |
+| PDF / Word / 图片 / 音频 / 视频 / 文本 / 网页 | 文本 RAG 或 VLM 检索 | Summary、文档和可复用笔记 | 产出指导和来源快照 | 报告、导图、PPT、视频、播客、卡片、测验 |
+
+
+
+ThinkFlow 不是一次性聊天窗口,而是为持续知识工作设计的工作台:
+
+1. **创建笔记本**:对应一篇论文、一次产品调研、一门课程或一场团队汇报。
+2. **导入来源**:选择哪些来源参与当前对话或产出。
+3. **基于来源提问**:文本模式处理普通资料,VLM 模式处理图片、截图、PDF 图表和视觉证据。
+4. **沉淀确认过的内容**:把关键结论保存到 Summary、文档和产出指导。
+5. **生成最终成果**:从锁定的来源、文档和指导中生成可追溯结果。
+
+---
+
+## 📸 功能展示
+
+### 📚 来源工作区与基于来源问答
+
+- ✨ 将文件、文本、网页、搜索和深度研究材料统一放进笔记本
+- ✨ 三栏式工作区统一管理来源、对话、文档和产出
+
+### 🧠 多模态检索
+
+- ✨ 文本模式检索来源片段,并围绕上下文回答问题
+- ✨ VLM 模式支持图片提问,并检索 PDF 和图片来源里的视觉证据
+
+### 📝 知识工作台
+
+- ✨ 把关键结论保存为 Summary 卡片
+- ✨ 维护可编辑梳理文档,作为报告和 PPT 的主输入
+- ✨ 保存受众、风格和重点约束,指导后续生成
+- ✨ 将有价值的对话回答推送到可复用知识资产
+- ✨ 在后续产出中显式引用已经沉淀的文档
+
+### 📄 报告与导图
+
+- ✨ 从来源、文档和产出指导生成报告草稿
+- ✨ 把材料整理成层级导图,便于快速复盘
+
+### 🧩 学习产出
+
+- ✨ 将来源内容转成学习卡片
+- ✨ 在工作台里翻阅和复习卡片
+- ✨ 生成带答案和解释的测验题
+
+### 🎞️ PPT、视频和播客
+
+- ✨ 先生成并调整 PPT 大纲,再进入逐页生成
+- ✨ 检查逐页生成进度和页面内容
+- ✨ 在产出工作台中打开生成结果
+- ✨ 生成视频前确认口播稿和分镜
+- ✨ 从锁定来源生成播客脚本和可播放音频
+
+查看视频生成演示
+
+---
+
+## 🚀 快速启动
-梳理文档是后续报告、导图和 PPT 的主输入区。它不是聊天记录副本,而是用户确认过的正文内容,可以持续追加、整理、融合和回看历史版本。
+### 环境要求
-
+- Python 3.11 或更高版本
+- Node.js 18 或更高版本
+- npm
+- 可用的 LLM 和 embedding API 配置
+- 可选:`ffmpeg`,用于音视频处理和媒体产出
-### 7. 产出指导
+Ubuntu 常用运行依赖示例:
-产出指导是高权重 brief,用来约束后续结果的重点、风格和讲述顺序。它适合保存“最终产出应该强调什么、避免什么、采用什么口径”这类信息。
+```bash
+sudo apt-get update
+sudo apt-get install -y ffmpeg libxcb-shm0 libxcb-shape0 libxcb-xfixes0
+```
-
+### 1. 克隆项目
-### 8. 报告生成
+```bash
+git clone https://github.com/OpenDCAI/Open-NotebookLM.git
+cd Open-NotebookLM
+```
-报告产出会合并来源、梳理文档和产出指导,生成可预览、可下载的 Markdown/PDF 结果。它适合作为调研报告、论文阅读笔记、课程总结或汇报材料底稿。
+### 2. 创建环境并安装后端依赖
-
+```bash
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
-### 9. 思维导图
+如果需要运行测试:
-思维导图会把来源内容整理成层级结构,便于快速把握主题、模块和子问题。前端提供展开、收缩、缩放、适应视图、下载 PNG、导出文本和 Mermaid 等操作入口。
+```bash
+pip install -r requirements-dev.txt
+```
-
+### 3. 安装前端依赖
-### 10. 学习卡片
+```bash
+cd frontend
+npm install
+cd ..
+```
-学习卡片把材料转成逐张翻阅的问答卡,适合课程复习、论文方法记忆、产品知识培训和团队 onboarding。
+### 4. 配置环境变量
-
+```bash
+cp fastapi_app/.env.example fastapi_app/.env
+```
-### 11. 互动测验
+编辑 `fastapi_app/.env`,至少填写 LLM 和 embedding 配置。示例见 [配置说明](#️-配置说明)。
-测验产出会基于来源生成选择题,并保留正确答案和解释。它适合检查资料理解、课程复习和团队知识验收。
+### 5. 一键启动
-
+```bash
+./scripts/start.sh
+```
-### 12. PPT 工作台
+脚本会启动:
-PPT 采用阶段化流程:先生成可讨论的大纲,再确认大纲并进入逐页生成、页级核对和单页重做。PPT 工作台会展示来源锁定、大纲确认、逐页生成确认、确认进度和重新生成入口。
+- 后端:`http://localhost:8000`
+- 前端:`http://localhost:3001`
+- 本地 embedding 服务:默认 `8899` 端口,如果端口已有服务会复用
+- 监控脚本:用于基础进程恢复
-
+停止服务:
-### 13. 播客生成
+```bash
+./scripts/stop.sh
+```
-播客产出会基于锁定来源生成脚本和音频文件。结果页提供音频播放器、重新生成、回流来源和打开结果入口,适合把阅读材料转成可听内容。
+### 6. 手动启动
-
+如果不希望脚本启动内置本地 embedding 服务,可以手动启动前后端:
-## 适用场景
+```bash
+# 终端 1:后端
+python -m uvicorn fastapi_app.main:app --host 0.0.0.0 --port 8000
+```
-- **论文阅读**:导入论文、实验记录和参考资料,围绕方法、贡献、实验和局限持续提问,再沉淀为摘要、梳理文档和汇报材料。
-- **产品调研**:整合网页、竞品资料、访谈纪要和行业报告,生成竞品分析、调研报告、导图和路线图讨论材料。
-- **课程学习**:把教材或讲义转成问答、卡片和测验,形成可复习的学习资产。
-- **团队汇报**:把原始资料加工为报告、导图和 PPT,并保留来源快照,方便回溯结果依据。
-- **数据与表格分析**:项目内还包含数据抽取和表格分析相关接口,可用于把结构化数据接入对话式分析流程。
+```bash
+# 终端 2:前端
+cd frontend
+npm run dev -- --host 0.0.0.0 --port 3001
+```
-## 项目结构
+打开:
```text
-.
-├── fastapi_app/ # FastAPI 后端,包含认证、知识库、来源、文档、产出、TTS、搜索等路由
-├── frontend/ # React + Vite 前端,ThinkFlow 主界面
-├── workflow_engine/ # 工作流和算子引擎
-├── vendor/presentagent/ # 可编辑 PPT / PresentAgent 相关集成
-├── docs/ # 设计文档和 README 截图资产
-├── outputs/ # 本地用户数据、来源文件、产出文件和工作区状态
-├── scripts/ # 启停脚本、监控脚本、embedding 启动脚本
-└── requirements-base.txt # Python 后端基础依赖
+http://localhost:3001
```
-## 快速启动
+健康检查:
-### 环境要求
+```bash
+curl http://localhost:8000/health
+```
-- Python 3.11
-- Node.js 18 或更高版本
-- npm
-- 可用的 LLM / Embedding / TTS / Image Generation 配置,按需填写在 `fastapi_app/.env`
+期望返回:
-### 1. 配置环境变量
+```json
+{"status":"ok"}
+```
-复制后端环境变量模板:
+---
-```bash
-cp fastapi_app/.env.example fastapi_app/.env
-```
+## ⚙️ 配置说明
+
+后端配置位于 `fastapi_app/.env`。示例文件只包含占位符,请替换为你自己的服务配置。
-至少需要根据你要使用的功能配置这些变量:
+### LLM
```bash
LLM_API_URL=https://api.example.com/v1
LLM_API_KEY=your_llm_api_key
LLM_MODEL=your_model_name
+```
+
+### Embedding
+OpenAI 兼容或 ApiYi 兼容 embedding:
+
+```bash
EMBEDDING_PROVIDER=apiyi
EMBEDDING_API_URL=https://api.example.com/v1
EMBEDDING_API_KEY=your_embedding_api_key
EMBEDDING_MODEL=text-embedding-3-small
-
-TTS_PROVIDER=apiyi
-TTS_API_URL=https://api.example.com/v1
-TTS_API_KEY=your_tts_api_key
-TTS_MODEL=qwen-tts
+EMBEDDING_DIMENSION=1536
```
-VLM 多模态检索和 PDF 图片索引是可选增强能力。如果需要启用图片 embedding 和 VLM 对话,可以继续配置:
+本地 embedding 服务:
```bash
-KB_VLM_MODEL=your_multimodal_chat_model
-
-VISUAL_EMBEDDING_API_URL=https://api.example.com/v1
-VISUAL_EMBEDDING_API_KEY=your_visual_embedding_api_key
-VISUAL_EMBEDDING_MODEL=qwen3-vl-embedding
+EMBEDDING_PROVIDER=local
+EMBEDDING_API_URL=http://localhost:8899/v1
+EMBEDDING_API_KEY=
+EMBEDDING_MODEL=/path/to/your/embedding-model
+EMBEDDING_DIMENSION=1024
```
-视觉索引初始化需要显式配置 `VISUAL_EMBEDDING_API_URL`;`VISUAL_EMBEDDING_API_KEY` 留空时会回退到普通 `EMBEDDING_API_KEY`。如果未配置视觉 embedding 或 VLM,图片检索和图片理解能力会受限,但普通文本来源、文本问答和文档沉淀仍可运行。
+> [!NOTE]
+> `scripts/start.sh` 会在 `8899` 端口空闲时尝试启动 `scripts/start_embedding_4b.sh`。如果你的机器没有默认本地模型路径,请设置 `EMBEDDING_MODEL` 和 `EMBEDDING_PYTHON_BIN`,或改用外部 embedding provider。
+
+### VLM 与视觉 embedding
-如果需要登录和云端用户体系,继续配置 Supabase:
+这些配置用于启用图片附件、PDF 图片检索和多模态回答:
```bash
-SUPABASE_URL=https://your-project-id.supabase.co
-SUPABASE_ANON_KEY=your_supabase_anon_key
-SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
+KB_VLM_MODEL=your_multimodal_chat_model
+VISUAL_EMBEDDING_API_URL=https://api.example.com/v1
+VISUAL_EMBEDDING_API_KEY=your_visual_embedding_api_key
+VISUAL_EMBEDDING_MODEL=your_visual_embedding_model
```
-如果不配置 Supabase,项目仍可用本地工作区方式运行;数据会主要保存在 `outputs/` 下。
-
-### 2. 安装依赖
+如果 `VISUAL_EMBEDDING_API_KEY` 留空,视觉 embedding 客户端可以回退使用普通 embedding key。即使未配置 VLM 或视觉 embedding,文本 RAG、来源导入、文档沉淀和标准产出仍可运行。
-后端:
+### TTS、搜索、图像生成和视频
```bash
-python -m venv .venv
-source .venv/bin/activate
-pip install -r requirements-base.txt
-```
+TTS_PROVIDER=apiyi
+TTS_API_URL=https://api.example.com/v1
+TTS_API_KEY=your_tts_api_key
+TTS_MODEL=qwen-tts
-前端:
+SEARCH_PROVIDER=serper
+SERPER_API_KEY=your_serper_key_here
+SERPAPI_KEY=your_serpapi_key_here
+BOCHA_API_KEY=your_bocha_key_here
-```bash
-cd frontend
-npm install
-cd ..
+IMAGE_GEN_API_URL=https://api.example.com/v1
+IMAGE_GEN_API_KEY=your_image_gen_api_key
+IMAGE_GEN_MODEL=your_image_model
+
+GUI_PLUS_API_KEY=your_dashscope_or_bailian_key
+LIVEPORTRAIT_KEY=your_liveportrait_key
```
-### 3. 一键启动
+### Supabase 认证
-仓库提供了后台启动脚本:
+Supabase 是可选配置。未配置时,应用仍可使用 `outputs/` 下的本地工作区数据运行。
```bash
-./scripts/start.sh
+SUPABASE_URL=https://your-project-id.supabase.co
+SUPABASE_ANON_KEY=your_supabase_anon_key
+SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
```
-脚本会启动:
-
-- 后端:`http://localhost:8000`
-- 前端:`http://localhost:3001`
-- 本地 embedding 服务:默认 `8899` 端口,如果该端口已有服务会复用
-- 监控脚本:异常时尝试拉起服务
+---
-停止服务:
+## 📂 项目结构
-```bash
-./scripts/stop.sh
+```text
+.
+├── fastapi_app/ # FastAPI 后端:认证、笔记本、来源、文档、产出、搜索、TTS
+├── frontend/ # React + Vite 前端,ThinkFlow 主工作区
+├── workflow_engine/ # 工作流编排、多模态工具、提示词模板和产出流水线
+├── docs/ # 产品文档、架构说明、走查文档和 README 资产
+│ └── assets/ # README/docs 使用的截图、logo 和视频素材
+├── scripts/ # 启停脚本、监控脚本和本地 embedding 服务启动脚本
+├── static/ # 静态 README/产品资产
+├── requirements.txt # Python 依赖入口
+├── requirements-base.txt # 后端运行时依赖
+└── requirements-dev.txt # 测试/开发依赖
```
-### 4. 手动启动前后端
+---
-如果你只想启动前后端,或者本地没有 embedding 模型环境,可以手动运行:
+## 🧪 开发命令
```bash
-# 终端 1:后端
-python -m uvicorn fastapi_app.main:app --host 0.0.0.0 --port 8000
-```
+# 后端测试
+pytest -q
-```bash
-# 终端 2:前端
-cd frontend
-npm run dev -- --host 0.0.0.0 --port 3001
-```
+# 后端语法检查
+python -m compileall fastapi_app workflow_engine scripts
-前端的 Vite 配置会把 `/api` 和 `/outputs` 代理到 `http://localhost:8000`。
+# 前端构建
+cd frontend && npm run build
-### 5. 检查服务
+# 前端测试
+cd frontend && npm test
-```bash
+# 服务健康检查
curl http://localhost:8000/health
+
+# 停止脚本启动的服务
+./scripts/stop.sh
```
-返回结果应为:
+---
-```json
-{"status":"ok"}
-```
+## 📦 本地数据位置
-然后打开:
+ThinkFlow 默认把运行数据保存在项目目录下,适合本地试用、调试和迁移:
-```text
-http://localhost:3001
-```
+- `outputs/`:笔记本、上传来源、生成结果、向量索引和本地工作区状态。迁移或清理项目前,建议先备份这个目录。
+- `logs/`:通过 `scripts/start.sh` 启动时生成的后端、前端和 embedding 服务日志,便于排查启动、检索和生成问题。
+- `fastapi_app/.env`:本机环境配置文件,由 `fastapi_app/.env.example` 复制生成,用来填写模型、embedding、TTS、搜索、图像生成等 provider 配置。
-## 常用命令
+---
-```bash
-# 前端构建
-cd frontend && npm run build
+## 🗺️ 路线图
-# 前端测试
-cd frontend && npm test
+| 状态 | 模块 | 方向 |
+| --- | --- | --- |
+| ✅ | 基于来源的知识工作台 | 笔记本、来源、对话、引用、文档和产出 |
+| ✅ | 多模态检索 | VLM 模式、图片附件、视觉 embedding、PDF 图片图库 |
+| ✅ | 知识资产 | Summary 卡片、可编辑文档、产出指导、文档引用 |
+| ✅ | 多形态产出 | 报告、导图、PPT、播客、卡片、测验、视频 |
+| 🚧 | 可编辑产出流程 | 为 PPT、视频和报告提供更结构化的审阅与编辑闭环 |
+| 🚧 | 部署方案 | 补充 Docker/生产部署和 provider 配置指南 |
+| 🚧 | 评测与追踪 | 增强生成 trace、来源覆盖检查和产出质量诊断 |
-# 查看后端健康状态
-curl http://localhost:8000/health
+---
-# 停止脚本启动的服务
-./scripts/stop.sh
-```
+## 📚 更多文档
+
+- [docs/](docs/)
-## 数据和产物位置
+您可以使用 Claude Code / Codex 阅读 `docs/`,帮助理解整个项目。
-- `outputs/`:笔记本、来源、向量索引、工作区状态和生成结果。
-- `logs/`:通过 `scripts/start.sh` 启动时产生的后端、前端和 embedding 日志。
-- `docs/assets/thinkflow/`:README 使用的截图资产。
+---
-## 更多文档
+## 📄 许可证
-- [ThinkFlow 走查 README](docs/thinkflow-readme.md)
-- [开发架构说明](docs/development-architecture-guide.md)
-- [文件处理流程](docs/thinkflow-upload-file-processing-flow.md)
-- [OnlyOffice 可编辑 PPT](docs/onlyoffice-editable-ppt.md)
+本项目使用 [Apache License 2.0](LICENSE)。
diff --git a/README_EN.md b/README_EN.md
new file mode 100644
index 0000000..9628c14
--- /dev/null
+++ b/README_EN.md
@@ -0,0 +1,522 @@
+# Open-NotebookLM / ThinkFlow
+
+[Python](https://www.python.org/)
+[FastAPI](https://fastapi.tiangolo.com/)
+[React](https://react.dev/)
+[Vite](https://vitejs.dev/)
+[License: Apache 2.0](LICENSE)
+[GitHub](https://github.com/OpenDCAI/ThinkFlow)
+
+[中文](README.md) | English
+
+✨ **An AI knowledge workspace for paper reading, product research, course learning, and team presentations: from source ingestion and grounded chat to multimodal retrieval, knowledge assets, reports, mindmaps, PPTs, podcasts, flashcards, and quizzes** ✨
+
+| 📚 **Source-grounded QA** | 🧠 **Multimodal Retrieval** | 📝 **Knowledge Workspace** | 🎬 **Multi-output Generation** |
+
+## 📑 Table of Contents
+
+- [✨ Core Features](#-core-features)
+- [🔁 Workflow](#-workflow)
+- [📸 Showcase](#-showcase)
+- [🚀 Quick Start](#-quick-start)
+- [⚙️ Configuration](#️-configuration)
+- [📂 Project Structure](#-project-structure)
+- [🧪 Development Commands](#-development-commands)
+- [📦 Local Data Locations](#-local-data-locations)
+- [🗺️ Roadmap](#️-roadmap)
+- [📚 More Docs](#-more-docs)
+
+## ✨ Core Features
+
+> ThinkFlow turns a notebook into a traceable knowledge production loop: sources enter the notebook, conversations refine understanding, confirmed knowledge is saved, and final outputs are generated from locked context.
+
+- **📚 Unified source ingestion**: upload files, paste text, import URLs, run search/deep-research flows, and organize all materials inside one notebook.
+- **💬 Source-grounded conversation**: ask questions against selected sources, keep citations and source mappings, and continue in multiple named conversation branches.
+- **🧠 VLM multimodal retrieval**: switch between text mode and VLM mode, attach or paste images, retrieve PDF page images/figures, and ground answers in visual evidence.
+- **🖼️ PDF image indexing and gallery**: rebuild PDF image indexes, view extracted images, and feed those visual assets into retrieval and downstream outputs.
+- **📝 Knowledge workspace**: save useful answers into Summary cards, editable documents, and output guidance instead of losing them in chat history.
+- **📌 Stateful conversations**: preserve selected sources, bound documents, active documents, and output context per conversation.
+- **📄 Report generation**: produce report drafts from sources, documents, and guidance.
+- **🗺️ Mindmap generation**: turn source material into navigable hierarchical maps with preview and export options.
+- **🎞️ PPT workflow**: generate outlines first, then review and produce slide-level presentation content.
+- **🎧 Podcast generation**: generate scripts and playable audio from source-grounded context.
+- **🧩 Learning outputs**: create flashcards and quizzes for course review, onboarding, and knowledge checks.
+- **🎬 Video generation**: generate storyboards, narration scripts, and video outputs from source material.
+
+---
+
+## 🔁 Workflow
+
+
+
+| 1. Ingest | 2. Ask | 3. Save | 4. Guide | 5. Generate |
+| --- | --- | --- | --- | --- |
+| PDF / Word / image / audio / video / text / web | Text RAG or VLM retrieval | Summary, documents, and reusable notes | Output guidance and source snapshots | Report, mindmap, PPT, video, podcast, cards, quiz |
+
+
+
+ThinkFlow is not a one-shot chat window. It is designed for iterative knowledge work:
+
+1. **Create a notebook** for a paper, product investigation, class, or team presentation.
+2. **Import sources** and select which ones should participate in each conversation or output.
+3. **Ask source-grounded questions** in text mode, or switch to VLM mode for images, screenshots, PDF figures, and visual references.
+4. **Save confirmed knowledge** into Summary, documents, and output guidance.
+5. **Generate final artifacts** from a locked source/document/guidance context so results remain traceable.
+
+---
+
+## 📸 Showcase
+
+### 📚 Source Workspace and Grounded Chat
+
+- ✨ Bring files, text, URLs, search results, and deep-research materials into one notebook
+- ✨ Use the three-column workspace to manage sources, chat, documents, and outputs
+
+### 🧠 Multimodal Retrieval
+
+- ✨ Text mode retrieves source chunks and answers with grounded context
+- ✨ VLM mode accepts image prompts and retrieves visual evidence from PDFs and image sources
+
+### 📝 Knowledge Workspace
+
+- ✨ Save distilled conclusions into Summary cards
+- ✨ Maintain editable documents as the main input for reports and PPTs
+- ✨ Save audience, style, and focus constraints as output guidance
+- ✨ Push valuable chat answers into reusable knowledge assets
+- ✨ Reuse saved documents as explicit references for later outputs
+
+### 📄 Reports and Mindmaps
+
+- ✨ Generate report drafts from sources, documents, and guidance
+- ✨ Turn source material into a structured mindmap for fast review
+
+### 🧩 Learning Outputs
+
+- ✨ Convert source content into flashcards
+- ✨ Review and flip cards in the workspace
+- ✨ Generate quizzes with answers and explanations
+
+### 🎞️ PPT, Video, and Podcast
+
+- ✨ Create and refine PPT outlines before slide generation
+- ✨ Review slide generation progress and page-level content
+- ✨ Open generated PPT results inside the output workspace
+- ✨ Confirm narration and storyboard before video rendering
+- ✨ Generate podcast scripts and playable audio from locked sources
+
+View video generation demo
+
+---
+
+## 🚀 Quick Start
+
+### Requirements
+
+- Python 3.11 or newer
+- Node.js 18 or newer
+- npm
+- LLM and embedding API configuration for the features you want to run
+- Optional: `ffmpeg` for audio/video processing and media outputs
+
+Ubuntu example for common media/runtime packages:
+
+```bash
+sudo apt-get update
+sudo apt-get install -y ffmpeg libxcb-shm0 libxcb-shape0 libxcb-xfixes0
+```
+
+### 1. Clone and enter the project
+
+```bash
+git clone https://github.com/OpenDCAI/Open-NotebookLM.git
+cd Open-NotebookLM
+```
+
+### 2. Create environment and install backend dependencies
+
+```bash
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+
+For tests:
+
+```bash
+pip install -r requirements-dev.txt
+```
+
+### 3. Install frontend dependencies
+
+```bash
+cd frontend
+npm install
+cd ..
+```
+
+### 4. Configure environment variables
+
+```bash
+cp fastapi_app/.env.example fastapi_app/.env
+```
+
+Edit `fastapi_app/.env` with at least LLM and embedding settings. See [Configuration](#️-configuration) for examples.
+
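Before starting the services, it can help to confirm that the copied `.env` actually has the LLM and embedding values filled in. The following is a minimal sketch, not part of the repository: the helper names `parse_env` and `missing_keys` are hypothetical, and treating exactly these five keys as required is an assumption drawn from the LLM and Embedding examples in this README (`EMBEDDING_API_KEY` is left out because the local embedding example leaves it empty).

```python
from pathlib import Path

# Keys taken from the LLM and Embedding examples in this README; treating
# exactly this set as "required" is an assumption, not the backend's own rule.
REQUIRED_KEYS = ("LLM_API_URL", "LLM_API_KEY", "LLM_MODEL",
                 "EMBEDDING_API_URL", "EMBEDDING_MODEL")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("fastapi_app/.env")
    if env_path.exists():
        print("missing:", missing_keys(env_path.read_text(encoding="utf-8")))
    else:
        print(f"{env_path} not found; copy it from fastapi_app/.env.example first")
```

Run from the repository root, an empty `missing: []` line suggests the minimum settings are present; it does not verify that the values point at a working provider.
+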
+### 5. Start all services
+
+```bash
+./scripts/start.sh
+```
+
+The script starts:
+
+- Backend: `http://localhost:8000`
+- Frontend: `http://localhost:3001`
+- Local embedding service: port `8899` by default; if a service is already listening on that port, it is reused instead of starting a new one
+- Monitor script for basic process recovery
+
+Stop services:
+
+```bash
+./scripts/stop.sh
+```
+
+### 6. Manual startup
+
+If you do not want the script to start the bundled local embedding service, run backend and frontend manually:
+
+```bash
+# Terminal 1: backend
+python -m uvicorn fastapi_app.main:app --host 0.0.0.0 --port 8000
+```
+
+```bash
+# Terminal 2: frontend
+cd frontend
+npm run dev -- --host 0.0.0.0 --port 3001
+```
+
+Then open:
+
+```text
+http://localhost:3001
+```
+
+Health check:
+
+```bash
+curl http://localhost:8000/health
+```
+
+Expected response:
+
+```json
+{"status":"ok"}
+```
+
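The health check above can also be scripted, for example to wait for the backend before opening the frontend. This is a minimal sketch using only the Python standard library; the function names `is_healthy` and `wait_for_backend` and the timeout values are assumptions for illustration, while the URL and the expected `{"status":"ok"}` body come from this README.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True when the body is a JSON object with status == "ok"."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def wait_for_backend(url: str = "http://localhost:8000/health",
                     timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Poll the health endpoint until it reports ok or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_healthy(resp.read().decode("utf-8")):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # backend is not accepting connections yet
        time.sleep(interval_s)
    return False
```

Calling `wait_for_backend(timeout_s=60)` after `./scripts/start.sh` returns `True` once the endpoint answers with the expected body, or `False` if it never does within the timeout.
+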
+---
+
+## ⚙️ Configuration
+
+All backend configuration lives in `fastapi_app/.env`. The example file uses placeholders only; replace them with your own provider settings.
+
+### LLM
+
+```bash
+LLM_API_URL=https://api.example.com/v1
+LLM_API_KEY=your_llm_api_key
+LLM_MODEL=your_model_name
+```
+
+### Embedding
+
+OpenAI-compatible or ApiYi-compatible embedding:
+
+```bash
+EMBEDDING_PROVIDER=apiyi
+EMBEDDING_API_URL=https://api.example.com/v1
+EMBEDDING_API_KEY=your_embedding_api_key
+EMBEDDING_MODEL=text-embedding-3-small
+EMBEDDING_DIMENSION=1536
+```
+
+Local embedding service:
+
+```bash
+EMBEDDING_PROVIDER=local
+EMBEDDING_API_URL=http://localhost:8899/v1
+EMBEDDING_API_KEY=
+EMBEDDING_MODEL=/path/to/your/embedding-model
+EMBEDDING_DIMENSION=1024
+```
+
+> [!NOTE]
+> `scripts/start.sh` will try to launch `scripts/start_embedding_4b.sh` when port `8899` is free. If the default local model path is not available on your machine, either set `EMBEDDING_MODEL` and `EMBEDDING_PYTHON_BIN`, or use an external embedding provider.
+
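To see in advance which branch of that behavior applies on a given machine, the port can be probed directly. The sketch below is an assumption about how such a check could look in Python; it is not how `scripts/start.sh` itself is implemented, and `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout_s: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_s)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(8899):
        print("port 8899 is in use; the existing service is expected to be reused")
    else:
        print("port 8899 is free; start.sh would try to launch the local embedding service")
```

A listening port only shows that some service is there; it does not confirm that the service is an embedding server compatible with the configured `EMBEDDING_MODEL`.
+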
+### VLM and visual embedding
+
+These settings enable image attachments, PDF image retrieval, and multimodal answer grounding:
+
+```bash
+KB_VLM_MODEL=your_multimodal_chat_model
+VISUAL_EMBEDDING_API_URL=https://api.example.com/v1
+VISUAL_EMBEDDING_API_KEY=your_visual_embedding_api_key
+VISUAL_EMBEDDING_MODEL=your_visual_embedding_model
+```
+
+If `VISUAL_EMBEDDING_API_KEY` is empty, the visual embedding client can fall back to the normal embedding key. If VLM or visual embedding is not configured, text RAG, source ingestion, documents, and standard outputs can still run.
+
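The key fallback described above can be pictured as a small lookup. This is an illustrative sketch only: `resolve_visual_embedding_key` is a hypothetical name, and the actual client code in `fastapi_app/` may resolve the key differently.

```python
import os
from typing import Mapping, Optional


def resolve_visual_embedding_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer VISUAL_EMBEDDING_API_KEY; fall back to EMBEDDING_API_KEY when it is empty."""
    visual = (env.get("VISUAL_EMBEDDING_API_KEY") or "").strip()
    if visual:
        return visual
    fallback = (env.get("EMBEDDING_API_KEY") or "").strip()
    # None signals that neither key is configured.
    return fallback or None
```

Passing a plain dictionary instead of `os.environ` makes the behavior easy to check without touching the real environment.
+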
+### TTS, search, image generation, and video
+
+```bash
+TTS_PROVIDER=apiyi
+TTS_API_URL=https://api.example.com/v1
+TTS_API_KEY=your_tts_api_key
+TTS_MODEL=qwen-tts
+
+SEARCH_PROVIDER=serper
+SERPER_API_KEY=your_serper_key_here
+SERPAPI_KEY=your_serpapi_key_here
+BOCHA_API_KEY=your_bocha_key_here
+
+IMAGE_GEN_API_URL=https://api.example.com/v1
+IMAGE_GEN_API_KEY=your_image_gen_api_key
+IMAGE_GEN_MODEL=your_image_model
+
+GUI_PLUS_API_KEY=your_dashscope_or_bailian_key
+LIVEPORTRAIT_KEY=your_liveportrait_key
+```
+
+### Supabase authentication
+
+Supabase is optional. If it is not configured, the app can still run with local workspace data under `outputs/`.
+
+```bash
+SUPABASE_URL=https://your-project-id.supabase.co
+SUPABASE_ANON_KEY=your_supabase_anon_key
+SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
+```
+
+---
+
+## 📂 Project Structure
+
+```text
+.
+├── fastapi_app/ # FastAPI backend: auth, notebooks, sources, documents, outputs, search, TTS
+├── frontend/ # React + Vite frontend for the ThinkFlow workspace
+├── workflow_engine/ # Workflow orchestration, multimodal tools, prompt templates, output pipelines
+├── docs/ # Product docs, architecture notes, walkthroughs, and README assets
+│ └── assets/ # Screenshots, logo, and showcase video used by README/docs
+├── scripts/ # Start/stop scripts, monitor, and local embedding service launcher
+├── static/ # Static README/product assets
+├── requirements.txt # Standard Python dependency entrypoint
+├── requirements-base.txt # Backend runtime dependency list
+└── requirements-dev.txt # Test/development dependencies
+```
+
+---
+
+## 🧪 Development Commands
+
+```bash
+# Backend tests
+pytest -q
+
+# Backend syntax check
+python -m compileall fastapi_app workflow_engine scripts
+
+# Frontend build
+cd frontend && npm run build
+
+# Frontend tests
+cd frontend && npm test
+
+# Service health
+curl http://localhost:8000/health
+
+# Stop script-started services
+./scripts/stop.sh
+```
+
+---
+
+## 📦 Local Data Locations
+
+ThinkFlow stores runtime data inside the project directory by default, which makes local trials, debugging, and migration straightforward:
+
+- `outputs/`: notebooks, uploaded sources, generated outputs, vector indexes, and local workspace state. Back up this directory before moving or cleaning the project.
+- `logs/`: backend, frontend, and embedding service logs generated by `scripts/start.sh`, useful for debugging startup, retrieval, and generation issues.
+- `fastapi_app/.env`: local environment configuration copied from `fastapi_app/.env.example`, where you set model, embedding, TTS, search, image generation, and other provider settings.
+
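Since `outputs/` holds the notebooks and indexes, a timestamped archive is one simple way to follow the backup advice above. The sketch below uses only the standard library; the helper name `backup_outputs` and the archive naming scheme are assumptions, not something the project provides.

```python
import shutil
import time
from pathlib import Path


def backup_outputs(project_root: str, dest_dir: str) -> Path:
    """Archive <project_root>/outputs into dest_dir as a timestamped .tar.gz."""
    src = Path(project_root) / "outputs"
    if not src.is_dir():
        raise FileNotFoundError(f"no outputs/ directory under {project_root}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # make_archive appends the extension itself and returns the archive path.
    archive = shutil.make_archive(str(dest / f"outputs-{stamp}"), "gztar",
                                  root_dir=str(src.parent), base_dir=src.name)
    return Path(archive)
```

Choose a `dest_dir` outside `outputs/` so the archive does not end up inside the directory being archived, and stop the services first if files may still be written during the backup.
+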
+---
+
+## 🗺️ Roadmap
+
+| Status | Area | Direction |
+| --- | --- | --- |
+| ✅ | Source-grounded workspace | Notebook, sources, conversations, citations, documents, and outputs |
+| ✅ | Multimodal retrieval | VLM mode, image attachments, visual embedding, PDF image gallery |
+| ✅ | Knowledge assets | Summary cards, editable documents, output guidance, document references |
+| ✅ | Multi-output generation | Report, mindmap, PPT, podcast, flashcards, quiz, video |
+| 🚧 | Editable output workflows | More structured review and edit loops for presentation/video/report artifacts |
+| 🚧 | Deployment recipes | Clearer Docker/production setup and provider-specific configuration guides |
+| 🚧 | Evaluation and tracing | Better generation traces, source coverage checks, and output quality diagnostics |
+
+---
+
+## 📚 More Docs
+
+- [docs/](docs/)
+
+You can use Claude Code / Codex to read `docs/` and understand the project.
+
+---
+
+## 📄 License
+
+This project is licensed under the [Apache License 2.0](LICENSE).
diff --git "a/docs/assets/showcase/VLM\346\250\241\345\274\217.png" "b/docs/assets/showcase/VLM\346\250\241\345\274\217.png"
new file mode 100644
index 0000000..47b5cb0
Binary files /dev/null and "b/docs/assets/showcase/VLM\346\250\241\345\274\217.png" differ
diff --git a/docs/assets/showcase/ppt1.png b/docs/assets/showcase/ppt1.png
new file mode 100644
index 0000000..8faa74b
Binary files /dev/null and b/docs/assets/showcase/ppt1.png differ
diff --git a/docs/assets/showcase/ppt2.png b/docs/assets/showcase/ppt2.png
new file mode 100644
index 0000000..1b63669
Binary files /dev/null and b/docs/assets/showcase/ppt2.png differ
diff --git a/docs/assets/showcase/ppt3.png b/docs/assets/showcase/ppt3.png
new file mode 100644
index 0000000..fd144d5
Binary files /dev/null and b/docs/assets/showcase/ppt3.png differ
diff --git a/docs/assets/showcase/video-demo.mp4 b/docs/assets/showcase/video-demo.mp4
new file mode 100644
index 0000000..1c4b1c5
Binary files /dev/null and b/docs/assets/showcase/video-demo.mp4 differ
diff --git "a/docs/assets/showcase/\345\215\241\347\211\2071.png" "b/docs/assets/showcase/\345\215\241\347\211\2071.png"
new file mode 100644
index 0000000..e8e4e6f
Binary files /dev/null and "b/docs/assets/showcase/\345\215\241\347\211\2071.png" differ
diff --git "a/docs/assets/showcase/\345\215\241\347\211\2072.png" "b/docs/assets/showcase/\345\215\241\347\211\2072.png"
new file mode 100644
index 0000000..6041ca4
Binary files /dev/null and "b/docs/assets/showcase/\345\215\241\347\211\2072.png" differ
diff --git "a/docs/assets/showcase/\345\215\241\347\211\2073.png" "b/docs/assets/showcase/\345\215\241\347\211\2073.png"
new file mode 100644
index 0000000..91a5a45
Binary files /dev/null and "b/docs/assets/showcase/\345\215\241\347\211\2073.png" differ
diff --git "a/docs/assets/showcase/\345\215\241\347\211\2074.png" "b/docs/assets/showcase/\345\215\241\347\211\2074.png"
new file mode 100644
index 0000000..c95c3be
Binary files /dev/null and "b/docs/assets/showcase/\345\215\241\347\211\2074.png" differ
diff --git "a/docs/assets/showcase/\345\255\246\344\271\240\345\215\241\347\211\207\347\273\223\346\236\234.png" "b/docs/assets/showcase/\345\255\246\344\271\240\345\215\241\347\211\207\347\273\223\346\236\234.png"
new file mode 100644
index 0000000..d50b575
Binary files /dev/null and "b/docs/assets/showcase/\345\255\246\344\271\240\345\215\241\347\211\207\347\273\223\346\236\234.png" differ
diff --git "a/docs/assets/showcase/\346\200\235\347\273\2641.png" "b/docs/assets/showcase/\346\200\235\347\273\2641.png"
new file mode 100644
index 0000000..c4865ed
Binary files /dev/null and "b/docs/assets/showcase/\346\200\235\347\273\2641.png" differ
diff --git "a/docs/assets/showcase/\346\212\245\345\221\2121.png" "b/docs/assets/showcase/\346\212\245\345\221\2121.png"
new file mode 100644
index 0000000..fa0b81b
Binary files /dev/null and "b/docs/assets/showcase/\346\212\245\345\221\2121.png" differ
diff --git "a/docs/assets/showcase/\346\222\255\345\256\2421.png" "b/docs/assets/showcase/\346\222\255\345\256\2421.png"
new file mode 100644
index 0000000..104a80d
Binary files /dev/null and "b/docs/assets/showcase/\346\222\255\345\256\2421.png" differ
diff --git "a/docs/assets/showcase/\346\226\207\346\234\254\346\250\241\345\274\217.png" "b/docs/assets/showcase/\346\226\207\346\234\254\346\250\241\345\274\217.png"
new file mode 100644
index 0000000..9bc17df
Binary files /dev/null and "b/docs/assets/showcase/\346\226\207\346\234\254\346\250\241\345\274\217.png" differ
diff --git "a/docs/assets/showcase/\346\235\245\346\272\220\345\261\225\347\244\272.png" "b/docs/assets/showcase/\346\235\245\346\272\220\345\261\225\347\244\272.png"
new file mode 100644
index 0000000..9dfd183
Binary files /dev/null and "b/docs/assets/showcase/\346\235\245\346\272\220\345\261\225\347\244\272.png" differ
diff --git "a/docs/assets/showcase/\346\262\211\346\267\200_\346\224\257\346\214\201\345\244\232\346\235\241\344\277\241\346\201\257.png" "b/docs/assets/showcase/\346\262\211\346\267\200_\346\224\257\346\214\201\345\244\232\346\235\241\344\277\241\346\201\257.png"
new file mode 100644
index 0000000..2c13aa6
Binary files /dev/null and "b/docs/assets/showcase/\346\262\211\346\267\200_\346\224\257\346\214\201\345\244\232\346\235\241\344\277\241\346\201\257.png" differ
diff --git "a/docs/assets/showcase/\346\262\211\346\267\200\344\270\272\346\226\207\346\241\243\344\271\213\345\220\216\345\217\257\344\273\245\345\213\276\351\200\211\345\274\225\347\224\250\344\272\206.png" "b/docs/assets/showcase/\346\262\211\346\267\200\344\270\272\346\226\207\346\241\243\344\271\213\345\220\216\345\217\257\344\273\245\345\213\276\351\200\211\345\274\225\347\224\250\344\272\206.png"
new file mode 100644
index 0000000..95318dc
Binary files /dev/null and "b/docs/assets/showcase/\346\262\211\346\267\200\344\270\272\346\226\207\346\241\243\344\271\213\345\220\216\345\217\257\344\273\245\345\213\276\351\200\211\345\274\225\347\224\250\344\272\206.png" differ
diff --git "a/docs/assets/showcase/\346\262\211\346\267\200\344\272\247\345\207\272\346\214\207\345\257\274.png" "b/docs/assets/showcase/\346\262\211\346\267\200\344\272\247\345\207\272\346\214\207\345\257\274.png"
new file mode 100644
index 0000000..646c0d7
Binary files /dev/null and "b/docs/assets/showcase/\346\262\211\346\267\200\344\272\247\345\207\272\346\214\207\345\257\274.png" differ
diff --git "a/docs/assets/showcase/\346\262\211\346\267\200\346\221\230\350\246\201.png" "b/docs/assets/showcase/\346\262\211\346\267\200\346\221\230\350\246\201.png"
new file mode 100644
index 0000000..a9f20e4
Binary files /dev/null and "b/docs/assets/showcase/\346\262\211\346\267\200\346\221\230\350\246\201.png" differ
diff --git "a/docs/assets/showcase/\346\262\211\346\267\200\346\223\215\344\275\234.png" "b/docs/assets/showcase/\346\262\211\346\267\200\346\223\215\344\275\234.png"
new file mode 100644
index 0000000..560ecbf
Binary files /dev/null and "b/docs/assets/showcase/\346\262\211\346\267\200\346\223\215\344\275\234.png" differ
diff --git "a/docs/assets/showcase/\346\262\211\346\267\200\346\226\207\346\241\243.png" "b/docs/assets/showcase/\346\262\211\346\267\200\346\226\207\346\241\243.png"
new file mode 100644
index 0000000..4fddf18
Binary files /dev/null and "b/docs/assets/showcase/\346\262\211\346\267\200\346\226\207\346\241\243.png" differ
diff --git "a/docs/assets/showcase/\350\247\206\351\242\2210.png" "b/docs/assets/showcase/\350\247\206\351\242\2210.png"
new file mode 100644
index 0000000..fb73e85
Binary files /dev/null and "b/docs/assets/showcase/\350\247\206\351\242\2210.png" differ
diff --git "a/docs/assets/showcase/\350\247\206\351\242\2211.png" "b/docs/assets/showcase/\350\247\206\351\242\2211.png"
new file mode 100644
index 0000000..df46fd1
Binary files /dev/null and "b/docs/assets/showcase/\350\247\206\351\242\2211.png" differ
diff --git "a/docs/assets/showcase/\350\247\206\351\242\2212.png" "b/docs/assets/showcase/\350\247\206\351\242\2212.png"
new file mode 100644
index 0000000..d716c25
Binary files /dev/null and "b/docs/assets/showcase/\350\247\206\351\242\2212.png" differ
diff --git "a/docs/assets/showcase/\350\247\206\351\242\2213.png" "b/docs/assets/showcase/\350\247\206\351\242\2213.png"
new file mode 100644
index 0000000..0f0c295
Binary files /dev/null and "b/docs/assets/showcase/\350\247\206\351\242\2213.png" differ
diff --git "a/docs/assets/showcase/\351\227\256\345\215\2671.png" "b/docs/assets/showcase/\351\227\256\345\215\2671.png"
new file mode 100644
index 0000000..675cc71
Binary files /dev/null and "b/docs/assets/showcase/\351\227\256\345\215\2671.png" differ
diff --git a/docs/assets/thinkflow/thinkflow-logo.png b/docs/assets/thinkflow/thinkflow-logo.png
new file mode 100644
index 0000000..6471e46
Binary files /dev/null and b/docs/assets/thinkflow/thinkflow-logo.png differ
diff --git a/fastapi_app/routers/kb.py b/fastapi_app/routers/kb.py
index 0f75982..fd70a14 100644
--- a/fastapi_app/routers/kb.py
+++ b/fastapi_app/routers/kb.py
@@ -1279,6 +1279,15 @@ async def _vlm_describe_base64_image(
return ""
+def _resolve_kb_vlm_model(fallback_model: Optional[str] = None) -> str:
+ return (
+ (getattr(settings, "KB_VLM_MODEL", "") or "").strip()
+ or (getattr(settings, "LLM_MODEL", "") or "").strip()
+ or str(fallback_model or "").strip()
+ or "gemini-2.5-flash"
+ )
+
+
def _load_image_as_data_url(path: str) -> str:
"""Load a local image file and return it as a base64 data URL."""
import base64
@@ -1547,8 +1556,9 @@ async def event_generator():
"message": "正在分析图片内容",
"message_en": "Analyzing attached images",
})
+ vlm_model = _resolve_kb_vlm_model(req.model)
descs = await asyncio.gather(*[
- _vlm_describe_base64_image(url, req.chat_api_url, req.api_key, req.model)
+ _vlm_describe_base64_image(url, req.chat_api_url, req.api_key, vlm_model)
for url in image_attachments
])
image_descriptions = [d for d in descs if d]
@@ -1643,10 +1653,7 @@ async def event_generator():
)
chat_model = req.model
if has_images:
- vlm_model = (
- (getattr(settings, "KB_VLM_MODEL", "") or "").strip()
- or (getattr(settings, "LLM_MODEL", "") or "").strip()
- )
+ vlm_model = _resolve_kb_vlm_model(req.model)
if vlm_model:
chat_model = vlm_model
log.info(f"[VLM] Switching to multimodal model: {chat_model}")
@@ -3058,6 +3065,12 @@ async def generate_mindmap_from_kb(
mermaid_code = getattr(result_state, "mermaid_code", "")
result_path = getattr(result_state, "result_path", "")
+ if str(mermaid_code or "").lstrip().startswith("# Error"):
+ raise HTTPException(
+ status_code=500,
+ detail=str(mermaid_code).replace("# Error", "", 1).strip() or "Mindmap generation failed",
+ )
+
mindmap_path = ""
if result_path:
mmd_path = Path(result_path) / "mindmap.mmd"
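Review note on `fastapi_app/routers/kb.py`: the new `_resolve_kb_vlm_model` helper returns the first non-blank value from a fixed chain: `KB_VLM_MODEL`, then `LLM_MODEL`, then the request's model, then the literal default. Below is a minimal standalone sketch of that chain. `resolve_vlm_model` and the `SimpleNamespace` objects are hypothetical stand-ins for the project's helper and `settings` object, not the project's code.

```python
from types import SimpleNamespace
from typing import Optional

DEFAULT_VLM_MODEL = "gemini-2.5-flash"  # same literal default as in the diff


def resolve_vlm_model(settings: object, fallback_model: Optional[str] = None) -> str:
    """Return the first non-blank candidate, in the order used by the diff."""
    candidates = (
        getattr(settings, "KB_VLM_MODEL", ""),
        getattr(settings, "LLM_MODEL", ""),
        fallback_model,
    )
    for candidate in candidates:
        text = str(candidate or "").strip()
        if text:
            return text
    return DEFAULT_VLM_MODEL


# Hypothetical settings objects, for illustration only.
configured = SimpleNamespace(KB_VLM_MODEL="  my-vlm  ", LLM_MODEL="my-llm")
empty = SimpleNamespace(KB_VLM_MODEL="", LLM_MODEL=None)

print(resolve_vlm_model(configured, "request-model"))  # my-vlm
print(resolve_vlm_model(empty, "request-model"))       # request-model
print(resolve_vlm_model(empty))                        # gemini-2.5-flash
```

Because the chain always ends in a non-empty default, the `if vlm_model:` guard that follows the second call site can no longer be false. It is harmless, but effectively redundant after this change.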
diff --git a/fastapi_app/services/output_v2_service.py b/fastapi_app/services/output_v2_service.py
index 3c07031..22a5202 100644
--- a/fastapi_app/services/output_v2_service.py
+++ b/fastapi_app/services/output_v2_service.py
@@ -818,11 +818,406 @@ def _normalize_ppt_outline_item(self, item: Dict[str, Any], index: int) -> Dict[
):
if item.get(key):
normalized[key] = item.get(key)
+ for key in (
+ "generation_failed",
+ "generation_error",
+ "mode",
+ "page_idx",
+ "review_status",
+ "confirmed",
+ ):
+ if key in item:
+ normalized[key] = item.get(key)
return normalized
def _normalize_ppt_outline(self, outline: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
return [self._normalize_ppt_outline_item(item or {}, index) for index, item in enumerate(outline or [])]
+ def _normalize_ppt_output_info(
+ self,
+ output_info: Optional[Dict[str, Any]],
+ *,
+ item: Dict[str, Any],
+ ) -> Dict[str, Any]:
+ raw = output_info if isinstance(output_info, dict) else {}
+ page_count = raw.get("page_count") or item.get("page_count") or len(item.get("outline") or []) or 0
+ try:
+ page_count = int(page_count)
+ except (TypeError, ValueError):
+ page_count = 0
+ return {
+ "type": str(raw.get("type") or item.get("target_type") or "ppt"),
+ "title": str(raw.get("title") or item.get("title") or "PPT").strip() or "PPT",
+ "page_count": page_count,
+ "source_names": list(raw.get("source_names") or item.get("source_names") or []),
+ "bound_document_titles": list(raw.get("bound_document_titles") or item.get("bound_document_titles") or []),
+ }
+
+ def _normalize_ppt_style_info(self, style_info: Optional[Dict[str, Any]]) -> Dict[str, Any]:
+ raw = style_info if isinstance(style_info, dict) else {}
+ preset = str(raw.get("preset") or "clean").strip() or "clean"
+ labels = {
+ "clean": "简洁干净",
+ "business": "商务汇报",
+ "academic": "学术严谨",
+ }
+ supplement = raw.get("supplement_prompt")
+ if isinstance(supplement, list):
+ supplement_prompt = [str(item or "").strip() for item in supplement if str(item or "").strip()]
+ elif str(supplement or "").strip():
+ supplement_prompt = [str(supplement).strip()]
+ else:
+ supplement_prompt = []
+ return {
+ "preset": preset,
+ "label": str(raw.get("label") or labels.get(preset) or preset).strip(),
+ "tone": str(raw.get("tone") or "").strip(),
+ "visual_style": str(raw.get("visual_style") or "").strip(),
+ "audience_assumption": str(raw.get("audience_assumption") or "").strip(),
+ "supplement_prompt": supplement_prompt,
+ }
+
+ def _build_outline_chat_intent_summary(
+ self,
+ *,
+ message: str,
+ active_slide_index: Optional[int],
+ ) -> Dict[str, Any]:
+ text = str(message or "").strip()
+ mode = "style" if self._is_ppt_style_only_message(
+ message=text,
+ intent_summary={"mode": "none"},
+ active_slide_index=active_slide_index,
+ ) else "outline"
+ slide_targets = []
+ if active_slide_index is not None:
+ slide_targets.append({"index": active_slide_index})
+ return {
+ "mode": mode,
+ "message": text,
+ "global_directives": [],
+ "slide_targets": slide_targets,
+ }
+
+ def _is_ppt_style_only_message(
+ self,
+ *,
+ message: str,
+ intent_summary: Optional[Dict[str, Any]],
+ active_slide_index: Optional[int],
+ ) -> bool:
+ if active_slide_index is not None:
+ return False
+ mode = str((intent_summary or {}).get("mode") or "").strip()
+ if mode == "style":
+ return True
+ text = str(message or "").strip()
+ if not text:
+ return False
+ style_words = ("风格", "商务", "配色", "视觉", "语气", "版式", "style")
+ outline_words = ("第", "页", "标题", "要点", "大纲", "顺序", "删除", "新增", "拆成", "合并")
+ return any(word in text for word in style_words) and not any(word in text for word in outline_words)
+
+ def _active_outline_chat_session(self, item: Dict[str, Any]) -> Dict[str, Any]:
+ sessions = item.get("outline_chat_sessions")
+ if not isinstance(sessions, list):
+ sessions = []
+ item["outline_chat_sessions"] = sessions
+ active_id = str(item.get("outline_chat_active_session_id") or "").strip()
+ for session in sessions:
+ if isinstance(session, dict) and session.get("id") == active_id:
+ return session
+ for session in sessions:
+ if isinstance(session, dict) and session.get("status") == "active":
+ item["outline_chat_active_session_id"] = session.get("id")
+ return session
+ now = self._now()
+ session = {
+ "id": f"session_{uuid4().hex}",
+ "status": "active",
+ "messages": [],
+ "draft_outline": self._normalize_ppt_outline(item.get("outline") or []),
+ "draft_output_info": item.get("output_info"),
+ "draft_style_info": item.get("style_info"),
+ "has_pending_changes": False,
+ "created_at": now,
+ "updated_at": now,
+ }
+ sessions.append(session)
+ item["outline_chat_active_session_id"] = session["id"]
+ return session
+
+ def _sync_outline_chat_state(self, item: Dict[str, Any]) -> tuple[Dict[str, Any], bool]:
+ changed = False
+ outline = self._normalize_ppt_outline(item.get("outline") or [])
+ if item.get("outline") != outline:
+ item["outline"] = outline
+ changed = True
+
+ output_info = self._normalize_ppt_output_info(item.get("output_info"), item=item)
+ if item.get("output_info") != output_info:
+ item["output_info"] = output_info
+ changed = True
+
+ style_info = self._normalize_ppt_style_info(item.get("style_info"))
+ supplement = list(style_info.get("supplement_prompt") or [])
+ guidance_text = str(item.get("guidance_snapshot_text") or "").strip()
+ if guidance_text and guidance_text not in supplement:
+ supplement.append(guidance_text)
+ directives = item.get("outline_global_directives") if isinstance(item.get("outline_global_directives"), list) else []
+ for directive in directives:
+ if not isinstance(directive, dict):
+ continue
+ text = str(directive.get("instruction") or directive.get("label") or "").strip()
+ if text and text not in supplement:
+ supplement.append(text)
+ if guidance_text or directives:
+ joined = " ".join(supplement)
+ if "商务" in joined and style_info.get("preset") == "clean":
+ style_info["preset"] = "business"
+ style_info["label"] = "商务汇报"
+ style_info["supplement_prompt"] = supplement
+ if item.get("style_info") != style_info:
+ item["style_info"] = style_info
+ changed = True
+ if directives:
+ item["outline_global_directives"] = []
+ item["outline_chat_draft_global_directives"] = []
+ changed = True
+ return item, changed
+
+ async def _apply_outline_chat(
+ self,
+ *,
+ item: Dict[str, Any],
+ outline: List[Dict[str, Any]],
+ output_info: Dict[str, Any],
+ style_info: Dict[str, Any],
+ global_directives: List[Dict[str, Any]],
+ intent_summary: Dict[str, Any],
+ history: List[Dict[str, Any]],
+ conversation_history: Optional[List[Dict[str, Any]]],
+ context_snapshot: Optional[Dict[str, Any]],
+ message: str,
+ active_slide_index: Optional[int],
+ email: str,
+ api_url: Optional[str],
+ api_key: Optional[str],
+ model: Optional[str],
+ ) -> Dict[str, Any]:
+ if self._is_ppt_style_only_message(
+ message=message,
+ intent_summary=intent_summary,
+ active_slide_index=active_slide_index,
+ ):
+ draft_style = self._normalize_ppt_style_info(style_info)
+ text = str(message or "").strip()
+ if "商务" in text:
+ draft_style.update({
+ "preset": "business",
+ "label": "商务汇报",
+ "tone": draft_style.get("tone") or "简洁、清晰、结论先行",
+ "visual_style": draft_style.get("visual_style") or "浅色背景、少量强调色、图文平衡",
+ })
+ supplement = list(draft_style.get("supplement_prompt") or [])
+ if text and text not in supplement:
+ supplement.append(text)
+ draft_style["supplement_prompt"] = supplement
+ return {
+ "outline": outline,
+ "draft_output_info": output_info,
+ "draft_style_info": draft_style,
+ "draft_global_directives": [],
+ "assistant_message": "已整理为风格信息候选修改,确认后会应用到 PPT 产出信息。",
+ "applied_scope": "style",
+ "applied_slide_index": None,
+ "change_summary": "更新风格信息",
+ "intent_summary": intent_summary,
+ }
+
+ return {
+ "outline": outline,
+ "draft_output_info": output_info,
+ "draft_style_info": style_info,
+ "draft_global_directives": global_directives,
+ "assistant_message": "已整理为候选大纲修改,确认后会应用到 PPT 产出文档。",
+ "applied_scope": "outline",
+ "applied_slide_index": active_slide_index,
+ "change_summary": "生成候选大纲",
+ "intent_summary": intent_summary,
+ }
+
+ async def outline_chat(
+ self,
+ *,
+ notebook_id: str,
+ notebook_title: str,
+ user_id: str,
+ email: str,
+ output_id: str,
+ message: str,
+ active_slide_index: Optional[int],
+ conversation_history: Optional[List[Dict[str, Any]]],
+ api_url: Optional[str],
+ api_key: Optional[str],
+ model: Optional[str],
+ ) -> tuple[Dict[str, Any], str, str, Optional[int], str, Dict[str, Any]]:
+ manifest_path = self._manifest_path(notebook_id, notebook_title, user_id)
+ manifest = self._read_manifest(manifest_path)
+ index, item = self._find_output(manifest, output_id)
+ item, _ = self._sync_outline_chat_state(item)
+ outline = self._normalize_ppt_outline(item.get("outline") or [])
+ output_info = self._normalize_ppt_output_info(item.get("output_info"), item=item)
+ style_info = self._normalize_ppt_style_info(item.get("style_info"))
+ intent_summary = self._build_outline_chat_intent_summary(
+ message=message,
+ active_slide_index=active_slide_index,
+ )
+ session = self._active_outline_chat_session(item)
+ messages = session.setdefault("messages", [])
+ now = self._now()
+ messages.append({
+ "id": f"message_{uuid4().hex}",
+ "role": "user",
+ "content": str(message or "").strip(),
+ "created_at": now,
+ })
+ mutation = await self._apply_outline_chat(
+ item=item,
+ outline=outline,
+ output_info=output_info,
+ style_info=style_info,
+ global_directives=[],
+ intent_summary=intent_summary,
+ history=messages,
+ conversation_history=None,
+ context_snapshot=None,
+ message=message,
+ active_slide_index=active_slide_index,
+ email=email,
+ api_url=api_url,
+ api_key=api_key,
+ model=model,
+ )
+ draft_outline = self._normalize_ppt_outline(mutation.get("outline") or outline)
+ draft_output_info = self._normalize_ppt_output_info(mutation.get("draft_output_info"), item=item)
+ draft_style_info = self._normalize_ppt_style_info(mutation.get("draft_style_info") or style_info)
+ draft_global_directives = list(mutation.get("draft_global_directives") or [])
+ assistant_message = str(mutation.get("assistant_message") or "已生成候选修改。")
+ messages.append({
+ "id": f"message_{uuid4().hex}",
+ "role": "assistant",
+ "content": assistant_message,
+ "created_at": self._now(),
+ })
+ session.update({
+ "draft_outline": draft_outline,
+ "draft_output_info": draft_output_info,
+ "draft_style_info": draft_style_info,
+ "draft_global_directives": draft_global_directives,
+ "has_pending_changes": True,
+ "intent_summary": mutation.get("intent_summary") or intent_summary,
+ "updated_at": self._now(),
+ })
+ item.update({
+ "outline_chat_draft_outline": draft_outline,
+ "outline_chat_draft_output_info": draft_output_info,
+ "outline_chat_draft_style_info": draft_style_info,
+ "outline_chat_draft_global_directives": draft_global_directives,
+ "outline_chat_has_pending_changes": True,
+ "updated_at": self._now(),
+ })
+ manifest[index] = item
+ self._write_manifest(manifest_path, manifest)
+ return (
+ item,
+ assistant_message,
+ str(mutation.get("applied_scope") or "outline"),
+ mutation.get("applied_slide_index"),
+ str(mutation.get("change_summary") or ""),
+ mutation.get("intent_summary") or intent_summary,
+ )
+
+ async def apply_outline_chat(
+ self,
+ *,
+ notebook_id: str,
+ notebook_title: str,
+ user_id: str,
+ output_id: str,
+ ) -> tuple[Dict[str, Any], str]:
+ manifest_path = self._manifest_path(notebook_id, notebook_title, user_id)
+ manifest = self._read_manifest(manifest_path)
+ index, item = self._find_output(manifest, output_id)
+ item, _ = self._sync_outline_chat_state(item)
+ session = self._active_outline_chat_session(item)
+ draft_outline = self._normalize_ppt_outline(session.get("draft_outline") or item.get("outline") or [])
+ draft_output_info = self._normalize_ppt_output_info(session.get("draft_output_info"), item=item)
+ draft_style_info = self._normalize_ppt_style_info(session.get("draft_style_info") or item.get("style_info"))
+ item["outline"] = draft_outline
+ item["output_info"] = draft_output_info
+ item["style_info"] = draft_style_info
+ item["outline_global_directives"] = []
+ item["outline_chat_draft_outline"] = draft_outline
+ item["outline_chat_draft_output_info"] = draft_output_info
+ item["outline_chat_draft_style_info"] = draft_style_info
+ item["outline_chat_draft_global_directives"] = []
+ item["outline_chat_has_pending_changes"] = False
+ session["draft_outline"] = draft_outline
+ session["draft_output_info"] = draft_output_info
+ session["draft_style_info"] = draft_style_info
+ session["draft_global_directives"] = []
+ session["has_pending_changes"] = False
+ session.setdefault("messages", []).append({
+ "id": f"message_{uuid4().hex}",
+ "role": "system",
+ "content": "已应用候选修改到正式大纲。",
+ "created_at": self._now(),
+ })
+ session["updated_at"] = self._now()
+ item["updated_at"] = self._now()
+ manifest[index] = item
+ self._write_manifest(manifest_path, manifest)
+ return item, "已应用候选修改到正式大纲。"
+
+ def discard_outline_chat(
+ self,
+ *,
+ notebook_id: str,
+ notebook_title: str,
+ user_id: str,
+ output_id: str,
+ ) -> tuple[Dict[str, Any], str]:
+ manifest_path = self._manifest_path(notebook_id, notebook_title, user_id)
+ manifest = self._read_manifest(manifest_path)
+ index, item = self._find_output(manifest, output_id)
+ item, _ = self._sync_outline_chat_state(item)
+ confirmed_outline = self._normalize_ppt_outline(item.get("outline") or [])
+ confirmed_output_info = self._normalize_ppt_output_info(item.get("output_info"), item=item)
+ confirmed_style_info = self._normalize_ppt_style_info(item.get("style_info"))
+ session = self._active_outline_chat_session(item)
+ session["draft_outline"] = confirmed_outline
+ session["draft_output_info"] = confirmed_output_info
+ session["draft_style_info"] = confirmed_style_info
+ session["draft_global_directives"] = []
+ session["has_pending_changes"] = False
+ session.setdefault("messages", []).append({
+ "id": f"message_{uuid4().hex}",
+ "role": "system",
+ "content": "已放弃上一版候选大纲,继续基于当前正式大纲讨论。",
+ "created_at": self._now(),
+ })
+ session["updated_at"] = self._now()
+ item["outline_chat_draft_outline"] = confirmed_outline
+ item["outline_chat_draft_output_info"] = confirmed_output_info
+ item["outline_chat_draft_style_info"] = confirmed_style_info
+ item["outline_chat_draft_global_directives"] = []
+ item["outline_chat_has_pending_changes"] = False
+ item["updated_at"] = self._now()
+ manifest[index] = item
+ self._write_manifest(manifest_path, manifest)
+ return item, "已放弃上一版候选大纲,继续基于当前正式大纲讨论。"
+
def _attach_ppt_page_images_from_disk(
self,
outline: List[Dict[str, Any]],
@@ -1969,6 +2364,7 @@ def save_outline(
outline: List[Dict[str, Any]],
pipeline_stage: Optional[str] = None,
enable_images: Optional[bool] = None,
+ manual_edit_log: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
manifest_path = self._manifest_path(notebook_id, notebook_title, user_id)
manifest = self._read_manifest(manifest_path)
@@ -2021,6 +2417,11 @@ def save_outline(
item["status"] = pipeline_stage
if enable_images is not None:
item["enable_images"] = bool(enable_images)
+ if manual_edit_log:
+ existing_log = item.get("manual_edit_log")
+ if not isinstance(existing_log, list):
+ existing_log = []
+ item["manual_edit_log"] = [*existing_log, *manual_edit_log]
if deck in ("ppt", "video"):
if should_reset_deck_state:
item = self._reset_ppt_generation_state(item)
@@ -2224,7 +2625,7 @@ async def _generate_via_existing_endpoint(
generate_quiz,
)
- payload_model = model or settings.KB_CHAT_MODEL
+ payload_model = model or settings.KB_CHAT_MODEL or "deepseek-v3.2"
if target_type == "mindmap":
return await generate_mindmap_from_kb(
file_paths=[str(md_path)],
@@ -2235,6 +2636,8 @@ async def _generate_via_existing_endpoint(
api_url=api_url,
api_key=api_key,
model=payload_model,
+ max_depth=10,
+ language="zh",
)
if target_type == "podcast":
return await generate_podcast_from_kb(
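Review note on `fastapi_app/services/output_v2_service.py`: `_is_ppt_style_only_message` decides whether an outline-chat message touches only the style by keyword matching. The sketch below restates that rule as a free function so its edge cases are easy to try. The keyword tuples are copied from the diff; the name `is_style_only` and the flattened `mode` argument are illustrative simplifications, not the service's API.

```python
from typing import Optional

STYLE_WORDS = ("风格", "商务", "配色", "视觉", "语气", "版式", "style")
OUTLINE_WORDS = ("第", "页", "标题", "要点", "大纲", "顺序", "删除", "新增", "拆成", "合并")


def is_style_only(
    message: str,
    mode: str = "",
    active_slide_index: Optional[int] = None,
) -> bool:
    """Return True when the message should be treated as a style-only change."""
    # A selected slide always routes the message to the outline branch.
    if active_slide_index is not None:
        return False
    # An explicit intent of "style" wins over keyword matching.
    if mode.strip() == "style":
        return True
    text = message.strip()
    if not text:
        return False
    has_style = any(word in text for word in STYLE_WORDS)
    has_outline = any(word in text for word in OUTLINE_WORDS)
    return has_style and not has_outline


print(is_style_only("换成商务风格"))                        # True
print(is_style_only("把第3页改成商务风格"))                 # False
print(is_style_only("换成商务风格", active_slide_index=0))  # False
print(is_style_only("Make it Business Style"))              # False
```

The last call shows one consequence of plain substring matching: the English keyword `style` is compared case-sensitively, so a capitalized `Style` falls through to the outline branch. Whether that matters depends on how much English input the feature is expected to see.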
diff --git a/requirements-base.txt b/requirements-base.txt
index 1919501..2d3281d 100644
--- a/requirements-base.txt
+++ b/requirements-base.txt
@@ -1,6 +1,6 @@
# ============================================
# Open-NotebookLM - 核心依赖
-# 环境: conda env thinkflow (Python 3.11)
+# Environment: Python 3.11 / 3.12
# ============================================
# ------ Core Framework ------
@@ -17,11 +17,11 @@ supabase>=2.0.0
# ------ LLM / LangChain / LangGraph ------
openai>=1.50.0
langchain==0.3.27
-langchain-openai>=0.3.0
-langchain-chroma>=0.2.0
-langchain-community>=0.3.0
-langchain-text-splitters>=0.3.0
-langgraph>=0.3.0
+langchain-openai>=0.3.35,<0.4.0
+langchain-chroma>=0.2.6,<0.3.0
+langchain-community>=0.3.31,<0.4.0
+langchain-text-splitters>=0.3.11,<0.4.0
+langgraph>=1.0.1,<1.1.0
tiktoken>=0.7.0
# ------ Vector Store / Embedding ------
@@ -49,7 +49,7 @@ moviepy>=1.0.3,<2.0
sqlalchemy>=2.0.0
sqlmodel>=0.0.21
duckdb>=1.1.0
-pandas>=2.2.0
+pandas>=2.2.0,<4.0.0
# ------ Web Scraping ------
beautifulsoup4>=4.12.0
diff --git a/requirements-dev.txt b/requirements-dev.txt
new file mode 100644
index 0000000..f431f61
--- /dev/null
+++ b/requirements-dev.txt
@@ -0,0 +1,5 @@
+# Development and test dependencies.
+-r requirements-base.txt
+
+pytest>=8.0.0
+pytest-asyncio>=0.23.0
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..17bf728
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,3 @@
+# Standard Python dependency entrypoint.
+# Keep the full backend dependency list in requirements-base.txt.
+-r requirements-base.txt
diff --git a/scripts/local_embedding_server.py b/scripts/local_embedding_server.py
new file mode 100644
index 0000000..9fe4d47
--- /dev/null
+++ b/scripts/local_embedding_server.py
@@ -0,0 +1,69 @@
+import os
+from typing import List, Union
+
+import torch
+import torch.nn.functional as F
+from fastapi import FastAPI
+from pydantic import BaseModel
+from transformers import AutoModel, AutoTokenizer
+
+
+MODEL_PATH = os.environ.get("EMBEDDING_MODEL", "/root/user/ldh/models/Qwen3-Embedding-0.6B")
+MAX_LENGTH = int(os.environ.get("EMBEDDING_MAX_LENGTH", "1024"))
+
+app = FastAPI()
+
+tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
+model = AutoModel.from_pretrained(
+ MODEL_PATH,
+ trust_remote_code=True,
+ torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
+)
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model.to(device)
+model.eval()
+
+
+class EmbeddingRequest(BaseModel):
+ input: Union[str, List[str]]
+ model: str | None = None
+
+
+def _last_token_pool(last_hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
+ sequence_lengths = attention_mask.sum(dim=1) - 1
+ batch_size = last_hidden_states.shape[0]
+ return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]
+
+
+@app.get("/health")
+def health():
+ return {"status": "ok", "model": MODEL_PATH}
+
+
+@app.post("/v1/embeddings")
+def embeddings(request: EmbeddingRequest):
+ texts = request.input if isinstance(request.input, list) else [request.input]
+ encoded = tokenizer(
+ texts,
+ padding=True,
+ truncation=True,
+ max_length=MAX_LENGTH,
+ return_tensors="pt",
+ )
+ encoded = {key: value.to(device) for key, value in encoded.items()}
+
+ with torch.inference_mode():
+ output = model(**encoded)
+ pooled = _last_token_pool(output.last_hidden_state, encoded["attention_mask"])
+ pooled = F.normalize(pooled, p=2, dim=1)
+
+ vectors = pooled.float().cpu().tolist()
+ return {
+ "object": "list",
+ "data": [
+ {"object": "embedding", "index": index, "embedding": vector}
+ for index, vector in enumerate(vectors)
+ ],
+ "model": request.model or MODEL_PATH,
+ "usage": {"prompt_tokens": 0, "total_tokens": 0},
+ }
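Review note on `scripts/local_embedding_server.py`: `_last_token_pool` indexes each row at `attention_mask.sum(dim=1) - 1`. That position is the last real token only when padding sits on the right. Which side the loaded tokenizer pads on is not visible in this diff, so the following is only a sketch of the difference, using plain lists instead of tensors. Both function names are illustrative.

```python
from typing import List


def last_index_by_length(mask: List[int]) -> int:
    """What the server computes: number of attended tokens minus one."""
    return sum(mask) - 1


def last_attended_index(mask: List[int]) -> int:
    """Position of the final attended token, independent of padding side."""
    for position in range(len(mask) - 1, -1, -1):
        if mask[position]:
            return position
    raise ValueError("attention mask has no attended tokens")


right_padded = [1, 1, 1, 0, 0]
left_padded = [0, 0, 1, 1, 1]

print(last_index_by_length(right_padded), last_attended_index(right_padded))  # 2 2
print(last_index_by_length(left_padded), last_attended_index(left_padded))    # 2 4
```

With right padding the two agree. With left padding the length-based index lands on an earlier token. Logging `tokenizer.padding_side` once at startup would confirm which case applies to the model being served.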
diff --git a/scripts/start_embedding_4b.sh b/scripts/start_embedding_4b.sh
index 4a35300..a5fdc71 100755
--- a/scripts/start_embedding_4b.sh
+++ b/scripts/start_embedding_4b.sh
@@ -1,21 +1,23 @@
#!/usr/bin/env bash
-# 启动 Octen-Embedding-4B vLLM OpenAI API 服务
+# Start the local OpenAI-compatible embedding API service
# 用法: bash start_4b.sh [port] [gpu_mem]
# 示例: bash start_4b.sh 8899 0.8
PORT="${1:-8899}"
GPU_MEM="${2:-0.8}"
-MODEL_PATH="/root/models/Octen-Embedding-4B"
+MODEL_PATH="${EMBEDDING_MODEL:-/root/user/ldh/models/Qwen3-Embedding-0.6B}"
+EMBEDDING_PYTHON_BIN="${EMBEDDING_PYTHON_BIN:-/opt/conda/bin/python}"
export TORCHDYNAMO_DISABLE=1
export PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK=True
export HF_ENDPOINT=https://hf-mirror.com
+export EMBEDDING_MODEL="${MODEL_PATH}"
# 检查 config.json 是否已 patch
-ARCH=$(python3 -c "import json; print(json.load(open('${MODEL_PATH}/config.json'))['architectures'][0])")
+ARCH=$("${EMBEDDING_PYTHON_BIN}" -c "import json; print(json.load(open('${MODEL_PATH}/config.json'))['architectures'][0])")
if [ "$ARCH" != "Qwen3ForCausalLM" ]; then
echo "Patching config.json: ${ARCH} -> Qwen3ForCausalLM"
- python3 -c "
+ "${EMBEDDING_PYTHON_BIN}" -c "
import json
cfg = json.load(open('${MODEL_PATH}/config.json'))
cfg['architectures'] = ['Qwen3ForCausalLM']
@@ -24,26 +26,11 @@ json.dump(cfg, open('${MODEL_PATH}/config.json', 'w'), indent=2)
fi
echo "Model: ${MODEL_PATH}"
+echo "Python: ${EMBEDDING_PYTHON_BIN}"
echo "Port: ${PORT}"
echo "GPU mem: ${GPU_MEM}"
echo "---"
-exec python3 -c "
-import torch
-torch.compile = lambda fn=None, *a, **kw: fn if fn else (lambda f: f)
-
-import sys
-sys.argv = [
- 'vllm', 'serve', '${MODEL_PATH}',
- '--task', 'embed',
- '--trust-remote-code',
- '--enforce-eager',
- '--dtype', 'auto',
- '--gpu-memory-utilization', '${GPU_MEM}',
- '--max-model-len', '1024',
- '--host', '0.0.0.0',
- '--port', '${PORT}',
-]
-from vllm.scripts import main
-main()
-"
+exec "${EMBEDDING_PYTHON_BIN}" -m uvicorn scripts.local_embedding_server:app \
+ --host 0.0.0.0 \
+ --port "${PORT}"
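Review note on `scripts/start_embedding_4b.sh`: the inline `python -c` snippets read `config.json`, compare the first `architectures` entry with `Qwen3ForCausalLM`, and rewrite the file in place when it differs. A rough Python equivalent is sketched below. `ensure_architecture` is a hypothetical helper, and unlike the script it tolerates a missing `architectures` key instead of raising.

```python
import json
import tempfile
from pathlib import Path

TARGET_ARCH = "Qwen3ForCausalLM"  # value the script enforces


def ensure_architecture(config_path: Path, target: str = TARGET_ARCH) -> bool:
    """Set architectures to [target]; return True if the file was rewritten."""
    cfg = json.loads(config_path.read_text(encoding="utf-8"))
    current = (cfg.get("architectures") or [None])[0]
    if current == target:
        return False
    cfg["architectures"] = [target]
    config_path.write_text(json.dumps(cfg, indent=2), encoding="utf-8")
    return True


# Demonstration against a throwaway config file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps({"architectures": ["SomeOtherArch"]}), encoding="utf-8")
    print(ensure_architecture(path))  # True: file was rewritten
    print(ensure_architecture(path))  # False: already patched
```

Two details of the new launch command are worth checking during review. The script still accepts and prints `GPU_MEM`, but the `uvicorn` invocation no longer passes it anywhere. The module path `scripts.local_embedding_server:app` is resolved relative to the working directory, so the script presumably has to be started from the repository root.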
diff --git a/workflow_engine/toolkits/multimodaltool/ppt_tool.py b/workflow_engine/toolkits/multimodaltool/ppt_tool.py
index e7e55d1..b0467ff 100644
--- a/workflow_engine/toolkits/multimodaltool/ppt_tool.py
+++ b/workflow_engine/toolkits/multimodaltool/ppt_tool.py
@@ -298,7 +298,7 @@ def images_to_pdf(image_paths: Sequence[str], output_pdf_path: str) -> str:
imgs.append(im)
if not imgs:
raise ValueError("No images for PDF.")
- imgs[0].save(output_pdf_path, save_all=True, append_images=imgs[1:])
+ imgs[0].save(output_pdf_path, format="PDF", save_all=True, append_images=imgs[1:])
return output_pdf_path
diff --git a/workflow_engine/toolkits/multimodaltool/req_tts.py b/workflow_engine/toolkits/multimodaltool/req_tts.py
index 094d2ab..07357f1 100644
--- a/workflow_engine/toolkits/multimodaltool/req_tts.py
+++ b/workflow_engine/toolkits/multimodaltool/req_tts.py
@@ -3,6 +3,7 @@
import wave
import base64
import io
+import asyncio
from typing import Optional, List
import httpx
from workflow_engine.logger import get_logger
@@ -13,6 +14,8 @@
log = get_logger(__name__)
+_RETRYABLE_TTS_STATUS_CODES = {408, 409, 425, 429, 500, 502, 503, 504}
+
# 全局模型缓存
_qwen_native_model = None
@@ -191,13 +194,29 @@ async def _call_dashscope_qwen_tts(
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
}
- log.info(f"[TTS] DashScope Qwen TTS POST {url}")
+ max_attempts = max(1, int(kwargs.get("max_attempts") or 1))
async with httpx.AsyncClient(timeout=httpx.Timeout(timeout), http2=False) as client:
- resp = await client.post(url, headers=headers, json=payload)
- log.info(f"[TTS] dashscope status={resp.status_code}")
- if resp.status_code >= 400:
- log.error(f"[TTS] dashscope body={resp.text[:1000]}")
- resp.raise_for_status()
+ for attempt in range(1, max_attempts + 1):
+ log.info(f"[TTS] DashScope Qwen TTS POST {url} attempt={attempt}/{max_attempts}")
+ resp = await client.post(url, headers=headers, json=payload)
+ log.info(f"[TTS] dashscope status={resp.status_code}")
+ if resp.status_code >= 400:
+ log.error(f"[TTS] dashscope body={resp.text[:1000]}")
+ if resp.status_code in _RETRYABLE_TTS_STATUS_CODES and attempt < max_attempts:
+ retry_after = resp.headers.get("Retry-After")
+ try:
+ delay = float(retry_after) if retry_after else min(2 ** attempt, 20)
+ except (TypeError, ValueError):
+ delay = min(2 ** attempt, 20)
+ log.warning(
+ "[TTS] DashScope temporary error status=%s, retrying in %.1fs",
+ resp.status_code,
+ delay,
+ )
+ await asyncio.sleep(delay)
+ continue
+ resp.raise_for_status()
+ break
data = resp.json()
audio = (data.get("output") or {}).get("audio") or {}
audio_b64 = audio.get("data")
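Review note on `workflow_engine/toolkits/multimodaltool/req_tts.py`: the retry loop waits for the `Retry-After` value when it parses as a number, and otherwise uses exponential backoff capped at 20 seconds. The delay rule can be sketched in isolation as follows. `retry_delay` is an illustrative name; the arithmetic follows the diff.

```python
from typing import Optional

MAX_BACKOFF_SECONDS = 20


def retry_delay(attempt: int, retry_after: Optional[str]) -> float:
    """Honor a numeric Retry-After header, else use capped exponential backoff."""
    backoff = float(min(2 ** attempt, MAX_BACKOFF_SECONDS))
    if not retry_after:
        return backoff
    try:
        return float(retry_after)
    except (TypeError, ValueError):
        # For example the HTTP-date form of Retry-After, which float() cannot parse.
        return backoff


for attempt in range(1, 6):
    print(attempt, retry_delay(attempt, None))  # 2.0, 4.0, 8.0, 16.0, 20.0

print(retry_delay(1, "7"))                              # 7.0
print(retry_delay(3, "Wed, 21 Oct 2015 07:28:00 GMT"))  # 8.0
```

Note that `max_attempts` is read from `kwargs` and defaults to 1, so the loop only retries when a caller passes a larger value. With the default, behavior matches the previous single-request code path.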
diff --git a/workflow_engine/toolkits/p2vtool/p2v_tool.py b/workflow_engine/toolkits/p2vtool/p2v_tool.py
index 136caa7..95a0960 100644
--- a/workflow_engine/toolkits/p2vtool/p2v_tool.py
+++ b/workflow_engine/toolkits/p2vtool/p2v_tool.py
@@ -817,7 +817,7 @@ def _validate_talking_video_output(
audio_dur = get_audio_length(audio_path)
except Exception as e:
log.warning("[talking-video] cannot get duration for validation: %s", e)
- return False
+ return True
if audio_dur <= 0:
return True
ratio = video_dur / audio_dur
diff --git a/workflow_engine/toolkits/p2vtool/red.png b/workflow_engine/toolkits/p2vtool/red.png
index 960905a..32f848f 100644
Binary files a/workflow_engine/toolkits/p2vtool/red.png and b/workflow_engine/toolkits/p2vtool/red.png differ
diff --git a/workflow_engine/workflow/wf_paper2video.py b/workflow_engine/workflow/wf_paper2video.py
index 6caf47b..18d7382 100644
--- a/workflow_engine/workflow/wf_paper2video.py
+++ b/workflow_engine/workflow/wf_paper2video.py
@@ -13,8 +13,9 @@
import asyncio
import json
import multiprocessing
+import os
import subprocess
-from concurrent.futures import ThreadPoolExecutor
+import time
from pathlib import Path
from workflow_engine.agentroles import create_vlm_agent
@@ -267,9 +268,34 @@ def generate_speech(state: Paper2VideoState) -> Paper2VideoState:
)
)
- max_workers = min(3, len(all_tasks)) if all_tasks else 1
- with ThreadPoolExecutor(max_workers=max_workers) as ex:
- results = list(ex.map(P2V.speech_task_wrapper_with_cloud_tts, all_tasks))
+ try:
+ request_interval = max(
+ 0.0,
+ float(os.getenv("PAPER2VIDEO_TTS_INTERVAL_SECONDS", "1.2")),
+ )
+ except ValueError:
+ request_interval = 1.2
+ results = []
+ for task_index, task in enumerate(all_tasks, start=1):
+ slide_idx, idx, _prompt, out_wav, *_ = task
+ if Path(out_wav).is_file():
+ try:
+ duration = P2V.get_audio_length(out_wav)
+ if duration > 0:
+ log.info(
+ "paper2video_continue: 复用已生成 TTS slide=%s idx=%s wav=%s",
+ slide_idx,
+ idx,
+ out_wav,
+ )
+ results.append((slide_idx, idx, duration, out_wav))
+ continue
+ except Exception as exc:
+ log.warning("paper2video_continue: 已有 TTS wav 不可用,将重新生成 %s: %s", out_wav, exc)
+ Path(out_wav).unlink(missing_ok=True)
+ results.append(P2V.speech_task_wrapper_with_cloud_tts(task))
+ if request_interval and task_index < len(all_tasks):
+ time.sleep(request_interval)
log.info("paper2video_continue: TTS 完成,共 %s 个片段,开始合并每页 wav", len(all_tasks))
organized: dict = {}
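Review note on `workflow_engine/workflow/wf_paper2video.py`: the thread pool is replaced by a sequential loop that pauses between TTS requests. The pause comes from `PAPER2VIDEO_TTS_INTERVAL_SECONDS`, is clamped to be non-negative, and falls back to 1.2 seconds when the value does not parse. A small sketch of that parsing, with the environment passed in as a mapping so it can be exercised without touching `os.environ` (`tts_interval_seconds` is a hypothetical name):

```python
import os
from typing import Mapping, Optional

DEFAULT_INTERVAL_SECONDS = 1.2
ENV_NAME = "PAPER2VIDEO_TTS_INTERVAL_SECONDS"


def tts_interval_seconds(env: Optional[Mapping[str, str]] = None) -> float:
    """Read the pause between TTS requests, clamped at zero, with a default."""
    source = os.environ if env is None else env
    raw = source.get(ENV_NAME, str(DEFAULT_INTERVAL_SECONDS))
    try:
        return max(0.0, float(raw))
    except ValueError:
        return DEFAULT_INTERVAL_SECONDS


print(tts_interval_seconds({}))                 # 1.2
print(tts_interval_seconds({ENV_NAME: "0"}))    # 0.0
print(tts_interval_seconds({ENV_NAME: "-3"}))   # 0.0
print(tts_interval_seconds({ENV_NAME: "fast"})) # 1.2
```

Setting the variable to `0` disables the pause, since the loop only sleeps when the interval is truthy. Segments whose wav file already exists with a positive duration are reused and skip both the request and the pause.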