add data analysis workflow #13

Open
Anecdote-Q wants to merge 39 commits into OpenDCAI:main from Anecdote-Q:qry-dev

@Anecdote-Q

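Based on the recorded meeting, my understanding is that I only need to submit the backend workflow code, so I have not pushed the frontend changes.
Besides the workflow, I also changed a bit of the multi-data upload code; keep or drop those changes as you see fit.
Please review it and see whether there are any problems; contact me anytime if there are.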

Huangdingcheng and others added 30 commits February 10, 2026 10:53
- Add flashcard data models and API endpoint
- Implement LLM-based flashcard generation service
- Create FlashcardGenerator and FlashcardViewer components
- Integrate flashcard feature into NotebookView
- Add development startup scripts
- Update Vite config for remote development

Co-Authored-By: Huangdingcheng <Apollo6662023@outlook.com>
Implement comprehensive Quiz functionality similar to NotebookLM:
- Single-choice questions with 4 options (A/B/C/D)
- Skip functionality for questions
- Statistics tracking (Right/Wrong/Skipped)
- Review Quiz with highlighted correct answers and detailed explanations
- Retake Quiz capability
- LLM-based question generation with quality prompts

Backend changes:
- Add Quiz data models (QuizOption, QuizQuestion, etc.) in schemas.py
- Create quiz_service.py for LLM-based question generation
- Add /generate-quiz API endpoint in kb.py
- Implement JSON parsing with error recovery for truncated responses
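
The commit message does not show how the recovery works. One possible approach, sketched below, assumes the LLM returns a JSON array of question objects and that a truncated response is cut off partway through the last element. The function name `parse_partial_json_array` is hypothetical and is not taken from quiz_service.py.

```python
import json


def parse_partial_json_array(text: str) -> list:
    """Return every complete element of a JSON array, even if the text is cut off."""
    try:
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        pass  # fall through to element-by-element recovery

    decoder = json.JSONDecoder()
    items = []
    pos = text.find("[")
    if pos == -1:
        return items
    pos += 1
    while pos < len(text):
        # Skip whitespace and the commas between elements.
        while pos < len(text) and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(text) or text[pos] == "]":
            break
        try:
            item, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            break  # the rest is a truncated element; drop it
        items.append(item)
    return items
```

With this approach a response that ends mid-question still yields all earlier questions, at the cost of silently dropping the incomplete one.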

Frontend changes:
- Create QuizGenerator component for quiz creation
- Create QuizQuestion component for single question display
- Create QuizResults component with circular progress and statistics
- Create QuizReview component with answer explanations
- Create QuizContainer component as main orchestrator
- Integrate Quiz into NotebookView with Brain icon
- Update types to include 'quiz' in ToolType

Co-Authored-By: Huangdingcheng <Apollo6662023@outlook.com>
feat: Add Quiz feature with LLM-generated questions
…-to-load

Add backend read endpoints (list-flashcard-sets, list-quiz-sets,
get-flashcard-set, get-quiz-set) and wire frontend outputFeed items
to load saved sets from disk, surviving page refresh.

Co-Authored-By: Claude <noreply@anthropic.com>
…/flashcard/quiz)

- Rewrite /outputs disk fallback to scan notebook-centric paths via get_notebook_paths
- Frontend fetchOutputHistory now also calls list-flashcard-sets & list-quiz-sets
- mergeOutputFeeds preserves setId field during dedup
- Pass notebook_title to /outputs for reliable directory resolution

Co-Authored-By: Claude <noreply@anthropic.com>
…tart guide

- Rename Chinese image filenames to English for GitHub compatibility
- Add screenshots for all current features (12 total, collapsible)
- Update feature table: add flashcards, quizzes, web search, deep research
- Rewrite Quick Start with clear API config instructions (LLM/Search/Supabase)
- Add model configuration section (3-layer system)
- Restructure requirements-base.txt with categorized dependencies and version pins

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix introduce modal title (zh): remove outdated audio/video reference
- Add local embedding server (Octen-Embedding-0.6B) and download script
- Add logo assets for both frontends
- Update flashcard/quiz UI styling across en/zh frontends
- Update Dashboard and NotebookView with latest UI improvements
- Add docs guides (context_optimization, kb_source_flow)
- Remove obsolete requirements files (paper, win-base)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix openai version constraint (>=1.104.2,<2.0.0) to match langchain-openai==0.3.33
- Add missing deps: paddlepaddle, paddleocr, mineru_vl_utils, loguru
- Align requirements-backup.txt versions to notebook env (100 packages updated)

Co-Authored-By: Claude <noreply@anthropic.com>
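Fully integrate Alibaba DeepResearch into the Open-NotebookLM project, with two research modes:
- Full DeepResearch: multi-turn ReAct reasoning with tool calls
- Simplified mode: quick search and content crawling

Main changes:
- Add the DeepResearch core module under fastapi_app/deep_research/
- Implement MultiTurnReactAgent for multi-turn reasoning
- Integrate 5 tools: Search, Visit, Python, Scholar, FileParser
- Switch token counting from the HuggingFace tokenizer to tiktoken
- Fix API calls to support full URLs and pass the API key correctly
- Fix tools to read environment variables dynamically, avoiding static binding at module load time
- Add Jina API error handling to prevent infinite retries
- Implement the DeepResearchIntegration service class
- Update the API endpoint to support parameter passing and switching between the two modes
- Reduce the retry count from 10 to 3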
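
The difference between static binding at module load and a dynamic read of an environment variable can be illustrated with a minimal sketch. The variable name `DEMO_SEARCH_API_KEY` and the function `get_search_key` are hypothetical and do not come from the DeepResearch tools.

```python
import os

# Static binding: evaluated once, when the module is imported.
# A key placed in os.environ afterwards is never seen by this constant.
KEY_AT_IMPORT = os.environ.get("DEMO_SEARCH_API_KEY", "")


def get_search_key() -> str:
    """Dynamic lookup: read the variable each time the tool runs."""
    return os.environ.get("DEMO_SEARCH_API_KEY", "")
```

If the API layer sets the key per request after the tool module has been imported, only the function form picks up the new value.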

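Technical details:
- Built on the qwen-agent framework
- Supports Serper API (search) and Jina API (web page access)
- Compatible with LLM services that use the OpenAI API format
- Configurable maximum iteration count (default 50)
- Token limit of 110K; content beyond the limit is truncated automatically

Configuration:
- Pass configuration parameters through the API request
- Set defaults through the .env file
- Supports custom LLM, search, and web access APIs

Co-Authored-By: Huangdingcheng <Apollo6662023@outlook.com>
feat: integrate Alibaba DeepResearch deep research feature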
fix: resolve frontend and backend default port conflict
…v3.2

Merged PR OpenDCAI#6 and corrected port references across all docs and configs
to use port 8000. Reverted default LLM model change from gpt-4o back
to deepseek-v3.2 in frontend_zh.

Co-Authored-By: Claude <noreply@anthropic.com>
…ana 2 support

- Embedding server: auto-pick free GPU via nvidia-smi, fallback to CPU
- Main app: skip re-launching embedding subprocess on --reload if already running
- DeepResearch: add missing deps (qwen-agent, alibabacloud-docmind, sandbox-fusion, etc.)
- DeepResearch: set API_KEY/API_BASE/SUMMARY_MODEL_NAME env vars for visit tool
- Frontend: always send search_api_key for all providers (including Serper)
- Frontend: show Search API Key input for Serper provider in settings
- PPT generation: add Nano Banana 2 image model option
- Docs: remove invalid `pip install -e .`, add changelog
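
For the GPU auto-pick item, a minimal sketch of one way to do it is shown here. It assumes "free" means memory use below a threshold; the 500 MiB default and the function names are illustrative assumptions, not the embedding server's actual logic.

```python
import subprocess


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int]]:
    """Parse 'index, memory.used' rows into (index, used_mib) pairs."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1])))
    return rows


def choose_device(rows: list[tuple[int, int]], max_used_mib: int = 500) -> str:
    """Return 'cuda:N' for the least-loaded GPU under the threshold, else 'cpu'."""
    free = [row for row in rows if row[1] <= max_used_mib]
    if not free:
        return "cpu"
    index, _ = min(free, key=lambda row: row[1])
    return f"cuda:{index}"


def pick_device(max_used_mib: int = 500) -> str:
    """Query nvidia-smi and fall back to CPU if it is missing or fails."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=index,memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return "cpu"
    return choose_device(parse_gpu_memory(out), max_used_mib)
```

Keeping the parsing and selection separate from the subprocess call makes the selection rule testable on machines without a GPU.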

Co-Authored-By: Claude <noreply@anthropic.com>
- Implement block-based editor with drag-and-drop support
- Add slash command menu with multiple block types (text, headings, lists, code, etc.)
- Integrate AI chat feature for in-note assistance
- Support text auto-wrap and Enter key text splitting
- Add block operations (add, delete, type conversion)
- Implement both English and Chinese versions

Co-Authored-By: Huangdingcheng <Apollo6662023@outlook.com>
Major Features:
- AI-assisted note editing: polish, rewrite, and source-based AI generation
- AI Panel: organize content with presets (summarize, outline, FAQ, etc.)
- Text Selection Toolbar: quick AI actions on selected text with diff preview
- Memory context: maintain conversation history for better AI responses

Bug Fixes:
- Fix reversed input issue by replacing contentEditable with textarea
- Fix first character deletion issue by consolidating event handling
- Fix numbered list indexing for mixed block types
- Simplify Backspace logic to avoid conflicts

Technical Improvements:
- Consolidate onChange/onInput handling to prevent event conflicts
- Use textarea.value consistently instead of mixed value/textContent
- Support both English and Chinese versions

Co-Authored-By: Huangdingcheng <Apollo6662023@outlook.com>
feat: add Notion-style block editor for notes
- Phase 1: Split env config into .env and .env.models
- Phase 2: Remove 19 unused workflows, keep only 3 active
- Phase 3: Remove unused TTS adapters, keep Qwen and FireRed
- Update settings.py to load both env files
- Add TTS config fields to AppSettings
- Refactored 3 workflows (intelligent_qa, mindmap, podcast) to use file_ids instead of file paths
- Created features/ directory structure for better code organization
- Added ProcessedDataLoader for unified access to processed files from manifest
- Enhanced manifest structure to support multiple parsers (mineru, sam3, ocr, etc.)
- Removed direct file parsing from consumption layer (no fitz.open, Document, Presentation)
- Created shared utilities (extract_text_result, ProcessedDataLoader)
- Updated State definitions to use file_ids and vector_store_base_dir
- Extended workflow registration to auto-discover features/ directory

Architecture:
- Ingestion layer: Parses files once using tools (MinerU, SAM3, etc.), stores in manifest
- Consumption layer: Workflows read processed data via ProcessedDataLoader, never re-parse
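
The ingestion/consumption split can be sketched as follows. The manifest layout (a JSON object mapping each file ID to its per-parser output paths) and the class `ManifestLoader` are assumptions made for illustration; the real ProcessedDataLoader and manifest format may differ.

```python
import json
from pathlib import Path


class ManifestLoader:
    """Hypothetical stand-in for ProcessedDataLoader; the real class may differ."""

    def __init__(self, manifest_path: Path):
        self.base_dir = manifest_path.parent
        # Assumed layout: {"<file_id>": {"<parser>": "<relative output path>"}}
        self.manifest = json.loads(manifest_path.read_text(encoding="utf-8"))

    def load_text(self, file_id: str, parser: str = "mineru") -> str:
        """Return text produced earlier by `parser`; never re-parse the source file."""
        outputs = self.manifest.get(file_id, {})
        if parser not in outputs:
            raise KeyError(f"no {parser!r} output recorded for file {file_id!r}")
        return (self.base_dir / outputs[parser]).read_text(encoding="utf-8")
```

A workflow would then receive only `file_ids` and call `load_text` for each, so the original PDF or DOCX is never opened at consumption time.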

Co-Authored-By: Claude <noreply@anthropic.com>
…code

- Renamed dataflow_agent/ to workflow_engine/
- Updated all import paths from dataflow_agent to workflow_engine
- Deleted 34 unused agent files from paper2any_agents/
- Deleted unused toolkits (drawio_tools, dockertool, image2drawio, p2vtool, etc.)
- Deleted unused prompt templates (kept only pt_qa_agent_repo.py)
- Deleted unused directories (states, storage, trajectory, templates, resources)
- Removed common_agents and infra_agents directories
- Cleaned up agentroles/__init__.py exports

Result: Reduced codebase by ~60%, kept only 3 active workflows and their dependencies

Co-Authored-By: Claude <noreply@anthropic.com>
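Main changes:
1. Frontend AIPanel improvements
   - Pass the notebook prop through to support notebook-level RAG retrieval
   - Select all source files by default for a better user experience
   - Add detailed frontend debug logging

2. Backend /api/v1/kb/chat endpoint fixes
   - Add support for the email and notebook_id parameters
   - Fix the vector_store path lookup to support the new outputs/{email}/{title}_{id}/vector_store layout
   - Automatically scan for the matching notebook directory
   - Add detailed backend debug logging

3. workflow_engine/wf_intelligent_qa.py improvements
   - Add more debug logging to the _try_rag_retrieve function
   - Improve the path-matching logic
   - Support a no-file mode (using only the query context)

4. Clean up docs and old code
   - Delete the old workflow_engine/features/ directory
   - Delete outdated docs and database scripts

Testing: the note AI can now correctly retrieve file chunks from the vector store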
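
A directory scan for the outputs/{email}/{title}_{id}/vector_store layout could look like the sketch below. The function name `find_vector_store` and the suffix-matching rule are assumptions; the actual lookup in kb.py is not shown in this PR.

```python
from pathlib import Path
from typing import Optional


def find_vector_store(outputs_root: Path, email: str, notebook_id: str) -> Optional[Path]:
    """Scan outputs/{email}/ for a '{title}_{id}' directory that has a vector_store."""
    user_dir = outputs_root / email
    if not user_dir.is_dir():
        return None
    for notebook_dir in sorted(user_dir.iterdir()):
        # Match on the ID suffix so that titles containing underscores still work.
        if notebook_dir.is_dir() and notebook_dir.name.endswith(f"_{notebook_id}"):
            candidate = notebook_dir / "vector_store"
            if candidate.is_dir():
                return candidate
    return None
```

Returning `None` instead of raising lets the caller fall back to the no-file mode when no vector store exists yet.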

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
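## Design system overhaul
- Implement the Editorial Workspace design direction (magazine style, warm neutrals, coral accent)
- Create design-tokens.ts to manage design tokens centrally
- Update tailwind.config.js to apply the editorial theme
- Rewrite index.css with the editorial styles

## UI component library
New components under src/components/ui/:
- Button: 4 variants (primary, secondary, ghost, accent)
- Input: supports labels, errors, and icons
- Card: includes Header, Content, and Footer subcomponents
- Modal: animated dialog
- Badge: status indicators and labels

## Visual updates
- Color: #007AFF (blue) → #F43F5E (coral)
- Neutrals: pure gray → warm neutrals (#FAFAF9 - #1C1917)
- Fonts: system fonts → Newsreader (headings) + Inter (body)
- Effects: flat → lifted shadow

## Component adoption
- Redesign AIPanel with the editorial colors and the Badge component
- Applied to both frontend_zh and frontend_en

## Bug fixes
- Fix the UnboundLocalError in kb.py caused by a duplicate Path import
- Fix the @import order warning in index.css

## Documentation
- Add EDITORIAL_DESIGN_SYSTEM.md, the full design system documentation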

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
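- Add start-services.sh to start the frontend and backend services in tmux
- Add stop-services.sh to stop all services
- Automatically activate the conda environment szl-dev
- Automatically start cpolar and obtain the public domain
- Backend port 8213, frontend port 3001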

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
asd765973346 and others added 9 commits March 11, 2026 23:31
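- Refactor: enforce a strict feature-layered architecture
- Fix the frontend drag-panel bug (NotebookView.tsx)
- Streamline config files: merge the .env configs and delete redundant model config
- Delete the unused KB_EMBEDDING_MODEL setting
- Update .gitignore with a complete set of ignore rules
- Update README: add TTS configuration notes and improve the Supabase notes
- Remove obsolete files: .env_mock, .env.models, EDITORIAL_DESIGN_SYSTEM.md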

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Change frontend dev server port from 3001 to 3000
- Update backend proxy target from 8213 to 8000
- Ensures consistency with documented setup in README
- Change frontend dev server port from 26202 to 3000
- Update backend proxy target from 8213 to 8000
- Ensures consistency with documented setup
- Add start-simple.sh: start services in the background and print port information
- Improve start-services.sh: add cpolar retry logic
- Update stop-services.sh: support stopping background services

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add the Deep Research dependencies (qwen-agent, alibabacloud SDK, sandbox-fusion)
- Add TTS dependencies (qwen-tts, fireredtts2)
- Add database client dependencies (supabase and related packages)
- Add sentence-transformers for embeddings
- Update requirements-base.txt and requirements-backup.txt in sync

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
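- Logging: add contextvars support to automatically track request_id and user information
- Middleware: add LoggingMiddleware to log every request automatically
- Error handling: unify error handling; log full details but return safe messages
- CORS fix: APIKeyMiddleware skips OPTIONS preflight requests
- Frontend config: support VITE_API_BASE_URL for setting the backend address
- Startup scripts: simplify to start.sh and stop.sh with automatic port cleanup
- Docs: add frontend configuration notes to the README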
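
Tracking a request ID through `contextvars` can be sketched with the standard library alone. The names `request_id_var`, `RequestIdFilter`, and `handle_request` are hypothetical, and `handle_request` only stands in for what a middleware such as LoggingMiddleware might do; the PR's actual implementation is not shown here.

```python
import contextvars
import logging
import uuid

# Each request (task or thread) sees its own value of this variable.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def build_logger(name: str = "app") -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(levelname)s [%(request_id)s] %(message)s")
        )
        handler.addFilter(RequestIdFilter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def handle_request(logger: logging.Logger) -> str:
    """Stand-in for middleware: bind an ID, do the work, then restore the context."""
    token = request_id_var.set(uuid.uuid4().hex[:8])
    try:
        logger.info("request started")
        return request_id_var.get()
    finally:
        request_id_var.reset(token)
```

Because the ID lives in a context variable, code deep in a workflow can log without having the request object passed down to it.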

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
4 participants