Commit 3308318

Merge branch 'OpenDCAI:main' into main
2 parents: 2332582 + eba7912

93 files changed
Lines changed: 475 additions & 20636 deletions


.dockerignore

Lines changed: 26 additions & 0 deletions

New file:

```
# Git
.git
.gitignore

# Docker
.dockerignore

# IDE settings
.vscode/
.idea/

# Python artifacts
__pycache__/
*.pyc
*.pyo
*.pyd
*.egg-info/
dist/
build/
*.egg
*.whl

# Virtual environments
.venv
venv/
env/
```
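The exclusion rules above can be sketched in Python with `fnmatch`. This is a rough approximation for illustration only: Docker's actual matcher follows Go's `filepath.Match` semantics plus `**` support, so edge cases differ, and the `patterns` list below is a hand-picked subset of the file.

```python
from fnmatch import fnmatch

# A subset of the .dockerignore patterns above (directory patterns
# normalized by stripping the trailing slash); illustrative only.
patterns = [".git", ".vscode", "__pycache__", "*.pyc", "*.egg-info", "venv"]

def excluded(path: str) -> bool:
    # A path is excluded if the whole relative path, or any one of
    # its segments, matches one of the patterns.
    parts = path.split("/")
    return any(
        fnmatch(part, pat) or fnmatch(path, pat)
        for part in parts
        for pat in patterns
    )

print(excluded("dataflow/__pycache__/utils.cpython-310.pyc"))  # True
print(excluded("dataflow/operators/filter.py"))                # False
```

Keeping caches and virtual environments out of the build context mainly speeds up the `COPY . ./DataFlow/` step and shrinks the resulting image layers.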

.gitignore

Lines changed: 1 addition & 0 deletions

```diff
@@ -6,6 +6,7 @@
 /dataflow/example/ReasoningPipeline/pipeline_math_step*.jsonl
 !/dataflow/example/ReasoningPipeline/pipeline_math.json
 
+.DS_Store
 
 dataflow/example/KBCleaningPipeline/raw
 requirement_added.txt
```

Dockerfile

Lines changed: 39 additions & 0 deletions

New file:

```dockerfile
# Base image
FROM --platform=linux/amd64 nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04

# Environment variables
ENV DEBIAN_FRONTEND=noninteractive \
    LANG=C.UTF-8 LC_ALL=C.UTF-8 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1

# System dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-venv python3-pip python3-dev \
    git build-essential pkg-config \
    ffmpeg libgl1 libglib2.0-0 \
    ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*

# Python environment
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"
RUN python -m pip install --upgrade pip wheel

# Set up pip mirror
RUN mkdir -p /etc && \
    printf "[global]\nindex-url = https://pypi.tuna.tsinghua.edu.cn/simple\ntimeout = 120\ntrusted-host = pypi.tuna.tsinghua.edu.cn\n" > /etc/pip.conf

# Set up the application directory and copy source code into a subdirectory
# DataFlow commit b27d6bc24cf86835fda7bc6fe1a289cb9eb63bd2
WORKDIR /app
COPY . ./DataFlow/

# Set the working directory to the project source
WORKDIR /app/DataFlow

# Install the project in editable mode with its dependencies
RUN pip install -e ".[vllm]"

# Set the container's default command to an interactive shell
CMD ["/bin/bash"]
```
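The pip-mirror step can be checked outside the image by reproducing the same `printf` locally (writing to `/tmp/pip.conf` here instead of `/etc/pip.conf`, so nothing on the host is touched):

```shell
# Reproduce the pip.conf the Dockerfile writes; pip reads the
# [global] section for its default index URL and timeout.
printf "[global]\nindex-url = https://pypi.tuna.tsinghua.edu.cn/simple\ntimeout = 120\ntrusted-host = pypi.tuna.tsinghua.edu.cn\n" > /tmp/pip.conf
cat /tmp/pip.conf
```

The Tsinghua mirror is a regional choice; users elsewhere can drop this layer or point `index-url` at their preferred index.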

README-zh.md

Lines changed: 29 additions & 24 deletions

````diff
@@ -120,45 +120,50 @@
 You are using the latest version: 1.0.0.
 ```
 
-### 🚀 5.2 Using the Gradio Web Interface
+#### 🐳 5.1.1 Docker Installation (Alternative)
 
-DataFlow provides two interactive web interfaces to help you use operators, pipelines, and agents:
+We also provide a **Dockerfile** for easy deployment, as well as a **pre-built Docker image** for direct use.
 
-#### 5.2.1 DataFlow Operators Interface
+##### Option 1: Use the pre-built Docker image
 
-Launch the DataFlow operator interface to test and visualize all operators and pipelines:
+You can directly pull and use our pre-built Docker image:
 
-```bash
-dataflow webui
-```
+```shell
+# Pull the pre-built image
+docker pull molyheci/dataflow:cu124
 
-This command starts an interactive web interface that lets you visualize and flexibly use all operators and pipelines.
+# Run the container with GPU support
+docker run --gpus all -it molyheci/dataflow:cu124
 
-#### 5.2.2 DataFlow Agent Interface
+# Verify the installation inside the container
+dataflow -v
+```
 
-Launch the DataFlow agent interface for operator authoring and pipeline design:
+##### Option 2: Build from the Dockerfile
 
-```bash
-dataflow webui agent
-```
+Alternatively, you can build the image from the Dockerfile provided with the project:
 
-This command starts the DataFlow-Agent interface, providing automated operator authoring and pipeline recommendation services.
+```shell
+# Clone the repository (HTTPS)
+git clone https://github.com/OpenDCAI/DataFlow.git
+# Or use SSH
+# git clone git@github.com:OpenDCAI/DataFlow.git
 
-https://github.com/user-attachments/assets/5c6aa003-9504-4e2a-9f4e-97bae739894a
+cd DataFlow
 
-### 🌐 5.3 ADP Intelligent Data Platform
+# Build the Docker image
+docker build -t dataflow:custom .
 
-Beyond the local Gradio interface, DataFlow also offers the web-based ADP Intelligent Data Platform: [https://adp.originhub.tech/login](https://adp.originhub.tech/login)
+# Run the container
+docker run --gpus all -it dataflow:custom
 
-ADP is OriginHub's intelligent data platform with four core capabilities: fully automated data preparation via DataFlow, a knowledge system integrating large-scale multimodal knowledge bases, multi-agent intelligent collaboration, and an AI database supporting end-to-end data management, helping enterprises unlock the value of their proprietary data through AI.
+# Verify the installation inside the container
+dataflow -v
+```
 
-<p align="center">
-  <a href="https://adp.originhub.tech/login">
-    <img src="https://github.com/user-attachments/assets/c63ac954-f0c8-4a1a-bfc8-5752c25a22cf" alt="ADP Platform Interface" width="75%">
-  </a>
-</p>
+> **Note**: The Docker image includes CUDA 12.4.1 support and comes with vLLM pre-installed for GPU acceleration. Make sure you have the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed to use GPU features.
 
-### 📖 5.4 Reference the DataFlow Project Documentation
+### 📖 5.2 Reference the DataFlow Project Documentation
 
 For detailed **usage instructions** and a **getting started guide**, please refer to our [Documentation](https://OpenDCAI.github.io/DataFlow-Doc/)
````
README.md

Lines changed: 28 additions & 35 deletions

````diff
@@ -129,57 +129,50 @@
 You are using the latest version: 1.0.0.
 ```
 
-### 🚀 5.2 Using the Gradio Web Interface
+#### 🐳 5.1.1 Docker Installation (Alternative)
 
-DataFlow provides two interactive web interfaces to help you use operators, pipelines, and agents:
+We also provide a **Dockerfile** for easy deployment and a **pre-built Docker image** for immediate use.
 
-#### 5.2.1 DataFlow Operators Interface
+##### Option 1: Use Pre-built Docker Image
 
-Launch the DataFlow operator interface to test and visualize all operators and pipelines:
+You can directly pull and use our pre-built Docker image:
 
-```bash
-dataflow webui
-```
-
-This command will start an interactive web interface, allowing you to visualize and flexibly use all operators and pipelines.
-
-#### 5.2.2 DataFlow Agent Interface
+```shell
+# Pull the pre-built image
+docker pull molyheci/dataflow:cu124
 
-Launch the DataFlow agent interface for operator authoring and pipeline design:
+# Run the container with GPU support
+docker run --gpus all -it molyheci/dataflow:cu124
 
-```bash
-dataflow webui agent
+# Inside the container, verify installation
+dataflow -v
 ```
 
-This command will start the DataFlow-Agent interface, providing automated operator authoring and pipeline recommendation services.
-
-https://github.com/user-attachments/assets/fda1ad47-a9f3-447a-b5c0-cf4c9ad64763
+##### Option 2: Build from Dockerfile
 
-### 🌐 5.3 ADP Intelligent Data Platform
+Alternatively, you can build the Docker image from the provided Dockerfile:
 
-Beyond the local Gradio interface, **DataFlow** is also available as a fully-managed SaaS solution on the **ADP Intelligent Data Platform**.
-
-[**ADP**](https://adp.originhub.tech) is an end-to-end system by OriginHub, designed to help enterprises accelerate the development of custom Agents and Models by integrating Large Language Models (LLMs) with private data.
-
-#### Core Capabilities:
+```shell
+# Clone the repository (HTTPS)
+git clone https://github.com/OpenDCAI/DataFlow.git
+# Or use SSH
+# git clone git@github.com:OpenDCAI/DataFlow.git
 
-* 🤖 **Automated Data Preparation**: Leverage DataFlow for full-process automation of your data workflows.
-* 📚 **Unified Knowledge System**: Integrate and manage large-scale, multimodal knowledge bases.
-* 🤝 **Intelligent Collaboration**: Build and orchestrate powerful multi-agent systems.
-* 🗄️ **AI-Native Database**: Manage the full lifecycle of your multimodal data with a purpose-built AI database.
+cd DataFlow
 
-<p align="center">
-  <a href="https://adp.originhub.tech/login">
-    <img src="https://github.com/user-attachments/assets/c63ac954-f0c8-4a1a-bfc8-5752c25a22cf" alt="ADP Platform Interface" width="75%">
-  </a>
-</p>
+# Build the Docker image
+docker build -t dataflow:custom .
 
-#### Get Started for Free
+# Run the container
+docker run --gpus all -it dataflow:custom
 
+# Inside the container, verify installation
+dataflow -v
+```
 
-👉 **[Sign up now to claim your free compute credits!](https://adp.originhub.tech)**
+> **Note**: The Docker image includes CUDA 12.4.1 support and comes with vLLM pre-installed for GPU acceleration. Make sure you have [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed to use GPU features.
 
-### 📖 5.4 Reference Project Documentation
+### 📖 5.2 Reference Project Documentation
 
 For detailed **usage instructions** and **getting started guide**, please visit our [Documentation](https://OpenDCAI.github.io/DataFlow-Doc/).
````

dataflow/agent/__init__.py

Whitespace-only changes.

dataflow/agent/agentrole/__init__.py

Lines changed: 0 additions & 9 deletions
This file was deleted.

dataflow/agent/agentrole/analyst.py

Lines changed: 0 additions & 123 deletions
This file was deleted.
