Commit a7679a3

update github action
1 parent b18fe2c commit a7679a3

3 files changed

Lines changed: 52 additions & 134 deletions

.github/workflows/test.yml

Lines changed: 8 additions & 3 deletions
````diff
@@ -32,12 +32,17 @@ jobs:
           gosu postgres /usr/lib/postgresql/17/bin/initdb -D /var/lib/postgresql/data
           gosu postgres /usr/lib/postgresql/17/bin/pg_ctl -D /var/lib/postgresql/data start -w
 
+      - name: Start Mock Embedding Server
+        run: |
+          python3 test/mock_embedding_server.py 8080 &
+          sleep 2
+          echo "Mock embedding server started on port 8080"
+
       - name: Setup test environment
-        env:
-          SILICONFLOW_API_KEY: ${{ secrets.SILICONFLOW_API_KEY }}
         run: |
           cp test/.env.example test/.env
-          sed -i "s/your-api-key/$SILICONFLOW_API_KEY/g" test/.env
+          sed -i "s|https://api.siliconflow.cn/v1/embeddings|http://localhost:8080/v1/embeddings|g" test/.env
+          sed -i "s/your-api-key/mock-api-key/g" test/.env
           echo "Test environment configured"
 
       - name: Run test script
````
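The new workflow step starts `test/mock_embedding_server.py`, but that file is not part of this diff. As a rough idea of what such a server has to do, here is a minimal standard-library Python sketch. It assumes the extension posts an OpenAI-style body (`{"model": ..., "input": ...}`) and reads the vector from `data[0].embedding`, and it assumes 1024 dimensions to match the `VECTOR(1024)` columns in the README examples; none of this is taken from the actual script.

```python
"""Hypothetical sketch of a mock embedding server (the real test/mock_embedding_server.py is not shown).

Serves POST /v1/embeddings with an OpenAI-style JSON response. The request and
response shapes and the vector size are assumptions.
"""
import hashlib
import json
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

DIMENSIONS = 1024  # assumed; matches the VECTOR(1024) columns in the README examples


def fake_embedding(text: str, dims: int = DIMENSIONS) -> list[float]:
    """Derive a deterministic pseudo-vector from the SHA-256 digest of the text."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(dims)]


class EmbeddingHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/v1/embeddings":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        inputs = payload.get("input", "")
        if isinstance(inputs, str):
            inputs = [inputs]  # OpenAI-style APIs accept a single string or a list
        body = json.dumps({
            "object": "list",
            "model": payload.get("model", "mock"),
            "data": [
                {"object": "embedding", "index": i, "embedding": fake_embedding(t)}
                for i, t in enumerate(inputs)
            ],
        }).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int) -> None:
    """Block forever, answering embedding requests on 127.0.0.1:port."""
    HTTPServer(("127.0.0.1", port), EmbeddingHandler).serve_forever()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoked the way the workflow does it: python3 test/mock_embedding_server.py 8080 &
    serve(int(sys.argv[1]))
```

Deriving each vector from a hash of the input means the same text always produces the same embedding, which keeps similarity-search assertions in the test suite deterministic without any API key.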

README.md

Lines changed: 16 additions & 38 deletions
````diff
@@ -20,13 +20,15 @@ PostgreSQL extension for automatic vector embedding using external embedding ser
 ## Prerequisites
 
 - PostgreSQL 9.5+ with `vector` extension
-- `http` extension
-- `pg_background` extension
-- `pgTAP` extension (for testing)
+- [http](https://github.com/pramsey/pgsql-http) extension
+- [pg_background](https://github.com/vibhorkum/pg_background) extension
+- [pgTAP](https://pgtap.org/documentation.html) extension (for testing)
 
 ## Installation
 
 ```bash
+git clone https://github.com/hank-cp/pg_vector_embedding.git
+cd pg_vector_embedding
 make
 sudo make install
 ```
````
````diff
@@ -95,36 +97,8 @@ LIMIT 10;
 SELECT ve_disable('public', 'documents');
 ```
 
-## Functions
-
-### Configuration
-
-- `ve_config(key TEXT) RETURNS TEXT` - Get configuration value from database settings
-
-### Table Management
-
-- `ve_enable(schema TEXT, table TEXT, info_columns TEXT[], vector_column TEXT)` - Register table for auto-embedding
-- `ve_disable(schema TEXT, table TEXT)` - Unregister table
-
-### Embedding
-
-- `ve_compute_embedding(text TEXT) RETURNS VECTOR` - Compute embedding synchronously
-- `ve_compact_row_data(record ANYELEMENT, columns TEXT[]) RETURNS JSONB` - Extract specified columns to JSON
-- `ve_process_embedding(params JSONB)` - Process embedding for a specific record (used internally)
-
-### Internal Functions
-
-- `ve_trigger()` - Trigger function that launches background embedding tasks
-
 ## Testing
 
-### Run All Tests
-
-```bash
-cd test
-./runner.sh
-```
-
 ### Configure Test Environment
 
 Create `test/.env` file:
````
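To reproduce the updated "Setup test environment" step outside CI, the two `sed` substitutions from the workflow can be written as literal string replacements. Below is a minimal Python sketch; the helper names are hypothetical, and because only `EMBEDDING_API_KEY` and `EMBEDDING_MODEL` are visible in this diff, the code rewrites values without assuming the name of the variable that holds the URL.

```python
from pathlib import Path

# Literal equivalents of the two `sed -i` substitutions in the "Setup test environment" step.
MOCK_SUBSTITUTIONS = [
    ("https://api.siliconflow.cn/v1/embeddings", "http://localhost:8080/v1/embeddings"),
    ("your-api-key", "mock-api-key"),
]


def point_env_at_mock(text: str) -> str:
    """Replace every occurrence, as the trailing `g` flag does in sed."""
    for old, new in MOCK_SUBSTITUTIONS:
        text = text.replace(old, new)
    return text


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def configure_test_env(example: Path, target: Path) -> dict[str, str]:
    """Copy the example file to the target path with the mock values applied."""
    target.write_text(point_env_at_mock(example.read_text(encoding="utf-8")), encoding="utf-8")
    return parse_env(target.read_text(encoding="utf-8"))
```

For example, `configure_test_env(Path("test/.env.example"), Path("test/.env"))` writes `test/.env` and returns its parsed contents for a quick sanity check.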
````diff
@@ -135,6 +109,13 @@ EMBEDDING_API_KEY=your-api-key
 EMBEDDING_MODEL=BAAI/bge-m3
 ```
 
+### Run All Tests
+
+```bash
+cd test
+./runner.sh
+```
+
 ### Test Options
 
 ```bash
````
````diff
@@ -148,10 +129,9 @@ EMBEDDING_MODEL=BAAI/bge-m3
 ## Architecture
 
 1. **Trigger-based Detection**: When a registered table is modified, `ve_trigger()` captures the change
-2. **Column Extraction**: The trigger extracts configured info columns as JSON using `ve_compact_row_data()`
-3. **Background Processing**: A background worker is launched via `pg_background_launch()` to run `ve_process_embedding()`
-4. **API Call**: The background task calls the embedding service via `http` extension using `ve_compute_embedding()`
-5. **Storage**: The returned vector is saved to the configured vector column
+2. **Column Extraction**: The trigger extracts configured info columns as JSON using `ve_compact_row_data()`. Column comments will also be included in the JSON to improve embedding quality.
+3. **Background Processing**: A background worker will be launched to process the embedding request and update the vector column.
+4. **Storage**: The returned vector is saved to the configured vector column. It could be leveraged in vector similarity search, like RAG.
 
 ## Configuration Reference
 
````
````diff
@@ -220,6 +200,4 @@ LIMIT 5;
 2. Ensure table has a primary key (required for tracking records)
 3. Check trigger function exists: `\df ve_trigger`
 
-## License
-
-MIT
+## [License](LICENSE)
````

README.zh-CN.md

Lines changed: 28 additions & 93 deletions
````diff
@@ -2,10 +2,13 @@
 [![Tests](https://github.com/hank-cp/pg_vector_embedding/actions/workflows/test.yml/badge.svg)](https://github.com/hank-cp/pg_vector_embedding/actions/workflows/test.yml)
 ![GitHub](https://img.shields.io/github/license/hank-cp/pg_vector_embedding.svg)
 ![GitHub last commit](https://img.shields.io/github/last-commit/hank-cp/pg_vector_embedding.svg)
+
 # pg_vector_embedding
 
 PostgreSQL extension for automatically generating vector embeddings using external embedding services.
 
+[English Documentation](README.md)
+
 ## Features
 
 - Global embedding service configuration via database settings
````
````diff
@@ -17,13 +20,15 @@ PostgreSQL extension for automatically generating vector embeddings using external embedding services.
 ## Prerequisites
 
 - PostgreSQL 9.5+ with the `vector` extension
-- `http` extension
-- `pg_background` extension
-- `pgTAP` extension (for testing)
+- [http](https://github.com/pramsey/pgsql-http) extension
+- [pg_background](https://github.com/vibhorkum/pg_background) extension
+- [pgTAP](https://pgtap.org/documentation.html) extension (for testing)
 
 ## Installation
 
 ```bash
+git clone https://github.com/hank-cp/pg_vector_embedding.git
+cd pg_vector_embedding
 make
 sudo make install
 ```
````
````diff
@@ -40,12 +45,10 @@ CREATE EXTENSION pg_vector_embedding CASCADE;
 
 ```sql
 -- Set database-level configuration
-ALTER DATABASE your_database SET pg_vector_embedding.embedding_url = 'https://api.siliconflow.cn/v1/embeddings';
-ALTER DATABASE your_database SET pg_vector_embedding.embedding_api_key = 'your-api-key';
-ALTER DATABASE your_database SET pg_vector_embedding.embedding_model = 'BAAI/bge-m3';
-
--- Reconnect to apply the configuration
-\c
+ALTER SYSTEM SET pg_vector_embedding.embedding_url = 'https://api.siliconflow.cn/v1/embeddings';
+ALTER SYSTEM SET pg_vector_embedding.embedding_api_key = 'your-api-key';
+ALTER SYSTEM SET pg_vector_embedding.embedding_model = 'BAAI/bge-m3';
+-- Restart Postgres to apply the settings
 ```
 
 ### 3. Create a table with a vector column
````
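The three `pg_vector_embedding.*` settings together describe a single HTTP call. The removed Architecture text says the extension makes that call from SQL through the `http` extension; the Python sketch below only illustrates the request shape and does not send it. It assumes an OpenAI-compatible embeddings API (JSON body with `model` and `input`, an `Authorization: Bearer` header, vectors under `data[i].embedding`), and the function names are hypothetical.

```python
import json
import urllib.request

# GUC names and example values from the ALTER SYSTEM statements in this hunk.
SETTINGS = {
    "pg_vector_embedding.embedding_url": "https://api.siliconflow.cn/v1/embeddings",
    "pg_vector_embedding.embedding_api_key": "your-api-key",
    "pg_vector_embedding.embedding_model": "BAAI/bge-m3",
}


def build_embedding_request(text: str, settings: dict) -> urllib.request.Request:
    """Build, but do not send, an embeddings request from the three settings.

    Assumption: an OpenAI-compatible API taking {"model", "input"} and a Bearer token.
    """
    body = json.dumps({
        "model": settings["pg_vector_embedding.embedding_model"],
        "input": text,
    }).encode("utf-8")
    return urllib.request.Request(
        settings["pg_vector_embedding.embedding_url"],
        data=body,
        headers={
            "Authorization": "Bearer " + settings["pg_vector_embedding.embedding_api_key"],
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_embedding(response_body: bytes) -> list:
    """Return the first vector from an OpenAI-style response body."""
    return json.loads(response_body)["data"][0]["embedding"]
```

Passing the request to `urllib.request.urlopen` and feeding the response body to `extract_embedding` would yield the vector, under the same API-shape assumption.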
````diff
@@ -74,7 +77,7 @@ SELECT ve_enable(
 
 ```sql
 INSERT INTO documents (title, content)
-VALUES ('PostgreSQL 扩展', '学习如何构建强大的 PostgreSQL 扩展');
+VALUES ('PostgreSQL Extensions', 'Learn how to build powerful PostgreSQL extensions');
 ```
 
 The embedding is computed asynchronously via `pg_background` and stored in the `embedding` column.
````
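Because the embedding is written by a background worker, a freshly inserted row still has an empty `embedding` until that worker finishes. Client code that needs the vector can poll with a query such as `SELECT COUNT(*) FROM documents WHERE embedding IS NOT NULL`. Here is a minimal, driver-agnostic Python sketch; `count_ready` is a hypothetical callback standing in for running that query with whatever database driver you use.

```python
import time
from typing import Callable


def wait_for_embeddings(
    count_ready: Callable[[], int],
    expected: int,
    timeout: float = 30.0,
    interval: float = 0.5,
) -> bool:
    """Poll until `expected` rows have embeddings or `timeout` seconds pass.

    `count_ready` is a hypothetical hook that should run
    SELECT COUNT(*) FROM documents WHERE embedding IS NOT NULL
    and return the count.
    """
    deadline = time.monotonic() + timeout
    while True:
        if count_ready() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False  # the background work did not finish in time
        time.sleep(interval)
```

Returning `False` on timeout instead of raising leaves it to the caller whether a slow embedding service is an error.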
````diff
@@ -84,7 +87,7 @@ VALUES ('PostgreSQL 扩展', '学习如何构建强大的 PostgreSQL 扩展');
 ```sql
 -- Compute the embedding for the search query
 SELECT * FROM documents
-ORDER BY embedding <-> ve_compute_embedding('{"title": "PostgreSQL", "content": "扩展"}'::text)
+ORDER BY embedding <-> ve_compute_embedding('{"title": "PostgreSQL", "content": "extensions"}'::text)
 LIMIT 10;
 ```
 
````
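In pgvector, `<->` is the Euclidean (L2) distance operator, so `ORDER BY embedding <-> ... LIMIT 10` returns the ten rows whose vectors lie closest to the query vector (pgvector also provides `<=>` for cosine distance and `<#>` for negative inner product). A small in-memory Python analogue of that query; the helper names are illustrative.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance, the metric pgvector's `<->` operator returns."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(rows: dict[str, list[float]], query: list[float], limit: int = 10) -> list[tuple[str, float]]:
    """In-memory analogue of `ORDER BY embedding <-> query LIMIT limit`."""
    scored = [(key, l2_distance(vec, query)) for key, vec in rows.items()]
    scored.sort(key=lambda item: item[1])  # ascending: closest rows first
    return scored[:limit]
```

Smaller distances mean more similar content, which is why the query sorts ascending.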

````diff
@@ -94,36 +97,8 @@ LIMIT 10;
 SELECT ve_disable('public', 'documents');
 ```
 
-## Functions
-
-### Configuration
-
-- `ve_config(key TEXT) RETURNS TEXT` - Get a configuration value from database settings
-
-### Table Management
-
-- `ve_enable(schema TEXT, table TEXT, info_columns TEXT[], vector_column TEXT)` - Register a table to enable auto-embedding
-- `ve_disable(schema TEXT, table TEXT)` - Unregister a table
-
-### Embedding
-
-- `ve_compute_embedding(text TEXT) RETURNS VECTOR` - Compute an embedding synchronously
-- `ve_compact_row_data(record ANYELEMENT, columns TEXT[]) RETURNS JSONB` - Extract the specified columns as JSON
-- `ve_process_embedding(params JSONB)` - Process the embedding for a specific record (internal use)
-
-### Internal Functions
-
-- `ve_trigger()` - Trigger function that launches background embedding tasks
-
 ## Testing
 
-### Run All Tests
-
-```bash
-cd test
-./runner.sh
-```
-
 ### Configure Test Environment
 
 Create a `test/.env` file:
````
````diff
@@ -134,6 +109,13 @@ EMBEDDING_API_KEY=your-api-key
 EMBEDDING_MODEL=BAAI/bge-m3
 ```
 
+### Run All Tests
+
+```bash
+cd test
+./runner.sh
+```
+
 ### Test Options
 
 ```bash
````
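````diff
@@ -147,10 +129,9 @@ EMBEDDING_MODEL=BAAI/bge-m3
 ## Architecture
 
 1. **Trigger-based Detection**: When a registered table is modified, `ve_trigger()` captures the change
-2. **Column Extraction**: The trigger uses `ve_compact_row_data()` to extract the configured info columns as JSON
-3. **Background Processing**: A background worker process is launched via `pg_background_launch()` to run `ve_process_embedding()`
-4. **API Call**: The background task calls the embedding service through the `http` extension using `ve_compute_embedding()`
-5. **Storage**: The returned vector is saved to the configured vector column
+2. **Column Extraction**: The trigger uses `ve_compact_row_data()` to extract the configured info columns as JSON. Column comments are also included in the JSON to improve embedding quality.
+3. **Background Processing**: A background worker process is launched to handle the embedding request and update the vector column.
+4. **Storage**: The returned vector is saved to the configured vector column, where it can be used for vector similarity search, e.g. RAG.
 
 ## Configuration Reference
 
````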
````diff
@@ -187,15 +168,15 @@ SELECT ve_enable('public', 'articles', ARRAY['title', 'content'], 'embedding');
 
 -- 3. Insert data (embeddings are computed automatically in the background)
 INSERT INTO articles (title, content) VALUES
-('PostgreSQL 扩展', '学习如何构建强大的 PostgreSQL 扩展'),
-('向量搜索', '使用 pgvector 实现语义搜索');
+('PostgreSQL Extensions', 'Learn how to build powerful PostgreSQL extensions'),
+('Vector Search', 'Implementing semantic search with pgvector');
 
 -- 4. Wait for background processing (or check whether the embeddings are ready)
 SELECT COUNT(*) FROM articles WHERE embedding IS NOT NULL;
 
 -- 5. Run a similarity search
 WITH search_query AS (
-SELECT ve_compute_embedding('{"title": "PostgreSQL", "content": "教程"}'::text) AS query_embedding
+SELECT ve_compute_embedding('{"title": "PostgreSQL", "content": "tutorial"}'::text) AS query_embedding
 )
 SELECT id, title, embedding <-> query_embedding AS distance
 FROM articles, search_query
````
````diff
@@ -219,50 +200,4 @@ LIMIT 5;
 2. Ensure the table has a primary key (required for tracking records)
 3. Check that the trigger function exists: `\df ve_trigger`
 
-## Special Features
-
-### Column Comment Support
-
-The `ve_compact_row_data()` function automatically extracts column comments and adds them to the embedded content to improve embedding accuracy:
-
-```sql
-CREATE TABLE products (
-id SERIAL PRIMARY KEY,
-name TEXT,
-description TEXT,
-embedding VECTOR(1024)
-);
-
-COMMENT ON COLUMN products.name IS '产品名称';
-COMMENT ON COLUMN products.description IS '产品描述';
-
-SELECT ve_enable('public', 'products', ARRAY['name', 'description'], 'embedding');
-
--- When data is inserted, the generated JSON includes the column comments:
--- {"name": "产品名称: 笔记本电脑", "description": "产品描述: 高性能办公笔记本"}
-INSERT INTO products (name, description) VALUES ('笔记本电脑', '高性能办公笔记本');
-```
-
-### JSON/JSONB/Array Field Support
-
-The function automatically recognizes JSON, JSONB, and array fields and preserves their original structure:
-
-```sql
-CREATE TABLE events (
-id SERIAL PRIMARY KEY,
-title TEXT,
-tags TEXT[],
-metadata JSONB,
-embedding VECTOR(1024)
-);
-
-SELECT ve_enable('public', 'events', ARRAY['title', 'tags', 'metadata'], 'embedding');
-
--- tags is kept as a JSON array and metadata as a JSON object, rather than as strings
-INSERT INTO events (title, tags, metadata) VALUES
-('会议', ARRAY['技术', 'PostgreSQL'], '{"location": "北京", "duration": 120}');
-```
-
-## License
-
-MIT
+## [License](LICENSE)
````
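The removed "Column Comment Support" and "JSON/JSONB/Array Field Support" sections describe what `ve_compact_row_data()` produces. Below is a small Python model of that documented behavior, useful for reasoning about the text that ends up being embedded. It is not the extension's implementation, and its handling of NULLs and non-text scalars is an assumption.

```python
import json
from typing import Any, Dict, List, Optional


def compact_row_data(
    row: Dict[str, Any],
    columns: List[str],
    comments: Optional[Dict[str, str]] = None,
) -> str:
    """Illustrative model of the documented behavior, not the extension's SQL code.

    - scalar values get a "<column comment>: " prefix when the column has a comment;
    - list (array) and dict (JSON/JSONB) values keep their structure.
    Skipping NULLs and stringifying other scalars are assumptions.
    """
    comments = comments or {}
    out: Dict[str, Any] = {}
    for col in columns:
        value = row.get(col)
        if value is None:
            continue  # assumption: NULL columns are omitted
        if isinstance(value, (list, dict)):
            out[col] = value  # arrays and JSON objects stay structured, not stringified
        elif col in comments:
            out[col] = f"{comments[col]}: {value}"
        else:
            out[col] = str(value)
    return json.dumps(out, ensure_ascii=False)
```

Mirroring the removed products example with English values, `compact_row_data({"name": "Laptop", "description": "High-performance office laptop"}, ["name", "description"], {"name": "Product name", "description": "Product description"})` returns `{"name": "Product name: Laptop", "description": "Product description: High-performance office laptop"}`.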
