Merged
29 changes: 28 additions & 1 deletion .github/workflows/main.yml
@@ -41,6 +41,9 @@ jobs:
docker stop bubblog-ai || true
docker rm bubblog-ai || true

docker stop bubblog-ai-worker || true
docker rm bubblog-ai-worker || true

docker pull $IMAGE_NAME:latest

docker run -d --name bubblog-ai -p 8000:3000 \
@@ -51,4 +54,28 @@ jobs:
-e ALGORITHM="${{ secrets.ALGORITHM }}" \
-e EMBED_MODEL="${{ secrets.EMBED_MODEL }}" \
-e CHAT_MODEL="${{ secrets.CHAT_MODEL }}" \
-e REDIS_URL="${{ secrets.REDIS_URL }}" \
-e REDIS_HOST="${{ secrets.REDIS_HOST }}" \
-e REDIS_PORT="${{ secrets.REDIS_PORT }}" \
-e EMBEDDING_QUEUE_KEY="${{ secrets.EMBEDDING_QUEUE_KEY }}" \
-e EMBEDDING_FAILED_QUEUE_KEY="${{ secrets.EMBEDDING_FAILED_QUEUE_KEY }}" \
-e EMBEDDING_WORKER_MAX_RETRIES="${{ secrets.EMBEDDING_WORKER_MAX_RETRIES }}" \
-e EMBEDDING_WORKER_BACKOFF_MS="${{ secrets.EMBEDDING_WORKER_BACKOFF_MS }}" \
$IMAGE_NAME:latest

docker run -d --name bubblog-ai-worker \
-e OPENAI_API_KEY="${{ secrets.OPENAI_API_KEY }}" \
-e DATABASE_URL="${{ secrets.DATABASE_URL }}" \
-e SECRET_KEY="${{ secrets.SECRET_KEY }}" \
-e TOKEN_AUDIENCE="${{ secrets.TOKEN_AUDIENCE }}" \
-e ALGORITHM="${{ secrets.ALGORITHM }}" \
-e EMBED_MODEL="${{ secrets.EMBED_MODEL }}" \
-e CHAT_MODEL="${{ secrets.CHAT_MODEL }}" \
-e REDIS_URL="${{ secrets.REDIS_URL }}" \
-e REDIS_HOST="${{ secrets.REDIS_HOST }}" \
-e REDIS_PORT="${{ secrets.REDIS_PORT }}" \
-e EMBEDDING_QUEUE_KEY="${{ secrets.EMBEDDING_QUEUE_KEY }}" \
-e EMBEDDING_FAILED_QUEUE_KEY="${{ secrets.EMBEDDING_FAILED_QUEUE_KEY }}" \
-e EMBEDDING_WORKER_MAX_RETRIES="${{ secrets.EMBEDDING_WORKER_MAX_RETRIES }}" \
-e EMBEDDING_WORKER_BACKOFF_MS="${{ secrets.EMBEDDING_WORKER_BACKOFF_MS }}" \
$IMAGE_NAME:latest node dist/worker/queue-consumer.js
203 changes: 67 additions & 136 deletions TASK.md

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,19 @@
version: "3.9"

services:
  api:
    build:
      context: .
    command: ["node", "dist/server.js"]
    env_file:
      - .env
    ports:
      - "3000:3000"
    restart: unless-stopped
  worker:
    build:
      context: .
    command: ["node", "dist/worker/queue-consumer.js"]
    env_file:
      - .env
    restart: unless-stopped
136 changes: 136 additions & 0 deletions docs/history-tasks/HybridSearchUpgradePlan.md
@@ -0,0 +1,136 @@
## Hybrid Search Upgrade Plan (Working Doc)

### 1. Current Implementation Snapshot
- `runHybridSearch(question, userId, plan)` (src/services/hybrid-search.service.ts)
- Embeds `[question, ...plan.rewrites]` and runs `findSimilarChunksV2` per embedding.
- Executes `textSearchChunksV2` once using the original question + keywords.
- Merges chunk candidates by `postId:postChunk`, keeps max vector/text score per chunk, min–max normalizes each modality, then fuses via `alpha` blend.
- Returns the top `plan.top_k` chunks (capped at 10); `plan.limit` is ignored here.
- The semantic-only fallback (`runSemanticSearch`) uses the same repository call without text blending.
- Planner (`generateSearchPlan`) currently emits rewrites/keywords but keyword quality/quantity varies; schema clamps counts post-hoc.
- Category filters from API are not wired into hybrid search; text rewrites are not reused in lexical search; chunk key uses raw text.
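The merge → normalize → fuse flow above can be sketched as follows. This is a minimal illustration, not the actual service code: `Candidate`, `minMaxNormalize`, and `fuse` are made-up names, and the collapse case noted in section 2 is called out inline.

```typescript
// Illustrative sketch of the chunk fusion described above; not the real types.
interface Candidate {
  key: string;          // `${postId}:${postChunk}` merge key
  vectorScore: number;  // max vector similarity across embeddings
  textScore: number;    // max lexical score
}

// Min–max normalize one modality across the candidate union.
// Collapse case from section 2: when max === min, every score maps to 0 here.
function minMaxNormalize(scores: number[]): number[] {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  const range = max - min;
  return scores.map((s) => (range === 0 ? 0 : (s - min) / range));
}

// Blend the two modalities; alpha weights the vector (semantic) side.
function fuse(
  candidates: Candidate[],
  alpha: number
): { key: string; score: number }[] {
  const vec = minMaxNormalize(candidates.map((c) => c.vectorScore));
  const txt = minMaxNormalize(candidates.map((c) => c.textScore));
  return candidates
    .map((c, i) => ({ key: c.key, score: alpha * vec[i] + (1 - alpha) * txt[i] }))
    .sort((a, b) => b.score - a.score);
}
```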

### 2. Pain Points & Gaps
1. **Filtering gaps** – category/time filters partially ignored, final `limit` unused, vector threshold normalization can collapse to zero when max=min.
2. **Keyword quality** – LLM often emits multi-word phrases or duplicates; count not consistently within intended range.
3. **Rewrite redundancy** – All rewrites treated equally; no semantic-distance-aware weighting → aggressive rewrites may be undervalued or noisy ones over-weighted.
4. **Fused scoring sensitivity** – Min–max normalization across union is brittle when modalities have outliers; no similarity-based bonus for high-confidence hits.
5. **Post-level UX** – Current pipeline optimized for RAG chunk retrieval; no reusable API that returns deduplicated post-level hits with pagination.
6. **Observability** – Limited metrics around rewrite effectiveness, keyword usage, or threshold activations.

### 3. Goals & Guiding Principles
- Preserve strong recall via multi-embedding + lexical hybrid while adding stability and transparency.
- Make rewrite/keyword generation purposeful: enforce concise tokens, staged semantic drift, and maintain question coverage.
- Provide a standalone hybrid search endpoint for user-facing search with post-level results.
- Instrument similarity thresholds and modality contributions to support tuning.

### 4. Retrieval Quality Enhancements (Track A)

4.1 **Similarity Threshold Boosting**
- Reuse the existing retrieval bias labels (`lexical`, `balanced`, `semantic`) to derive both `alpha` and default modality thresholds (`sem_boost_threshold`, `lex_boost_threshold`) so planner output stays compact. Defaults (retain current behavior for now):
  - `lexical`: `alpha = 0.30`, `sem_boost_threshold = 0.65`, `lex_boost_threshold = 0.80`
  - `balanced`: `alpha = 0.50`, `sem_boost_threshold = 0.70`, `lex_boost_threshold = 0.75`
  - `semantic`: `alpha = 0.75`, `sem_boost_threshold = 0.80`, `lex_boost_threshold = 0.65`
- Encode the mapping as a single constants table (e.g., `RETRIEVAL_BIAS_PRESETS`) so both planner normalization and hybrid scoring reference identical values.
- Permit optional overrides in `plan.hybrid`, but clamp to sensible bounds (e.g., 0.4–0.85) for consistency.
- When a normalized vector/text score crosses its threshold, apply a bounded boost (e.g., multiply by 1.1–1.3 or add 0.1), log activations, and cap boosts to maintain ranking stability.
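The constants table and bounded boost above could be sketched like this. The preset values come from the defaults listed in this section; the camelCase field names and the 1.2 factor are illustrative choices within the stated 1.1–1.3 bound.

```typescript
// Illustrative constants table for the bias presets listed above.
const RETRIEVAL_BIAS_PRESETS = {
  lexical:  { alpha: 0.30, semBoostThreshold: 0.65, lexBoostThreshold: 0.80 },
  balanced: { alpha: 0.50, semBoostThreshold: 0.70, lexBoostThreshold: 0.75 },
  semantic: { alpha: 0.75, semBoostThreshold: 0.80, lexBoostThreshold: 0.65 },
} as const;

type RetrievalBias = keyof typeof RETRIEVAL_BIAS_PRESETS;

// Bounded boost: multiply by a capped factor once the normalized score
// crosses its threshold, and never let the boosted score exceed 1.
function applyBoost(normalizedScore: number, threshold: number, factor = 1.2): number {
  if (normalizedScore < threshold) return normalizedScore;
  return Math.min(1, normalizedScore * factor);
}
```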

4.2 **Rewrite Strategy & Weighting**
- Update planner prompt to generate staged rewrites:
  - `rewrite_1`: conservative paraphrase.
  - `rewrite_2`: adds a synonymous term / clarifies the entity.
  - `rewrite_3+`: higher semantic drift or alternative framing.
- After plan normalization (`search-plan.service.ts`):
  - Compute embedding-based cosine similarity between the original question and each rewrite.
  - Drop rewrites below a floor (e.g., <0.35) or route them to lexical-only usage.
  - Derive per-rewrite weights (e.g., `weight = clamp(similarity, 0.6, 1.2)`) and supply them to `runHybridSearch`.
  - Similarity calculations use fresh embedding API calls (no caching) for both the question and rewrites within the request.
- In hybrid service, apply weights when aggregating vector scores (weighted max/avg instead of pure max) so high-quality rewrites contribute proportionally.
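The drop-and-weight step above, sketched as pure helpers. `SIMILARITY_FLOOR` and `rewriteWeight` are hypothetical names; the floor (0.35) and clamp bounds (0.6–1.2) come from the examples in the bullets, and `null` stands in for "drop or route to lexical-only".

```typescript
// Plain cosine similarity over embedding vectors (assumed equal length).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const SIMILARITY_FLOOR = 0.35; // rewrites below this are dropped / lexical-only

// weight = clamp(similarity, 0.6, 1.2), per the example above;
// null signals the rewrite should not contribute to vector scoring.
function rewriteWeight(similarity: number): number | null {
  if (similarity < SIMILARITY_FLOOR) return null;
  return Math.min(1.2, Math.max(0.6, similarity));
}
```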

4.3 **Keyword Constraints & Quality**
- Modify `planSchema` / prompt: keywords must be single Korean/English tokens (no spaces), trimmed, 1–5 items.
- In normalization, enforce `.slice(0,5)`, drop tokens <2 chars or containing whitespace/punctuation (except hyphen/underscore if needed).
- Extend text search to run over `[question, ...filtered rewrites]` for lexical recall or compute textual similarity per rewrite (optional v2 step).
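The keyword cleanup rules above can be sketched as one normalization pass. The regexes are one possible reading of the punctuation rule (letters, digits, hyphen, underscore allowed); the function name is illustrative.

```typescript
// Sketch of the keyword normalization above: trim, drop short/multi-word/
// punctuated tokens, dedupe, and cap at 5 items.
function normalizeKeywords(raw: string[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const k of raw) {
    const token = k.trim();
    if (token.length < 2) continue;            // drop tokens < 2 chars
    if (/\s/.test(token)) continue;            // single tokens only, no spaces
    if (/[^\p{L}\p{N}_-]/u.test(token)) continue; // allow hyphen/underscore only
    if (seen.has(token)) continue;             // dedupe
    seen.add(token);
    out.push(token);
    if (out.length === 5) break;               // enforce .slice(0, 5)
  }
  return out;
}
```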

4.4 **Repository/Data Adjustments**
- Update `findSimilarChunksV2` / `textSearchChunksV2` to return `chunk_index`, `post_created_at`, and optionally `post_tags` for downstream boosts.
- Tag aggregation via `post_tag` ⇔ `tag` should be added only if such tables exist; otherwise return `[]` and skip joins.
- Switch dedup key to `${postId}:${chunk_index}` to avoid string-heavy keys.
- Filters wiring: Do NOT add `filters.category_ids` to the plan. Keep the plan schema limited to `filters.time`.
  - Use `categoryId` from the controller as a server-side pre-filter only.
  - Derive `from/to` from the normalized plan time window (label → absolute) and apply in repositories.
- Respect `plan.limit` at the final slicing stage.
- Keep retrieval as exact KNN on `pgvector` (ORDER BY `<=>`); `top_k` stays per-source fetch size while final slicing respects `plan.limit`.
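The dedup-key switch above, as a minimal sketch; the `Hit` shape and function name are illustrative, keeping only the max score per `${postId}:${chunk_index}` key as the current merge does per chunk.

```typescript
// Dedupe chunk candidates on the numeric chunk_index key instead of raw text.
interface Hit { postId: string; chunkIndex: number; score: number }

function dedupeByChunk(hits: Hit[]): Map<string, Hit> {
  const best = new Map<string, Hit>();
  for (const h of hits) {
    const key = `${h.postId}:${h.chunkIndex}`;
    const prev = best.get(key);
    if (!prev || h.score > prev.score) best.set(key, h); // keep max score per chunk
  }
  return best;
}
```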

<!-- moved to Backlog: see section 11 -->

### 5. Search API & Post-Level Experience (Track B)
- **Service decomposition** – Extract shared primitive `buildHybridCandidates({ question, rewrites, keywords, plan, userId, categoryId })` returning chunk-level scores + metadata + diagnostic stats.
- **Post aggregation** – Create aggregator to deduplicate by post (max score, optional average of top 2, representative snippet) and apply deterministic `limit/offset` pagination (page size default 10, max 10).
- **Public API** – Add unauthenticated REST endpoint (JSON, no SSE) such as `GET /search/hybrid` accepting question, filters, paging params; reuse the planner or a lightweight variant as UX dictates.
- **QA reuse** – `answerStreamV2` continues calling chunk-level layer; search endpoint uses same embeddings/threshold logic to avoid drift.
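The post-level aggregation and paging described above can be sketched as follows, assuming max-chunk-score fusion with the snippet taken from the best chunk and a deterministic tie-break. `aggregateToPosts` and the shapes are hypothetical, not the planned API.

```typescript
// Sketch: dedupe chunk hits by post, keep the best chunk's score and snippet,
// sort deterministically, then apply limit/offset (page size capped at 10).
interface ChunkHit { postId: string; score: number; snippet: string }
interface PostHit { postId: string; score: number; snippet: string }

function aggregateToPosts(hits: ChunkHit[], limit = 10, offset = 0): PostHit[] {
  const byPost = new Map<string, PostHit>();
  for (const h of hits) {
    const prev = byPost.get(h.postId);
    if (!prev || h.score > prev.score) {
      byPost.set(h.postId, { postId: h.postId, score: h.score, snippet: h.snippet });
    }
  }
  return Array.from(byPost.values())
    .sort((a, b) => b.score - a.score || a.postId.localeCompare(b.postId))
    .slice(offset, offset + Math.min(limit, 10));
}
```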

### 6. Prompts & Planner Improvements
- Update `buildSearchPlanPrompt` instructions:
  - Require keywords to be single words; explicitly request "1–5 single keywords" (Korean prompt wording: "1~5 단일 키워드").
  - Outline staged rewrite roles to nudge LLM output.
  - Remind that temporal expressions stay in `filters.time`.
- Keep the client-facing `planSchema` minimal (no explicit `alpha`/threshold/weight fields). Server derives weights/thresholds internally from the retrieval bias label and does not surface them to the frontend.
- Update schema docs only for keyword bounds (1–5) and any internal validation notes; no additional fields are exposed over the API.
- In normalization, log keyword count, rewrite count, threshold values to support telemetry.

### 7. Observability & Telemetry
- Structured logs/SSE events:
  - For each query: number of rewrites retained, similarity weights, threshold boosts triggered, and counts per modality.
- Emit metrics for search endpoint (total posts returned, pagination info, latency).
- Standardize a log payload (e.g., `type: 'retrieval.boost', bias, alpha, sem_thr, lex_thr, modality, original_score, boosted_score`) to simplify analysis and tuning.
- Add debug flags to inspect per-rewrite vector/text hit lists for evaluation.

### 8. Performance Considerations
- Generate embeddings for `[question, rewrites]` with fresh API calls per request (no caching); accept the additional cost for correctness.
- Cap total vector queries by `plan.hybrid.max_rewrites`; consider batching embeddings via OpenAI API if supported.
- Monitor effect of threshold boosts on latency; adjust SQL to prefetch needed metadata in single round-trip.

### 9. Execution Roadmap (Detailed)

**Phase 0 – Foundations & Bugfixes**
- Task 0.1: Thread request filters (`categoryId`, `limit`) through `qa.v2.service.ts`. Do NOT add `filters.category_ids` to the plan; the server applies `categoryId` as a pre-filter. Derive `from/to` solely from the normalized plan `filters.time` (label → absolute) and use in repositories.
- Task 0.2: Honor `plan.limit` when returning hybrid results, switch dedupe key to `${postId}:${chunk_index}`, and propagate `chunk_index` through types.
- Task 0.3: Expand `findSimilarChunksV2`/`textSearchChunksV2` to select `chunk_index`, `post_created_at`, and optionally aggregated `post_tags` (only if tag tables exist); update SQL joins and DTOs with safe fallbacks.
- Task 0.4: Update hybrid/semantic services to surface new metadata in SSE payloads, keeping backward compatibility for existing clients.

**Phase 1 – Planner & Prompt Hardening**
- Task 1.1: Tighten `planSchema` validation (keywords 1–5 single tokens, rewrites ≤ max_rewrites) and normalize via shared helpers with telemetry hooks.
- Task 1.2: Revise `buildSearchPlanPrompt` instructions to enforce staged rewrites, single-token keywords, and explicit temporal guidance; add regression fixtures for prompt drift.
- Task 1.3: Implement normalization pass that cleans keywords, generates embeddings for rewrites, filters low-similarity variants, and records per-rewrite cosine similarity.
- Task 1.4: Persist summary logs (`rewrites_len`, similarity weights, keyword counts) via structured logger for observability.

**Phase 2 – Retrieval Scoring Upgrades**
- Task 2.1: Introduce `RETRIEVAL_BIAS_PRESETS` mapping (`alpha`, `sem_boost_threshold`, `lex_boost_threshold`) and clamp overrides in normalization.
- Task 2.2: Apply threshold-based boosts in `runHybridSearch`, logging activations and capping final scores for stability.
- Task 2.3: Weight vector scores by rewrite similarity (e.g., weighted max/avg) and expose diagnostics per rewrite.
- Task 2.4: Extend lexical search to iterate across `[question, rewrites]`, merging results while respecting keyword filters and avoiding redundant queries.
- Task 2.5: Enforce post-level diversity (max N chunks/post) before final ranking and respect `plan.limit` after fusion.

**Phase 3 – Search API Delivery**
- Task 3.1: Extract `buildHybridCandidates` service returning chunk-level hits plus diagnostics; retrofit QA flow to consume it.
- Task 3.2: Build post aggregation layer (score fusion, snippet selection, pagination respecting `limit/offset`) with deterministic ordering.
- Task 3.3: Add `GET /search/hybrid` route, request validation, and integration tests covering filters, pagination, and telemetry events.
- Task 3.4: Document API usage and ensure rate-limiting/auth hooks match product requirements.

**Phase 4 – Tuning & Observability**
- Task 4.1: Emit structured SSE/log events for threshold boosts, rewrite weighting, keyword pruning, and modality contributions.
- Task 4.2: Backfill dashboards or log queries (e.g., BigQuery/Redash) to monitor latency, hit counts, and boost frequency.
- Task 4.3: Create evaluation playbook with canonical queries, offline regression scripts, and guidance for tuning boost factors.
- Task 4.4: Investigate alternative fusion strategies (RRF/z-score) gated behind feature flags for safe experimentation.

### 10. Open Questions
- Do we need separate planner settings for public search vs QA (e.g., higher keyword count)?
- Should rewrite weights persist back into plan schema for transparency to the client?
- What default boost factors strike best balance between recall and precision? Requires offline eval.

---
Use this document as the anchor before implementation; update sections as design decisions finalize or metrics inform threshold choices.

### 11. Backlog
- Normalization stability (min–max collapse): evaluate mitigations without immediate implementation. Candidates include constant fallback (e.g., 0.5), epsilon guards, rank-based fusion (RRF), z-score fusion, unimodal fallback, and telemetry for activation frequency.
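Of the fusion candidates listed, rank-based fusion is easy to sketch: reciprocal rank fusion (RRF) scores by rank position only, so it sidesteps the min–max collapse entirely. Illustrative only, not a committed design; `k = 60` is the commonly used default constant.

```typescript
// RRF sketch: each ranked list contributes 1 / (k + rank) per item,
// where rank is 1-based; scores are summed across lists.
function rrfFuse(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return scores;
}
```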
71 changes: 71 additions & 0 deletions docs/reports/REPORT-embedding-worker.md
@@ -0,0 +1,71 @@
# Report: Introducing a Redis-Queue-Based Embedding Worker and Deployment Setup

## 1. Overview
- Goal: process embedding generation asynchronously through a Spring Boot → Redis → Node.js pipeline, running a worker separate from the Express API.
- Status: the worker entry point, environment variable schema, docker-compose file, and GitHub Actions deployment flow are all in place.
- Scope: keep the existing API server code intact while adding Redis queue consumption logic, and run the API and worker as separate containers from a single Docker image.

## 2. Worker Structure
- File: `src/worker/queue-consumer.ts`
- Redis connection: `REDIS_URL` (takes precedence) or `REDIS_HOST`/`REDIS_PORT`.
- Job format: `{ postId, title?, content?, attempt? }`.
- Processing sequence:
  1. Block on `EMBEDDING_QUEUE_KEY` via `BRPOP`.
  2. Process the title (`storeTitleEmbedding`) and then the body (`chunkText` → `createEmbeddings` → `storeContentEmbeddings`) in order.
  3. On error, retry: increment `attempt`, back off based on `EMBEDDING_WORKER_MAX_RETRIES` and `EMBEDDING_WORKER_BACKOFF_MS`, and move the job to `EMBEDDING_FAILED_QUEUE_KEY` once the limit is exceeded.
- Other: graceful shutdown on SIGINT/SIGTERM; key events recorded via DebugLogger.
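The retry decision in step 3 can be kept pure for testability. A minimal sketch under the assumption of a fixed backoff delay; `nextOnError` and `NextStep` are hypothetical names, not the actual worker code.

```typescript
// Sketch of the retry/failure decision on the error path described above.
interface EmbeddingJob {
  postId: string;
  title?: string;
  content?: string;
  attempt?: number;
}

type NextStep =
  | { action: "retry"; job: EmbeddingJob; delayMs: number } // re-enqueue after backoff
  | { action: "fail"; job: EmbeddingJob };                  // move to EMBEDDING_FAILED_QUEUE_KEY

function nextOnError(job: EmbeddingJob, maxRetries: number, backoffMs: number): NextStep {
  const attempt = (job.attempt ?? 0) + 1;
  if (attempt > maxRetries) return { action: "fail", job: { ...job, attempt } };
  return { action: "retry", job: { ...job, attempt }, delayMs: backoffMs };
}
```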

## 3. Environment Variables (New)
| Key | Purpose | Default |
| --- | --- | --- |
| `REDIS_URL` | External Redis connection URL (takes precedence) | none |
| `REDIS_HOST` / `REDIS_PORT` | Host/port when no URL is set | `127.0.0.1` / `6379` |
| `EMBEDDING_QUEUE_KEY` | Job queue name | `embedding:queue` |
| `EMBEDDING_FAILED_QUEUE_KEY` | Failed-job queue name | `embedding:failed` |
| `EMBEDDING_WORKER_MAX_RETRIES` | Maximum retry count | `3` |
| `EMBEDDING_WORKER_BACKOFF_MS` | Delay between retries (ms) | `5000` |

## 4. Docker Image & Execution
- Dockerfile default CMD: `node dist/server.js` (Express API).
- The same image is reused for the worker by overriding the command: `docker run ... node dist/worker/queue-consumer.js`.
- No pm2 needed: each container runs a single process, and Docker's `restart` policy handles recovery.

## 5. docker-compose (Development)
```yaml
services:
  api:
    build: .
    command: ["node", "dist/server.js"]
    env_file: [.env]
    ports: ["3000:3000"]
    restart: unless-stopped

  worker:
    build: .
    command: ["node", "dist/worker/queue-consumer.js"]
    env_file: [.env]
    restart: unless-stopped
```
- An external Redis is the default assumption. If needed, add a Redis service for the development environment only and point `.env` at that container.

## 6. GitHub Actions Deployment (main.yml)
- Image: build and push `${{ secrets.DOCKER_USERNAME }}/bubblog-ai:latest`.
- EC2 deployment steps:
  1. Stop and remove the existing `bubblog-ai` and `bubblog-ai-worker` containers.
  2. Pull the latest image.
  3. Run the API container (default CMD).
  4. Run the worker container (`node dist/worker/queue-consumer.js` command).
- Both containers receive the Redis and retry-related Secrets to avoid configuration gaps.
- Secrets (examples): `REDIS_URL`, `REDIS_HOST`, `REDIS_PORT`, `EMBEDDING_QUEUE_KEY`, `EMBEDDING_FAILED_QUEUE_KEY`, `EMBEDDING_WORKER_MAX_RETRIES`, `EMBEDDING_WORKER_BACKOFF_MS`, etc.

## 7. Operational Notes
- The Spring Boot producer enqueues jobs via LPUSH (already implemented).
- Redis runs on an external/managed server; containers in this project act only as consumers.
- The failed queue (`embedding:failed`) needs monitoring and a reprocessing strategy (e.g., an RPOP → LPUSH retry scheduler).
- The API container does not strictly need the Redis variables, but they are injected into both containers so the command can be overridden in an emergency.

## 8. Follow-Up Checklist
- [ ] Verify Redis/DB connectivity and successful embedding storage in staging.
- [ ] Set up failed-queue monitoring and alerts.
- [ ] Define a worker scale-out strategy (confirm there are no processing conflicts when the container count grows).
- [ ] Review Redis access control and whether TLS is in place.