Commit d36e916

SonAIengine and claude committed
feat: LongMemEval Phase 1 improvements (multi-query retrieval + type-specific prompts + chronological ordering)
## Phase 1 improvements

- Multi-query retrieval: original query + keyword sub-queries + a specialized re-search for date and counting questions
- Type-specific prompts: a separate system prompt for each of the 6 question types
  - temporal: "Think step by step about dates"
  - knowledge-update: "Trust the NEWEST conversation" + newest-first ordering
  - multi-session: "Check every excerpt and combine"
- Chronological context ordering (newest first for knowledge-update)

## Results (50 questions, qwen3.5:4b)

| Question type / metric | Baseline | Phase 1 | Change |
|------|---------|---------|------|
| multi-session | 0.0% | 50.0% | +50.0 pp |
| knowledge-update | 25.0% | 50.0% | +25.0 pp |
| temporal-reasoning | 0.0% | 25.0% | +25.0 pp |
| Overall Accuracy | 20.8% | 22.9% | +2.1 pp |
| Mean Correctness | 0.219 | 0.301 | +37% (relative) |
| Session Recall | 0.795 | 0.828 | +4.1% (relative) |

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
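The three changes described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `build_queries`, `multi_search`, `order_context`, `system_prompt_for`, the stopword list, the fallback prompt, the chunk shape (`id`, `date`, `text`), and the regex triggers for date and counting questions are all assumptions made for the example. Only the three quoted prompt strings and the newest-first rule for knowledge-update come from the message itself.

```python
import re
from datetime import date

# Prompt strings quoted in the commit message; the dictionary keys and the
# fallback prompt are assumptions for this sketch.
SYSTEM_PROMPTS = {
    "temporal-reasoning": "Think step by step about dates",
    "knowledge-update": "Trust the NEWEST conversation",
    "multi-session": "Check every excerpt and combine",
}
FALLBACK_PROMPT = "Answer using only the excerpts provided."  # hypothetical

# Tiny illustrative stopword list (assumption, not from the commit).
STOPWORDS = {"the", "a", "an", "did", "do", "does", "i", "my", "me", "to", "of",
             "in", "on", "is", "was", "what", "how", "when", "many", "much"}


def build_queries(question):
    """Return the original query, a keyword sub-query, and extra queries
    for date-like or counting-like questions (trigger rules are assumed)."""
    lowered = question.lower()
    keywords = [w for w in re.findall(r"[a-z0-9]+", lowered) if w not in STOPWORDS]
    queries = [question]
    if keywords:
        queries.append(" ".join(keywords))
    if re.search(r"\b(when|date|before|after|ago)\b", lowered):
        queries.append(" ".join(keywords + ["date"]))
    if re.search(r"\bhow (many|much)\b", lowered):
        queries.append(" ".join(keywords + ["total"]))
    return queries


def multi_search(question, search_fn, top_k=5):
    """Run every query through search_fn and merge results, deduplicating by id."""
    seen = {}
    for query in build_queries(question):
        for chunk in search_fn(query, top_k):
            seen.setdefault(chunk["id"], chunk)  # keep the first occurrence
    return list(seen.values())


def order_context(chunks, question_type):
    """Sort oldest first, except knowledge-update, which is newest first."""
    newest_first = question_type == "knowledge-update"
    return sorted(chunks, key=lambda c: c["date"], reverse=newest_first)


def system_prompt_for(question_type):
    """Pick the type-specific system prompt, with a generic fallback."""
    return SYSTEM_PROMPTS.get(question_type, FALLBACK_PROMPT)
```

Here `search_fn(query, top_k)` stands in for whatever retriever the benchmark harness uses; any callable returning a list of chunk dictionaries would work. Deduplicating by chunk id before sorting keeps a chunk that is hit by several sub-queries from appearing more than once in the context.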
1 parent 9cdbc62 commit d36e916

1 file changed

Lines changed: 253 additions & 64 deletions
