feat(ai): Adding Lucene & Embedding-Based Search Operators to Apache GeaFlow (incubating) for Lightweight Context Memory by Leomrlin · Pull Request #716 · apache/geaflow

Leomrlin · 2025-12-17T13:01:43Z

We're excited to introduce initial support for context-aware memory operations in Apache GeaFlow (incubating) through two retrieval operators: Lucene-powered keyword search and embedding-based semantic search. This lays the foundation for dynamic, AI-driven graph memory systems that support real-time, hybrid querying over structured graph data and unstructured semantic intent.

✅ Key Features Implemented

KeywordVector + Lucene Indexing: Enables fast, full-text retrieval of entities using BM25-style keyword matching. Ideal for surfacing exact or near-exact matches from entity attributes (e.g., names, emails, titles).
EmbeddingVector + Vector Index Store: Supports semantic search via high-dimensional embeddings. Queries are encoded using a configured embedding model and matched against pre-indexed node representations.
Hybrid VectorSearch Interface: Combines multiple vector types (keyword, embedding, traversal hints) into a single search context, paving the way for multimodal retrieval.
End-to-End Query Pipeline: From query ingestion → hybrid indexing → graph retrieval → context verbalization, demonstrated with LDBC-scale data.
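
To make the keyword + embedding combination above concrete, here is a minimal, dependency-free sketch of the idea. It is not the code in this PR and does not use the Lucene or GeaFlow APIs: the class `HybridSearchSketch`, its method names, the BM25 constants (`k1 = 1.2`, `b = 0.75`), and the weighted-sum fusion with `alpha` are all assumptions chosen for illustration. The actual operators may score and fuse results differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration only; not part of the geaflow-ai module. */
final class HybridSearchSketch {

    /** Lowercases the text and splits it on non-alphanumeric characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** BM25 score of every document for the query (assumed k1 = 1.2, b = 0.75). */
    static double[] bm25(List<String> docs, String query) {
        final double k1 = 1.2;
        final double b = 0.75;
        int n = docs.size();
        List<Map<String, Integer>> termFreqs = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        int[] lens = new int[n];
        double totalLen = 0;
        for (int i = 0; i < n; i++) {
            List<String> toks = tokenize(docs.get(i));
            Map<String, Integer> tf = new HashMap<>();
            for (String t : toks) {
                tf.merge(t, 1, Integer::sum);
            }
            for (String t : tf.keySet()) {
                docFreq.merge(t, 1, Integer::sum);
            }
            termFreqs.add(tf);
            lens[i] = toks.size();
            totalLen += toks.size();
        }
        double avgLen = n == 0 ? 0 : totalLen / n;
        double[] scores = new double[n];
        // Each distinct query term contributes idf * saturated term frequency.
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            int df = docFreq.getOrDefault(term, 0);
            if (df == 0) {
                continue;
            }
            double idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
            for (int i = 0; i < n; i++) {
                int f = termFreqs.get(i).getOrDefault(term, 0);
                if (f == 0) {
                    continue;
                }
                double norm = f + k1 * (1 - b + b * lens[i] / avgLen);
                scores[i] += idf * f * (k1 + 1) / norm;
            }
        }
        return scores;
    }

    /** Cosine similarity between two equal-length vectors; 0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0;
        double na = 0;
        double nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Weighted sum of max-normalized keyword scores and semantic scores. */
    static double[] fuse(double[] keyword, double[] semantic, double alpha) {
        if (keyword.length != semantic.length) {
            throw new IllegalArgumentException("score arrays differ in length");
        }
        double max = 0;
        for (double k : keyword) {
            max = Math.max(max, k);
        }
        double[] fused = new double[keyword.length];
        for (int i = 0; i < keyword.length; i++) {
            double kw = max == 0 ? 0 : keyword[i] / max;
            fused[i] = alpha * kw + (1 - alpha) * semantic[i];
        }
        return fused;
    }
}
```

In this sketch a candidate that matches both the query terms and the query embedding outranks one that matches on only one signal. That is the behavior the hybrid interface is meant to enable, though the fusion rule here is only one of several reasonable choices.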

🧪 Validated Use Cases

Our GraphMemoryTest suite demonstrates:

Resolving ambiguous queries like "Chaim Azriel" into multiple candidate persons using keyword + embedding fusion.
Traversing relationships (e.g., Comment_hasCreator_Person) in follow-up rounds via contextual refinement.
Iterative context expansion across multiple search cycles — mimicking agent memory evolution.
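
The multi-round behavior described above can be pictured with a small sketch: a context set is seeded with the entities found by search, and each follow-up round pulls in vertices connected by a chosen edge label. This is a simplified model under stated assumptions, not the `GraphMemoryTest` code. The class `ContextExpansionSketch`, its methods, the vertex ids, and the `Person_knows_Person` label used in the usage note are hypothetical; only `Comment_hasCreator_Person` comes from the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not part of the geaflow-ai module. */
final class ContextExpansionSketch {

    /** A directed, labeled edge between two vertex ids. */
    static final class Edge {
        final String src;
        final String label;
        final String dst;

        Edge(String src, String label, String dst) {
            this.src = src;
            this.label = label;
            this.dst = dst;
        }
    }

    // Edges indexed by both endpoints so a round can follow them in either direction.
    private final Map<String, List<Edge>> incident = new HashMap<>();
    // The evolving context: vertex ids collected so far, in insertion order.
    private final Set<String> context = new LinkedHashSet<>();

    void addEdge(String src, String label, String dst) {
        Edge e = new Edge(src, label, dst);
        incident.computeIfAbsent(src, k -> new ArrayList<>()).add(e);
        incident.computeIfAbsent(dst, k -> new ArrayList<>()).add(e);
    }

    /** Adds a vertex found by an initial search round to the context. */
    void seed(String vertexId) {
        context.add(vertexId);
    }

    /**
     * Runs one refinement round: every vertex linked to the current context
     * by an edge with the given label joins the context.
     * Returns only the vertices that were newly added in this round.
     */
    Set<String> expand(String label) {
        Set<String> added = new LinkedHashSet<>();
        for (String v : context) {
            for (Edge e : incident.getOrDefault(v, Collections.emptyList())) {
                if (!e.label.equals(label)) {
                    continue;
                }
                String other = e.src.equals(v) ? e.dst : e.src;
                if (!context.contains(other)) {
                    added.add(other);
                }
            }
        }
        context.addAll(added);
        return added;
    }

    /** Read-only view of the vertices currently in the context. */
    Set<String> context() {
        return Collections.unmodifiableSet(context);
    }
}
```

For example, seeding the context with one person and calling `expand("Comment_hasCreator_Person")` adds that person's comments; a later `expand("Person_knows_Person")` would then add acquaintances. Each call corresponds to one search cycle, and the context grows monotonically across cycles.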

🔮 Why This Matters

This work represents the first step toward Graphiti-inspired, relationship-aware AI memory within GeaFlow:

Instead of treating context as static text, we model it as a dynamic, evolving subgraph, enriched by both semantic similarity and topological structure.

By leveraging GeaFlow’s native streaming graph engine, we aim to go beyond batch RAG — supporting incremental updates, temporal reasoning, and multi-hop inference at low latency.

Next Steps:
We propose incubating this as the GeaFlow Memory Engine, with upcoming support for:

Graph traversal-guided re-ranking
Agent session management with episodic memory
Integration with LLM agents for autonomous reasoning

This PR sets the stage: from graph analytics to graph-native AI memory.

Let’s build the future of contextual intelligence — on streaming graphs. 🚀

Appointat

test

Appointat

Thanks for your PR. Left some comments.

cbqiao

LGTM

DukeWangYu · 2026-01-19T06:27:32Z

LGTM

kitalkuyo-gita

LGTM

Appointat

LGTM

Leomrlin added 18 commits November 19, 2025 19:18

init dcp code

7d8832f

Merge remote-tracking branch 'origin/master' into dev_init_dcp

1b10d23

add lucene search

73b97d1

add prompt formatter

0da962e

add test case

18b359f

handle ldbc id conflict

3bd80f0

support llm

4c1aa15

support embedding index store

e0e983a

add embedding op

b945221

refine test case

3253a0e

delete test data

1b2fe59

add MockChatRobot

5ee48a1

fix checkstyle

a127c4b

Merge remote-tracking branch 'origin/master' into dev_init_dcp

ce2fde1

fix pom

8ccc524

fix finishReason

453dfd9

fix ci tests

a80cc46

fix ci tests

bb12777

yaozhongq requested a review from cbqiao December 29, 2025 07:52

Appointat reviewed Dec 30, 2025

Appointat suggested changes Dec 30, 2025

Leomrlin added 2 commits January 6, 2026 16:11

fix comments

3975294

fix codestyle

adffdd9

Leomrlin changed the title ~~feat(dsl): Adding Lucene & Embedding-Based Search Operators to Apache GeaFlow (incubating) for Lightweight Context Memory~~ feat(ai): Adding Lucene & Embedding-Based Search Operators to Apache GeaFlow (incubating) for Lightweight Context Memory Jan 6, 2026

Leomrlin added 2 commits January 7, 2026 15:16

support mutable graph

65835ce

Merge remote-tracking branch 'origin/master' into dev_init_dcp

9f48bf1

kitalkuyo-gita reviewed Jan 14, 2026

Comment thread geaflow-ai/src/main/java/org/apache/geaflow/ai/session/SessionManagement.java

Comment thread geaflow-ai/src/main/java/org/apache/geaflow/ai/operator/SearchUtils.java Outdated

Comment thread geaflow-ai/src/main/java/org/apache/geaflow/ai/operator/SessionOperator.java

fix comments

79e0df1

cbqiao approved these changes Jan 19, 2026

kitalkuyo-gita approved these changes Jan 19, 2026

Appointat approved these changes Jan 21, 2026

cbqiao merged commit ee4a0c5 into apache:master Jan 21, 2026
1 check passed

5 participants