Skip to content

[RFC] Implement a graphLookup command to demonstrate multi-hop queries #5088

@LantaoJin

Description

@LantaoJin

Problem Statement
We can functionally implement a PPL graphLookup command that is semantically equivalent to MongoDB’s $graphLookup pipeline aggregation query, which is to demonstrate multi-hop queries based on a graph model.

Current State
Lack of multi-hop ability in query

Long-Term Goals

  • Describe the ideal outcome or future state you aim to achieve.
    • A new PPL command to support multi-hop query with high performance.
  • What are the primary objectives of this solution should fulfill in the long run?
    • Support GraphRag query with PPL

Proposal
PPL syntax:

source = <vertex_index> 
| graphlookup
  <edge_index>
  startWith=<expression>
  fromField=<field>
  toField=<field>
  [maxDepth=<number>]
  [match=<expression>]
  as <new_field>

Approach
Approach 1
Implement with Calcite Recursive CTE: POC here


                    ┌──────────────┐                                     
                    │ Graph G=(V,E)│                                     
                    └──────┬───────┘                                     
                           │ represented as                              
                           ▼                                             
                    ┌──────────────┐                                     
                    │edges(src,dst)│                                     
                    └──────┬───────┘                                     
                           │                                             
           ┌───────────────┼───────────────┐                             
           │               │               │                             
           ▼               ▼               ▼                             
      ┌─────────┐    ┌───────────┐    ┌───────────┐                      
      │  JOIN   │    │ UNION ALL │    │ RECURSIVE │                      
      │=traverse│    │=accumulate│    │= iterate  │                      
      │one edge │    │           │    │           │                      
      └─────────┘    └───────────┘    └───────────┘                      
           │               │               │                             
           └───────────────┼───────────────┘                             
                           │                                             
                           ▼                                             
                    ┌───────────────┐                                    
                    │ Graph Search  │                                    
                    │   Result      │                                    
                    │(reachable set)│                                    
                    └───────────────┘   

Approach 2 (perferred)
Provide a native Enumerable operator to implement high performance multi-hop traverse.

Metadata

Metadata

Labels

PPLPiped processing languageRFCRequest For Comments

Type

No type

Projects

Status

New

Status

Not Started

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions