Implement substructure search with full recursion #81

scal444 · 2026-01-30T13:24:32Z

No description provided.

greptile-apps · 2026-01-30T13:27:20Z

Greptile Overview

Greptile Summary

This PR implements a comprehensive GPU-accelerated substructure search system with full recursion support for SMARTS patterns. The implementation adds ~9,300 lines across 42 files.

Key Changes

Core Search Engine: Implements pipelined batch substructure matching on GPU with multi-threaded CPU preprocessing and GPU execution orchestration
Recursive SMARTS Support: Full support for recursive/nested SMARTS patterns up to depth 4 (kMaxSmartsNestingDepth), with depth-based batching and double-buffered pinned memory for overlap
GSI Algorithm: GPU kernels implementing GSI-inspired BFS level-by-level join algorithm for substructure matching
Pipeline Architecture: Multi-stage pipeline with separate streams for recursive preprocessing, post-recursion processing, and main computation
Memory Management: Thread-safe pinned buffer pool, device vector management, and scratch buffer reuse to minimize allocations
Results Storage: Sparse results storage using unordered_map to efficiently handle large target/query matrices
Test Coverage: Comprehensive test suite (1935+ lines) validating against RDKit ground truth

Architecture Highlights

The implementation uses a sophisticated pipelined architecture:

CPU preprocessing threads prepare mini-batches and handle RDKit fallback
GPU workers execute kernels across multiple streams with recursive pattern preprocessing
Thread-safe queues coordinate work between preprocessing and execution threads
Double-buffered pinned memory enables overlap of H2D copies with kernel execution
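As an illustration of the queue-based hand-off described above, here is a minimal sketch of a blocking queue between a preprocessing thread and an execution thread. The class name `BlockingQueue` and its `close()` protocol are assumptions for illustration; the PR's own queue type is not shown in this thread and may differ.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Hypothetical name; not the PR's actual queue class.
template <typename T>
class BlockingQueue {
 public:
  // Producer side: enqueue one item and wake one waiting consumer.
  void push(T item) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      items_.push(std::move(item));
    }
    cv_.notify_one();
  }

  // Producer side: signal that no more items will arrive.
  void close() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      closed_ = true;
    }
    cv_.notify_all();
  }

  // Consumer side: block until an item is available, or return
  // std::nullopt once the queue is closed and fully drained.
  std::optional<T> pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
    if (items_.empty()) {
      return std::nullopt;
    }
    T item = std::move(items_.front());
    items_.pop();
    return item;
  }

 private:
  std::mutex              mutex_;
  std::condition_variable cv_;
  std::queue<T>           items_;
  bool                    closed_ = false;
};
```

A consumer would typically loop on `pop()` until it returns `std::nullopt`, which gives a clean shutdown path without a separate stop flag.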

Testing

Tests cover basic functionality, edge cases, aromatic matching, recursive patterns, multi-threading, and extensive validation against the RDKit reference implementation.

Confidence Score: 4/5

This PR is safe to merge after thorough testing, with well-structured code and comprehensive test coverage
Large feature addition with good architecture, extensive testing, and proper memory management, but complexity warrants careful validation
Focus testing on src/substruct/substruct_search.cu for multi-threading edge cases and src/substruct/recursive_preprocessor.cu for deep nesting scenarios

Important Files Changed

Filename	Overview
src/substruct/substruct_search.cu	Main pipelined substructure search implementation with multi-threading and GPU execution orchestration
src/substruct/substruct_kernels.cu	CUDA kernels for label matrix computation and substructure matching with GSI algorithm
src/substruct/recursive_preprocessor.cu	Recursive SMARTS pattern preprocessing with depth-based batching and double-buffering
src/substruct/recursive_preprocessor.h	Header for recursive SMARTS preprocessing with LeafSubpatterns and RecursiveScratchBuffers
src/substruct/minibatch_planner.cpp	CPU-side mini-batch planning and pipeline scheduling for recursive patterns
tests/test_substruct_search.cu	Comprehensive test suite with 1935 lines covering various substructure search scenarios

greptile-apps

6 files reviewed, no comments

evasnow1992 · 2026-02-02T01:43:13Z

src/substruct/substruct_constants.h

+constexpr int kOverflowEntriesPerBuffer = 2048;
+
+/// Maximum nesting depth for recursive SMARTS patterns
+constexpr int kMaxSmartsNestingDepth = 4;


Just curious, does RDKit also assume a maximum nesting depth of 4? If not, what happens if the nesting depth exceeds 4 in our implementation? Fall back to calling RDKit?
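For context on the question, this thread does not say what happens when a pattern nests deeper than the limit. The sketch below shows one way a depth check could gate a fallback decision. The helper names `recursiveSmartsDepth` and `needsFallback` are hypothetical, and counting `$(` tokens directly on the SMARTS string is an assumption; the PR may derive the depth from the parsed query instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

constexpr int kMaxSmartsNestingDepth = 4;  // value quoted from the diff above

// Hypothetical helper: deepest level of `$(...)` nesting in a SMARTS string.
// Plain branch parentheses such as CC(C)O are tracked but do not add depth.
int recursiveSmartsDepth(const std::string& smarts) {
  std::vector<bool> openIsRecursive;  // one entry per currently open '('
  int               depth    = 0;
  int               maxDepth = 0;
  for (std::size_t i = 0; i < smarts.size(); ++i) {
    const char c = smarts[i];
    if (c == '(') {
      const bool recursive = i > 0 && smarts[i - 1] == '$';
      openIsRecursive.push_back(recursive);
      if (recursive) {
        maxDepth = std::max(maxDepth, ++depth);
      }
    } else if (c == ')' && !openIsRecursive.empty()) {
      if (openIsRecursive.back()) {
        --depth;
      }
      openIsRecursive.pop_back();
    }
  }
  return maxDepth;
}

// Hypothetical policy: anything deeper than the limit goes to a CPU fallback.
bool needsFallback(const std::string& smarts) {
  return recursiveSmartsDepth(smarts) > kMaxSmartsNestingDepth;
}
```

With this check, `[$(C=O)]` has depth 1 and a pattern with five nested `$(` groups would be routed to the fallback path.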

evasnow1992 · 2026-02-02T04:54:06Z

src/substruct/substruct_kernels.cu

+
+  } else if constexpr (Algo == SubstructAlgorithm::GSI) {
+    constexpr int kBlockSizeT   = getBlockSizeForConfig<MaxTargetAtoms>();
+    constexpr int kMaxPartialsT = getMaxPartialsForSM<MaxTargetAtoms, MaxQueryAtoms>(86, kBlockSizeT);


This may be trivial, but why do we hard-code 86 here?
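The thread does not explain the 86. Assuming it denotes compute capability 8.6 and that the partial-match budget is bounded by per-block shared memory, a named constant plus a small lookup would make the intent visible. Everything below is a hypothetical sketch: `kTargetSmVersion`, `maxSharedBytesPerBlock`, and `maxPartialsForSm` are illustrative names, not the PR's `getMaxPartialsForSM`, and the byte limits are the per-block maximums as listed in the CUDA C++ Programming Guide's compute capability table (worth re-checking against the current guide).

```cpp
#include <cstddef>

// Hypothetical: name the target architecture instead of passing a bare 86.
constexpr int kTargetSmVersion = 86;  // assumed to mean compute capability 8.6

// Maximum shared memory per thread block, in bytes, for a few architectures.
// Unlisted architectures fall back to the 48 KB default limit.
constexpr std::size_t maxSharedBytesPerBlock(int smVersion) {
  switch (smVersion) {
    case 80:
      return 163 * 1024;
    case 86:
      return 99 * 1024;
    case 89:
      return 99 * 1024;
    case 90:
      return 227 * 1024;
    default:
      return 48 * 1024;
  }
}

// Hypothetical stand-in for a partials budget: how many fixed-size partial
// matches fit into one block's shared memory on the given architecture.
constexpr int maxPartialsForSm(int smVersion, std::size_t bytesPerPartial) {
  return static_cast<int>(maxSharedBytesPerBlock(smVersion) / bytesPerPartial);
}

static_assert(maxPartialsForSm(kTargetSmVersion, 64) == 1584,
              "99 KB / 64 B per partial");
```

Sizing for the smallest supported budget keeps one compiled configuration valid on larger parts, at the cost of leaving shared memory unused there.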

evasnow1992 · 2026-02-02T18:55:02Z

src/substruct/substruct_search_internal.cpp

+  std::vector<std::vector<std::vector<int>>*> matchRefs;
+  matchRefs.reserve(updates.size());
+
+  {
+    std::lock_guard<std::mutex> lock(resultsMutex);
+    for (const auto& u : updates) {
+      matchRefs.push_back(&results.getMatchesMut(u.targetIdx, u.queryIdx));
+    }
+  }
+
+  for (size_t i = 0; i < updates.size(); ++i) {
+    const auto& u             = updates[i];
+    auto&       targetMatches = *matchRefs[i];
+
+    targetMatches.reserve(targetMatches.size() + u.reportedMatches);
+    const int16_t* src = hostBuffer.matchIndices.data() + u.miniBatchLocalOffset;
+
+    for (int m = 0; m < u.reportedMatches; ++m) {
+      auto& match = targetMatches.emplace_back(u.queryAtoms);
+      for (int a = 0; a < u.queryAtoms; ++a) {
+        match[a] = src[m * u.queryAtoms + a];
+      }
+    }
+  }
+}


Just to clarify: matchRefs is built while holding the lock, but the writes to targetMatches happen outside it. Is there a race condition concern here?
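A sketch of the two conditions this pattern relies on, assuming the results store is a `std::unordered_map` keyed per (target, query) pair, as the summary suggests. First, `std::unordered_map` keeps references and pointers to its elements valid across inserts and rehashes (only iterators are invalidated), so a pointer taken under the lock stays usable afterwards. Second, no two threads may write to the same (target, query) slot at the same time; this thread does not state whether the pipeline guarantees that. `SparseResults` and `appendMatch` below are hypothetical stand-ins, not the PR's classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

using MatchList = std::vector<std::vector<int>>;

// Hypothetical stand-in for the PR's sparse results store.
class SparseResults {
 public:
  // Call with the results mutex held: operator[] may insert and rehash.
  MatchList& getMatchesMut(int targetIdx, int queryIdx) {
    return map_[key(targetIdx, queryIdx)];
  }

  std::size_t size() const { return map_.size(); }

 private:
  // Pack both indices into one 64-bit key.
  static std::uint64_t key(int targetIdx, int queryIdx) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(targetIdx)) << 32) |
           static_cast<std::uint32_t>(queryIdx);
  }

  std::unordered_map<std::uint64_t, MatchList> map_;
};

// Take the slot pointer under the lock, then fill the slot without it.
void appendMatch(SparseResults&          results,
                 std::mutex&             resultsMutex,
                 int                     targetIdx,
                 int                     queryIdx,
                 const std::vector<int>& match) {
  MatchList* slot = nullptr;
  {
    std::lock_guard<std::mutex> lock(resultsMutex);
    slot = &results.getMatchesMut(targetIdx, queryIdx);
  }
  // Safe only if no other thread touches this (target, query) slot now.
  slot->push_back(match);
}
```

If two mini-batches can ever report matches for the same pair concurrently, the unlocked `push_back` would race and the write itself would need protection, for example a per-slot lock.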

evasnow1992 · 2026-02-02T22:32:40Z

tests/test_substruct_search.cu

+// Instantiate parameterized tests for all algorithms
+INSTANTIATE_TEST_SUITE_P(AllAlgorithms,
+                         SubstructureSearchTest,
+                         ::testing::Values(SubstructAlgorithm::GSI),


Please double-check whether we want to test only GSI here and not VF2. I know VF2 is mostly used internally as a reference, but the name AllAlgorithms is a little confusing when only one algorithm is instantiated.

evasnow1992

Thank you for putting all of this together. A few clarifying questions.

scal444 added 15 commits January 29, 2026 08:19

First round of files to copy into next PR

b016a6e

Tests are passing

cdfc506

Fix copyrights

d6b6c3d

Add multithreaded queue and tests, replace multiple queues with it

ea38378

Remove dead code

f3414de

Split off unit tests for algorithms

0b00a35

Extract recursive preprocessor

a7dc5b5

Extract GPU executor and consolidate with minibatch plan

7a608d0

More simplification

25e598a

Extract output handling

a3122c1

Apply refactorings

ba3cc39

More refactoring

1ab287d

Merge branch 'main' into PR_new_base_jan29

1b9509a

Consolidate some SM stuff

873e1e9

Undo accidental changes

9233e61

scal444 requested a review from evasnow1992 January 30, 2026 13:24

scal444 self-assigned this Jan 30, 2026

greptile-apps bot reviewed Jan 30, 2026

evasnow1992 reviewed Feb 2, 2026

evasnow1992 approved these changes Feb 2, 2026
