Collaborator

@scal444 scal444 commented Jan 30, 2026

No description provided.

@scal444 scal444 requested a review from evasnow1992 January 30, 2026 13:24
@scal444 scal444 self-assigned this Jan 30, 2026
Contributor

greptile-apps bot commented Jan 30, 2026

Greptile Overview

Greptile Summary

This PR implements a comprehensive GPU-accelerated substructure search system with full recursion support for SMARTS patterns. The implementation adds ~9,300 lines across 42 files.

Key Changes

  • Core Search Engine: Implements pipelined batch substructure matching on GPU with multi-threaded CPU preprocessing and GPU execution orchestration
  • Recursive SMARTS Support: Full support for recursive/nested SMARTS patterns up to depth 4 (kMaxSmartsNestingDepth), with depth-based batching and double-buffered pinned memory for overlap
  • GSI Algorithm: GPU kernels implementing GSI-inspired BFS level-by-level join algorithm for substructure matching
  • Pipeline Architecture: Multi-stage pipeline with separate streams for recursive preprocessing, post-recursion processing, and main computation
  • Memory Management: Thread-safe pinned buffer pool, device vector management, and scratch buffer reuse to minimize allocations
  • Results Storage: Sparse results storage using unordered_map to efficiently handle large target/query matrices
  • Test Coverage: Comprehensive test suite (1935+ lines) validating against RDKit ground truth
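
For readers unfamiliar with the level-by-level join mentioned under "GSI Algorithm", the idea can be sketched on the CPU as follows. This is only an illustration, not the PR's kernels: `Graph`, `makeGraph`, and `levelJoinMatch` are hypothetical names, atoms are treated as compatible when their integer labels are equal, and bond types are ignored. Each level extends every partial match by one more query atom and keeps only the extensions that preserve injectivity and the required bonds.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal graph type: one integer label per atom plus a dense adjacency matrix.
struct Graph {
  std::vector<int> labels;
  std::vector<std::vector<bool>> adjacent;
};

// Builds an undirected graph from atom labels and a list of bonds.
Graph makeGraph(const std::vector<int>& labels, const std::vector<std::pair<int, int>>& bonds) {
  Graph g;
  g.labels = labels;
  g.adjacent.assign(labels.size(), std::vector<bool>(labels.size(), false));
  for (const auto& [a, b] : bonds) {
    g.adjacent[a][b] = true;
    g.adjacent[b][a] = true;
  }
  return g;
}

// Level-by-level join: at level i every partial match maps query atoms 0..i-1 to
// distinct target atoms; it is joined with each candidate target atom for query atom i.
std::vector<std::vector<int>> levelJoinMatch(const Graph& target, const Graph& query) {
  std::vector<std::vector<int>> partials{{}};  // a single empty partial match at level 0
  for (std::size_t level = 0; level < query.labels.size(); ++level) {
    std::vector<std::vector<int>> next;
    for (const auto& partial : partials) {
      for (std::size_t t = 0; t < target.labels.size(); ++t) {
        if (target.labels[t] != query.labels[level]) continue;  // label filter
        bool ok = true;
        for (std::size_t j = 0; j < level && ok; ++j) {
          if (partial[j] == static_cast<int>(t)) {
            ok = false;  // target atom already used by this partial match
          } else if (query.adjacent[level][j] && !target.adjacent[t][partial[j]]) {
            ok = false;  // a query bond has no counterpart in the target
          }
        }
        if (ok) {
          std::vector<int> extended = partial;
          extended.push_back(static_cast<int>(t));
          next.push_back(std::move(extended));
        }
      }
    }
    partials = std::move(next);
  }
  return partials;  // every entry maps query atom index -> target atom index
}
```

The sketch returns all atom mappings without uniquifying symmetric matches. On a GPU the loop over partial matches is the natural unit of parallelism, which is presumably why the number of partials per level has to be bounded.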

Architecture Highlights

The implementation uses a sophisticated pipelined architecture:

  1. CPU preprocessing threads prepare mini-batches and handle RDKit fallback
  2. GPU workers execute kernels across multiple streams with recursive pattern preprocessing
  3. Thread-safe queues coordinate work between preprocessing and execution threads
  4. Double-buffered pinned memory enables overlap of H2D copies with kernel execution
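
The hand-off between preprocessing and execution threads in step 3 can be pictured with a small blocking queue. The sketch below is an assumption about the general pattern, not the PR's actual queue: `WorkQueue` and `runPipeline` are hypothetical names, and an `int` stands in for a prepared mini-batch.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical blocking queue: producers push items, consumers block in pop()
// until an item arrives or the queue is closed.
template <typename T>
class WorkQueue {
 public:
  void push(T item) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      items_.push(std::move(item));
    }
    ready_.notify_one();
  }

  // Signals that no more items will be pushed; wakes all waiting consumers.
  void close() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      closed_ = true;
    }
    ready_.notify_all();
  }

  // Returns the next item, or std::nullopt once the queue is closed and drained.
  std::optional<T> pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
    if (items_.empty()) return std::nullopt;
    T item = std::move(items_.front());
    items_.pop();
    return item;
  }

 private:
  std::mutex mutex_;
  std::condition_variable ready_;
  std::queue<T> items_;
  bool closed_ = false;
};

// One "preprocessing" producer (the calling thread) feeds one "GPU worker" consumer.
// Returns the sum of the batch ids the consumer processed.
int runPipeline(int numBatches) {
  WorkQueue<int> queue;
  int sum = 0;
  std::thread consumer([&] {
    while (auto batch = queue.pop()) sum += *batch;
  });
  for (int i = 0; i < numBatches; ++i) queue.push(i);
  queue.close();
  consumer.join();  // join orders the consumer's writes to sum before the read below
  return sum;
}
```

Closing the queue explicitly, rather than pushing a sentinel value, lets several consumers shut down cleanly once the producers are done.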

Testing

Tests cover basic functionality, edge cases, aromatic matching, recursive patterns, multi-threading, and extensive validation against RDKit reference implementation.

Confidence Score: 4/5

  • This PR is safe to merge after thorough testing, with well-structured code and comprehensive test coverage
  • Large feature addition with good architecture, extensive testing, and proper memory management, but complexity warrants careful validation
  • Focus testing on src/substruct/substruct_search.cu for multi-threading edge cases and src/substruct/recursive_preprocessor.cu for deep nesting scenarios

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/substruct/substruct_search.cu | Main pipelined substructure search implementation with multi-threading and GPU execution orchestration |
| src/substruct/substruct_kernels.cu | CUDA kernels for label matrix computation and substructure matching with GSI algorithm |
| src/substruct/recursive_preprocessor.cu | Recursive SMARTS pattern preprocessing with depth-based batching and double-buffering |
| src/substruct/recursive_preprocessor.h | Header for recursive SMARTS preprocessing with LeafSubpatterns and RecursiveScratchBuffers |
| src/substruct/minibatch_planner.cpp | CPU-side mini-batch planning and pipeline scheduling for recursive patterns |
| tests/test_substruct_search.cu | Comprehensive test suite with 1935 lines covering various substructure search scenarios |

Contributor

@greptile-apps greptile-apps bot left a comment

6 files reviewed, no comments

constexpr int kOverflowEntriesPerBuffer = 2048;

/// Maximum nesting depth for recursive SMARTS patterns
constexpr int kMaxSmartsNestingDepth = 4;
Collaborator

Just curious, does RDKit also assume a maximum nesting depth of 4? If not, what happens in our implementation when the nesting depth exceeds 4? Do we fall back to calling RDKit?
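
To make the question concrete, here is a minimal sketch of how a depth check could look. It assumes nesting depth means the deepest level of nested `$(...)` groups in the SMARTS string; `recursiveSmartsDepth` and `exceedsGpuDepthLimit` are hypothetical helpers, and whether the PR actually routes over-deep queries to RDKit is exactly what is being asked. Only the constant's name and value come from the diff.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Value taken from the diff above.
constexpr int kMaxSmartsNestingDepth = 4;

// Hypothetical helper: returns the deepest nesting level of "$(" groups in a SMARTS string.
// Plain branch parentheses such as C(=O)O are tracked but do not add to the depth.
int recursiveSmartsDepth(const std::string& smarts) {
  std::vector<bool> openIsRecursive;  // one entry per currently open '('
  int depth = 0;
  int maxDepth = 0;
  for (std::size_t i = 0; i < smarts.size(); ++i) {
    const char c = smarts[i];
    if (c == '(') {
      const bool recursive = i > 0 && smarts[i - 1] == '$';
      openIsRecursive.push_back(recursive);
      if (recursive) maxDepth = std::max(maxDepth, ++depth);
    } else if (c == ')' && !openIsRecursive.empty()) {
      if (openIsRecursive.back()) --depth;
      openIsRecursive.pop_back();
    }
  }
  return maxDepth;
}

// Hypothetical policy: queries nested deeper than the limit would need another path
// (for example a CPU fallback).
bool exceedsGpuDepthLimit(const std::string& smarts) {
  return recursiveSmartsDepth(smarts) > kMaxSmartsNestingDepth;
}
```

A check of this kind would let the planner decide per query before any GPU work is scheduled.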


} else if constexpr (Algo == SubstructAlgorithm::GSI) {
constexpr int kBlockSizeT = getBlockSizeForConfig<MaxTargetAtoms>();
constexpr int kMaxPartialsT = getMaxPartialsForSM<MaxTargetAtoms, MaxQueryAtoms>(86, kBlockSizeT);
Collaborator

This may be trivial, but why do we hard-code 86 here?
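
If the literal stands for CUDA compute capability 8.6 (an assumption based only on the name `getMaxPartialsForSM`), a named constant would make the intent explicit. The names below are hypothetical and not from the PR:

```cpp
// Hypothetical helper: encodes a compute capability (major.minor) as major * 10 + minor.
constexpr int encodeSmVersion(int major, int minor) {
  return major * 10 + minor;
}

// Assumption: the literal 86 in the call above denotes compute capability 8.6.
constexpr int kTuningSmVersion = encodeSmVersion(8, 6);
static_assert(kTuningSmVersion == 86, "named constant must match the literal it replaces");
```

Because the value is used as a template argument it has to stay a compile-time constant, so a runtime query of the device's compute capability could only choose among pre-instantiated variants.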

Comment on lines +355 to +379
  std::vector<std::vector<std::vector<int>>*> matchRefs;
  matchRefs.reserve(updates.size());

  {
    std::lock_guard<std::mutex> lock(resultsMutex);
    for (const auto& u : updates) {
      matchRefs.push_back(&results.getMatchesMut(u.targetIdx, u.queryIdx));
    }
  }

  for (size_t i = 0; i < updates.size(); ++i) {
    const auto& u = updates[i];
    auto& targetMatches = *matchRefs[i];

    targetMatches.reserve(targetMatches.size() + u.reportedMatches);
    const int16_t* src = hostBuffer.matchIndices.data() + u.miniBatchLocalOffset;

    for (int m = 0; m < u.reportedMatches; ++m) {
      auto& match = targetMatches.emplace_back(u.queryAtoms);
      for (int a = 0; a < u.queryAtoms; ++a) {
        match[a] = src[m * u.queryAtoms + a];
      }
    }
  }
}
Collaborator

Just to clarify: matchRefs is populated inside the lock, but the writes to targetMatches happen outside it. Is there a race condition concern here?
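
For context, the pattern in question can be safe under two conditions: references into the results container must stay valid while other threads insert, and no two threads may ever write the same (target, query) slot. The sketch below illustrates that reasoning with hypothetical names (`SparseResults`, `slot`, `worker`, `countAllMatches`). It assumes storage backed by `std::unordered_map`, as the summary describes, which keeps references to existing elements valid across insertions and rehashing. Whether the PR guarantees disjoint slots per thread is for the author to confirm.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the results storage, keyed by a packed (target, query) pair.
struct SparseResults {
  std::unordered_map<std::uint64_t, std::vector<std::vector<int>>> cells;
  std::mutex mutex;

  // Looks up or creates a slot under the lock. The returned pointer stays valid
  // when other threads later insert, because unordered_map never relocates elements.
  std::vector<std::vector<int>>* slot(int target, int query) {
    const std::uint64_t key =
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(target)) << 32) |
        static_cast<std::uint32_t>(query);
    std::lock_guard<std::mutex> lock(mutex);
    return &cells[key];
  }
};

// Each worker owns one target, so no two threads ever write the same slot.
void worker(SparseResults& results, int target, int numQueries, int matchesPerPair) {
  for (int q = 0; q < numQueries; ++q) {
    auto* matches = results.slot(target, q);      // container mutation: locked
    for (int m = 0; m < matchesPerPair; ++m) {
      matches->push_back({target, q, m});         // slot contents: unlocked, single writer
    }
  }
}

// Runs one worker per target and returns the total number of stored matches.
int countAllMatches(int numThreads, int numQueries, int matchesPerPair) {
  SparseResults results;
  std::vector<std::thread> threads;
  for (int t = 0; t < numThreads; ++t) {
    threads.emplace_back(worker, std::ref(results), t, numQueries, matchesPerPair);
  }
  for (auto& th : threads) th.join();
  int total = 0;
  for (const auto& cell : results.cells) total += static_cast<int>(cell.second.size());
  return total;
}
```

If two threads could receive updates for the same (target, query) pair, the unlocked `push_back` would race, and the fill loop would need its own synchronization.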

// Instantiate parameterized tests for all algorithms
INSTANTIATE_TEST_SUITE_P(AllAlgorithms,
SubstructureSearchTest,
::testing::Values(SubstructAlgorithm::GSI),
Collaborator

Please double-check whether we want to test only GSI and not VF2 here. I know VF2 is mostly used internally as a reference, but the name AllAlgorithms is a little confusing when only one algorithm is instantiated.

Collaborator

@evasnow1992 evasnow1992 left a comment

Thank you for putting all of this together. I have a few clarification questions.

2 participants