index and query prs by nick1udwig · Pull Request #82 · similigh/simili-bot

nick1udwig · 2026-02-18T18:50:15Z

Description

Index and query PRs, not just issues

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to change)
📚 Documentation update
🔧 Configuration/build change
♻️ Refactoring (no functional changes)
🧪 Test update

Related Issues

Fixes #43

Changes Made

adds PR indexing
adds PR querying (against PRs and issues)

Testing

I have run go build ./... successfully
I have run go test ./... successfully
I have run go vet ./... successfully
I have tested the changes locally

(I've tested with OpenAI models)

Screenshots (if applicable)

Checklist

My code follows the project's style guidelines
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
New and existing unit tests pass locally with my changes

Additional Notes

Summary by CodeRabbit

New Features
- PR indexing into a dedicated collection and metadata (including changed files) when enabled
- New pr-duplicate command: AI-powered duplicate detection for a PR against issues/PRs with top-k, threshold, and JSON output
Documentation
- CLI docs and workflow examples updated for PR indexing, pr-duplicate usage, and RFC3339 --since behavior
Configuration
- Sample env entries for main and optional PR collection names
Tests
- Unit tests added for PR duplicate candidate parsing and helpers

coderabbitai · 2026-02-18T18:50:36Z

📝 Walkthrough

Walkthrough

Adds optional pull-request indexing and a new simili pr-duplicate CLI command. Updates config/environment to support a separate PR Qdrant collection, extends GitHub and Gemini integrations, updates the index workflow to optionally index PR metadata, and adds CLI and tests for PR duplicate detection.

Changes

Cohort / File(s)	Summary
Env & Config `.env.sample`, `.simili.yaml`, `internal/core/config/config.go`	Add `QDRANT_COLLECTION` sample, optional `QDRANT_PR_COLLECTION`, `qdrant.pr_collection` config and `PRCollection` field with defaulting/merge logic.
CLI docs `README.md`	Document `--include-prs`, `--pr-collection`, `--since` RFC3339 note, new `simili pr-duplicate` command and example workflows showing PR indexing and duplicate checking.
Indexing pipeline `cmd/simili/commands/index.go`, `cmd/simili/commands/pr_support.go`	Add `--include-prs` flag, resolve/create PR collection, Job.IsPullRequest handling, `processPullRequest`, PR metadata/text building, changed-files collection, embedding and upsert into PR collection, dry-run support.
New command & tests `cmd/simili/commands/pr_duplicate.go`, `cmd/simili/commands/pr_duplicate_test.go`	Add `pr-duplicate` command: fetch PR, build metadata, embed, search issue+PR collections, LLM-based duplicate detection, output/JSON rendering; include unit tests for candidate parsing and extraction.
Gemini / LLM `internal/integrations/gemini/llm.go`, `internal/integrations/gemini/prompts.go`	Add PR duplicate detection types and `DetectPRDuplicate` method; add prompt builder for PR duplicate detection (JSON response contract). Note: duplicate prompt block observed in diff.
GitHub integration `internal/integrations/github/client.go`	Add `GetPullRequest`, `ListPullRequests`, and `ListPullRequestFiles` methods (paginated aggregation).
Batch & infra `cmd/simili/commands/batch.go`, `.gitignore`	Make LLM client optional at init (non-fatal failure handling); tighten `.gitignore` pattern for `/simili`.
Misc other small doc/config tweaks	Docs, sample env and manifest changes (small line additions/removals).

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as simili CLI
    participant GitHub as GitHub API
    participant Gemini as Gemini LLM
    participant Qdrant as Qdrant

    User->>CLI: simili pr-duplicate --repo owner/repo --number 123
    CLI->>GitHub: GetPullRequest(owner,repo,123)
    GitHub-->>CLI: PR metadata
    CLI->>GitHub: ListPullRequestFiles(owner,repo,123)
    GitHub-->>CLI: changed files
    CLI->>CLI: buildPRMetadataText(PR + files)
    CLI->>Gemini: embed PR text
    Gemini-->>CLI: embedding vector
    CLI->>Qdrant: search issue collection (embedding)
    Qdrant-->>CLI: issue candidates (similarity)
    CLI->>Qdrant: search PR collection (embedding)
    Qdrant-->>CLI: PR candidates (similarity)
    CLI->>Gemini: DetectPRDuplicate(PR + merged candidates)
    Gemini-->>CLI: duplicate decision (is_duplicate, id, confidence)
    CLI->>User: render results (or JSON)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat: CLI PR indexing (#46) & standardized embedding content (#47) #48 — Related changes to CLI indexing pipeline and PR indexing flags/metadata.
feat: add batch CLI and web UI for issue analysis #36 — Related edits to LLM prompts and Gemini integration for duplicate detection.
Add OpenAI support #65 — Related modifications to Gemini integration and duplicate-detection types.

Suggested labels

enhancement

Suggested reviewers

Kavirubc

Poem

🐰 A hop, a nibble, a PR in sight,
Into collections we tuck it tight.
Vectors hum and Gemini peeks,
Finding echoes across issue weeks.
Hooray — duplicates found (or not) tonight! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 15.63% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'index and query prs' clearly and concisely summarizes the main feature addition of PR indexing and querying capabilities throughout the changeset.
Linked Issues check	✅ Passed	The PR fully implements all coding requirements from issue `#43`: --include-prs flag for PR indexing, QDRANT_PR_COLLECTION configuration, pr-duplicate CLI command, processPullRequest worker, and dry-run support for PR indexing.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with PR indexing objectives: configuration additions, CLI enhancements, worker implementations, LLM integration for duplicate detection, and GitHub API client extensions for PR operations.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (5)

cmd/simili/commands/batch.go (1)
342-349: LLM initialization failure is silently swallowed in non-verbose mode, giving no indication that LLM-powered features will be skipped.

When verbose is false and LLM initialization fails, users get no message, even though downstream steps will skip duplicate detection, triage analysis, and other LLM-dependent features. This can result in confusing batch output.

Consider always logging a message when LLM is unavailable, reserving verbose mode for the detailed error:
Suggested change
 	llm, err := gemini.NewLLMClient(llmKey, llmModel)
 	if err == nil {
 		deps.LLMClient = llm
 		if verbose {
 			fmt.Printf("✓ Initialized LLM client (%s) with model: %s\n", llm.Provider(), llm.Model())
 		}
-	} else if verbose {
-		fmt.Printf("ℹ LLM client unavailable, continuing without LLM steps: %v\n", err)
+	} else {
+		fmt.Println("ℹ LLM client unavailable, continuing without LLM steps")
+		if verbose {
+			fmt.Printf("  LLM init error: %v\n", err)
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/simili/commands/batch.go` around lines 342 - 349, The current LLM init
branch only logs failures when verbose is true, so when err != nil and verbose
is false the fact that deps.LLMClient is nil is silent; change the else branch
that handles llm initialization to always print a concise notice like "LLM
client unavailable, continuing without LLM steps" (using the same fmt.Printf or
logger) and only append the full error details when verbose is true (e.g.,
include err in the message when verbose). Update the block around the llm
variable / deps.LLMClient assignment in batch.go so failures are always
announced but detailed error text remains controlled by the verbose flag.
cmd/simili/commands/pr_duplicate.go (2)
444-505: buildCandidateFromSearchResult has good defensive fallbacks, but silently defaults state to "open" which could be misleading.

Lines 487-489: when state is missing from the payload, it defaults to "open". A missing state could also indicate a closed or merged entity. Consider defaulting to "unknown" instead, so downstream consumers (including the LLM) aren't given a potentially incorrect signal.
Proposed change
 	state, _ := res.Payload["state"].(string)
 	if state == "" {
-		state = "open"
+		state = "unknown"
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/simili/commands/pr_duplicate.go` around lines 444 - 505, The function
buildCandidateFromSearchResult currently defaults a missing payload "state" to
"open", which can mislead downstream consumers; change the fallback to use
"unknown" instead of "open" (update the variable state handling in
buildCandidateFromSearchResult where state is read from res.Payload and set when
empty), ensure the candidate.State field receives "unknown" for missing/empty
state, and run/adjust any tests or callers that assumed "open" as the implicit
default.
192-206: New GitHub client is created per invocation — acceptable for CLI, but note the token is used without context timeout.

fetchPullRequestMetadataText uses context.Background() (inherited from caller). For a CLI tool this is fine, but if the GitHub API hangs, there's no timeout. Consider adding a context with timeout for robustness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/simili/commands/pr_duplicate.go` around lines 192 - 206,
fetchPullRequestMetadataText currently uses the caller context directly and
creates a new GitHub client per call without a timeout; wrap the incoming ctx in
a context.WithTimeout (e.g., 10s or configurable) inside
fetchPullRequestMetadataText, defer cancel(), and use that timed context when
calling similiGithub.NewClient, ghClient.GetPullRequest and
listAllPullRequestFilePaths so API calls cannot hang indefinitely; ensure the
cancel function is called to release resources and make the timeout value
configurable if needed.
cmd/simili/commands/pr_duplicate_test.go (1)
20-45: Test uses int64 for number fields, but JSON-unmarshaled map[string]interface{} yields float64.

When Qdrant search results are deserialized from JSON into map[string]interface{}, Go's encoding/json represents JSON numbers as float64, not int64. The toInt helper handles both, so production code works, but these tests don't exercise the realistic float64 path. Consider adding a test case (or adjusting one) with float64(42) to cover the real-world deserialization type.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/simili/commands/pr_duplicate_test.go` around lines 20 - 45, The test
TestBuildCandidateFromSearchResult_Issue builds a fake qdrant.SearchResult with
"issue_number" as int64, but real JSON-unmarshaled payloads use float64; update
the test payload to use float64(42) (or add an additional test case) so
buildCandidateFromSearchResult and the toInt helper are exercised with the
realistic float64 path; locate the test and change the "issue_number" value in
the payload map used by TestBuildCandidateFromSearchResult_Issue accordingly.
cmd/simili/commands/index.go (1)
298-364: processPullRequest looks solid overall.

The function correctly fetches the PR, retrieves file paths, builds metadata text, chunks, embeds, and upserts. The empty-text guard (Line 315) and empty-chunks fallback (Lines 321-323) are good defensive measures.

One minor note: the issue *github.Issue parameter name is slightly misleading since this processes a PR. Consider renaming to something like ghIssue to clarify that it's the stub from the issues list endpoint, not an actual issue.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/simili/commands/index.go` around lines 298 - 364, Rename the misleading
parameter issue in the processPullRequest function to ghIssue (or ghIssueStub)
to clarify it's the GitHub Issue stub for a PR; update the function signature
(processPullRequest(ctx context.Context, workerID int, ghIssue *github.Issue,
...)), replace all references inside processPullRequest (e.g., prNumber :=
issue.GetNumber()) to use ghIssue, and update every call site that invokes
processPullRequest to pass the renamed parameter variable accordingly so
compilation stays consistent.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/simili/commands/index.go`:
- Around line 121-127: The PR collection is being created with
cfg.Embedding.Dimensions instead of the actual detected embeddingDimensions,
causing a vector size mismatch; modify the call to
qdrantClient.CreateCollection(ctx, prCollection, cfg.Embedding.Dimensions) to
pass embeddingDimensions instead (i.e., qdrantClient.CreateCollection(ctx,
prCollection, embeddingDimensions)) so the PR collection uses the embedder's
reported dimension; update any related log/messages if present to reflect
embeddingDimensions and ensure the variable embeddingDimensions (from where the
embedder reports dimensions) is in scope for this branch.

In `@cmd/simili/commands/pr_support.go`:
- Around line 35-53: The function listAllPullRequestFilePaths currently only
retrieves the first page (PerPage:100) from gh.ListPullRequestFiles; update it
to paginate by looping calls to gh.ListPullRequestFiles and using the response's
NextPage (or setting ListOptions.Page) to fetch subsequent pages until
NextPage==0, appending filenames from each page (trimming and skipping empty
names) into the paths slice, and then sort and return the aggregated result;
ensure you handle and propagate errors from each call and avoid duplicating
entries.

---

Nitpick comments:
In `@cmd/simili/commands/batch.go`:
- Around line 342-349: The current LLM init branch only logs failures when
verbose is true, so when err != nil and verbose is false the fact that
deps.LLMClient is nil is silent; change the else branch that handles llm
initialization to always print a concise notice like "LLM client unavailable,
continuing without LLM steps" (using the same fmt.Printf or logger) and only
append the full error details when verbose is true (e.g., include err in the
message when verbose). Update the block around the llm variable / deps.LLMClient
assignment in batch.go so failures are always announced but detailed error text
remains controlled by the verbose flag.

In `@cmd/simili/commands/index.go`:
- Around line 298-364: Rename the misleading parameter issue in the
processPullRequest function to ghIssue (or ghIssueStub) to clarify it's the
GitHub Issue stub for a PR; update the function signature
(processPullRequest(ctx context.Context, workerID int, ghIssue *github.Issue,
...)), replace all references inside processPullRequest (e.g., prNumber :=
issue.GetNumber()) to use ghIssue, and update every call site that invokes
processPullRequest to pass the renamed parameter variable accordingly so
compilation stays consistent.

In `@cmd/simili/commands/pr_duplicate_test.go`:
- Around line 20-45: The test TestBuildCandidateFromSearchResult_Issue builds a
fake qdrant.SearchResult with "issue_number" as int64, but real JSON-unmarshaled
payloads use float64; update the test payload to use float64(42) (or add an
additional test case) so buildCandidateFromSearchResult and the toInt helper are
exercised with the realistic float64 path; locate the test and change the
"issue_number" value in the payload map used by
TestBuildCandidateFromSearchResult_Issue accordingly.

In `@cmd/simili/commands/pr_duplicate.go`:
- Around line 444-505: The function buildCandidateFromSearchResult currently
defaults a missing payload "state" to "open", which can mislead downstream
consumers; change the fallback to use "unknown" instead of "open" (update the
variable state handling in buildCandidateFromSearchResult where state is read
from res.Payload and set when empty), ensure the candidate.State field receives
"unknown" for missing/empty state, and run/adjust any tests or callers that
assumed "open" as the implicit default.
- Around line 192-206: fetchPullRequestMetadataText currently uses the caller
context directly and creates a new GitHub client per call without a timeout;
wrap the incoming ctx in a context.WithTimeout (e.g., 10s or configurable)
inside fetchPullRequestMetadataText, defer cancel(), and use that timed context
when calling similiGithub.NewClient, ghClient.GetPullRequest and
listAllPullRequestFilePaths so API calls cannot hang indefinitely; ensure the
cancel function is called to release resources and make the timeout value
configurable if needed.

cmd/simili/commands/index.go

coderabbitai · 2026-02-18T18:55:04Z

cmd/simili/commands/pr_support.go

+func listAllPullRequestFilePaths(ctx context.Context, gh *similiGithub.Client, org, repo string, number int) ([]string, error) {
+	files, _, err := gh.ListPullRequestFiles(ctx, org, repo, number, &github.ListOptions{
+		PerPage: 100,
+	})
+	if err != nil {
+		return nil, err
+	}
+
+	paths := make([]string, 0, len(files))
+	for _, f := range files {
+		name := strings.TrimSpace(f.GetFilename())
+		if name != "" {
+			paths = append(paths, name)
+		}
+	}
+
+	sort.Strings(paths)
+	return paths, nil
+}


⚠️ Potential issue | 🟠 Major

listAllPullRequestFilePaths only fetches the first page — PRs with >100 changed files will have truncated file lists.

The function passes PerPage: 100 but never loops on resp.NextPage, so only the first page of files is returned. This contrasts with the paginated comment-fetching pattern in processIssue (index.go, Lines 226-239) and the commit message that mentions "paginate GitHub PR file listings."

🐛 Proposed fix: add pagination loop

func listAllPullRequestFilePaths(ctx context.Context, gh *similiGithub.Client, org, repo string, number int) ([]string, error) { - files, _, err := gh.ListPullRequestFiles(ctx, org, repo, number, &github.ListOptions{ - PerPage: 100, - }) - if err != nil { - return nil, err - } - - paths := make([]string, 0, len(files)) - for _, f := range files { - name := strings.TrimSpace(f.GetFilename()) - if name != "" { - paths = append(paths, name) + var paths []string + opts := &github.ListOptions{PerPage: 100} + for { + files, resp, err := gh.ListPullRequestFiles(ctx, org, repo, number, opts) + if err != nil { + return nil, err + } + for _, f := range files { + name := strings.TrimSpace(f.GetFilename()) + if name != "" { + paths = append(paths, name) + } + } + if resp == nil || resp.NextPage == 0 { + break } + opts.Page = resp.NextPage } sort.Strings(paths) return paths, nil }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func listAllPullRequestFilePaths(ctx context.Context, gh *similiGithub.Client, org, repo string, number int) ([]string, error) {

files, _, err := gh.ListPullRequestFiles(ctx, org, repo, number, &github.ListOptions{

PerPage: 100,

})

if err != nil {

return nil, err

}

paths := make([]string, 0, len(files))

for _, f := range files {

name := strings.TrimSpace(f.GetFilename())

if name != "" {

paths = append(paths, name)

}

}

sort.Strings(paths)

return paths, nil

}

func listAllPullRequestFilePaths(ctx context.Context, gh *similiGithub.Client, org, repo string, number int) ([]string, error) {

var paths []string

opts := &github.ListOptions{PerPage: 100}

for {

files, resp, err := gh.ListPullRequestFiles(ctx, org, repo, number, opts)

if err != nil {

return nil, err

}

for _, f := range files {

name := strings.TrimSpace(f.GetFilename())

if name != "" {

paths = append(paths, name)

}

}

if resp == nil || resp.NextPage == 0 {

break

}

opts.Page = resp.NextPage

}

sort.Strings(paths)

return paths, nil

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cmd/simili/commands/pr_support.go` around lines 35 - 53, The function listAllPullRequestFilePaths currently only retrieves the first page (PerPage:100) from gh.ListPullRequestFiles; update it to paginate by looping calls to gh.ListPullRequestFiles and using the response's NextPage (or setting ListOptions.Page) to fetch subsequent pages until NextPage==0, appending filenames from each page (trimming and skipping empty names) into the paths slice, and then sort and return the aggregated result; ensure you handle and propagate errors from each call and avoid duplicating entries.

I think this is not correct because gh.ListPullRequestFiles() paginates internally (see in internal/integrations/github/client.go)

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check the ListPullRequestFiles implementation to verify pagination cat internal/integrations/github/client.go | grep -A 50 "func.*ListPullRequestFiles"

Length of output: 1585

@nick1udwig You're absolutely right, and I apologize for the confusion!

I've verified the implementation in internal/integrations/github/client.go, and ListPullRequestFiles does indeed handle pagination internally with a loop that fetches all pages until resp.NextPage == 0. The function aggregates all files across pages before returning them.

The current implementation in listAllPullRequestFilePaths is correct as-is — no additional pagination loop is needed. My review comment was mistaken. Thank you for catching that! 🐰

/>\_/\ ( •.• ) > ^ < oops!

coderabbitai

🧹 Nitpick comments (2)

cmd/simili/commands/index.go (2)
314-323: Good defensive checks — consider parity with processIssue.

The empty-text guard (line 315) and empty-chunks fallback (lines 321–323) are sensible. processIssue lacks both — if an issue has an empty title, body, and no comments, it would pass empty chunks to EmbedBatch. Consider backporting these guards to processIssue for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/simili/commands/index.go` around lines 314 - 323, Add the same defensive
checks used in the PR flow into processIssue: after building the text for an
issue (the counterpart of buildPRMetadataText) verify
strings.TrimSpace(issueText) != "" and log+return when empty, and after calling
splitter.SplitText(issueText) ensure if len(chunks) == 0 you set chunks =
[]string{issueText} before calling EmbedBatch; reference the processIssue
function, the local issueText/variable that holds the concatenated
title/body/comments, splitter.SplitText, and the call to EmbedBatch to locate
where to add these guards.
336-356: Minor style inconsistencies with processIssue.

Two small divergences from processIssue:

map[string]interface{} (line 342) vs map[string]any (line 278 in processIssue). Since Go 1.18, any is the idiomatic alias.

[]byte(fmt.Sprintf(...)) (line 338) vs fmt.Appendf(nil, ...) (line 274 in processIssue). Both work identically, but using the same pattern avoids cognitive overhead.
♻️ Suggested alignment
-		pointID := uuid.NewMD5(uuid.NameSpaceURL, []byte(fmt.Sprintf("%s/%s/pr/%d/chunk/%d", org, repo, prNumber, i))).String()
+		pointID := uuid.NewMD5(uuid.NameSpaceURL, fmt.Appendf(nil, "%s/%s/pr/%d/chunk/%d", org, repo, prNumber, i)).String()
 		points[i] = &qdrant.Point{
 			ID:     pointID,
 			Vector: embeddings[i],
-			Payload: map[string]interface{}{
+			Payload: map[string]any{
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/simili/commands/index.go` around lines 336 - 356, In the loop that builds
Qdrant points (points, pointID, embeddings, chunks) align styles with
processIssue: change the Payload type from map[string]interface{} to
map[string]any and build the UUID input using fmt.Appendf(nil,
"%s/%s/pr/%d/chunk/%d", org, repo, prNumber, i) instead of
[]byte(fmt.Sprintf(...)); update the uuid.NewMD5 call to use that fmt.Appendf
result so the point ID generation and payload type match processIssue’s
patterns.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@cmd/simili/commands/index.go`:
- Around line 121-127: The PR collection creation now correctly uses the
auto-detected embeddingDimensions when calling qdrantClient.CreateCollection (in
the block guarded by indexIncludePRs and prCollection != cfg.Qdrant.Collection),
but you should ensure consistency everywhere: confirm
qdrantClient.CreateCollection is passed embeddingDimensions (not
cfg.Embedding.Dimensions) for PR collections and replace any remaining
occurrences that pass cfg.Embedding.Dimensions for prCollection with the
embeddingDimensions variable to avoid dimension mismatches; keep the existing
conditional that checks indexIncludePRs and prCollection !=
cfg.Qdrant.Collection and preserve error handling in that block.

---

Nitpick comments:
In `@cmd/simili/commands/index.go`:
- Around line 314-323: Add the same defensive checks used in the PR flow into
processIssue: after building the text for an issue (the counterpart of
buildPRMetadataText) verify strings.TrimSpace(issueText) != "" and log+return
when empty, and after calling splitter.SplitText(issueText) ensure if
len(chunks) == 0 you set chunks = []string{issueText} before calling EmbedBatch;
reference the processIssue function, the local issueText/variable that holds the
concatenated title/body/comments, splitter.SplitText, and the call to EmbedBatch
to locate where to add these guards.
- Around line 336-356: In the loop that builds Qdrant points (points, pointID,
embeddings, chunks) align styles with processIssue: change the Payload type from
map[string]interface{} to map[string]any and build the UUID input using
fmt.Appendf(nil, "%s/%s/pr/%d/chunk/%d", org, repo, prNumber, i) instead of
[]byte(fmt.Sprintf(...)); update the uuid.NewMD5 call to use that fmt.Appendf
result so the point ID generation and payload type match processIssue’s
patterns.

nick1udwig added 5 commits February 17, 2026 15:04

add (provisional) indexing and querying of prs

919ddc3

split up runPRDuplicate

1e25a7a

chore: reorder qdrant env vars in sample

98d2d85

fix: make batch LLM initialization non-fatal

bb72bc6

fix: paginate pull request file listing in github client

52a442c

nick1udwig requested a review from Kavirubc as a code owner February 18, 2026 18:50

coderabbitai bot reviewed Feb 18, 2026

View reviewed changes

fix: use resolved embedding dimensions for PR collection

ed8f93a

coderabbitai bot reviewed Feb 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

index and query prs#82

index and query prs#82
nick1udwig wants to merge 6 commits intosimiligh:mainfrom
nick1udwig:index-and-query-prs-3

nick1udwig commented Feb 18, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 18, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot Feb 18, 2026 •

edited

Loading

Uh oh!

nick1udwig Feb 18, 2026

Uh oh!

coderabbitai bot Feb 18, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nick1udwig commented Feb 18, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Related Issues

Changes Made

Testing

Screenshots (if applicable)

Checklist

Additional Notes

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nick1udwig Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

nick1udwig commented Feb 18, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 18, 2026 •

edited

Loading

coderabbitai bot Feb 18, 2026 •

edited

Loading