Merge test/fix index dorivika #59
Open
Dead-Bytes wants to merge 2 commits into
Open
Conversation
[TEST] bugfix/indexed-commit-display — resolve bytebell ls commit display
There was a problem hiding this comment.
Pull request overview
This PR fixes issue #55 where GitHub repositories could display commitId: "latest" as if it were a real indexed commit. It introduces shared helpers to validate/normalize full 40-character Git SHAs, resolves the indexed commit from valid data only, and hardens both API responses and write paths to prevent invalid commit values from leaking or being persisted.
Changes:
- Added commit SHA validation/normalization helpers (
isFullCommitHash,normalizeCommitHashes,resolveIndexedCommit) and tests in@bb/types. - Normalized
/api/v1/repossource payloads (including legacy shapes) socommitId="latest"doesn’t reach CLI consumers. - Updated CLI listing/details and GitHub pull no-op/validation logic to use resolved, normalized SHAs; added write-side guards in Mongo persistence.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/types/src/knowledge.ts | Adds commit-hash helpers and extends commitHashes typing to support legacy record entries. |
| packages/types/src/knowledge.test.ts | Adds unit tests covering valid/invalid hashes, legacy shapes, normalization, and resolution behavior. |
| packages/types/src/index.ts | Re-exports new helpers/types from @bb/types. |
| packages/server/src/reposRoute.ts | Normalizes source in /api/v1/repos responses to avoid leaking legacy latest. |
| packages/server/src/knowledgeSourcePresenter.ts | Adds presenter utilities to merge/normalize legacy repo metadata into a clean KnowledgeSource. |
| packages/server/src/knowledgeSourcePresenter.test.ts | Adds tests ensuring legacy latest doesn’t leak and current sources are normalized. |
| packages/server/src/githubPullRoute.ts | Uses shared hash validation + resolved indexed commit for no-op comparisons and validation. |
| packages/mongo/src/knowledge.ts | Adds write-side commit validation/normalization guards for setKnowledgeCommit and upsertKnowledge. |
| packages/ingest-github/src/pipeline/run.ts | Validates and lowercases resolved HEAD commit before persisting. |
| packages/ingest-github/src/pipeline/pull.ts | Validates target/HEAD SHAs and uses resolved current commit for no-op checks. |
| packages/cli/src/repoSelectorPrompt.ts | Displays resolved commit and normalized commit count in repo selector details. |
| packages/cli/src/LsCommand.ts | Displays resolved commit and normalized commit count in bytebell ls. |
Comments suppressed due to low confidence (2)
packages/server/src/githubPullRoute.ts:70
- The 422 response message says the knowledge "has no commitId", but this branch now triggers when there is no resolvable full SHA (e.g. legacy
commitId="latest"with empty/invalidcommitHashes). Updating the message to reflect "no valid indexed commit" would make the error accurate and easier to debug.
const currentCommit = resolveIndexedCommit(knowledge.source);
if (currentCommit === undefined) {
res.status(422).json({
error: "knowledge not yet indexed",
message: "pull requires a previously-indexed commit; this knowledge has no commitId. Run github_index first.",
});
packages/ingest-github/src/pipeline/pull.ts:63
- The error text says the knowledge "has no commitId", but the new logic rejects when there is no resolvable full SHA (including legacy
commitId="latest"). Consider updating the message to something like "no valid indexed commit" so it matches the actual failure condition.
const currentCommit = resolveIndexedCommit(knowledge.source);
if (currentCommit === undefined) {
throw new IngestError(
knowledgeId,
"pull requires a previously-indexed commit; this knowledge has no commitId. Run github_index first.",
);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+55
to
+70
| const hashes: string[] = []; | ||
| const seen = new Set<string>(); | ||
| for (const item of value) { | ||
| const hash = | ||
| typeof item === "string" | ||
| ? item | ||
| : typeof item === "object" && item !== null && typeof (item as { hash?: unknown }).hash === "string" | ||
| ? (item as { hash: string }).hash | ||
| : ""; | ||
| if (isFullCommitHash(hash)) { | ||
| const normalized = hash.toLowerCase(); | ||
| if (!seen.has(normalized)) { | ||
| seen.add(normalized); | ||
| hashes.push(normalized); | ||
| } | ||
| } |
Comment on lines
27
to
47
| /** | ||
| * Records that this knowledge is now indexed at `commitHash`. Sets it as the | ||
| * current head pointer (`source.commitId`) and appends to the deduped history | ||
| * current indexed commit pointer (`source.commitId`) and appends to the deduped history | ||
| * array (`source.commitHashes`). Idempotent: re-recording the same commit is | ||
| * a no-op except for the `updatedAt` bump. | ||
| * | ||
| * Throws `KnowledgeNotFoundError` if the document doesn't exist. | ||
| */ | ||
| export async function setKnowledgeCommit(knowledgeId: string, commitHash: string): Promise<void> { | ||
| if (!isFullCommitHash(commitHash)) { | ||
| throw new Error(`invalid commit hash for knowledge ${knowledgeId}: ${commitHash}`); | ||
| } | ||
| const normalizedCommitHash = commitHash.toLowerCase(); | ||
| const result = await _getDb() | ||
| .collection(Collections.Knowledge) | ||
| .updateOne( | ||
| { knowledgeId }, | ||
| { | ||
| $set: { "source.commitId": commitHash, updatedAt: new Date() }, | ||
| $addToSet: { "source.commitHashes": commitHash }, | ||
| $set: { "source.commitId": normalizedCommitHash, updatedAt: new Date() }, | ||
| $addToSet: { "source.commitHashes": normalizedCommitHash }, | ||
| }, |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changed
/api/v1/reposresponse shaping so legacycommitId: "latest"values do not leak intobytebell ls.bytebell lsand repo selector details to display the resolved indexed commit SHA instead of trusting rawsource.commitId."latest"cannot be persisted throughsetKnowledgeCommitorupsertKnowledge."latest"metadata, mixed commit history shapes, invalid hashes, uppercase SHAs, dedupe, and empty/missing commit history.Why
Fixes #55.
bytebell lscould showlatestas the indexed commit for GitHub repositories. The issue was in the metadata/listing path: consumers trustedsource.commitIddirectly, so a stale sentinel value like"latest"was rendered as if it were a real commit.The fix makes commit display resolve only from valid 40-character Git SHAs. If
commitIdis invalid, ByteBell falls back to the newest valid SHA incommitHashes. Existing metadata is repaired at read time without requiring a Mongo migration, and future invalid writes are rejected.