feat: hybrid search implementation #82

Open
barbara-celi wants to merge 6 commits into main from feat/hybrid-search

@barbara-celi

Description

This PR adds hybrid search backend support to @vtexdocs/components as an alternative to Algolia, enabling both Help Center and Dev Portal to use the new VTEX Docs Hybrid Search API.

Changes:

  • Extended SearchConfig with new backend option: { backend: 'hybrid', hybrid: {...} }
  • Implemented hybrid search adapter that translates InstantSearch queries to /api/search calls and transforms responses to Algolia-compatible format.
  • Exported new types: HybridSearchConfig and SearchBackendConfig.
  • Maintained full backward compatibility. Existing Algolia implementations work unchanged.
  • The hybrid backend is opt-in via configuration, with no breaking changes to component APIs.
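A minimal sketch of how the opt-in configuration could be typed. Only `HybridSearchConfig`, `SearchBackendConfig`, `backend`, and `hybrid` are named in this PR; the Algolia-side fields and the `describeBackend` helper are hypothetical and exist only to illustrate the discriminated union.

```typescript
// Fields shown in the diff excerpt of this PR (the real interface may have more).
interface HybridSearchConfig {
  apiEndpoint: string
  source: 'help-center' | 'dev-portal'
  defaultLimit?: number
}

// Hypothetical Algolia-side shape, assumed for illustration only.
interface AlgoliaSearchConfig {
  appId: string
  apiKey: string
  index: string
}

// Assumed union: omitting `backend` keeps the existing Algolia behavior,
// while `backend: 'hybrid'` opts in to the new adapter.
type SearchBackendConfig =
  | { backend?: 'algolia'; algolia: AlgoliaSearchConfig }
  | { backend: 'hybrid'; hybrid: HybridSearchConfig }

// Hypothetical helper that branches on the discriminant.
function describeBackend(config: SearchBackendConfig): string {
  if (config.backend === 'hybrid') {
    return `hybrid:${config.hybrid.source}@${config.hybrid.apiEndpoint}`
  }
  return `algolia:${config.algolia.index}`
}
```

With a shape like this, an existing Algolia configuration that never sets `backend` keeps type-checking and takes the Algolia branch, which is one way to get the backward compatibility described above.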

Related:

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Requires change to documentation, which has been updated accordingly.

This commit introduces a new hybrid search adapter for the `@vtexdocs/components` package, allowing integration with the Help Center's API while maintaining backward compatibility with Algolia. Key changes include the addition of a new `HybridSearchConfig` interface, updates to the `search-config.ts` file to support hybrid search, and modifications to the `SearchConfig` function to handle both Algolia and hybrid configurations. The implementation aims for minimal code changes and reuses existing components.
This update modifies the request selection logic in the `search-config.ts` file to prioritize requests with a non-empty query. If no such request is found, it defaults to the first request in the array. This change enhances the hybrid search functionality by ensuring more relevant queries are processed.
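The request selection described in this commit can be sketched roughly as follows. The `SearchRequest` shape and the `pickRequest` name are assumptions made for illustration; the actual code in `search-config.ts` may differ.

```typescript
// Assumed minimal shape of an InstantSearch-style request.
interface SearchRequest {
  indexName: string
  params?: { query?: string }
}

// Prefer the first request that carries a non-empty query;
// otherwise fall back to the first request in the array.
function pickRequest(requests: SearchRequest[]): SearchRequest | undefined {
  const withQuery = requests.find((request) => Boolean(request.params?.query))
  return withQuery ?? requests[0]
}
```

The fallback returns `undefined` for an empty array, so a caller would still need to handle that case.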
@barbara-celi barbara-celi self-assigned this May 5, 2026
@barbara-celi barbara-celi added the release-minor Minor version bump label May 5, 2026
@barbara-celi barbara-celi changed the title [EDU-17906] - feat: hybrid search implementation feat: hybrid search implementation May 5, 2026
export interface HybridSearchConfig {
  apiEndpoint: string
  source: 'help-center' | 'dev-portal'
  defaultLimit?: number
Contributor

The helpcenter consumer (PR vtexdocs/helpcenter#456, src/utils/libraryConfig.ts:13) passes itemsPerPage: 10 into this config object, expecting it to set the default page size. With the field named defaultLimit here, that value is silently ignored — the destructure on line 77 falls back to the hardcoded 10.

Suggest aligning the names so the contract is consistent. Two options:

  1. Rename here to itemsPerPage?: number (matches the surrounding InstantSearch / Algolia vocabulary; no consumer change needed).
  2. Keep defaultLimit and update libraryConfig.ts on the helpcenter side to use the same name.

Either works; option 1 is friendlier to existing widget terminology, but option 2 keeps the components-side semantics explicit. Worth picking one before merge so the field actually has effect.
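To make the mismatch concrete, here is a small sketch of how a misnamed field falls through a destructuring default. The `resolveLimit` helper is hypothetical and stands in for the destructure mentioned above; the value 25 is chosen only so the fallback is visible (the helpcenter PR passes 10, which hides the problem).

```typescript
interface HybridSearchConfig {
  apiEndpoint: string
  source: 'help-center' | 'dev-portal'
  defaultLimit?: number
}

// Hypothetical stand-in for the adapter's destructure with a hardcoded default.
function resolveLimit(config: HybridSearchConfig): number {
  const { defaultLimit = 10 } = config
  return defaultLimit
}

// Consumer-side object using the other field name. The cast mimics a config
// that reaches the adapter without strict type checking.
const consumerConfig = {
  apiEndpoint: '/api/search',
  source: 'help-center',
  itemsPerPage: 25,
} as unknown as HybridSearchConfig

console.log(resolveLimit(consumerConfig)) // 10, because itemsPerPage is never read
```

Once both sides agree on one name, the configured value takes effect instead of the default.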

Contributor

Another argument for option 2: it keeps the components package compatible with other portals.

Comment on lines +337 to +365
  content: result.snippet || result.content || '',
  hierarchy,
  language: result.metadata?.locale || 'en',
  type: 'content',
  _highlightResult: {
    content: {
      value: result.snippet || result.content || '',
      matchLevel: 'full',
      fullyHighlighted: false,
      matchedWords: [],
    },
    hierarchy: {
      lvl0: {
        value: hierarchy.lvl0,
        matchLevel: 'none',
      },
      lvl1: {
        value: hierarchy.lvl1,
        matchLevel: result.title ? 'partial' : 'none',
      },
    },
  },
  _snippetResult: {
    content: {
      value: result.snippet || '',
      matchLevel: 'full',
    },
  },
}
Contributor

Issue: snippets render as raw markdown in the Help Center search results

A query like sku surfaces snippets such as | sku_manufacturer_code | character varying(65535) | Code used by merchant to reference the manufacturer. — literal markdown table syntax instead of plain text.

Why

The upstream /api/hybrid-search returns each hit's snippet as a raw substring of the indexed .md source. In transformHybridToAlgolia, that raw string is forwarded into the InstantSearch hit shape at three points:

  • content (line 337)
  • _highlightResult.content.value (line 343)
  • _snippetResult.content.value (line 361)

connectHighlight then renders _highlightResult.content.value as plain text inside SearchCard, so markdown characters appear verbatim. Algolia does not have this problem because its indexing pipeline strips markdown before storing the content attribute, so the search client only ever sees plain text.

Recommendation

Strip markdown inside transformHybridToAlgolia before assigning the snippet:

const cleanedSnippet = stripMarkdown(result.snippet || result.content || '')
// use `cleanedSnippet` for `content`, `_highlightResult.content.value`, `_snippetResult.content.value`

A small regex pass (headings, emphasis, links, code fences, table pipes) or a strip-markdown + remark round-trip is enough.

Doing it here — rather than in customHighlight.tsx — avoids corrupting InstantSearch's highlight boundaries, which by that point are already split fragments. The adapter is the single choke point through which every hybrid hit flows, so the fix stays isolated.
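As a rough illustration of the regex-pass option, a `stripMarkdown` helper could look like the sketch below. This is an assumption about one possible implementation, not code from this PR, and it is not a full markdown parser. It deliberately leaves underscores alone so identifiers such as sku_manufacturer_code survive.

```typescript
// Hypothetical regex-based cleanup for snippets; covers only common syntax.
function stripMarkdown(input: string): string {
  return input
    // Drop code fence marker lines but keep the code between them.
    .replace(/^[ \t]*`{3}.*$/gm, '')
    // Drop table separator rows and horizontal rules (e.g. |---|---|).
    .replace(/^[ \t|:-]*-{3,}[ \t|:-]*$/gm, '')
    // Replace links and images with their visible text.
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, '$1')
    // Remove heading markers at the start of a line.
    .replace(/^[ \t]{0,3}#{1,6}[ \t]+/gm, '')
    // Unwrap bold, then italic, written with asterisks.
    .replace(/\*\*(.+?)\*\*/g, '$1')
    .replace(/\*(.+?)\*/g, '$1')
    // Unwrap inline code.
    .replace(/`([^`]*)`/g, '$1')
    // Turn table pipes into spaces.
    .replace(/\|/g, ' ')
    // Collapse whitespace left behind by the steps above.
    .replace(/\s+/g, ' ')
    .trim()
}
```

Applied to the table-row snippet from the example query, this would yield plain text with the column values separated by single spaces. A parser-based approach (strip-markdown with remark) would handle edge cases such as nested emphasis more reliably, at the cost of an extra dependency.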

Long-term

The proper fix is upstream in vtexdocs/vtexdocs-mcp-app: either index plain text or return a sanitized snippet. This touches the same surface as vtexdocs-mcp-app#46 (server-side facet counts) and is tracked in EDU-18399. Once that lands, the adapter-level stripping can be removed.

2 participants