feat: hybrid search implementation #82
This commit introduces a new hybrid search adapter for the `@vtexdocs/components` package, allowing integration with the Help Center's API while maintaining backward compatibility with Algolia. Key changes include the addition of a new `HybridSearchConfig` interface, updates to the `search-config.ts` file to support hybrid search, and modifications to the `SearchConfig` function to handle both Algolia and hybrid configurations. The implementation aims for minimal code changes and reuses existing components.
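As a rough illustration of how the two configurations might coexist, the sketch below models them as a discriminated union. The `HybridSearchConfig` fields come from the diff in this PR. Everything else is an assumption made for illustration: the `AlgoliaBackendConfig` and `HybridBackendConfig` names, the Algolia fields, the exact shape of `SearchBackendConfig`, and the `describeBackend` helper are hypothetical and may not match the package.

```typescript
// Fields taken from the diff in this PR.
interface HybridSearchConfig {
  apiEndpoint: string
  source: 'help-center' | 'dev-portal'
  defaultLimit?: number
}

// Hypothetical: the real Algolia config in the package may differ.
// `backend` is optional here so existing Algolia callers need no change.
interface AlgoliaBackendConfig {
  backend?: 'algolia'
  appId: string
  apiKey: string
  index: string
}

// Assumed shape, based on `{ backend: 'hybrid', hybrid: {...} }`.
interface HybridBackendConfig {
  backend: 'hybrid'
  hybrid: HybridSearchConfig
}

type SearchBackendConfig = AlgoliaBackendConfig | HybridBackendConfig

// Hypothetical helper showing how a caller can branch on the backend.
function describeBackend(config: SearchBackendConfig): string {
  if (config.backend === 'hybrid') {
    // Narrowed to HybridBackendConfig in this branch.
    return `hybrid:${config.hybrid.source}@${config.hybrid.apiEndpoint}`
  }
  // Narrowed to AlgoliaBackendConfig (backend omitted or 'algolia').
  return `algolia:${config.index}`
}
```

With a shape like this, a config that omits `backend` keeps resolving to the Algolia path, which is one way to preserve backward compatibility.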
This update modifies the request selection logic in the `search-config.ts` file to prioritize requests with a non-empty query. If no such request is found, it defaults to the first request in the array. This change enhances the hybrid search functionality by ensuring more relevant queries are processed.
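The selection rule described above can be sketched as follows. This is a minimal illustration, not the code in `search-config.ts`: the `SearchRequest` shape and the `selectRequest` name are assumptions, and only the fields needed for the rule are modeled.

```typescript
// Assumed minimal shape of an InstantSearch-style request.
interface SearchRequest {
  indexName: string
  params?: { query?: string }
}

// Prefer the first request whose query is a non-empty string; otherwise fall
// back to the first request in the array (undefined if the array is empty).
function selectRequest(requests: SearchRequest[]): SearchRequest | undefined {
  const withQuery = requests.find((r) => Boolean(r.params?.query))
  return withQuery ?? requests[0]
}
```

Whether a whitespace-only query should count as empty is left open here; the sketch treats it as non-empty.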
    export interface HybridSearchConfig {
      apiEndpoint: string
      source: 'help-center' | 'dev-portal'
      defaultLimit?: number
The helpcenter consumer (PR vtexdocs/helpcenter#456, src/utils/libraryConfig.ts:13) passes itemsPerPage: 10 into this config object, expecting it to set the default page size. With the field named defaultLimit here, that value is silently ignored — the destructure on line 77 falls back to the hardcoded 10.
Suggest aligning the names so the contract is consistent. Two options:
- Rename here to `itemsPerPage?: number` (matches the surrounding InstantSearch / Algolia vocabulary; no consumer change needed).
- Keep `defaultLimit` and update `libraryConfig.ts` on the helpcenter side to use the same name.
Either works; option 1 is friendlier to existing widget terminology, but option 2 keeps the components-side semantics explicit. Worth picking one before merge so the field actually has effect.
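A minimal sketch of why the mismatch goes unnoticed, assuming the adapter reads the field through a destructure with a default. `resolveLimit` and `consumerConfig` are hypothetical stand-ins, and the endpoint value is a placeholder. The example passes 25 rather than 10 so the dropped value is visible; with 10, the fallback happens to produce the same number, which hides the problem.

```typescript
interface HybridSearchConfig {
  apiEndpoint: string
  source: 'help-center' | 'dev-portal'
  defaultLimit?: number
}

// Hypothetical stand-in for the destructure with a hardcoded fallback.
function resolveLimit(config: HybridSearchConfig): number {
  const { defaultLimit = 10 } = config
  return defaultLimit
}

// Built without a HybridSearchConfig annotation, so the extra `itemsPerPage`
// key is not flagged by excess-property checks when the object is passed on.
const consumerConfig = {
  apiEndpoint: '/api/search', // placeholder value
  source: 'help-center' as const,
  itemsPerPage: 25,
}

// `itemsPerPage` is never read, so the hardcoded default applies.
const limit = resolveLimit(consumerConfig) // 10, not 25
```

Annotating the consumer's object literal with `HybridSearchConfig` directly would make the compiler reject the unknown key, which is another way to catch this kind of drift.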
Another factor in favor of option 2 is that it keeps components compatible with other portals.
      content: result.snippet || result.content || '',
      hierarchy,
      language: result.metadata?.locale || 'en',
      type: 'content',
      _highlightResult: {
        content: {
          value: result.snippet || result.content || '',
          matchLevel: 'full',
          fullyHighlighted: false,
          matchedWords: [],
        },
        hierarchy: {
          lvl0: {
            value: hierarchy.lvl0,
            matchLevel: 'none',
          },
          lvl1: {
            value: hierarchy.lvl1,
            matchLevel: result.title ? 'partial' : 'none',
          },
        },
      },
      _snippetResult: {
        content: {
          value: result.snippet || '',
          matchLevel: 'full',
        },
      },
    }
Issue: snippets render as raw markdown in the Help Center search results
A query like sku surfaces snippets such as | sku_manufacturer_code | character varying(65535) | Code used by merchant to reference the manufacturer. — literal markdown table syntax instead of plain text.
Why
The upstream /api/hybrid-search returns each hit's snippet as a raw substring of the indexed .md source. In transformHybridToAlgolia, that raw string is forwarded into the InstantSearch hit shape at three points:
- `content` (line 337)
- `_highlightResult.content.value` (line 343)
- `_snippetResult.content.value` (line 361)
connectHighlight then renders _highlightResult.content.value as plain text inside SearchCard, so markdown characters appear verbatim. Algolia does not have this problem because its indexing pipeline strips markdown before storing the content attribute, so the search client only ever sees plain text.
Recommendation
Strip markdown inside transformHybridToAlgolia before assigning the snippet:
    const cleanedSnippet = stripMarkdown(result.snippet || result.content || '')
    // use `cleanedSnippet` for `content`, `_highlightResult.content.value`, `_snippetResult.content.value`

A small regex pass (headings, emphasis, links, code fences, table pipes) or a strip-markdown + remark round-trip is enough.
Doing it here — rather than in customHighlight.tsx — avoids corrupting InstantSearch's highlight boundaries, which by that point are already split fragments. The adapter is the single choke point through which every hybrid hit flows, so the fix stays isolated.
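For reference, the regex-pass option could look roughly like the sketch below. It is a lossy approximation rather than a Markdown parser, and `stripMarkdown` is a hypothetical helper that does not exist in the package today. Underscore emphasis is deliberately left alone so identifiers such as `sku_manufacturer_code` survive intact.

```typescript
// Hypothetical helper: a lossy regex pass, not a full Markdown parser.
function stripMarkdown(input: string): string {
  return input
    // Code fence markers (the code text itself is kept).
    .replace(/```[^\n]*\n?/g, '')
    // Inline code: keep the text between backticks.
    .replace(/`([^`]*)`/g, '$1')
    // Images, then links: keep the alt text / link text.
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, '$1')
    .replace(/\[([^\]]*)\]\([^)]*\)/g, '$1')
    // ATX heading markers at the start of a line.
    .replace(/^ {0,3}#{1,6}[ \t]+/gm, '')
    // Table separator rows such as "| --- | :---: |".
    .replace(/^[ \t|:-]*-{3,}[ \t|:-]*$/gm, '')
    // Remaining table pipes become spaces.
    .replace(/\|/g, ' ')
    // Asterisk emphasis only; underscores are left untouched on purpose.
    .replace(/\*{1,3}([^\s*][^*\n]*?)\*{1,3}/g, '$1')
    // Collapse runs of whitespace, including newlines.
    .replace(/\s+/g, ' ')
    .trim()
}
```

Applied to the snippet from the `sku` query above, this returns the three cell values separated by single spaces. A `strip-markdown` + remark round-trip would handle more edge cases at the cost of extra dependencies.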
Long-term
Proper fix is upstream in vtexdocs/vtexdocs-mcp-app — either index plain text or return a sanitized snippet. Same surface as vtexdocs-mcp-app#46 (server-side facet counts) and tracked in EDU-18399. Once that lands, the adapter-level stripping can be removed.
Description
This PR adds hybrid search backend support to `@vtexdocs/components` as an alternative to Algolia, enabling both Help Center and Dev Portal to use the new VTEX Docs Hybrid Search API.

Changes:
- Extends `SearchConfig` with a new backend option: `{ backend: 'hybrid', hybrid: {...} }`
- Translates `InstantSearch` queries to `/api/search` calls and transforms responses to an Algolia-compatible format.
- Adds `HybridSearchConfig` and `SearchBackendConfig`.

Related:
Types of changes