fix: tsquery only returns results if all keywords match #268
raphaelgurtner wants to merge 1 commit into langchain-ai:main
produces ts queries with logical OR instead of logical AND
      else ""
  )
- query_tsv = f"plainto_tsquery({lang} :fts_query)"
+ query_tsv = fr"websearch_to_tsquery({lang} regexp_replace(:fts_query, '\s', ' OR ', 'g'))"
I don't agree we should be using the regex to enforce OR. This will not match the user's expectation of using FTS with the user's query.
Thanks for the quick answer! Interesting, I’ve been discussing expected behavior with a few of my colleagues and all seemed to expect the OR behavior.
The current behavior seems unexpected for queries like
- “very specific terms that result in perfect chunk retrieval with FTS”
vs.
- “very specific terms that result in perfect chunk retrieval with FTS, thank you”
As long as these queries/user prompts are passed to the retriever as-is, it’s highly likely that the second query returns no result at all.
We found that behavior odd: one loses the benefit of the FTS part of the hybrid search in most cases, at least in our testing.
What would you think about making the behavior configurable in HybridSearchConfig?
The current implementation for creating ts queries uses plainto_tsquery(). This produces the following TSV query:
Because of that, these queries only return results if ALL keywords match. This can lead to counterintuitive cases where a query returns perfect results on its own but returns nothing once a random filler word is added.
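The effect can be illustrated with a deliberately simplified model in Python. This is only a sketch: it treats a chunk and a query as plain sets of words and ignores everything PostgreSQL actually does (stemming, stop words, ranking). The helper names `matches_all` and `matches_any` are made up for this illustration.

```python
def matches_all(doc_terms, query_terms):
    # AND semantics: every query term must occur in the document.
    return set(query_terms) <= set(doc_terms)


def matches_any(doc_terms, query_terms):
    # OR semantics: one shared term is enough.
    return bool(set(query_terms) & set(doc_terms))


chunk = "specific terms chunk retrieval fts".split()
query = "specific terms retrieval".split()
query_with_filler = query + ["thank"]

and_hit = matches_all(chunk, query)                      # all terms present
and_hit_filler = matches_all(chunk, query_with_filler)   # "thank" is missing
or_hit_filler = matches_any(chunk, query_with_filler)    # overlap is enough

print(and_hit, and_hit_filler, or_hit_filler)
```

Under AND semantics the single extra term turns a hit into a miss, while under OR semantics the chunk still matches.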
This tiny PR changes the query creation to the following approach:
This fixes the counterintuitive results. Unfortunately, PostgreSQL has no built-in function that turns plain text into an OR query, which is why the query text needs to be rewritten using regexp_replace. websearch_to_tsquery is used because it supports some operators (including OR, which is what we need here) and is recommended for raw user-supplied input: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
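To see what text websearch_to_tsquery receives after the rewrite, the substitution can be mimicked in Python with `re.sub`. This is a stand-in for the SQL call, not the SQL itself, and `or_rewrite` is a hypothetical helper name. It also shows one edge case worth checking: because the pattern is `\s` rather than `\s+`, consecutive whitespace produces consecutive OR tokens. How PostgreSQL treats that input is not verified here.

```python
import re


def or_rewrite(fts_query: str) -> str:
    # Python stand-in for regexp_replace(:fts_query, '\s', ' OR ', 'g'):
    # every single whitespace character becomes " OR ".
    return re.sub(r"\s", " OR ", fts_query)


rewritten = or_rewrite("perfect chunk retrieval")
doubled = or_rewrite("perfect  chunk")  # two spaces in the input

# Variant that collapses whitespace runs first (an assumption, not part of the PR).
collapsed = re.sub(r"\s+", " OR ", "perfect  chunk")

print(rewritten)  # perfect OR chunk OR retrieval
print(repr(doubled))
print(collapsed)  # perfect OR chunk
```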
The PR could be extended to make the search behavior configurable via HybridSearchConfig.
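A configurable variant might look roughly like the sketch below. Everything here is hypothetical: `FtsOperator` and `build_query_tsv` are illustrative names, not existing parts of HybridSearchConfig, and the shape of `lang` (a quoted config name followed by a comma, or an empty string) is inferred from the diff.

```python
from enum import Enum


class FtsOperator(str, Enum):
    # Hypothetical setting; not an existing HybridSearchConfig field.
    AND = "and"
    OR = "or"


def build_query_tsv(lang: str, operator: FtsOperator = FtsOperator.AND) -> str:
    # lang is assumed to be e.g. "'english'," or "" as in the diff.
    if operator is FtsOperator.OR:
        # Query construction proposed in this PR.
        return (
            f"websearch_to_tsquery({lang} "
            r"regexp_replace(:fts_query, '\s', ' OR ', 'g'))"
        )
    # Current behavior: all keywords must match.
    return f"plainto_tsquery({lang} :fts_query)"


and_sql = build_query_tsv("'english',")
or_sql = build_query_tsv("'english',", FtsOperator.OR)
print(and_sql)
print(or_sql)
```

Defaulting to AND keeps the existing behavior for current users, so the OR rewrite would be strictly opt-in.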