
Vector Search: hybrid search #3060

Merged
shanbady merged 32 commits into shanbady/qdrant-upgrade from shanbady/sparse-hybrid-search
Mar 19, 2026

Conversation

@shanbady (Contributor) commented Mar 17, 2026

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/10380

Description (What does it do?)

This PR integrates and enables the following:

  • generation of sparse embeddings for both local and deployed environments (sklearn.HashingVectorizer locally; bm25 via Qdrant Cloud inference in deployed environments)
  • use of hybrid search when searching the contentfile and resource vector endpoints
  • configuration changes/optimizations for Qdrant collection performance
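For context on what a hybrid query does: the sparse (lexical) and dense (semantic) result lists have to be fused into one ranking, and Qdrant can perform this fusion server-side. A conceptual sketch of one common fusion method, reciprocal rank fusion (illustrative only, not the PR's actual code):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: merge several ranked id lists into one.

    Each document scores 1 / (k + rank) in every list it appears in,
    so documents ranked highly by either retriever float to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

dense = ["doc-a", "doc-b", "doc-c"]   # semantic ranking
sparse = ["doc-a", "doc-d"]           # lexical (bm25/hashing) ranking
print(rrf_fuse([dense, sparse]))      # "doc-a" ranks first: top of both lists
```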

How can this be tested?

testing local hybrid search

  1. checkout this branch.
  2. make sure settings.QDRANT_SPARSE_MODEL defaults to "sklearn/hashing_vectorizer_sparse_model" and settings.QDRANT_SPARSE_ENCODER defaults to "vector_search.encoders.sparse_hash.SparseHashEncoder"
  3. rebuild your web and celery containers and do a down/up on them
  4. delete your local qdrant collections from your local qdrant dashboard
  5. make sure you have resources and contentfiles locally and generate embeddings via ./manage.py generate_embeddings --all
  6. go back to your qdrant dashboard and see that the collections have been created with hashing_vectorizer_sparse_model as the sparse model and whatever your settings.QDRANT_DENSE_MODEL has been set to as the dense model.
  7. go into the contentfiles collection on the dashboard and grab some qdrant point id
  8. run the following in the qdrant console replacing the point id with the one you found:
GET collections/resource_embeddings.content_files/points/00e468bb-93dc-576f-9df1-f045eb6c394c
  9. under the "vector" attribute of the response you should see that both the sparse and dense vectors have values populated
  10. performing searches using the vector endpoints should behave as expected, although this time they are using hybrid search. the "hybrid_search=true" parameter toggles hybrid search
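As a quick sanity check of the toggle from the host machine, hybrid search is just a query parameter on the existing endpoint. A standard-library sketch (the local port and base path are assumptions; adjust to your dev setup):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL is an assumption for a local dev environment; adjust as needed.
BASE = "http://localhost:8063/api/v0/vector_learning_resources_search/"

params = {"q": "machine learning", "hybrid_search": "true", "limit": 3}
url = BASE + "?" + urlencode(params)
# results = json.load(urlopen(url))  # uncomment with the stack running
print(url)
```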

testing deployed/cloud-inferenced hybrid search

  1. perform steps 1 and 2 from the local instructions above
  2. sign up / log in on Qdrant Cloud
  3. you may need to ask @blarghmatey to add you to our cloud account if it is not visible
  [Screenshot from 2026-03-17 omitted]
  4. open the mitol-learn-qa cluster and go to the "api keys" section. create a new api key and set settings.QDRANT_API_KEY. Set settings.QDRANT_HOST to "https://3cd6878c-6d1a-4c75-9056-840e277a0f8b.us-east-1-0.aws.cloud.qdrant.io"
  5. set settings.QDRANT_SPARSE_MODEL to "qdrant/bm25" and settings.QDRANT_SPARSE_ENCODER to "vector_search.encoders.qdrant_cloud.QdrantCloudEncoder"
  6. restart celery
  7. run ./manage.py generate_embeddings --all
  8. you should see new collections appear in the qdrant cluster (named resource_embeddings.content_files, resource_embeddings.resources, etc. - you may need to paginate to see them in the list)
  9. perform the same console query from steps 7 and 8 of the local instructions; you should see the point has a bm25 vector populated in addition to the dense vector
  10. when finished testing, make sure you delete the api key and collections you just created (be careful to delete the correct ones!)
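The cloud-inference steps above boil down to a handful of Django settings. A sketch of the relevant fragment (the host and key values are placeholders, not real credentials; the model and encoder strings come from the steps above):

```python
# settings.py fragment for cloud-inferenced hybrid search (sketch)
QDRANT_HOST = "https://<cluster-id>.<region>.aws.cloud.qdrant.io"  # your cluster URL
QDRANT_API_KEY = "<api-key-created-in-the-dashboard>"  # delete the key after testing
QDRANT_SPARSE_MODEL = "qdrant/bm25"
QDRANT_SPARSE_ENCODER = "vector_search.encoders.qdrant_cloud.QdrantCloudEncoder"
```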

Additional Context

  1. for local environments we use sklearn.HashingVectorizer to generate the sparse vectors; it is fast, does not require pre-fitting on an entire dataset, and is well suited to local testing
  2. deployed environments use Qdrant Cloud inference, which we get at no extra cost since we use their paid offering
  3. when performing a search, hybrid search is activated via the "hybrid_search" GET parameter
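The local sparse encoder relies on the hashing trick: each token is hashed straight to a column index, so no vocabulary needs to be fitted up front. A minimal pure-Python illustration of the idea (sklearn's HashingVectorizer differs in hash function, signing, and normalization):

```python
from hashlib import md5

def hashing_trick_embed(text: str, n_features: int = 2**16) -> dict[int, float]:
    """Map tokens to (index, count) pairs without any fitted vocabulary."""
    vec: dict[int, float] = {}
    for token in text.lower().split():
        # hash the token directly to a column index in [0, n_features)
        idx = int(md5(token.encode()).hexdigest(), 16) % n_features
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec

# the nonzero (index, value) pairs are exactly the shape a sparse
# vector store like Qdrant expects
print(hashing_trick_embed("hybrid search beats plain search"))
```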

@shanbady shanbady changed the base branch from main to shanbady/qdrant-upgrade March 17, 2026 15:51
github-actions bot commented Mar 17, 2026

OpenAPI Changes

2 changes: 0 error, 0 warning, 2 info
info	[new-optional-request-parameter] at head/openapi/specs/v0.yaml	
	in API GET /api/v0/vector_content_files_search/
		added the new optional 'query' request parameter 'hybrid_search'

info	[new-optional-request-parameter] at head/openapi/specs/v0.yaml	
	in API GET /api/v0/vector_learning_resources_search/
		added the new optional 'query' request parameter 'hybrid_search'


Unexpected changes? Ensure your branch is up-to-date with main (consider rebasing).

@shanbady shanbady marked this pull request as ready for review March 17, 2026 19:01
@shanbady shanbady added the Needs Review An open Pull Request that is ready for review label Mar 17, 2026
@abeglova abeglova self-assigned this Mar 18, 2026
"""
Return the sparse encoder based on settings
"""
Encoder = import_string(settings.QDRANT_SPARSE_ENCODER)
Contributor:

what does import_string do here?

shanbady (author):

settings.QDRANT_SPARSE_ENCODER specifies the encoder class to instantiate (this is also how the dense encoder works)
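For context, Django's import_string resolves a dotted path such as "vector_search.encoders.sparse_hash.SparseHashEncoder" to the class object so it can be instantiated. A minimal stand-in with the same behavior (the real Django helper also wraps failures in a friendlier ImportError):

```python
from importlib import import_module

def import_string(dotted_path: str):
    """Resolve 'pkg.module.Attr' to the attribute object (Django-style)."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    return getattr(import_module(module_path), attr_name)

# turn a settings string into a class, then instantiate it
Encoder = import_string("collections.OrderedDict")
print(Encoder())  # an empty OrderedDict
```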

Contributor:

oh i get it! That makes sense

collection_name=search_collection,
count_filter=search_filter,
exact=True,
exact=False,
Contributor:

Will this make the counts incorrect for non-hybrid queries? Will that be a problem for paging?

shanbady (author):

It could be an issue for collections with a very large number of points (the contentfile chunks collection), but I don't expect we will ever need to accurately paginate through that collection (or rely on its count). The performance gain from an inexact count outweighs the need for an accurate count of chunks.



def vector_search(
def vector_search( # noqa: PLR0913
Contributor:

Can you add a test for vector_search with hybrid_search=True, or just group_by=null in general?

Also, this does not need to be addressed in this PR, but vector_search shouldn't be in utils; it should be in its own file, since it is only called by the views and is not a utility function.

@abeglova left a review comment:

Works great locally. I'm having trouble joining the group Tobias set up to test the qdrant cloud encoder

Comment on lines +21 to +26
try:
self.token_encoding_name = tiktoken.encoding_name_for_model(model_name)
except KeyError:
msg = f"Model {model_name} not found in tiktoken. defaulting to None"
log.warning(msg)

Automated review comment:

Bug: In QdrantCloudEncoder.__init__, a KeyError exception logs a fallback to None but doesn't assign self.token_encoding_name, leading to a potential AttributeError.
Severity: MEDIUM

Suggested Fix

In the except KeyError block of the QdrantCloudEncoder.__init__ method, add the line self.token_encoding_name = None after the log warning. This will ensure the attribute is set as intended by the log message and prevent subsequent AttributeError exceptions.

Location: vector_search/encoders/qdrant_cloud.py#L21-L26

Potential issue: In the `QdrantCloudEncoder.__init__` method, if
`tiktoken.encoding_name_for_model(model_name)` raises a `KeyError`, the `except` block
logs a warning message stating it is "defaulting to None" but fails to actually assign
`self.token_encoding_name = None`. Unlike `LiteLLMEncoder`, which has a class-level
fallback, `QdrantCloudEncoder` has no such safety net. Consequently, any downstream code
attempting to access the `token_encoding_name` attribute on the instance will trigger an
`AttributeError`, causing a runtime crash.
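The fix is the standard pattern of assigning the fallback inside the except branch itself. A self-contained sketch (the lookup table below is a hypothetical stand-in for tiktoken.encoding_name_for_model, which raises KeyError for unknown models):

```python
import logging

log = logging.getLogger(__name__)

# hypothetical stand-in for tiktoken.encoding_name_for_model
_MODEL_ENCODINGS = {"gpt-4o": "o200k_base"}

def encoding_name_for_model(model_name: str) -> str:
    return _MODEL_ENCODINGS[model_name]  # raises KeyError for unknown models

class CloudEncoder:
    def __init__(self, model_name: str):
        try:
            self.token_encoding_name = encoding_name_for_model(model_name)
        except KeyError:
            log.warning("Model %s not found in tiktoken; defaulting to None", model_name)
            self.token_encoding_name = None  # the assignment the original code missed
```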

Comment on lines +929 to 938
else:
# fallback to dense only search
search_params["using"] = encoder_dense.model_short_name()
search_params["query"] = encoder_dense.embed_query(query_string)

if "group_by" in params:
search_params.pop("search_params", None)
search_params["group_by"] = params.get("group_by")
search_params["group_size"] = params.get("group_size", 1)
group_result = client.query_points_groups(**search_params)
Automated review comment:

Bug: The vector_search function incorrectly passes with_payload and with_vectors parameters to client.query_points_groups() during grouped searches, which will cause a TypeError.
Severity: HIGH

Suggested Fix

Before calling client.query_points_groups within the if "group_by" in params: block, remove the with_payload and with_vectors keys from the search_params dictionary, similar to how search_params is removed. Add search_params.pop("with_payload", None) and search_params.pop("with_vectors", None).

Location: vector_search/utils.py#L929-L938

Potential issue: In the `vector_search` function, when a `group_by` parameter is
present, the code correctly removes the `search_params` key before calling
`client.query_points_groups`. However, it fails to also remove the `with_payload` and
`with_vectors` keys from the parameters dictionary. These keys are not intended for
grouped queries in this context. Passing these extraneous keyword arguments will cause
`client.query_points_groups` to raise a `TypeError` at runtime, causing grouped searches
to fail.
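The corresponding fix is to strip every kwarg that query_points_groups does not accept before the call, in one place. A small helper sketch (names are illustrative, not the PR's actual code):

```python
def to_group_query_kwargs(search_params: dict, params: dict) -> dict:
    """Copy search kwargs, dropping keys a grouped query would reject."""
    kwargs = dict(search_params)  # leave the caller's dict untouched
    for key in ("search_params", "with_payload", "with_vectors"):
        kwargs.pop(key, None)  # absent keys are ignored
    kwargs["group_by"] = params["group_by"]
    kwargs["group_size"] = params.get("group_size", 1)
    return kwargs

base = {"collection_name": "resource_embeddings.resources",
        "with_payload": True, "with_vectors": False, "limit": 10}
grouped = to_group_query_kwargs(base, {"group_by": "readable_id"})
print(sorted(grouped))  # with_payload / with_vectors are gone
```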

@abeglova (Contributor) commented:

works now!

@shanbady shanbady merged commit 299752c into shanbady/qdrant-upgrade Mar 19, 2026
10 checks passed
@shanbady shanbady deleted the shanbady/sparse-hybrid-search branch March 19, 2026 16:22