IBX-9846: Describe Embeddings search API #3029
base: 5.0
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| --- | ||
| month_change: true | ||
| description: Embedding queries, embedding configuration, providers, and embedding search fields | ||
| --- | ||
|
|
||
| # Embeddings search reference | ||
|
|
||
| Embeddings provide vector representations of content or text, enabling semantic similarity search. | ||
| Foundational abstractions are provided for embedding-based search, while embedding providers generate vector representations. | ||
|
|
||
| ## EmbeddingQuery | ||
|
|
||
| - [`Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQuery`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQuery.html): Represents a semantic similarity search request. | ||
|
|
||
| It encapsulates an [Embedding](#embedding) instance and supports pagination and aggregations through the same API as standard content queries. | ||
| Embedding queries do not support criteria, sort clauses, facet builders, or spellcheck. | ||
|
|
||
|
|
||
| ## Embedding | ||
|
|
||
| - [`Ibexa\Contracts\Core\Repository\Values\Content\Query\Embedding`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-Query-Embedding.html): Represents the semantic input used for similarity search. | ||
|
|
||
| Depending on the embedding provider, it can encapsulate text or vector data. | ||
|
|
||
| ## Embedding providers | ||
|
|
||
| Embedding providers generate vector representations for inputs. | ||
|
|
||
| ### Provider contracts | ||
|
|
||
| - [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderInterface.html): Generates embeddings | ||
|
|
||
|
|
||
| - [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderRegistryInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderRegistryInterface.html): Lists available embedding providers | ||
|
|
||
|
|
||
| - [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderResolverInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderResolverInterface.html): Resolves the provider for a given embedding configuration | ||
|
|
||
|
|
||
| ## Embedding fields | ||
|
|
||
| - [`Ibexa\Contracts\Core\Search\FieldType\EmbeddingFieldFactory`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-FieldType-EmbeddingFieldFactory.html): Creates dedicated search fields that store embedding vectors | ||
|
|
||
|
|
||
| ## Validation | ||
|
|
||
| - [`Ibexa\Contracts\Core\Repository\Values\Content\QueryValidatorInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-QueryValidatorInterface.html): Validates embedding queries before they reach the search engine | ||
|
|
||
|
|
||
| !!! note "Taxonomy embeddings" | ||
|
|
||
| Searching for embeddings can be used to support the [Taxonomy suggestions](taxonomy.md#taxonomy-suggestions) feature. | ||
| The [`Ibexa\Contracts\Taxonomy\Search\Query\Value\TaxonomyEmbedding`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Taxonomy-Search-Query-Value-TaxonomyEmbedding.html) allows embedding queries to target taxonomy data. | ||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -1,4 +1,5 @@ | ||||||
| --- | ||||||
| month_change: true | ||||||
| description: You can search for content, locations and products by using the PHP API. Fine-tune the search with Search Criteria, Sort Clauses and Aggregations. | ||||||
| --- | ||||||
|
|
||||||
|
|
@@ -18,7 +19,7 @@ | |||||
|
|
||||||
| `SearchService` is also used in the back office of [[= product_name =]], in components such as Universal Discovery Widget or Sub-items List. | ||||||
|
|
||||||
| ### Performing a search | ||||||
| ### Perform a search | ||||||
|
|
||||||
| To search through content you need to create a [`LocationQuery`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-LocationQuery.html) and provide your Search Criteria as a series of Criterion objects. | ||||||
|
|
||||||
|
|
@@ -70,7 +71,7 @@ | |||||
| The difference between `query` and `filter` is only relevant when using Solr or Elasticsearch search engine. | ||||||
| With the Legacy search engine both properties give identical results. | ||||||
|
|
||||||
| #### Processing large result sets | ||||||
| #### Process large result sets | ||||||
|
|
||||||
| To process a large result set, use [`Ibexa\Contracts\Core\Repository\Iterator\BatchIterator`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Iterator-BatchIterator.html). | ||||||
| `BatchIterator` divides the results of search or filtering into smaller batches. | ||||||
|
|
@@ -175,7 +176,7 @@ | |||||
| It's recommended to use an IDE that can recognize type hints when working with Repository Filtering. | ||||||
| If you try to use an unsupported Criterion or Sort Clause, the IDE indicates an issue. | ||||||
|
|
||||||
| ## Searching in a controller | ||||||
| ## Search in controller | ||||||
|
|
||||||
| You can use the `SearchService` or repository filtering in a controller, as long as you provide the required parameters. | ||||||
| For example, in the code below, `locationId` is provided to list all children of a location by using the `SearchService`. | ||||||
|
|
@@ -196,7 +197,7 @@ | |||||
| [[= include_file('code_samples/api/public_php_api/src/Controller/CustomFilterController.php', 16, 31) =]] | ||||||
| ``` | ||||||
|
|
||||||
| ### Paginating search results | ||||||
| ### Paginate search results | ||||||
|
|
||||||
| To paginate search or filtering results, it's recommended to use the [Pagerfanta library](https://github.com/BabDev/Pagerfanta) and [[[= product_name =]]'s adapters for it.](https://github.com/ibexa/core/blob/main/src/lib/Pagination/Pagerfanta/Pagerfanta.php) | ||||||
|
|
||||||
|
|
@@ -258,7 +259,7 @@ | |||||
| [[= include_file('code_samples/api/public_php_api/src/Command/FindComplexCommand.php', 46, 54) =]] | ||||||
| ``` | ||||||
|
|
||||||
| ### Combining independent Criteria | ||||||
| ### Combine independent Criteria | ||||||
|
|
||||||
| Criteria are independent of one another. | ||||||
| This can lead to unexpected behavior, for instance because content can have multiple locations. | ||||||
|
|
@@ -281,7 +282,7 @@ | |||||
| - the content item is visible (it has the visible location A) | ||||||
|
|
||||||
|
|
||||||
| ## Sorting results | ||||||
| ## Sort results | ||||||
|
|
||||||
| To sort the results of a query, use one or more [Sort Clauses](sort_clause_reference.md). | ||||||
|
|
||||||
|
|
@@ -295,27 +296,6 @@ | |||||
|
|
||||||
| For the full list and details of available Sort Clauses, see [Sort Clause reference](sort_clause_reference.md). | ||||||
|
|
||||||
| ## Searching in trash | ||||||
|
|
||||||
| In the user interface, on the **Trash** screen, you can search for content items, and then sort the results based on different criteria. | ||||||
| To search the trash with the API, use the `TrashService::findInTrash` method to submit a query for content items that are held in trash. | ||||||
| Searching in trash supports a limited set of Criteria and Sort Clauses. | ||||||
| For a list of supported Criteria and Sort Clauses, see [Search in trash reference](search_in_trash_reference.md). | ||||||
|
|
||||||
| !!! note | ||||||
|
|
||||||
| Searching through the trashed content items operates directly on the database, therefore you cannot use external search engines, such as Solr or Elasticsearch, and it's impossible to reindex the data. | ||||||
|
|
||||||
| ``` php | ||||||
| [[= include_file('code_samples/api/public_php_api/src/Command/FindInTrashCommand.php', 4, 6) =]]//... | ||||||
| [[= include_file('code_samples/api/public_php_api/src/Command/FindInTrashCommand.php', 35, 42) =]] | ||||||
| ``` | ||||||
|
|
||||||
| !!! caution | ||||||
|
|
||||||
| Make sure that you set the Criterion on the `filter` property. | ||||||
| It's impossible to use the `query` property, because the search in trash operation filters the database instead of querying. | ||||||
|
|
||||||
| ## Aggregation | ||||||
|
|
||||||
| !!! caution "Feature support" | ||||||
|
|
@@ -378,4 +358,147 @@ | |||||
| `null` means that a range doesn't have an end. | ||||||
| In the example all values above (and including) 60 are included in the last range. | ||||||
|
|
||||||
| See [Agrregation reference](aggregation_reference.md) for details of all available aggregations. | ||||||
| See [Aggregation reference](aggregation_reference.md) for details of all available aggregations. | ||||||
|
|
||||||
| ## Search with embeddings | ||||||
|
|
||||||
| Embeddings are numerical representations that capture the meaning of text, images, or other content. | ||||||
| AI models generate embeddings by converting words or documents into lists of numbers, instead of treating them as plain text. | ||||||
| Such lists, also known as vectors, can then be compared to find content with similar meaning. | ||||||
|
|
||||||
|
|
||||||
| Searching with embeddings enables matching content based on meaning rather than exact text matches. | ||||||
| Instead of comparing keywords, the system compares vectors that represent the semantic meaning of content and the query input. | ||||||
|
|
||||||
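To make "compares vectors" concrete, here is a minimal sketch of cosine similarity, one common way to score how close two embedding vectors are. It is plain PHP and not part of the Ibexa API: the function name `cosineSimilarity` and the toy three-dimensional vectors are assumptions made for illustration, and the metric actually applied depends on the search engine and its configuration.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative helper (hypothetical, not an Ibexa class or function).
 * Returns the cosine of the angle between two equal-length vectors:
 * values near 1.0 mean "similar direction", values near 0.0 mean "unrelated".
 *
 * @param list<float> $a
 * @param list<float> $b
 */
function cosineSimilarity(array $a, array $b): float
{
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    if ($normA === 0.0 || $normB === 0.0) {
        throw new InvalidArgumentException('A zero vector has no direction to compare.');
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Toy vectors; real embedding models produce hundreds or thousands of dimensions.
$query = [0.9, 0.1, 0.0];
$closeInMeaning = [0.8, 0.2, 0.1];
$farInMeaning = [0.0, 0.1, 0.9];

// The first score is much closer to 1.0 than the second.
printf("%.3f\n", cosineSimilarity($query, $closeInMeaning));
printf("%.3f\n", cosineSimilarity($query, $farInMeaning));
```

A keyword search would treat both candidate documents the same if neither contains the query terms. A vector comparison can still rank the first one higher.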
| !!! note "Taxonomy suggestions" | ||||||
|
|
||||||
| Embedding queries have been introduced primarily to support the [Taxonomy suggestions](taxonomy.md#taxonomy-suggestions) feature, but you can use them in other scenarios as well. | ||||||
|
|
||||||
|
|
||||||
|
|
||||||
| Searching with embeddings can be combined with traditional search criteria and filters, which allows the semantic search to be constrained by content type, location, permissions, or other search criteria. | ||||||
|
|
||||||
|
Contributor
a little bit misleading.
|
||||||
|
|
||||||
| An embedding query is represented by the `Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQuery` value object. | ||||||
| The object encapsulates the vector to search for, along with configuration such as the embedding model and similarity threshold. | ||||||
|
|
||||||
| The query is validated before being executed to ensure that the embedding configuration is consistent with the system setup. | ||||||
|
|
||||||
| The following components are used to build and validate embedding-based queries: | ||||||
|
|
||||||
| - [EmbeddingQuery](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQuery.html): | ||||||
| Represents a semantic similarity search request. | ||||||
| It contains the input vector and configuration parameters such as the embedding model. | ||||||
|
|
||||||
|
|
||||||
| - [EmbeddingQueryBuilder](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQueryBuilder.html): | ||||||
| A fluent builder for constructing `EmbeddingQuery` instances. | ||||||
| It enforces required parameters and integrates embedding queries with the search query pipeline. | ||||||
|
|
||||||
|
|
||||||
| - [QueryValidatorInterface](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-QueryValidatorInterface.html): | ||||||
| Validates embedding queries before they are passed to the search engine. | ||||||
| Implementations ensure that the embedding model exists and that vector dimensions match the configured embedding field. | ||||||
|
|
||||||
|
|
||||||
|
|
||||||
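The kind of check the validator description mentions (the model exists, and the vector length matches the configured dimensionality) can be sketched in a few lines. This is only an illustration under stated assumptions: `assertVectorMatchesModel` and the `$models` array shape are hypothetical, and it does not reproduce how `QueryValidatorInterface` implementations work internally.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of the Ibexa API.
 * Rejects a vector when its model is unknown or its length differs
 * from the dimensionality declared for that model.
 *
 * @param array<string, array{dimensions: int}> $models configured models keyed by identifier (assumed shape)
 * @param list<float> $vector
 */
function assertVectorMatchesModel(array $models, string $modelIdentifier, array $vector): void
{
    if (!isset($models[$modelIdentifier])) {
        throw new InvalidArgumentException(
            sprintf('Unknown embedding model "%s".', $modelIdentifier)
        );
    }

    $expected = $models[$modelIdentifier]['dimensions'];
    $actual = count($vector);
    if ($actual !== $expected) {
        throw new InvalidArgumentException(sprintf(
            'Model "%s" expects %d dimensions, got %d.',
            $modelIdentifier,
            $expected,
            $actual
        ));
    }
}

// Example data: a single model with an assumed dimensionality of 1536.
$models = ['text-embedding-3-small' => ['dimensions' => 1536]];

// A vector of the right length passes silently.
assertVectorMatchesModel($models, 'text-embedding-3-small', array_fill(0, 1536, 0.0));
```

Failing early with a descriptive exception keeps a mismatched vector from reaching the search engine, which is the purpose the validation step is described as serving.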
| ### Use embedding queries in search | ||||||
|
|
||||||
| Embedding queries are executed through the search API in the same way as other search requests. | ||||||
| You build an `EmbeddingQuery` instance by using a builder and pass it to the search service. | ||||||
| Embedding queries can also be combined with filters and search criteria to narrow down results, such as by content type, location, or permissions. | ||||||
|
|
||||||
|
|
||||||
| ``` php | ||||||
| use Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQueryBuilder; | ||||||
| use Ibexa\Contracts\Core\Repository\Values\Content\Query\Embedding; | ||||||
|
Contributor
no such thing like But
|
||||||
| use Ibexa\Contracts\Core\Repository\Values\Content\Query\Aggregation; | ||||||
|
|
||||||
|
|
||||||
| // Create an embedding object that represents the search input | ||||||
| $embedding = new Embedding('Find content similar to this text'); | ||||||
|
Contributor
This class accepts
|
||||||
|
|
||||||
| // Build the embedding query by using the fluent builder | ||||||
| $embeddingQuery = EmbeddingQueryBuilder::create() | ||||||
| ->withEmbedding($embedding) | ||||||
| ->setLimit(10) // maximum number of results | ||||||
| ->setOffset(0) // result offset for pagination | ||||||
| ->setPerformCount(true) // optionally count total matching items | ||||||
| ->setAggregations([ | ||||||
|
Contributor
we could remove aggregations from the snippet (focus on embeddings) or:
|
||||||
| new Aggregation('count_by_type'), | ||||||
| ]) | ||||||
| ->build(); | ||||||
|
|
||||||
| // Execute the query through the search service ($searchService is an injected SearchService) | ||||||
| $results = $searchService->findContent($embeddingQuery); | ||||||
| ``` | ||||||
|
|
||||||
| The `EmbeddingQueryBuilder` ensures that the query is correctly configured before execution. | ||||||
|
|
||||||
| !!! note "Embedding query properties" | ||||||
|
|
||||||
| Embedding queries do not allow standard Query properties such as `query`, `sortClauses`, `facetBuilders`, or `spellcheck`. | ||||||
|
|
||||||
| ### Embedding configuration and providers | ||||||
|
|
||||||
| Models used to resolve embedding queries must be configured in [system configuration](configuration.md). | ||||||
| Each key defines the model's name, vector dimensionality, the field suffix used in the search index, and the embedding provider that generates vectors. | ||||||
|
|
||||||
| ``` yaml | ||||||
| ibexa: | ||||||
| system: | ||||||
| default: | ||||||
| embedding_models: | ||||||
| text-embedding-3-small: | ||||||
| name: 'text-embedding-3-small' | ||||||
| dimensions: 1536 | ||||||
| field_suffix: '3small' | ||||||
| embedding_provider: 'ibexa_openai' | ||||||
| ``` | ||||||
|
|
||||||
| For a real-life example of embedding configuration, see [Taxonomy suggestions](taxonomy.md#change-the-embedding-generation-model). | ||||||
|
|
||||||
| Embedding providers implement the contract for generating vector representations of input data. | ||||||
| At runtime, the system resolves the right provider and delegates embedding generation to it. | ||||||
|
|
||||||
| - [EmbeddingConfigurationInterface](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingConfigurationInterface.html) defines how embedding models are configured in the system (model name, vector dimensionality, provider reference, field suffix). | ||||||
|
|
||||||
|
|
||||||
| - [EmbeddingProviderInterface](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderInterface.html) is the runtime contract for generating vector representations from text or other inputs. | ||||||
|
|
||||||
|
|
||||||
| - [EmbeddingProviderRegistryInterface](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderRegistryInterface.html) lists all available embedding providers. | ||||||
|
|
||||||
|
|
||||||
| - [EmbeddingProviderResolverInterface](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderResolverInterface.html) determines which provider should be used for a given embedding configuration. | ||||||
|
|
||||||
|
|
||||||
| ### Embedding fields | ||||||
|
|
||||||
| Embedding vectors are stored in dedicated search fields that are created by `Ibexa\Contracts\Core\Search\FieldType\EmbeddingFieldFactory`. | ||||||
| These fields are then used by the search engine to perform vector similarity comparisons when embedding queries are executed. | ||||||
|
|
||||||
| ``` php | ||||||
| use Ibexa\Contracts\Core\Search\FieldType\EmbeddingFieldFactory; | ||||||
| use Ibexa\Contracts\Core\Search\Embedding\EmbeddingConfigurationInterface; | ||||||
|
|
||||||
| // $config is an existing EmbeddingConfigurationInterface | ||||||
| $factory = new EmbeddingFieldFactory($config); | ||||||
|
|
||||||
| // Create a default embedding field (type derived from config suffix) | ||||||
| $embeddingField = $factory->create(); | ||||||
| echo $embeddingField->getType(); // for example, "ibexa_dense_vector_model_123" | ||||||
|
|
||||||
| // Create a custom embedding field with a specific type | ||||||
| $customField = $factory->create('custom_embedding_type'); | ||||||
| echo $customField->getType(); // "custom_embedding_type" | ||||||
| ``` | ||||||
|
|
||||||
| For more information, see [Embeddings reference](embeddings_reference.md). | ||||||
|
|
||||||
| ## Search in trash | ||||||
|
|
||||||
| In the user interface, on the **Trash** screen, you can search for content items, and then sort the results based on different criteria. | ||||||
| To search the trash with the API, use the `TrashService::findInTrash` method to submit a query for content items that are held in trash. | ||||||
| Searching in trash supports a limited set of Criteria and Sort Clauses. | ||||||
| For a list of supported Criteria and Sort Clauses, see [Search in trash reference](search_in_trash_reference.md). | ||||||
|
|
||||||
| !!! note | ||||||
|
|
||||||
| Searching through the trashed content items operates directly on the database, therefore you cannot use external search engines, such as Solr or Elasticsearch, and it's impossible to reindex the data. | ||||||
|
|
||||||
| ``` php | ||||||
| [[= include_file('code_samples/api/public_php_api/src/Command/FindInTrashCommand.php', 4, 6) =]]//... | ||||||
| [[= include_file('code_samples/api/public_php_api/src/Command/FindInTrashCommand.php', 35, 42) =]] | ||||||
| ``` | ||||||
|
|
||||||
| !!! caution | ||||||
|
|
||||||
| Make sure that you set the Criterion on the `filter` property. | ||||||
| It's impossible to use the `query` property, because the search in trash operation filters the database instead of querying. | ||||||
It does not validate configuration. Maybe
Validates embedding queries before they reach the search engine