-
For tests? But those could probably also work with some Search method. I'm anything but an expert on AI and vector databases, but searching for a specific vector value does not seem like the most common use case in my limited experience.
I agree. A
I think that sounds like a good idea. If the indexer thread has to remove dupes and needs to check all vectors in the list, that could take a significant amount of time if there are a lot of vectors. Maybe benchmark how long a single run might take with a large number of vectors? In my mind, it would likely have to lock access to the list for modifications, so this might look like an online index rebuild with a table lock in an RDBMS.
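A rough way to get a number for the benchmark suggested above is to time one dedupe pass over synthetic data while holding a lock. This is only a sketch: the vector count, the dimension count, the `gate` lock object, and the string-based key are assumptions for illustration, not the library's actual types or locking scheme.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Assumed sizes, chosen only for illustration.
const int VectorCount = 100_000;
const int Dimensions = 8;

// Fixed seed so repeated runs scan the same synthetic data.
var rng = new Random(42);
var vectors = new List<float[]>(VectorCount + 1);
for (int i = 0; i < VectorCount; i++)
{
    var v = new float[Dimensions];
    for (int d = 0; d < Dimensions; d++) v[d] = (float)rng.NextDouble();
    vectors.Add(v);
}
vectors.Add((float[])vectors[0].Clone()); // plant one known duplicate

var gate = new object(); // hypothetical lock guarding the list
var sw = Stopwatch.StartNew();
int removedCount;
lock (gate)
{
    // One O(n) pass: HashSet.Add returns false for a key seen before.
    var seen = new HashSet<string>();
    removedCount = vectors.RemoveAll(v => !seen.Add(string.Join(";", v)));
}
sw.Stop();

Console.WriteLine(
    $"Scanned {VectorCount + 1} vectors, removed {removedCount} in {sw.ElapsedMilliseconds} ms");
```

The elapsed time is how long writers would be blocked under this locking approach, so it is worth running with realistic vector counts and dimensions before deciding.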
-
I left this thread hanging... You bring up a reasonable point about equality. I think true Equality will require three functions:
I'm conflating Compare and Equals because I see them as having the same purpose here. What do you think?
-
@hangy, I saw your comment, but it got hidden in the PR comment thread after the MemoryMappedFiles merged.
TLDR: It's not needed in MemoryMappedList since it does not implement the IList or ICollection interfaces. However, the data should be checked for duplicates prior to rebuilding the index.
Contains() (in VectorList) came to be when I wanted interface support with IList and ICollection.
As it works now, it's a compliance function. The root question is: Why would someone call Contains() at all?
Even if it did a true equality check across the board (except for Id), it's not obvious why this is needed.
Unintentionally adding duplicate Vectors (if comparing by Values and OriginalText) would be a good thing to prevent. Duplicate data, aside from being, well, duplicate, can throw off the search tree. However, calling Contains() from within the Add() and AddRange() functions would make every insert an O(n) operation, so adds would get slower as the list grows.
It would be even better to do this asynchronously, even at the cost of allowing duplicate data for brief periods.
In StartIndexService(), when a modified event is invoked, enumerate through the database and cull duplicate records before rebuilding the index. On mobile devices, where the background thread is not available, this can be added as an optional function for the VectorDatabase.RebuildSearchIndexAsync() and RebuildSearchIndexesAsync().
What do you think?
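To make the culling step concrete, here is a minimal sketch of a single-pass dedupe that compares by Values and OriginalText and ignores Id. The tuple record, `ContentKey`, and `CullDuplicates` are hypothetical names for illustration only; they are not the library's real types or methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a stored vector. Id is ignored when comparing;
// Values and OriginalText together define a duplicate.
var records = new List<(int Id, float[] Values, string OriginalText)>
{
    (1, new[] { 0.1f, 0.2f }, "hello"),
    (2, new[] { 0.3f, 0.4f }, "world"),
    (3, new[] { 0.1f, 0.2f }, "hello"), // same content as Id 1
};

// Content key built from the text plus the exact bit pattern of each float,
// so only bit-identical values count as equal. The \u0001 separator assumes
// OriginalText never contains that control character.
static string ContentKey(float[] values, string text) =>
    text + "\u0001" + string.Join(",", values.Select(v => BitConverter.SingleToInt32Bits(v)));

// Single O(n) pass that keeps the first occurrence of each key.
// HashSet.Add returns false when the key was already present.
static int CullDuplicates(List<(int Id, float[] Values, string OriginalText)> list)
{
    var seen = new HashSet<string>();
    return list.RemoveAll(r => !seen.Add(ContentKey(r.Values, r.OriginalText)));
}

int removed = CullDuplicates(records);
Console.WriteLine($"Removed {removed}, kept {records.Count}"); // Removed 1, kept 2
```

Because the pass is linear, it could run once per rebuild instead of adding an O(n) Contains() call to every Add(). Whether exact bit equality is the right notion of "duplicate" for Values is still an open question here.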