Commit 05ac00b

Merge pull request #83 from Build5Nines/dev
v2.2.0
2 parents 905786d + 0089157 commit 05ac00b

12 files changed

Lines changed: 1664 additions & 762 deletions

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## v2.2.0
+
+Add:
+
+- Added `BasicDiskVectorDatabase`, a basic vector database that automatically persists to disk, along with the associated `BasicDiskVectorStore` and `BasicDiskVocabularyStore`.
+
 ## v2.1.3
 
 Add:

docs/docs/persistence/index.md

Lines changed: 31 additions & 0 deletions
@@ -5,6 +5,8 @@ title: Data Persistence
 
 The `Build5Nines.SharpVector` library provides easy-to-use methods for saving a memory-based vector database to a file or stream and loading it again later. This is particularly useful for caching indexed content between runs, deploying pre-built vector stores, or shipping databases with your application.
 
+---
+
 ## :material-file: File Persistence
 
 `Build5Nines.SharpVector` supports persisting the vector database to a file.
@@ -51,6 +53,8 @@ vdb.LoadFromFile(filePath);
 await vdb.LoadFromFileAsync(filePath);
 ```
 
+---
+
 ## :material-file-move: Persist to Stream
 
 The underlying methods that `SaveToFile` and `LoadFromFile` use to serialize the vector database to a `Stream` are available to use directly. This provides support for reading/writing to `MemoryStream` (or other streams) if the vector database needs to be persisted to something other than the local file system.
@@ -92,3 +96,30 @@ vdb.DeserializeFromBinaryStream(stream);
 // deserialize asynchronously from binary stream
 await vdb.DeserializeFromBinaryStreamAsync(stream);
 ```
+
+---
+
+## :material-file-database: BasicDiskVectorDatabase
+
+The `BasicDiskVectorDatabase` provides a basic vector database implementation that automatically stores the vector store and vocabulary store to disk. Its vectorization implementation is the same as that of `BasicMemoryVectorDatabase`, except that it automatically persists the database in the background to the specified folder path on disk.
+
+Here's a basic example of using `BasicDiskVectorDatabase`:
+
+```csharp
+// specify the folder where to persist the database data on disk
+var vdb = new BasicDiskVectorDatabase("C:/data/content-db");
+foreach (var doc in documents)
+{
+    vdb.AddText(doc.Id, doc.Text);
+}
+
+var results = vdb.Search("some text");
+
+```
+
+### Tips
+
+- Prefer absolute paths for the storage folder in production services.
+- Place the folder on fast storage (SSD) for best indexing/query performance.
+- Avoid sharing the same folder across multiple processes concurrently.
+- Back up the folder regularly to preserve your vector store and vocabulary.
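
Review note: the new docs section describes a database that persists itself to a folder automatically. The commit does not show how `BasicDiskVectorStore` writes its files, so the following is only a minimal sketch of the general write-through idea, not the library's implementation. The class `TinyDiskStore`, its `items.json` file name, and its methods are hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical illustration only; NOT how SharpVector stores its data.
// Every Add() rewrites a JSON file in the folder, and the constructor
// reloads that file, so the items survive a process restart.
public sealed class TinyDiskStore
{
    private readonly string _filePath;
    private readonly Dictionary<int, string> _items;

    public TinyDiskStore(string folderPath)
    {
        Directory.CreateDirectory(folderPath); // no-op if the folder already exists
        _filePath = Path.Combine(folderPath, "items.json");
        _items = File.Exists(_filePath)
            ? JsonSerializer.Deserialize<Dictionary<int, string>>(File.ReadAllText(_filePath))
              ?? new Dictionary<int, string>()
            : new Dictionary<int, string>();
    }

    public int Count => _items.Count;

    public int Add(string text)
    {
        int id = _items.Count + 1;
        _items[id] = text;

        // Write to a temporary file first, then swap it into place, which
        // reduces the chance of leaving a half-written file after a crash.
        string tmp = _filePath + ".tmp";
        File.WriteAllText(tmp, JsonSerializer.Serialize(_items));
        File.Move(tmp, _filePath, overwrite: true);
        return id;
    }

    public string Get(int id) => _items[id];
}
```

Constructing a second `TinyDiskStore` over the same folder reads back whatever the first instance added. The sketch has no cross-process locking, which is one reason a tip like "avoid sharing the same folder across multiple processes" matters for designs of this kind.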
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+using Build5Nines.SharpVector.Id;
+using Build5Nines.SharpVector.Preprocessing;
+using Build5Nines.SharpVector.Vocabulary;
+using Build5Nines.SharpVector.Vectorization;
+using Build5Nines.SharpVector.VectorCompare;
+using Build5Nines.SharpVector.VectorStore;
+
+namespace Build5Nines.SharpVector;
+
+/// <summary>
+/// Base class for an on-disk vector database. Mirrors MemoryVectorDatabaseBase generic composition
+/// while using disk-backed stores for persistence.
+/// </summary>
+public abstract class BasicDiskMemoryVectorDatabaseBase<TId, TMetadata, TVectorStore, TVocabularyStore, TVocabularyKey, TVocabularyValue, TIdGenerator, TTextPreprocessor, TVectorizer, TVectorComparer>
+    : VectorDatabaseBase<TId, TMetadata, TVectorStore, TVocabularyStore, TVocabularyKey, TVocabularyValue, TIdGenerator, TTextPreprocessor, TVectorizer, TVectorComparer>
+    where TId : notnull
+    where TVocabularyKey : notnull
+    where TVocabularyValue : notnull
+    where TVectorStore : IVectorStoreWithVocabulary<TId, TMetadata, TVocabularyStore, TVocabularyKey, TVocabularyValue>
+    where TVocabularyStore : IVocabularyStore<TVocabularyKey, TVocabularyValue>
+    where TIdGenerator : IIdGenerator<TId>, new()
+    where TTextPreprocessor : ITextPreprocessor<TVocabularyKey>, new()
+    where TVectorizer : IVectorizer<TVocabularyKey, TVocabularyValue>, new()
+    where TVectorComparer : IVectorComparer, new()
+{
+    protected BasicDiskMemoryVectorDatabaseBase(TVectorStore vectorStore)
+        : base(vectorStore)
+    { }
+}
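
Review note: in the base class above, the vector store is passed in through the constructor, while the ID generator, text preprocessor, vectorizer, and comparer carry a `new()` constraint. A plausible reading (an assumption, since `VectorDatabaseBase` is not part of this diff) is that the base class constructs those components itself, whereas the store needs constructor arguments such as a root path. The sketch below shows that composition pattern in isolation; `ITokenizer`, `WhitespaceTokenizer`, `PipelineBase`, and `ListPipeline` are hypothetical names, not SharpVector types.

```csharp
using System;
using System.Collections.Generic;

public interface ITokenizer
{
    string[] Tokenize(string text);
}

public sealed class WhitespaceTokenizer : ITokenizer
{
    public string[] Tokenize(string text) =>
        text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
}

// Hypothetical base class. The `new()` constraint lets it create the
// tokenizer on its own, while the store (which may need constructor
// arguments) is supplied by the derived class.
public abstract class PipelineBase<TStore, TTokenizer>
    where TStore : class
    where TTokenizer : ITokenizer, new()
{
    private readonly TTokenizer _tokenizer = new TTokenizer();

    protected TStore Store { get; }

    protected PipelineBase(TStore store) => Store = store;

    public int CountTokens(string text) => _tokenizer.Tokenize(text).Length;
}

// The concrete class fixes every type argument and builds the store.
public sealed class ListPipeline : PipelineBase<List<string>, WhitespaceTokenizer>
{
    public ListPipeline() : base(new List<string>()) { }
}
```

With this shape, a concrete class only has to pick type arguments and build the store, which is the same division of labor the derived `BasicDiskVectorDatabase<TMetadata>` constructor follows.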
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+using Build5Nines.SharpVector.Vocabulary;
+using Build5Nines.SharpVector.Id;
+using Build5Nines.SharpVector.Preprocessing;
+using Build5Nines.SharpVector.Vectorization;
+using Build5Nines.SharpVector.VectorCompare;
+using Build5Nines.SharpVector.VectorStore;
+
+namespace Build5Nines.SharpVector;
+
+/// <summary>
+/// A basic disk-backed vector database using Bag-of-Words, Cosine similarity,
+/// disk-backed vector store and vocabulary store. Uses int IDs and metadata of type TMetadata.
+/// </summary>
+public class BasicDiskVectorDatabase<TMetadata>
+    : BasicDiskMemoryVectorDatabaseBase<
+        int,
+        TMetadata,
+        BasicDiskVectorStore<int, TMetadata, BasicDiskVocabularyStore<string>, string, int>,
+        BasicDiskVocabularyStore<string>,
+        string, int,
+        IntIdGenerator,
+        BasicTextPreprocessor,
+        BagOfWordsVectorizer<string, int>,
+        CosineSimilarityVectorComparer
+        >, IMemoryVectorDatabase<int, TMetadata>, IVectorDatabase<int, TMetadata>
+{
+    public BasicDiskVectorDatabase(string rootPath)
+        : base(
+            new BasicDiskVectorStore<int, TMetadata, BasicDiskVocabularyStore<string>, string, int>(
+                rootPath,
+                new BasicDiskVocabularyStore<string>(rootPath)
+            )
+        )
+    { }
+
+    [Obsolete("Use DeserializeFromBinaryStreamAsync instead.")]
+    public override async Task DeserializeFromJsonStreamAsync(Stream stream)
+    {
+        await DeserializeFromBinaryStreamAsync(stream);
+    }
+
+    [Obsolete("Use DeserializeFromBinaryStream instead.")]
+    public override void DeserializeFromJsonStream(Stream stream)
+    {
+        DeserializeFromBinaryStream(stream);
+    }
+}
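
Review note: the class summary names Bag-of-Words vectorization and Cosine similarity as the techniques this database composes. The internals of `BagOfWordsVectorizer` and `CosineSimilarityVectorComparer` are not shown in this commit, so the following is a generic, self-contained sketch of the two techniques under simple assumptions (lowercasing, splitting on spaces). `BagOfWordsDemo`, `Vectorize`, and `Cosine` are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Generic illustration of the techniques; not the SharpVector classes.
public static class BagOfWordsDemo
{
    // Count how often each lowercased word occurs in the text.
    public static Dictionary<string, int> Vectorize(string text)
    {
        var counts = new Dictionary<string, int>();
        var words = text.ToLowerInvariant()
            .Split(' ', StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            counts[word] = counts.TryGetValue(word, out var n) ? n + 1 : 1;
        }
        return counts;
    }

    // Cosine similarity = dot(a, b) / (|a| * |b|).
    // Returns 0 when either vector is empty to avoid dividing by zero.
    public static double Cosine(Dictionary<string, int> a, Dictionary<string, int> b)
    {
        double dot = a.Sum(kv => b.TryGetValue(kv.Key, out var n) ? (double)kv.Value * n : 0.0);
        double normA = Math.Sqrt(a.Values.Sum(v => (double)v * v));
        double normB = Math.Sqrt(b.Values.Sum(v => (double)v * v));
        return normA == 0 || normB == 0 ? 0.0 : dot / (normA * normB);
    }
}
```

Texts that share no words score 0, and identical texts score 1 up to floating-point rounding. A search can then rank stored texts by this score against the query vector.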

src/Build5Nines.SharpVector/Build5Nines.SharpVector.csproj

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
     <PackageId>Build5Nines.SharpVector</PackageId>
     <PackageProjectUrl>https://sharpvector.build5nines.com</PackageProjectUrl>
     <RepositoryUrl>https://github.com/Build5Nines/SharpVector</RepositoryUrl>
-    <Version>2.1.3</Version>
+    <Version>2.2.0</Version>
     <Description>Lightweight In-memory Vector Database to embed in any .NET Applications</Description>
     <Copyright>Copyright (c) 2025 Build5Nines LLC</Copyright>
     <PackageReadmeFile>README.md</PackageReadmeFile>
