Skip to content
This repository was archived by the owner on Apr 7, 2026. It is now read-only.

Latest commit

 

History

History
102 lines (75 loc) · 4.97 KB

File metadata and controls

102 lines (75 loc) · 4.97 KB

Neighborly's Import/Export Functionality: A Comprehensive Guide

Introduction

Neighborly, a powerful vector database system, offers robust import and export capabilities to facilitate data migration and interoperability with other systems. This article delves into the supported file formats, the Extract, Transform, Load (ETL) processes, and best practices for efficient data migration in Neighborly.

Supported File Formats

Neighborly supports various file formats for importing and exporting data. The ContentType enum in the VectorDatabase class indicates the supported formats:

  1. CSV (Comma-Separated Values)
  2. JSON (JavaScript Object Notation)
  3. Binary

These formats cater to different use cases and integration scenarios, providing flexibility in how data is stored and transferred.

ETL (Extract, Transform, Load) Processes

Neighborly implements a flexible ETL system to handle data import and export operations. The core of this system is the EtlFactory class, which creates appropriate ETL handlers based on the content type.

Import Process

  1. Extraction: Data is read from the source file or directory.
  2. Transformation: The data is converted into Vector objects compatible with Neighborly's internal structure.
  3. Loading: Transformed data is added to the VectorList in the database.

Export Process

  1. Extraction: Vectors are retrieved from the database.
  2. Transformation: Vectors are converted into the desired output format.
  3. Loading: Transformed data is written to the specified file or directory.

The ImportDataAsync and ExportDataAsync methods in the VectorDatabase class orchestrate these processes:

public async Task ImportDataAsync(string path, bool isDirectory, ContentType contentType, CancellationToken cancellationToken = default)
{
    using var activity = StartActivity(tags: [new("import.contentType", contentType), new("import.isDirectory", isDirectory)]);
    _rwLock.EnterWriteLock();
    try
    {
        ImportingData(contentType, path);
        IETL etl = EtlFactory.CreateEtl(contentType);
        etl.IsDirectory = isDirectory;
        await etl.ImportDataAsync(path, Vectors, cancellationToken).ConfigureAwait(false);
        ImportedData(path);
        activity?.SetStatus(ActivityStatusCode.Ok);
    }
    finally
    {
        if (_rwLock.IsWriteLockHeld)
        {
            _rwLock.ExitWriteLock();
        }
    }
}

public async Task ExportDataAsync(string path, ContentType contentType, CancellationToken cancellationToken = default)
{
    using var activity = StartActivity(tags: [new("export.contentType", contentType)]);
    _rwLock.EnterReadLock();
    try
    {
        ExportingData(Vectors.Count, path, contentType);
        IETL etl = EtlFactory.CreateEtl(contentType);
        await etl.ExportDataAsync(Vectors, path, cancellationToken).ConfigureAwait(false);
        ExportedData(path);
        activity?.SetStatus(ActivityStatusCode.Ok);
    }
    finally
    {
        if (_rwLock.IsReadLockHeld)
        {
            _rwLock.ExitReadLock();
        }
    }
}

Best Practices for Data Migration

When using Neighborly's import/export functionality for data migration, consider the following best practices:

  1. Choose the Right Format: Select the appropriate file format based on your data structure and size. CSV is suitable for simple, tabular data, while JSON offers more flexibility for complex structures. Binary format is efficient for large datasets.

  2. Use Cancellation Tokens: Both import and export methods accept CancellationToken parameters. Utilize these to implement cancellation logic for long-running operations, improving application responsiveness.

  3. Handle Concurrency: The import and export methods use reader-writer locks to ensure thread safety. Be mindful of potential deadlocks when performing multiple operations concurrently.

  4. Monitor Performance: Neighborly uses activity tracking for imports and exports. Leverage this to monitor performance and identify bottlenecks in your data migration processes.

  5. Validate Data: Implement data validation checks before importing to ensure data integrity and prevent issues in the vector database.

  6. Incremental Updates: For large datasets, consider implementing incremental update strategies to minimize the amount of data transferred during each migration.

  7. Error Handling: Implement robust error handling and logging. Neighborly provides detailed logging for import and export operations, which can be crucial for troubleshooting.

  8. Test Thoroughly: Before performing large-scale data migrations, conduct thorough testing with representative datasets to ensure the process works as expected.

Conclusion

Neighborly's import/export functionality provides a flexible and efficient means of migrating data in and out of the vector database. By understanding the supported formats, ETL processes, and following best practices, you can ensure smooth data integration and migration in your Neighborly-powered applications.