Skip to content

Latest commit

 

History

History
605 lines (445 loc) · 19.8 KB

File metadata and controls

605 lines (445 loc) · 19.8 KB

Third-Party Library Recommendations for MetaGraph

This document provides opinionated recommendations for third-party C libraries to handle foundational components of MetaGraph, allowing us to focus on the core meta-graph implementation rather than reinventing well-solved problems.

Selection Criteria

All recommendations must meet these requirements:

  • C23 Compatible: Works with modern C compilers and C23 standard
  • Pure C Code: No dependencies on C++, Rust, or other languages
  • High Performance: Suitable for high-performance asset management
  • Cross-Platform: Works on Windows, Linux, macOS at minimum
  • Battle-Tested: Used in production by multiple projects
  • Minimal Dependencies: Prefer header-only or minimal dependency libraries

1. BLAKE3 Cryptographic Hashing (F.004)

🏆 Primary Recommendation: Official BLAKE3 C Implementation

Repository: BLAKE3-team/BLAKE3 License: CC0-1.0 / Apache-2.0 Integration: Compile c/blake3.c with your project Fit Rating: ⭐⭐⭐⭐⭐ (5/5 stars)

The official BLAKE3 implementation provides the definitive C implementation of the algorithm with extensive SIMD optimizations.

Integration Guide

#include "blake3.h"

// One-shot hashing for small data
uint8_t hash[BLAKE3_OUT_LEN];
blake3_hasher hasher;
blake3_hasher_init(&hasher);
blake3_hasher_update(&hasher, data, data_len);
blake3_hasher_finalize(&hasher, hash, BLAKE3_OUT_LEN);

// Streaming hashing for large bundles
blake3_hasher stream_hasher;
blake3_hasher_init(&stream_hasher);

while (more_data_available) {
    size_t chunk_size = read_chunk(buffer, sizeof(buffer));
    blake3_hasher_update(&stream_hasher, buffer, chunk_size);
}

uint8_t final_hash[BLAKE3_OUT_LEN];
blake3_hasher_finalize(&stream_hasher, final_hash, BLAKE3_OUT_LEN);

MetaGraph-Specific Pitfalls

  • Large Bundle Streaming: For multi-GB bundles, always use streaming API to avoid memory exhaustion
  • Thread Safety: blake3_hasher is not thread-safe; use separate hasher instances per thread
  • Memory Management: Hasher contexts are stack-allocated, but ensure proper cleanup in error paths
  • Merkle Tree Integration: For bundle sections, coordinate hash computation with memory-mapped regions
  • Performance: Enable SIMD optimizations by compiling with appropriate flags (-mavx2, -msse4.1)

Alternative: blake3-c (Standalone C Port)

Repository: oconnor663/blake3-c License: CC0-1.0 Integration: Single header + implementation file Fit Rating: ⭐⭐⭐⭐ (4/5 stars)

A standalone C port that may be easier to integrate but potentially less optimized.

Roll Our Own Analysis

Fit Rating: ⭐ (1/5 stars) - Not recommended

Implementing a cryptographic hash function correctly requires extensive expertise and testing. Security-critical code should use battle-tested implementations.

Comparison Table

Aspect Official BLAKE3 blake3-c Roll Our Own
Performance Excellent (SIMD optimized) Good Unknown
Security Highest (official impl) High Risky
Maintenance Low (maintained by creators) Medium High
Integration Easy (compile .c files) Very Easy (header-only) Very High
Platform Support Excellent Good Unknown
Documentation Excellent Good None
Testing Extensive Limited Required

Decision: Use Official BLAKE3 for maximum performance and security assurance.


2. Threading Primitives (F.008)

🏆 Primary Recommendation: tinycthread + Compiler Atomics

Repository: tinycthread/tinycthread License: zlib/libpng Integration: Single header file Fit Rating: ⭐⭐⭐⭐ (4/5 stars)

Tinycthread provides C11-compatible threading on platforms that don't support it natively. Combined with compiler-specific atomic intrinsics for lock-free programming.

Integration Guide

#include "tinycthread.h"

// Basic threading
int worker_thread(void* arg) {
    mg_graph_t* graph = (mg_graph_t*)arg;
    // Process graph nodes...
    return 0;
}

thrd_t thread;
thrd_create(&thread, worker_thread, graph);
thrd_join(thread, NULL);

// Mutex synchronization
mtx_t graph_mutex;
mtx_init(&graph_mutex, mtx_plain);
mtx_lock(&graph_mutex);
// Critical section...
mtx_unlock(&graph_mutex);
mtx_destroy(&graph_mutex);

// Atomic operations (compiler intrinsics)
_Atomic(uint64_t) node_counter = 0;
uint64_t new_id = __atomic_fetch_add(&node_counter, 1, __ATOMIC_SEQ_CST);

// Lock-free pointer operations
_Atomic(mg_node_t*) head_node = NULL;
mg_node_t* old_head = __atomic_load(&head_node, __ATOMIC_ACQUIRE);

MetaGraph-Specific Pitfalls

  • Memory Ordering: Critical for lock-free graph algorithms; use __ATOMIC_SEQ_CST when unsure
  • ABA Problem: In lock-free node insertion/deletion, use generation counters or hazard pointers
  • Platform Atomics: Some platforms may not support all atomic operations; provide fallbacks
  • Graph Traversal: Ensure proper synchronization between readers and writers during graph modification
  • Work Stealing: Coordinate thread-local graph processing queues carefully to avoid race conditions

Alternative: Platform-Specific APIs with Custom Wrapper

Components: pthreads (Unix), Windows Threading APIs Integration: Custom abstraction layer Fit Rating: ⭐⭐⭐⭐⭐ (5/5 stars)

Direct use of platform threading APIs with a thin abstraction layer for portability.

Roll Our Own Analysis

Fit Rating: ⭐⭐ (2/5 stars) - Possible but risky

Threading is complex and error-prone. Custom implementation would essentially duplicate tinycthread's work.

Threading Comparison Table

Aspect tinycthread + atomics Platform APIs Roll Our Own
Performance Excellent Best Unknown
Portability Excellent Good (with wrapper) Good
Complexity Low Medium High
Maintenance Low Medium High
Standards Compliance High (C11 compatible) Platform-specific Variable
Debugging Good Excellent Unknown
Lock-free Support Yes (with compiler atomics) Yes Depends

Decision: Use tinycthread + compiler atomics for simplicity and portability.


3. Memory Allocation (F.009)

🏆 Primary Recommendation: mimalloc + Custom Arenas

Repository: microsoft/mimalloc License: MIT Integration: Compile mimalloc source files Fit Rating: ⭐⭐⭐⭐ (4/5 stars)

Microsoft's high-performance malloc replacement for general allocation, combined with custom arena allocators for specialized patterns.

Integration Guide

#include "mimalloc.h"

// Drop-in malloc replacement
void* ptr = mi_malloc(size);
mi_free(ptr);

// Heap-specific allocation for thread isolation
mi_heap_t* graph_heap = mi_heap_new();
mg_node_t* node = (mg_node_t*)mi_heap_malloc(graph_heap, sizeof(mg_node_t));
mi_heap_destroy(graph_heap);

// Custom arena on top of mimalloc
typedef struct {
    mi_heap_t* heap;
    uint8_t* arena_base;
    size_t arena_size;
    size_t arena_offset;
} mg_arena_t;

mg_arena_t* create_node_arena(size_t size) {
    mg_arena_t* arena = mi_malloc(sizeof(mg_arena_t));
    arena->heap = mi_heap_new();
    arena->arena_base = mi_heap_malloc(arena->heap, size);
    arena->arena_size = size;
    arena->arena_offset = 0;
    return arena;
}

void* arena_alloc(mg_arena_t* arena, size_t size, size_t align) {
    size_t aligned_offset = (arena->arena_offset + align - 1) & ~(align - 1);
    if (aligned_offset + size > arena->arena_size) return NULL;

    void* ptr = arena->arena_base + aligned_offset;
    arena->arena_offset = aligned_offset + size;
    return ptr;
}

MetaGraph-Specific Pitfalls

  • Thread-Local Heaps: Use separate heaps for graph construction vs. traversal threads
  • Arena Lifecycle: Coordinate arena destruction with graph component lifecycles
  • Memory Pressure: Monitor mimalloc statistics and implement pressure callbacks
  • NUMA Awareness: Use mi_option_set(mi_option_allow_large_os_pages, true) on NUMA systems
  • Fragmentation: Design arena allocation patterns to minimize fragmentation in long-running processes

Alternative: jemalloc

Repository: jemalloc/jemalloc License: BSD-2-Clause Integration: System library or compile from source Fit Rating: ⭐⭐⭐⭐ (4/5 stars)

Facebook's mature malloc implementation with excellent performance characteristics.

Roll Our Own Analysis

Fit Rating: ⭐⭐ (2/5 stars) - Not recommended for general allocation

Memory allocators are extremely complex. Custom arena allocators on top of proven general allocators make sense.

Memory Allocation Comparison Table

Aspect mimalloc + arenas jemalloc Roll Our Own
Performance Excellent Excellent Unknown
Memory Efficiency High High Unknown
Integration Easy More Complex High
Platform Support Excellent Good Variable
Debugging Support Good Excellent Depends
Fragmentation Low Low Unknown
Specialized Patterns Yes (custom arenas) Limited Yes

Decision: Use mimalloc + custom arenas for balance of performance and ease of integration.


4. Hash Tables and Data Structures (F.001)

🏆 Primary Recommendation: uthash

Repository: troydhanson/uthash License: BSD Integration: Single header file Fit Rating: ⭐⭐⭐⭐ (4/5 stars)

Macro-based hash table that's extremely flexible and widely used in C projects.

Integration Guide

#include "uthash.h"

// Define node structure with hash handle
typedef struct {
    mg_id_t id;           // Key
    mg_node_data_t data;  // Value
    UT_hash_handle hh;          // Hash handle (required)
} mg_node_entry_t;

// Hash table operations
mg_node_entry_t* node_table = NULL;

// Insert node
mg_node_entry_t* add_node(mg_id_t id, mg_node_data_t data) {
    mg_node_entry_t* entry;
    HASH_FIND(hh, node_table, &id, sizeof(mg_id_t), entry);
    if (entry == NULL) {
        entry = malloc(sizeof(mg_node_entry_t));
        entry->id = id;
        entry->data = data;
        HASH_ADD(hh, node_table, id, sizeof(mg_id_t), entry);
    }
    return entry;
}

// Find node
mg_node_entry_t* find_node(mg_id_t id) {
    mg_node_entry_t* entry;
    HASH_FIND(hh, node_table, &id, sizeof(mg_id_t), entry);
    return entry;
}

// Iterate all nodes
mg_node_entry_t* entry, *tmp;
HASH_ITER(hh, node_table, entry, tmp) {
    // Process entry...
}

MetaGraph-Specific Pitfalls

  • Memory Integration: Replace malloc/free with mimalloc or arena allocation
  • Hash Function: Asset IDs may have patterns; consider custom hash function for better distribution
  • Iteration Safety: Use HASH_ITER for safe iteration during concurrent modifications
  • Memory Layout: Entries are linked, not contiguous; consider cache implications for traversal
  • Thread Safety: uthash is not thread-safe; coordinate with reader-writer locks

Alternative: khash

Repository: attractivechaos/klib License: MIT Integration: Single header file (part of klib) Fit Rating: ⭐⭐⭐⭐⭐ (5/5 stars)

Template-based hash library that's very fast and used in many bioinformatics tools.

Roll Our Own Analysis

Fit Rating: ⭐⭐⭐⭐ (4/5 stars) - Viable for specialized cases

Hash tables are well-understood. Custom implementation could be optimized for asset ID patterns and integrate perfectly with our memory management.

Hash Table Comparison Table

Aspect uthash khash Roll Our Own
Performance Good Excellent Potentially Best
Flexibility High Medium Highest
API Complexity Medium Low Lowest (custom)
Memory Layout Good Very Good Optimal
Integration Easy Easy Perfect
Debugging Good Good Excellent
Maintenance Low Low High

Decision: Start with uthash for rapid development, consider custom implementation for optimization later.


5. Platform Abstraction (F.010)

🏆 Primary Recommendation: Custom Thin Abstraction Layer

Implementation: Custom lightweight wrapper around platform APIs Coverage: File I/O, memory mapping, basic system info Fit Rating: ⭐⭐⭐⭐⭐ (5/5 stars)

A focused abstraction layer that covers only MetaGraph's specific needs without unnecessary complexity.

Integration Guide

// mg_platform.h - Our custom abstraction
#ifdef _WIN32
    #include <windows.h>
    typedef HANDLE mg_file_t;
    typedef HANDLE mg_mutex_t;
#else
    #include <unistd.h>
    #include <pthread.h>
    typedef int mg_file_t;
    typedef pthread_mutex_t mg_mutex_t;
#endif

// Cross-platform file operations
mg_result_t mg_file_open(const char* path, mg_file_t* file);
mg_result_t mg_file_read(mg_file_t file, void* buffer, size_t size);
mg_result_t mg_file_close(mg_file_t file);

// Memory mapping abstraction
typedef struct {
    void* address;
    size_t size;
#ifdef _WIN32
    HANDLE file_handle;
    HANDLE map_handle;
#else
    int fd;
#endif
} mg_mmap_t;

mg_result_t mg_mmap_file(const char* path, mg_mmap_t* map);
mg_result_t mg_mmap_unmap(mg_mmap_t* map);

MetaGraph-Specific Pitfalls

  • Error Code Mapping: Ensure consistent error reporting across platforms
  • Path Handling: Normalize path separators and handle Unicode properly
  • Memory Alignment: Different platforms have different alignment requirements
  • Large File Support: Ensure 64-bit file operations on all platforms
  • NUMA Detection: Platform-specific code for detecting NUMA topology

Alternative: Apache Portable Runtime (APR)

Repository: apr.apache.org License: Apache-2.0 Integration: System library dependency Fit Rating: ⭐⭐⭐ (3/5 stars)

Mature, comprehensive cross-platform abstraction used by Apache HTTP Server.

Roll Our Own Analysis

Fit Rating: ⭐⭐⭐⭐⭐ (5/5 stars) - Recommended

For our specific needs, a lightweight custom abstraction provides the best balance of performance, maintainability, and platform optimization opportunities.

Platform Abstraction Comparison Table

Aspect Custom Abstraction APR Full Custom
Size/Complexity Minimal Large Minimal
Platform Optimization Excellent Good Excellent
Maintenance Low Very Low Medium
Dependencies None External library None
Feature Coverage Exact fit Comprehensive Exact fit
Learning Curve Low Medium Low
Performance Optimal Good Optimal

Decision: Implement custom thin abstraction for maximum control and minimal overhead.


6. File I/O and Memory Mapping (F.003)

🏆 Primary Recommendation: Custom Platform-Optimized Layer

Implementation: Direct platform APIs with optimization Platforms: mmap (Unix), MapViewOfFile (Windows), io_uring (Linux), DirectStorage (Windows) Fit Rating: ⭐⭐⭐⭐⭐ (5/5 stars)

Custom implementation that can leverage platform-specific optimizations like io_uring and DirectStorage.

Integration Guide

// Platform-optimized I/O layer
#ifdef __linux__
    #include <liburing.h>
    typedef struct {
        struct io_uring ring;
        struct io_uring_sqe* sqe;
        struct io_uring_cqe* cqe;
    } mg_async_context_t;
#endif

#ifdef _WIN32
    #include <dstorage.h>
    typedef struct {
        IDStorageFactory* factory;
        IDStorageQueue* queue;
        IDStorageFile* file;
    } mg_dstorage_context_t;
#endif

// High-performance async read
mg_result_t mg_read_async(
    mg_file_t file,
    uint64_t offset,
    void* buffer,
    size_t size,
    mg_completion_callback_t callback
);

// Memory-mapped bundle access
mg_result_t mg_bundle_mmap(
    const char* bundle_path,
    mg_bundle_mmap_t* bundle
) {
#ifdef _WIN32
    // Use DirectStorage for large bundles
    if (bundle_size > DIRECTSTORAGE_THRESHOLD) {
        return mg_directstorage_map(bundle_path, bundle);
    }
#endif

#ifdef __linux__
    // Use io_uring for async operations
    return mg_uring_mmap(bundle_path, bundle);
#endif

    // Fallback to standard mmap
    return mg_standard_mmap(bundle_path, bundle);
}

MetaGraph-Specific Pitfalls

  • Large File Handling: Ensure proper 64-bit offset handling for multi-GB bundles
  • Memory Mapping Lifecycle: Coordinate with graph pointer hydration carefully
  • Platform Feature Detection: Runtime detection of DirectStorage/io_uring availability
  • Error Recovery: Handle partial reads/writes and corruption gracefully
  • Hot Reload: Support atomic remapping for live bundle updates

Alternative: Portable I/O Library

Options: APR, libuv (for async), or other cross-platform libraries Trade-off: Portability vs. platform-specific optimization Fit Rating: ⭐⭐⭐ (3/5 stars)

Roll Our Own Analysis

Fit Rating: ⭐⭐⭐⭐⭐ (5/5 stars) - Recommended

I/O patterns for asset management are specific enough that custom implementation provides the best performance and platform integration.

File I/O Comparison Table

Aspect Custom Platform Layer Portable Library Basic Abstraction
Performance Excellent Good Fair
Platform Features Full access Limited Basic
Maintenance Medium Low Low
Optimization Complete control Limited None
Complexity Medium Low Low
DirectStorage Support Yes No No
io_uring Support Yes Maybe No

Decision: Implement custom platform-optimized layer for maximum I/O performance.


Summary Recommendations

✅ Use Third-Party Libraries For

  1. BLAKE3: Official implementation (security + performance)
  2. Threading: tinycthread + compiler atomics (portability + standards)
  3. Memory Allocation: mimalloc (proven performance)
  4. Hash Tables: uthash (flexibility + ease of use)

🛠️ Roll Our Own For

  1. Platform Abstraction: Thin custom layer (minimal overhead)
  2. File I/O: Platform-optimized implementation (performance)
  3. Arena Allocators: Custom on top of mimalloc (specialized patterns)
  4. Hypergraph Core: Our unique contribution 🚀

Integration Strategy

Phase 1: Rapid Prototyping

  • Use all recommended third-party libraries
  • Focus on meta-graph algorithm implementation
  • Get working system quickly

Phase 2: Optimization

  • Profile and identify bottlenecks
  • Consider custom hash table if needed
  • Optimize I/O layer for specific access patterns

Phase 3: Production Hardening

  • Comprehensive testing across platforms
  • Security audit of cryptographic components
  • Performance optimization and tuning

License Compatibility

All recommended libraries use permissive licenses compatible with both open-source and commercial use:

  • MIT: mimalloc, khash
  • BSD: uthash, jemalloc
  • zlib/libpng: tinycthread
  • CC0/Apache: BLAKE3
  • Apache-2.0: APR

Build System Integration

Recommended CMake structure:

# Third-party dependencies
add_subdirectory(3rdparty/mimalloc)
add_subdirectory(3rdparty/blake3)

# Header-only libraries
target_include_directories(mg PRIVATE
    3rdparty/uthash/include
    3rdparty/tinycthread
)

# Platform abstraction
add_subdirectory(src/platform)

This approach allows us to focus our engineering effort on the novel meta-graph algorithms while building on a foundation of proven, high-performance libraries for the infrastructure components.