OpenKL

OpenKL is a high-performance memory pooling library for accelerator-style workloads, with first-class C++ and Python APIs.

It uses CUDA when available and otherwise falls back to host-backed allocation stubs, which is useful on macOS/Windows development machines without CUDA.

Highlights

  • Fixed-size MemoryPool with O(1) allocation/deallocation.
  • Variable-size SlabPool using configurable size classes.
  • Batch allocation APIs and detailed telemetry/stats.
  • Runtime exhaustion policies (Throw, Wait, Grow scaffold, FallbackRaw).
  • RAII and typed helpers (PooledPtr, allocate_t<T>()).
  • No-exception pathways (Status, ErrorCode, C API _ex calls).
  • Multi-GPU routing with affinity policies and peer-access helpers.
  • C API for embedding and Python bindings via pybind11.
  • Optional backend abstraction scaffold (host, CUDA, HIP stub).
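
To make the O(1) claim for the fixed-size pool concrete, here is a minimal Python sketch of a free-list pool. It is an illustration only, not OpenKL's implementation: `ToyFixedPool` and its members are hypothetical names, and block "pointers" are modeled as byte offsets into a single buffer.

```python
class ToyFixedPool:
    """Illustrative fixed-size pool; blocks are offsets into one bytearray."""

    def __init__(self, block_size: int, num_blocks: int) -> None:
        self.block_size = block_size
        self.num_blocks = num_blocks
        self.buffer = bytearray(block_size * num_blocks)
        # Free list of block offsets. Popping from and appending to the end
        # of a Python list are amortized O(1).
        self._free = [i * block_size for i in reversed(range(num_blocks))]
        self._in_use = set()

    def allocate(self) -> int:
        if not self._free:
            raise MemoryError("pool exhausted")
        offset = self._free.pop()
        self._in_use.add(offset)
        return offset

    def deallocate(self, offset: int) -> None:
        if offset not in self._in_use:
            raise ValueError("offset was not allocated from this pool")
        self._in_use.remove(offset)
        self._free.append(offset)


pool = ToyFixedPool(block_size=4096, num_blocks=4)
a = pool.allocate()
b = pool.allocate()
pool.deallocate(a)
c = pool.allocate()  # reuses the block that was just freed
```

Because every block has the same size, any free block satisfies any request, so neither operation has to search; that is what keeps both paths constant-time in this sketch.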

Requirements

  • CMake 3.18+
  • C++17 compiler
  • Python 3.8+ (for Python package/tests)
  • Optional CUDA: CUDA 11+ and nvcc in PATH

Quick Start

Build (C++)

cd /path/to/OpenKL
./build.sh

This configures the project, builds it, and runs the C++ tests.

Windows helpers:

.\build_windows.ps1
build_windows.bat

Python

cd /path/to/OpenKL/python
python -m pip install -e .

Or, from the repository root:

pip install -e ./python

Usage

C++: MemoryPool

#include "openkl/memory_pool.hpp"

// Configure the pool before constructing it.
openkl::MemoryPoolOptions opts;
opts.thread_safe = true;
opts.debug = true;
opts.alignment = 64;
opts.exhaustion_policy = openkl::ExhaustionPolicy::FallbackRaw;

// 1024 blocks of 4096 bytes each.
openkl::MemoryPool pool(4096, 1024, opts);
void* ptr = pool.allocate();
pool.deallocate(ptr);

// Snapshot of the pool's telemetry.
auto st = pool.stats();

C++: SlabPool

#include "openkl/slab_pool.hpp"

// Size classes from 64 B to 1 MiB, with 512 blocks per class.
auto classes = openkl::SlabPool::default_classes(64, 1024 * 1024, 512);
openkl::SlabPool slab(classes);
void* p = slab.allocate(200);   // chooses the best-fitting class
slab.deallocate(p);

Python

import openkl

pool = openkl.MemoryPool(
    block_size=4096,
    num_blocks=1024,
    thread_safe=True,
    debug=True,
    alignment=64,
    exhaustion_policy=openkl.ExhaustionPolicy.Throw,
)

ptr = pool.allocate()
pool.deallocate(ptr)
print(pool.stats().in_use, pool.stats().free_blocks)

slab = openkl.make_default_slab(64, 1024 * 1024, 256, debug=True)
p = slab.allocate(128)
slab.deallocate(p)

C API (no-exception style)

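#include "openkl/c_api.h"

/* The arguments likely mirror the MemoryPool options (block size, block
   count, thread_safe, debug, alignment, exhaustion policy); check c_api.h
   for the authoritative signature. */
openkl_pool* pool = openkl_pool_create_ex(4096, 1024, 1, 1, 64, 0);
void* ptr = NULL;
/* Failure is reported through the return code; the block is written to ptr. */
openkl_error_code ec = openkl_pool_alloc_ex(pool, &ptr);
if (ec == OPENKL_OK) {
  openkl_pool_free_ex(pool, ptr);
}
openkl_pool_destroy(pool);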

API Overview

MemoryPool

Core methods:

  • allocate(), deallocate()
  • allocate_batch(n), deallocate_batch(ptrs)
  • try_allocate(out_ptr), try_deallocate(ptr)
  • allocate_owned()
  • fragmentation(), stats()
  • set_exhaustion_policy(), set_wait_timeout(), set_device_id()
  • reserve(), reset(), validate(), for_each_allocation()
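
The exhaustion policies set through set_exhaustion_policy() and set_wait_timeout() can be pictured with the sketch below. The behavior shown for each policy is an assumption inferred from the policy names (Throw raises, Wait blocks until a block is returned or the timeout expires, FallbackRaw bypasses the pool); `ToyPolicyPool` and `Policy` are hypothetical names, and Grow is omitted because it is only scaffolded.

```python
import enum
import threading


class Policy(enum.Enum):
    THROW = "throw"
    WAIT = "wait"
    FALLBACK_RAW = "fallback_raw"


class ToyPolicyPool:
    """Illustrative pool of integer block ids with an exhaustion policy."""

    def __init__(self, num_blocks, policy, wait_timeout=0.05):
        self.policy = policy
        self.wait_timeout = wait_timeout
        self._free = list(range(num_blocks))
        self._cond = threading.Condition()
        self.fallback_allocations = 0

    def allocate(self):
        with self._cond:
            if not self._free and self.policy is Policy.WAIT:
                # Block until deallocate() notifies us or the timeout expires.
                self._cond.wait_for(lambda: bool(self._free),
                                    timeout=self.wait_timeout)
            if self._free:
                return ("pool", self._free.pop())
            if self.policy is Policy.FALLBACK_RAW:
                # Serve the request outside the pool instead of failing.
                self.fallback_allocations += 1
                return ("raw", bytearray(16))
            raise MemoryError("pool exhausted")

    def deallocate(self, handle):
        kind, block = handle
        if kind == "pool":
            with self._cond:
                self._free.append(block)
                self._cond.notify()


# Throw: the second allocation fails immediately.
throwing = ToyPolicyPool(1, Policy.THROW)
first = throwing.allocate()
try:
    throwing.allocate()
    exhausted = False
except MemoryError:
    exhausted = True

# FallbackRaw: the second allocation is served outside the pool.
fallback = ToyPolicyPool(1, Policy.FALLBACK_RAW)
fallback.allocate()
kind, _ = fallback.allocate()

# Wait: the second allocation succeeds once another thread frees a block.
waiting = ToyPolicyPool(1, Policy.WAIT, wait_timeout=1.0)
held = waiting.allocate()
threading.Timer(0.01, waiting.deallocate, args=(held,)).start()
again = waiting.allocate()
```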

Metrics helpers:

  • capacity_bytes(), in_use_bytes(), free_bytes()
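
As a worked example of these byte metrics, the arithmetic below uses the pool from the Usage section (4096-byte blocks, 1024 blocks). It assumes the helpers are simple products of block size and block counts; whether OpenKL also counts alignment padding or fallback allocations is not stated here, and the in-use count is a hypothetical value.

```python
block_size = 4096
num_blocks = 1024
blocks_in_use = 10  # hypothetical number of live allocations

capacity_bytes = block_size * num_blocks      # 4 MiB in total
in_use_bytes = block_size * blocks_in_use
free_bytes = capacity_bytes - in_use_bytes
```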

CUDA-only methods:

  • allocate_async(stream)
  • deallocate_async(ptr, stream) (requires thread_safe=true)

SlabPool

  • allocate(size), allocate_exact(size), deallocate(ptr)
  • default_classes(min, max, blocks_per_class)
  • set_compaction_policy(...)
  • stats(), slab_stats()
  • class_for_size(size), rebalance(...) (rebalance is currently scaffolded)
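
The idea behind size classes and best-fit selection can be sketched as follows. This assumes power-of-two classes between the minimum and maximum size, which may not match what default_classes() actually generates; `toy_default_classes` and `toy_class_for_size` are hypothetical helpers, not OpenKL functions.

```python
import bisect


def toy_default_classes(min_size, max_size):
    """Power-of-two size classes from min_size up to max_size (an assumption)."""
    classes = []
    size = min_size
    while size <= max_size:
        classes.append(size)
        size *= 2
    return classes


def toy_class_for_size(classes, size):
    """Return the smallest class that can hold `size`, or None if too large."""
    index = bisect.bisect_left(classes, size)
    return classes[index] if index < len(classes) else None


classes = toy_default_classes(64, 1024 * 1024)
chosen = toy_class_for_size(classes, 200)
wasted = chosen - 200  # internal fragmentation for this request
```

With these assumed classes, a 200-byte request lands in the 256-byte class and leaves 56 bytes unused, which is the usual trade-off of slab allocation: fast class lookup in exchange for some internal fragmentation.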

MultiGPUPool

  • allocate(device_id), allocate_auto(), allocate_on_best_fit(size_bytes)
  • deallocate(ptr, device_id), batch variants
  • set_affinity_policy(...), set_device_weight(...)
  • stats(device_id), stats_all(), stats_aggregate()
  • peer_access_supported(src, dst), copy_peer(...)
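
How an affinity policy and per-device weights might steer automatic allocation can be sketched like this. The two strategies shown (round-robin and weighted most-free) are illustrative assumptions; this README does not list OpenKL's actual policies, and `ToyDeviceRouter` is a hypothetical class.

```python
import itertools


class ToyDeviceRouter:
    """Illustrative device selection over per-device free-block counts."""

    def __init__(self, free_blocks_per_device):
        self.free = dict(free_blocks_per_device)  # device_id -> free blocks
        self.weight = {d: 1.0 for d in self.free}
        self._rr = itertools.cycle(sorted(self.free))

    def set_device_weight(self, device_id, weight):
        self.weight[device_id] = weight

    def pick_round_robin(self):
        # Try each device at most once, skipping exhausted ones.
        for _ in range(len(self.free)):
            device = next(self._rr)
            if self.free[device] > 0:
                return device
        raise MemoryError("all devices exhausted")

    def pick_most_free(self):
        candidates = [d for d in self.free if self.free[d] > 0]
        if not candidates:
            raise MemoryError("all devices exhausted")
        # Highest weighted free capacity wins; ties go to the lower id.
        return max(candidates,
                   key=lambda d: (self.free[d] * self.weight[d], -d))


router = ToyDeviceRouter({0: 2, 1: 8})
order = [router.pick_round_robin() for _ in range(3)]
before = router.pick_most_free()   # device 1 has more free blocks
router.set_device_weight(0, 5.0)
after = router.pick_most_free()    # the weight now favors device 0
```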

Error model

  • Exception classes for C++ high-level APIs.
  • Status + ErrorCode for no-exception workflows.
  • C API error retrieval via openkl_last_error().
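
The Status + ErrorCode style can be illustrated with a small sketch in which failures are returned as values rather than raised. Only an OK code (OPENKL_OK in the C API) appears in this README; the other codes, `ToyStatus`, and `toy_try_allocate` are hypothetical names used for illustration.

```python
import enum
from dataclasses import dataclass


class ToyErrorCode(enum.IntEnum):
    OK = 0
    POOL_EXHAUSTED = 1   # hypothetical code
    INVALID_POINTER = 2  # hypothetical code


@dataclass(frozen=True)
class ToyStatus:
    code: ToyErrorCode
    message: str = ""

    def ok(self) -> bool:
        return self.code is ToyErrorCode.OK


def toy_try_allocate(free_blocks):
    """Return (status, block); never raises on exhaustion."""
    if not free_blocks:
        return ToyStatus(ToyErrorCode.POOL_EXHAUSTED, "no free blocks"), None
    return ToyStatus(ToyErrorCode.OK), free_blocks.pop()


free_blocks = [0]
status1, block1 = toy_try_allocate(free_blocks)
status2, block2 = toy_try_allocate(free_blocks)
```

The caller checks `status.ok()` after each call, which keeps hot paths free of exception handling and maps directly onto a C-style return code.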

Platform / Backend behavior

  • CUDA enabled: device allocations via CUDA runtime.
  • CUDA disabled: host-backed aligned allocation stubs.
  • Backend abstraction exists in source (backend.hpp + implementations):
    • host backend
    • CUDA backend
    • HIP backend scaffold (stub)

Platform helpers:

  • get_os(), get_os_name()
  • cuda_available(), get_device_count(), get_device_name(device_id)

Build and Test

C++

./build.sh

Manual:

mkdir -p build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build .
ctest --output-on-failure

Useful CMake options:

  • -DOPENKL_USE_CUDA=OFF
  • -DOPENKL_BUILD_PYTHON=OFF
  • -DOPENKL_BUILD_TESTS=OFF
  • -DOPENKL_BUILD_BENCHMARKS=OFF
  • -DOPENKL_BUILD_SHARED=ON

Python

PYTHONPATH=python python tests/test_all.py

or

PYTHONPATH=python python -m pytest tests/test_all.py -v

Note: when using the native extension, build and test with the same Python interpreter.


Benchmark

From build/:

./bench_alloc

When CUDA is not enabled, the CUDA baseline is skipped and the pool path still runs.


Project Structure

OpenKL/
├── include/openkl/      # Public C++ headers
├── src/                 # C++ implementations
├── python/              # Python package + pybind bindings
├── tests/               # C++ and Python tests
├── benchmarks/          # Benchmarks
├── build.sh             # Unix/macOS build+test helper
├── build_windows.ps1    # PowerShell build+test helper
├── build_windows.bat    # CMD build+test helper
└── CMakeLists.txt

License

Apache-2.0. See LICENSE.


Credits

Developed by Aksel Aghajanyan, Aqwel AI.
GitHub: Aksel588

