
libCacheSim Python Binding


libCacheSim is fast, inheriting these features from the underlying libCacheSim library:

  • High performance - over 20M requests/sec for a realistic trace replay
  • High memory efficiency - predictable and small memory footprint
  • Parallelism out-of-the-box - uses multiple CPU cores to speed up trace analysis and cache simulations

libCacheSim is flexible and easy to use with:

  • Seamless integration with an open-source cache dataset of thousands of traces hosted on S3
  • High-throughput simulation with the underlying libCacheSim lib
  • Fine-grained control over cache requests and other internal data
  • Customized plugin cache development without any compilation

Prerequisites

  • OS: Linux / macOS
  • Python: 3.9 -- 3.13

Installation

Quick Install

Binary installers for the latest released version are available at the Python Package Index (PyPI).

pip install libcachesim

Recommended Installation with uv

It's recommended to use uv, a very fast Python environment manager, to create and manage Python environments:

uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install libcachesim

Advanced Features Installation

For users who want to run LRB, ThreeLCache, and GLCache eviction algorithms:

!!! important If uv cannot find built wheels for your machine, the build system will skip these algorithms by default.

To enable them, you need to install all third-party dependencies first:

git clone https://github.com/cacheMon/libCacheSim-python.git
cd libCacheSim-python
bash scripts/install_deps.sh

# If you cannot install software directly (e.g., no sudo access)
bash scripts/install_deps_user.sh

Then, you can reinstall libcachesim using the following commands (you may need to add --no-cache-dir to force a build from scratch):

# Enable LRB
CMAKE_ARGS="-DENABLE_LRB=ON" uv pip install libcachesim
# Enable ThreeLCache
CMAKE_ARGS="-DENABLE_3L_CACHE=ON" uv pip install libcachesim
# Enable GLCache
CMAKE_ARGS="-DENABLE_GLCACHE=ON" uv pip install libcachesim

Installation from source

If no wheels are available for your environment, consider building from source.

bash scripts/install.sh

Run all tests to ensure the package works.

python -m pytest tests/

Quick Start

Cache Simulation

With libcachesim installed, you can run a cache simulation with an eviction algorithm and a cache trace:

import libcachesim as lcs

# Step 1: Get one trace from S3 bucket
URI = "cache_dataset_oracleGeneral/2007_msr/msr_hm_0.oracleGeneral.zst"
dl = lcs.DataLoader()
dl.load(URI)

# Step 2: Open trace and process efficiently
reader = lcs.TraceReader(
    trace = dl.get_cache_path(URI),
    trace_type = lcs.TraceType.ORACLE_GENERAL_TRACE,
    reader_init_params = lcs.ReaderInitParam(ignore_obj_size=False)
)

# Step 3: Initialize cache
cache = lcs.S3FIFO(cache_size=1024*1024)

# Step 4: Process entire trace efficiently (C++ backend)
obj_miss_ratio, byte_miss_ratio = cache.process_trace(reader)
print(f"Object miss ratio: {obj_miss_ratio:.4f}, Byte miss ratio: {byte_miss_ratio:.4f}")

# Step 4.1: Process with limited number of requests
cache = lcs.S3FIFO(cache_size=1024*1024)
obj_miss_ratio, byte_miss_ratio = cache.process_trace(
    reader,
    start_req=0,
    max_req=1000
)
print(f"Object miss ratio: {obj_miss_ratio:.4f}, Byte miss ratio: {byte_miss_ratio:.4f}")

Basic Usage

import libcachesim as lcs

# Create a cache
cache = lcs.LRU(cache_size=1024*1024)  # 1MB cache

# Process requests
req = lcs.Request()
req.obj_id = 1
req.obj_size = 100

print(cache.get(req))  # False (first access)
print(cache.get(req))  # True (second access)
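The two miss ratios returned by process_trace follow the standard definitions: the fraction of requests that miss, and the fraction of requested bytes that miss. The standalone sketch below (pure Python, no libcachesim required, with a hypothetical unbounded cache so every miss is a cold miss) illustrates the accounting:

```python
# Standalone sketch of object/byte miss-ratio accounting.
# Not part of libcachesim; only illustrates the metric definitions.

def miss_ratios(trace):
    """trace: list of (obj_id, obj_size) pairs."""
    seen = set()
    misses = 0
    miss_bytes = 0
    total_bytes = 0
    for obj_id, obj_size in trace:
        total_bytes += obj_size
        if obj_id not in seen:  # cold miss (cache assumed unbounded)
            seen.add(obj_id)
            misses += 1
            miss_bytes += obj_size
    return misses / len(trace), miss_bytes / total_bytes

trace = [(1, 100), (2, 200), (1, 100), (3, 100)]
print(miss_ratios(trace))  # (0.75, 0.8)
```

Note that with variable object sizes the two ratios generally differ: here the second request for object 1 is a hit, so 3 of 4 requests miss (0.75), while 400 of 500 requested bytes miss (0.8).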

Trace Analysis

Here is an example demonstrating how to use TraceAnalyzer:

import libcachesim as lcs

# Step 1: Get one trace from S3 bucket
URI = "cache_dataset_oracleGeneral/2007_msr/msr_hm_0.oracleGeneral.zst"
dl = lcs.DataLoader()
dl.load(URI)

reader = lcs.TraceReader(
    trace = dl.get_cache_path(URI),
    trace_type = lcs.TraceType.ORACLE_GENERAL_TRACE,
    reader_init_params = lcs.ReaderInitParam(ignore_obj_size=False)
)

analysis_option = lcs.AnalysisOption(
        req_rate=True,  # Keep basic request rate analysis
        access_pattern=False,  # Disable access pattern analysis
        size=True,  # Keep size analysis
        reuse=False,  # Disable reuse analysis for small datasets
        popularity=False,  # Disable popularity analysis for small datasets (< 200 objects)
        ttl=False,  # Disable TTL analysis
        popularity_decay=False,  # Disable popularity decay analysis
        lifetime=False,  # Disable lifetime analysis
        create_future_reuse_ccdf=False,  # Disable experimental features
        prob_at_age=False,  # Disable experimental features
        size_change=False,  # Disable size change analysis
    )

analysis_param = lcs.AnalysisParam()

analyzer = lcs.TraceAnalyzer(
    reader, "example_analysis", analysis_option=analysis_option, analysis_param=analysis_param
)

analyzer.run()

Plugin System

libCacheSim allows you to develop your own cache eviction algorithms and test them via the plugin system without any C/C++ compilation required.

Plugin Cache Overview

The PluginCache allows you to define custom caching behavior through Python callback functions. You need to implement these callback functions:

Function       Signature                                         Description
init_hook      (common_cache_params: CommonCacheParams) -> Any   Initialize your data structure
hit_hook       (data: Any, request: Request) -> None             Handle cache hits
miss_hook      (data: Any, request: Request) -> None             Handle cache misses
eviction_hook  (data: Any, request: Request) -> int              Return the object ID to evict
remove_hook    (data: Any, obj_id: int) -> None                  Clean up when an object is removed
free_hook      (data: Any) -> None                               [Optional] Final cleanup

Example: Implementing LRU via Plugin System

from collections import OrderedDict
from typing import Any

import libcachesim as lcs
from libcachesim import PluginCache, LRU, CommonCacheParams, Request

def init_hook(_: CommonCacheParams) -> Any:
    return OrderedDict()

def hit_hook(data: Any, req: Request) -> None:
    data.move_to_end(req.obj_id, last=True)

def miss_hook(data: Any, req: Request) -> None:
    data[req.obj_id] = req.obj_size

def eviction_hook(data: Any, _: Request) -> int:
    return data.popitem(last=False)[0]

def remove_hook(data: Any, obj_id: int) -> None:
    data.pop(obj_id, None)

def free_hook(data: Any) -> None:
    data.clear()

plugin_lru_cache = PluginCache(
    cache_size=128,
    cache_init_hook=init_hook,
    cache_hit_hook=hit_hook,
    cache_miss_hook=miss_hook,
    cache_eviction_hook=eviction_hook,
    cache_remove_hook=remove_hook,
    cache_free_hook=free_hook,
    cache_name="Plugin_LRU",
)

reader = lcs.SyntheticReader(num_objects=1000, num_of_req=10000, obj_size=1)
req_miss_ratio, byte_miss_ratio = plugin_lru_cache.process_trace(reader)
ref_req_miss_ratio, ref_byte_miss_ratio = LRU(128).process_trace(reader)
print(f"plugin req miss ratio {req_miss_ratio}, ref req miss ratio {ref_req_miss_ratio}")
print(f"plugin byte miss ratio {byte_miss_ratio}, ref byte miss ratio {ref_byte_miss_ratio}")

By defining custom hook functions for cache initialization, hit, miss, eviction, removal, and cleanup, users can easily prototype and test their own cache eviction algorithms.
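For instance, a FIFO policy needs only two changes to the LRU hooks above: hits do not promote, and eviction removes the oldest insertion. The sketch below is standalone (no libcachesim import); the toy driver loop is a simplification of what PluginCache does internally and is included only so the hook interplay can be seen in isolation:

```python
from collections import OrderedDict

# FIFO hooks: same shape as the LRU hooks above, but hits do not promote.
def init_hook(_params=None):
    return OrderedDict()

def hit_hook(data, obj_id, obj_size):
    pass  # FIFO: no promotion on hit

def miss_hook(data, obj_id, obj_size):
    data[obj_id] = obj_size

def eviction_hook(data):
    return next(iter(data))  # evict the oldest inserted object

def remove_hook(data, obj_id):
    data.pop(obj_id, None)

def run_fifo(trace, cache_size):
    """Toy driver: trace is a list of (obj_id, obj_size); returns miss count."""
    data = init_hook()
    used = 0
    misses = 0
    for obj_id, obj_size in trace:
        if obj_id in data:
            hit_hook(data, obj_id, obj_size)
            continue
        misses += 1
        while used + obj_size > cache_size and data:
            victim = eviction_hook(data)
            used -= data[victim]
            remove_hook(data, victim)
        miss_hook(data, obj_id, obj_size)
        used += obj_size
    return misses

trace = [(1, 1), (2, 1), (3, 1), (1, 1), (4, 1), (1, 1)]
print(run_fifo(trace, cache_size=2))  # 5 misses; only the final request hits
```

Swapping `next(iter(data))` for `data.popitem(last=False)`-style oldest-first removal is exactly the difference between this FIFO and the LRU plugin above, whose hit_hook moves the object to the queue tail.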

Getting Help


Reference

Please cite the following papers if you use libCacheSim.
@inproceedings{yang2020-workload,
    author = {Juncheng Yang and Yao Yue and K. V. Rashmi},
    title = {A large-scale analysis of hundreds of in-memory cache clusters at Twitter},
    booktitle = {14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)},
    year = {2020},
    isbn = {978-1-939133-19-9},
    pages = {191--208},
    url = {https://www.usenix.org/conference/osdi20/presentation/yang},
    publisher = {USENIX Association},
}

@inproceedings{yang2023-s3fifo,
    author = {Juncheng Yang and Yazhuo Zhang and Ziyue Qiu and Yao Yue and K. V. Rashmi},
    title = {FIFO Queues Are All You Need for Cache Eviction},
    booktitle = {Symposium on Operating Systems Principles (SOSP'23)},
    year = {2023},
    isbn = {9798400702297},
    pages = {130--149},
    numpages = {20},
    publisher = {Association for Computing Machinery},
}

@inproceedings{yang2023-qdlp,
    author = {Juncheng Yang and Ziyue Qiu and Yazhuo Zhang and Yao Yue and K. V. Rashmi},
    title = {FIFO Can Be Better than LRU: The Power of Lazy Promotion and Quick Demotion},
    booktitle = {Proceedings of the 19th Workshop on Hot Topics in Operating Systems (HotOS23)},
    year = {2023},
    isbn = {9798400701955},
    pages = {70--79},
    numpages = {10},
    doi = {10.1145/3593856.3595887},
    publisher = {Association for Computing Machinery},
}



License

See LICENSE for details.