This document explains how to use the profiling utilities in libCacheSim to analyze performance bottlenecks in cache simulations, particularly plugin caches and trace processing operations.
The libcachesim.profiler module provides comprehensive profiling capabilities using Python's built-in cProfile along with memory and CPU monitoring. This is particularly useful for:
- Analyzing plugin cache performance bottlenecks
- Comparing different cache implementations
- Understanding where time is spent in trace processing
- Identifying memory usage patterns
- Getting detailed function-level performance data
from libcachesim.profiler import profile_cache_operations, profile_trace_processing
# Profile cache operations (individual requests)
result = profile_cache_operations(cache, reader, num_requests=1000)
print(f"Time: {result.execution_time:.4f}s, Memory: {result.memory_peak:.2f}MB")
# Profile complete trace processing
result = profile_trace_processing(cache, reader)
print(f"Miss ratio: {result.custom_metrics['obj_miss_ratio']:.4f}")

from libcachesim.profiler import CacheSimProfiler
# Create profiler instance
profiler = CacheSimProfiler("profiling_results")
# Profile specific operations
result = profiler.profile_plugin_cache_operations(
    cache, reader, num_requests=1000, method_name="my_plugin_cache"
)
# Generate comprehensive reports
report_file = profiler.generate_performance_report()
json_file = profiler.export_results_json()
csv_file = profiler.export_results_csv()

from libcachesim.profiler import CacheSimProfiler
profiler = CacheSimProfiler()
with profiler.profile_context("custom_operation") as profile_result:
    # Your code to profile here
    request_count = 0
    for request in reader:
        cache.get(request)
        request_count += 1
        if some_condition:
            break
    # Add custom metrics
    profile_result.custom_metrics['requests_processed'] = request_count

Each profiling operation returns a ProfileResult object with:
- execution_time: Total execution time in seconds
- memory_peak: Peak memory usage during execution (MB)
- memory_current: Memory change from start to end (MB)
- cpu_percent: CPU usage percentage
- custom_metrics: Dictionary of custom metrics
- profile_stats: Detailed cProfile statistics
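To make the timing and memory fields concrete, here is a minimal sketch of how values like these could be collected with the standard library. `SketchResult` and `measure` are hypothetical names for illustration, and the sketch uses `tracemalloc`, which is an assumption; the actual profiler may gather its numbers differently (the troubleshooting notes mention psutil).

```python
import time
import tracemalloc
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class SketchResult:
    # Hypothetical stand-in for ProfileResult; field names follow the list above
    execution_time: float = 0.0   # seconds
    memory_peak: float = 0.0      # MB
    memory_current: float = 0.0   # MB, change from start to end
    custom_metrics: dict = field(default_factory=dict)


@contextmanager
def measure():
    result = SketchResult()
    tracemalloc.start()
    start_mem, _ = tracemalloc.get_traced_memory()
    start_time = time.perf_counter()
    try:
        yield result
    finally:
        # Fill in the measurements once the profiled block has finished
        result.execution_time = time.perf_counter() - start_time
        end_mem, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        result.memory_peak = peak / (1024 * 1024)
        result.memory_current = (end_mem - start_mem) / (1024 * 1024)


with measure() as res:
    data = [bytes(1024) for _ in range(2000)]  # roughly 2 MB of allocations
    res.custom_metrics["objects"] = len(data)

print(f"Time: {res.execution_time:.4f}s, peak: {res.memory_peak:.2f}MB")
```

The measurements are only final after the `with` block exits, which is why the print statement sits outside it.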
The profiler generates several types of output:
- Text Reports (profile_*.txt): Detailed cProfile output showing:
  - Top functions by cumulative time
  - Top functions by total time
  - Function call counts and timing
- Performance Reports (performance_report_*.txt): Summary comparing all profiled methods
- JSON Export (profile_results_*.json): Machine-readable results for further analysis
- CSV Export (profile_summary_*.csv): Spreadsheet-compatible summary data
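As a rough illustration of the JSON and CSV export formats, the sketch below writes a small list of summary records with the standard library and reads them back. The record layout, file names, and numbers are placeholders chosen for this example, not the profiler's actual schema or real measurements.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical summary records; values are placeholders, not measurements
results = [
    {"method": "native_lru", "execution_time": 0.01, "memory_peak": 1.5},
    {"method": "plugin_lru", "execution_time": 0.25, "memory_peak": 2.0},
]

out_dir = Path(tempfile.mkdtemp())
json_path = out_dir / "profile_results_example.json"
csv_path = out_dir / "profile_summary_example.csv"

# JSON keeps the native types and is easy to load for further analysis
json_path.write_text(json.dumps(results, indent=2))

# CSV gives one row per profiled method for spreadsheet tools
with csv_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(results[0].keys()))
    writer.writeheader()
    writer.writerows(results)

# Read both files back to confirm they round-trip
loaded = json.loads(json_path.read_text())
with csv_path.open(newline="") as f:
    rows = list(csv.DictReader(f))
```

Note that values read back from CSV are strings, so numeric columns need converting before any arithmetic.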
Top Functions by Cumulative Time:
----------------------------------------
11007 function calls in 0.285 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.002 0.002 0.285 0.285 profiler.py:194(run_cache_operations)
1000 0.003 0.000 0.279 0.000 plugin_cache.py:118(get)
2000 0.273 0.000 0.273 0.000 {built-in method time.sleep}
1000 0.002 0.000 0.164 0.000 lru_cache.py:253(cache_hit)
This shows that time.sleep calls are taking 0.273 out of 0.285 total seconds (96% of execution time).
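A report of this shape can be reproduced with `cProfile` and `pstats` alone. The sketch below profiles a toy workload in which a hypothetical `slow_hook` sleeps briefly, standing in for an inefficient plugin hook; the function names are illustrative and not part of libCacheSim.

```python
import cProfile
import io
import pstats
import time


def slow_hook():
    # Artificial delay standing in for an inefficient plugin hook
    time.sleep(0.001)


def run_requests(n):
    # Call the hook once per simulated request
    for _ in range(n):
        slow_hook()


profiler = cProfile.Profile()
profiler.enable()
run_requests(20)
profiler.disable()

# Sort by cumulative time and keep the top five entries
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

In the resulting table, `tottime` excludes time spent in callees while `cumtime` includes it, so a function with low `tottime` and high `cumtime` is mostly waiting on something it calls.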
Plugin caches can have performance issues due to Python callback overhead. The profiler helps identify:
- Python Callback Overhead: Each cache operation calls Python functions
- Slow Hook Functions: User-defined cache hooks that are inefficient
- Memory Allocation: Excessive object creation in Python
- Data Structure Operations: Inefficient cache data structure operations
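The callback overhead in the first item can be seen in isolation with a small timing experiment. This sketch compares a plain dictionary lookup with the same lookup followed by a call to an empty hook; `direct_get`, `hooked_get`, and `hit_hook` are hypothetical names, and the measured ratio will vary by machine and Python version.

```python
import timeit

store = {i: i for i in range(1000)}


def direct_get(key):
    # Lookup with no user hooks involved
    return store.get(key)


def hit_hook(key):
    # Hypothetical user hook; does nothing but still costs a Python call
    return None


def hooked_get(key):
    # Same lookup, plus one extra Python-level callback
    value = store.get(key)
    hit_hook(key)
    return value


direct_time = timeit.timeit(lambda: direct_get(42), number=100_000)
hooked_time = timeit.timeit(lambda: hooked_get(42), number=100_000)
print(
    f"direct: {direct_time:.4f}s, hooked: {hooked_time:.4f}s, "
    f"ratio: {hooked_time / direct_time:.2f}x"
)
```

Even an empty hook adds a function call per request, which is why keeping hooks small matters on hot paths.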
# Compare plugin cache vs native implementation
profiler = CacheSimProfiler()
# Profile native LRU
native_cache = LRU(cache_size=1024)
result1 = profiler.profile_trace_processing(native_cache, reader, "native_lru")
# Profile plugin LRU
plugin_cache = PluginCache(
    cache_size=1024,
    cache_init_hook=init_hook,
    cache_hit_hook=hit_hook,
    cache_miss_hook=miss_hook,
    # ... other hooks
)
result2 = profiler.profile_trace_processing(plugin_cache, reader, "plugin_lru")
# Compare results
comparison = profiler.compare_results()
print(f"Plugin overhead: {comparison['performance_ratio']:.2f}x slower")

- Profile Early: Use profiling during plugin development to catch performance issues
- Minimize Hook Complexity: Keep cache hook functions as simple as possible
- Avoid Python Callbacks in Hot Paths: Consider C++ implementation for critical operations
- Use Efficient Data Structures: Profile different Python data structures for cache storage
- Use Representative Workloads: Profile with realistic trace data and cache sizes
- Run Multiple Iterations: Performance can vary between runs
- Focus on Hot Functions: Look at cumulative time to find the biggest bottlenecks
- Compare Implementations: Use profiling to validate optimization efforts
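Because timings vary between runs, it can help to repeat a measurement and look at the spread rather than a single number. The following is a minimal sketch using only the standard library; `workload` is a placeholder for whatever operation is being profiled.

```python
import statistics
import time


def workload():
    # Placeholder for one profiling run
    return sum(i * i for i in range(50_000))


def time_runs(func, repeats=5):
    # Time func several times and return the individual durations
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


timings = time_runs(workload, repeats=5)
summary = {
    "min": min(timings),
    "median": statistics.median(timings),
    "stdev": statistics.stdev(timings),
}
print(summary)
```

The minimum is often the most stable figure for comparing implementations, while a large standard deviation suggests interference from other processes.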
# Track memory usage patterns
profiler = CacheSimProfiler()
with profiler.profile_context("memory_analysis") as result:
    # Code that might have memory issues
    large_cache = PluginCache(cache_size=100000, ...)
    result.custom_metrics['cache_size'] = large_cache.cache_size
print(f"Memory peak: {result.memory_peak:.2f}MB")

The provided examples/profiling_analysis.py demonstrates:
- Creating mock plugin caches with artificial delays
- Comparing native vs plugin implementations
- Identifying specific bottlenecks in cProfile output
- Generating comprehensive performance reports
Run it with:
cd libCacheSim-python
python examples/profiling_analysis.py

This shows a 397x performance difference between fast and slow plugin implementations, with detailed function-level analysis showing exactly where time is spent.
When the full libCacheSim package is built, the profiler can be used directly with real caches and trace readers:
import libcachesim as lcs
from libcachesim.profiler import CacheSimProfiler
# Create actual cache and reader
cache = lcs.S3FIFO(cache_size=1024*1024)
reader = lcs.TraceReader(trace="path/to/trace", trace_type=lcs.TraceType.ORACLE_GENERAL_TRACE)
# Profile real operations
profiler = CacheSimProfiler()
result = profiler.profile_trace_processing(cache, reader)

- ModuleNotFoundError for psutil: Install with pip install psutil
- Empty cProfile output: Ensure the profiled code actually runs and takes measurable time
- High memory usage: Large traces or cache sizes can use significant memory during profiling

- For large traces, use the num_requests parameter to profile a subset of requests
- Clear profiler results with profiler.clear_results() between different test runs
- Use the context manager for fine-grained profiling of specific code sections
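The idea behind profiling only a subset of requests can be sketched with `itertools.islice`, which stops reading after a fixed number of items. Here `fake_trace` and `limited` are hypothetical helpers for illustration; how `num_requests` is handled inside the profiler is not shown in this document.

```python
from itertools import islice


def limited(requests, num_requests):
    # Yield at most num_requests items without reading the rest
    return islice(requests, num_requests)


def fake_trace():
    # Hypothetical stand-in for a trace reader; yields object IDs forever
    obj_id = 0
    while True:
        yield obj_id % 100
        obj_id += 1


# Only the first 1000 requests are consumed, even though the trace is endless
processed = sum(1 for _ in limited(fake_trace(), 1000))
print(processed)
```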
This profiling system provides the cProfile results requested in the issue and enables comprehensive performance analysis of plugin cache and trace processing operations.