Description:
The current implementation of `ferret/report.py` loads the entire dataset for a given `run_id` into memory before processing it. Specifically, the `_fetch_spans` method executes `list(self.log_manager)`, which materializes every single log entry into a Python list.
For short scripts, this is fine. However, for long-running processes (e.g., a web server running for days) that generate millions of spans, this will inevitably cause an Out-of-Memory (OOM) crash, making it impossible to analyze the data.
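The failure mode can be reproduced in miniature with only the standard library (a generic sketch, not ferret's actual classes): wrapping a lazy iterable in `list()` keeps every record alive at once, while direct iteration keeps only one record alive at a time.

```python
import tracemalloc

def spans(n):
    # Stands in for a lazy log_manager: yields records one at a time.
    for i in range(n):
        yield ("span-%d" % i, i * 0.001)

tracemalloc.start()
_all = list(spans(100_000))        # eager: every record alive at once
eager_peak = tracemalloc.get_traced_memory()[1]
del _all
tracemalloc.stop()

tracemalloc.start()
for _record in spans(100_000):     # lazy: one record alive at a time
    pass
lazy_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

# lazy_peak is a small fraction of eager_peak
```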
Proposed Solution:
Refactor the reporting engine to use a Streaming Pipeline. Instead of fetching all data at once, the system should iterate over the database records one by one (using generators), update the aggregated statistics incrementally, and discard the raw records immediately.
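The proposed pipeline could be sketched roughly as follows. The `Span` record and row format here are placeholders, not ferret's actual `SpanModel` or database schema: a generator yields one record at a time, and a fold updates per-function aggregates in place, so memory scales with the number of unique names rather than the number of logs.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class Span:
    # Hypothetical minimal span record; the real SpanModel has more fields.
    name: str
    duration: float

def fetch_spans(rows: Iterable[tuple]) -> Iterator[Span]:
    """Lazily yield one span at a time instead of materializing a list."""
    for name, duration in rows:  # e.g. a DB cursor iterated row by row
        yield Span(name, duration)

def analyze(spans: Iterator[Span]) -> dict:
    """Fold spans into per-function aggregates; each raw span is
    discarded as soon as it is consumed."""
    stats: dict = {}
    for span in spans:
        s = stats.setdefault(span.name, {"count": 0, "total": 0.0})
        s["count"] += 1
        s["total"] += span.duration
    return stats

rows = [("f", 1.0), ("g", 2.0), ("f", 3.0)]
result = analyze(fetch_spans(iter(rows)))
# result["f"] == {"count": 2, "total": 4.0}
```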
Tasks (in `ferret/report.py`):
- Modify `_fetch_spans` to return a `Generator[SpanModel]` instead of a `List[SpanModel]`.
- Verify that `BeaverDB` iteration is lazy (check the `self.log_manager` behavior to ensure it doesn't pre-load).
- Update `analyze_run` to remove the `grouped = defaultdict(list)` logic, which also buffers data in memory.
- Implement a `StreamingStats` class that accepts a single span at a time and updates `count`, `total_duration`, `min`, and `max` in-place.

Acceptance Criteria:
- The `analyze` command can process a database file larger than the available RAM (e.g., a 2GB trace file on a 1GB RAM container) without crashing.
- Memory usage remains relatively flat during analysis, proportional to the number of unique function names, not the total number of logs.
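The `StreamingStats` class called for in the tasks could look roughly like this (a sketch only; the real class would consume ferret's `SpanModel`, which is reduced here to a bare duration):

```python
class StreamingStats:
    """Constant-memory aggregate for one function name: only the
    summary fields are retained, never the spans themselves."""

    def __init__(self) -> None:
        self.count = 0
        self.total_duration = 0.0
        self.min = float("inf")
        self.max = float("-inf")

    def add(self, duration: float) -> None:
        # Update all aggregates in-place; the span is then discarded.
        self.count += 1
        self.total_duration += duration
        self.min = min(self.min, duration)
        self.max = max(self.max, duration)

    @property
    def mean(self) -> float:
        return self.total_duration / self.count if self.count else 0.0

stats = StreamingStats()
for d in (3.0, 1.0, 2.0):
    stats.add(d)
# stats.count == 3, stats.min == 1.0, stats.max == 3.0, stats.mean == 2.0
```

One `StreamingStats` instance per unique function name keeps total memory proportional to the number of names, which is exactly the flat-memory property the acceptance criteria require.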