Add memory profiling support #59

Draft
Copilot wants to merge 5 commits into main from copilot/support-memory-profiling

Copilot AI commented Feb 13, 2026

  • Create sample Rprof output file with memory profiling data (inst/samples/rprof/memory.out)
  • Update read_rprof_ll() to detect and parse the memory profiling header and the per-trace data prefix
  • Update rprof_to_ds() to extract memory data from trace lines
  • Update validate_profile() to support memory profiling data
  • Update ds_to_rprof() / write_rprof() for memory profiling roundtrip
  • Update pprof conversion for memory sample types
  • Add tests for memory profiling
  • Make format type-stable: sample_types always has 5 rows, samples always has 6 columns (memory columns are NA when no memory data is present)
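The header detection and prefix parsing in the list above can be sketched in base R. This is a hypothetical illustration, not the code in this PR: `is_memory_header()` and `parse_memory_prefix()` are made-up names, and the sketch assumes each memory-profiling trace starts with four colon-delimited counts in the order this PR uses (small_v, big_v, nodes, dup_count) that fit in R's integer range, as in the `.rprof$traces` shown in the issue.

```r
# Hypothetical helpers (not the PR's implementation).

# Rprof(memory.profiling = TRUE) writes a header such as
# "memory profiling: sample.interval=20000".
is_memory_header <- function(header) {
  startsWith(header, "memory profiling:")
}

# Split ":small_v:big_v:nodes:dup_count:" off the front of each trace line.
# Lines without the prefix get NA_integer_ for all four counts.
parse_memory_prefix <- function(traces) {
  pattern <- "^:([0-9]+):([0-9]+):([0-9]+):([0-9]+):"
  groups <- regmatches(traces, regexec(pattern, traces))

  # groups[[i]] is character(0) for a non-matching line, otherwise
  # c(full_match, small_v, big_v, nodes, dup_count).
  field <- function(k) {
    vapply(groups, function(m) {
      if (length(m) == 0L) NA_integer_ else as.integer(m[[k + 1L]])
    }, integer(1))
  }

  data.frame(
    small_v = field(1L),
    big_v = field(2L),
    nodes = field(3L),
    dup_count = field(4L),
    stack = sub(pattern, "", traces),  # remaining quoted call stack
    stringsAsFactors = FALSE
  )
}

# Example: one trace from the issue, plus a trace without memory data
traces <- c(
  ":478394:774433:53618096:9090:\"[<-.data.frame\" \"[<-\" ",
  "\"f\" \"g\" "
)
parse_memory_prefix(traces)
```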
Original prompt

This section details the original issue you should resolve

<issue_title>Memory profiling</issue_title>
<issue_description>I received a request to support memory profiling in proffer (r-prof/proffer#32), and I am trying to figure out why `proffer::pprof(memory.profiling = TRUE)` does not show memory profiling in the pprof dashboard. I am wondering whether profile already captures the memory data. Here is a small example:

```r
path <- tempfile()
Rprof(filename = path, memory.profiling = TRUE)
n <- 1e3
x <- data.frame(x = rnorm(n), y = rnorm(n))
for (i in seq_len(n)) {
  x[i, ] <- x[i, ] + 1
}
Rprof(filename = NULL)
```

It looks like `Rprof()` is correctly recording memory data.

```r
head(summaryRprof(filename = path, memory = "both")$by.total)
#>                      total.time total.pct mem.total self.time self.pct
#> "Ops.data.frame"           0.10     83.33      89.6      0.00     0.00
#> "as.data.frame.list"       0.08     66.67      58.4      0.02    16.67
#> "as.data.frame"            0.08     66.67      58.4      0.02    16.67
#> "data.frame"               0.08     66.67      58.4      0.00     0.00
#> "<Anonymous>"              0.06     50.00      58.4      0.00     0.00
#> "do.call"                  0.06     50.00      58.4      0.00     0.00
```

But I am having trouble locating memory data in the output of `profile::read_rprof()`, which proffer uses to convert to pprof format.

```r
packageVersion("profile")
#> [1] ‘1.0.3.9019’
```

```r
samples <- profile::read_rprof(path)
#> Warning message:
#> Removing unexpected incomplete sampling information.
```

```r
str(samples)
#> List of 6
#>  $ meta        : tibble [1 × 2] (S3: tbl_df/tbl/data.frame)
#>   ..$ key  : chr "version"
#>   ..$ value: chr "1.0"
#>  $ sample_types: tibble [1 × 2] (S3: tbl_df/tbl/data.frame)
#>   ..$ type: chr "samples"
#>   ..$ unit: chr "count"
#>  $ samples     : tibble [6 × 2] (S3: tbl_df/tbl/data.frame)
#>   ..$ value    : int [1:6] 1 1 1 1 1 1
#>   ..$ locations:List of 6
#>   .. ..$ 1: tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
#>   .. .. ..$ location_id: int 4
#>   .. ..$ 2: tibble [13 × 1] (S3: tbl_df/tbl/data.frame)
#>   .. .. ..$ location_id: int [1:13] 5 11 16 12 14 9 7 6 13 8 ...
#>   .. ..$ 3: tibble [2 × 1] (S3: tbl_df/tbl/data.frame)
#>   .. .. ..$ location_id: int [1:2] 2 1
#>   .. ..$ 4: tibble [3 × 1] (S3: tbl_df/tbl/data.frame)
#>   .. .. ..$ location_id: int [1:3] 3 2 1
#>   .. ..$ 5: tibble [4 × 1] (S3: tbl_df/tbl/data.frame)
#>   .. .. ..$ location_id: int [1:4] 18 17 10 15
#>   .. ..$ 6: tibble [8 × 1] (S3: tbl_df/tbl/data.frame)
#>   .. .. ..$ location_id: int [1:8] 9 7 6 13 8 7 10 15
#>  $ locations   : tibble [18 × 3] (S3: tbl_df/tbl/data.frame)
#>   ..$ location_id: int [1:18] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ function_id: int [1:18] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ line       : int [1:18] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ functions   : tibble [18 × 6] (S3: tbl_df/tbl/data.frame)
#>   ..$ function_id: int [1:18] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ name       : chr [1:18] "[" "[_data_frame" "[[" "[<-" ...
#>   ..$ system_name: chr [1:18] "[" "[.data.frame" "[[" "[<-" ...
#>   ..$ filename   : chr [1:18] "" "" "" "" ...
#>   ..$ start_line : int [1:18] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .file_id   : int [1:18] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ .rprof      :List of 3
#>   ..$ header: chr "memory profiling: sample.interval=20000"
#>   ..$ files : chr(0) 
#>   ..$ traces: chr [1:6] ":478394:774433:53618096:9090:\"[<-.data.frame\" \"[<-\" " ":391832:614469:36974056:19040:\"mode\" \"%in%\" \"deparse\" \"paste\" \"deparse1\" \"force\" \"as.data.frame.nu"| __truncated__ ":559003:984469:64613920:22356:\"length\" \"[.data.frame\" \"[\" " ":484278:820469:52243128:22499:\"[[.data.frame\" \"[[\" \"[.data.frame\" \"[\" " ...
#>  - attr(*, "class")= chr "profile_data"
```
</issue_description>

## Comments on the Issue (you are @copilot in this section)

<comments>
<comment_new><author>@krlmlr</author><body>
Thanks. I never actually looked into memory profiling, it's unlikely that the data is captured. Would you like to take a stab at extending the data model we're using here?</body></comment_new>
<comment_new><author>@wlandau</author><body>
I'm not sure if I will have the capacity, but I made a note in case I have extra time. </body></comment_new>
<comment_new><author>@krlmlr</author><body>
I think so, yes.</body></comment_new>
<comment_new><author>@krlmlr</author><body>
For a clean solution, we also need to think about extending the validation to support different versions of profile objects, with and without memory data. I'd like the format to be type stable, with empty all-NA columns as needed, with a version tag that allows distinguishing between different object versions.

We'd also need sample data for `.out` files, and perhaps infrastructure to generate them. I don't remember where the current sample in the package c...

</details>




- Fixes r-prof/profile#33


Copilot AI and others added 2 commits February 13, 2026 04:47
- Parse memory profiling data (small_v, big_v, nodes, dup_count) from Rprof output
- Extend sample_types table to include memory sample types
- Add memory columns to samples table
- Update validate_profile() to support memory profiling data
- Update ds_to_rprof / write_rprof for memory profiling roundtrip
- Update pprof read/write to handle memory sample types
- Add sample memory profiling data file (memory.out)
- Add comprehensive tests for memory profiling

Co-authored-by: krlmlr <1741643+krlmlr@users.noreply.github.com>
Co-authored-by: krlmlr <1741643+krlmlr@users.noreply.github.com>
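For the `ds_to_rprof()` / `write_rprof()` roundtrip listed in this commit, the writer has to re-emit the memory prefix in the same shape it was read. A minimal sketch, assuming the same field order and Rprof's quoting of stack entries (each function name in double quotes followed by a space, as in the traces from the issue); `format_memory_trace()` is a hypothetical helper, not part of profile, and falls back to a plain trace line when any count is missing.

```r
# Hypothetical helper (not the PR's implementation): rebuild one Rprof
# trace line from the four memory counts and a call stack.
format_memory_trace <- function(small_v, big_v, nodes, dup_count, fns) {
  # Rprof quotes each function name and follows it with a space.
  stack <- paste0('"', fns, '" ', collapse = "")

  counts <- c(small_v, big_v, nodes, dup_count)
  if (anyNA(counts)) {
    # No memory data for this sample: write a plain trace line.
    return(stack)
  }
  paste0(":", paste(sprintf("%d", counts), collapse = ":"), ":", stack)
}

# Rebuild the first trace from the issue's .rprof$traces
format_memory_trace(478394L, 774433L, 53618096L, 9090L, c("[<-.data.frame", "[<-"))
```

Using `sprintf("%d", ...)` rather than `as.character()` keeps large whole-number counts out of scientific notation if they arrive as doubles.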
Copilot AI changed the title from "[WIP] Add support for memory profiling in proffer" to "Add memory profiling support" on Feb 13, 2026
Copilot AI requested a review from krlmlr February 13, 2026 04:53
krlmlr (Member) commented Feb 13, 2026

@copilot: The columns in the data must remain the same, regardless of the presence of memory profiling data. Use long formats where appropriate, or populate with missing values.

sample_types always has 5 rows and samples always has 6 columns
(value, locations, small_v, big_v, nodes, dup_count). When no
memory profiling data is present, memory columns are all NA_integer_.

Co-authored-by: krlmlr <1741643+krlmlr@users.noreply.github.com>
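The type-stable layout described in this commit message can be sketched with base data frames. The helper names are hypothetical, the memory sample type names are assumed to match the column names, and the memory units are left as `NA` placeholders because the PR text does not state them.

```r
# Hypothetical sketch of the type-stable layout (names are illustrative).
memory_cols <- c("small_v", "big_v", "nodes", "dup_count")

# Always 5 rows, whether or not the profile contains memory data.
make_sample_types <- function() {
  data.frame(
    type = c("samples", memory_cols),
    # Units for the memory types are not stated in the PR;
    # NA is a placeholder here.
    unit = c("count", rep(NA_character_, length(memory_cols))),
    stringsAsFactors = FALSE
  )
}

# Always 6 columns. `memory` is NULL (no memory profiling) or a list /
# data frame holding the four memory columns, one value per sample.
make_samples <- function(value, locations, memory = NULL) {
  n <- length(value)
  samples <- data.frame(value = as.integer(value))
  samples$locations <- locations  # list column, one element per sample
  for (col in memory_cols) {
    samples[[col]] <- if (is.null(memory)) {
      rep(NA_integer_, n)
    } else {
      as.integer(memory[[col]])
    }
  }
  samples
}

locs <- list(data.frame(location_id = 4L), data.frame(location_id = c(2L, 1L)))
str(make_samples(c(1L, 1L), locs))
```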
Copilot AI (Author) commented Feb 13, 2026

> @copilot: The columns in the data must remain the same, regardless of the presence of memory profiling data. Use long formats where appropriate, or populate with missing values.

Updated in f4636da. The format is now type-stable: sample_types always has 5 rows and samples always has 6 columns (value, locations, small_v, big_v, nodes, dup_count). When no memory profiling data is present, the memory columns are populated with NA_integer_. All 43 tests pass.
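krlmlr's comment mentions long formats as one option. With NA-filled wide columns, a consumer that needs one value per sample and sample type (as pprof samples do) can still derive a long view. A hypothetical sketch, assuming the sample type names `samples`, `small_v`, `big_v`, `nodes`, `dup_count` map to the columns `value`, `small_v`, `big_v`, `nodes`, `dup_count`; rows with `NA` values are dropped, so profiles without memory data reduce to plain sample counts.

```r
# Hypothetical mapping from sample type names to sample columns.
type_cols <- c(
  samples = "value", small_v = "small_v", big_v = "big_v",
  nodes = "nodes", dup_count = "dup_count"
)

# One row per (sample, type); NA values (no memory data) are dropped.
samples_to_long <- function(samples) {
  n <- nrow(samples)
  parts <- lapply(names(type_cols), function(type) {
    data.frame(
      sample = seq_len(n),
      type = rep(type, n),
      value = samples[[type_cols[[type]]]],
      stringsAsFactors = FALSE
    )
  })
  long <- do.call(rbind, parts)
  long <- long[!is.na(long$value), , drop = FALSE]
  rownames(long) <- NULL
  long
}

s <- data.frame(
  value = c(1L, 1L),
  small_v = c(478394L, NA), big_v = c(774433L, NA),
  nodes = c(53618096L, NA), dup_count = c(9090L, NA)
)
samples_to_long(s)
```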


Development

Successfully merging this pull request may close these issues.

2 participants