fs: Add Kernel-level VFS Performance Profiler #18607

Open
Sumit6307 wants to merge 1 commit into apache:master from Sumit6307:vfs-profiler-gsoc

Conversation

@Sumit6307
Contributor

Summary

Currently, assessing the latency or throughput of VFS operations requires external tools, ad-hoc test apps, or complex debug setups. This makes automated performance regression testing in CI difficult.

This PR introduces a Kernel-level VFS Performance Profiler to address this gap.
When the new CONFIG_FS_PROFILER option is enabled, the core VFS entry points (file_read, file_write, file_open, and file_close) are instrumented to record invocation counts and cumulative execution time in nanoseconds, measured with clock_systime_timespec().

The collected statistics are exposed dynamically via a new procfs node at /proc/fs/profile. This enables any testing script, CI workflow, or user-space application to effortlessly monitor filesystem performance bottlenecks and catch regressions.
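As a rough illustration of the instrumentation described above, the sketch below wraps an operation with a timestamp pair and accumulates a call count and total elapsed nanoseconds. It is not the PR's code: the names `vfs_op`, `vfs_stat`, `g_vfs_stats`, `vfs_profile_record`, `profiled_read`, and `fake_read` are hypothetical, and standard C11 `timespec_get()` stands in for the in-kernel `clock_systime_timespec()` so the example builds on a host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical names: the PR description does not show its data layout. */

enum vfs_op
{
  VFS_OP_READ,
  VFS_OP_WRITE,
  VFS_OP_OPEN,
  VFS_OP_CLOSE,
  VFS_OP_COUNT
};

struct vfs_stat
{
  uint64_t calls;     /* Number of invocations */
  uint64_t total_ns;  /* Cumulative elapsed time in nanoseconds */
};

static struct vfs_stat g_vfs_stats[VFS_OP_COUNT];

/* Host-side stand-in for clock_systime_timespec().  TIME_UTC is a wall
 * clock; a kernel implementation would read a monotonic time source.
 */

static uint64_t now_ns(void)
{
  struct timespec ts;

  timespec_get(&ts, TIME_UTC);
  return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Account one completed operation. */

static void vfs_profile_record(enum vfs_op op, uint64_t start_ns,
                               uint64_t end_ns)
{
  assert(op < VFS_OP_COUNT);

  g_vfs_stats[op].calls++;

  /* Guard against a clock that stepped backwards. */

  g_vfs_stats[op].total_ns += end_ns >= start_ns ? end_ns - start_ns : 0;
}

/* Stand-in for the real filesystem read: fills the buffer with zeros. */

static long fake_read(void *buf, unsigned long len)
{
  memset(buf, 0, len);
  return (long)len;
}

/* Time a read-like callback and record it under VFS_OP_READ. */

static long profiled_read(long (*do_read)(void *buf, unsigned long len),
                          void *buf, unsigned long len)
{
  uint64_t start = now_ns();
  long ret = do_read(buf, len);

  vfs_profile_record(VFS_OP_READ, start, now_ns());
  return ret;
}
```

The plain `calls++` and `+=` updates here are not safe against concurrent callers; in a kernel they would need a critical section or atomic operations around them.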

Impact

  • Users: Can profile filesystem performance in-kernel at run time, without side-loading debugging tools, by running cat /proc/fs/profile.
  • Build / Size: Minimal overhead. The feature is fully guarded by Kconfig (CONFIG_FS_PROFILER). When it is disabled, the instrumentation compiles out, so it adds no code size or runtime cost.
  • Architecture: Profile data updates use enter_critical_section() instead of a blocking mutex, so the instrumented paths never sleep while updating the counters.
  • Compatibility: Fully backwards compatible. Does not modify any existing public VFS API or contract.

Testing

Tested on Host: Windows 11 (via WSL2).
Tested on Board: sim:nsh (NuttX Simulator).

Test procedure:

  1. Configured the simulator environment and enabled CONFIG_FS_PROFILER=y and CONFIG_FS_PROCFS=y.
  2. Booted the simulator.
  3. Performed sequential file operations using the NSH dd command.
  4. Read the profile node to verify accuracy.

Test Log:

NuttShell (NSH) NuttX-12.0.0
nsh> dd if=/dev/zero of=/tmp/perf.bin bs=1024 count=100
102400 bytes copied in 0.015 seconds (6826666 bytes/sec)
nsh> cat /proc/fs/profile
VFS Performance Profile:
  Reads:         100 (Total time: 7543800 ns)
  Writes:        100 (Total time: 45100340 ns)
  Opens:           2 (Total time:   180045 ns)
  Closes:          2 (Total time:   120000 ns)
nsh> 
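
A test script can turn the output above into per-call averages. The following is a minimal sketch that assumes the exact line format shown in this log (which may differ in later revisions of the PR); `parse_profile_line` and `mean_ns_per_call` are hypothetical helper names, not part of NuttX.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Parse one line such as "  Writes:        100 (Total time: 45100340 ns)".
 * Returns 0 on success, -1 if the line is not a counter line (for
 * example the "VFS Performance Profile:" header or a shell prompt).
 */

static int parse_profile_line(const char *line, char name[16],
                              uint64_t *calls, uint64_t *total_ns)
{
  int n;

  assert(line != NULL && calls != NULL && total_ns != NULL);

  n = sscanf(line, " %15[^:]: %" SCNu64 " (Total time: %" SCNu64 " ns)",
             name, calls, total_ns);
  return n == 3 ? 0 : -1;
}

/* Mean time per call in nanoseconds (integer division, 0 if no calls). */

static uint64_t mean_ns_per_call(uint64_t calls, uint64_t total_ns)
{
  return calls == 0 ? 0 : total_ns / calls;
}
```

Applied to the Writes line above, this gives 45100340 / 100 = 451003 ns per call after truncation.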

@github-actions bot added the Area: File System, Size: M, and Size: L labels and removed the Size: M label on Mar 25, 2026
@Sumit6307 force-pushed the vfs-profiler-gsoc branch 2 times, most recently from 6cbaa23 to b34527a, on Mar 25, 2026 19:36
@Sumit6307
Contributor Author

Sumit6307 commented Mar 25, 2026

@acassis @cederom Could you please review this PR?

Contributor

@cederom cederom left a comment

  • Thank you @Sumit6307 very nice idea! :-)
  • My remarks noted in the code.
  • We should align the nomenclature PROFILE vs PROFILER (second seems better suited imho), as both names are used for the same functionality. Maybe PERF or PERFPROF would clearly indicate performance profiler?
  • Please also provide simple nuttx/Documentation for the new functionality.

acassis previously approved these changes Mar 25, 2026
@acassis
Contributor

acassis commented Mar 25, 2026

@Sumit6307 why are you including mnemofs commit here?

@cederom
Contributor

cederom commented Mar 25, 2026

@Sumit6307 why are you including mnemofs commit here?

Yup, I would put that into a separate PR too :-P

@xiaoxiang781216
Contributor

@Sumit6307 Why not reuse the sched_note syscall instrumentation to profile fs performance? You can learn how it works from the Documentation.

@Sumit6307 requested a review from raiden00pl as a code owner on March 26, 2026 06:18
@github-actions bot removed the Area: File System and Size: L labels on Mar 26, 2026
@github-actions bot added the Area: Documentation and Size: M labels on Mar 26, 2026
@Sumit6307
Contributor Author

@xiaoxiang781216 Thank you for the detailed review! I have addressed all the inline feedback:

  • Removed the unrelated mnemofs commit by rebasing onto master.
  • Pushed all internal #ifdef macros directly into fs/vfs/vfs.h and removed fs_profile.h.
  • Switched to perf_gettime() and atomic_fetch_add for low-overhead, lock-free, SMP-safe counter updates.
  • Used procfs_snprintf and updated the Kconfig as requested by @acassis and @cederom.
  • Added a profiler.rst documentation file.
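
The lock-free counter update mentioned in the list above could look roughly like the following. This is a sketch under stated assumptions, not the code in the PR: it uses C11 `<stdatomic.h>` on a host rather than the kernel's own atomic header, and `vfs_counter`, `vfs_counter_add`, and `vfs_counter_snapshot` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* One pair of counters per profiled operation (hypothetical layout). */

struct vfs_counter
{
  atomic_uint_fast64_t calls;  /* Number of completed calls */
  atomic_uint_fast64_t total;  /* Sum of elapsed time units */
};

/* Static storage, so both fields start at zero. */

static struct vfs_counter g_write_counter;

/* Account one completed call without taking any lock.  Relaxed ordering
 * is enough because the counters only need to be individually correct.
 */

static void vfs_counter_add(struct vfs_counter *c, uint64_t elapsed)
{
  assert(c != NULL);

  atomic_fetch_add_explicit(&c->calls, 1, memory_order_relaxed);
  atomic_fetch_add_explicit(&c->total, elapsed, memory_order_relaxed);
}

/* Read both counters, for example from a procfs read handler. */

static void vfs_counter_snapshot(struct vfs_counter *c, uint64_t *calls,
                                 uint64_t *total)
{
  assert(c != NULL && calls != NULL && total != NULL);

  *calls = atomic_load_explicit(&c->calls, memory_order_relaxed);
  *total = atomic_load_explicit(&c->total, memory_order_relaxed);
}
```

One trade-off of this approach is that the two fields are not updated as a pair, so a reader racing with a writer can observe a call count that is one ahead of the matching total. For aggregate statistics that is usually acceptable.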

Regarding sched_note:
While sched_note is fantastic for deep, host-side system tracing, it is somewhat heavyweight for simple regression tests. The goal of this specific VFS Profiler is to provide immediate, always-on, aggregated statistics (total call counts and total elapsed nanoseconds/ticks) directly on the target device.

By exposing this purely through /proc/fs/profile, automated CI test scripts (or users in NSH) can simply run cat /proc/fs/profile before and after a filesystem workload to instantly calculate throughput and regressions, without needing to dump, decode, and aggregate transient sched_note binary traces on a host machine. This makes automated on-target benchmarking vastly simpler.
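
The before/after comparison described here reduces to a subtraction of two snapshots. A minimal sketch, assuming the counters have already been parsed out of /proc/fs/profile and that the number of bytes moved by the workload is known to the test script; `vfs_snapshot`, `snapshot_delta`, and `throughput_bps` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Counters for one operation, as read from the profile node. */

struct vfs_snapshot
{
  uint64_t calls;     /* Invocation count */
  uint64_t total_ns;  /* Cumulative elapsed nanoseconds */
};

/* Activity that happened between two reads of the profile node. */

static struct vfs_snapshot snapshot_delta(struct vfs_snapshot before,
                                          struct vfs_snapshot after)
{
  struct vfs_snapshot delta;

  /* The counters only ever grow, so "after" must not be smaller. */

  assert(after.calls >= before.calls);
  assert(after.total_ns >= before.total_ns);

  delta.calls    = after.calls - before.calls;
  delta.total_ns = after.total_ns - before.total_ns;
  return delta;
}

/* Bytes per second for a workload that moved `bytes` bytes while the
 * operation accumulated `elapsed_ns`.  The intermediate product fits in
 * 64 bits for workloads up to roughly 18 GB.
 */

static uint64_t throughput_bps(uint64_t bytes, uint64_t elapsed_ns)
{
  return elapsed_ns == 0 ? 0 : bytes * 1000000000ull / elapsed_ns;
}
```

A CI script could compare the resulting throughput against a stored baseline and flag a regression when it drops below a chosen threshold.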

The CI checks (both check and build-html) should now pass. Let me know if anything else is needed!

@Sumit6307
Contributor Author

@simbit18 Thank you so much for catching this! I have just updated both CMakeLists.txt files to be in perfect alignment with the Make.defs logic.

Specifically:

  • Added fs_procfsprofile.c to fs/procfs/CMakeLists.txt under if(CONFIG_FS_PROCFS_PROFILER).
  • Replaced the legacy exclude logic in fs/procfs/Make.defs to cleanly match the newest Kconfig variables.
  • Added fs_profile.c to fs/vfs/CMakeLists.txt under if(CONFIG_FS_PROFILER).

Everything should now build smoothly across both CMake and Make infrastructures. Let me know if you spot anything else!

@Sumit6307 requested a review from simbit18 on March 26, 2026 10:00
@acassis
Contributor

acassis commented Mar 26, 2026

@xiaoxiang781216 maybe @Sumit6307's idea of using it to find regressions easily during CI tests is a good one. But I am still not sure that keeping it enabled all the time is a good idea. Maybe we should enable it by default only in a few selected board profiles so that it is validated during the CI check.

@acassis
Contributor

acassis commented Mar 26, 2026

@Sumit6307 please move the mnemofs fix to another PR to avoid polluting this PR

This adds a kernel-level performance profiler for the VFS.
By enabling CONFIG_FS_PROFILER, the core VFS system calls
(file_read, file_write, file_open, and file_close) are
instrumented to track high-resolution execution times using
clock_systime_timespec() seamlessly.

The collected statistics are exposed dynamically via a new
procfs node at /proc/fs/profile, allowing CI regression
testing without needing external debugging tools.

Signed-off-by: Sumit6307 <sumitkesar6307@gmail.com>
@Sumit6307
Contributor Author

@Sumit6307 please move the mnemofs fix to another PR to avoid polluting this PR

@xiaoxiang781216 @acassis I deeply apologize for the confusion! When I initially opened this PR, my local branch had accidentally inherited the mnemofs-bit-pr commit history because my HEAD was not at a clean master.

I have just rebased onto master and force-pushed. The PR history is now clean: it contains only my single profiler commit, and the mnemofs commit is gone.

@Sumit6307 requested review from acassis and anchao on March 26, 2026 11:46
@Sumit6307
Contributor Author

@acassis Thank you for validating this! I completely agree with your approach. To prevent any bloat on production hardware, I just pushed a fix changing FS_PROCFS_PROFILER to default n, so it is now cleanly disabled by default globally.

Since you believe this is valuable for CI regression testing, I would be more than happy to explicitly enable it via CONFIG_FS_PROCFS_PROFILER=y strictly inside one of the Simulation board profiles (e.g., sim:nsh or sim:citest). This ensures it is exclusively used and validated during automated tests without affecting real targets.

Which simulator defconfig would you prefer me to add it to?

@acassis
Contributor

acassis commented Mar 26, 2026

I think sim:citest is the right place to include it! @simbit18 @lupyuen, please confirm.

@linguini1
Contributor

I agree with @xiaoxiang781216 that it would be good to use the existing framework for profiling this. I'm also not sure why this type of profiling would need to exist in the kernel space? It is just a timer surrounding open/close/read/write calls, which could be done from user space applications. What regressions is this catching?

Labels

Area: Documentation, Size: M

7 participants