Skip to content

Rocpd part1#6

Open
sputhala-amd wants to merge 36 commits intoamd-stagingfrom
rocpd-part1
Open

Rocpd part1#6
sputhala-amd wants to merge 36 commits intoamd-stagingfrom
rocpd-part1

Conversation

@sputhala-amd
Copy link
Copy Markdown
Owner

@sputhala-amd sputhala-amd commented Sep 23, 2025

rocprofiler-systems Pull Request

Related Issue

  • Closes #

What type of PR is this? (check all that apply)

  • Bug Fix
  • Cherry Pick
  • Continuous Integration
  • Documentation Update
  • Feature
  • Optimization
  • Refactor
  • Other (please specify)

Technical Details

Have you added or updated tests to validate functionality?

  • Yes
  • No - does not apply to this PR

Added / Updated documentation?

  • Yes
  • No - does not apply to this PR

Have you updated CHANGELOG?

  • Yes
  • No - does not apply to this PR

Summary by CodeRabbit

  • New Features

    • Added optional ROCPD data export with SQLite backend, schemas, and analytic views.
    • Introduced agent management (CPU/GPU) and node information for richer metadata.
    • Added comprehensive ROCPD instrumentation across metrics, sampling, AMD SMI, CPU frequency, communication, and Kokkos.
    • New built-in benchmarking utility (category-based, env-configurable).
  • Configuration

    • New settings to enable ROCPD and benchmarking, choose internal vs system SQLite, and set database path.
  • Performance

    • More efficient metrics handling and conditional backends (ROCPD/Perfetto/Timemory).
  • Chores

    • Added SQLite as a Git submodule.
  • Documentation

    • Updated changelog with initial ROCPD support.

adjordje-amd and others added 30 commits July 14, 2025 18:49
 - cpu_frequency
 - amd_smi
 - sampling
* Fix marker api traces

* Remove space

* Formatting change

* Small change

* Update Changelog

* Add period to changelog

* Update source/lib/rocprof-sys/library/rocprofiler-sdk.cpp

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

* Fix roctx tests

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
* Update gotcha submodule from timemory

* Fix build failure and add copilot suggestions

* Fix formatting errors
A recursive submodule update, `git submodule uppdate --recursive
--init`,
would fail due to an improperly defined and empty submodule in GOTCHA.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
…OCm#282)

* Fix: Add missing <string.h> include for C string functions in RCCL tests

* Update examples/rccl/rccl-tests/src/common.h

Yes, confirmed—<cstring> alone works in my environment. Updated the PR

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

* clang-format

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
Adds the following presets: 

- `ci` - to match the common CI settings - including tests and asserts
- `debug` - True debug build - include building tests
- `debug-optimized` - include building tests
- `release` - To match the "build-release` script - no tests.

The default build folder will be `${sourceDir}/build/<preset>`.

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
Rename `rocpd_post_process_backtrace_metrix` to `rocpd_post_process_backtrace_metrics`

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 23, 2025

Walkthrough

Adds ROCPD database output via a new SQLite-backed data layer, optional internal SQLite build, and a benchmarking facility. Introduces agent/CPU/node discovery, ROCPD data processor, SQL schema/views, and widespread integration across library components with new config flags and initialization/flush paths. Also updates build system and utilities.

Changes

Cohort / File(s) Summary
Submodule
\.gitmodules``
Adds Git submodule entry external/sqlite pointing to https://github.com/sqlite/sqlite.git.
Docs
CHANGELOG.md
Notes initial ROCPD database output controlled by ROCPROFSYS_USE_ROCPD.
Build Options & Wiring
CMakeLists.txt, cmake/Packages.cmake, cmake/SQLite3.cmake, source/lib/CMakeLists.txt, source/lib/common/CMakeLists.txt, source/lib/core/CMakeLists.txt, source/lib/core/rocpd/CMakeLists.txt, source/lib/core/rocpd/data_storage/CMakeLists.txt
Adds options: ROCPROFILER_BUILD_SQLITE3, ROCPROFSYS_ENABLE_BENCHMARK; introduces interface rocprofiler-systems-sqlite3; conditional internal SQLite build or system find; links core to SQLite3; installs ROCPD SQL schemas; exposes new public headers.
SQLite Schema
source/lib/core/rocpd/data_storage/schema/*.sql
Adds ROCPD tables, views, summary/data views, and indexes; includes placeholder-based templating and marker file.
Common Utilities
source/lib/common/{traits.hpp, md5sum.hpp}
Adds type traits (string-literal, optional) and header-only MD5 implementation with helpers.
Core Entities & Discovery
source/lib/core/{agent.hpp, agent_manager.hpp, agent_manager.cpp, cpu.hpp, cpu.cpp, gpu.cpp, node_info.hpp, node_info.cpp}
Introduces agent model/manager, CPU info parsing/agents, GPU agent insertion refactor, and node_info singleton with system/machine metadata.
Config
source/lib/core/config.{hpp,cpp}
Adds ROCPROFSYS_USE_ROCPD setting, database path resolver, and accessor get_use_rocpd().
Benchmarking
source/lib/core/benchmark/{benchmark.hpp, category.hpp}
Adds optional header-only benchmarking framework and category enum, gated by ROCPROFSYS_ENABLE_BENCHMARK macro.
ROCPD Data Layer
source/lib/core/rocpd/{data_processor.hpp, data_processor.cpp, json.hpp, json.cpp}, source/lib/core/rocpd/data_storage/{database.hpp, database.cpp, insert_query_builders.hpp, table_insert_query.hpp}
Implements ROCPD singleton data processor, JSON utility, SQLite-backed database (temp+main DB, schema loader, backup flush), prepared statement executors, and query builders.
Library Integration
source/lib/rocprof-sys/library.cpp, source/lib/rocprof-sys/library/amd_smi.cpp, source/lib/rocprof-sys/library/components/{backtrace_metrics.hpp, backtrace_metrics.cpp, comm_data.hpp, comm_data.cpp}, source/lib/rocprof-sys/library/{cpu_freq.cpp, kokkosp.cpp, sampling.cpp}, source/lib/rocprof-sys/library/rocprofiler-sdk/counters.cpp
Wires ROCPD preinit, categories/tracks/PMCs, and event/sample emission across components (AMD SMI, backtrace metrics, comm data, CPU freq, Kokkos, sampling); adds conditional paths based on get_use_rocpd(); replaces/updates perfetto/timemory gating; minor include cleanup in counters.cpp; moves comm_data::start() implementation out-of-line.
CLI Tool
source/bin/rocprof-sys-avail/avail.cpp
Defers GPU count retrieval until after library initialization.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor App
  participant Lib as rocprof-sys (init)
  participant Cfg as config
  participant Node as node_info
  participant Agents as agent_manager
  participant CPU as cpu::query
  participant DB as rocpd::data_storage::database
  participant DP as rocpd::data_processor

  App->>Lib: rocprofsys_init_library()
  Lib->>Cfg: get_use_rocpd()
  alt ROCPD enabled
    Lib->>Node: get_instance()
    Lib->>CPU: query_cpu_agents()
    CPU->>Agents: insert CPU agents
    Lib->>DP: get_instance()
    DP->>DB: get_instance()
    DB->>DB: initialize_schema()
  end
  Lib-->>App: initialized
  note over App,DP: Runtime components emit events/samples via DP
Loading
sequenceDiagram
  autonumber
  actor App
  participant Comp as Component (e.g., AMD SMI / Sampling)
  participant DP as rocpd::data_processor
  participant DB as rocpd::data_storage::database

  Comp->>DP: insert_* (metadata/events/samples)
  DP->>DB: create_statement_executor(...)(bind, exec)
  DB-->>DP: last_insert_id (as needed)
  App->>DP: flush()
  DP->>DB: flush()
  DB->>DB: sqlite backup temp -> main
  DB-->>App: flushed
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

Poem

I hopped through schemas, views, and gleam,
A SQLite burrow for the profiling stream.
Agents counted, nodes named right,
Benchmarks blink in compile-time light.
With ROCPD carrots neatly stashed,
We flush the garden—data cached.
Thump-thump: performance tracks are hashed! 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description is just the unfilled repository template with checklist boxes unchecked and no Technical Details, so it is largely incomplete and does not document this very large, multi-file change. Required context such as related issue links, a concise summary of the implemented features (rocpd integration, SQLite submodule and CMake options, new schemas and data_processor, public API additions), testing instructions, and changelog confirmation are missing. Because the description lacks the minimal guidance needed for reviewers, the check fails. Please complete the PR template before merging: link related issue(s) or create one, check the appropriate PR type boxes, and add a clear Technical Details section summarizing the main changes (SQLite submodule and build option, rocPD data_processor and DB schema/views, new public APIs, and key files). Include build/test instructions (how to enable ROCPROFSYS_BUILD_SQLITE3 and ROCPROFSYS_ENABLE_BENCHMARK, steps to run unit/integration tests or manual validation), note any migration or runtime impacts, and update CHANGELOG.md if required; then mark the checklist items accordingly. This will provide reviewers the necessary context to focus on high-risk areas and validate the changes.
Docstring Coverage ⚠️ Warning Docstring coverage is 11.52% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title "Rocpd part1" references the ROCPD work introduced in this changeset and therefore is related to the main change (adding rocpd support, SQLite integration, and related schemas/processing). It is short and focused but terse and does not describe the scope or key components added. Because it correctly names the primary area of change, it passes the title relevance check.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c424dac and 7a429bf.

📒 Files selected for processing (49)
  • .gitmodules (1 hunks)
  • CHANGELOG.md (1 hunks)
  • CMakeLists.txt (2 hunks)
  • cmake/Packages.cmake (2 hunks)
  • cmake/SQLite3.cmake (1 hunks)
  • source/bin/rocprof-sys-avail/avail.cpp (2 hunks)
  • source/lib/CMakeLists.txt (1 hunks)
  • source/lib/common/CMakeLists.txt (1 hunks)
  • source/lib/common/md5sum.hpp (1 hunks)
  • source/lib/common/traits.hpp (1 hunks)
  • source/lib/core/CMakeLists.txt (3 hunks)
  • source/lib/core/agent.hpp (1 hunks)
  • source/lib/core/agent_manager.cpp (1 hunks)
  • source/lib/core/agent_manager.hpp (1 hunks)
  • source/lib/core/benchmark/benchmark.hpp (1 hunks)
  • source/lib/core/benchmark/category.hpp (1 hunks)
  • source/lib/core/config.cpp (3 hunks)
  • source/lib/core/config.hpp (1 hunks)
  • source/lib/core/cpu.cpp (1 hunks)
  • source/lib/core/cpu.hpp (1 hunks)
  • source/lib/core/gpu.cpp (6 hunks)
  • source/lib/core/node_info.cpp (1 hunks)
  • source/lib/core/node_info.hpp (1 hunks)
  • source/lib/core/rocpd/CMakeLists.txt (1 hunks)
  • source/lib/core/rocpd/data_processor.cpp (1 hunks)
  • source/lib/core/rocpd/data_processor.hpp (1 hunks)
  • source/lib/core/rocpd/data_storage/CMakeLists.txt (1 hunks)
  • source/lib/core/rocpd/data_storage/database.cpp (1 hunks)
  • source/lib/core/rocpd/data_storage/database.hpp (1 hunks)
  • source/lib/core/rocpd/data_storage/insert_query_builders.hpp (1 hunks)
  • source/lib/core/rocpd/data_storage/schema/data_views.sql (1 hunks)
  • source/lib/core/rocpd/data_storage/schema/marker_views.sql (1 hunks)
  • source/lib/core/rocpd/data_storage/schema/rocpd_indexes.sql (1 hunks)
  • source/lib/core/rocpd/data_storage/schema/rocpd_tables.sql (1 hunks)
  • source/lib/core/rocpd/data_storage/schema/rocpd_views.sql (1 hunks)
  • source/lib/core/rocpd/data_storage/schema/summary_views.sql (1 hunks)
  • source/lib/core/rocpd/data_storage/table_insert_query.hpp (1 hunks)
  • source/lib/core/rocpd/json.cpp (1 hunks)
  • source/lib/core/rocpd/json.hpp (1 hunks)
  • source/lib/rocprof-sys/library.cpp (8 hunks)
  • source/lib/rocprof-sys/library/amd_smi.cpp (8 hunks)
  • source/lib/rocprof-sys/library/components/backtrace_metrics.cpp (4 hunks)
  • source/lib/rocprof-sys/library/components/backtrace_metrics.hpp (2 hunks)
  • source/lib/rocprof-sys/library/components/comm_data.cpp (16 hunks)
  • source/lib/rocprof-sys/library/components/comm_data.hpp (1 hunks)
  • source/lib/rocprof-sys/library/cpu_freq.cpp (6 hunks)
  • source/lib/rocprof-sys/library/kokkosp.cpp (6 hunks)
  • source/lib/rocprof-sys/library/rocprofiler-sdk/counters.cpp (1 hunks)
  • source/lib/rocprof-sys/library/sampling.cpp (9 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 42

🧹 Nitpick comments (48)
source/bin/rocprof-sys-avail/avail.cpp (1)

120-127: Update stale comment to reflect delayed GPU enumeration.

The comment says we initialize HIP before main, but gpu_count is now set in main after rocprofsys_init_library(). Update the comment to avoid confusion.

-// initialize HIP before main so that librocprof-sys is not HSA_TOOLS_LIB
+// GPU count is queried after rocprofsys_init_library() in main to ensure ROCm/HSA
+// initialization completes before agent enumeration.
 int gpu_count = 0;
source/lib/core/cpu.hpp (2)

32-42: Consider adding documentation and default initialization consistency.

The struct is well-designed but could benefit from:

  1. Brief comment explaining the struct's purpose
  2. Consider using brace initialization for consistency (some fields use {}, others use -1)
+// Information extracted from /proc/cpuinfo for a single CPU core
 struct cpu_info
 {
-    long        processor   = -1;
-    long        family      = -1;
-    long        model       = -1;
-    long        physical_id = -1;
-    long        core_id     = -1;
-    long        apicid      = -1;
-    std::string vendor_id   = {};
-    std::string model_name  = {};
+    long        processor   = {};
+    long        family      = {};
+    long        model       = {};
+    long        physical_id = {};
+    long        core_id     = {};
+    long        apicid      = {};
+    std::string vendor_id   = {};
+    std::string model_name  = {};
 };

44-48: Add brief function documentation.

Consider adding brief documentation to clarify the distinction between these functions.

+// Parse /proc/cpuinfo and return raw CPU information data
 std::vector<cpu_info>
 process_cpu_info_data();

+// Get cached or fresh CPU information (preferred for most use cases)
 std::vector<cpu_info>
 get_cpu_info();
source/lib/common/traits.hpp (1)

37-59: Consider using SFINAE-based approach for better template metaprogramming.

The current approach with specializations is functional but could be more elegant using SFINAE. However, since this is working code and the approach is clear, this is optional.

Alternative approach:

template <typename T>
struct is_string_literal_impl {
    template <typename U>
    static auto test(int) -> decltype(
        std::is_same_v<U, std::string> ||
        std::is_same_v<U, std::string_view> ||
        std::is_same_v<U, const char*> ||
        std::is_same_v<U, char*>, std::true_type{});
    template <typename>
    static std::false_type test(...);
    using type = decltype(test<T>(0));
};
source/lib/core/agent_manager.hpp (1)

43-46: Consider adding const overloads and error handling documentation.

The getter methods should clarify what happens when an agent is not found (exception vs. default behavior).

Consider documenting the behavior:

// Throws exception if agent not found with specified id and type
const agent& get_agent_by_id(size_t device_id, agent_type type) const;
// Throws exception if agent not found with specified handle and type  
const agent& get_agent_by_handle(size_t device_id, agent_type type) const;
// Throws exception if agent not found with specified handle
const agent& get_agent_by_handle(size_t device_handle) const;
source/lib/core/rocpd/data_storage/schema/marker_views.sql (1)

1-4: Empty placeholder file needs completion or removal.

This file contains only comments but no SQL definitions. Consider either implementing the marker views or removing this placeholder if not yet needed.

Would you like me to help implement marker view definitions based on the rocPD schema, or should this file be removed until the marker functionality is ready?

source/lib/core/rocpd/data_storage/schema/rocpd_indexes.sql (1)

1-46: Review index design for query optimization.

The comprehensive index definitions follow good practices, but there are some concerns about potential redundancy and query pattern alignment.

Positive aspects:

  • Column order considerations are important for concatenated indexes
  • Covers common query patterns with guid-based filtering
  • Includes both single and composite indexes for flexibility

Areas for improvement:

  • Lines 41-45: Creating separate single-column indexes on guid, nid, pid, start, and end for the same table may be redundant if composite indexes already cover these access patterns
  • Lines 36-39: The composite indexes on (guid, nid, pid) might duplicate filtering capabilities already provided by the (id, guid) indexes on Lines 8-28

Consider the following optimizations:

  1. Analyze actual query patterns to determine if all single-column indexes on Lines 41-45 are necessary
  2. Review composite index overlap - the (guid, nid, pid) indexes may be redundant if queries typically filter by guid first
  3. Consider query selectivity - indexes should start with columns from WHERE clauses followed by JOIN conditions

Here's a potential optimization for the region table indexes:

--- CREATE INDEX `rocpd_region{{uuid}}_guid_nid_pid_idx` ON `rocpd_region{{uuid}}` ("guid", "nid", "pid");
--- CREATE INDEX `rocpd_region{{uuid}}_guid_idx` ON `rocpd_region{{uuid}}` ("guid");
--- CREATE INDEX `rocpd_region{{uuid}}_nid_idx` ON `rocpd_region{{uuid}}` ("nid");
--- CREATE INDEX `rocpd_region{{uuid}}_pid_idx` ON `rocpd_region{{uuid}}` ("pid");
+-- CREATE INDEX `rocpd_region{{uuid}}_guid_nid_pid_idx` ON `rocpd_region{{uuid}}` ("guid", "nid", "pid");
+-- Single column indexes only if needed for queries that don't filter by guid first
source/lib/core/rocpd/data_storage/table_insert_query.hpp (1)

36-52: Prefer explicit constructor and consider move semantics.

The constructor should be marked explicit to prevent unintended implicit conversions. Also, consider taking the table name by value and moving it for better performance with temporary strings.

 struct table_insert_query
 {
-    table_insert_query()
+    explicit table_insert_query()
     : _query_columns_builder{ _ss }
     {}
 
-    query_builders::query_columns_builder& set_table_name(const std::string& tableName)
+    query_builders::query_columns_builder& set_table_name(std::string tableName)
     {
         _ss.str("");
-        _ss << "INSERT INTO " << tableName << " ";
+        _ss << "INSERT INTO " << std::move(tableName) << " ";
         return _query_columns_builder;
     }
source/lib/core/CMakeLists.txt (1)

106-120: Consider making schema file installation optional.

The schema files are being installed unconditionally. Consider adding a CMake option to control whether these files should be installed, similar to other optional components.

+option(ROCPROFSYS_INSTALL_SCHEMAS "Install ROCPD schema files" ON)
+
 file(GLOB ROCPD_SCHEMA_FILES "${CMAKE_CURRENT_LIST_DIR}/rocpd/data_storage/schema/*.sql")
 
+if(ROCPROFSYS_INSTALL_SCHEMAS)
 foreach(_SRC ${ROCPD_SCHEMA_FILES})
     cmake_path(GET _SRC FILENAME _BASE)
     configure_file(
         ${_SRC}
         ${PROJECT_BINARY_DIR}/${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/${_BASE}
         COPYONLY
     )
     install(
         FILES ${PROJECT_BINARY_DIR}/${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/${_BASE}
         DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}
         COMPONENT core
     )
 endforeach()
+endif()
source/lib/rocprof-sys/library/kokkosp.cpp (1)

190-193: Consider using string_view for const char parameters.*

The ROCPD event processing function takes multiple const char* parameters. Consider using std::string_view for better safety and to avoid potential null pointer issues.

 void
-rocpd_process_kokkos_event(const char* name, const char* event_type, const char* target,
+rocpd_process_kokkos_event(std::string_view name, std::string_view event_type, std::string_view target,
                            uint64_t timestamp_ns)
 {
     auto& data_processor = get_data_processor();
     auto  event_metadata = rocpd::json::create();
 
-    event_metadata->set("name", name);
-    event_metadata->set("event_type", event_type);
-    event_metadata->set("target", target);
+    event_metadata->set("name", std::string(name));
+    event_metadata->set("event_type", std::string(event_type));
+    event_metadata->set("target", std::string(target));
source/lib/core/agent.hpp (1)

55-56: Consider using std::optional for optional fields.

The device_id and base_id fields use 0 as a default/invalid value, which could be ambiguous since 0 is a valid device ID.

+#include <optional>
+
 struct agent
 {
     agent_type  type;
     uint64_t    id;
     uint32_t    node_id;
     int32_t     logical_node_id;
     int32_t     logical_node_type_id;
     std::string name;
     std::string model_name;
     std::string vendor_name;
     std::string product_name;
 
-    size_t device_id{ 0 };
-    size_t base_id{ 0 };
+    std::optional<size_t> device_id;
+    std::optional<size_t> base_id;
 #if ROCPROFSYS_USE_ROCM > 0
     amdsmi_processor_handle smi_handle = nullptr;
 #endif
 };
source/lib/core/rocpd/json.hpp (2)

36-41: Consider using std::any or a more robust JSON library.

The current JSON implementation using std::variant may face scalability issues as you add more types. Additionally, recursive structures (e.g., std::shared_ptr<json> in the variant) can lead to complexity.

Consider adopting a well-established JSON library like nlohmann/json or RapidJSON for better performance, feature completeness, and maintainability. If you need to keep this lightweight implementation, consider using std::any for more flexibility:

-    using json_value =
-        std::variant<std::string, int, double, long long, bool, std::vector<json>,
-                     std::nullptr_t, std::shared_ptr<json>>;
+    using json_value = std::any;

50-51: Expose stringify as a free function for better extensibility.

Making stringify static and private limits extensibility. If users need custom serialization logic or debugging capabilities, they cannot access this utility.

Consider making stringify a free function or providing it as part of the public API:

-private:
-    static std::string stringify(const std::shared_ptr<json_value>& value);
+public:
+    static std::string stringify(const json_value& value);
source/lib/core/gpu.cpp (1)

259-259: Clear does not deallocate vector memory.

Using clear() only removes elements but keeps allocated memory. If processors_list grows large and then shrinks, memory won't be released.

If memory usage is a concern, consider using shrink_to_fit() or swap with an empty vector:

-    processors::processors_list.clear();
+    processors::processors_list.clear();
+    processors::processors_list.shrink_to_fit();
source/lib/rocprof-sys/library/sampling.cpp (2)

1079-1087: Variable name 'line' is confusing for a buffer object.

The variable name line at lines 1079-1087 suggests it contains line data, but it's actually a buffer object. This makes the code harder to understand.

Rename the variable to better reflect its purpose:

-    for(auto line : _loaded_data)
+    for(auto buffer : _loaded_data)
     {
-        while(!line.is_empty())
+        while(!buffer.is_empty())
         {
             auto _v = sampler_bundle_t{};
-            line.read(&_v);
+            buffer.read(&_v);
             _raw_data.emplace_back(std::move(_v));
         }
-        line.destroy();
+        buffer.destroy();
     }

225-228: Singleton accessor pattern is repetitive.

The pattern of creating a local function to access the singleton is repeated across multiple files. This could be consolidated.

Consider creating a common header with a template function or macro to reduce duplication:

// In a common header:
template<typename T>
inline T& get_singleton_instance() {
    return T::get_instance();
}

// Then use:
auto& data_processor = get_singleton_instance<rocpd::data_processor>();
source/lib/core/rocpd/json.cpp (1)

91-92: Numeric overflow/underflow not handled.

The generic operator<< for numeric types doesn't handle special floating-point values like NaN or infinity, which would produce invalid JSON.

Add special handling for non-finite floating-point values:

 else
 {
-    // handle int + double
-    oss << arg;
+    // handle int + double
+    if constexpr(std::is_floating_point_v<T>) {
+        if (std::isnan(arg)) oss << "null";
+        else if (std::isinf(arg)) oss << (arg > 0 ? "1e308" : "-1e308");
+        else oss << arg;
+    } else {
+        oss << arg;
+    }
 }
source/lib/core/node_info.hpp (1)

45-53: Public mutable data members violate encapsulation.

Having all data members public and mutable in a singleton allows any code to modify the node information, which could lead to inconsistent state and makes the code harder to maintain.

Consider making the data members private with const getters:

+private:
     uint64_t    id          = 0;
     uint64_t    hash        = 0;
     std::string machine_id  = {};
     std::string system_name = {};
     std::string node_name   = {};
     std::string release     = {};
     std::string version     = {};
     std::string machine     = {};
     std::string domain_name = {};
+
+public:
+    uint64_t get_id() const { return id; }
+    uint64_t get_hash() const { return hash; }
+    const std::string& get_machine_id() const { return machine_id; }
+    // ... other getters
source/lib/rocprof-sys/library/amd_smi.cpp (3)

483-490: Consider reducing code duplication

The pattern of checking use_perfetto and use_rocpd flags repeats throughout the post-processing. Consider extracting these checks into helper functions for cleaner code.

-    auto use_perfetto = get_use_perfetto();
-    auto use_rocpd    = get_use_rocpd();
-
-    if(use_rocpd)
-    {
-        rocpd_initialize_smi_pmc(_dev_id);
-    }
+    auto use_perfetto = get_use_perfetto();
+    auto use_rocpd    = get_use_rocpd();
+
+    auto initialize_rocpd_if_enabled = [&]() {
+        if(use_rocpd) rocpd_initialize_smi_pmc(_dev_id);
+    };
+    
+    initialize_rocpd_if_enabled();

554-573: Check for VCN/JPEG support with bounds safety.

The code accesses itr.m_xcp_metrics[0].vcn_busy without checking if m_xcp_metrics is empty when VCN activity is supported. While the learnings indicate that at least one row is always provided when supported, adding a defensive check would prevent potential crashes.

-                else if(gpu::is_vcn_activity_supported(_dev_id))
-                {
-                    // For VCN activity, use simple indexing
-                    for(std::size_t i = 0; i < std::size(itr.m_xcp_metrics[0].vcn_busy);
-                        ++i)
-                        counter_track::emplace(_dev_id, addendum_blk(i, "VCN Activity"),
-                                               "%");
-                }
+                else if(gpu::is_vcn_activity_supported(_dev_id))
+                {
+                    // For VCN activity, use simple indexing
+                    if(!itr.m_xcp_metrics.empty())
+                    {
+                        for(std::size_t i = 0; i < std::size(itr.m_xcp_metrics[0].vcn_busy);
+                            ++i)
+                            counter_track::emplace(_dev_id, addendum_blk(i, "VCN Activity"),
+                                                   "%");
+                    }
+                }

582-588: Add bounds check for JPEG activity access.

Similar to VCN activity, add a defensive check when accessing m_xcp_metrics[0].jpeg_busy for JPEG activity support.

-                else if(gpu::is_jpeg_activity_supported(_dev_id))
-                {
-                    for(std::size_t i = 0; i < std::size(itr.m_xcp_metrics[0].jpeg_busy);
-                        ++i)
-                        counter_track::emplace(_dev_id, addendum_blk(i, "JPEG Activity"),
-                                               "%");
-                }
+                else if(gpu::is_jpeg_activity_supported(_dev_id))
+                {
+                    if(!itr.m_xcp_metrics.empty())
+                    {
+                        for(std::size_t i = 0; i < std::size(itr.m_xcp_metrics[0].jpeg_busy);
+                            ++i)
+                            counter_track::emplace(_dev_id, addendum_blk(i, "JPEG Activity"),
+                                                   "%");
+                    }
+                }
source/lib/core/agent_manager.cpp (1)

91-105: Consider unifying handle-based lookups.

Both get_agent_by_handle overloads perform nearly identical operations with the only difference being the type comparison. Consider implementing one in terms of the other to reduce duplication.

 const agent&
 agent_manager::get_agent_by_handle(size_t device_handle) const
 {
     ROCPROFSYS_VERBOSE(3, "Getting agent for device handle: %ld\n", device_handle);
-    auto _agent =
-        std::find_if(_agents.begin(), _agents.end(), [&](const auto& agent_ptr) {
-            return agent_ptr->id == device_handle;
-        });
-    if(_agent == _agents.end())
-    {
-        std::ostringstream oss;
-        oss << "Agent not found for device handle: " << device_handle;
-        throw std::out_of_range(oss.str());
-    }
-    return **_agent;
+    for(const auto& agent_ptr : _agents)
+    {
+        if(agent_ptr->id == device_handle)
+            return *agent_ptr;
+    }
+    std::ostringstream oss;
+    oss << "Agent not found for device handle: " << device_handle;
+    throw std::out_of_range(oss.str());
 }
source/lib/core/rocpd/data_storage/database.cpp (1)

73-77: Add null checks in destructor.

The destructor should check for null pointers before closing to handle partial initialization scenarios.

 database::~database()
 {
-    sqlite3_close(_sqlite3_db_temp);
-    sqlite3_close(_sqlite3_db);
+    if(_sqlite3_db_temp) sqlite3_close(_sqlite3_db_temp);
+    if(_sqlite3_db) sqlite3_close(_sqlite3_db);
 }
source/lib/rocprof-sys/library.cpp (2)

310-328: Add error handling for /proc/cmdline read.

The function reads from /proc/<pid>/cmdline but doesn't check for read errors or handle the case where the file might not exist (e.g., for non-existent PIDs).

 std::vector<std::string>
 read_command_line(pid_t _pid)
 {
     auto _cmdline = std::vector<std::string>{};
     auto fcmdline = std::stringstream{};
     fcmdline << "/proc/" << _pid << "/cmdline";
     auto ifs = std::ifstream{ fcmdline.str().c_str() };
     if(ifs)
     {
         std::string sarg;
         while(std::getline(ifs, sarg, '\0'))
         {
             _cmdline.push_back(sarg);
         }
         ifs.close();
     }
+    else
+    {
+        ROCPROFSYS_DEBUG_F("Failed to open cmdline file for PID %d\n", _pid);
+    }
 
     return _cmdline;
 }

496-506: Review remaining static _once guards for fork-safety (PID guard can wrap).

rg found multiple static bool _once instances — review and align them with a fork-safe strategy (either convert to pid_t-based guards consistently or use a reset/generation approach to avoid PID wrap issues).

Locations:

  • source/lib/rocprof-sys/library/components/comm_data.cpp:224
  • source/lib/rocprof-sys/library.cpp:382
  • source/lib/rocprof-sys/library.cpp:429
  • source/lib/rocprof-sys-dl/dl.cpp:1254
  • source/lib/core/config.cpp:210
source/lib/core/cpu.cpp (2)

18-23: Add error logging for file open failure.

When /proc/cpuinfo cannot be opened, the function silently returns an empty vector. Consider adding debug logging to help diagnose issues.

 if(!cpuinfo_file.is_open())
 {
+    ROCPROFSYS_DEBUG("Failed to open /proc/cpuinfo\n");
     return cpu_data;
 }

121-126: Thread safety concern with static variable.

The get_cpu_info() function uses a static variable without synchronization. While initialization is thread-safe in C++11+, if process_cpu_info_data() is ever called again to refresh the data, there could be race conditions.

Consider adding synchronization if the CPU info might be refreshed:

 std::vector<cpu_info>
 get_cpu_info()
 {
-    static auto _v = process_cpu_info_data();
+    static std::once_flag _init_flag;
+    static std::vector<cpu_info> _v;
+    std::call_once(_init_flag, []() { _v = process_cpu_info_data(); });
     return _v;
 }
cmake/SQLite3.cmake (1)

29-29: Build command uses non-standard syntax.

The build command ${MAKE_COMMAND} install -s is unusual. Typically, you'd have separate build and install steps. The -s flag strips symbols during installation, which might not be desirable for debugging.

Consider separating the build and install steps for clarity:

-        BUILD_COMMAND ${MAKE_COMMAND} install -s
+        BUILD_COMMAND ${MAKE_COMMAND} -j
         INSTALL_COMMAND ""

Or if you want to keep the install:

-        BUILD_COMMAND ${MAKE_COMMAND} install -s
+        BUILD_COMMAND ${MAKE_COMMAND} -j
+        INSTALL_COMMAND ${MAKE_COMMAND} install
source/lib/rocprof-sys/library/components/comm_data.cpp (1)

169-195: Device ID hardcoded to 0 for CPU events.

The function rocpd_process_cpu_usage_events takes a device_id parameter but then uses agent.device_id at line 181, which may not match the passed parameter. Additionally, the function is always called with device_id=0.

Either use the passed device_id consistently or remove the parameter if it's always 0:

-    data_processor.insert_pmc_event(event_id, agent.device_id, name, value);
+    data_processor.insert_pmc_event(event_id, agent.base_id, name, value);
source/lib/core/rocpd/data_storage/schema/rocpd_tables.sql (1)

221-221: TODO comment indicates design uncertainty.

The comment suggests reconsidering whether to store values directly or use a reference to a string table. This is an important design decision that affects storage efficiency and query performance.

The current design stores values as TEXT directly. Using a value_id referencing a string table would:

  • Reduce storage for repeated values
  • Improve query performance for equality comparisons
  • Add complexity for insertions

Would you like me to help implement a string-table-based design or create an issue to track this optimization?

source/lib/core/rocpd/data_storage/schema/summary_views.sql (2)

11-12: Potential integer overflow in millisecond conversion.

The duration calculation SUM(K.end - K.start) / 1000.0 assumes the time values are in microseconds. Consider documenting the time unit explicitly or using a named constant for clarity.


36-37: Consider renaming the 'Busy' metric for clarity.

The "Busy" metric (GPU utilization) might benefit from a more descriptive name like gpu_utilization_ratio or gpu_busy_percentage (if multiplied by 100) to make its meaning clearer.

source/lib/rocprof-sys/library/cpu_freq.cpp (1)

353-362: Consider extracting CPU device handling to improve clarity.

The comment on lines 352-356 indicates potential architectural misalignment between CPU and GPU device handling. This should be tracked as technical debt.

Would you like me to open an issue to track the TODO about aligning get_enabled_cpus() with GPU implementation patterns?

source/lib/common/md5sum.hpp (1)

436-438: Potential buffer overflow with snprintf.

The snprintf call writes to a fixed buffer but the loop could potentially write beyond bounds if the format is changed.

Consider using safer string formatting:

-    char buf[33];
-    for(int i = 0; i < 16; i++)
-        snprintf(buf + i * 2, 3, "%02x", digest[i]);
-    buf[32] = '\0';
+    std::ostringstream oss;
+    for(int i = 0; i < 16; i++)
+        oss << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(digest[i]);
+    return oss.str();
source/lib/core/rocpd/data_storage/database.hpp (1)

70-71: Inconsistent error handling in constraint violation case.

The code appears to check constraints but doesn't use the proper error message. Line 70 has an extra quote after the colon.

-                ss << "Constraint violation(s): " << "\n";
+                ss << "Constraint violation(s):\n";
source/lib/core/rocpd/data_storage/schema/data_views.sql (1)

242-242: Useless duration calculation in sample_regions.

Line 242 calculates duration as (R.timestamp - R.timestamp) which will always be 0.

Either remove the duration field or document why it's always 0 for sample regions:

-    (R.timestamp - R.timestamp) AS duration,
+    0 AS duration,  -- Samples are point-in-time events with no duration
source/lib/core/rocpd/data_processor.cpp (3)

73-74: Potential race condition in string deduplication.

The string lookup and insertion pattern has a time-of-check-time-of-use (TOCTOU) issue. Between checking _string_map.find(str) and accessing _string_map.at(str), another thread could modify the map if the mutex scope was different.

The current code is actually safe because the mutex is held throughout, but line 74 uses .at() which will throw if the key doesn't exist. Since you just checked with find(), use the iterator directly:

-    if(it != _string_map.end()) return _string_map.at(str);
+    if(it != _string_map.end()) return it->second;

619-638: Consider null check for optional agent_id.

While the code correctly handles the optional agent_id, consider adding defensive validation.

Add validation for better error handling:

 void
 data_processor::insert_memory_alloc(size_t node_id, size_t process_id, size_t thread_id,
                                     std::optional<size_t> agent_id, const char* type,
                                     const char* level, uint64_t start, uint64_t end,
                                     size_t address, size_t size, size_t queue_id,
                                     size_t stream_id, size_t event_id,
                                     const char* extdata)
 {
+    std::lock_guard<std::mutex> lock(_data_mutex);
     if(agent_id.has_value())
     {
         _insert_memory_alloc_statement(_upid.c_str(), node_id, process_id, thread_id,
                                        agent_id.value(), type, level, start, end, address,
                                        size, queue_id, stream_id, event_id, extdata);
     }

32-49: Consider validating database initialization.

The constructor assumes database initialization always succeeds but doesn't handle potential failures.

Add error checking:

 data_processor::data_processor()
 {
-    data_storage::database::get_instance().initialize_schema();
-    _upid = data_storage::database::get_instance().get_upid();
+    auto& db = data_storage::database::get_instance();
+    db.initialize_schema();
+    _upid = db.get_upid();
+    
+    if(_upid.empty())
+    {
+        throw std::runtime_error("Failed to initialize data processor: invalid UPID");
+    }
source/lib/core/rocpd/data_processor.hpp (4)

42-43: Fix typo in type alias name.

The type alias insert_pmc_event_stms has a typo - should be insert_pmc_event_stmt for consistency.

-    using insert_pmc_event_stms =
+    using insert_pmc_event_stmt =
         std::function<void(const char*, size_t, size_t, double, const char*)>;

181-181: Fix typo in parameter name.

The parameter kernarg_segmnt_size is missing an 'e' - should be kernarg_segment_size.

-                              uint32_t kernarg_segmnt_size,
+                              uint32_t kernarg_segment_size,

235-235: Update member variable type to match corrected alias.

Following the type alias correction above, update the member variable declaration.

-    insert_pmc_event_stms             _insert_pmc_event_statement;
+    insert_pmc_event_stmt             _insert_pmc_event_statement;

79-102: Consider simplifying the PMC identifier implementation.

The custom hash and equality functors could be replaced with a simpler approach using std::pair.

You could simplify this by using std::pair instead of a custom struct:

-    struct pmc_identifier
-    {
-        size_t      agent_id;
-        std::string name;
-    };
-
-    struct pmc_identifier_hash
-    {
-        std::size_t operator()(const pmc_identifier& pmc) const noexcept
-        {
-            std::size_t h1 = std::hash<size_t>{}(pmc.agent_id);
-            std::size_t h2 = std::hash<std::string>{}(pmc.name);
-            return h1 ^ (h2 << 1);
-        }
-    };
-
-    struct pmc_identifier_equal
-    {
-        bool operator()(const pmc_identifier& lhs,
-                        const pmc_identifier& rhs) const noexcept
-        {
-            return lhs.agent_id == rhs.agent_id && lhs.name == rhs.name;
-        }
-    };
+    using pmc_identifier = std::pair<size_t, std::string>;

Then update the map declaration on line 223-224:

-    std::unordered_map<pmc_identifier, size_t, pmc_identifier_hash, pmc_identifier_equal>
-                                            _pmc_descriptor_map;
+    std::unordered_map<pmc_identifier, size_t> _pmc_descriptor_map;

The standard library provides hash specialization for std::pair that will work automatically.

source/lib/core/benchmark/category.hpp (1)

50-65: Consider explicitly handling the count sentinel in to_string (or assert).

Currently count falls into default → "unknown". Either map it to a specific string (e.g., "count") or add an assertion to catch accidental use.

Apply this minimal mapping if desired:

         case category::sdk_tool_buffered_tracing: return "sdk_tool_buffered_tracing";
-        default: return "unknown";
+        case category::count: return "count";
+        default: return "unknown";
source/lib/core/benchmark/benchmark.hpp (4)

163-189: Env var UX: support both ROCPROFSYS_BENCHMARK_CATEGORIES and BENCHMARK_CATEGORIES; fix message.

Current log message hardcodes one var and ignores the common alias. Fall back to BENCHMARK_CATEGORIES if the primary is empty, and update the warning.

Apply this diff:

-    static void init_from_env(const char* envVar = "ROCPROFSYS_BENCHMARK_CATEGORIES")
+    static void init_from_env(const char* envVar = "ROCPROFSYS_BENCHMARK_CATEGORIES")
     {
         std::lock_guard lock(m_mutex);
-        const auto*     env = std::getenv(envVar);
-        if(env == nullptr || std::string(env).empty())
+        const char*     env = std::getenv(envVar);
+        if(env == nullptr || std::string(env).empty())
         {
-            ROCPROFSYS_WARNING(1, "No BENCHMARK categories specified in environment "
-                                  "variable ROCPROFSYS_BENCHMARK_CATEGORIES.\n");
-            return;
+            // Try common alias
+            env = std::getenv("BENCHMARK_CATEGORIES");
+            if(env == nullptr || std::string(env).empty())
+            {
+                ROCPROFSYS_WARNING(1, "No BENCHMARK categories specified in environment "
+                                      "variables ROCPROFSYS_BENCHMARK_CATEGORIES or BENCHMARK_CATEGORIES.\n");
+                return;
+            }
         }
         std::string        _str(env);

340-344: Make the public init default match the internal default.

Public wrapper defaults to "BENCHMARK_CATEGORIES" while internal uses "ROCPROFSYS_BENCHMARK_CATEGORIES". Align them.

Apply this diff:

-inline void
-init_from_env(const char* envVar = "BENCHMARK_CATEGORIES")
+inline void
+init_from_env(const char* envVar = "ROCPROFSYS_BENCHMARK_CATEGORIES")
 {
     _benchmark_impl::init_from_env(envVar);
 }

111-115: Prefer steady_clock for interval timing.

high_resolution_clock may be aliasing system_clock on some libstdc++ versions; steady_clock avoids time jumps.

Apply this diff:

-    using clock                             = std::chrono::high_resolution_clock;
+    using clock                             = std::chrono::steady_clock;

216-244: Optional: colorize only if stdout is a TTY.

ANSI codes can pollute logs in non-TTY environments. Gate color on isatty(STDOUT_FILENO).

Minimal pattern:

-        std::cout << "\033[32m"
+        const bool use_color = ::isatty(STDOUT_FILENO);
+        if(use_color) std::cout << "\033[32m";
@@
-        std::cout << std::string(_category + _calls + _total + _avg + _min + _max, '=')
-                  << "\033[0m" << "\n\n";
+        std::cout << std::string(_category + _calls + _total + _avg + _min + _max, '=');
+        if(use_color) std::cout << "\033[0m";
+        std::cout << "\n\n";
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c424dac and 7a429bf.

📒 Files selected for processing (49)
  • .gitmodules (1 hunks)
  • CHANGELOG.md (1 hunks)
  • CMakeLists.txt (2 hunks)
  • cmake/Packages.cmake (2 hunks)
  • cmake/SQLite3.cmake (1 hunks)
  • source/bin/rocprof-sys-avail/avail.cpp (2 hunks)
  • source/lib/CMakeLists.txt (1 hunks)
  • source/lib/common/CMakeLists.txt (1 hunks)
  • source/lib/common/md5sum.hpp (1 hunks)
  • source/lib/common/traits.hpp (1 hunks)
  • source/lib/core/CMakeLists.txt (3 hunks)
  • source/lib/core/agent.hpp (1 hunks)
  • source/lib/core/agent_manager.cpp (1 hunks)
  • source/lib/core/agent_manager.hpp (1 hunks)
  • source/lib/core/benchmark/benchmark.hpp (1 hunks)
  • source/lib/core/benchmark/category.hpp (1 hunks)
  • source/lib/core/config.cpp (3 hunks)
  • source/lib/core/config.hpp (1 hunks)
  • source/lib/core/cpu.cpp (1 hunks)
  • source/lib/core/cpu.hpp (1 hunks)
  • source/lib/core/gpu.cpp (6 hunks)
  • source/lib/core/node_info.cpp (1 hunks)
  • source/lib/core/node_info.hpp (1 hunks)
  • source/lib/core/rocpd/CMakeLists.txt (1 hunks)
  • source/lib/core/rocpd/data_processor.cpp (1 hunks)
  • source/lib/core/rocpd/data_processor.hpp (1 hunks)
  • source/lib/core/rocpd/data_storage/CMakeLists.txt (1 hunks)
  • source/lib/core/rocpd/data_storage/database.cpp (1 hunks)
  • source/lib/core/rocpd/data_storage/database.hpp (1 hunks)
  • source/lib/core/rocpd/data_storage/insert_query_builders.hpp (1 hunks)
  • source/lib/core/rocpd/data_storage/schema/data_views.sql (1 hunks)
  • source/lib/core/rocpd/data_storage/schema/marker_views.sql (1 hunks)
  • source/lib/core/rocpd/data_storage/schema/rocpd_indexes.sql (1 hunks)
  • source/lib/core/rocpd/data_storage/schema/rocpd_tables.sql (1 hunks)
  • source/lib/core/rocpd/data_storage/schema/rocpd_views.sql (1 hunks)
  • source/lib/core/rocpd/data_storage/schema/summary_views.sql (1 hunks)
  • source/lib/core/rocpd/data_storage/table_insert_query.hpp (1 hunks)
  • source/lib/core/rocpd/json.cpp (1 hunks)
  • source/lib/core/rocpd/json.hpp (1 hunks)
  • source/lib/rocprof-sys/library.cpp (8 hunks)
  • source/lib/rocprof-sys/library/amd_smi.cpp (8 hunks)
  • source/lib/rocprof-sys/library/components/backtrace_metrics.cpp (4 hunks)
  • source/lib/rocprof-sys/library/components/backtrace_metrics.hpp (2 hunks)
  • source/lib/rocprof-sys/library/components/comm_data.cpp (16 hunks)
  • source/lib/rocprof-sys/library/components/comm_data.hpp (1 hunks)
  • source/lib/rocprof-sys/library/cpu_freq.cpp (6 hunks)
  • source/lib/rocprof-sys/library/kokkosp.cpp (6 hunks)
  • source/lib/rocprof-sys/library/rocprofiler-sdk/counters.cpp (1 hunks)
  • source/lib/rocprof-sys/library/sampling.cpp (9 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-10T02:42:26.719Z
Learnt from: sputhala-amd
PR: sputhala-amd/rocprofiler-systems#1
File: source/lib/rocprof-sys/library/amd_smi.cpp:447-468
Timestamp: 2025-06-10T02:42:26.719Z
Learning: When VCN activity is supported by AMD GPUs in the AMD SMI API, there will always be at least one row of values provided, making bounds checking for `m_xcp_metrics[0]` unnecessary in the supported case.

Applied to files:

  • source/lib/rocprof-sys/library/amd_smi.cpp
🧬 Code graph analysis (21)
source/lib/core/node_info.cpp (4)
source/lib/core/node_info.hpp (4)
  • node_info (34-34)
  • node_info (37-37)
  • node_info (38-38)
  • node_info (39-39)
source/lib/core/rocpd/data_processor.cpp (2)
  • get_instance (51-56)
  • get_instance (52-52)
source/lib/core/rocpd/data_storage/database.cpp (2)
  • get_instance (53-58)
  • get_instance (54-54)
source/lib/core/agent_manager.cpp (2)
  • get_instance (31-36)
  • get_instance (32-32)
source/lib/core/cpu.cpp (3)
source/lib/core/cpu.hpp (3)
  • process_cpu_info_data (45-45)
  • get_cpu_info (48-48)
  • device_count (51-51)
source/lib/core/gpu.cpp (2)
  • device_count (161-170)
  • device_count (162-162)
source/lib/core/agent_manager.cpp (2)
  • get_instance (31-36)
  • get_instance (32-32)
source/lib/core/rocpd/json.cpp (1)
source/lib/core/rocpd/json.hpp (2)
  • key (43-43)
  • value (51-51)
source/lib/core/rocpd/data_storage/table_insert_query.hpp (1)
source/lib/core/rocpd/data_storage/insert_query_builders.hpp (1)
  • _ss (59-59)
source/lib/core/node_info.hpp (1)
source/lib/core/node_info.cpp (1)
  • node_info (34-63)
source/lib/rocprof-sys/library.cpp (5)
source/lib/core/rocpd/data_processor.cpp (2)
  • get_instance (51-56)
  • get_instance (52-52)
source/lib/core/agent_manager.cpp (2)
  • get_instance (31-36)
  • get_instance (32-32)
source/lib/core/node_info.cpp (2)
  • get_instance (65-70)
  • get_instance (66-66)
source/lib/core/cpu.cpp (2)
  • query_cpu_agents (135-166)
  • query_cpu_agents (136-136)
source/lib/core/config.cpp (6)
  • get_debug_init (1767-1771)
  • get_debug_init (1768-1768)
  • get_use_rocpd (2387-2392)
  • get_use_rocpd (2388-2388)
  • get_use_rocm (1842-1851)
  • get_use_rocm (1843-1843)
source/lib/core/benchmark/benchmark.hpp (1)
source/lib/core/benchmark/category.hpp (2)
  • to_string (50-65)
  • to_string (51-51)
source/lib/core/benchmark/category.hpp (1)
source/lib/core/benchmark/benchmark.hpp (2)
  • cat (263-266)
  • cat (263-263)
source/lib/core/config.cpp (1)
source/lib/common/environment.hpp (2)
  • get_env (151-164)
  • get_env (152-152)
source/lib/rocprof-sys/library/kokkosp.cpp (4)
source/lib/core/rocpd/data_processor.cpp (3)
  • get_instance (51-56)
  • get_instance (52-52)
  • data_processor (32-49)
source/lib/core/node_info.cpp (2)
  • get_instance (65-70)
  • get_instance (66-66)
source/lib/core/rocpd/json.cpp (2)
  • create (29-33)
  • create (30-30)
source/lib/core/config.cpp (2)
  • get_use_rocpd (2387-2392)
  • get_use_rocpd (2388-2388)
source/lib/rocprof-sys/library/components/comm_data.cpp (7)
source/lib/core/rocpd/data_processor.cpp (3)
  • get_instance (51-56)
  • get_instance (52-52)
  • data_processor (32-49)
source/lib/core/agent_manager.cpp (2)
  • get_instance (31-36)
  • get_instance (32-32)
source/lib/core/node_info.cpp (2)
  • get_instance (65-70)
  • get_instance (66-66)
source/lib/rocprof-sys/library/cpu_freq.cpp (4)
  • get_data_processor (87-91)
  • get_data_processor (88-88)
  • rocpd_process_cpu_usage_events (207-240)
  • rocpd_process_cpu_usage_events (208-212)
source/lib/core/rocpd/data_processor.hpp (4)
  • data_processor (205-205)
  • data_processor (206-206)
  • event_id (131-132)
  • event_id (192-193)
source/lib/core/agent_manager.hpp (3)
  • device_id (44-44)
  • device_id (45-45)
  • agent (43-43)
source/lib/core/config.cpp (6)
  • get_use_rocpd (2387-2392)
  • get_use_rocpd (2388-2388)
  • get_use_timemory (1817-1822)
  • get_use_timemory (1818-1818)
  • get_use_perfetto (1810-1815)
  • get_use_perfetto (1811-1811)
source/lib/core/rocpd/data_storage/database.cpp (3)
source/lib/core/node_info.cpp (2)
  • get_instance (65-70)
  • get_instance (66-66)
source/lib/core/rocpd/data_storage/database.hpp (6)
  • database (43-43)
  • database (48-48)
  • database (51-51)
  • validate_sqlite3_result (54-104)
  • validate_sqlite3_result (54-55)
  • query (165-165)
source/lib/core/config.cpp (2)
  • get_database_absolute_path (2360-2385)
  • get_database_absolute_path (2361-2361)
source/lib/core/agent_manager.cpp (1)
source/lib/core/agent_manager.hpp (4)
  • device_id (44-44)
  • device_id (45-45)
  • type (48-48)
  • device_handle (46-46)
source/bin/rocprof-sys-avail/avail.cpp (3)
source/lib/core/gpu.cpp (2)
  • device_count (161-170)
  • device_count (162-162)
source/lib/rocprof-sys/library/amd_smi.cpp (2)
  • device_count (832-836)
  • device_count (833-833)
source/lib/core/gpu.hpp (1)
  • device_count (76-76)
source/lib/rocprof-sys/library/cpu_freq.cpp (5)
source/lib/core/rocpd/data_processor.cpp (3)
  • get_instance (51-56)
  • get_instance (52-52)
  • data_processor (32-49)
source/lib/core/agent_manager.cpp (2)
  • get_instance (31-36)
  • get_instance (32-32)
source/lib/core/node_info.cpp (2)
  • get_instance (65-70)
  • get_instance (66-66)
source/lib/core/rocpd/data_processor.hpp (4)
  • data_processor (205-205)
  • data_processor (206-206)
  • event_id (131-132)
  • event_id (192-193)
source/lib/core/config.cpp (4)
  • get_use_perfetto (1810-1815)
  • get_use_perfetto (1811-1811)
  • get_use_rocpd (2387-2392)
  • get_use_rocpd (2388-2388)
source/lib/core/rocpd/data_storage/database.hpp (2)
source/lib/core/rocpd/data_storage/database.cpp (2)
  • database (60-71)
  • database (73-77)
source/lib/common/traits.hpp (2)
  • is_string_literal (63-68)
  • is_string_literal (64-64)
source/lib/rocprof-sys/library/sampling.cpp (11)
source/lib/core/rocpd/json.cpp (4)
  • to_string (41-58)
  • to_string (42-42)
  • create (29-33)
  • create (30-30)
source/lib/rocprof-sys/library/components/backtrace_metrics.cpp (10)
  • get_hw_counter_labels (137-142)
  • get_hw_counter_labels (138-138)
  • get_data_processor (144-148)
  • get_data_processor (145-145)
  • post_process_rocpd (643-678)
  • post_process_rocpd (644-644)
  • init_rocpd (480-513)
  • init_rocpd (481-481)
  • fini_rocpd (515-560)
  • fini_rocpd (516-516)
source/lib/core/rocpd/data_processor.cpp (3)
  • get_instance (51-56)
  • get_instance (52-52)
  • data_processor (32-49)
source/lib/core/node_info.cpp (2)
  • get_instance (65-70)
  • get_instance (66-66)
source/lib/core/rocpd/data_processor.hpp (6)
  • data_processor (205-205)
  • data_processor (206-206)
  • track_name (124-125)
  • track (143-144)
  • event_id (131-132)
  • event_id (192-193)
source/lib/rocprof-sys/library/components/comm_data.cpp (2)
  • get_data_processor (79-83)
  • get_data_processor (80-80)
source/lib/rocprof-sys/library/kokkosp.cpp (2)
  • get_data_processor (160-164)
  • get_data_processor (161-161)
source/lib/rocprof-sys/library/cpu_freq.cpp (2)
  • get_data_processor (87-91)
  • get_data_processor (88-88)
source/lib/rocprof-sys/library/amd_smi.cpp (2)
  • get_data_processor (81-85)
  • get_data_processor (82-82)
source/lib/core/config.cpp (4)
  • get_use_rocpd (2387-2392)
  • get_use_rocpd (2388-2388)
  • get_sampling_include_inlines (2249-2254)
  • get_sampling_include_inlines (2250-2250)
source/lib/rocprof-sys/library/thread_info.hpp (1)
  • _beg (105-105)
source/lib/core/rocpd/data_processor.cpp (3)
source/lib/core/rocpd/data_processor.hpp (22)
  • data_processor (205-205)
  • data_processor (206-206)
  • str (107-107)
  • node_id (109-112)
  • node_id (114-116)
  • node_id (118-122)
  • node_id (134-141)
  • node_id (148-150)
  • node_id (152-154)
  • node_id (161-169)
  • node_id (171-176)
  • node_id (195-200)
  • track_name (124-125)
  • event_id (131-132)
  • event_id (192-193)
  • track (143-144)
  • category_id (127-129)
  • category_id (146-146)
  • stream_id (156-157)
  • queue_id (158-159)
  • id (178-185)
  • id (187-190)
source/lib/core/rocpd/data_storage/database.cpp (4)
  • get_instance (53-58)
  • get_instance (54-54)
  • flush (157-166)
  • flush (158-158)
source/lib/core/rocpd/data_storage/database.hpp (1)
  • query (165-165)
source/lib/rocprof-sys/library/amd_smi.cpp (7)
source/lib/core/rocpd/data_processor.cpp (3)
  • get_instance (51-56)
  • get_instance (52-52)
  • data_processor (32-49)
source/lib/core/agent_manager.cpp (2)
  • get_instance (31-36)
  • get_instance (32-32)
source/lib/core/node_info.cpp (2)
  • get_instance (65-70)
  • get_instance (66-66)
source/lib/core/rocpd/data_processor.hpp (4)
  • data_processor (205-205)
  • data_processor (206-206)
  • event_id (131-132)
  • event_id (192-193)
source/lib/core/config.cpp (4)
  • get_use_rocpd (2387-2392)
  • get_use_rocpd (2388-2388)
  • get_use_perfetto (1810-1815)
  • get_use_perfetto (1811-1811)
source/lib/rocprof-sys/library/thread_info.hpp (1)
  • _ts (104-104)
source/lib/core/gpu.cpp (6)
  • is_vcn_activity_supported (332-337)
  • is_vcn_activity_supported (333-333)
  • is_jpeg_activity_supported (339-344)
  • is_jpeg_activity_supported (340-340)
  • device_count (161-170)
  • device_count (162-162)
source/lib/rocprof-sys/library/components/backtrace_metrics.cpp (7)
source/lib/core/rocpd/data_processor.cpp (3)
  • get_instance (51-56)
  • get_instance (52-52)
  • data_processor (32-49)
source/lib/core/agent_manager.cpp (2)
  • get_instance (31-36)
  • get_instance (32-32)
source/lib/core/node_info.cpp (2)
  • get_instance (65-70)
  • get_instance (66-66)
source/lib/rocprof-sys/library/components/comm_data.cpp (2)
  • get_data_processor (79-83)
  • get_data_processor (80-80)
source/lib/rocprof-sys/library/thread_info.hpp (1)
  • _tid (118-118)
source/lib/core/rocpd/data_processor.hpp (5)
  • data_processor (205-205)
  • data_processor (206-206)
  • track_name (124-125)
  • event_id (131-132)
  • event_id (192-193)
source/lib/core/agent_manager.hpp (2)
  • device_id (44-44)
  • device_id (45-45)
source/lib/core/gpu.cpp (2)
source/lib/rocprof-sys/library/rocprofiler-sdk/fwd.hpp (1)
  • agents (153-153)
source/lib/core/agent_manager.cpp (2)
  • get_instance (31-36)
  • get_instance (32-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: rhel (g++, 9.4, 6.3, Release)
  • GitHub Check: rhel (g++, 8.10, 0.0, Release)
  • GitHub Check: rhel (g++, 8.10, 6.4, Release)
  • GitHub Check: rhel (g++, 9.3, 6.3, Release)
  • GitHub Check: ubuntu-noble (g++, Debug, OFF, OFF, 6.4)
  • GitHub Check: opensuse (g++, 15.6, Release)
  • GitHub Check: opensuse (g++, 15.5, Release)
  • GitHub Check: ubuntu-focal-external (g++-11, ON, ON, OFF, Release, ON, ON, OFF)
  • GitHub Check: ubuntu-focal-external (g++-7, OFF, OFF, OFF, Release, OFF, OFF, OFF)
  • GitHub Check: ubuntu-focal-external (g++-10, OFF, ON, ON, Release, ON, ON, OFF)
  • GitHub Check: ubuntu-focal-external (g++-8, OFF, OFF, OFF, Release, OFF, OFF, OFF)
  • GitHub Check: ubuntu-focal-external (g++-9, OFF, ON, OFF, Release, ON, ON, ON)
  • GitHub Check: ubuntu-focal-external-rocm (g++, 6.3, OFF, 3, -LE "transpose|videodecode|jpegdecode|network")
  • GitHub Check: ubuntu-focal-codecov
  • GitHub Check: ubuntu-jammy-external (g++-12, OFF, OFF, OFF, OFF, ON, OFF, ON, Release, OFF, ON, 0.0)
  • GitHub Check: ubuntu-jammy-external-rocm (g++, ON, OFF, OFF, OFF, ON, OFF, ON, Release, OFF, ON, 6.3)
  • GitHub Check: ubuntu-jammy-external-rocm (g++, ON, OFF, OFF, OFF, ON, OFF, ON, Release, OFF, ON, 6.4)

{
auto _val = md5sum{};
for(const auto& itr : inp)
_val.update(std::string_view{ inp });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Bug: Incorrect variable reference in compute_md5sum.

Line 115 uses inp instead of itr in the string_view constructor, which will cause compilation errors.

-        _val.update(std::string_view{ inp });
+        _val.update(std::string_view{ itr });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_val.update(std::string_view{ inp });
_val.update(std::string_view{ itr });
🤖 Prompt for AI Agents
In source/lib/common/md5sum.hpp around line 115, the call
_val.update(std::string_view{ inp }); incorrectly references inp; change it to
use the correct variable itr (i.e., _val.update(std::string_view{ itr })), or if
itr is not a null-terminated C-string ensure you construct the string_view with
the correct pointer and length (std::string_view{itr, length}) so the code
compiles and updates the MD5 with the intended input.

GG(c, d, a, b, x[11], S23, 0x265e5a51); /* 19 */
GG(b, c, d, a, x[0], S24, 0xe9b6c7aa); /* 20 */
GG(a, b, c, d, x[5], S21, 0xd62f105d); /* 21 */
GG(d, a, b, c, x[10], S22, 0x2441453); /* 22 */
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

🧩 Analysis chain

Potential typo in hex constant.

Line 285 contains 0x2441453 which appears to be missing a digit. Standard MD5 constants use 8 hex digits.

-    GG(d, a, b, c, x[10], S22, 0x2441453);  /* 22 */
+    GG(d, a, b, c, x[10], S22, 0x02441453); /* 22 */

🌐 Web query:

What is the correct MD5 constant for round 2 step 22?

💡 Result:

Round 2 step 22 uses the constant 0x02441453. [1]

References
[1] RFC 1321 — The MD5 Message-Digest Algorithm (appendix C).


Replace incorrect MD5 constant (round 2 step 22)

0x2441453 is missing a leading 0; RFC 1321 (Appendix C) specifies 0x02441453.

File: source/lib/common/md5sum.hpp:285

-    GG(d, a, b, c, x[10], S22, 0x2441453);  /* 22 */
+    GG(d, a, b, c, x[10], S22, 0x02441453); /* 22 */
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
GG(d, a, b, c, x[10], S22, 0x2441453); /* 22 */
GG(d, a, b, c, x[10], S22, 0x02441453); /* 22 */
🤖 Prompt for AI Agents
In source/lib/common/md5sum.hpp around line 285, the MD5 constant used in the GG
round 2 step 22 is incorrect (0x2441453); replace it with the correct RFC 1321
value 0x02441453, ensuring the literal includes the leading zero so the constant
has the proper 32-bit hex form, then rebuild and run the MD5 tests to verify
correctness.

HH(a, b, c, d, x[13], S31, 0x289b7ec6); /* 41 */
HH(d, a, b, c, x[0], S32, 0xeaa127fa); /* 42 */
HH(c, d, a, b, x[3], S33, 0xd4ef3085); /* 43 */
HH(b, c, d, a, x[6], S34, 0x4881d05); /* 44 */
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

🧩 Analysis chain

Another potential typo in hex constant.

Line 309 contains 0x4881d05 which also appears to be missing a digit.

-    HH(b, c, d, a, x[6], S34, 0x4881d05);   /* 44 */
+    HH(b, c, d, a, x[6], S34, 0x04881d05);  /* 44 */

🌐 Web query:

What is the correct MD5 constant for round 3 step 44?

💡 Result:

Round 3 step 44 uses the MD5 constant 0x04881D05. [1][2]

Sources:
[1] RFC 1321 (MD5).
[2] Wikipedia — MD5 (K table).


Fix MD5 constant for round 3 step 44 — use 0x04881D05

File: source/lib/common/md5sum.hpp — Line 309 uses 0x4881d05 (missing leading 0); RFC 1321 / MD5 K table require 0x04881D05.

-    HH(b, c, d, a, x[6], S34, 0x4881d05);   /* 44 */
+    HH(b, c, d, a, x[6], S34, 0x04881D05);  /* 44 */
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
HH(b, c, d, a, x[6], S34, 0x4881d05); /* 44 */
HH(b, c, d, a, x[6], S34, 0x04881D05); /* 44 */
🤖 Prompt for AI Agents
In source/lib/common/md5sum.hpp around line 309, the MD5 round 3 constant for
step 44 is incorrect (0x4881d05 missing a leading zero); replace it with the
correct RFC 1321 value 0x04881D05 to match the MD5 K table and ensure constant
is represented as a full 32-bit hex literal.

Comment on lines +47 to +49
_agent.device_id =
(_agent.type == agent_type::GPU ? _gpu_agents_cnt++ : _cpu_agents_cnt++);
_agents.emplace_back(std::make_shared<agent>(_agent));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix: Side effect in device_id assignment.

The device_id field is being modified as a side effect of the insertion logic. This overwrites the original device_id passed by the caller with an incremented counter value.

-    _agent.device_id =
-        (_agent.type == agent_type::GPU ? _gpu_agents_cnt++ : _cpu_agents_cnt++);
+    size_t assigned_id =
+        (_agent.type == agent_type::GPU ? _gpu_agents_cnt++ : _cpu_agents_cnt++);
+    _agent.device_id = assigned_id;

This ensures the increment happens correctly while making the assignment more explicit. However, please verify whether overwriting the original device_id is intended, as it may cause confusion if callers expect their provided value to be preserved.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_agent.device_id =
(_agent.type == agent_type::GPU ? _gpu_agents_cnt++ : _cpu_agents_cnt++);
_agents.emplace_back(std::make_shared<agent>(_agent));
size_t assigned_id =
(_agent.type == agent_type::GPU ? _gpu_agents_cnt++ : _cpu_agents_cnt++);
_agent.device_id = assigned_id;
_agents.emplace_back(std::make_shared<agent>(_agent));
🤖 Prompt for AI Agents
In source/lib/core/agent_manager.cpp around lines 47-49 the expression assigns
to _agent.device_id as a side effect of choosing which counter to increment,
overwriting the caller-provided device_id; fix this by first computing a new_id
from the counters (e.g. new_id = (_agent.type == agent_type::GPU ?
_gpu_agents_cnt++ : _cpu_agents_cnt++)), then explicitly assign _agent.device_id
= new_id (or only assign when the passed device_id is invalid/placeholder), and
finally push a copy into _agents; also verify whether overwriting the original
device_id is intended and change the logic to preserve caller values if needed.

Comment on lines +148 to +154
static const thread_local auto _thread_id = getpid();
const auto _end_time = clock::now();
std::lock_guard lock(m_mutex);
(..., (is_category_defined<categories>([&] {
if(m_enabled.test(to_index(categories)))
end_category(_end_time, categories, _thread_id);
})));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Bug: mismatched thread identifier used in start vs end (gettid vs getpid).

start() uses gettid() while end() uses getpid(). This guarantees lookup misses in m_started and breaks measurements.

Apply this diff to use gettid() consistently:

     template <category_enum... categories>
     static void end()
     {
-        static const thread_local auto _thread_id = getpid();
+        static const thread_local auto _thread_id = gettid();
         const auto                     _end_time  = clock::now();
         std::lock_guard                lock(m_mutex);
         (..., (is_category_defined<categories>([&] {
              if(m_enabled.test(to_index(categories)))
                  end_category(_end_time, categories, _thread_id);
         })));
     }

Optionally, introduce a small helper for clarity/portability:

// inside benchmark_impl<true,...> private or public:
static inline tid_t current_tid() {
#if defined(__linux__)
    return gettid();
#else
    return static_cast<tid_t>(std::hash<std::thread::id>{}(std::this_thread::get_id()));
#endif
}

Then use current_tid() in both places.

🤖 Prompt for AI Agents
In source/lib/core/benchmark/benchmark.hpp around lines 148 to 154, the end()
path uses getpid() while start() uses gettid(), causing m_started lookup misses;
change end() to use the same thread id source as start() (use gettid() instead
of getpid()) or add a small helper current_tid() that returns gettid() on Linux
and a hashed std::thread::id on other platforms, then replace both start() and
end() usages with current_tid() for consistent thread identification.

Comment thread source/lib/rocprof-sys/library/components/backtrace_metrics.cpp
Comment on lines +421 to +422
rocpd_process_cpu_usage_events<mpi_send>(0, sendcount * _send_size);
rocpd_process_cpu_usage_events<mpi_recv>(0, recvcount * _send_size);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Typo in variable name causes incorrect value.

Line 422 uses _send_size instead of _recv_size for the receive processing.

-        rocpd_process_cpu_usage_events<mpi_recv>(0, recvcount * _send_size);
+        rocpd_process_cpu_usage_events<mpi_recv>(0, recvcount * _recv_size);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rocpd_process_cpu_usage_events<mpi_send>(0, sendcount * _send_size);
rocpd_process_cpu_usage_events<mpi_recv>(0, recvcount * _send_size);
rocpd_process_cpu_usage_events<mpi_send>(0, sendcount * _send_size);
rocpd_process_cpu_usage_events<mpi_recv>(0, recvcount * _recv_size);
🤖 Prompt for AI Agents
In source/lib/rocprof-sys/library/components/comm_data.cpp around lines 421 to
422, the call processing receive CPU usage mistakenly uses _send_size instead of
_recv_size; change the second call to use _recv_size when computing the
byte/count argument so rocpd_process_cpu_usage_events<mpi_recv>(0, recvcount *
_recv_size) is passed the correct value.

Comment thread source/lib/rocprof-sys/library/components/comm_data.cpp
Comment on lines +142 to +162
size_t EVENT_CODE = 0;
size_t INSTANCE_ID = 0;
const char* LONG_DESCRIPTION = "";
const char* COMPONENT = "";
const char* BLOCK = "";
const char* EXPRESSION = "";
const char* MEMORY = "MB";
const char* TIME = "sec";
auto ni = node_info::get_instance();
const auto* TARGET_ARCH = "CPU";

auto& _agent_manager = agent_manager::get_instance();
auto base_id = _agent_manager.get_agent_by_id(dev_id, agent_type::CPU).base_id;

do_for_enabled_cpus([&](size_t cpu_id) {
data_processor.insert_pmc_description(
ni.id, getpid(), base_id, TARGET_ARCH, EVENT_CODE, INSTANCE_ID,
get_cpu_freq_track_name<category::cpu_freq>(cpu_id).c_str(), "Frequency",
trait::name<category::cpu_freq>::description, LONG_DESCRIPTION, COMPONENT,
component::cpu_freq::display_unit().c_str(), "ABS", BLOCK, EXPRESSION, 0, 0);
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Hard-coded PMC description values need proper initialization.

Lines 142-151 contain placeholder values (EVENT_CODE = 0, INSTANCE_ID = 0, empty string literals). These should either be properly initialized or documented as intentional placeholders.

Consider either:

  1. Loading these values from configuration
  2. Using named constants with clear documentation
  3. Adding TODO comments if these are temporary placeholders
-    // find the proper values for a following definitions
-    size_t      EVENT_CODE       = 0;
-    size_t      INSTANCE_ID      = 0;
-    const char* LONG_DESCRIPTION = "";
-    const char* COMPONENT        = "";
-    const char* BLOCK            = "";
-    const char* EXPRESSION       = "";
+    // TODO: These are placeholder values - update with actual PMC configuration
+    constexpr size_t EVENT_CODE       = 0;  // CPU event code placeholder
+    constexpr size_t INSTANCE_ID      = 0;  // Instance ID placeholder
+    const char* LONG_DESCRIPTION = "";  // Detailed description to be added
+    const char* COMPONENT        = "";  // Component identifier to be added
+    const char* BLOCK            = "";  // Hardware block identifier to be added
+    const char* EXPRESSION       = "";  // PMC expression to be added
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
size_t EVENT_CODE = 0;
size_t INSTANCE_ID = 0;
const char* LONG_DESCRIPTION = "";
const char* COMPONENT = "";
const char* BLOCK = "";
const char* EXPRESSION = "";
const char* MEMORY = "MB";
const char* TIME = "sec";
auto ni = node_info::get_instance();
const auto* TARGET_ARCH = "CPU";
auto& _agent_manager = agent_manager::get_instance();
auto base_id = _agent_manager.get_agent_by_id(dev_id, agent_type::CPU).base_id;
do_for_enabled_cpus([&](size_t cpu_id) {
data_processor.insert_pmc_description(
ni.id, getpid(), base_id, TARGET_ARCH, EVENT_CODE, INSTANCE_ID,
get_cpu_freq_track_name<category::cpu_freq>(cpu_id).c_str(), "Frequency",
trait::name<category::cpu_freq>::description, LONG_DESCRIPTION, COMPONENT,
component::cpu_freq::display_unit().c_str(), "ABS", BLOCK, EXPRESSION, 0, 0);
});
// TODO: These are placeholder values - update with actual PMC configuration
constexpr size_t EVENT_CODE = 0; // CPU event code placeholder
constexpr size_t INSTANCE_ID = 0; // Instance ID placeholder
const char* LONG_DESCRIPTION = ""; // Detailed description to be added
const char* COMPONENT = ""; // Component identifier to be added
const char* BLOCK = ""; // Hardware block identifier to be added
const char* EXPRESSION = ""; // PMC expression to be added
const char* MEMORY = "MB";
const char* TIME = "sec";
auto ni = node_info::get_instance();
const auto* TARGET_ARCH = "CPU";
auto& _agent_manager = agent_manager::get_instance();
auto base_id = _agent_manager.get_agent_by_id(dev_id, agent_type::CPU).base_id;
do_for_enabled_cpus([&](size_t cpu_id) {
data_processor.insert_pmc_description(
ni.id, getpid(), base_id, TARGET_ARCH, EVENT_CODE, INSTANCE_ID,
get_cpu_freq_track_name<category::cpu_freq>(cpu_id).c_str(), "Frequency",
trait::name<category::cpu_freq>::description, LONG_DESCRIPTION, COMPONENT,
component::cpu_freq::display_unit().c_str(), "ABS", BLOCK, EXPRESSION, 0, 0);
});
🤖 Prompt for AI Agents
In source/lib/rocprof-sys/library/cpu_freq.cpp around lines 142 to 162, the PMC
description variables (EVENT_CODE, INSTANCE_ID, LONG_DESCRIPTION, COMPONENT,
BLOCK, EXPRESSION, MEMORY, TIME, TARGET_ARCH) are populated with placeholder
literals (zeros and empty strings); replace these placeholders with proper
initialization or documentation: either load values from configuration or
constants (e.g., named constexprs) that clearly state their intent, or add
explicit TODO comments indicating they are intentional placeholders and why, and
ensure MEMORY/TIME use correct units; update the
data_processor.insert_pmc_description call to use the chosen initialized values
so the PMC descriptions are meaningful and maintainable.

Comment on lines +1817 to +1840
if(get_sampling_include_inlines() && iitr.lineinfo)
{
auto _lines = iitr.lineinfo.lines;
std::reverse(_lines.begin(), _lines.end());
size_t _n = 0;
for(const auto& line : _lines)
{
const auto* _name =
static_strings.emplace(demangle(line.name)).first->c_str();
auto inlined_name_id = data_processor.insert_string(_name);

auto inlined_call_stack = ::rocpd::json::create();
inlined_call_stack->set("name", std::string(demangle(line.name)));
inlined_call_stack->set("location", std::string(line.location));
inlined_call_stack->set("line", std::to_string(line.line));
inlined_call_stack->set("inlined", "true");

rocpd_insert_region<category::timer_sampling>(
thread_idx, _beg, _end, inlined_name_id, _track_name.c_str(),
inlined_call_stack->to_string().c_str(), "{}",
// Only include HW counters for first inlined function
(_n == 0) ? hw_counter_json.c_str() : "{}");
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

🧩 Analysis chain

Potential JSON injection in inline function names.

The inline function names are being directly inserted into JSON without proper escaping, which could lead to malformed JSON if names contain special characters.

The JSON generation should escape special characters in function names. Since you're already using the json class, ensure all string values are properly escaped:

-    inlined_call_stack->set("name", std::string(demangle(line.name)));
+    // Ensure proper escaping is handled in json::set for string values
+    inlined_call_stack->set("name", std::string(demangle(line.name)));

🏁 Script executed:

#!/bin/bash
# Check if json::set properly escapes special characters
ast-grep --pattern 'json::set($_,$_)' source/lib/core/rocpd/json.cpp
rg -n "json::set.*escape|json::set.*sanitize" source/lib/core/rocpd/

Length of output: 0


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo "Top-level dirs:"
ls -la | sed -n '1,200p' || true

echo
echo "Searching for rocpd json implementation and usages..."
rg -n --hidden -S 'namespace\s+rocpd\b|class\s+json\b|struct\s+json\b|rocpd::json\b|json::create\b|->set\(|->to_string\(|json->set\(|to_string\(\)' || true

echo
echo "Files containing 'class json' or 'struct json':"
rg -n --hidden -S 'class\s+json\b|struct\s+json\b' --files-with-matches || true

echo
echo "Files with '->set(' occurrences (showing context):"
rg -n --hidden -S '->set\(' -C3 || true

# If we find a candidate file for json implementation, print its content (limited)
json_impl_files=$(rg -n --hidden -S 'class\s+json\b|struct\s+json\b|namespace\s+rocpd\b' --files-with-matches || true)
if [ -n "$json_impl_files" ]; then
  echo
  echo "Previewing matched files for json implementation (first 400 lines each):"
  echo "$json_impl_files" | while read -r f; do
    echo "----- FILE: $f -----"
    sed -n '1,400p' "$f" || true
    echo "----- END: $f -----"
  done
else
  echo
  echo "No obvious json class/namespace matches found with 'class json' / 'struct json' / 'namespace rocpd'. Listing likely rocpd files:"
  rg -n --hidden -S 'rocpd' --files-with-matches | xargs -r -n1 echo || true
fi

Length of output: 62631


Critical: escape JSON string values in rocpd::json::stringify

stringify() emits std::string values verbatim (oss << """ << arg << """) so quotes, backslashes or control chars in demangled names will produce malformed JSON; sampling inserts demangled names (e.g. source/lib/rocprof-sys/library/sampling.cpp ~line 1828).
Fix: implement proper JSON string escaping (escape , ", control chars, etc.) in source/lib/core/rocpd/json.cpp::stringify (std::string branch) or use a vetted JSON library; add tests.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review continued from previous batch...

Comment on lines +25 to +41
#include <algorithm>
#include <array>
#include <bitset>
#include <chrono>
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <limits>
#include <mutex>
#include <sstream>
#include <string>
#include <type_traits>
#include <unistd.h>
#include <unordered_map>
#include <vector>

#include "core/benchmark/category.hpp"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add missing include for pid_t (if adopting pid_t).

To safely use pid_t, include <sys/types.h>.

Apply this diff near the other includes:

 #include <type_traits>
+#include <sys/types.h>
 #include <unistd.h>
 #include <unordered_map>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#include <algorithm>
#include <array>
#include <bitset>
#include <chrono>
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <limits>
#include <mutex>
#include <sstream>
#include <string>
#include <type_traits>
#include <unistd.h>
#include <unordered_map>
#include <vector>
#include "core/benchmark/category.hpp"
#include <algorithm>
#include <array>
#include <bitset>
#include <chrono>
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <limits>
#include <mutex>
#include <sstream>
#include <string>
#include <type_traits>
#include <sys/types.h>
#include <unistd.h>
#include <unordered_map>
#include <vector>
#include "core/benchmark/category.hpp"
🤖 Prompt for AI Agents
In source/lib/core/benchmark/benchmark.hpp around lines 25 to 41, the file may
use pid_t but does not include the header that defines it; add an include for
<sys/types.h> alongside the other system includes (e.g., near <unistd.h>) so
pid_t is defined and portable across platforms that require that header.

Comment on lines +84 to +104
using tid_t = __pid_t;
struct indexed_category
{
size_t category;
tid_t thread_id;

friend bool operator==(const indexed_category& lhs, const indexed_category& rhs)
{
return lhs.category == rhs.category && lhs.thread_id == rhs.thread_id;
}
};

struct indexed_category_hash
{
size_t operator()(const indexed_category& p) const noexcept
{
std::size_t hash1 = std::hash<size_t>{}(p.category);
std::size_t hash2 = std::hash<size_t>{}(p.thread_id);
return hash1 ^ (hash2 << 1);
}
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Portability: avoid glibc-internal __pid_t; use pid_t and hash its own type.

__pid_t is implementation detail; prefer pid_t. Also hash tid_t directly.

Apply this diff:

-using tid_t = __pid_t;
+// explicitly include for pid_t
+#include <sys/types.h>
+using tid_t = pid_t;
@@
-        std::size_t hash2 = std::hash<size_t>{}(p.thread_id);
+        std::size_t hash2 = std::hash<tid_t>{}(p.thread_id);

Note: Ensure <sys/types.h> is included (see include block).

🤖 Prompt for AI Agents
In source/lib/core/benchmark/benchmark.hpp around lines 84 to 104, replace the
non-portable use of the glibc-internal type __pid_t with the standard pid_t
(i.e. using tid_t = pid_t) and update the hasher to hash the tid_t type directly
(use std::hash<tid_t> for hash2 and use an appropriate size_t result) instead of
casting tid_t to size_t; also ensure <sys/types.h> is included in the header
include block so pid_t is available.

@sputhala-amd
Copy link
Copy Markdown
Owner Author

@coderabbitai help

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 23, 2025

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

CodeRabbit commands

These commands are invoked using PR/Issue comments.

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>] to validate the custom pre-merge checks instructions. Defaults to error when --mode is omitted.
    • @coderabbitai ignore pre-merge checks to override pre-merge checks and get an approval on PR.
    • @coderabbitai run pre-merge checks to run pre-merge checks on the pull request.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit configuration file (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, support, documentation and community

  • Visit our status page to check the current availability of CodeRabbit.
  • Create a ticket on our support page for assistance with any issues or questions.
  • Visit our documentation site for detailed information on how to use CodeRabbit.
  • Join our Discord community to connect with other users and get help from the community.
  • Follow us on X/Twitter for updates and announcements.

coderabbitai bot added a commit that referenced this pull request Sep 23, 2025
Docstrings generation was requested by @sputhala-amd.

* #6 (comment)

The following files were modified:

* `source/bin/rocprof-sys-avail/avail.cpp`
* `source/lib/common/md5sum.hpp`
* `source/lib/common/traits.hpp`
* `source/lib/core/agent_manager.cpp`
* `source/lib/core/agent_manager.hpp`
* `source/lib/core/benchmark/benchmark.hpp`
* `source/lib/core/benchmark/category.hpp`
* `source/lib/core/config.cpp`
* `source/lib/core/cpu.cpp`
* `source/lib/core/gpu.cpp`
* `source/lib/core/node_info.cpp`
* `source/lib/core/node_info.hpp`
* `source/lib/core/rocpd/data_processor.cpp`
* `source/lib/core/rocpd/data_processor.hpp`
* `source/lib/core/rocpd/data_storage/database.cpp`
* `source/lib/core/rocpd/data_storage/database.hpp`
* `source/lib/core/rocpd/data_storage/insert_query_builders.hpp`
* `source/lib/core/rocpd/data_storage/table_insert_query.hpp`
* `source/lib/core/rocpd/json.cpp`
* `source/lib/core/rocpd/json.hpp`
* `source/lib/rocprof-sys/library.cpp`
* `source/lib/rocprof-sys/library/amd_smi.cpp`
* `source/lib/rocprof-sys/library/components/backtrace_metrics.cpp`
* `source/lib/rocprof-sys/library/components/backtrace_metrics.hpp`
* `source/lib/rocprof-sys/library/components/comm_data.cpp`
* `source/lib/rocprof-sys/library/components/comm_data.hpp`
* `source/lib/rocprof-sys/library/cpu_freq.cpp`
* `source/lib/rocprof-sys/library/kokkosp.cpp`
* `source/lib/rocprof-sys/library/sampling.cpp`
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 23, 2025

Note

Generated docstrings for this pull request at #7

@sputhala-amd
Copy link
Copy Markdown
Owner Author

@coderabbitai generate sequence diagram
@coderabbitai summary

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 23, 2025

Sequence Diagram for PR

sequenceDiagram
title PR 6: Rocpd part1 opened from rocpd-part1 to amd-staging
participant dev as "sputhala-amd (Author)"
participant local as "Local Repo"
participant gh as "GitHub"
participant pr as "PR #6"
participant temp as "PR Template"
participant maint as "Maintainer (amd-staging)"
dev->>local: Create branch "rocpd-part1"
dev->>local: Commit "Rocpd part1" changes
local->>gh: Push branch "rocpd-part1"
dev->>gh: Open PR to "amd-staging" with title "Rocpd part1"
gh->>pr: Create PR #6 (status: open, URL provided)
gh->>temp: Attach PR template
note right of temp
  Template fields present but unfilled
  (checklist items unchecked)
end note
gh->>pr: No reviewers assigned
gh->>pr: No labels applied
maint->>pr: View PR summary/description
maint->>dev: Request technical details, tests, docs, changelog
alt Author updates PR content
  dev->>pr: Edit description / fill template
  dev->>local: Add commits addressing feedback
  local->>gh: Push updates to "rocpd-part1"
  gh->>pr: Update PR timeline
else No updates yet
  pr->>maint: Remains open awaiting details
end
Loading

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 23, 2025

✅ Actions performed

Sequence diagram generated.

@sputhala-amd
Copy link
Copy Markdown
Owner Author

@coderabbitai summary

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 23, 2025

✅ Actions performed

Summary regeneration triggered.

@sputhala-amd
Copy link
Copy Markdown
Owner Author

@coderabbitai generate sequence diagram

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 23, 2025

Sequence Diagram for PR

sequenceDiagram
    actor Dev as 'Developer (sputhala-amd)'
    participant GH as 'GitHub Repo'
    participant PR as 'Pull Request #6 (open)'
    participant CI as 'CI Pipeline'
    participant Build as 'Build Job'
    participant SA as 'Static Analysis'
    participant Test as 'Test Suite'
    actor Maint as 'Maintainer'

    Dev->GH: Push branch rocpd-part1
    Dev->GH: Open PR to amd-staging
    GH->PR: Create PR #6 (status: open)
    PR->Dev: PR template present, fields/checklist unfilled
    GH->CI: Trigger workflow on PR open

    CI->Build: Compile project
    Build->CI: Build results

    CI->SA: Lint/format/static checks
    SA->CI: Analysis results

    CI->Test: Run test suite
    Test->CI: Test results

    CI->PR: Report checks/status

    Maint->PR: Review pending (no reviewers assigned)
    PR->Maint: Awaiting details/tests/docs/changelog updates
Loading

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 23, 2025

✅ Actions performed

Sequence diagram generated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants