Add JSON output to MemoryAnalysis plugin #21

Closed
coleramos425 wants to merge 12 commits into main from colramos/json-support

coleramos425 (Collaborator) commented May 21, 2025

Overview

This PR closes #10 by defining a standard JSON output schema for the MemoryAnalysis plugin. For now it supports only MemoryAnalysis, but the goal is a generalized JsonOutputManager (living in dh_comms) that can be adapted to other Omniprobe plugins.

The work is in two parts: (1) this PR to logduration, and (2) the JsonOutputManager implementation in dh_comms (AMDResearch/dh_comms#14). Main changes include:

  • Use the header-only C++ library nlohmann/json for JSON handling
  • Hide the old terminal MemoryAnalysis output behind the -v/--verbose flag
  • After report() completes, write memory_analysis_output/memory_analysis_<dispatch-id>.json to the CWD where the omniprobe command was run
  • In dh_comms, create a JsonOutputManager that is initialized as an empty JSON object and included in memory_analysis_handler.cpp, so that the calls report() makes to report_bank_conflict() and report_cache_line_use() update the JSON instance
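
The accumulate-then-write flow in the last two bullets can be sketched as follows. This is a minimal Python illustration, not the C++ in this PR: the names `JsonOutputSketch`, `add_bank_conflict`, `add_cache_analysis`, `write`, and `_entry` are hypothetical stand-ins, the source path is shortened, and the metadata block is omitted. Only the field names and the output path pattern are taken from the example output in this description.

```python
import json
import tempfile
from pathlib import Path


def _entry(file, line, column, access_info, code_context=""):
    """Build one access record in the shape shown in the example output."""
    return {
        "access_info": access_info,
        "code_context": code_context,
        "source_location": {"file": file, "line": line, "column": column},
    }


class JsonOutputSketch:
    """Hypothetical stand-in for JsonOutputManager: accumulate, then write once."""

    def __init__(self, kernel_name, dispatch_id):
        self.dispatch_id = dispatch_id
        self.analysis = {
            "kernel_info": {"name": kernel_name, "dispatch_id": dispatch_id},
            "bank_conflicts": {"accesses": []},
            "cache_analysis": {"accesses": []},
        }

    def add_bank_conflict(self, file, line, column, access_info):
        self.analysis["bank_conflicts"]["accesses"].append(
            _entry(file, line, column, access_info))

    def add_cache_analysis(self, file, line, column, access_info):
        self.analysis["cache_analysis"]["accesses"].append(
            _entry(file, line, column, access_info))

    def write(self, base_dir):
        """Write memory_analysis_output/memory_analysis_<dispatch-id>.json."""
        out_dir = Path(base_dir) / "memory_analysis_output"
        out_dir.mkdir(parents=True, exist_ok=True)
        out_path = out_dir / f"memory_analysis_{self.dispatch_id}.json"
        document = {"kernel_analyses": [self.analysis]}
        out_path.write_text(json.dumps(document, indent=2, sort_keys=True))
        return out_path


# Usage: values copied from the example run; a temp dir stands in for the CWD.
manager = JsonOutputSketch(
    "matrixTransposeShared(float*, float const*, int, int) [clone .kd]", 1)
manager.add_bank_conflict(
    "matrix_transpose.hip", 49, 27,
    {"type": "read", "ir_bytes": 4,
     "execution_count": 16384, "total_conflicts": 229376})
out_path = manager.write(tempfile.mkdtemp())
print(out_path.name)  # memory_analysis_1.json
```

Keeping all state in one object and writing once at the end of report() means the file is either complete or absent, which is simpler for downstream tools than a partially written report.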

Note: there is still quite a bit of cout debugging in the code. @rwvo, I left it in for your benefit while reviewing; once you're happy, I'll remove it.

Example

Details
# Build project (via container)
cd logduration
git checkout colramos/json-support
git submodule update --init --recursive
cd docker/
./build --apptainer
# Launch project container
apptainer exec logduration.sif bash
# Run MemoryAnalysis with example (I use matrix_transpose)
Apptainer> omniprobe -i -a MemoryAnalysis -- $WORK/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose
Found config file at /opt/logduration/bin/logDuration/runtime_config.txt

Omniprobe is developed by Advanced Micro Devices, Research and Advanced Development
Copyright (c) 2024 Advanced Micro Devices. All rights reserved.

No Triton cache location provided; assuming HIP run.
HANDLER: libMemAnalysis64.so
Memory Analysis Wrapper loaded.
Adding /work1/amd/colramos/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose
Adding linux-vdso.so.1
Adding /opt/rocm/lib/libamdhip64.so.6
Adding /lib/x86_64-linux-gnu/libstdc++.so.6
Adding /lib/x86_64-linux-gnu/libm.so.6
Adding /lib/x86_64-linux-gnu/libgcc_s.so.1
Adding /lib/x86_64-linux-gnu/libc.so.6
Adding /opt/rocm/lib/librocprofiler-register.so.0
Adding /opt/rocm/lib/libamd_comgr.so.2
Adding /opt/rocm/lib/libhsa-runtime64.so.1
Adding /lib/x86_64-linux-gnu/libnuma.so.1
Adding /lib64/ld-linux-x86-64.so.2
Adding /lib/x86_64-linux-gnu/libz.so.1
Adding /lib/x86_64-linux-gnu/libzstd.so.1
Adding /lib/x86_64-linux-gnu/libelf.so.1
Adding /opt/amdgpu/lib/x86_64-linux-gnu/libdrm.so.2
Adding /opt/amdgpu/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1
Adding /opt/logduration/lib/logDuration/liblogDuration64.so
Adding /opt/logduration/lib/libdh_comms.so
Adding /opt/logduration/lib/libkernelDB64.so.1
Adding /lib/x86_64-linux-gnu/libdwarf.so.1
Adding /opt/logduration/lib/libMemAnalysis64.so
[... the 22 "Adding ..." lines above repeat 7 more times ...]
>>>>>>>> HSA intercept registered.
[MemoryAnalysisWrapper] Initializing wrapper for kernel: matrixTransposeShared(float*, float const*, int, int) [clone .kd]
[JsonOutputManager] Constructor called for instance 0x8a932f0
[JsonOutputManager] Created new instance at 0x8a932f0
[JsonOutputManager:0x8a932f0] Initializing kernel analysis for matrixTransposeShared(float*, float const*, int, int) [clone .kd] (dispatch_id: 1)
[JsonOutputManager:0x8a932f0] Kernel analysis initialized, size: 1
[JsonOutputManager:0x8a932f0] Setting metadata - GPU: gfx90a:sramecc+:xnack-, Cache line size: 128, Kernels found: 0
[JsonOutputManager:0x8a932f0] Metadata set successfully
[MemoryAnalysisWrapper] Initialization complete
Cache Watcher shutting down
Comms Runner shutting down
[JsonOutputManager:0x8a932f0] Updating kernels found to: 17
Memory analysis for matrixTransposeShared(float*, float const*, int, int) [clone .kd] dispatch_id[1]
[MemoryAnalysisWrapper] Starting wrapped_.report()...
[MemoryAnalysisHandler] Starting report...
[MemoryAnalysisHandler] Starting cache line use report...
[CodeContext] Retrieving context for /work1/amd/colramos/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose.hip:40
[CodeContext] Cache miss - loading new file: /work1/amd/colramos/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose.hip
[CodeContext] Cached 84 lines from file
[CodeContext] Retrieved line 40:     tile[threadIdx.y][threadIdx.x] = in[y * width + x];
[JsonOutputManager:0x8a932f0] Adding cache analysis for /work1/amd/colramos/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose.hip:40:38
  Access type: read, IR bytes: 4, ISA bytes: 4
  Cache lines needed: 32768, used: 65536
[JsonOutputManager:0x8a932f0] Added cache analysis, current size: 1
[DEBUG] After addCacheAnalysis - Current kernel analyses size: 1
[CodeContext] Retrieving context for /work1/amd/colramos/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose.hip:49
[CodeContext] Using cached contents for: /work1/amd/colramos/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose.hip
[CodeContext] Retrieved line 49:     out[y * height + x] = tile[threadIdx.x][threadIdx.y];
[JsonOutputManager:0x8a932f0] Adding cache analysis for /work1/amd/colramos/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose.hip:49:25
  Access type: write, IR bytes: 4, ISA bytes: 4
  Cache lines needed: 32768, used: 65536
[JsonOutputManager:0x8a932f0] Added cache analysis, current size: 1
[DEBUG] After addCacheAnalysis - Current kernel analyses size: 1
[MemoryAnalysisHandler] Completed cache line use report
[MemoryAnalysisHandler] Starting bank conflicts report...
[CodeContext] Retrieving context for /work1/amd/colramos/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose.hip:49
[CodeContext] Using cached contents for: /work1/amd/colramos/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose.hip
[CodeContext] Retrieved line 49:     out[y * height + x] = tile[threadIdx.x][threadIdx.y];
[JsonOutputManager:0x8a932f0] Adding bank conflict for /work1/amd/colramos/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose.hip:49:27
  Access type: read, IR bytes: 4
  Execution count: 16384, Total conflicts: 229376
[JsonOutputManager:0x8a932f0] Added bank conflict, current size: 1
[DEBUG] After addBankConflict - Current kernel analyses size: 1
[MemoryAnalysisHandler] Completed bank conflicts report
[MemoryAnalysisHandler] Report completed
[MemoryAnalysisWrapper] Completed wrapped_.report() successfully
[MemoryAnalysisWrapper] Current analysis size after report: 1
[MemoryAnalysisWrapper] Output directory: "/work1/amd/colramos/audacious/logduration/docker/memory_analysis_output"
[MemoryAnalysisWrapper] Creating output directory...
[MemoryAnalysisWrapper] Writing memory analysis to /work1/amd/colramos/audacious/logduration/docker/memory_analysis_output/memory_analysis_1.json
[MemoryAnalysisWrapper] Dumping current JsonOutputManager state...
[MemoryAnalysisWrapper] Current analysis size before dump: 1
[JsonOutputManager:0x8a932f0] Current State:
{
  "kernel_analyses": [
    {
      "bank_conflicts": {
        "accesses": [
          {
            "access_info": {
              "execution_count": 16384,
              "ir_bytes": 4,
              "total_conflicts": 229376,
              "type": "read"
            },
            "code_context": "    out[y * height + x] = tile[threadIdx.x][threadIdx.y];",
            "source_location": {
              "column": 27,
              "file": "/work1/amd/colramos/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose.hip",
              "line": 49
            }
          }
        ]
      },
      "cache_analysis": {
        "accesses": [
          {
            "access_info": {
              "cache_lines": {
                "needed": 32768,
                "used": 65536
              },
              "execution_count": 16384,
              "ir_bytes": 4,
              "isa_bytes": 4,
              "isa_instruction": "global_load_dword",
              "type": "read"
            },
            "code_context": "    tile[threadIdx.y][threadIdx.x] = in[y * width + x];",
            "source_location": {
              "column": 38,
              "file": "/work1/amd/colramos/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose.hip",
              "line": 40
            }
          },
          {
            "access_info": {
              "cache_lines": {
                "needed": 32768,
                "used": 65536
              },
              "execution_count": 16384,
              "ir_bytes": 4,
              "isa_bytes": 4,
              "isa_instruction": "global_store_dword",
              "type": "write"
            },
            "code_context": "    out[y * height + x] = tile[threadIdx.x][threadIdx.y];",
            "source_location": {
              "column": 25,
              "file": "/work1/amd/colramos/audacious/maestro/examples/bank_conflict/matrix_transpose/matrix_transpose.hip",
              "line": 49
            }
          }
        ]
      },
      "kernel_info": {
        "dispatch_id": 1,
        "name": "matrixTransposeShared(float*, float const*, int, int) [clone .kd]"
      }
    }
  ],
  "metadata": {
    "gpu_info": {
      "architecture": "gfx90a:sramecc+:xnack-",
      "cache_line_size": 128
    },
    "kernels_found": 17,
    "timestamp": "2025-05-21 14:22:35",
    "version": "1.0"
  }
}
[MemoryAnalysisWrapper] Writing to file...
[JsonOutputManager:0x8a932f0] Writing analysis to file: /work1/amd/colramos/audacious/logduration/docker/memory_analysis_output/memory_analysis_1.json
[JsonOutputManager:0x8a932f0] File write complete
[MemoryAnalysisWrapper] Report completed successfully
54525952 bytes processed in 2.060784 seconds (26.5 MiB/s)
[JsonOutputManager:0x8a932f0] Clearing all data
signal_runner is shutting down
Ending kernelDB
Found 17 kernels.
Ending kernelDB
Found 17 kernels.
Ending kernelDB
Found 17 kernels.
Ending kernelDB
Found 17 kernels.
Ending kernelDB
Found 17 kernels.
Ending kernelDB
Found 17 kernels.
Ending kernelDB
Found 17 kernels.
Ending kernelDB
Found 17 kernels.
hsaInterceptor: Application elapsed usecs: 4165494us
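
A downstream consumer of this file might look like the sketch below. It assumes the schema shown in the dump above; the `summarize` helper and the two derived ratios (conflicts per execution, and cache lines used per cache line needed) are illustrative choices, not something the plugin computes. The sample data is a trimmed copy of the numbers from this run.

```python
# Trimmed copy of the example output; only the fields used below are kept.
SAMPLE = {
    "kernel_analyses": [{
        "kernel_info": {"dispatch_id": 1},
        "bank_conflicts": {"accesses": [{
            "access_info": {"execution_count": 16384, "total_conflicts": 229376},
            "source_location": {"line": 49, "column": 27},
        }]},
        "cache_analysis": {"accesses": [
            {"access_info": {"cache_lines": {"needed": 32768, "used": 65536}},
             "source_location": {"line": 40, "column": 38}},
            {"access_info": {"cache_lines": {"needed": 32768, "used": 65536}},
             "source_location": {"line": 49, "column": 25}},
        ]},
    }]
}


def summarize(document):
    """Return (kind, source line, ratio) tuples for every reported access."""
    rows = []
    for kernel in document["kernel_analyses"]:
        for access in kernel["bank_conflicts"]["accesses"]:
            info = access["access_info"]
            ratio = info["total_conflicts"] / info["execution_count"]
            rows.append(("bank_conflict", access["source_location"]["line"], ratio))
        for access in kernel["cache_analysis"]["accesses"]:
            lines = access["access_info"]["cache_lines"]
            ratio = lines["used"] / lines["needed"]
            rows.append(("cache_line_use", access["source_location"]["line"], ratio))
    return rows


for kind, line, ratio in summarize(SAMPLE):
    print(f"{kind:15s} line {line}: {ratio:.1f}")
# bank_conflict   line 49: 14.0
# cache_line_use  line 40: 2.0
# cache_line_use  line 49: 2.0
```

To run it against a real report, replace `SAMPLE` with `json.load(open(path))` for a memory_analysis_<dispatch-id>.json file.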

Questions for reviewers

  1. Is there a better way to query the cache line size for the metadata output (see example)? Right now I'm hardcoding 128 B because I know that is correct for MI200, but does it also hold for MI300? Is there a more robust solution that can query this information dynamically?
  2. The JSON output schema contains a metadata field that holds useful debug information such as timestamp and version. Ideally, I'd like to introduce a VERSION file at the project root that we can use to populate fields like this dynamically throughout the project. However, since the project has several components, it's unclear to me which component's VERSION we should report here. Any outside perspective?
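
One possible shape for question 2, offered only as a sketch: instead of picking a single version, report one version per component. Everything here is an assumption for illustration, including the `collect_versions` helper, the per-component `VERSION` file layout, the `versions` metadata key, and the version string itself.

```python
import tempfile
from pathlib import Path


def collect_versions(root, components):
    """Map each component name to the contents of <root>/<component>/VERSION."""
    versions = {}
    for name in components:
        version_file = Path(root) / name / "VERSION"
        if version_file.is_file():
            versions[name] = version_file.read_text().strip()
        else:
            # A missing file is reported explicitly rather than skipped.
            versions[name] = "unknown"
    return versions


# Usage with a throwaway directory tree; the version string is made up.
root = Path(tempfile.mkdtemp())
(root / "dh_comms").mkdir()
(root / "dh_comms" / "VERSION").write_text("0.1.0\n")
metadata = {"versions": collect_versions(root, ["dh_comms", "kernelDB"])}
print(metadata)
# {'versions': {'dh_comms': '0.1.0', 'kernelDB': 'unknown'}}
```

Reporting a map avoids having to decide which component "owns" the version field, at the cost of a slightly larger metadata block.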

@coleramos425 coleramos425 marked this pull request as ready for review May 21, 2025 19:42
@coleramos425 coleramos425 requested a review from rwvo May 21, 2025 19:42
…to memory_analysis_handler, write output file on completion

Signed-off-by: coleramos425 <colramos@amd.com>
Signed-off-by: coleramos425 <colramos@amd.com>
Signed-off-by: coleramos425 <colramos@amd.com>
…ration

Signed-off-by: coleramos425 <colramos@amd.com>
Signed-off-by: coleramos425 <colramos@amd.com>
…ding

The wrapped class memory_analysis_handler_t has a default destructor that should be called. Otherwise, the JsonOutputManager is destroyed prematurely when the library is unloaded, and a new empty instance is then created inadvertently.

Signed-off-by: coleramos425 <colramos@amd.com>
Signed-off-by: coleramos425 <colramos@amd.com>
…o dh_comms to avoid additional dependency

Signed-off-by: coleramos425 <colramos@amd.com>
Signed-off-by: coleramos425 <colramos@amd.com>
@coleramos425 coleramos425 force-pushed the colramos/json-support branch from 3fc34f8 to a2ed5c3 Compare May 27, 2025 18:11
Signed-off-by: coleramos425 <colramos@amd.com>
…in a single object

Signed-off-by: coleramos425 <colramos@amd.com>
Signed-off-by: coleramos425 <colramos@amd.com>
coleramos425 (Collaborator, Author) commented

Closing this (outdated) PR in favor of an improved implementation that logs JSON inside the memory analysis handler.

@coleramos425 coleramos425 deleted the colramos/json-support branch September 25, 2025 15:09
Development

Successfully merging this pull request may close these issues.

[Req] Standardized output format for message handeler
