Skip to content

accel/amdxdna: add CERT logging support for AIE4#1318

Open
donwalkarsoham wants to merge 1 commit into
amd:mainfrom
donwalkarsoham:cert-log
Open

accel/amdxdna: add CERT logging support for AIE4#1318
donwalkarsoham wants to merge 1 commit into
amd:mainfrom
donwalkarsoham:cert-log

Conversation

@donwalkarsoham
Copy link
Copy Markdown
Contributor

@donwalkarsoham donwalkarsoham commented May 15, 2026

Add support for per-hardware-context CERT logging on AIE4 silicon.

@donwalkarsoham donwalkarsoham changed the title drm/amdxdna: add CERT logging support for AIE4 accel/amdxdna: add CERT logging support for AIE4 May 15, 2026
Copy link
Copy Markdown
Contributor

@xdavidz xdavidz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modify this from
drm/amdxdna: add CERT logging support for AIE4
to
accel/amdxdna: add CERT logging support for AIE4

Copy link
Copy Markdown
Contributor

@xdavidz xdavidz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Add one empty line above this first comment.

Add support for per-hardware-context CERT logging on AIE4 silicon.

Remove the following description for now, it has too much details.

CERT maintains four classes of per-hwctx debug surfaces --
CERT log, CERT debug, CERT trace, and CERT debug-queue buffers --
which user-space programs and inspects via
DRM_AMDXDNA_HWCTX_{ASSIGN,REMOVE}_DBG_BUF on the existing
DRM_IOCTL_AMDXDNA_CONFIG_HWCTX uAPI.

@@ -57,16 +57,23 @@ struct amdxdna_dev_hdl {
struct amdxdna_msg_buf_hdl *work_buf_hdl;
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add one empty line above this struct.
I am not sure if your entire code is checkpatch clean.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran check patch. There is already an empty line present above the struct.
If we add one more then the checkpatch mentions "Please don't use multiple blank lines".

xdna-driver# bash tools/codingsty_check.sh drivers/accel/amdxdna/aie4_pci.h
run check patch: /lib/modules/6.17.0-14-generic/build/scripts/checkpatch.pl
CHECK: Please don't use multiple blank lines
#60: FILE: drivers/accel/amdxdna/aie4_pci.h:60:
+
+

total: 0 errors, 0 warnings, 1 checks, 112 lines checked  

Copy link
Copy Markdown
Contributor

@xdavidz xdavidz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do a rebase instead of using "git merge", it generates unnecessary changes in the repo history.

Please also ask Max or Hayden to add "copilot" as reviewer.

Overall, looks ok to me.

@donwalkarsoham
Copy link
Copy Markdown
Contributor Author

  1. Add one empty line above this first comment.

Add support for per-hardware-context CERT logging on AIE4 silicon.

Remove the following description for now, it has too much details.

CERT maintains four classes of per-hwctx debug surfaces -- CERT log, CERT debug, CERT trace, and CERT debug-queue buffers -- which user-space programs and inspects via DRM_AMDXDNA_HWCTX_{ASSIGN,REMOVE}_DBG_BUF on the existing DRM_IOCTL_AMDXDNA_CONFIG_HWCTX uAPI.

Done

Signed-off-by: Soham Donwalkar <Soham.Donwalkar@amd.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds AIE4 support for per-hardware-context CERT logging by introducing a UAPI metadata format for CERT firmware buffers and wiring a new AIE4 “configure hw context” management message to attach/detach those buffers.

Changes:

  • Introduces new UAPI types to describe per-hwctx CERT buffers (type + per-uC slice descriptors in a metadata BO).
  • Adds AIE4 management-message support for CONFIGURE_HW_CONTEXT and hooks it into the device ops via a new hwctx_config callback.
  • Implements AIE4 hwctx attach/detach logic that translates UAPI metadata into firmware “CERT logging” configuration payloads.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
include/uapi/drm/amdxdna_accel.h Adds new UAPI enums/structs for describing CERT firmware buffers via a metadata BO.
drivers/accel/amdxdna/aie4_pci.h Exposes new AIE4 APIs (aie4_hwctx_config, configure message helper) to other compilation units.
drivers/accel/amdxdna/aie4_pci.c Wires AIE4 hwctx_config into AIE4 VF/classic device ops.
drivers/accel/amdxdna/aie4_msg_priv.h Defines the new AIE4 mgmt opcode + request/response payload layout for hwctx configuration.
drivers/accel/amdxdna/aie4_message.c Implements the new mgmt message send/wait helper for configuring hwctx CERT logging.
drivers/accel/amdxdna/aie4_ctx.c Implements attach/detach handling for CERT buffers using the new UAPI metadata format.
Comments suppressed due to low confidence (2)

drivers/accel/amdxdna/aie4_ctx.c:457

  • cl.num is set to meta->num_ucs, but the loop skips entries with uc_info[i].size == 0 (and duplicate indices would overwrite earlier entries). This can make cl.num inconsistent with the actually programmed cl.info[] entries. Consider rejecting zero-sized/duplicate entries or computing cl.num from the number of populated/unique indices.
	cl.num = FIELD_PREP(AIE4_MSG_CERT_LOG_NUM, attach ? meta->num_ucs : 0);

include/uapi/drm/amdxdna_accel.h:187

  • New UAPI type name struct fw_buffer_metadata is generic and doesn’t follow this header’s established amdxdna_... prefix convention. Consider renaming it to a prefixed name (e.g., amdxdna_fw_buffer_metadata) to avoid namespace collisions for UAPI consumers.
struct fw_buffer_metadata {
	__u8 buf_type;
	__u8 num_ucs;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +425 to +429
log_bo = amdxdna_gem_get_obj(client, meta->bo_handle, AMDXDNA_BO_SHARE);
if (!log_bo) {
XDNA_ERR(xdna, "Get payload bo %llu failed", meta->bo_handle);
ret = -EINVAL;
goto put_meta_bo;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@donwalkarsoham you need to fix this, because the aie4_hwctx_config passing u64 but for some reason the aie4_hwctx_cfg_debug_bo cast it to u32. Given this is just for searching the bo index, we should not cast it to u32.

static int aie4_hwctx_cfg_debug_bo(struct amdxdna_hwctx *hwctx, u32 meta_bo_hdl,
bool attach)
{
struct aie4_msg_context_config_cert_logging cl;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because you are passing this into the mailbox message and fw will consume this cl, better to take AI's suggestion.

Comment on lines +451 to +453
cl.info[index].paddr = base_addr + prev_size;
cl.info[index].size = meta->uc_info[i].size;
prev_size += meta->uc_info[i].size;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add checking after each caculation for buffer overflow
we need to make sure the prev_size + uc_info[i].size is never beyond the BO size of log_bo->mem.size to protect kernel.

The rule here is that never trust user-land passed info, in this case it is size, and protect kernel being corrupted if the size is wrong.

Comment on lines +165 to +168
struct uc_info_entry {
__u32 index;
__u32 size;
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both uc_info_array and fw_buffer_metadata naming are not acceptable.

Look into existing amdxdna_accel.h, we have to use amdxdna_ prefix for all interface.

Comment on lines +189 to +191
__u64 command_id;
__u64 bo_handle;
struct uc_info_entry uc_info[];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should consider changing this bo_handle to __u32 that matches other bo handle in this interface.

Comment on lines +185 to +189
struct fw_buffer_metadata {
__u8 buf_type;
__u8 num_ucs;
__u8 pad[48];
__u64 command_id;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change the pad to pad[54], this is the compatible change.

but I really doubt the original calculation is wrong, the pad should be just pad[6] instead of 48 because this is per byte (8bits) not bits....

however it might be too late to change this if all other compiled applications are using this interface.

I'd @houlz0507 and @maxzhen chime in, what should we do here?

@xdavidz
Copy link
Copy Markdown
Contributor

xdavidz commented May 22, 2026

@donwalkarsoham please address all AI comments, I think those are valid. Also, fix the conflicts and trigger a clean re-run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants