accel/amdxdna: add CERT logging support for AIE4#1318
Conversation
xdavidz
left a comment
There was a problem hiding this comment.
modify this from
drm/amdxdna: add CERT logging support for AIE4
to
accel/amdxdna: add CERT logging support for AIE4
xdavidz
left a comment
There was a problem hiding this comment.
- Add one empty line above this first comment.
Add support for per-hardware-context CERT logging on AIE4 silicon.
Remove the following description for now, it has too much details.
CERT maintains four classes of per-hwctx debug surfaces --
CERT log, CERT debug, CERT trace, and CERT debug-queue buffers --
which user-space programs and inspects via
DRM_AMDXDNA_HWCTX_{ASSIGN,REMOVE}_DBG_BUF on the existing
DRM_IOCTL_AMDXDNA_CONFIG_HWCTX uAPI.
| @@ -57,16 +57,23 @@ struct amdxdna_dev_hdl { | |||
| struct amdxdna_msg_buf_hdl *work_buf_hdl; | |||
| }; | |||
|
|
|||
There was a problem hiding this comment.
add one empty line above this struct.
I am not sure if your entire code is checkpatch clean.
There was a problem hiding this comment.
I ran check patch. There is already an empty line present above the struct.
If we add one more then the checkpatch mentions "Please don't use multiple blank lines".
xdna-driver# bash tools/codingsty_check.sh drivers/accel/amdxdna/aie4_pci.h
run check patch: /lib/modules/6.17.0-14-generic/build/scripts/checkpatch.pl
CHECK: Please don't use multiple blank lines
#60: FILE: drivers/accel/amdxdna/aie4_pci.h:60:
+
+
total: 0 errors, 0 warnings, 1 checks, 112 lines checked
xdavidz
left a comment
There was a problem hiding this comment.
Please do a rebase instead of using "git merge", it generates unnecessary changes in the repo history.
Please also ask Max or Hayden to add "copilot" as reviewer.
Overall, looks ok to me.
Done |
Signed-off-by: Soham Donwalkar <Soham.Donwalkar@amd.com>
There was a problem hiding this comment.
Pull request overview
Adds AIE4 support for per-hardware-context CERT logging by introducing a UAPI metadata format for CERT firmware buffers and wiring a new AIE4 “configure hw context” management message to attach/detach those buffers.
Changes:
- Introduces new UAPI types to describe per-hwctx CERT buffers (type + per-uC slice descriptors in a metadata BO).
- Adds AIE4 management-message support for
CONFIGURE_HW_CONTEXTand hooks it into the device ops via a newhwctx_configcallback. - Implements AIE4 hwctx attach/detach logic that translates UAPI metadata into firmware “CERT logging” configuration payloads.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| include/uapi/drm/amdxdna_accel.h | Adds new UAPI enums/structs for describing CERT firmware buffers via a metadata BO. |
| drivers/accel/amdxdna/aie4_pci.h | Exposes new AIE4 APIs (aie4_hwctx_config, configure message helper) to other compilation units. |
| drivers/accel/amdxdna/aie4_pci.c | Wires AIE4 hwctx_config into AIE4 VF/classic device ops. |
| drivers/accel/amdxdna/aie4_msg_priv.h | Defines the new AIE4 mgmt opcode + request/response payload layout for hwctx configuration. |
| drivers/accel/amdxdna/aie4_message.c | Implements the new mgmt message send/wait helper for configuring hwctx CERT logging. |
| drivers/accel/amdxdna/aie4_ctx.c | Implements attach/detach handling for CERT buffers using the new UAPI metadata format. |
Comments suppressed due to low confidence (2)
drivers/accel/amdxdna/aie4_ctx.c:457
cl.numis set tometa->num_ucs, but the loop skips entries withuc_info[i].size == 0(and duplicate indices would overwrite earlier entries). This can makecl.numinconsistent with the actually programmedcl.info[]entries. Consider rejecting zero-sized/duplicate entries or computingcl.numfrom the number of populated/unique indices.
cl.num = FIELD_PREP(AIE4_MSG_CERT_LOG_NUM, attach ? meta->num_ucs : 0);
include/uapi/drm/amdxdna_accel.h:187
- New UAPI type name
struct fw_buffer_metadatais generic and doesn’t follow this header’s establishedamdxdna_...prefix convention. Consider renaming it to a prefixed name (e.g.,amdxdna_fw_buffer_metadata) to avoid namespace collisions for UAPI consumers.
struct fw_buffer_metadata {
__u8 buf_type;
__u8 num_ucs;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| log_bo = amdxdna_gem_get_obj(client, meta->bo_handle, AMDXDNA_BO_SHARE); | ||
| if (!log_bo) { | ||
| XDNA_ERR(xdna, "Get payload bo %llu failed", meta->bo_handle); | ||
| ret = -EINVAL; | ||
| goto put_meta_bo; |
There was a problem hiding this comment.
@donwalkarsoham you need to fix this, because the aie4_hwctx_config passing u64 but for some reason the aie4_hwctx_cfg_debug_bo cast it to u32. Given this is just for searching the bo index, we should not cast it to u32.
| static int aie4_hwctx_cfg_debug_bo(struct amdxdna_hwctx *hwctx, u32 meta_bo_hdl, | ||
| bool attach) | ||
| { | ||
| struct aie4_msg_context_config_cert_logging cl; |
There was a problem hiding this comment.
because you are passing this into the mailbox message and fw will consume this cl, better to take AI's suggestion.
| cl.info[index].paddr = base_addr + prev_size; | ||
| cl.info[index].size = meta->uc_info[i].size; | ||
| prev_size += meta->uc_info[i].size; |
There was a problem hiding this comment.
please add checking after each caculation for buffer overflow
we need to make sure the prev_size + uc_info[i].size is never beyond the BO size of log_bo->mem.size to protect kernel.
The rule here is that never trust user-land passed info, in this case it is size, and protect kernel being corrupted if the size is wrong.
| struct uc_info_entry { | ||
| __u32 index; | ||
| __u32 size; | ||
| }; |
There was a problem hiding this comment.
both uc_info_array and fw_buffer_metadata naming are not acceptable.
Look into existing amdxdna_accel.h, we have to use amdxdna_ prefix for all interface.
| __u64 command_id; | ||
| __u64 bo_handle; | ||
| struct uc_info_entry uc_info[]; |
There was a problem hiding this comment.
we should consider changing this bo_handle to __u32 that matches other bo handle in this interface.
| struct fw_buffer_metadata { | ||
| __u8 buf_type; | ||
| __u8 num_ucs; | ||
| __u8 pad[48]; | ||
| __u64 command_id; |
There was a problem hiding this comment.
please change the pad to pad[54], this is the compatible change.
but I really doubt the original calculation is wrong, the pad should be just pad[6] instead of 48 because this is per byte (8bits) not bits....
however it might be too late to change this if all other compiled applications are using this interface.
I'd @houlz0507 and @maxzhen chime in, what should we do here?
|
@donwalkarsoham please address all AI comments, I think those are valid. Also, fix the conflicts and trigger a clean re-run. |
Add support for per-hardware-context CERT logging on AIE4 silicon.