Skip to content

Mxfp8 cast optimization#507

Open
alextmagro wants to merge 5 commits intodevfrom
mxfp8_cast_optimization
Open

Mxfp8 cast optimization#507
alextmagro wants to merge 5 commits intodevfrom
mxfp8_cast_optimization

Conversation

@alextmagro
Copy link
Copy Markdown
Contributor

Description

Improves performance of HIP MXFP8 Cast Kernels -- see confluence page for details
Adds benchmarking scripts for reproducability

@alextmagro alextmagro force-pushed the mxfp8_cast_optimization branch from 630d87a to 74a82fc Compare March 26, 2026 21:34
@alextmagro alextmagro added the ci-level 3 CI test level 3 label Mar 26, 2026

#include "benchmark_utils.h"

#include "amd_detail/hip_float8.h"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dup of line 11


#include "amd_detail/hip_float8.h"

#include <transformer_engine/cast_hip.h>
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file hipified?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is written in hip, but references the hipified headers in te. Would you prefer we write this in CUDA then hipify it?

@@ -0,0 +1,1241 @@
// !!! This is a file automatically generated by hipify!!!
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add original file, not hipified

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops. This file was copied over from the test/cpp directory to use the tensor capabilities, but forgot to clean it up to just keep what we need for the benchmarks. I will work on this, and push in a later commit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this, made a design change that's hopefully cleaner. I use the test_common files directly, but added a macro that ensures we only compile what we need within the executable.

}

#define TRANSFORMER_ENGINE_CHUNK_DIM_SWITCH(use_large, CHUNK_Y, CHUNK_X, THREADS, ...) \
[&] { \
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those values are selected for specific kernel, aren't they? The define can bereplaced with TRANSFORMER_ENGINE_SWITCH_CONDITION

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, will make that change.

@alextmagro alextmagro requested a review from ipanfilo March 27, 2026 16:43
-DNDEBUG
-DUSE_ROCM
--offload-arch=${GPU_TARGETS}
-w
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the -w here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

during debugging, I found the warnings to be clutter, especially since they came from TE rather than these benchmarks. I have removed the -w now that things are working.

Comment on lines +8 to +10
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend not using colors here, these might not show up well in e.g. CI logs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will remove.


if [ ! -f "${SCRIPT_DIR}/${test_common_hip}" ] || [ ! -f "${SCRIPT_DIR}/${test_common_h}" ]; then
echo -e "${RED}Error: hipified test_common files not found. Build tests before running benchmarks."
echo -e "Error: hipified test_common files not found. Build tests before running benchmarks."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of these echos don't need -e anymore. I'd also suggest removing the non-ASCII symbols like "✓" and "✗" - these don't always render well in terminals.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-level 3 CI test level 3

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants