Increasing block size dimensions to avoid configs which are slow and poor candidates.#1677
Conversation
| if not is_hip() or not self.grid_block_ids: | ||
| return |
Similar to the other PR, I wonder if it would make sense to do this in a vendor-agnostic way?
I only tested it on AMD devices. I can make it vendor-agnostic and see if the CI passes for all devices
@jansel, I don't have access to Nvidia devices. I did make the changes generic, but since the PRs change the autotuning space, there are tests which are failing because they still expect the old configs. Can you review the changes, and if they look good, is there a way to update the expected file on the CI machines and commit?
Force-pushed from 2d3a9c1 to 4e4b77d
@jansel, Can you approve the workflows?

@umechand-amd yes, I approved CI, sorry for the delay. Re-request review when you want me to look again.
Force-pushed from 33f8a41 to 6dfa634
| """ | ||
| if not self.grid_block_ids: | ||
| return | ||
| import math |
|
Failing tests and merge conflicts?

The failing tests need a fix to the expected files; I don't have access to CUDA devices to check the new configs and fix the expected files.
Force-pushed from 6dfa634 to 3397e6d

Force-pushed from 3397e6d to e6fe78a
This PR puts a floor on the block sizes by capping the maximum number of blocks in each dimension of the grid. This ensures that we do not test degenerate cases like block size = 1, limiting autotuning to the subspace of configs whose block sizes are known to give decent performance without huge runtime delays.
The thought process is that the known subset of configs (very small block sizes) with significantly higher runtime will never be selected by the autotuning benchmarker, so exploring them is a waste of autotuning time and compute.
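As a rough illustration of the idea, the pruning can be thought of as computing a block-size floor from a per-dimension cap on the number of blocks, then dropping any config below that floor. This is a hedged sketch, not the actual PR implementation: the names (`MAX_BLOCKS_PER_DIM`, `min_block_size`, `prune_configs`) and the cap value are assumptions made here for illustration.

```python
import math

# Assumed cap on blocks per grid dimension; the real value would be
# chosen per backend. This constant is hypothetical.
MAX_BLOCKS_PER_DIM = 64

def min_block_size(dim_size: int, max_blocks: int = MAX_BLOCKS_PER_DIM) -> int:
    """Smallest block size that keeps the grid at or under max_blocks
    blocks in this dimension."""
    return math.ceil(dim_size / max_blocks)

def prune_configs(configs, dim_sizes):
    """Drop configs whose block size in any dimension falls below the
    floor implied by the per-dimension block-count cap."""
    kept = []
    for block_sizes in configs:
        ok = all(
            bs >= min_block_size(ds)
            for bs, ds in zip(block_sizes, dim_sizes)
        )
        if ok:
            kept.append(block_sizes)
    return kept

# Example: for a 4096-element dimension with a 64-block cap, the floor
# is ceil(4096 / 64) = 64, so block sizes 1 and 16 are pruned while
# 64 and 128 survive.
configs = [(1,), (16,), (64,), (128,)]
print(prune_configs(configs, (4096,)))  # -> [(64,), (128,)]
```

Capping the block count (rather than hard-coding a minimum block size) makes the floor scale with the problem size, so small problems can still use small blocks while large problems skip the degenerate configs.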