
@josephjohnjj

Delegate GPU task completion to a co-manager using the MCA parameter device_cuda_delegate_task_completion.

  1. The second CPU thread that submits the task to the GPU device is transitioned to a co-manager.
  2. The task is completed by the manager if the co-manager has not yet been set.
  3. The manager pushes the task to be completed to a co-manager specific queue.
  4. The GPU task is freed by the thread (manager or co-manager) that completes it.

complete_mutex - tracks the number of tasks to be completed by the co-manager
to_complete - list of tasks to be completed by the co-manager
co_manager_mutex - ensures that there is only one co-manager per device
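The hand-off described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `device_t`, `gpu_task_t`, and every function name here are hypothetical stand-ins, a single pthread mutex replaces the separate synchronization fields, `has_co_manager` plays the role of `co_manager_mutex`, and `pending` plays the role of the counter kept in `complete_mutex`. "Completing" a task is reduced to freeing it.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GPU task; not the PaRSEC type. */
typedef struct gpu_task {
    struct gpu_task *next;
    int id;
} gpu_task_t;

/* Hypothetical per-device state; one lock protects every field. */
typedef struct device {
    pthread_mutex_t lock;
    bool has_co_manager;         /* at most one co-manager per device */
    gpu_task_t *to_complete;     /* head of the co-manager's FIFO queue */
    gpu_task_t *tail;            /* tail of the same queue */
    int pending;                 /* tasks waiting for the co-manager */
    int completed_by_manager;
    int completed_by_co_manager;
} device_t;

void device_init(device_t *dev) {
    pthread_mutex_init(&dev->lock, NULL);
    dev->has_co_manager = false;
    dev->to_complete = dev->tail = NULL;
    dev->pending = 0;
    dev->completed_by_manager = 0;
    dev->completed_by_co_manager = 0;
}

gpu_task_t *task_new(int id) {
    gpu_task_t *t = malloc(sizeof *t);
    if (t) { t->next = NULL; t->id = id; }
    return t;
}

/* Called by a submitting thread that finds a manager already active.
 * Returns true only for the first caller, which becomes the co-manager. */
bool try_become_co_manager(device_t *dev) {
    pthread_mutex_lock(&dev->lock);
    bool claimed = !dev->has_co_manager;
    dev->has_co_manager = true;
    pthread_mutex_unlock(&dev->lock);
    return claimed;
}

/* Called by the manager when a task has finished on the device. */
void manager_retire(device_t *dev, gpu_task_t *task) {
    pthread_mutex_lock(&dev->lock);
    if (dev->has_co_manager) {
        /* Delegate: append to the co-manager's queue. */
        task->next = NULL;
        if (dev->tail) dev->tail->next = task; else dev->to_complete = task;
        dev->tail = task;
        dev->pending++;
        pthread_mutex_unlock(&dev->lock);
        return;
    }
    /* No co-manager yet: the manager completes and frees the task itself. */
    dev->completed_by_manager++;
    pthread_mutex_unlock(&dev->lock);
    free(task);
}

/* Called by the co-manager: detach the whole queue under the lock,
 * then complete and free the tasks outside the lock. */
int co_manager_drain(device_t *dev) {
    pthread_mutex_lock(&dev->lock);
    gpu_task_t *head = dev->to_complete;
    int n = dev->pending;
    dev->to_complete = dev->tail = NULL;
    dev->pending = 0;
    dev->completed_by_co_manager += n;
    pthread_mutex_unlock(&dev->lock);
    while (head) {
        gpu_task_t *next = head->next;
        free(head);
        head = next;
    }
    return n;
}
```

In this sketch the thread that completes a task is always the one that frees it, which matches step 4; detaching the queue before walking it keeps the manager from blocking on the lock while the co-manager works.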
@devreal

devreal commented Nov 1, 2024

What is the status of this? Any reason for not taking this in?

@josephjohnjj Could you please rebase your branch?

@josephjohnjj

josephjohnjj commented Nov 1, 2024

@devreal There was no performance improvement when using a co-manager to complete the task. @bosilca suggested that this might be because a single task completion does not generate enough child tasks to make a noticeable impact. Unlike in #566, the co-manager in #509 only completed tasks and was not involved in task execution.

I'm doubtful that rebasing the code would be helpful at this stage. In my codebase, all task offloading to the GPU occurs in parsec_cuda_kernel_scheduler() within parsec/mca/device/cuda/device_cuda_module.c.

In the current codebase, this has been moved to parsec_device_kernel_scheduler() in parsec/parsec/mca/device/cuda/device_cuda_module.c.

I can implement the same in the current codebase if having a co-manager would be helpful. Also, the co-manager is controlled by an MCA parameter, so in the extreme case where there are only 2 cores, we could choose not to use the co-manager.
