Commit f99a514

Added info to Javid's and Sinan's works.
1 parent e34d321 commit f99a514

2 files changed

+23
-3
lines changed

docs/assets/CG.png

739 KB

docs/index.md

Lines changed: 23 additions & 3 deletions
@@ -490,7 +490,7 @@ a {
 <div class="card grid grid-cols-4 justify-center items-center">
 <img class="shadow rounded-full max-w-full h-auto align-middle border-none" src="./team-images/abdullah.png" width="100px" />
 <p class="grid-colspan-3">
-<b>Alumni:</b> Muhammad Abdullah Soyturk
+<b>Alumni:</b> Muhammed Abdullah Soytürk
 <br>
 <b>Research Focus:</b> Scalable deep learning.
 </p>
@@ -531,6 +531,7 @@ BeyondMoore Software Ecosystem
 **_Profiling Tools_**
 * [Snoopie](#SNOOPIE): A Multi-GPU Communication Profiler and Visualiser
 * [PES AMD vs Intel](#PRECISE-EVENT-SAMPLING): A Precise Event Sampling Benchmark Suite
+* [aCG](#ACG): CPU- and GPU-initiated Communication Strategies for CG Methods
 
 </div>
 
@@ -606,13 +607,32 @@ single-process, multi-threaded, and multi-process codes. More details about the
 </div>
 
 
+<div id="ACG" class="h-auto bg-gray-100 rounded-s p-4 border-solid border-1 border-gray-200 flex flex-row justify-start items-start gap-5 transform transition-all hover:shadow-gray-100 hover:shadow-lg shadow-none">
+<div clas="flex flex-col justify-start">
+<div class="flex flex-row gap-2 justify-start items-center flex-shrink">
+<img width="32" src="./assets/git.webp" />
+<a href="https://github.com/ParCoreLab/aCG" class="text-xl font-semibold font-sans visited:text-teal-700">CPU- and GPU-initiated Communication Strategies for CG Methods</a>
+</div>
+<p class="text-lg">This work revisits Conjugate Gradient (CG) parallelization for large-scale multi-GPU systems, addressing challenges from low computational intensity and communication overhead. We develop scalable CG and pipelined CG solvers for NVIDIA and AMD GPUs, employing GPU-aware MPI, NCCL/RCCL, and NVSHMEM for both CPU- and GPU-initiated communication. A monolithic GPU-offloaded variant further enables fully device-driven execution, removing CPU involvement. Optimizations across all designs reduce data transfers and synchronization costs. Evaluations on SuiteSparse matrices and a real finite element application show 8–14% gains over state-of-the-art on single GPUs and 5–15% improvements in strong scaling tests on over 1,000 GPUs. While CPU-driven variants currently benefit from stronger library support, results highlight the promising scalability of GPU-initiated execution for future large-scale systems.</p>
+
+<p>
+<a href="https://github.com/ParCoreLab/aCG" class="text-xl font-semibold font-sans visited:text-teal-700">More details and git repository of the project.</a>
+</p>
+</div>
+<div class="grid h-auto justify-center place-items-center">
+<img width="400px" src="./assets/CG.png" />
+</div>
+</div>
+
 <div id="CPU-FREE-MODEL-COMPILER" class="h-auto bg-gray-100 rounded-s p-4 border-solid border-1 border-gray-200 flex flex-row justify-start items-start gap-5 transform transition-all hover:shadow-gray-100 hover:shadow-lg shadow-none">
 <div clas="flex flex-col justify-start">
 <div class="flex flex-row gap-2 justify-start items-center flex-shrink">
 <img width="32" src="./assets/git.webp" />
 <a href="https://github.com/ParCoreLab/" class="text-xl font-semibold font-sans visited:text-teal-700">CPU Free Model Compiler</a>
 </div>
-<p class="text-lg">We're actively crafting a compiler to empower developers to write high-level Python code that compiles into efficient CPU-free device code. This compiler integrates GPU-initiated communication libraries, NVSHMEM for NVIDIA and ROC_SHMEM for AMD, enabling GPU communication directly within Python code. With automatic generation of GPU-initiated communication calls and persistent kernels, we aim to streamline development workflows. Our prototype will be available soon.</p>
+<p class="text-lg">We're actively crafting a compiler to empower developers to write high-level Python code that compiles into efficient CPU-free device code. This compiler integrates GPU-initiated communication libraries, NVSHMEM for NVIDIA and ROC_SHMEM for AMD, enabling GPU communication directly within Python code. With automatic generation of GPU-initiated communication calls and persistent kernels, we aim to streamline development workflows.</p> <p>
+<a href="https://github.com/ParCoreLab/CPU-Free-model" class="text-xl font-semibold font-sans visited:text-teal-700">More details and git repository of the project.</a>
+</p>
 </div>
 <div class="grid h-auto justify-center place-items-center">
 <img width="300px" src="./assets/dace-compiler.png" />
@@ -689,7 +709,7 @@ Graphs</a> <a href="https://docs.google.com/presentation/d/1po87zQeUQb5l12AXB5RM
 <div class="card text-lg">Tugba Torun, Ameer Taweel, Didem Unat (2024) <a href="https://arxiv.org/pdf/2405.04944">A Sparse Tensor Generator with Efficient Feature Extraction</a>. <span class="italic">Accepted for publication; online release pending</span>. <a class="italic" href="https://arxiv.org/pdf/2405.04944">preprint pdf</a> </div>
 
 <div class="card text-lg"> Javid Baydamirli, Tal Ben Nun, Didem Unat (2024) <a href="https://ieeexplore.ieee.org/abstract/document/10820747">Autonomous Execution for Multi-GPU Systems:
-Compiler Support</a>. In SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis. <a class="italic" download href="./assets/preprint-pdfs/P3HPC_____Autonomous_Execution_for_Multi_GPU_Systems__Compiler_Support-2 (1).pdf">preprint pdf</a>
+Compiler Support</a> <a href="https://docs.google.com/presentation/d/1nBsANrcLh0Tnc2qqqDL_-6khqo-Y-_mX5kfJbmRwawE/edit?slide=id.p#slide=id.p">(presentation)</a>. In SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis. <a class="italic" download href="./assets/preprint-pdfs/P3HPC_____Autonomous_Execution_for_Multi_GPU_Systems__Compiler_Support-2 (1).pdf">preprint pdf</a>
 </div>
 <div class="card text-lg"> Javid Baydamirli, Tal Ben Nun, Didem Unat (2024) <a href="https://sc24.supercomputing.org/proceedings/workshops/workshop_pages/ws_p3hpc108.html">Autonomous Execution for Multi-GPU Systems:
 Compiler Support</a> <a href="https://sc24.conference-program.com/presentation/?id=ws_p3hpc108&sess=sess751">(presentation)</a>. In the 2024 International Workshop on Performance, Portability, and Productivity in HPC. <a class="italic" download href="./assets/preprint-pdfs/sc24-workshop-autonomous-execution-for-multi-gpu-systems-compiler-support.pdf">preprint pdf</a>
