Remove remaining memref.dealloc after InsertGpuAllocs pass #3

Open
dchigarev wants to merge 11 commits into Menooker:dev from dchigarev:dealloc-fix

@dchigarev

This PR fixes the following buggy behavior that caused double deallocations:

// this is how insert-gpu-allocs worked sometimes before the fix
// before insert-gpu-allocs pass
func.func @test() {
    %0 = memref.alloc() : memref<32x4096xf16>
    ...
    memref.dealloc %0 : memref<32x4096xf16>
}

// after insert-gpu-allocs pass
func.func @test() {
    %0 = gpu.alloc() host_shared : memref<32x4096xf16> // memref.alloc was replaced with gpu.alloc
    ...
    memref.dealloc %0 : memref<32x4096xf16> // memref.dealloc is still there (causes crashes)
    gpu.dealloc %0 : memref<32x4096xf16> // added gpu.dealloc
}
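
For illustration only, here is a minimal C++ sketch of the intended rewrite on a toy op list. It is not the IMEX implementation and uses no MLIR APIs: `ToyOp` and `insertGpuAllocs` are hypothetical names invented for this example. It shows the step that was missing in the buggy case: when an allocation is converted to `gpu.alloc`, any `memref.dealloc` of the same value is dropped, so the buffer is released only by the added `gpu.dealloc`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an MLIR operation (hypothetical model, not an MLIR type):
// an op name plus the SSA value it defines or consumes.
struct ToyOp {
  std::string name;   // e.g. "memref.alloc", "memref.dealloc", "other"
  std::string value;  // e.g. "%0"
};

// Converts memref.alloc to gpu.alloc, erases the matching memref.dealloc,
// and appends one gpu.dealloc per converted buffer at the end of the body.
std::vector<ToyOp> insertGpuAllocs(const std::vector<ToyOp>& body) {
  // Pass 1: record every value defined by memref.alloc, in program order.
  std::vector<std::string> converted;
  for (const ToyOp& op : body)
    if (op.name == "memref.alloc")
      converted.push_back(op.value);

  auto isConverted = [&](const std::string& v) {
    return std::find(converted.begin(), converted.end(), v) != converted.end();
  };

  // Pass 2: rebuild the body with GPU allocations.
  std::vector<ToyOp> result;
  for (const ToyOp& op : body) {
    if (op.name == "memref.alloc") {
      result.push_back({"gpu.alloc", op.value});
    } else if (op.name == "memref.dealloc" && isConverted(op.value)) {
      // The step the buggy behavior skipped: drop the host-side dealloc.
      continue;
    } else {
      result.push_back(op);
    }
  }

  // Release each converted buffer exactly once.
  for (const std::string& v : converted)
    result.push_back({"gpu.dealloc", v});

  // Postcondition: no memref.dealloc of a converted buffer survives.
  for (const ToyOp& op : result)
    assert(!(op.name == "memref.dealloc" && isConverted(op.value)));
  return result;
}
```

Applied to a body of `memref.alloc %0`, some use of `%0`, and `memref.dealloc %0`, the sketch yields `gpu.alloc %0`, the unchanged use, and a single `gpu.dealloc %0`.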

This bug causes the xegpu pipeline in Graph Compiler to crash on large tensors. I first noticed the problem in the new tests added by intel/graph-compiler#220. @LongshengDu, have you already encountered this problem? Maybe you have another fix?
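
As a rough aid for spotting this pattern before running anything, the following C++ sketch flags values that are released more than once in a toy op list. It is an assumption-laden illustration, not part of IMEX or Graph Compiler: `ToyOp` and `findDoubleDeallocs` are hypothetical names, and the op list stands in for real IR.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an MLIR operation (hypothetical model, not an MLIR type).
struct ToyOp {
  std::string name;   // e.g. "gpu.alloc", "memref.dealloc", "gpu.dealloc"
  std::string value;  // SSA value the op defines or consumes, e.g. "%0"
};

// Returns every value released more than once, counting both
// memref.dealloc and gpu.dealloc as a release.
std::vector<std::string> findDoubleDeallocs(const std::vector<ToyOp>& body) {
  std::map<std::string, int> deallocCount;
  for (const ToyOp& op : body) {
    if (op.name == "memref.dealloc" || op.name == "gpu.dealloc") {
      assert(!op.value.empty() && "a dealloc must name the value it releases");
      ++deallocCount[op.value];
    }
  }

  std::vector<std::string> offenders;
  for (const auto& entry : deallocCount)
    if (entry.second > 1)
      offenders.push_back(entry.first);
  return offenders;
}
```

On the buggy output shown above (`gpu.alloc %0`, `memref.dealloc %0`, `gpu.dealloc %0`) the check reports `%0`; once the leftover `memref.dealloc` is removed it reports nothing.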

Crash dump that I've seen in GC:
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/jovyan/graph-compiler/build/bin/gc-cpu-runner -e main -entry-point-result=void --shared-libs=/home/jovyan/llvm/llvm-gc-master-patches-install/lib/libmlir_runner_utils.so,/home/jovyan/llvm/llvm-gc-master-patches-install/lib/libmlir_c_runner_utils.so,/home/jovyan/graph-compiler/build/lib/libGcOpenclRuntime.so
 #0 0x00005589fcc4c2a0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/jovyan/graph-compiler/build/bin/gc-cpu-runner+0x3b82a0)
 #1 0x00005589fcc493af llvm::sys::RunSignalHandlers() (/home/jovyan/graph-compiler/build/bin/gc-cpu-runner+0x3b53af)
 #2 0x00005589fcc49505 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f1def2716ac (/usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so+0x5436ac)
 #4 0x00007f1e29ef8520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #5 0x00007f1e29f5b3fe __libc_free ./malloc/malloc.c:3368:7
 #6 0x00007f1e2a43279d 
 #7 0x00007f1e2a432ff7 
 #8 0x00007f1e2a4338e1 
 #9 0x00005589fd205a0c compileAndExecute((anonymous namespace)::Options&, mlir::Operation*, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig, void**, std::unique_ptr<llvm::TargetMachine, std::default_delete<llvm::TargetMachine>>) JitRunner.cpp:0:0
#10 0x00005589fd205ead compileAndExecuteVoidFunction((anonymous namespace)::Options&, mlir::Operation*, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig, std::unique_ptr<llvm::TargetMachine, std::default_delete<llvm::TargetMachine>>) JitRunner.cpp:0:0
#11 0x00005589fd207473 mlir::JitRunnerMain(int, char**, mlir::DialectRegistry const&, mlir::JitRunnerConfig) (/home/jovyan/graph-compiler/build/bin/gc-cpu-runner+0x973473)
#12 0x00005589fcb846c0 std::vector<std::unique_ptr<mlir::DialectExtensionBase, std::default_delete<mlir::DialectExtensionBase>>, std::allocator<std::unique_ptr<mlir::DialectExtensionBase, std::default_delete<mlir::DialectExtensionBase>>>>::~vector() /usr/include/c++/11/bits/stl_vector.h:680:15
#13 0x00005589fcb846c0 mlir::DialectRegistry::~DialectRegistry() /home/jovyan/llvm/llvm-gc-master-patches-install/include/mlir/IR/DialectRegistry.h:139:7
#14 0x00005589fcb846c0 main /home/jovyan/graph-compiler/src/gc-cpu-runner/gc-cpu-runner.cpp:46:1
#15 0x00007f1e29edfd90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#16 0x00007f1e29edfe40 call_init ./csu/../csu/libc-start.c:128:20
#17 0x00007f1e29edfe40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#18 0x00005589fcc35195 _start (/home/jovyan/graph-compiler/build/bin/gc-cpu-runner+0x3a1195)

Menooker and others added 11 commits July 3, 2024 09:28
Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
This PR aligns IMEX with the new LLVM version in GC (introduced here: intel/graph-compiler#204)

The update broke the following tests in IMEX (namely, this assert statement now fails). I'm not sure what exactly caused the problem (my changes to IMEX or the new LLVM version). We don't use XeTile in our linalg->xegpu->gpu exe pipeline, so our flow is not affected.
@dchigarev changed the title from "Remove remaining memref.dealloc in InsertGpuAllocs pass" to "Remove remaining memref.dealloc after InsertGpuAllocs pass" on Aug 7, 2024