The problem I'm seeing only happens when I use --hugify. I have a program that works well with BOLT:
llvm-bolt ./sample -o ./sample.bolt -data=./workspace/perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats
I wanted to try the --hugify option. It passes the BOLT process, but the rewritten binary core dumps when I run it. The problem seems to be at the entry point: the segfault happens at _start (0x10a0, the _start address from the disassembly), which differs from the entry point in the readelf header. GDB can't catch it because the program crashes before it even starts. Here is the BOLT run with --hugify:
$ llvm-bolt ./sample -o ./sample.bolt -data=./workspace/perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats --hugify
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: c62053979489ccb002efe411c3af059addcb5d7d
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x200000, offset 0x200000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-WARNING: disabling -split-eh for shared object
BOLT-INFO: enabling lite mode
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-WARNING: Ignored 0 functions due to cold fragments.
BOLT-INFO: 2 out of 13 functions in the binary (15.4%) have non-empty execution profile
BOLT-INFO: the input contains 1 (dynamic count : 429) opportunities for macro-fusion optimization. Will fix instances on a hot path.
BOLT-INFO: 3 instructions were shortened
BOLT-INFO: basic block reordering modified layout of 2 (11.76%) functions
BOLT-INFO: UCE removed 0 blocks and 0 bytes of code.
BOLT-INFO: splitting separates 135 hot bytes from 124 cold bytes (52.12% of split functions is hot).
BOLT-INFO: 1 Functions were reordered by LoopInversionPass
BOLT-INFO: hfsort+ reduced the number of chains from 2 to 1
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:
429 : executed forward branches
403 : taken forward branches
718626 : executed backward branches
718626 : taken backward branches
441 : executed unconditional branches
429 : all function calls
0 : indirect calls
0 : PLT calls
4324956 : executed instructions
3594762 : executed load instructions
1439014 : executed store instructions
0 : taken jump table branches
0 : taken unknown indirect branches
719496 : total branches
719470 : taken branches
26 : non-taken conditional branches
719029 : taken conditional branches
719055 : all conditional branches
429 : executed forward branches (=)
0 : taken forward branches (-100.0%)
718626 : executed backward branches (=)
718626 : taken backward branches (=)
441 : executed unconditional branches (=)
429 : all function calls (=)
0 : indirect calls (=)
0 : PLT calls (=)
4325373 : executed instructions (+0.0%)
3594762 : executed load instructions (=)
1439014 : executed store instructions (=)
0 : taken jump table branches (=)
0 : taken unknown indirect branches (=)
719496 : total branches (=)
719067 : taken branches (-0.1%)
429 : non-taken conditional branches (+1550.0%)
718626 : taken conditional branches (-0.1%)
719055 : all conditional branches (=)
BOLT-INFO: SCTC: patched 0 tail calls (0 forward) tail calls (0 backward) from a total of 0 while removing 0 double jumps and removing 0 basic blocks totalling 0 bytes of code. CTCs total execution count is 0 and the number of times CTCs are taken is 0.
BOLT-INFO: padding code to 0x600000 to accommodate hot text
BOLT-INFO: setting _end to 0x6001c8
BOLT-INFO: setting __hot_start to 0x400000
BOLT-INFO: setting __hot_end to 0x4000ba
BOLT-INFO: patched build-id (flipped last bit)
When I run the rewritten binary, it core dumps with a segfault at _start (0x10a0). Here is the tail of the strace output:
> munmap(0x7ffff7fc6000, 34326) = 0
> open("/sys/kernel/mm/transparent_hugepage/enabled", O_RDONLY) = 3
> read(3, "always [madvise] never\n", 256) = 23
> madvise(0x555555800000, 2097152, MADV_HUGEPAGE) = 0
> --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x10a0} ---
> +++ killed by SIGSEGV (core dumped) +++
$ readelf -h sample.bolt
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x601780
Start of program headers: 2097152 (bytes into file)
Start of section headers: 6301056 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 14
Size of section headers: 64 (bytes)
Number of section headers: 43
Section header string table index: 41
Huge pages are available on my system:
$ cat /proc/meminfo |grep -i hug
AnonHugePages: 272384 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 20000
HugePages_Free: 20000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 40960000 kB
I can't find any manual or guide for --hugify to help debug this issue. If anyone knows about this problem, please comment. Thanks a lot!