Commit 15762dc

Add restart_plugin/README (document MANA restart)
1 parent f00b6dc commit 15762dc

1 file changed

Lines changed: 79 additions & 0 deletions


restart_plugin/README

@@ -0,0 +1,79 @@
When ./configure-mana is called in MANA, it calls ./configure, which
in turn configures the DMTCP submodule as:
  ./configure ... --enable-debug CFLAGS=-fno-stack-protector CXXFLAGS=-fno-stack-protector MPI_BIN=/usr/local/bin ... MANA_USE_LH_FIXED_ADDRESS=1 --with-mana-helper-dir=../restart_plugin --disable-dlsym-wrapper ...

Hence, '--with-mana-helper-dir' above points to this directory.

../dmtcp/src/mtcp/Makefile.in contains hardwired MANA-specific code
to compile the source files in this directory as object files in
../dmtcp/src/mtcp:

  ifneq ($(MANA_HELPER_DIR),)
  HEADERS += $(MANA_HELPER_DIR)/mtcp_split_process.h \
             $(MANA_HELPER_DIR)/ucontext_i.h
  OBJS += mtcp_restart_plugin.o mtcp_split_process.o getcontext.o
  CFLAGS += -DMTCP_PLUGIN_H="<mtcp_restart_plugin.h>"
  INCLUDES += -I$(MANA_HELPER_DIR)
  endif

(Note that MANA_HELPER_DIR was set by ./configure using --with-mana-helper-dir.)
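
The CFLAGS line above passes a header name as a macro value, presumably so
that the MTCP sources can pull in the plugin header with a computed #include.
The sketch below only illustrates that C preprocessor technique. The names
DEMO_PLUGIN_H, DEMO_HAVE_PLUGIN, demo_plugin_enabled() and demo_char_bits()
are hypothetical, and <limits.h> stands in for <mtcp_restart_plugin.h> so
that the sketch compiles outside of MANA. It is not the actual code in
mtcp_restart.c.

```c
/* Stand-in for the compiler flag -DMTCP_PLUGIN_H="<mtcp_restart_plugin.h>".
 * DEMO_PLUGIN_H is a hypothetical name, and <limits.h> replaces the MANA
 * header so that this sketch compiles anywhere. */
#ifndef DEMO_PLUGIN_H
#define DEMO_PLUGIN_H <limits.h>
#endif

#ifdef DEMO_PLUGIN_H
#include DEMO_PLUGIN_H      /* computed include: the macro expands to <...> */
#define DEMO_HAVE_PLUGIN 1
#else
#define DEMO_HAVE_PLUGIN 0  /* no helper directory was configured */
#endif

/* Reports whether the optional header was configured at compile time. */
int demo_plugin_enabled(void) { return DEMO_HAVE_PLUGIN; }

/* Uses CHAR_BIT, which is only visible if the computed include worked. */
int demo_char_bits(void) { return CHAR_BIT; }
```

When the macro is not defined, the #else branch is taken and the build
proceeds without the plugin header, which matches the 'ifneq' guard in the
Makefile fragment.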

====
When MANA restarts using mana_restart, the relevant logic is found in the
files of this directory (used at the time of restart) and in
../mpi-proxy-split/mpi_plugin.cpp (used earlier, at the time of checkpoint).

At the time of checkpoint, mpi_plugin.cpp writes libsStart, libsEnd and
highMemStart into the MTCP header of each checkpoint image.

At the time of checkpoint, control comes to:
  ../mpi-proxy-split/mpi_plugin.cpp:computeUnionOfCkptImageAddresses()
    (i) which computes libsStart, libsEnd, highMemStart
    (ii) and saves them in the MTCP header of the checkpoint image,
    (iii) such that [libsStart, libsEnd]+[highMemStart, STACK] should
          cover all memory regions of the upper half for every rank.
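
One plausible way to obtain bounds that hold "for every rank" is to take the
minimum of the start addresses and the maximum of the end addresses over all
ranks. The sketch below shows that fold over an in-memory array. RankBounds
and union_of_bounds() are hypothetical names, and this is an assumption
about the idea, not the actual body of computeUnionOfCkptImageAddresses().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-rank record; the field names follow the README. */
typedef struct {
  uintptr_t libsStart;
  uintptr_t libsEnd;
  uintptr_t highMemStart;
} RankBounds;

/* Returns bounds wide enough to contain the bounds of every rank:
 * lowest libsStart, highest libsEnd, lowest highMemStart. */
RankBounds union_of_bounds(const RankBounds *ranks, size_t n) {
  assert(ranks != NULL && n > 0);
  RankBounds u = ranks[0];
  for (size_t i = 1; i < n; i++) {
    if (ranks[i].libsStart < u.libsStart) u.libsStart = ranks[i].libsStart;
    if (ranks[i].libsEnd > u.libsEnd) u.libsEnd = ranks[i].libsEnd;
    if (ranks[i].highMemStart < u.highMemStart)
      u.highMemStart = ranks[i].highMemStart;
  }
  return u;
}
```

In an MPI program the same reduction would more likely be done with a
collective operation across ranks. The array version is used here only to
keep the sketch self-contained.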

At the time of restart, control comes to:
  ../dmtcp/src/mtcp/mtcp_restart.c:main() ->
    mtcp_restart_plugin.c:mtcp_plugin_hook() ->
      mtcp_split_process.c:splitProcess() ->
        mtcp_split_process.c:initializeLowerHalf() ->
          (i) mtcp_split_process.c:splitProcess()
                // forks proxy process for lower half
                // and then copies it into the current process
          (ii) initializes the lower half with libc_start_main (now that it is
               in the current process)
          (iii) returns to 'splitProcess()', which returns to 'mtcp_plugin_hook()':
    mtcp_restart_plugin.c:mtcp_plugin_hook() ->
      (i) We finished 'splitProcess()', above.
      (ii) reserve_fds_upper_half()
           reserveUpperHalfMemoryRegionsForCkptImgs() // mmap memory regions
                                                      // of future upper half
      (iii) JUMP_TO_LOWER_HALF()
      (iv) // MPI_Init is called here. Network memory areas are loaded by
           // MPI_Init. This includes /dev/xpmem, *shared_mem*, etc.
           // Also, MPI_Cart_create will be called to restore the Cartesian
           // topology. The checkpoint image to restore is then chosen based
           // on the Cartesian coordinates instead of the world rank.
      (v) RETURN_TO_UPPER_HALF()
      (vi) releaseUpperHalfMemoryRegionsForCkptImgs()
           unreserve_fds_upper_half()
      (vii) getCkptImageByRank() // Sets ckpt image for upper half for this rank
      (viii) returns to ../dmtcp/src/mtcp/mtcp_restart.c:main()
  ../dmtcp/src/mtcp/mtcp_restart.c:main() ->
    (i) Load ckpt image file found by 'mtcp_plugin_hook()'
    (ii) Control passes to program counter and stack from time of checkpoint
    (iii) The upper half then rebinds MPI wrappers, etc.
67+
====
68+
DEBUGGING mana_restart:
69+
Note that the coordinator dumps a *.json file in the directory where the
70+
coordinator was launched, at the time of checkpoint (and during restart).
71+
The checkpoint version includes:
72+
libsStart, libsEnd, highMemStart, and the /proc/*/maps during checkpoint.
73+
This can be used to verify that [libsStart, libsEnd]+[highMemStart, STACK]
74+
truly covers all upper-half memory regions.
75+
This can also be checked in GDB by comparing /proc/self/maps inside
76+
../dmtcp/src/mtcp/mtcp_restart.c:main() just before it loads the
77+
checkpoint image file, with the /proc/self/maps when afterward executing
78+
the statement 'case DMTCP_EVENT_RESTART:' in the file
79+
../mpi-proxy-split/mpi_plugin.cpp.
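
The coverage check described above can also be automated on a saved copy of
a maps file. The sketch below parses the address range at the start of each
line and counts the regions that fall outside both intervals.
parse_maps_range() and count_uncovered() are hypothetical helpers written
for this README, not part of MANA or DMTCP. The caller supplies the four
bounds, including whatever address is taken to stand for STACK, and must
first filter out lower-half regions, since the sketch treats every line as
an upper-half region.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse the "start-end" hex range that begins a /proc/<pid>/maps line.
 * Returns 1 on success and 0 if the line does not start with a range. */
int parse_maps_range(const char *line, unsigned long *start,
                     unsigned long *end) {
  assert(line != NULL && start != NULL && end != NULL);
  return sscanf(line, "%lx-%lx", start, end) == 2;
}

/* Count the regions in 'maps' that lie neither fully inside [lo1, hi1]
 * nor fully inside [lo2, hi2]. */
int count_uncovered(FILE *maps, unsigned long lo1, unsigned long hi1,
                    unsigned long lo2, unsigned long hi2) {
  char line[512];
  int uncovered = 0;
  while (fgets(line, sizeof line, maps) != NULL) {
    unsigned long start, end;
    if (parse_maps_range(line, &start, &end)) {
      int in_first = lo1 <= start && end <= hi1;
      int in_second = lo2 <= start && end <= hi2;
      if (!in_first && !in_second) uncovered++;
    }
    /* If the line was longer than the buffer, skip its remainder so that
     * a path fragment is never mistaken for a new line. */
    if (strchr(line, '\n') == NULL) {
      int c;
      while ((c = getc(maps)) != EOF && c != '\n') {}
    }
  }
  return uncovered;
}
```

A result of 0 means that every listed region is covered. Any other value
points to regions worth inspecting by hand in the maps file.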
