[DYNAREC] Save temporary registers on the stack before calling PrintT…#3876
Merged
Owner
Are you sure it's useful to save all 6 temp registers here? That adds a lot of opcodes in this case!
Collaborator
Nice find, but yeah, you should use get_free_scratch() here.
950cfb9 to 117c0f3
…race
The PrintTrace function may modify temporary registers, so we need to push
them onto the stack before the call and restore them on return.
For example, in the RV64 implementation, register `t3` holds the comparison
result. Because PrintTrace may overwrite it, the subsequent `jz` instruction
would branch on invalid data.
```
[BOX64] 0x3f0000239b: 48 85 C0 test rax, rax
[BOX64] 0x3ff7af34d4: 53 emitted opcodes, inst=2, barrier=0 state=3/1(0), set=3F/80, use=0, need=0/80, fuse=1/0, sm=0(0/0), sew@entry=7, sew@exit=7, pred=1
03f00e13 ADDI t3, zero, 0x3f(63)
45ccaa23 SW t3, emu_s9, 0x454(1108)
01087e33 AND t3, rax_a6, rax_a6
47ccb423 SD t3, emu_s9, 0x468(1128)
[BOX64] New Instruction x64:0x3f0000239e, native:0x3ff7af35a8
[BOX64] TRACE ----
01f80b37 LUI rip_s6, 0x1f80000(33030144)
001b0b1b ADDIW rip_s6, rip_s6, 0x1(1)
00db1b13 SLLI rip_s6, rip_s6, 0xd(13)
39eb0b13 ADDI rip_s6, rip_s6, 0x39e(926)
000b0313 MV t1, rip_s6
018cbc23 SD rbx_s8, emu_s9, 0x18(24)
029cb023 SD rsp_s1, emu_s9, 0x20(32)
028cb423 SD rbp_s0, emu_s9, 0x28(40)
05acb823 SD r10_s10, emu_s9, 0x50(80)
05bcbc23 SD r11_s11, emu_s9, 0x58(88)
072cb023 SD r12_s2, emu_s9, 0x60(96)
073cb423 SD r13_s3, emu_s9, 0x68(104)
074cb823 SD r14_s4, emu_s9, 0x70(112)
075cbc23 SD r15_s5, emu_s9, 0x78(120)
0010039b ADDIW t2, zero, 0x1(1)
ffffffb7 LUI t6, 0xfffff000(-4096)
7dff8f9b ADDIW t6, t6, 0x7df(2015)
01fbffb3 AND t6, flags_s7, t6
020bfb93 ANDI flags_s7, flags_s7, 0x20(32)
006b9b93 SLLI flags_s7, flags_s7, 0x6(6)
01fbebb3 OR flags_s7, flags_s7, t6
097cb023 SD flags_s7, emu_s9, 0x80(128)
ff010113 ADDI sp, sp, 0xfffffff0(-16)
00513023 SD t0, sp, 0x0(0)
02acbc23 SD rdi_a0, emu_s9, 0x38(56)
02bcb823 SD rsi_a1, emu_s9, 0x30(48)
00ccb823 SD rdx_a2, emu_s9, 0x10(16)
00dcb423 SD rcx_a3, emu_s9, 0x8(8)
04ecb023 SD r8_a4, emu_s9, 0x40(64)
04fcb423 SD r9_a5, emu_s9, 0x48(72)
010cb023 SD rax_a6, emu_s9, 0x0(0)
096cb423 SD rip_s6, emu_s9, 0x88(136)
[BOX64] Table64: 0x5d
00000f97 AUIPC t6, 0x0(0)
3a8fbf83 LD t6, t6, 0x3a8(936)
00030593 MV rsi_a1, t1
00038613 MV rdx_a2, t2
000c8513 MV rdi_a0, emu_s9
000f80e7 JALR ra, t6, 0x0(0)
00013283 LD t0, sp, 0x0(0)
01010113 ADDI sp, sp, 0x10(16)
038cb503 LD rdi_a0, emu_s9, 0x38(56)
030cb583 LD rsi_a1, emu_s9, 0x30(48)
010cb603 LD rdx_a2, emu_s9, 0x10(16)
008cb683 LD rcx_a3, emu_s9, 0x8(8)
040cb703 LD r8_a4, emu_s9, 0x40(64)
048cb783 LD r9_a5, emu_s9, 0x48(72)
000cb803 LD rax_a6, emu_s9, 0x0(0)
088cbb03 LD rip_s6, emu_s9, 0x88(136)
080cbb83 LD flags_s7, emu_s9, 0x80(128)
fdfbfb93 ANDI flags_s7, flags_s7, 0xffffffdf(-33)
006bdf93 SRLI t6, flags_s7, 0x6(6)
020fff93 ANDI t6, t6, 0x20(32)
01fbebb3 OR flags_s7, flags_s7, t6
[BOX64] ----------
[BOX64] 0x3f0000239e: 74 02 jz 0x0000003F000023A2
[BOX64] 0x3ff7af35a8: 55 emitted opcodes, inst=3, barrier=0 state=0/3(0), set=0/0, use=0, need=80/80, fuse=1/0, sm=0(0/0), sew@entry=7, sew@exit=7, pred=2, jmp=5
140e0463 BEQ t3, zero, 0x148(328) # +82i(0x3ff7af37c4)
00000013 NOP
```
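The failure described above can be modelled in a few lines of C. This is only a sketch under stated assumptions: the register array, `fake_print_trace`, and both `jz_taken_*` helpers are hypothetical names invented for illustration, not box64 code. It shows why a value kept in a caller-saved temporary has to be spilled around a C call.

```c
#include <stdint.h>

/* Hypothetical model of the RISC-V temporaries; not box64's real register map. */
enum { T0, T1, T2, T3, T4, T5, T6, NUM_TEMPS };

uint64_t regs[NUM_TEMPS];

/* Stand-in for a C callee such as PrintTrace: the calling convention
 * allows it to overwrite any caller-saved temporary. */
void fake_print_trace(void) {
    for (int i = 0; i < NUM_TEMPS; i++)
        regs[i] = 0xdeadbeefULL;
}

/* test rax, rax ; trace call ; jz, with t3 left unprotected. */
int jz_taken_unprotected(uint64_t rax) {
    regs[T3] = rax & rax;        /* AND t3, rax, rax  */
    fake_print_trace();          /* JALR ra, t6, 0    */
    return regs[T3] == 0;        /* BEQ t3, zero, ... */
}

/* Same sequence, with t3 spilled to a stack slot around the call. */
int jz_taken_protected(uint64_t rax) {
    regs[T3] = rax & rax;
    uint64_t slot = regs[T3];    /* SD t3, sp, 0 */
    fake_print_trace();
    regs[T3] = slot;             /* LD t3, sp, 0 */
    return regs[T3] == 0;
}
```

With `rax == 0`, the protected variant reports the branch as taken, while the unprotected one reads the clobbered value and reports it as not taken.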
117c0f3 to 10aae81
Owner
Thanks
zengdage added a commit to zengdage/box64 that referenced this pull request (May 21, 2026)
Fix scratch register corruption caused by a non-consecutive flags producer and consumer when BOX64_DYNAREC_TRACE is enabled, which was introduced by ptitSeb#3876.

For example, the `test rax, rax` flags producer stores its flag calculation operands in scratch register `t3`. The next `mov r14, rax` instruction does not use scratch registers, but its associated trace code can still overwrite `t3`'s value. This means we need to reference the flags consumer, `jz 0x0000003F0000ABC0`, to identify which registers require saving.

```
[BOX64] 0x3f00009f68: 48 85 C0 test rax, rax
[BOX64] 0x3ff7afaef0: 53 emitted opcodes, inst=14, barrier=0 state=3/1(0), set=3F/80, use=0, need=0/80, fuse=1/0, sm=0(0/0), sew@entry=7, sew@exit=7, pred=13
03f00e13 ADDI t3, zero, 0x3f(63)
45ccaa23 SW t3, emu_s9, 0x454(1108)
01087e33 AND t3, rax_a6, rax_a6
47ccb423 SD t3, emu_s9, 0x468(1128)
[BOX64] New Instruction x64:0x3f00009f6b, native:0x3ff7afafc4
[BOX64] TRACE ----
[BOX64] n1:0 n2:0 ----
01f80b37 LUI rip_s6, 0x1f80000(33030144)
005b0b1b ADDIW rip_s6, rip_s6, 0x5(5)
00db1b13 SLLI rip_s6, rip_s6, 0xd(13)
f6bb0b13 ADDI rip_s6, rip_s6, 0xffffff6b(-149)
...............................................................
...............................................................
...............................................................
048cb783 LD r9_a5, emu_s9, 0x48(72)
000cb803 LD rax_a6, emu_s9, 0x0(0)
088cbb03 LD rip_s6, emu_s9, 0x88(136)
080cbb83 LD flags_s7, emu_s9, 0x80(128)
fdfbfb93 ANDI flags_s7, flags_s7, 0xffffffdf(-33)
006bdf93 SRLI t6, flags_s7, 0x6(6)
020fff93 ANDI t6, t6, 0x20(32)
01fbebb3 OR flags_s7, flags_s7, t6
[BOX64] ----------
[BOX64] 0x3f00009f6b: 49 89 C6 mov r14, rax
[BOX64] 0x3ff7afafc4: 54 emitted opcodes, inst=15, barrier=0 state=0/3(0), set=0/0, use=0, need=80/80, fuse=0/1, sm=0(0/0), sew@entry=7, sew@exit=7, pred=14
00080a13 MV r14_s4, rax_a6
[BOX64] New Instruction x64:0x3f00009f6e, native:0x3ff7afb09c
[BOX64] TRACE ----
[BOX64] n1:28 n2:0 ----
ff010113 ADDI sp, sp, 0xfffffff0(-16)
01c13023 SD t3, sp, 0x0(0)
01f80b37 LUI rip_s6, 0x1f80000(33030144)
01fbebb3 OR flags_s7, flags_s7, t6
...............................................................
...............................................................
...............................................................
00013e03 LD t3, sp, 0x0(0)
01010113 ADDI sp, sp, 0x10(16)
[BOX64] ----------
[BOX64] 0x3f00009f6e: 0F 84 4C 0C 00 00 jz 0x0000003F0000ABC0
[BOX64] 0x3ff7afb09c: 67 emitted opcodes, inst=16, barrier=2 state=0/3(0), set=0/0, use=0, need=80/80, fuse=1/0, sm=0(0/0), sew@entry=7, sew@exit=7, pred=15, jmp=out
020e1463 BNE t3, zero, 0x28(40) # +10i(0x3ff7afb1a8)
00000013 NOP
```
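To make the producer/consumer gap concrete, here is a minimal sketch in C. The `Inst` record and all three helper functions are hypothetical and much simpler than box64's real structures. It compares a check that looks only at the instruction about to be traced with one that looks ahead to the flags consumer.

```c
#include <stddef.h>

/* Hypothetical, simplified instruction record; box64's real structures differ. */
typedef struct {
    const char *text;
    int produces;   /* scratch register receiving flag data, or -1 */
    int consumes;   /* nonzero if the instruction reads that flag data */
} Inst;

/* Scratch register written by the most recent producer before insts[i], or -1. */
int pending_before(const Inst *insts, size_t i) {
    int pending = -1;
    for (size_t k = 0; k < i; k++)
        if (insts[k].produces >= 0)
            pending = insts[k].produces;
    return pending;
}

/* Local check: save a register only if insts[i] itself is the consumer. */
int reg_to_save_local(const Inst *insts, size_t n, size_t i) {
    return (i < n && insts[i].consumes) ? pending_before(insts, i) : -1;
}

/* Look-ahead check: save the register if any later instruction consumes it
 * before another producer overwrites it. */
int reg_to_save_lookahead(const Inst *insts, size_t n, size_t i) {
    int pending = pending_before(insts, i);
    if (pending < 0)
        return -1;
    for (size_t k = i; k < n; k++) {
        if (insts[k].consumes)
            return pending;
        if (insts[k].produces >= 0)
            return -1;
    }
    return -1;
}
```

For the sequence `test` (produces register 28, i.e. `t3`), `mov`, `jz` (consumes), the local check returns -1 for the trace call in front of `mov`, so `t3` would go unsaved there, while the look-ahead check returns 28.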
zengdage added a commit to zengdage/box64 that referenced this pull request (May 22, 2026)
…ratch registers

1. Rename macro to SPILL_NF_REGISTERS, add implementation for LA64 and PPC64LE.
2. Modify nat flag register spill logic to now save all scratch registers.

Fix scratch register corruption caused by a non-consecutive flags producer and consumer when BOX64_DYNAREC_TRACE is enabled, which was introduced by ptitSeb#3876.

For example, the `test rax, rax` flags producer stores its flag calculation operands in scratch register `t3`. The next `mov r14, rax` instruction does not use scratch registers, but its associated trace code can still overwrite `t3`'s value. This means we need to reference the flags consumer, `jz 0x0000003F0000ABC0`, to identify which registers require saving. But this is too complicated, so we went with the simpler approach of saving all scratch registers; this won't add noticeable performance overhead in trace mode.

```
[BOX64] 0x3f00009f68: 48 85 C0 test rax, rax
[BOX64] 0x3ff7afaef0: 53 emitted opcodes, inst=14, barrier=0 state=3/1(0), set=3F/80, use=0, need=0/80, fuse=1/0, sm=0(0/0), sew@entry=7, sew@exit=7, pred=13
03f00e13 ADDI t3, zero, 0x3f(63)
45ccaa23 SW t3, emu_s9, 0x454(1108)
01087e33 AND t3, rax_a6, rax_a6
47ccb423 SD t3, emu_s9, 0x468(1128)
[BOX64] New Instruction x64:0x3f00009f6b, native:0x3ff7afafc4
[BOX64] TRACE ----
[BOX64] n1:0 n2:0 ----
01f80b37 LUI rip_s6, 0x1f80000(33030144)
005b0b1b ADDIW rip_s6, rip_s6, 0x5(5)
00db1b13 SLLI rip_s6, rip_s6, 0xd(13)
f6bb0b13 ADDI rip_s6, rip_s6, 0xffffff6b(-149)
...............................................................
...............................................................
...............................................................
048cb783 LD r9_a5, emu_s9, 0x48(72)
000cb803 LD rax_a6, emu_s9, 0x0(0)
088cbb03 LD rip_s6, emu_s9, 0x88(136)
080cbb83 LD flags_s7, emu_s9, 0x80(128)
fdfbfb93 ANDI flags_s7, flags_s7, 0xffffffdf(-33)
006bdf93 SRLI t6, flags_s7, 0x6(6)
020fff93 ANDI t6, t6, 0x20(32)
01fbebb3 OR flags_s7, flags_s7, t6
[BOX64] ----------
[BOX64] 0x3f00009f6b: 49 89 C6 mov r14, rax
[BOX64] 0x3ff7afafc4: 54 emitted opcodes, inst=15, barrier=0 state=0/3(0), set=0/0, use=0, need=80/80, fuse=0/1, sm=0(0/0), sew@entry=7, sew@exit=7, pred=14
00080a13 MV r14_s4, rax_a6
[BOX64] New Instruction x64:0x3f00009f6e, native:0x3ff7afb09c
[BOX64] TRACE ----
[BOX64] n1:28 n2:0 ----
ff010113 ADDI sp, sp, 0xfffffff0(-16)
01c13023 SD t3, sp, 0x0(0)
01f80b37 LUI rip_s6, 0x1f80000(33030144)
01fbebb3 OR flags_s7, flags_s7, t6
...............................................................
...............................................................
...............................................................
00013e03 LD t3, sp, 0x0(0)
01010113 ADDI sp, sp, 0x10(16)
[BOX64] ----------
[BOX64] 0x3f00009f6e: 0F 84 4C 0C 00 00 jz 0x0000003F0000ABC0
[BOX64] 0x3ff7afb09c: 67 emitted opcodes, inst=16, barrier=2 state=0/3(0), set=0/0, use=0, need=80/80, fuse=1/0, sm=0(0/0), sew@entry=7, sew@exit=7, pred=15, jmp=out
020e1463 BNE t3, zero, 0x28(40) # +10i(0x3ff7afb1a8)
00000013 NOP
```
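The save-everything approach trades a few extra stores for not having to track liveness at all. A rough sketch in C follows; `NUM_SCRATCH`, `spill_frame_bytes`, `spill_all`, and `restore_all` are hypothetical names, and this is not the real `SPILL_NF_REGISTERS` macro. The frame-size helper assumes 8-byte slots and the 16-byte stack alignment required by the RISC-V calling convention, which matches the `ADDI sp, sp, -16` used in the log to save a single register.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: seven temporaries (t0..t6), as in the RISC-V calling
 * convention; the scratch set box64 actually spills may differ. */
#define NUM_SCRATCH 7

/* Bytes to reserve for n 8-byte slots, rounded up to a multiple of 16. */
size_t spill_frame_bytes(size_t n) {
    return (n * sizeof(uint64_t) + 15u) & ~(size_t)15u;
}

/* Copy every scratch register into the frame before the trace call. */
void spill_all(const uint64_t *regs, uint64_t *frame) {
    for (size_t i = 0; i < NUM_SCRATCH; i++)
        frame[i] = regs[i];
}

/* Copy every scratch register back after the call returns. */
void restore_all(uint64_t *regs, const uint64_t *frame) {
    for (size_t i = 0; i < NUM_SCRATCH; i++)
        regs[i] = frame[i];
}
```

Under these assumptions one register needs a 16-byte frame and seven registers need 64 bytes, so the cost is a fixed handful of stores and loads per trace call, whichever instruction ends up consuming the flags.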
ptitSeb pushed a commit that referenced this pull request (May 22, 2026)
…ratch registers (#3880)