[DYNAREC] Modify natflags register save logic to now save all scratch registers#3880
Conversation
|
@ptitSeb @ksco Sorry, the scratch register usage detection method I shared last night has a flaw: for non-consecutive flags producer and consumer, the trace code for all intermediate x86-64 instructions between them must also save scratch registers to avoid data corruption. My fix traverses forward to find the target flags consumer, then fetches the exact scratch registers that need to be saved. |
|
This is getting complicated, and I don't like where it's going. On second thought, I think we should simply disable nativeflags if trace is effective. It does not make much sense to have trace and nativeflags both enabled anyway, right? |
|
Ok, disabling NativeFlags if Dynarec Trace is enabled looks fair enough to me. |
13b9da5 to
891ddba
Compare
You will have the wrong flags state if nativeflags is enabled; that's also why we disabled nativeflags in cosim. |
|
Can we just simplify this by saving all scratch registers? Trace is for debugging, performance isn't a priority. Adding ~10 extra instructions won't make any difference. |
And why do you need to enable nativeflags for trace? |
Disabling nativeflags is also ok for me. I just think that directly preserving all scratch registers is a very straightforward fix for this issue, and it will not cause any noticeable impact in trace mode. |
|
Well, disable NativeFlags or Save all the registers, I don't mind any of those, as they both have their pros and cons in my point of view. |
But what are the cons of disabling nativeflags in trace mode? Trace mode is very similar to cosim, but it is for human manual analysis, so it makes sense to disable it IMO. |
The cons are, mostly, disabling native flags prevent you from using cosim & trace to debug native flags issues. you will use a different code path in trace than without trace. |
|
Well, you can't use cosim & trace to debug nativeflags issues, nativeflags is meant to skip those FLAGS computations... |
The x64 opcode flow is still respected. Trace don't give access to native stuffs, just the x64 part. The flow on jump or conditionnal moved from a trace can be used to debug issue with native flags: you can spot an incoherent branch taken for example. Again, I'm not against any of the change. If you prefer to disable native flags for trace, I'm fine with that. |
Bah, me neither. @zengdage If you decide to do the scratch registers spill/restore, please refer to this closed PR for suggested macro names and add changes for other backends too: https://github.com/ptitSeb/box64/pull/3879/changes |
891ddba to
5ca8514
Compare
Done. Modify nat flag register spill logic to now save all scratch registers and add implementation for LA64 and PPC64LE too. |
ksco
left a comment
There was a problem hiding this comment.
LGTM, but please update your PR and commit title/desc to reflect the changes.
| || ((ip >= trace_start) && (ip < trace_end))) { | ||
| MESSAGE(LOG_DUMP, "TRACE ----\n"); | ||
| if (BOX64ENV(dynarec_nativeflags)) SAVE_ACTIVE_SCRATCH_REGISTERS; | ||
| #if defined (RV64) || defined(LA64) || defined(PPC64LE) |
There was a problem hiding this comment.
| #if defined (RV64) || defined(LA64) || defined(PPC64LE) | |
| #if defined (SPILL_NF_REGISTERS) |
| GO_TRACE(PrintTrace, 1, x5); | ||
| fpu_unreflectcache(dyn, ninst, x1, x2, x3); | ||
| if (BOX64ENV(dynarec_nativeflags)) LOAD_ACTIVE_SCRATCH_REGISTERS; | ||
| #if defined (RV64) || defined(LA64) || defined(PPC64LE) |
There was a problem hiding this comment.
| #if defined (RV64) || defined(LA64) || defined(PPC64LE) | |
| #if defined (RESTORE_NF_REGISTERS) |
|
I suggested some change for simplicity |
…ratch registers 1. Rename macro to SPILL_NF_REGISTERS, add implementation for LA64 and PPC64LE. 2. Modify nat flag register spill logic to now save all scratch registers. Fix scratch register corruption caused by non-consecutive flags producer and consumer when BOX64_DYNAREC_TRACE is enabled, which introduced by ptitSeb#3876. For example, the `test rax, rax` flags producer stores its flag calculation operands in scratch register `t3`. The next `mov r14, rax` instruction does not use scratch registers, but its associated trace code can still overwrite t3's value. This means we need to reference the flags consumer that is `jz 0x0000003F0000ABC0` to identify which registers require saving. But this is too complicated. So we went with the simpler approach of saving all scratch registers, this won't add noticeable performance overhead in trace mode. ``` [BOX64] 0x3f00009f68: 48 85 C0 test rax, rax [BOX64] 0x3ff7afaef0: 53 emitted opcodes, inst=14, barrier=0 state=3/1(0), set=3F/80, use=0, need=0/80, fuse=1/0, sm=0(0/0), sew@entry=7, sew@exit=7, pred=13 03f00e13 ADDI t3, zero, 0x3f(63) 45ccaa23 SW t3, emu_s9, 0x454(1108) 01087e33 AND t3, rax_a6, rax_a6 47ccb423 SD t3, emu_s9, 0x468(1128) [BOX64] New Instruction x64:0x3f00009f6b, native:0x3ff7afafc4 [BOX64] TRACE ---- [BOX64] n1:0 n2:0 ---- 01f80b37 LUI rip_s6, 0x1f80000(33030144) 005b0b1b ADDIW rip_s6, rip_s6, 0x5(5) 00db1b13 SLLI rip_s6, rip_s6, 0xd(13) f6bb0b13 ADDI rip_s6, rip_s6, 0xffffff6b(-149) ............................................................... ............................................................... ............................................................... 048cb783 LD r9_a5, emu_s9, 0x48(72) 000cb803 LD rax_a6, emu_s9, 0x0(0) 088cbb03 LD rip_s6, emu_s9, 0x88(136) 080cbb83 LD flags_s7, emu_s9, 0x80(128) fdfbfb93 ANDI flags_s7, flags_s7, 0xffffffdf(-33) 006bdf93 SRLI t6, flags_s7, 0x6(6) 020fff93 ANDI t6, t6, 0x20(32) 01fbebb3 OR flags_s7, flags_s7, t6 [BOX64] ---------- [BOX64] 0x3f00009f6b: 49 89 C6 mov r14, rax [BOX64] 0x3ff7afafc4: 54 emitted opcodes, inst=15, barrier=0 state=0/3(0), set=0/0, use=0, need=80/80, fuse=0/1, sm=0(0/0), sew@entry=7, sew@exit=7, pred=14 00080a13 MV r14_s4, rax_a6 [BOX64] New Instruction x64:0x3f00009f6e, native:0x3ff7afb09c [BOX64] TRACE ---- [BOX64] n1:28 n2:0 ---- ff010113 ADDI sp, sp, 0xfffffff0(-16) 01c13023 SD t3, sp, 0x0(0) 01f80b37 LUI rip_s6, 0x1f80000(33030144) 01fbebb3 OR flags_s7, flags_s7, t6 ............................................................... ............................................................... ............................................................... 00013e03 LD t3, sp, 0x0(0) 01010113 ADDI sp, sp, 0x10(16) [BOX64] ---------- [BOX64] 0x3f00009f6e: 0F 84 4C 0C 00 00 jz 0x0000003F0000ABC0 [BOX64] 0x3ff7afb09c: 67 emitted opcodes, inst=16, barrier=2 state=0/3(0), set=0/0, use=0, need=80/80, fuse=1/0, sm=0(0/0), sew@entry=7, sew@exit=7, pred=15, jmp=out 020e1463 BNE t3, zero, 0x28(40) # +10i(0x3ff7afb1a8) 00000013 NOP ```
5ca8514 to
319b3ce
Compare
Fix scratch register corruption caused by non-consecutive flags producer and consumer when BOX64_DYNAREC_TRACE is enabled, which introduced by #3876.
For example, the
test rax, raxflags producer stores its flag calculation operands in scratch registert3. The nextmov r14, raxinstruction does not use scratch registers, but its associated trace code can still overwrite t3's value. This means we need to reference the flags consumer that isjz 0x0000003F0000ABC0to identify which registers require saving.