gtsrelf: include LOCAL globals, apply atEnd, and add flag to throw on mismatch #512
|
With the new stricter tests, we're seeing the known nestedifglobal problem and some others. We're observing a mismatch in these cases:
These appear to be fixed by re-lifting? But @l-kent says re-lifting didn't help. Maybe worth looking into.

In tests which use examples/cntlm-noduk/cntlm-noduk.gts (e.g. IntervalDSATest), we're also seeing the deliberate incompatibility where oldrelf includes external variables as an ExternalFunction, but gtsrelf does not.

[INFO] [!] Using ELF data from relf: examples/cntlm-noduk/cntlm-noduk.relf
[WARN] PLEASE REPORT THIS ISSUE! (https://github.com/UQ-PAC/BASIL/issues/509) include the gts and relf files.
gtirb relf discrepancy, external functions differ:
gtirb - relf = HashSet()
relf - gtirb = HashSet(ExternalFunction(optind,4391784), ExternalFunction(stdout,4391792), ExternalFunction(stderr,4391760), ExternalFunction(stdin,4391808), ExternalFunction(optarg,4391776)) [checkReadELFCompatibility@GTIRBReadELF.scala:281]

Do we want to precisely mimic the (probably wrong) oldrelf? Or should we strike out on our own, and I'll modify the compat check? |
We probably want them somewhere, but we should implement it correctly rather than just matching the previous implementation. If they are just mapped as variables into this process's address space, then it might make sense to treat them like regular globals. |
|
Yeah, so oldrelf also has a global variables line for them, and the SpecGlobal is also present in the gtsrelf. So maybe I'll remove them from ExternalFunctions. The only problem might be that if two libraries are linked together, I think we lose the information that the underlying variables are the same. But this shouldn't be a problem as long as we look at whole ELFs. |
oldrelf will generate both SpecGlobal and ExternalFunction entries for external global variables (e.g., errno, optind), but gtsrelf does not. The SpecGlobal entry is more semantically correct and contains strictly more information, so we allow this incompatibility.
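For illustration, here is a minimal sketch of how the compatibility check could allow this incompatibility. It is not the code in GTIRBReadELF.scala: the `ExternalCompat` object, the two-field case classes, and both function names are hypothetical simplifications. The idea is to drop any oldrelf ExternalFunction that has a SpecGlobal with the same name and address before taking the set differences.

```scala
object ExternalCompat {
  // Hypothetical, simplified stand-ins for BASIL's symbol types (assumption:
  // the real classes carry more fields than name and address).
  final case class ExternalFunction(name: String, offset: BigInt)
  final case class SpecGlobal(name: String, address: BigInt)

  /** Drops ExternalFunction entries that are really external variables,
    * i.e. entries with a SpecGlobal of the same name and address. */
  def dropVariableExternals(
      externals: Set[ExternalFunction],
      globals: Set[SpecGlobal]
  ): Set[ExternalFunction] = {
    val globalKeys = globals.map(g => (g.name, g.address))
    externals.filterNot(e => globalKeys.contains((e.name, e.offset)))
  }

  /** Returns (gtirb - relf, relf - gtirb) after normalising the oldrelf side. */
  def externalDiff(
      gts: Set[ExternalFunction],
      old: Set[ExternalFunction],
      oldGlobals: Set[SpecGlobal]
  ): (Set[ExternalFunction], Set[ExternalFunction]) = {
    val normalised = dropVariableExternals(old, oldGlobals)
    (gts -- normalised, normalised -- gts)
  }
}
```

Under this sketch, an oldrelf entry such as ExternalFunction(optind,4391784) no longer counts as a discrepancy as long as oldrelf also reports a SpecGlobal for optind at the same address. Genuine external functions still show up in the difference.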
This is likely due to compiler/OS etc. version differences, or potentially a ddisasm version difference, given that it seems to be related to a warning about relocating the BSS section that I get from ddisasm.
Including those external variables as ExternalFunctions is an instance where the existing ReadELFLoader is imprecise: external functions can appear in the .rela.dyn section but aren't readily distinguishable from external variables, so they're currently all included as ExternalFunctions. I think the way to improve this is by matching the names to the .dynsym table entries. However, those extraneous ExternalFunctions hadn't previously caused any problems, so no one had looked any further at it. |
|
The optind external variable in cntlm-noduk seems to just be a rarer case that we haven't given any particular thought to before. It has type R_AARCH64_COPY in .rela.dyn and appears in .dynsym and .symtab too, all with the same address, so the .symtab entry causes it to be included as a SpecGlobal and the .rela.dyn entry causes it to be (incorrectly) included as an ExternalFunction. |
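A rough sketch of the .dynsym matching idea, assuming the loader has already parsed each .dynsym entry's symbol type (FUNC vs. OBJECT in readelf output). Everything here (`DynsymClassifier`, `DynSym`, `Reloc`, `splitByDynsym`) is a hypothetical name for illustration, not part of ReadELFLoader.

```scala
object DynsymClassifier {
  // Symbol types as printed by readelf for .dynsym entries (subset).
  sealed trait SymType
  case object FuncSym extends SymType   // FUNC
  case object ObjectSym extends SymType // OBJECT

  // Hypothetical parsed records; the field names are assumptions.
  final case class DynSym(name: String, symType: SymType)
  final case class Reloc(name: String, offset: BigInt, relocType: String)

  /** Splits .rela.dyn entries into (functions, everything else) by looking up
    * each name in .dynsym. Names missing from .dynsym land in the second
    * group, so they are never reported as functions by accident. */
  def splitByDynsym(relocs: Seq[Reloc], dynsym: Seq[DynSym]): (Seq[Reloc], Seq[Reloc]) = {
    val typeByName = dynsym.map(s => s.name -> s.symType).toMap
    relocs.partition(r => typeByName.get(r.name).contains(FuncSym))
  }
}
```

With this split, an R_AARCH64_COPY entry for optind (an OBJECT in .dynsym) would fall into the second group and would not be turned into an ExternalFunction.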
|
To clarify, when you say "none of it is solved by re-lifting" in the other thread, are you using main branch or this #512 branch to do the gtsrelf/oldrelf comparison? In #509 (comment), I was using the #512 branch. Can you confirm which you used? Sorry for the confusion. |
|
That was with the main branch. On this branch, the only discrepancies for the malloc_memcpy_strlen_memset_free tests involve __bss_start, which re-lifting doesn't change. |
|
To be clear, this issue was not caused by mismatched .relf and .gts files; they were derived from the same binaries, and it still happens with freshly compiled and lifted binaries. The only test case where that's an issue is nestedifglobal, and I have included a fix for that in #510. The issue is likely related to the ddisasm warning I previously highlighted: |
|
I understand, I'll change the comment. Which ddisasm version do you have? If it's from nix, do you know which commit it's from ( |
|
Anyway, this PR can be reviewed now. It fixes some bugs which affected a lot of the system tests. We can fix these last few test cases in another PR. Also, be assured that the CI passes. I just didn't want to re-run it after the comment-only change. |
|
It's Here's one of the binaries where ddisasm produces that warning when lifting: |
|
I also see the ddisasm warning with your binary. But honestly, idk where to go from this. I'm using the Docker-based infrastructure in #288 and that produces a binary that lifts with no warnings and seems to work. I'm happy enough with that, and I think it would be really hard to dig into why your system's compiler has this problem. Anyway, do you want to review this PR? |
|
I think a reasonable solution would just be to ignore it if the offsets for symbols named Looking at the .relf file, it seems that ddisasm is just modifying the address of those symbols to match the address of the .bss section - I'm not sure why those would be different but they are in this case. |
|
I don't really want to special case certain symbol entries by name, especially if it only happens with certain test cases on certain compilers. I've lessened the error message so it doesn't say "PLEASE REPORT THIS ISSUE" for these particular test cases. Eventually, we should move to a repeatable lifting environment which doesn't have the mismatch.
|
I think it makes a lot of sense to have a special case where we ignore a known inconsistency in behaviour between two different tools (readelf and DDisasm) that doesn't seem to actually matter. The point of these checks is just to make sure that BASIL behaves consistently when reading symbol table etc. data from readelf vs. DDisasm + GTIRB, i.e. to ensure that the new DDisasm + GTIRB symbol table pipeline is working correctly, isn't it? In this case, we've determined that the inconsistency is due to DDisasm, not due to an error in BASIL. It's possible to be more precise about what DDisasm is doing - it is changing the address of the |
|
To tweak the compatibility check, we have two knobs: (1) the check itself, which applies to all inputs, and (2) the reporting level of particular test cases (exception, warning, or silent). We should choose which knob to turn based on whether a problem is systemic or isolated. At the moment, this issue has only been observed in that extraspec test, so I have adjusted the reporting level of that test case only. This way, if the same issue were to appear in other test cases, this would raise an error and we would notice it (which is good!). Yes, this does have the side-effect of silencing all mismatches in this test case, but I think this is an acceptable compromise. In the alternative, it would be impossible to tell if more binaries started having the bss mismatch. Also, in the current infrastructure, it is not possible to special case the checking method based on a particular test case. I'm also not interested in adding that level of customisation, because I think it will be tremendously hacky. I also think that this issue will go away if we use the Docker container for compiling, so I'm not interested in building an exception which will only be necessary for a short period of time. |
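To make the second knob concrete, here is a minimal sketch of a per-test reporting level. The names (`CompatReporting`, `Level`, `RelfMismatch`, `report`) are hypothetical and only illustrate the exception / warning / silent split described above. They are not the flag this PR adds.

```scala
object CompatReporting {
  // The three reporting levels mentioned above.
  sealed trait Level
  case object Throw extends Level
  case object Warn extends Level
  case object Silent extends Level

  // Hypothetical exception type for a gtsrelf/oldrelf mismatch.
  final class RelfMismatch(msg: String) extends RuntimeException(msg)

  /** Reports mismatches at the given level. Returns true when there were no
    * mismatches and false when mismatches were tolerated (Warn or Silent). */
  def report(
      level: Level,
      mismatches: Seq[String],
      log: String => Unit = s => System.err.println(s)
  ): Boolean = {
    if (mismatches.isEmpty) true
    else {
      level match {
        case Throw  => throw new RelfMismatch(mismatches.mkString("; "))
        case Warn   => mismatches.foreach(m => log(s"[WARN] $m"))
        case Silent => ()
      }
      false
    }
  }
}
```

The check itself stays the same for all inputs. Only the level passed in by a particular test changes, which is also why lowering the level for one test hides every mismatch in that test, not just the bss one.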
Why do we need to notice this? This isn't an issue with anything other than DDisasm and readelf handling certain binaries (where the .bss section doesn't exactly match the __bss_start symbol) differently. |
|
Guh, I yield. I'll log on and change it. |
UNFORTUNATELY, the bss_start correction can't be put in normaliseRelf alone, because I want to be able to detect when it happens. The mismatch occurs in the gtsrelf, but it's only detectable by considering both the oldrelf and gtsrelf. If it was in normaliseRelf, the differences would be smoothed over and undetectable.
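A simplified sketch of why detection needs both sides: the shifted address is only visible as a difference between the oldrelf and gtsrelf symbol tables. `BssStartCheck`, `Sym`, `Result`, and `compare` are hypothetical names, and the set of affected symbol names is an assumption (only __bss_start is named in this thread).

```scala
object BssStartCheck {
  // Hypothetical minimal symbol record.
  final case class Sym(name: String, address: BigInt)

  // Assumption: only __bss_start is known to be affected; extend if needed.
  val knownShifted: Set[String] = Set("__bss_start")

  final case class Result(bssShifted: Set[String], otherMismatches: Set[String])

  /** Compares two symbol tables by name. Known bss symbols that are present
    * on both sides with different addresses are reported separately from all
    * other differences. Simplification: duplicate names collapse to one entry. */
  def compare(old: Set[Sym], gts: Set[Sym]): Result = {
    val oldByName = old.map(s => s.name -> s.address).toMap
    val gtsByName = gts.map(s => s.name -> s.address).toMap
    val differing = (oldByName.keySet ++ gtsByName.keySet)
      .filter(n => oldByName.get(n) != gtsByName.get(n))
    val (bss, other) = differing.partition { n =>
      knownShifted(n) && oldByName.contains(n) && gtsByName.contains(n)
    }
    Result(bss, other)
  }
}
```

Keeping `bssShifted` as a separate result means the known case can be logged at a lower level while anything in `otherMismatches` still fails the check.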
works towards #509