masterThesis#1
Open
vmihalko wants to merge 198 commits into
```c
void foo(int a) {
    int a; // remove this line
}
```
1. Generate a random `test.c` file [csmith].
2. Compile `test.c` to a binary [clang].
3. Modify the generated `test.c` file [fix-csmi.sh]. Here we replace the `csmith.h` header with the necessary definitions to avoid header expansion while compiling to LLVM IR in the next step.
4. Compile to LLVM IR `test.ll` [clang].
5. Run llvm2c and generate `decompiled.c` [llvm2c].
6. Modify the `decompiled.c` file [decom-fix-csmi.sh]. Here we add the `csmith.h` include and fix the types of functions from `csmith.h`.
7. Compile `decompiled.c` to a binary [clang].
8. Run the compiled binaries and compare their outputs.

If something goes wrong at any point, all generated files are copied to tmp for debugging. If an exception is caught after running the compiled `test.c`, we continue to the next step.
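Step 8 comes down to a byte-for-byte comparison of what the two binaries print. A minimal sketch, assuming each binary's stdout has already been redirected to a file; `same_output` is a hypothetical helper, not part of the scripts named above:

```c
#include <stdio.h>

/* Hypothetical helper for step 8: returns 1 if both streams contain
   byte-for-byte identical output, 0 otherwise. Both streams are read
   from their current position up to EOF. */
int same_output(FILE *expected, FILE *actual) {
    int ca, cb;
    do {
        ca = fgetc(expected);
        cb = fgetc(actual);
        if (ca != cb)
            return 0; /* differing byte, or one stream ended early */
    } while (ca != EOF);
    return 1;
}
```

Comparing raw bytes keeps the check independent of what the generated programs actually print.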
Parse basic types and correctly recognize whether each type is signed or unsigned. The basic types are:
- int
- char
- short
- long
- float
- double
- long double
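One way to picture this classification is a small C sketch that splits a base-type name, as it might appear in debug metadata (e.g. "long unsigned int"), into a kind and a signedness flag. The enum, struct and `parse_base_type` are hypothetical names; treating plain `char` as signed and folding `long long` into `long` are simplifying assumptions of this sketch:

```c
#include <stdbool.h>
#include <string.h>

enum base_kind { BK_UNKNOWN, BK_CHAR, BK_SHORT, BK_INT, BK_LONG,
                 BK_FLOAT, BK_DOUBLE, BK_LONG_DOUBLE };

struct base_type {
    enum base_kind kind;
    bool is_unsigned;
};

/* Hypothetical helper: classify names such as "unsigned int",
   "long unsigned int" or "long double". Plain "char" is treated as
   signed and "long long" is folded into BK_LONG (both assumptions). */
struct base_type parse_base_type(const char *name) {
    struct base_type t = { BK_UNKNOWN, false };
    bool seen_long = false, seen_int = false, seen_sign = false;
    char buf[64];

    strncpy(buf, name, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (strcmp(tok, "unsigned") == 0) { t.is_unsigned = true; seen_sign = true; }
        else if (strcmp(tok, "signed") == 0) seen_sign = true;
        else if (strcmp(tok, "char") == 0) t.kind = BK_CHAR;
        else if (strcmp(tok, "short") == 0) t.kind = BK_SHORT;
        else if (strcmp(tok, "long") == 0) seen_long = true;
        else if (strcmp(tok, "int") == 0) seen_int = true;
        else if (strcmp(tok, "float") == 0) t.kind = BK_FLOAT;
        else if (strcmp(tok, "double") == 0) t.kind = BK_DOUBLE;
    }

    if (t.kind == BK_DOUBLE && seen_long)
        t.kind = BK_LONG_DOUBLE;          /* "long double" */
    else if (t.kind == BK_UNKNOWN && seen_long)
        t.kind = BK_LONG;                 /* "long", "long int", "long unsigned int" */
    else if (t.kind == BK_UNKNOWN && (seen_int || seen_sign))
        t.kind = BK_INT;                  /* "int", "unsigned", "signed" */
    return t;
}
```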
from metadataTypeInfo
E.g. `int *` or `unsigned int*`
Which prints strings to `llvm::errs()`.
From https://lists.llvm.org/pipermail/cfe-dev/2013-January/027302.html: when a function has a struct parameter or return type, Clang may lower the struct parameter into

- a "byval" pointer (for a struct with several different members),
- a vector (for a struct with a few float members),
- two doubles (for a struct with two double members),
- an i64 (for a struct with two i32 members),

and possibly more variations. However, the metadata carries no information about types created this way. Therefore, we detect when a struct type is used as a function argument or return value and do not reconstruct these types from the metadata.
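For illustration, the following C declarations exercise the four cases listed above. The lowering noted in each comment is what clang typically emits when targeting x86-64 Linux (System V ABI); other targets and ABIs lower these structs differently, so treat the comments as an assumption to check with `clang -S -emit-llvm`:

```c
/* Typical clang lowering on x86-64 System V (target-dependent): */
struct two_ints    { int a; int b; };            /* coerced to a single i64      */
struct two_floats  { float x; float y; };        /* coerced to <2 x float>       */
struct two_doubles { double x; double y; };      /* split into two double args   */
struct big         { long a; long b; long c; };  /* passed as a byval pointer    */

long   sum_ints(struct two_ints s)       { return (long)s.a + s.b; }
float  sum_floats(struct two_floats s)   { return s.x + s.y; }
double sum_doubles(struct two_doubles s) { return s.x + s.y; }
long   sum_big(struct big s)             { return s.a + s.b + s.c; }
```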
Enable all (loop) passes from https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/IPO/PassManagerBuilder.cpp#L353-#L375:

```cpp
if (EnableSimpleLoopUnswitch) {
  // The simple loop unswitch pass relies on separate cleanup passes. Schedule
  // them first so when we re-process a loop they run before other loop
  // passes.
  MPM.add(createLoopInstSimplifyPass());
  MPM.add(createLoopSimplifyCFGPass());
}
// Rotate Loop - disable header duplication at -Oz
MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));
MPM.add(createLICMPass(LicmMssaOptCap, LicmMssaNoAccForPromotionCap));
if (EnableSimpleLoopUnswitch)
  MPM.add(createSimpleLoopUnswitchLegacyPass());
else
  MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3, DivergentTarget));
// FIXME: We break the loop pass pipeline here in order to do full
// simplify-cfg. Eventually loop-simplifycfg should be enhanced to replace the
// need for this.
MPM.add(createCFGSimplificationPass());
addInstructionCombiningPass(MPM);
// We resume loop passes creating a second loop pipeline here.
MPM.add(createIndVarSimplifyPass());        // Canonicalize indvars
MPM.add(createLoopIdiomPass());             // Recognize idioms like memset.
```

Test:

```bash
clang -S -emit-llvm -Xclang -disable-O0-optnone simple-for-loop-second-latch.c -o simple-for-loop-second-latch-noopt.ll
optpassPasses simple-for-loop-second-latch-noopt --loop-simplify --simplifycfg --loop-rotate --lcssa --licm --loop-unswitch --simplifycfg --instcombine --indvars
old_llvm2c simple-for-loop-second-latch-noopt-opt == new_llvm2c simple-for-loop-second-latch-noopt
```
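For intuition about the `--loop-rotate` step in the test above, here is a simplified source-level picture (not the pass's literal output): rotation duplicates the header test as a guard and turns the top-tested loop into a bottom-tested one, which is the shape the decompiler then has to recognize. Function names are hypothetical:

```c
/* Top-tested loop as written in the source. */
int sum_for(int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += i;
    return s;
}

/* Roughly the shape after loop rotation: a guard plus a bottom-tested loop. */
int sum_rotated(int n) {
    int s = 0;
    int i = 0;
    if (i < n) {          /* duplicated header test */
        do {
            s += i;
            i++;
        } while (i < n);  /* latch now holds the exit test */
    }
    return s;
}
```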
1. Map each LOOP to its BRANCH instruction (condition).
2. Transform the BRANCH instruction into `if` or `do`/`while` constructs.
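The second step can be pictured with a minimal C example (hypothetical function names): a loop whose back edge is a conditional branch to its header, written with `goto`, and the equivalent `do`/`while` that this transformation aims to produce. Both run the body at least once, so they agree for every `n`:

```c
/* Shape taken directly from the CFG: the body ends with a conditional
   branch back to the header block. */
int count_goto(int n) {
    int i = 0, s = 0;
head:
    s += i;
    i++;
    if (i < n)
        goto head;
    return s;
}

/* Target shape: the conditional back edge becomes the do-while condition. */
int count_do_while(int n) {
    int i = 0, s = 0;
    do {
        s += i;
        i++;
    } while (i < n);
    return s;
}
```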
First:
```c
goto head;
head:
...
do
goto head;
while( C );
```
transform into:
```c
do
head
while( C );
```
Second:
Cache the result of `loopInfoAnalysis` for each function.
- Implement InsertValue handling in parser/createExpressions.cpp:
  - Materialize a temporary aggregate of the result type.
  - If source aggregate is not 'undef', copy-initialize temp from source.
  - Walk struct/array index path, build lvalue for the target field, and assign the inserted value.
  - Register the temporary aggregate as the result expression.
- Wire InsertValue into expression creation:
  - Handle llvm::Instruction::InsertValue in the top-level creation loop (needs Func/Block context).
  - In parseLLVMInstruction(), return an already-created expression for InsertValue.
- Improve unsupported-instruction diagnostics:
  - Print the exact unsupported instruction to stderr before asserting.

Tested on Rust-generated IR pattern:

```llvm
%i8 = insertvalue { i64, i64 } undef, i64 1, 0
```
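A sketch of the kind of C this handling corresponds to, assuming `struct pair_i64` stands in for the LLVM type `{ i64, i64 }` (all names here are hypothetical, not llvm2c's actual output). A temporary of the result type is created, copy-initialized from a non-undef source, and the field selected by the index path is assigned:

```c
#include <stdint.h>

/* Hypothetical stand-in for the LLVM aggregate type { i64, i64 }. */
struct pair_i64 { int64_t f0; int64_t f1; };

/* %r = insertvalue { i64, i64 } %src, i64 %v, 1 */
struct pair_i64 insert_f1(struct pair_i64 src, int64_t v) {
    struct pair_i64 tmp = src;  /* source is not undef: copy-initialize */
    tmp.f1 = v;                 /* index path {1} selects field f1 */
    return tmp;
}

/* %r = insertvalue { i64, i64 } undef, i64 %v, 0 */
struct pair_i64 insert_f0_into_undef(int64_t v) {
    struct pair_i64 tmp = {0};  /* undef source: nothing to copy; zeroed
                                   only to keep this sketch deterministic */
    tmp.f0 = v;
    return tmp;
}
```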
- Introduce BoolType class with _Bool representation
- Use BoolType instead of unsigned int for LLVM i1 types
- Parse boolean types from debug metadata (DW_ATE_boolean, 1-bit types)
- Implement toString() for BoolType to support const/static qualifiers
- Add isConst field to GlobalValue class
- Parse const qualifier from debug info (isTopLevelConst helper)
- Emit const qualifier in C output for global variables
- Fix extern qualifier logic (only emit if no initializer)
- Add 'f' suffix to float constants to preserve float type
- Prevents implicit promotion to double in C
- Ensures float constants match LLVM IR float semantics
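Why the suffix matters, in a small C sketch with hypothetical function names, assuming `FLT_EVAL_METHOD == 0` (as on x86-64 with SSE): without the suffix the constant is a `double`, the float operand is promoted, and the comparison is made against a value the float can never equal:

```c
/* 0.1 is a double constant: x is converted to double and compared with the
   double nearest to 0.1, which differs from the float nearest to 0.1. */
int equals_without_suffix(float x) { return x == 0.1; }

/* 0.1f is a float constant, so both operands are float, as in an IR
   comparison of two float values. */
int equals_with_suffix(float x) { return x == 0.1f; }
```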
- Collapse identical cast chains: `(T)(T)(X)` => `(T)(X)`
- Drop redundant integer-of-bool casts
- Drop redundant bool casts of boolean expressions
- Simplify equality comparisons with symmetric integer casts
- Preserve correct semantics while reducing cast noise
- Implement parseLandingPadInstruction: creates temporary for result
- Implement parseResumeInstruction: translates to abort() call
- Conservative implementation for panic=abort compilation mode
- Adds isnan helper documentation
- Wrap float-typed binary operation results in (float) cast
- Prevents implicit double promotion in C
- Ensures binary operations match LLVM IR float semantics
- Critical for floating point comparison correctness
- Translate llvm.fmuladd.* to (a * b + c) expression
- Apply explicit (float) cast for float results
- Properly handle both used and unused result cases
- Enables decompilation of FMA-optimized code
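A sketch of the translation, with hypothetical helper names. `llvm.fmuladd` allows either a fused or an unfused multiply-add, so the plain C expression is a valid rendering; the explicit cast keeps the float variant's result in `float`:

```c
/* %r = call float @llvm.fmuladd.f32(float %a, float %b, float %c) */
float fmuladd_f32(float a, float b, float c) {
    return (float)(a * b + c);  /* fused or unfused: both allowed by fmuladd */
}

/* %r = call double @llvm.fmuladd.f64(double %a, double %b, double %c) */
double fmuladd_f64(double a, double b, double c) {
    return a * b + c;
}
```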
- Add normalizeIndexForPointerShift lambda for robust indexing
- Detect (0 - X) pattern and transform to -(X) correctly
- Cast to long before negation to avoid invalid pointer negation
- Unwrap nested casts to handle ptrtoint patterns
- Cast all pointer indices to signed long (ptrdiff_t semantics)
- Fixes invalid C code generation for negative pointer offsets
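Why the cast must come before the negation: if `x` is unsigned, `0 - x` is a large unsigned value, and on an LP64 target adding it to a pointer moves the pointer far forward instead of back. A minimal sketch of the normalized shape (`shift_back` is a hypothetical name):

```c
/* Normalized form of a PointerShift by (0 - x) with an unsigned index:
   convert to signed long first, then negate, so the offset is negative. */
int *shift_back(int *p, unsigned int x) {
    return p + -(long)x;
}
```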
- Make ordered comparison simplification more conservative: only simplify when both source types are the same unsigned integer type
- Remove hardcoded struct member names in intrinsic definitions: dynamically find struct type from program expressions
- Search for CallExpr in both direct expressions and AssignExpr->right to find intrinsic return types
- Fixes aws_add_size_saturating_harness benchmark
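Why the simplification is restricted to operands of the same unsigned type: zero-extending two values of one unsigned type preserves their order, so the casts can be dropped, but with mixed signedness the narrow comparison converts the signed operand to unsigned and the result changes. A sketch with hypothetical function names:

```c
#include <stdint.h>

/* Both operands widened from the same unsigned type: safe to simplify. */
int lt_widened(uint32_t a, uint32_t b)    { return (int64_t)a < (int64_t)b; }
int lt_simplified(uint32_t a, uint32_t b) { return a < b; }

/* Mixed signedness: b is sign-extended in the widened form, but converted
   to unsigned in the narrow form, so the two are not equivalent. */
int lt_mixed(uint32_t a, int32_t b)       { return (int64_t)a < (int64_t)b; }
int lt_mixed_naive(uint32_t a, int32_t b) { return a < b; }
```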
…tion for intrinsics
…popcountl

Replace llvm_ctpop_i64 with __builtin_popcountl for 64-bit and 128-bit types, and __builtin_popcount for smaller types. This fixes AWS benchmarks that use popcount operations.
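A sketch of the 64-bit and 32-bit replacements using the GCC/Clang builtins. The helper names are hypothetical, and the 64-bit one assumes an LP64 target, where `unsigned long` is 64 bits wide:

```c
#include <stdint.h>

/* Assumes LP64: unsigned long holds all 64 bits of x. */
static inline uint64_t ctpop_i64(uint64_t x) {
    return (uint64_t)__builtin_popcountl(x);
}

static inline uint32_t ctpop_i32(uint32_t x) {
    return (uint32_t)__builtin_popcount(x);
}
```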
…equals_fn

When a PointerType points to a FunctionPointerType, don't add an extra `*` in toString() since function pointer types already represent pointers. This fixes struct members like `typeDef_7* destroy_key_fn` to be correctly generated as `typeDef_7 destroy_key_fn`.
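In C terms, a typedef of a function pointer is already a pointer type, so a member declared with it needs no extra `*`. A sketch in which the signature of `typeDef_7` and the surrounding names are assumptions for illustration:

```c
#include <stddef.h>

/* Assumed signature; typeDef_7 is itself a pointer type. */
typedef void (*typeDef_7)(void *);

struct table {
    typeDef_7 destroy_key_fn;  /* correct: a function pointer */
    /* typeDef_7 *destroy_key_fn; would be a pointer to a function pointer */
};

static int destroyed;
static void count_destroy(void *key) { (void)key; destroyed++; }
```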
…with.overflow

The previous check `(sum < a || sum < b)` incorrectly flags overflow when one operand is 0. The correct check is `(a != 0 && (sum / a) != b)`, which properly handles all cases, including when b == 0 (no overflow).
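A sketch of the corrected check, assuming the intrinsic is an unsigned multiply such as `llvm.umul.with.overflow.i64` (the division test only makes sense for multiplication; the title above is truncated). Here `prod` plays the role of `sum` in the message; the helper name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true on overflow; *out receives the product modulo 2^64. */
bool umul_overflows(uint64_t a, uint64_t b, uint64_t *out) {
    uint64_t prod = a * b;          /* unsigned multiplication wraps */
    *out = prod;
    return a != 0 && prod / a != b; /* a == 0 or b == 0 never overflows */
}
```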
When normalizing pointer indices for PointerShift, check if the index is unsigned and use unsigned long long instead of signed long. This preserves the correct semantics for size_t and other unsigned index types.
…struct assignment

- Enabled the memcpyToAssignment pass in the pass pipeline
- Improved the pass to handle both RefExpr and direct Value expressions
- Fixed the pass to work with or without bitcast operations
- This fixes aws_hash_table_move_harness by converting memcpy to proper struct assignment
- Added unwrapExpr lambda to unwrap CastExpr expressions
- This allows the pass to handle memcpy calls with casted operands
…ssions correctly

- Handle RefExpr wrapping AggregateElement (struct members) correctly
- Don't dereference struct members when converting memcpy
- Fixes compilation errors with struct member assignments
- Check if inner expression is a pointer type to determine correct dereferencing
- Handle AggregateElement expressions correctly (don't dereference struct members)
- This fixes the main aws_hash_table_move memcpy conversion
…ointer cases

- Check the type of &var to determine if it's pointer-to-pointer or pointer-to-struct
- Correctly generate *ptr = struct for pointer-to-pointer cases
- Use original values (dstVal, srcVal) to get expressions for better structure preservation
…efinitions

- Fix pointer-to-pointer handling: for memcpy(&ptr, &other_ptr, size), generate ptr = other_ptr (no dereference)
- Improve intrinsic definition search to find struct types from ExtractValueExpr and RetExpr
- Fix struct type selection to prefer anonymous structs with _Bool overflow flag
…ther_ptr

- For memcpy(&ptr, &other_ptr, size) where both are pointers, generate *ptr = *other_ptr (dereference both)
- Fix source handling to dereference pointer values when needed
- This fixes aws_hash_table_move_harness to produce VERIFICATION SUCCESSFUL
- For memcpy(&struct_var, &(*ptr), size), generate struct_var = (*ptr) not (*struct_var) = (*ptr)
- When destination is &struct_var where struct_var is a struct (not pointer), use struct_var directly
- This fixes aws_hash_table_swap compilation error with --no-slice flag
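The equivalence these passes rely on, as a C sketch with a hypothetical struct and helper names: copying `sizeof` bytes between two objects of the same struct type has the same effect as a struct assignment, and when the destination is a struct variable (not a pointer) the assignment targets the variable itself:

```c
#include <string.h>

struct entry { int key; int value; };

/* Input shape: a byte copy between two struct objects. */
void copy_with_memcpy(struct entry *dst, const struct entry *src) {
    memcpy(dst, src, sizeof *dst);
}

/* Output shape for memcpy(&struct_var, &(*ptr), size):
   struct_var = (*ptr), with no dereference of struct_var. */
struct entry load_entry(const struct entry *ptr) {
    struct entry struct_var;
    struct_var = (*ptr);
    return struct_var;
}
```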
…ion pointers

When decompiling LLVM IR to C, functions declared in K&R style (no parameter list) that are used as function pointers with parameters were given incorrect C signatures. This caused verification failures in CBMC when the decompiled code had different function signatures than expected.

Problem:
- Original C: `void *allocator()` [K&R style, compatible with `void*(*)(void*)`]
- LLVM IR: `define i8* @Allocator()` [type: `i8* ()`]
- Used as: `bitcast (i8* ()* @Allocator to i8* (i8*)*)`
- llvm2c generated: `void* allocator(void);` [wrong: no parameters]
- CBMC expected: `void* allocator(void*);` [based on function pointer usage]

Solution:
1. Check debug info (DISubprogram) first, as it is the most authoritative source.
2. Fall back to scanning function pointer usage (bitcasts) to find target types.
3. Adjust function declarations to match the maximum parameter count found.

The fix scans all bitcast operations where functions are cast to function pointer types, extracts the target function pointer signature, and adjusts the function declaration accordingly. This works for any function, any number of parameters, and all usage sites without hardcoding.

Fixes verification failures where symbiotic->cbmc produced different results than cbmc alone due to function pointer signature mismatches.
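A C sketch of the corrected output. Only the prototype shape `void *allocator(void *)` comes from the message above; the typedef `allocator_fn`, the body and `arena` are hypothetical. With the adjusted declaration the function converts to the function-pointer type without a cast, which the earlier `void *allocator(void);` would not allow:

```c
#include <stddef.h>

typedef void *(*allocator_fn)(void *);

/* Declaration adjusted to the parameter count seen at the bitcast sites. */
void *allocator(void *hint);

static char arena[64];

/* Hypothetical body: return the hint if given, else a static buffer. */
void *allocator(void *hint) { return hint ? hint : arena; }

static allocator_fn current_allocator = allocator; /* no cast needed */
```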