masterThesis#1

Open
vmihalko wants to merge 198 commits into master from vmihalko-devel



@vmihalko

Owner

No description provided.

vmihalko and others added 30 commits April 17, 2023 11:29
```c
void foo(int a) {
	int a; // remove this line
}
```
1. generate a random test.c file [csmith]
2. compile test.c to a binary [clang]
3. modify the generated test.c file [fix-csmi.sh]
	- Here we replace the csmith.h header with
	  the necessary definitions to avoid header expansion
	  while compiling to LLVM IR in the next step
4. compile to LLVM IR test.ll [clang]
5. run llvm2c and generate decompiled.c [llvm2c]
6. modify the decompiled.c file [decom-fix-csmi.sh]
	- Here we add the csmith.h include and fix
	  the types of functions from csmith.h
7. compile decompiled.c to a binary [clang]
8. run both compiled binaries and compare their outputs

If something goes wrong at any point, all generated files
are copied to tmp (for debugging).

If an exception is caught while running the binary compiled
from test.c, we continue to the next step.
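Step 8 could look roughly like the following minimal C sketch, assuming a POSIX system where `popen()` is available. The helper names `capture_output` and `outputs_match` are hypothetical and are not taken from the repository's scripts. The sketch compares only standard output (where Csmith programs print their checksum) and ignores exit codes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Run `cmd` through the shell and store up to cap - 1 bytes of its stdout in
   buf. Returns the status reported by pclose(), or -1 if popen() failed. */
static int capture_output(const char *cmd, char *buf, size_t cap) {
    FILE *proc = popen(cmd, "r");
    if (proc == NULL)
        return -1;
    size_t n = fread(buf, 1, cap - 1, proc);
    buf[n] = '\0';
    /* Drain any remaining output so the child never blocks on a full pipe. */
    char scratch[4096];
    while (fread(scratch, 1, sizeof scratch, proc) > 0)
        ;
    return pclose(proc);
}

/* Step 8: returns 1 when both commands could be started and printed the same
   output, 0 otherwise. Only the first 64 KiB of each output is compared. */
static int outputs_match(const char *original_cmd, const char *decompiled_cmd) {
    static char a[1 << 16], b[1 << 16];
    if (capture_output(original_cmd, a, sizeof a) == -1)
        return 0;
    if (capture_output(decompiled_cmd, b, sizeof b) == -1)
        return 0;
    return strcmp(a, b) == 0;
}
```

In a real run the two commands would be the binaries built in steps 2 and 7, for example `outputs_match("./test", "./decompiled")`.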
Parse basic types. The supported basic types are:
- int
- char
- short
- long

- float
- double
- long double

The parser also correctly recognizes whether a type is signed or unsigned.
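A minimal sketch of such a mapping, assuming the input is the encoding and bit size of a `DIBasicType` from the debug metadata. The `DW_ATE_*` values are the ones defined by the DWARF specification; the helper `c_type_name` is hypothetical, and the sketch assumes an LP64 target (64-bit `long`) where `long double` is reported as 80 or 128 bits.

```c
#include <string.h>

/* DWARF base-type encodings (values from the DWARF specification). */
enum {
    DW_ATE_float = 0x04,
    DW_ATE_signed = 0x05,
    DW_ATE_signed_char = 0x06,
    DW_ATE_unsigned = 0x07,
    DW_ATE_unsigned_char = 0x08
};

/* Map (encoding, size in bits) to a C type name, or NULL if unsupported. */
static const char *c_type_name(unsigned encoding, unsigned bits) {
    if (encoding == DW_ATE_float) {
        switch (bits) {
        case 32: return "float";
        case 64: return "double";
        case 80: case 128: return "long double";
        default: return NULL;
        }
    }
    /* Integer types: the encoding alone decides signedness. */
    int is_signed = (encoding == DW_ATE_signed || encoding == DW_ATE_signed_char);
    int is_unsigned = (encoding == DW_ATE_unsigned || encoding == DW_ATE_unsigned_char);
    if (!is_signed && !is_unsigned)
        return NULL;
    switch (bits) {
    case 8:  return is_signed ? "char" : "unsigned char";
    case 16: return is_signed ? "short" : "unsigned short";
    case 32: return is_signed ? "int" : "unsigned int";
    case 64: return is_signed ? "long" : "unsigned long";
    default: return NULL;
    }
}
```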
Which prints strings to llvm::errs().
From https://lists.llvm.org/pipermail/cfe-dev/2013-January/027302.html:
When a function has a struct parameter or return type,
Clang may lower a struct parameter into...
	- a "byval" pointer (for a struct with several different members)
	- a vector (for a struct with a few float members)
	- two doubles (for a struct with two double members)
	- an i64 (for a struct with two i32 members)
... and possibly more variations.

However, the metadata carries no information about types created this way.
Therefore, we detect when a struct type is used as a function argument or return value
and do not reconstruct these types from the metadata.
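For illustration, the four lowerings quoted above correspond to structs like the ones below. The comments describe what clang typically emits on x86-64 with the System V ABI (the IR can be inspected with `clang -S -emit-llvm`); other targets lower them differently, and all names here are hypothetical.

```c
/* 24 bytes, mixed members: passed as a "byval" pointer. */
struct Big { int i; double d; char *p; };
/* Two floats: coerced to <2 x float>. */
struct Floats { float x, y; };
/* Two doubles: split into two double arguments. */
struct Doubles { double x, y; };
/* Two ints: coerced to a single i64. */
struct Ints { int a, b; };

static double sum_big(struct Big b)         { return b.i + b.d; }
static float  sum_floats(struct Floats f)   { return f.x + f.y; }
static double sum_doubles(struct Doubles d) { return d.x + d.y; }
static int    sum_ints(struct Ints s)       { return s.a + s.b; }
```

In each case the IR signature no longer mentions the struct type, which is why its layout cannot be trusted to come from the metadata.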
Enable all passes of the loop pipeline from https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/IPO/PassManagerBuilder.cpp#L353-#L375:
```cpp
  if (EnableSimpleLoopUnswitch) {
    // The simple loop unswitch pass relies on separate cleanup passes. Schedule
    // them first so when we re-process a loop they run before other loop
    // passes.
    MPM.add(createLoopInstSimplifyPass());
    MPM.add(createLoopSimplifyCFGPass());
  }
  // Rotate Loop - disable header duplication at -Oz
  MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));
  MPM.add(createLICMPass(LicmMssaOptCap, LicmMssaNoAccForPromotionCap));
  if (EnableSimpleLoopUnswitch)
    MPM.add(createSimpleLoopUnswitchLegacyPass());
  else
    MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3, DivergentTarget));
  // FIXME: We break the loop pass pipeline here in order to do full
  // simplify-cfg. Eventually loop-simplifycfg should be enhanced to replace the
  // need for this.
  MPM.add(createCFGSimplificationPass());
  addInstructionCombiningPass(MPM);
  // We resume loop passes creating a second loop pipeline here.
  MPM.add(createIndVarSimplifyPass());        // Canonicalize indvars
  MPM.add(createLoopIdiomPass());             // Recognize idioms like memset.
```
Test:
```bash
clang -S -emit-llvm -Xclang -disable-O0-optnone simple-for-loop-second-latch.c -o simple-for-loop-second-latch-noopt.ll
optpassPasses simple-for-loop-second-latch-noopt --loop-simplify --simplifycfg --loop-rotate --lcssa --licm --loop-unswitch --simplifycfg --instcombine --indvars
old_llvm2c simple-for-loop-second-latch-noopt-opt == new_llvm2c simple-for-loop-second-latch-noopt
```
1. map each LOOP to its BRANCH instruction (the loop condition)
2. transform the BRANCH instruction into if or do-while constructs
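The two steps can be pictured on a small, hypothetical loop: the latch branch that jumps back to the header supplies the loop condition, and the goto-based control flow becomes a do-while. This is an illustration of the idea, not output captured from llvm2c.

```c
/* Before: control flow expressed with a label and gotos. */
static int sum_goto(int n) {
    int i = 0, s = 0;
    goto head;
head:
    s += i;
    i++;
    if (i < n)      /* latch BRANCH: its condition becomes the loop condition */
        goto head;
    return s;
}

/* After: the same BRANCH mapped to a do-while construct. */
static int sum_do_while(int n) {
    int i = 0, s = 0;
    do {
        s += i;
        i++;
    } while (i < n);
    return s;
}
```

Both versions execute the body at least once, which is why a do-while (rather than a while loop) is the faithful target here.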
First:
```c
goto head;
head:
...
do
    goto head;
while( C );
```

transform into
```c
do
    head
while( C );
```

Second:
Cache the loopInfoAnalysis result for each function
- Implement InsertValue handling in parser/createExpressions.cpp:
  - Materialize a temporary aggregate of the result type.
  - If source aggregate is not 'undef', copy-initialize temp from source.
  - Walk struct/array index path, build lvalue for the target field, and assign the inserted value.
  - Register the temporary aggregate as the result expression.
- Wire InsertValue into expression creation:
  - Handle llvm::Instruction::InsertValue in the top-level creation loop (needs Func/Block context).
  - In parseLLVMInstruction(), return an already-created expression for InsertValue.
- Improve unsupported-instruction diagnostics:
  - Print the exact unsupported instruction to stderr before asserting.

Tested on Rust-generated IR pattern:
  %i8 = insertvalue { i64, i64 } undef, i64 1, 0
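The steps above could produce C along these lines. This is a sketch under assumptions: `struct pair_i64`, `make_i8`, and `insert_f1` are hypothetical names for the anonymous `{ i64, i64 }` type and the generated code, not llvm2c's actual output.

```c
#include <stdint.h>

/* Hypothetical C name for the LLVM type { i64, i64 }. */
struct pair_i64 { int64_t f0; int64_t f1; };

/* %i8 = insertvalue { i64, i64 } undef, i64 1, 0
   The source is undef, so the temporary is not copy-initialized. */
static struct pair_i64 make_i8(void) {
    struct pair_i64 tmp;   /* temporary aggregate of the result type */
    tmp.f0 = 1;            /* lvalue built from the index path {0} */
    return tmp;            /* the temporary is the result expression */
}

/* insertvalue %src, i64 %v, 1 with a non-undef source aggregate. */
static struct pair_i64 insert_f1(struct pair_i64 src, int64_t v) {
    struct pair_i64 tmp = src;  /* copy-initialize from the source */
    tmp.f1 = v;                 /* assign the inserted value */
    return tmp;
}
```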
- Introduce BoolType class with _Bool representation
- Use BoolType instead of unsigned int for LLVM i1 types
- Parse boolean types from debug metadata (DW_ATE_boolean, 1-bit types)
- Implement toString() for BoolType to support const/static qualifiers
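
A hypothetical example of globals as they might be emitted for `i1` values once `BoolType` is used: `_Bool` instead of `unsigned int`, with the `static` and `const` qualifiers printed by `toString()`. The names are illustrative only. Like `i1`, a `_Bool` object can hold only 0 or 1.

```c
static const _Bool g_enabled = 1;
static _Bool g_seen;

/* Storing any nonzero value into a _Bool yields 1. */
static _Bool mark_seen(int count) {
    g_seen = count;   /* implicit conversion: nonzero -> 1, zero -> 0 */
    return g_seen;
}
```
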
- Add isConst field to GlobalValue class
- Parse const qualifier from debug info (isTopLevelConst helper)
- Emit const qualifier in C output for global variables
- Fix extern qualifier logic (only emit if no initializer)
- Add 'f' suffix to float constants to preserve float type
- Prevents implicit promotion to double in C
- Ensures float constants match LLVM IR float semantics
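
A short C example of why the suffix matters: an unsuffixed `0.1` is a `double` constant, so comparing a `float` against it promotes the float to `double`, and the rounded float value differs from the double `0.1`. The function names are hypothetical.

```c
static int equals_double_literal(float x) { return x == 0.1; }   /* compared as double */
static int equals_float_literal(float x)  { return x == 0.1f; }  /* compared as float */
```

With `float x = 0.1f;`, the first function returns 0 and the second returns 1 on common IEEE 754 targets.
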
- Collapse identical cast chains: (T)(T)(X) => (T)(X)
- Drop redundant integer-of-bool casts
- Drop redundant bool casts of boolean expressions
- Simplify equality comparisons with symmetric integer casts
- Preserve correct semantics while reducing cast noise
- Implement parseLandingPadInstruction: creates temporary for result
- Implement parseResumeInstruction: translates to abort() call
- Conservative implementation for panic=abort compilation mode
- Adds isnan helper documentation
- Wrap float-typed binary operation results in (float) cast
- Prevents implicit double promotion in C
- Ensures binary operations match LLVM IR float semantics
- Critical for floating point comparison correctness
- Translate llvm.fmuladd.* to (a * b + c) expression
- Apply explicit (float) cast for float results
- Properly handle both used and unused result cases
- Enables FMA-optimized code compilation
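
A sketch of the translation. The LLVM LangRef specifies `llvm.fmuladd.*` as `a * b + c` where it is unspecified whether rounding happens between the multiply and the add, so the plain C expression is a valid translation. The helper names are hypothetical; the float variant carries the explicit `(float)` cast mentioned above.

```c
static double fmuladd_f64(double a, double b, double c) { return a * b + c; }
static float  fmuladd_f32(float a, float b, float c)    { return (float)(a * b + c); }
```
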
- Add normalizeIndexForPointerShift lambda for robust indexing
- Detect (0 - X) pattern and transform to -(X) correctly
- Cast to long before negation to avoid invalid pointer negation
- Unwrap nested casts to handle ptrtoint patterns
- Cast all pointer indices to signed long (ptrdiff_t semantics)
- Fixes invalid C code generation for negative pointer offsets
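
A hypothetical shape of the problem: the IR offsets a pointer by `(0 - X)` where `X` is unsigned. Written literally in C, `0u - x` wraps around to a huge positive value instead of a negative offset, so the index is first converted to signed `long` and then negated. `shift_back` and `demo` are illustrative names.

```c
static const int demo[5] = {10, 20, 30, 40, 50};   /* sample data */

static const int *shift_back(const int *p, unsigned x) {
    /* Not p + (0 - x): that is unsigned arithmetic and wraps around. */
    return p + (-(long)x);   /* convert to signed long, then negate */
}
```
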
- Make ordered comparison simplification more conservative: only simplify when both source types are the same unsigned integer type
- Remove hardcoded struct member names in intrinsic definitions: dynamically find struct type from program expressions
- Search for CallExpr in both direct expressions and AssignExpr->right to find intrinsic return types
- Fixes aws_add_size_saturating_harness benchmark
…popcountl

Replace llvm_ctpop_i64 with __builtin_popcountl for 64-bit and 128-bit types,
and __builtin_popcount for smaller types. This fixes AWS benchmarks that
use popcount operations.
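A sketch of the builtin-based helpers, assuming GCC or Clang (for `__builtin_popcount*`) and an LP64 target where `unsigned long` is 64 bits. Passing an `unsigned __int128` straight to `__builtin_popcountl` converts it to `unsigned long`, so only the low 64 bits would be counted; the hypothetical `popcount128` below covers the full width by summing both halves.

```c
static int popcount32(unsigned int x)  { return __builtin_popcount(x); }
static int popcount64(unsigned long x) { return __builtin_popcountl(x); }

/* unsigned __int128 is a GCC/Clang extension on 64-bit targets. */
static int popcount128(unsigned __int128 x) {
    return __builtin_popcountll((unsigned long long)x) +
           __builtin_popcountll((unsigned long long)(x >> 64));
}
```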
…equals_fn

When a PointerType points to a FunctionPointerType, don't add an extra *
in toString() since function pointer types already represent pointers.
This fixes struct members like 'typeDef_7* destroy_key_fn' to be correctly
generated as 'typeDef_7 destroy_key_fn'.
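A small example of the corrected output. The signature given to `typeDef_7` and the struct and function names are assumptions made for illustration; the point is that a member of a function-pointer typedef needs no extra `*`.

```c
typedef void (*typeDef_7)(void *);   /* assumed signature */

struct table {                 /* hypothetical struct */
    typeDef_7 destroy_key_fn;  /* correct: typeDef_7 is already a pointer */
    /* typeDef_7 *destroy_key_fn;  would be a pointer to a function pointer */
};

static int destroyed = 0;
static void count_destroy(void *key) { (void)key; destroyed++; }

/* Calls through the member directly, with no extra dereference. */
static int destroy_one(void) {
    struct table t = { count_destroy };
    t.destroy_key_fn(&t);
    return destroyed;
}
```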
…with.overflow

The previous check (sum < a || sum < b) incorrectly flags overflow when one
operand is 0. The correct check is (a != 0 && (sum / a) != b), which properly
handles all cases including when b == 0 (no overflow).
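A sketch comparing both checks, assuming the unsigned (`umul`) variant on 64-bit operands; the function names are hypothetical. Unsigned multiplication in C wraps modulo 2^64, and for a nonzero `a` the wrapped product divided by `a` gives back `b` exactly when no overflow occurred.

```c
#include <stdint.h>

/* Previous check: flags overflow for 0 * 5 because 0 < 5. */
static int old_overflow_check(uint64_t a, uint64_t b) {
    uint64_t prod = a * b;
    return prod < a || prod < b;
}

/* Corrected check from the commit message. */
static int umul_overflows(uint64_t a, uint64_t b) {
    uint64_t prod = a * b;   /* wraps modulo 2^64 */
    return a != 0 && prod / a != b;
}
```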
When normalizing pointer indices for PointerShift, check if the index is
unsigned and use unsigned long long instead of signed long. This preserves
the correct semantics for size_t and other unsigned index types.
…struct assignment

- Enabled the memcpyToAssignment pass in the pass pipeline
- Improved the pass to handle both RefExpr and direct Value expressions
- Fixed the pass to work with or without bitcast operations
- This fixes aws_hash_table_move_harness by converting memcpy to proper struct assignment
- Added unwrapExpr lambda to unwrap CastExpr expressions
- This allows the pass to handle memcpy calls with casted operands
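
The rewrite relies on a whole-struct `memcpy` being equivalent to struct assignment. A hypothetical before/after pair, assuming the size argument equals `sizeof` the struct:

```c
#include <string.h>

struct point { int x, y; };   /* hypothetical struct */

/* Before: the IR expresses the copy as a memcpy of the whole struct. */
static struct point copy_with_memcpy(const struct point *src) {
    struct point dst;
    memcpy(&dst, src, sizeof dst);
    return dst;
}

/* After: the same copy as a plain struct assignment. */
static struct point copy_with_assignment(const struct point *src) {
    struct point dst;
    dst = *src;
    return dst;
}
```
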
…ssions correctly

- Handle RefExpr wrapping AggregateElement (struct members) correctly
- Don't dereference struct members when converting memcpy
- Fixes compilation errors with struct member assignments
- Check if inner expression is a pointer type to determine correct dereferencing
- Handle AggregateElement expressions correctly (don't dereference struct members)
- This fixes the main aws_hash_table_move memcpy conversion
…ointer cases

- Check the type of &var to determine if it's pointer-to-pointer or pointer-to-struct
- Correctly generate *ptr = struct for pointer-to-pointer cases
- Use original values (dstVal, srcVal) to get expressions for better structure preservation
…efinitions

- Fix pointer-to-pointer handling: for memcpy(&ptr, &other_ptr, size), generate ptr = other_ptr (no dereference)
- Improve intrinsic definition search to find struct types from ExtractValueExpr and RetExpr
- Fix struct type selection to prefer anonymous structs with _Bool overflow flag
…ther_ptr

- For memcpy(&ptr, &other_ptr, size) where both are pointers, generate *ptr = *other_ptr (dereference both)
- Fix source handling to dereference pointer values when needed
- This fixes aws_hash_table_move_harness to produce VERIFICATION SUCCESSFUL
- For memcpy(&struct_var, &(*ptr), size), generate struct_var = (*ptr) not (*struct_var) = (*ptr)
- When destination is &struct_var where struct_var is a struct (not pointer), use struct_var directly
- This fixes aws_hash_table_swap compilation error with --no-slice flag
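
Two of the cases above, written as plain C equivalences. This is a sketch assuming the size argument is `sizeof` the destination object; `struct node` and the function names are hypothetical.

```c
#include <string.h>

struct node { int value; };   /* hypothetical struct */

/* memcpy(&ptr, &other_ptr, sizeof ptr) copies the pointer value itself:
   equivalent to ptr = other_ptr. */
static struct node *copy_pointer(struct node *other_ptr) {
    struct node *ptr;
    memcpy(&ptr, &other_ptr, sizeof ptr);
    return ptr;
}

/* memcpy(&struct_var, &(*ptr), sizeof struct_var) copies the pointee:
   equivalent to struct_var = (*ptr), with no dereference of struct_var. */
static struct node copy_pointee(struct node *ptr) {
    struct node struct_var;
    memcpy(&struct_var, &(*ptr), sizeof struct_var);
    return struct_var;
}
```
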
…ion pointers

When decompiling LLVM IR to C, functions declared in K&R style (no parameter
list) that are used as function pointers with parameters were generating
incorrect C signatures. This caused verification failures in CBMC when the
decompiled code had different function signatures than expected.

Problem:
- Original C: void *allocator()  [K&R style, compatible with void*(*)(void*)]
- LLVM IR: define i8* @Allocator()  [type: i8* ()]
- Used as: bitcast (i8* ()* @Allocator to i8* (i8*)*)
- llvm2c generated: void* allocator(void);  [wrong - no parameters]
- CBMC expected: void* allocator(void*);  [based on function pointer usage]

Solution:
1. Check debug info (DISubprogram) first - most authoritative source
2. Fall back to scanning function pointer usage (bitcasts) to find target types
3. Adjust function declarations to match the maximum parameter count found

The fix scans all bitcast operations where functions are cast to function pointer
types, extracts the target function pointer signature, and adjusts the function
declaration accordingly. This works for any function, any number of parameters,
and all usage sites without hardcoding.

Fixes verification failures where symbiotic->cbmc produced different results
than cbmc alone due to function pointer signature mismatches.
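The selection order in the solution can be sketched as follows. `struct fn_info` and `choose_param_count` are hypothetical, and the sketch assumes the debug-info count is recorded as -1 when the DISubprogram provides no usable parameter list.

```c
/* Hypothetical summary of what is known about one function. */
struct fn_info {
    int debug_param_count;          /* from DISubprogram, or -1 if unusable */
    int declared_param_count;       /* from the LLVM function type */
    const int *cast_param_counts;   /* parameter counts of bitcast targets */
    int num_casts;
};

/* 1. debug info first; 2. otherwise the largest parameter count seen at any
   bitcast to a function pointer type; 3. otherwise the declared type. */
static int choose_param_count(const struct fn_info *f) {
    if (f->debug_param_count >= 0)
        return f->debug_param_count;
    int best = f->declared_param_count;
    for (int i = 0; i < f->num_casts; i++)
        if (f->cast_param_counts[i] > best)
            best = f->cast_param_counts[i];
    return best;
}
```

For the `Allocator` example, the declared count is 0 and the bitcast to `i8* (i8*)*` contributes 1, so the emitted declaration takes one `void*` parameter.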

Labels

None yet

Projects

None yet


2 participants