ProvScope finds points of divergence and convergence between two executions of the same program under different inputs. Given two function-call traces captured by Intel Pin, it aligns them hierarchically against the program's CFG and reports exactly where control flow first diverges and where it reconverges.
Published at HiPC 2022.
Binary + Intel Pin                    Binary + Intel Pin
    (input A)                             (input B)
        │                                     │
        ▼                                     ▼
Raw function trace                    Raw function trace
(->funcName / <-funcName lines)       (->funcName / <-funcName lines)
        │                                     │
        └──────────────┬──────────────────────┘
                       ▼
          tools/extractFunccalls.py
      (filter to syscall-relevant calls,
       detect non-returning functions)
                       │
                       ▼
        .ftr  (filtered function trace)
        .nr   (non-returning function list)
                       │
                       ▼
               src/provScope -c
          (align traces against CFG,
        compute divergence/convergence)
                       │
                       ▼
        Divergence/convergence report
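To make "divergence" and "convergence" concrete, here is a minimal sketch on two flat call sequences. It is not ProvScope's algorithm: ProvScope aligns traces hierarchically against the CFG, whereas this sketch swaps in Python's standard `difflib.SequenceMatcher` on plain lists. The function name `first_divergence_and_reconvergence` and the sample traces are hypothetical.

```python
from difflib import SequenceMatcher


def first_divergence_and_reconvergence(trace_a, trace_b):
    """Return (divergence, reconvergence) index pairs for two flat call lists.

    divergence:    (i, j) where the traces first differ, or None if identical.
    reconvergence: (i, j) where they next match again, or None if they never do.
    Illustration only; ProvScope itself aligns against the CFG.
    """
    matcher = SequenceMatcher(a=trace_a, b=trace_b, autojunk=False)
    divergence = None
    for tag, i1, _i2, j1, _j2 in matcher.get_opcodes():
        if tag != "equal" and divergence is None:
            divergence = (i1, j1)          # first non-matching block
        elif tag == "equal" and divergence is not None:
            return divergence, (i1, j1)    # first matching block after it
    return divergence, None


# Hypothetical traces: both runs share a prefix and a suffix.
a = ["main", "open", "read", "close", "exit"]
b = ["main", "open", "write", "fsync", "close", "exit"]
result = first_divergence_and_reconvergence(a, b)
```

Here the two runs diverge at index 2 in both traces and reconverge at `close` (index 3 in the first trace, index 4 in the second).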
| Dependency | Purpose |
|---|---|
| Intel Pin | Dynamic binary instrumentation to capture function traces |
| g++ with C++14 | Building provScope |
| OpenSSL (libssl-dev, libcrypto) | Hash computation in the alignment engine |
| jsoncpp (libjsoncpp-dev) | JSON output |
| Python 3 | Preprocessing scripts in tools/ |
| CFG files from LLVManalyses | Per-function control flow graphs (the coreutil_parsed/ data in this repo was generated there) |
Install on Ubuntu:
sudo apt-get install g++ libssl-dev libjsoncpp-dev
# Intel Pin: download from https://www.intel.com/content/www/us/en/developer/articles/tool/pin-a-binary-instrumentation-tool-downloads.html
ProvScope/
├── src/ # Main C++ analysis tool (provScope)
│ ├── provScope.cpp # Entry point and mode dispatch
│ ├── funcTrace.cpp/h # Hierarchical function trace construction
│ ├── cfg.cpp/h # CFG loading and traversal
│ ├── Comparison.cpp/h # Divergence/convergence alignment algorithm
│ ├── Matrix.cpp/h # Edit-distance matrix
│ ├── subgraph.cpp/h # Subgraph matching
│ ├── readFiles.cpp/h # File I/O for traces and CFGs
│ ├── Args.cpp/h # CLI argument parsing
│ ├── ec.cpp/h # Error codes
│ ├── tools.cpp/h # Utilities
│ ├── automate.py # Batch runner for experiments
│ ├── inputs/ # Prepared input files for experiments
│ └── Makefile
├── tools/ # Preprocessing and visualization scripts
│ ├── extractFunccalls.py # Parse raw Pin trace → .ftr + .nr files
│ ├── FuncAnalysis/ # Intel PinTool that captures function traces
│ ├── funcList.py # glibc function list helper
│ ├── convertDot.py # Convert CFG .txt → Graphviz .dot
│ ├── convertDotDirectory.py # Batch convertDot.py over a directory
│ ├── countDiff.py # Count differences in `diff` output
│ ├── cntNumLine.py # Count trace lines from `main` entry
│ ├── reduceLines.py # Reduce trace size
│ ├── CONSTANTS.py # Shared constants and arg parsing
│ └── Makefile
├── coreFuncTraceInput/ # Pre-processed .ftr trace files (coreutils benchmarks)
├── coreutil_parsed/ # Pre-generated CFG files (from LLVManalyses)
├── noRetFuncs/ # Pre-generated .nr files (non-returning functions)
├── funcList.txt # glibc function list (used to filter syscall-relevant calls)
├── glibc.txt # glibc symbol reference
└── noRetFuncList.txt # Aggregate non-returning function list
cd src
make
# Output: src/provScope
First, build the PinTool:
# Copy the FuncAnalysis tool into your Pin installation:
cp -r tools/FuncAnalysis $PIN_ROOT/source/tools/
cd $PIN_ROOT/source/tools/FuncAnalysis
make
Run your binary under Pin to capture the raw trace:
$PIN_ROOT/pin -t $PIN_ROOT/source/tools/FuncAnalysis/obj-intel64/FuncAnalysis.so \
-o trace_inputA.txt -- ./your_binary [inputA]
$PIN_ROOT/pin -t $PIN_ROOT/source/tools/FuncAnalysis/obj-intel64/FuncAnalysis.so \
-o trace_inputB.txt -- ./your_binary [inputB]
The raw trace uses ->funcName for function entry and <-funcName for return.
Filter each trace down to syscall-relevant calls and detect non-returning functions:
cd tools
python3 extractFunccalls.py trace_inputA.txt funcList.txt
# Outputs: trace_inputA.ftr trace_inputA.nr
python3 extractFunccalls.py trace_inputB.txt funcList.txt
# Outputs: trace_inputB.ftr trace_inputB.nr
funcList.txt (at the repo root) lists glibc functions; calls that only reach libc without touching a syscall are pruned.
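The pruning idea can be sketched as follows. This is an illustration, not the logic of extractFunccalls.py: it assumes a well-nested trace already parsed into `("enter" | "exit", name)` tuples, and it assumes a call is kept when its name is in a given set or when one of its descendants is. The function `prune_trace`, the event encoding, and the sample names are all hypothetical.

```python
def prune_trace(events, relevant):
    """Keep only calls that are in `relevant` or that contain such a call.

    events:   list of ("enter" | "exit", name) tuples from a well-nested trace.
    relevant: set of function names treated as the calls of interest.
    """
    out = []    # events kept so far
    stack = []  # one [start_index_in_out, keep_flag] frame per open call
    for kind, name in events:
        if kind == "enter":
            stack.append([len(out), name in relevant])
            out.append((kind, name))
        else:
            start, keep = stack.pop()
            if keep:
                out.append((kind, name))
                if stack:
                    stack[-1][1] = True  # the caller reaches a relevant call
            else:
                del out[start:]          # drop this call; nothing under it was kept
    return out


# Hypothetical example: only the path main -> fopen -> open survives.
events = [
    ("enter", "main"),
    ("enter", "helper"), ("enter", "strlen"), ("exit", "strlen"), ("exit", "helper"),
    ("enter", "fopen"), ("enter", "open"), ("exit", "open"), ("exit", "fopen"),
    ("exit", "main"),
]
pruned = prune_trace(events, relevant={"open"})
```

The single pass with a keep flag per open call avoids building a tree first, which matters for long traces.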
CFG files are produced by the companion LLVManalyses repo. The output is a directory of per-function .txt files. Pre-generated examples for coreutils are in coreutil_parsed/.
./src/provScope -c funcList.txt trace_inputA.nr coreutil_parsed/your_parsed \
trace_inputA.ftr trace_inputB.ftr
Arguments:
| Position | Argument | Description |
|---|---|---|
| 1 | -c | Compare mode |
| 2 | funcList.txt | glibc function list (filters non-syscall calls) |
| 3 | *.nr | Non-returning functions file (one file covers both traces since they share the same binary) |
| 4 | coreutil_parsed/<prog>_parsed | Directory of per-function CFG .txt files |
| 5 | trace1.ftr | First preprocessed function trace |
| 6 | trace2.ftr | Second preprocessed function trace |
Concrete example (using the repo's pre-generated data):
./src/provScope -c funcList.txt noRetFuncs/uniq_all.nr coreutil_parsed/uniq_parsed \
coreFuncTraceInput/uniq/uniqc.ftr coreFuncTraceInput/uniq/uniqd.ftr
Prepare a text file where each line holds the arguments for one comparison:
funcList.txt noRetFuncs/uniq_all.nr coreutil_parsed/uniq_parsed coreFuncTraceInput/uniq/uniqc.ftr coreFuncTraceInput/uniq/uniqd.ftr
Then run:
./src/provScope -f input.txt
| Flag | Argc | Arguments | Description |
|---|---|---|---|
| -c | 7 | funcList noRetFile parsedCFGDir ftr1 ftr2 | Compare two traces (main use case) |
| -p | 6 | funcList noRetFile parsedCFGDir ftr1 | Find all paths in a single trace |
| -t | 6 | funcList noRetFile parsedCFGDir ftr1 | Print trace in hierarchical format |
| -s | 4 | parsedCFGDir outfile | Compute program specification from CFGs |
| -f | 3 | inputFile | Batch mode: read arguments from file |
| -h | 2 | | Print help |
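For batch mode (-f), the input file can be generated rather than written by hand. The sketch below assumes the line layout shown in the batch example (five whitespace-separated arguments per comparison); the helper names `batch_line` and `render_batch_file` are hypothetical and not part of the repo.

```python
def batch_line(func_list, nr_file, cfg_dir, ftr1, ftr2):
    """Format one comparison as a single whitespace-separated line."""
    return " ".join([func_list, nr_file, cfg_dir, ftr1, ftr2])


def render_batch_file(comparisons):
    """Return the full batch-file text, one comparison per line."""
    return "".join(batch_line(*args) + "\n" for args in comparisons)


# The uniq comparison from this README, expressed as one tuple.
comparisons = [
    (
        "funcList.txt",
        "noRetFuncs/uniq_all.nr",
        "coreutil_parsed/uniq_parsed",
        "coreFuncTraceInput/uniq/uniqc.ftr",
        "coreFuncTraceInput/uniq/uniqd.ftr",
    ),
]
text = render_batch_file(comparisons)
```

Write `text` to input.txt and pass that file to `./src/provScope -f`.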
Raw Pin trace (input to extractFunccalls.py):
->main
->set_program_name
<-set_program_name
->getopt_long
<-getopt_long
.ftr — filtered function trace (input to provScope):
main
set_program_name
/set_program_name
getopt_long
/getopt_long
Uses /funcName for returns. Only calls along syscall-reaching paths are kept.
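The difference between the two notations can be shown with a small sketch. It only rewrites `->name` / `<-name` into `name` / `/name` and does no filtering, so it is not a substitute for extractFunccalls.py; the function name `raw_to_ftr_lines` is hypothetical.

```python
def raw_to_ftr_lines(raw_lines):
    """Rewrite raw Pin trace lines into the .ftr entry/return notation.

    '->name' becomes 'name' and '<-name' becomes '/name'.
    Lines with neither prefix are skipped. No syscall filtering is done here.
    """
    out = []
    for line in raw_lines:
        line = line.strip()
        if line.startswith("->"):
            out.append(line[2:])
        elif line.startswith("<-"):
            out.append("/" + line[2:])
    return out


# The raw trace excerpt shown above.
raw = [
    "->main",
    "->set_program_name",
    "<-set_program_name",
    "->getopt_long",
    "<-getopt_long",
]
ftr = raw_to_ftr_lines(raw)
```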
.nr — non-returning functions (one name per line):
strrchr
__ofl_unlock
__stdio_close
Functions that exit via jump rather than ret — needed to reconstruct the call hierarchy correctly.
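One way to picture a stack scan for non-returning functions is sketched below. It is an assumption about the general idea, not the actual detection code in extractFunccalls.py: when a return arrives for a function that is not on top of the call stack, every function above it is taken to have left without a matching return. The function name `find_non_returning` and the sample trace are hypothetical.

```python
def find_non_returning(raw_lines):
    """Best-effort scan for functions entered but never seen returning.

    raw_lines use '->name' for entry and '<-name' for return.
    Functions still open when the trace ends are not reported here.
    """
    stack = []          # currently open calls, innermost last
    non_returning = []  # names in first-seen order, without duplicates
    for line in raw_lines:
        line = line.strip()
        if line.startswith("->"):
            stack.append(line[2:])
        elif line.startswith("<-"):
            name = line[2:]
            if name not in stack:
                continue  # unmatched return; ignored in this sketch
            while stack[-1] != name:
                skipped = stack.pop()  # entered, but its caller returned first
                if skipped not in non_returning:
                    non_returning.append(skipped)
            stack.pop()  # the matching entry for this return
    return non_returning


# Hypothetical trace: __stdio_close is entered but never returns.
trace = ["->main", "->fclose", "->__stdio_close", "<-fclose", "<-main"]
found = find_non_returning(trace)
```

Because the scan relies only on entry and return order, recursion and unmatched returns are the kind of edge cases that may need manual correction of the resulting list.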
Parsed CFG .txt (one file per function, in coreutil_parsed/):
0x24198c0,epoint,0,0,0,na,na,na
Comma-delimited node records. Generated by LLVManalyses.
All benchmark data is pre-included in the repo:
| Experiment | Data |
|---|---|
| Differential locations | coreFuncTraceInput/ + noRetFuncs/ + coreutil_parsed/ — run with -c mode |
| Tracing overhead | tools/FuncAnalysis/ PinTool |
| Reduction in PIN traces | tools/extractFunccalls.py (lines before/after filtering) |
| CFG specification size | LLVManalyses repo |
Benchmarks: cat, chown, date, sort, uniq, b2sum, bzip2, bwa, mcf, minimap2.
- Requires Intel Pin for trace capture (proprietary, must be downloaded separately)
- CFG files must be pre-generated by the LLVManalyses repo
- Non-returning function detection is a best-effort stack scan; edge cases may require manual .nr correction
- Collective calls / inlined functions may require special handling in extractFunccalls.py