ProvScope

ProvScope finds points of divergence and convergence between two executions of the same program on different inputs. Given two function-call traces captured by Intel Pin, it aligns them hierarchically against the program's control-flow graph (CFG) and reports exactly where control flow first diverges and where it reconverges.

Published at HiPC 2022.

How It Works

Binary + Intel Pin                   Binary + Intel Pin
(input A)                            (input B)
     │                                    │
     ▼                                    ▼
Raw function trace                  Raw function trace
(->funcName / <-funcName lines)     (->funcName / <-funcName lines)
     │                                    │
     └──────────────┬─────────────────────┘
                    ▼
        tools/extractFunccalls.py
        (filter to syscall-relevant calls,
         detect non-returning functions)
                    │
                    ▼
          .ftr  (filtered function trace)
          .nr   (non-returning function list)
                    │
                    ▼
              src/provScope -c
        (align traces against CFG,
         compute divergence/convergence)
                    │
                    ▼
          Divergence/convergence report
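
To make "divergence" and "convergence" concrete, here is a minimal sketch on two flat call sequences. It uses Python's standard `difflib.SequenceMatcher` as a stand-in and ignores both the call hierarchy and the CFG, so it is not ProvScope's alignment algorithm. The function name `divergence_regions` and the sample traces are hypothetical.

```python
from difflib import SequenceMatcher

def divergence_regions(trace_a, trace_b):
    """Return (a_start, a_end, b_start, b_end) for every span where the traces differ.

    a_start/b_start mark where the two sequences diverge,
    a_end/b_end mark where they line up again (reconverge).
    """
    matcher = SequenceMatcher(a=trace_a, b=trace_b, autojunk=False)
    regions = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            regions.append((a_start, a_end, b_start, b_end))
    return regions

# Hypothetical traces, written in the .ftr style (name = call, /name = return).
trace_a = ["main", "open", "read", "close", "/main"]
trace_b = ["main", "open", "write", "fsync", "close", "/main"]

for a_start, a_end, b_start, b_end in divergence_regions(trace_a, trace_b):
    print(f"A[{a_start}:{a_end}] = {trace_a[a_start:a_end]}")
    print(f"B[{b_start}:{b_end}] = {trace_b[b_start:b_end]}")
```

A flat diff like this can mis-align repeated calls (for example, loops), which is one reason to align hierarchically against the CFG instead.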

Dependencies

Dependency	Purpose
Intel Pin	Dynamic binary instrumentation to capture function traces
g++ with C++14	Building `provScope`
OpenSSL (`libssl-dev`, `libcrypto`)	Hash computation in the alignment engine
jsoncpp (`libjsoncpp-dev`)	JSON output
Python 3	Preprocessing scripts in `tools/`
CFG files from LLVManalyses	Per-function control flow graphs (the `coreutil_parsed/` data in this repo was generated there)

Install on Ubuntu:

sudo apt-get install g++ libssl-dev libjsoncpp-dev
# Intel Pin: download from https://www.intel.com/content/www/us/en/developer/articles/tool/pin-a-binary-instrumentation-tool-downloads.html

Repository Layout

ProvScope/
├── src/                        # Main C++ analysis tool (provScope)
│   ├── provScope.cpp           # Entry point and mode dispatch
│   ├── funcTrace.cpp/h         # Hierarchical function trace construction
│   ├── cfg.cpp/h               # CFG loading and traversal
│   ├── Comparison.cpp/h        # Divergence/convergence alignment algorithm
│   ├── Matrix.cpp/h            # Edit-distance matrix
│   ├── subgraph.cpp/h          # Subgraph matching
│   ├── readFiles.cpp/h         # File I/O for traces and CFGs
│   ├── Args.cpp/h              # CLI argument parsing
│   ├── ec.cpp/h                # Error codes
│   ├── tools.cpp/h             # Utilities
│   ├── automate.py             # Batch runner for experiments
│   ├── inputs/                 # Prepared input files for experiments
│   └── Makefile
├── tools/                      # Preprocessing and visualization scripts
│   ├── extractFunccalls.py     # Parse raw Pin trace → .ftr + .nr files
│   ├── FuncAnalysis/           # Intel PinTool that captures function traces
│   ├── funcList.py             # glibc function list helper
│   ├── convertDot.py           # Convert CFG .txt → Graphviz .dot
│   ├── convertDotDirectory.py  # Batch convertDot.py over a directory
│   ├── countDiff.py            # Count differences in `diff` output
│   ├── cntNumLine.py           # Count trace lines from `main` entry
│   ├── reduceLines.py          # Reduce trace size
│   ├── CONSTANTS.py            # Shared constants and arg parsing
│   └── Makefile
├── coreFuncTraceInput/         # Pre-processed .ftr trace files (coreutils benchmarks)
├── coreutil_parsed/            # Pre-generated CFG files (from LLVManalyses)
├── noRetFuncs/                 # Pre-generated .nr files (non-returning functions)
├── funcList.txt                # glibc function list (used to filter syscall-relevant calls)
├── glibc.txt                   # glibc symbol reference
└── noRetFuncList.txt           # Aggregate non-returning function list

Building

cd src
make
# Output: src/provScope

Usage

Step 0 — Capture Function Traces with Intel Pin

First, build the PinTool:

# Copy the FuncAnalysis tool into your Pin installation:
cp -r tools/FuncAnalysis $PIN_ROOT/source/tools/
cd $PIN_ROOT/source/tools/FuncAnalysis
make

Run your binary under Pin to capture the raw trace:

$PIN_ROOT/pin -t $PIN_ROOT/source/tools/FuncAnalysis/obj-intel64/FuncAnalysis.so \
    -o trace_inputA.txt -- ./your_binary [inputA]

$PIN_ROOT/pin -t $PIN_ROOT/source/tools/FuncAnalysis/obj-intel64/FuncAnalysis.so \
    -o trace_inputB.txt -- ./your_binary [inputB]

The raw trace uses ->funcName for function entry and <-funcName for return.
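
As a quick sanity check on a captured trace, the entry and return markers can be turned into an indented call tree. The sketch below assumes one marker per line, exactly as described above; `indent_trace` is a hypothetical helper, not part of the repo.

```python
def indent_trace(lines):
    """Render '->name' / '<-name' lines as an indented list of calls."""
    depth = 0
    rendered = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("->"):
            # Function entry: print at the current depth, then go one level deeper.
            rendered.append("  " * depth + line[2:])
            depth += 1
        elif line.startswith("<-"):
            # Function return: go back up, never below the top level.
            depth = max(depth - 1, 0)
    return rendered

sample = [
    "->main",
    "->set_program_name",
    "<-set_program_name",
    "->getopt_long",
    "<-getopt_long",
]
print("\n".join(indent_trace(sample)))
```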

Step 1 — Preprocess Traces

Filter each trace down to syscall-relevant calls and detect non-returning functions:

cd tools
python3 extractFunccalls.py trace_inputA.txt funcList.txt
# Outputs: trace_inputA.ftr  trace_inputA.nr

python3 extractFunccalls.py trace_inputB.txt funcList.txt
# Outputs: trace_inputB.ftr  trace_inputB.nr

funcList.txt lists glibc functions and lives at the repo root. If the script cannot find it when run from tools/, pass ../funcList.txt instead. Calls that only reach libc without touching a syscall are pruned.

Step 2 — Generate CFGs (via LLVManalyses)

CFG files are produced by the companion LLVManalyses repo. The output is a directory of per-function .txt files. Pre-generated examples for coreutils are in coreutil_parsed/.

Step 3 — Find Divergence/Convergence

./src/provScope -c funcList.txt trace_inputA.nr coreutil_parsed/your_parsed \
    trace_inputA.ftr trace_inputB.ftr

Arguments:

Position	Argument	Description
1	`-c`	Compare mode
2	`funcList.txt`	glibc function list (filters non-syscall calls)
3	`*.nr`	Non-returning functions file (one file covers both traces since they share the same binary)
4	`coreutil_parsed/<prog>_parsed`	Directory of per-function CFG `.txt` files
5	`trace1.ftr`	First preprocessed function trace
6	`trace2.ftr`	Second preprocessed function trace

Concrete example (using the repo's pre-generated data, run from the repo root):

./src/provScope -c funcList.txt noRetFuncs/uniq_all.nr coreutil_parsed/uniq_parsed \
    coreFuncTraceInput/uniq/uniqc.ftr coreFuncTraceInput/uniq/uniqd.ftr

Batch Mode

Prepare a text file where each line holds the arguments for one comparison:

funcList.txt noRetFuncs/uniq_all.nr coreutil_parsed/uniq_parsed coreFuncTraceInput/uniq/uniqc.ftr coreFuncTraceInput/uniq/uniqd.ftr

Then run:

./src/provScope -f input.txt
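
When many comparisons are needed, the batch file can be generated rather than written by hand. This sketch assumes the line format shown in the example above (five space-separated arguments, no flag); `batch_line` and `batch_text` are hypothetical helpers.

```python
def batch_line(func_list, nr_file, cfg_dir, ftr1, ftr2):
    """Format the five compare-mode arguments as one batch-file line."""
    return " ".join([func_list, nr_file, cfg_dir, ftr1, ftr2])

def batch_text(comparisons):
    """Join one line per comparison, each terminated by a newline."""
    return "".join(batch_line(*comparison) + "\n" for comparison in comparisons)

comparisons = [
    (
        "funcList.txt",
        "noRetFuncs/uniq_all.nr",
        "coreutil_parsed/uniq_parsed",
        "coreFuncTraceInput/uniq/uniqc.ftr",
        "coreFuncTraceInput/uniq/uniqd.ftr",
    ),
]

# Print the batch file contents; redirect to input.txt to use it with -f.
print(batch_text(comparisons), end="")
```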

All Modes

Flag	Argc (incl. program name and flag)	Arguments	Description
`-c`	7	`funcList noRetFile parsedCFGDir ftr1 ftr2`	Compare two traces (main use case)
`-p`	6	`funcList noRetFile parsedCFGDir ftr1`	Find all paths in a single trace
`-t`	6	`funcList noRetFile parsedCFGDir ftr1`	Print trace in hierarchical format
`-s`	4	`parsedCFGDir outfile`	Compute program specification from CFGs
`-f`	3	`inputFile`	Batch mode: read arguments from file
`-h`	2		Print help
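
For scripted runs, the argument counts in the table can be checked before launching the tool. The sketch below only builds the argument vector; `MODE_ARGS` and `build_command` are hypothetical names, and the counts are taken from the table (Argc minus the program name and the flag).

```python
# Number of arguments that follow each flag, derived from the table above.
MODE_ARGS = {"-c": 5, "-p": 4, "-t": 4, "-s": 2, "-f": 1, "-h": 0}

def build_command(binary, flag, *args):
    """Return an argv list for provScope, or raise if the count is wrong."""
    if flag not in MODE_ARGS:
        raise ValueError(f"unknown mode flag: {flag}")
    expected = MODE_ARGS[flag]
    if len(args) != expected:
        raise ValueError(f"{flag} expects {expected} arguments, got {len(args)}")
    return [binary, flag, *args]

argv = build_command(
    "./src/provScope", "-c",
    "funcList.txt",
    "noRetFuncs/uniq_all.nr",
    "coreutil_parsed/uniq_parsed",
    "coreFuncTraceInput/uniq/uniqc.ftr",
    "coreFuncTraceInput/uniq/uniqd.ftr",
)
print(len(argv), argv)
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the repo root.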

File Formats

Raw Pin trace (input to extractFunccalls.py):

->main
->set_program_name
<-set_program_name
->getopt_long
<-getopt_long

.ftr — filtered function trace (input to provScope):

main
set_program_name
/set_program_name
getopt_long
/getopt_long

A bare funcName marks a call and /funcName marks the matching return. Only calls along syscall-reaching paths are kept.
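
The relationship between the two notations can be sketched as a line-by-line rewrite. This covers only the change of markers; it does not perform the syscall-relevance filtering that extractFunccalls.py applies, and `raw_to_ftr_notation` is a hypothetical helper.

```python
def raw_to_ftr_notation(raw_lines):
    """Rewrite '->name' as 'name' and '<-name' as '/name' (no filtering)."""
    converted = []
    for raw in raw_lines:
        line = raw.strip()
        if line.startswith("->"):
            converted.append(line[2:])
        elif line.startswith("<-"):
            converted.append("/" + line[2:])
        # Lines without either marker are skipped.
    return converted

raw = [
    "->main",
    "->set_program_name",
    "<-set_program_name",
    "->getopt_long",
    "<-getopt_long",
]
print("\n".join(raw_to_ftr_notation(raw)))
```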

.nr — non-returning functions (one name per line):

strrchr
__ofl_unlock
__stdio_close

These are functions that exit via a jump rather than ret. The list is needed to reconstruct the call hierarchy correctly.
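
One way such functions could be spotted in a raw Pin trace is a stack scan: when a return arrives for a function that is not on top of the call stack, every frame above it was entered but never returned. The sketch below illustrates that idea under this assumption; it is not the repo's detection code, and `find_non_returning` is a hypothetical name.

```python
def find_non_returning(raw_lines):
    """Return names that were entered ('->') but skipped over by a later return ('<-')."""
    stack = []
    non_returning = []
    for raw in raw_lines:
        line = raw.strip()
        if line.startswith("->"):
            stack.append(line[2:])
        elif line.startswith("<-"):
            name = line[2:]
            if name not in stack:
                continue  # return without a recorded entry: ignore it
            # Frames above the returning function never produced a return.
            while stack[-1] != name:
                skipped = stack.pop()
                if skipped not in non_returning:
                    non_returning.append(skipped)
            stack.pop()
    # Frames still on the stack at the end (e.g. after exit) are not reported.
    return non_returning

# Hypothetical trace: f jumps into g, so only g produces a return.
trace = ["->main", "->f", "->g", "<-g", "<-main"]
print(find_non_returning(trace))
```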

Parsed CFG .txt (one file per function, in coreutil_parsed/):

0x24198c0,epoint,0,0,0,na,na,na

Comma-delimited node records. Generated by LLVManalyses.
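
A record like the one above can be split into fields as follows. The meaning of the individual fields is not documented here, so the sketch keeps them as raw strings; treating the first field as a hexadecimal node identifier is an assumption based on its `0x` prefix, and `parse_cfg_record` is a hypothetical helper.

```python
def parse_cfg_record(line):
    """Split one comma-delimited CFG record into a node id and its raw fields."""
    fields = line.strip().split(",")
    first = fields[0]
    # Assumption: a leading '0x' field is a hexadecimal node identifier.
    node_id = int(first, 16) if first.startswith("0x") else None
    return {"node_id": node_id, "fields": fields}

record = parse_cfg_record("0x24198c0,epoint,0,0,0,na,na,na")
print(hex(record["node_id"]), record["fields"][1:])
```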

Experiments (HiPC 2022)

All benchmark data is included in the repo:

Experiment	Data
Differential locations	`coreFuncTraceInput/` + `noRetFuncs/` + `coreutil_parsed/` — run with `-c` mode
Tracing overhead	`tools/FuncAnalysis/` PinTool
Reduction in PIN traces	`tools/extractFunccalls.py` (lines before/after filtering)
CFG specification size	LLVManalyses repo

Benchmarks: cat, chown, date, sort, uniq, b2sum, bzip2, bwa, mcf, minimap2.

Known Limitations

Requires Intel Pin for trace capture (proprietary, must be downloaded separately)
CFG files must be pre-generated by the LLVManalyses repo
Non-returning function detection is a best-effort stack scan; edge cases may require manual .nr correction
Collective calls / inlined functions may require special handling in extractFunccalls.py
