Skip to content

Commit 2f16977

Browse files
Add function pointer restriction documentation
1 parent 7e4cdd5 commit 2f16977

File tree

2 files changed

+174
-0
lines changed

2 files changed

+174
-0
lines changed

doc/cprover-manual/index.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@
99
[A Short Tutorial](cbmc/tutorial/),
1010
[Loop Unwinding](cbmc/unwinding/),
1111
[Assertion Checking](cbmc/assertions/),
12+
[Restricting function pointers](cbmc/restrict-function-pointer/),
1213
[Memory Analyzer](cbmc/memory-analyzer/),
1314
[Program Harness](cbmc/goto-harness/)
1415

Lines changed: 173 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,173 @@
1+
[CPROVER Manual TOC](../../)
2+
3+
## Restricting function pointers
4+
5+
### Motivation
6+
7+
CBMC comes with a way to resolve calls to function pointers to direct function
8+
calls. This is needed because symbolic execution itself can't handle calls to
9+
function pointers. In practice, this looks something like this:
10+
11+
Given that there are functions with these signatures available in the program:
12+
13+
```
14+
int f(int x);
15+
int g(int x);
16+
int h(int x);
17+
```
18+
19+
And we have a call site like this:
20+
21+
```
22+
typedef int(*fptr_t)(int x);
23+
void call(fptr_t fptr) {
24+
int r = fptr(10);
25+
assert(r > 0);
26+
}
27+
```
28+
29+
Function pointer removal will turn this into code similar to this:
30+
31+
```
32+
void call(fptr_t fptr) {
33+
int r;
34+
if(fptr == &f) {
35+
r = f(10);
36+
} else if(fptr == &g) {
37+
r = g(10);
38+
} else if(fptr == &h) {
39+
r = h(10);
40+
} else {
41+
// sanity check
42+
assert(false);
43+
assume(false);
44+
}
45+
return r;
46+
}
47+
```
48+
49+
This works well enough for simple cases. However, this is a very simple
50+
replacement only based on the signature of the function (and whether or not they
51+
have their address taken somewhere in the program), so if there are many
52+
functions matching a particular signature, or if some of these functions are
53+
expensive in symex (e.g. functions with lots of loops or recursion), then this
54+
can be a bit cumbersome - especially if we, as a user, already know that a
55+
particular function pointer will only resolve to a single function or a small
56+
set of functions. This is what the `--restrict-function-pointer` option allows.
57+
58+
### Example
59+
60+
Take the motivating example. Let us assume that we know for a fact that `call`
61+
will always receive pointers to either `f` or `g` during actual executions of
62+
the program, and symbolic execution for `h` is too expensive to simply ignore
63+
the cost of its branch. For this, we will label the places in each function
64+
where function pointers are being called, to this pattern:
65+
66+
```
67+
<function-name>.function_pointer_call.<N>
68+
```
69+
70+
where `N` is referring to which function call it is - so the first call to a
71+
function pointer in a function will have `N=1`, the 5th `N=5` etc.
72+
73+
We can call `cbmc` with `--restrict-function-pointer
74+
call.function_pointer_call.1/f,g`. This can be read as
75+
76+
> For the first call to a function pointer in the function `call`, assume that
77+
> it can only be a call to `f` or `g`
78+
79+
The resulting code looks similar to the original example, except now there will
80+
not be a call to `h`:
81+
82+
```
83+
void call(fptr_t fptr) {
84+
int r;
85+
if(fptr == &f) {
86+
r = f(10);
87+
} else if(fptr == &g) {
88+
r = g(10);
89+
} else {
90+
// sanity check
91+
assert(false);
92+
assume(false);
93+
}
94+
return r;
95+
}
96+
```
97+
98+
Another example: Imagine we have a simple virtual filesystem API and implementation
99+
like this:
100+
101+
```
102+
typedef struct filesystem_t filesystem_t;
103+
struct filesystem_t {
104+
int (*open)(filesystem_t *filesystem, const char* file_name);
105+
};
106+
107+
int fs_open(filesystem_t *filesystem, const char* file_name) {
108+
filesystem->open(filesystem, file_name);
109+
}
110+
111+
int nullfs_open(filesystem_t *filesystem, const char* file_name) {
112+
return -1;
113+
}
114+
115+
filesystem_t nullfs_val = {.open = nullfs_open};
116+
filesystem *const nullfs = &nullfs_val;
117+
118+
filesystem_t *get_fs_impl() {
119+
// some fancy logic to determine
120+
// which filesystem we're getting -
121+
// in-memory, backed by a database, OS file system
122+
// - but in our case, we know that
123+
// it always ends up being nullfs
124+
// for the cases we care about
125+
return nullfs;
126+
}
127+
int main(void) {
128+
filesystem_t *fs = get_fs_impl();
129+
assert(fs_open(fs, "hello.txt") != -1);
130+
}
131+
```
132+
133+
In this case, the assumption is that *we* know that in our `main`, `fs` can be
134+
nothing other than `nullfs`; But perhaps due to the logic being too complicated
135+
symex ends up being unable to figure this out, so in the call to `fs_open()` we
136+
end up branching on all functions matching the signature of
137+
`filesystem_t::open`, which could be quite a few functions within the program.
138+
Worst of all, if it's address is ever taken in the program, as far as the "dumb"
139+
function pointer removal is concerned it could be `fs_open()` itself due to it
140+
having a matching signature, leading to symex being forced to follow a
141+
potentially infinite recursion until its unwind limit.
142+
143+
In this case we can again restrict the function pointer to the value which we
144+
know it will have:
145+
146+
```
147+
--restrict-function-pointer fs_open.function_pointer_call.1/nullfs_open
148+
```
149+
150+
### Loading from file
151+
152+
If you have many places where you want to restrict function pointers, it'd be a
153+
nuisance to have to specify them all on the command line. In these cases, you
154+
can specify a file to load the restrictions from instead, via the
155+
`--function-pointer-restrictions-file` option, which you can give the name of a
156+
JSON file with this format:
157+
158+
```
159+
{
160+
"function_call_site_name": ["function1", "function2", ...],
161+
...
162+
}
163+
```
164+
165+
**Note:** If you pass in multiple files, or a mix of files and command line
166+
restrictions, the final restrictions will be a set union of all specified
167+
restrictions.
168+
169+
**Note:** as of now, if something goes wrong during type checking (i.e. making
170+
sure that all function pointer replacements refer to functions in the symbol
171+
table that have the correct type), the error message will refer the command line
172+
option `--restrict-function-pointer` regardless of whether the restriction in
173+
question came from the command line or a file.

0 commit comments

Comments
 (0)