This repository contains all the code necessary for building MSWasm and reproducing the results presented in our paper.
Most programs compiled to WebAssembly (Wasm) today are written in unsafe languages like C and C++. Unfortunately, memory-unsafe C code remains unsafe when compiled to Wasm, and attackers can exploit buffer overflows and use-after-frees in Wasm almost as easily as they can on native platforms. Memory-Safe WebAssembly (MSWasm) proposes to extend Wasm with language-level memory-safety abstractions to precisely address this problem. In this paper, we build on the original MSWasm position paper to realize this vision. We give a precise and formal semantics of MSWasm, and prove that well-typed MSWasm programs are, by construction, robustly memory safe. To this end, we develop a novel, language-independent memory-safety property based on colored memory locations and pointers. This property also lets us reason about the security guarantees of a formal C-to-MSWasm compiler, and prove that it always produces memory-safe programs (and preserves the semantics of safe programs). We use these formal results to then guide several implementations: two compilers of MSWasm to native code, and a C-to-MSWasm compiler (that extends Clang). Our MSWasm compilers support different enforcement mechanisms, allowing developers to make security-performance trade-offs according to their needs. Our evaluation shows that on the PolyBenchC suite, the overhead of enforcing memory safety in software ranges from 22% (enforcing spatial safety alone) to 198% (enforcing full memory safety), and 51.7% when using hardware memory capabilities for spatial safety and pointer integrity.
More importantly, MSWasm’s design makes it easy to swap between enforcement mechanisms; as fast (especially hardware-based) enforcement techniques become available, MSWasm will be able to take advantage of these advances almost for free.
Each name links to its GitHub repo.
- rWasm: source code for our AOT compiler from MSWasm bytecode. Consists of rWasm with modifications to support compiling from MSWasm bytecode. This is available on the `PATH` of the Docker container: running `rWasm -w --ms-wasm <path/to/mswasm/file>` will create a folder named `generated`. The `--ms-wasm-packed-tags`, `--ms-wasm-no-tags`, and `--ms-wasm-baggy-bounds` flags can optionally be added to change the runtime enforcement used. Run `rwasm --help` for more information and additional options. `cd`ing into the `generated` folder and running `cargo run --release` will run the program. To compile to Cheri-C instead, use the `mswasm-cheri` branch and add the `--cheri` flag.
- mswasm-graal: source code for our JIT compiler from MSWasm bytecode. Consists of GraalVM with modifications to support MSWasm bytecode. `mswasm-graal` is available on the `PATH` of the Docker container: running `mswasm-graal --Builtins=wasi_snapshot_preview1 <path/to/mswasm/file>` will run the file. There is also a version of Graal’s implementation of vanilla Wasm on the path: running `wasm-graal --Builtins=wasi_snapshot_preview1 <path/to/wasm/file>` will run the Wasm program.
- mswasm-llvm: source code for our compiler from C to MSWasm bytecode. Consists of a fork of LLVM (specifically, the CHERI fork of LLVM) with modifications to produce MSWasm bytecode. Running `/home/mswasm-llvm/llvm/build/bin/clang --target=wasm32-wasi --sysroot="/home/mswasm-wasi-libc/sysroot" <path/to/c/file>` will generate an MSWasm program from a basic C program. Due to current limitations of the MSWasm prototypes, clang does not correctly compile arbitrary programs. In particular, expect errors on most programs that use `stdout`.
- mswasm-wasi-libc: source code for supporting compilation of C executables to MSWasm bytecode. Consists of WASI-libc with modifications to support MSWasm bytecode.
- mswasm-polybench: PolybenchC benchmarks, compiled to native x86-64 code, Wasm bytecode, and MSWasm bytecode, along with scripts to perform benchmarking. Inside the `benchmark-binaries` folders, there are
  - `.mswasm` MS-WebAssembly binary files to be run by `rWasm` and `mswasm-graal`
  - `.mswat` MS-WebAssembly text files to be read by humans
  - `.native` binaries to run as native C code
  - `.wasm` WebAssembly binaries to be run by `rWasm` and `wasm-graal`
  - `.wat` WebAssembly text files to be read by humans
- mswasm-wabt: source code for `mswasm2wat`, a utility for converting MSWasm files into a readable text format. Consists of a fork of WABT partially modified to work on MSWasm bytecode. `mswasm2wat` is available on the `PATH` and takes an MSWasm binary file as input. For more information on the Wasm text format, see [this guide](https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format).
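When comparing enforcement mechanisms, it can help to generate the rWasm command lines rather than type them by hand. The following is a minimal sketch, not part of the repository: the flag names come from the rWasm entry above, while the mode keys (`packed-tags`, `no-tags`, `baggy-bounds`), the helper name `rwasm_command`, and the position of the optional flag before the input path are assumptions made for illustration.

```python
import shlex

# Optional enforcement flags named in the rWasm entry; None means "add nothing".
# The dictionary keys are hypothetical labels chosen for this sketch.
ENFORCEMENT_FLAGS = {
    "default": None,
    "packed-tags": "--ms-wasm-packed-tags",
    "no-tags": "--ms-wasm-no-tags",
    "baggy-bounds": "--ms-wasm-baggy-bounds",
}


def rwasm_command(mswasm_file, enforcement="default"):
    """Return the argument list for one rWasm invocation."""
    if enforcement not in ENFORCEMENT_FLAGS:
        raise ValueError(f"unknown enforcement mode: {enforcement}")
    args = ["rWasm", "-w", "--ms-wasm"]
    flag = ENFORCEMENT_FLAGS[enforcement]
    if flag is not None:
        # Assumption: the optional flag is placed before the input path.
        args.append(flag)
    args.append(mswasm_file)
    return args


if __name__ == "__main__":
    # Example input path, used only for illustration.
    for mode in ENFORCEMENT_FLAGS:
        print(shlex.join(rwasm_command("benchmark-binaries/2mm.mswasm", mode)))
```

The returned list can be passed to `subprocess.run` inside the container; the sketch itself only prints the commands and does not invoke rWasm.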
We provide a Docker container to create an environment to run these benchmarks at ghcr.io/plsyssec/mswasm. To install, install Docker and run
docker pull ghcr.io/plsyssec/mswasm:latest
This will download the container. The container is around 80GB. Once the container is downloaded, you can run it with a command such as
docker run -it ghcr.io/plsyssec/mswasm:latest
You may also use Docker Desktop if desired.
We provide a Dockerfile to create an environment to run these benchmarks. To install, install Docker and, in the same directory as the Dockerfile, run
docker build -t mswasm .
This will build the container. The build takes a while (around 45 minutes on our host machine) and requires internet access to clone git repositories. It needs more than 32GB of RAM due to compilation, and the final image will be around 80GB. Once the container is built, you can run it with a command such as
docker run -it mswasm:latest
You may also use Docker Desktop if desired.
To run the benchmark suite from the paper, use the original benchify.toml file
with the warmup, min_runs, and max_runs values uncommented. The original file can be
found in the mswasm-polybench repo. Running the entire benchmark suite will
take a substantial amount of time, so we advise lowering the min_runs and
max_runs values for a quicker run.
It can be run with the same instructions as before: `cd mswasm-polybench`, then
`benchify benchify.toml`. Output is written to stdout and to the
benchify-results folder.
To choose which tests to run, you can modify the benchify.toml file.
The min_runs and max_runs numbers will determine the number of times
a benchmark will be run. Lowering these values to 3 will provide a quicker
benchmark run, but the results will have more noise.
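The trade-off between run count and noise can be illustrated with the standard error of the mean, which shrinks roughly with the square root of the number of runs. This is a generic statistical sketch, assuming the reported figure is a mean over runs; the timings are made up for illustration and are not measurements from the paper or from benchify.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


if __name__ == "__main__":
    # Made-up timings in seconds, for illustration only (not measured data).
    ten_runs = [1.02, 0.98, 1.05, 0.97, 1.01, 1.03, 0.99, 1.00, 1.04, 0.96]
    three_runs = ten_runs[:3]
    for label, runs in (("3 runs", three_runs), ("10 runs", ten_runs)):
        mean = statistics.mean(runs)
        err = standard_error(runs)
        print(f"{label}: mean = {mean:.3f} s, standard error = {err:.4f} s")
```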
You can skip tests by removing them or commenting them out in benchify.toml.
For instance, to run only the 2mm benchmark, benchify.toml would
look like
[[tests]]
name = "2mm"
tag = "wasi"
file = "benchmark-binaries/2mm.mswasm"
stdout_is_timing = true
# [[tests]]
# name = "3mm"
# tag = "wasi"
# file = "benchmark-binaries/3mm.mswasm"
# stdout_is_timing = true
#
# [[tests]]
# name = "adi"
# tag = "wasi"
# file = "benchmark-binaries/adi.mswasm"
# stdout_is_timing = true
#
# [[tests]]
# name = "atax"
# tag = "wasi"
# file = "benchmark-binaries/atax.mswasm"
# stdout_is_timing = true
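Commenting blocks in and out by hand gets tedious with many benchmarks, so the selection can be scripted. The sketch below is a hypothetical helper, not part of the repository. It assumes each test is a `[[tests]]` block with a `name = "..."` line, as in the example above, and that test blocks contain only key/value lines (a free-form comment inside a block would be uncommented too). All other sections of the file are left untouched.

```python
import re

ANY_HEADER = re.compile(r"^\s*(?:#\s*)?\[")                # table header, commented or not
TESTS_HEADER = re.compile(r"^\s*(?:#\s*)?\[\[tests\]\]\s*$")
NAME = re.compile(r'^\s*(?:#\s*)?name\s*=\s*"([^"]*)"')
LEADING_HASH = re.compile(r"^\s*#\s?")


def _enable(line):
    """Strip one leading '#' (and one following space), if present."""
    return LEADING_HASH.sub("", line, count=1)


def _disable(line):
    """Comment a line out, leaving already-commented lines unchanged."""
    if line.lstrip().startswith("#"):
        return line
    return "# " + line if line.strip() else "#"


def select_tests(toml_text, keep):
    """Enable the [[tests]] blocks named in `keep`; comment out the rest."""
    out, block = [], None  # `block` collects lines while inside a tests block

    def flush():
        names = [m.group(1) for m in map(NAME.match, block) if m]
        fix = _enable if names and names[0] in keep else _disable
        out.extend(fix(line) for line in block)

    for line in toml_text.splitlines():
        if ANY_HEADER.match(line):
            if block is not None:
                flush()
            block = [line] if TESTS_HEADER.match(line) else None
            if block is None:
                out.append(line)  # header of some other section
        elif block is not None:
            block.append(line)
        else:
            out.append(line)
    if block is not None:
        flush()
    return "\n".join(out)
```

For example, `select_tests(open("benchify.toml").read(), {"2mm", "atax"})` returns the file text with only those two tests active; writing the result back to disk is left to the caller so the original file can be kept as a backup.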
To generate the graphs used in the paper, you can use the
generate-graphs.py script. If you wish to run
the script from the container, run git pull from the mswasm-polybench repo
to pull the script, and then install the needed dependencies with
apt-get update && apt-get install python3-matplotlib python3-pandas python3-seaborn
The script can then be run as
python3 <path to generate-graphs.py> <path to benchify csv data in benchify-results>
The script will generate 5 graphs as PDFs in the current working directory.
You can extract these graphs from the container with docker cp, such as
docker container ls # find the name of your container
docker cp <container name>:<path to pdf> <path on local machine to copy to>
Alternatively, the Python script only requires the benchify CSV data, so
it does not need to be run in the container. docker cp can be used to
retrieve the benchify CSV data, and the script can then be run on
a local machine that has python3, matplotlib, pandas, and seaborn
installed.
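Before passing copied data to the graphing script, it can be useful to confirm that the files arrived intact. The following is a minimal sketch using only the Python standard library. It assumes the benchify data files end in `.csv` and begin with a header row; it makes no assumption about the column names, and the helper names are hypothetical.

```python
import csv
from pathlib import Path


def list_csv_files(results_dir):
    """Return the CSV files directly inside results_dir, sorted by name."""
    return sorted(Path(results_dir).glob("*.csv"))


def peek_csv(path):
    """Return (header, number_of_data_rows) without assuming column names."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        return header, sum(1 for _ in reader)


if __name__ == "__main__":
    # Folder name taken from the text above; adjust to where the data was copied.
    for csv_path in list_csv_files("benchify-results"):
        header, rows = peek_csv(csv_path)
        print(f"{csv_path}: {rows} data rows, columns = {header}")
```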