Differential Testing for WASM

Setting up your environment

Run the following command to set up the environment and install all the dependencies listed in /code/requirements.txt.

./code/setup_env.sh

Set the OPENAI_API_KEY environment variable to your OpenAI API key.

export OPENAI_API_KEY="<ENTER OPENAI API KEY>"
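
As a quick sanity check before running any of the scripts, you can confirm the key is visible to child processes. This is only a minimal sketch, assuming a POSIX shell:

# Prints a warning if the variable is empty or unset
test -n "$OPENAI_API_KEY" && echo "OPENAI_API_KEY is set" || echo "OPENAI_API_KEY is missing"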

1. Extract Relevant Context

All the scripts to extract relevant context can be found under the /code/extract_context/ directory.

1.1. Extract list of instructions and map instructions to relevant human-written tests

Extract the list of instructions from the specification document and generate a mapping between instructions and human-written test files from the Wasm spec repo.

python3 main_extract_instruction_test_map.py

1.2. Extract constraints

Extract all the relevant constraints for each instruction from the WASM specification document.

python3 main_extract_constraints.py

1.3. Extract relevant context from code

For each implementation, extract the relevant code snippet from its source files for every instruction. Then generate a list of differences (described in natural language) between the snippets extracted from the different implementations.

python3 main_extract_code_diffs.py

1.4. Preprocess human-written tests

The human-written test files each contain several test cases for a given instruction. We preprocess these files to keep only the test cases relevant to validation and split them so that each test case gets its own file.

python3 main_separate_human_tests.py
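
The four extraction steps above can also be run back to back. The following is only a sketch, assuming each script is invoked from the /code/extract_context/ directory it lives in and that the steps are run in the order listed above:

# Run all context-extraction steps in sequence
pushd ./code/extract_context
python3 main_extract_instruction_test_map.py
python3 main_extract_constraints.py
python3 main_extract_code_diffs.py
python3 main_separate_human_tests.py
popd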

2. Test Generation

The test generation framework includes two main steps. The scripts can be found in /code/test_generation/

2.1 Generate test descriptions

First, we use all the extracted information as context to prompt the LLM to generate descriptions of what each test should do.

python3 main_test_decription_generation.py

2.2 Generate tests from descriptions

We then use the test descriptions, along with a few randomly sampled human-written tests as few-shot examples and a list of human-written guidelines, to generate the tests themselves.

python3 main_test_generation.py
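
Both generation steps can likewise be chained. A sketch, assuming the scripts are invoked from the /code/test_generation/ directory and that description generation completes before test generation starts:

pushd ./code/test_generation
python3 main_test_decription_generation.py
python3 main_test_generation.py
popd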

3. Utils

The data loaders and other data specific utilities are in /code/utils/data_utils.py

The model specific utilities are in /code/utils/model_utils.py

All the prompts are in /code/utils/prompt_utils.py

4. Evaluation

All the evaluation scripts are under /code/evaluation/

4.1. Setup instructions for each WASM implementation

We set up the four WASM implementations by following the guidelines in their respective repositories.

After setup, the following commands should run successfully:

# Should run the Wasm Spec implementation (may need to add an alias)
wasm-spec -v
# Should run the Wizard VM CLI
wizeng --version
# Should run the wasmtime CLI
wasmtime --version
# Should run the V8 CLI
d8 -version
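
A small loop can verify that all four CLIs are reachable before starting the evaluation. This is only a sketch; note that command -v will not find wasm-spec if it is defined as a shell alias rather than a binary on your PATH:

# Report any of the four CLIs that cannot be found
for cli in wasm-spec wizeng wasmtime d8; do
  command -v "$cli" >/dev/null 2>&1 || echo "missing: $cli"
done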

4.2. Remove assert strings from generated tests

We remove the assertion strings that the LLM generated as expected output, since the error messages are not consistent between implementations. Removing them reduces differential comparison to pass/fail outcomes rather than error-message differences.

# Should place new `.wast` files, stripped of assertion strings, in `data/all_generated_tests/no_assert_str_wast_tests`
pushd code/evaluation/ && python3 remove_assert_str.py && popd

4.3. Convert .wast test files into suitable format for execution

We convert the generated .wast test files into a format suitable for execution on each implementation.

The wizard-engine, wasmtime and WASM spec use the .bin.wast format. To translate the .wast files to .bin.wast, run the following script:

# Should place generated `.bin.wast` files in `data/all_generated_tests/BIN_no_assert_str_wast_tests`
pushd ./code/evaluation/format_translation && ./to_binary.sh && popd

The v8 engine uses .js format. To translate the .wast files to .js, run the following script:

# Should place generated `.js` files in `data/all_generated_tests/JS_no_assert_str_wast_tests`
pushd ./code/evaluation/format_translation && ./to_js.sh && popd
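
As a rough sanity check after the conversions, the file counts in the output directories should line up with the number of stripped .wast tests. A sketch, assuming the paths above are relative to the repository root:

ls data/all_generated_tests/no_assert_str_wast_tests | wc -l
ls data/all_generated_tests/BIN_no_assert_str_wast_tests | wc -l
ls data/all_generated_tests/JS_no_assert_str_wast_tests | wc -l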

4.4. Run tests

NOTE: This will overwrite the logs from the most recent run. To preserve them, move the contents of code/evaluation/output/* to another location first.
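
One way to preserve the previous logs is to move them into a timestamped directory before starting a new run. A sketch, run from the repository root; the backup directory name is only a suggestion:

# Archive the previous run's logs before they get overwritten
backup_dir="code/evaluation/output_backup_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$backup_dir" && mv code/evaluation/output/* "$backup_dir"/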

We then run the tests on all the different implementations under test and compare them against the WASM spec implementation.

For the wizard-engine:

pushd ./code/evaluation/test_runners && ./spec_vs_wizard.sh 2>&1 | tee ../output/logs_wizard_eng__wasm_spec.out && popd

For wasmtime:

pushd ./code/evaluation/test_runners && ./spec_vs_wasmtime.sh 2>&1 | tee ../output/logs_wasmtime__wasm_spec.out && popd

For v8:

pushd ./code/evaluation/test_runners && ./spec_vs_v8.sh 2>&1 | tee ../output/logs_v8__wasm_spec.out && popd
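
To run all three comparisons back to back, the same invocations can simply be chained in sequence:

pushd ./code/evaluation/test_runners
./spec_vs_wizard.sh   2>&1 | tee ../output/logs_wizard_eng__wasm_spec.out
./spec_vs_wasmtime.sh 2>&1 | tee ../output/logs_wasmtime__wasm_spec.out
./spec_vs_v8.sh       2>&1 | tee ../output/logs_v8__wasm_spec.out
popd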

4.5. Get differentiating tests

We take the logs from running the tests for each implementation, compare the execution output against the WASM spec implementation, and extract all the tests where the outcomes of the two implementations differ. This produces a .json file containing all the information relevant to the differentiating tests, including the test file metadata, the test content, and the execution outcomes of the WASM spec and the implementation under test, which aids in analyzing tests that highlight differentiating behaviour.

To get the differential tests for a given implementation, make sure the right one is targeted at the top of get_diff_tests.py through the test_implementation variable, as shown below:

test_implementation = 'wizard_eng'
#test_implementation = 'v8'
#test_implementation = 'wasmtime'
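
If you prefer not to edit the file by hand, a hypothetical one-liner like the following can switch the target, assuming the uncommented assignment appears near the top of code/evaluation/get_diff_tests.py exactly in the form shown above (on macOS, use sed -i '' instead of sed -i):

# Retarget the script at wasmtime (or v8 / wizard_eng)
sed -i "s/^test_implementation = .*/test_implementation = 'wasmtime'/" ./code/evaluation/get_diff_tests.py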

Once the right one is targeted (wizard_eng, v8, or wasmtime), run the following to extract the differential tests:

pushd ./code/evaluation/ && python3 get_diff_tests.py && popd

This will output a file in the same directory summarizing the differential tests that were found:

ls -al ./code/evaluation/diff_tests__wasm_spec__*
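
For a quick look at the summaries, they can be pretty-printed directly, assuming they are plain JSON files as described above:

# Show the first few lines of each summary file
for f in ./code/evaluation/diff_tests__wasm_spec__*; do
  echo "== $f =="
  python3 -m json.tool "$f" | head -n 40
done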

4.6. Manual post-processing

We have some scripts that have proven useful when manually looking through the differentiating tests for Wasm.

First, wasmtime has different behavior when imported items are not available for linking at verification time. The wasmtime engine categorizes this as a failure to verify, whereas other implementations ignore the missing import and verify the code that is available. We have a script to generate some Wasm modules with stubbed imports to help work around this issue and see if the tests are differentiating with the workaround in place.

To generate the stubs:

pushd ./code/evaluation/evaluation_helpers && ./create_import_mod.sh && popd

Now you have tests with stubbed imports in ../../data/all_generated_tests/WITH_IMPORTS_no_assert_str_wast_tests; note that these are .wast files. You'll need to pick up from sections 4.3 and 4.4 to run the tests. Remember to change the input directory name as appropriate in the script!
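
A quick way to confirm the stubs were generated is to count the resulting .wast files; the path below is the same directory as above, resolved relative to the repository root:

ls data/all_generated_tests/WITH_IMPORTS_no_assert_str_wast_tests/*.wast | wc -l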

Misc

  • /code/context/ has all the relevant context, including the source code files, extracted constraints, bug classes and their descriptions, extracted code context (the relevant code snippets per instruction and the list of differences between the implementations), and the generated test descriptions.

  • /code/human_written_tests/WebAssembly/processed_control_flow_validation/ has both the relevant human-written test files and the processed test files (see Section 1.4).
