Running TOGA with an assertless dataset #3

@bentodaniel

Description

When running TOGA with the provided datasets, for example the evosuite_reaching_tests, TOGA generates two pairs of files:

  • except_model_inputs.csv and exception_preds.csv;
  • assert_model_inputs.csv and assertion_preds.csv;

For my project, I am trying to use TOGA on a dataset similar to ATLAS' dataset, where all of the assertions have been replaced by a placeholder, for example, something like this:

public void testIssue705() throws Exception {
    Issue705Bean input = new Issue705Bean("key", "value");
    String json = MAPPER.writeValueAsString(input);
    // TEST ORACLE
}

where // TEST ORACLE would be our placeholder (this can be replaced by something else too).
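To make the transformation concrete, here is a minimal sketch of how such an ATLAS-style dataset could be produced. The `mask_assertions` helper and its regex are hypothetical (the real preprocessing may work differently); it only illustrates replacing assertion statements with the placeholder described above.

```python
import re

# Hypothetical helper: replace every JUnit-style assertion statement in a
# test body with the // TEST ORACLE placeholder. This is only a sketch of
# the preprocessing described above, not TOGA's actual pipeline.
ASSERT_RE = re.compile(r'^\s*(?:Assert\.)?assert\w*\(.*\);\s*$', re.MULTILINE)

def mask_assertions(test_body: str, placeholder: str = "// TEST ORACLE") -> str:
    return ASSERT_RE.sub("    " + placeholder, test_body)

test = '''public void testIssue705() throws Exception {
    Issue705Bean input = new Issue705Bean("key", "value");
    String json = MAPPER.writeValueAsString(input);
    assertNotNull(json);
}'''

print(mask_assertions(test))
```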

 

The problem is that when I run TOGA on a dataset like this, nothing is written to assert_model_inputs.csv (the file ends up containing only the header). As a result, assertion_preds.csv is never generated, and TOGA then fails because it cannot find that file.
This produces the following stack trace:

preparing assertion model inputs

// Defects4J: flaky method
// public void testBug2849731() {
// TestIntervalCategoryDataset d = new TestIntervalCategoryDataset();
// d.addItem(2.5, 2.0, 3.0, "R1", "C1");
// d.addItem(4.0, 0.0, 0.0, "R2", "C1");
// ,
// DatasetUtilities.iterateRangeBounds(d));
// }
/**
 * Another test for bug 2849731.
 */
public void testBug2849731_2() {
    XYIntervalSeriesCollection d = new XYIntervalSeriesCollection();
    XYIntervalSeries s = new XYIntervalSeries("S1");
    s.add(1.0, Double.NaN, Double.NaN, Double.NaN, 1.5, Double.NaN);
    d.addSeries(s);
    Range r = DatasetUtilities.iterateDomainBounds(d);
    // TEST ORACLE
}

[]
05/20/2022 21:39:40 - WARNING - __main__ -   Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
Some weights of the model checkpoint at microsoft/codebert-base were not used when initializing RobertaForSequenceClassification: ['pooler.dense.weight', 'pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at microsoft/codebert-base and are newly initialized: ['classifier.out_proj.bias', 'classifier.out_proj.weight', 'classifier.dense.weight', 'classifier.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
05/20/2022 21:39:48 - INFO - __main__ -   Training/evaluation parameters Namespace(data_dir='./', model_type='roberta', model_name_or_path='microsoft/codebert-base', task_name='assertion_classifier', output_dir='./models/assertion_classifier', config_name='', tokenizer_name='', cache_dir='', max_seq_length=200, do_train=False, do_eval=False, do_predict=True, evaluate_during_training=False, do_lower_case=False, per_gpu_train_batch_size=32, per_gpu_eval_batch_size=256, gradient_accumulation_steps=1, learning_rate=1e-05, weight_decay=0.0, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=8.0, max_steps=-1, warmup_steps=0, logging_steps=50, save_steps=50, eval_all_checkpoints=True, no_cuda=False, overwrite_output_dir=True, overwrite_cache=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, server_ip='', server_port='', train_file='train.txt', dev_file='valid.txt', test_file='assert_model_inputs.csv', pred_model_dir='model/assertions/pretrained/', test_result_dir='test_results.tsv', n_gpu=0, device=device(type='cpu'), output_mode='classification', start_epoch=0, start_step=0)
testing

05/20/2022 21:39:49 - INFO - __main__ -   Creating features from dataset file at ./

05/20/2022 21:39:49 - INFO - utils -   LOOKING AT ./assert_model_inputs.csv

/home/daniel/work/oracles/toga/model/assertions/utils.py:364: RuntimeWarning: Mean of empty slice.
  print(f'total_samples {len(trunc_b)} total truncations {(trunc_both > 0).sum()}, mean_trunc {trunc_both.mean()}, median {np.median(trunc_both)}, max = {trunc_both.max()}')
/home/daniel/.local/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/daniel/.local/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
Traceback (most recent call last):
  File "/home/daniel/work/oracles/toga/model/assertions/run_classifier.py", line 720, in <module>
    main()
  File "/home/daniel/work/oracles/toga/model/assertions/run_classifier.py", line 715, in main
    evaluate(args, model, tokenizer, checkpoint=None, prefix='', mode='test')
  File "/home/daniel/work/oracles/toga/model/assertions/run_classifier.py", line 236, in evaluate
    eval_dataset, instances = load_and_cache_examples(args, eval_task, tokenizer, ttype='test')
  File "/home/daniel/work/oracles/toga/model/assertions/run_classifier.py", line 421, in load_and_cache_examples
    features = convert_examples_to_features(examples, label_list, args.max_seq_length, tokenizer, output_mode,
  File "/home/daniel/work/oracles/toga/model/assertions/utils.py", line 364, in convert_examples_to_features
    print(f'total_samples {len(trunc_b)} total truncations {(trunc_both > 0).sum()}, mean_trunc {trunc_both.mean()}, median {np.median(trunc_both)}, max = {trunc_both.max()}')
  File "/home/daniel/.local/lib/python3.9/site-packages/numpy/core/_methods.py", line 40, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity



Result of running bash ./model/assertions/run_eval.sh assert_model_inputs.csv
CompletedProcess(args=['bash', './model/assertions/run_eval.sh', 'assert_model_inputs.csv'], returncode=1)



Traceback (most recent call last):
  File "/home/daniel/work/oracles/toga/toga.py", line 218, in <module>
    main()
  File "/home/daniel/work/oracles/toga/toga.py", line 83, in main
    with open("assertion_preds.csv") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'assertion_preds.csv'
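The ValueError in the first traceback is a direct consequence of the empty assert_model_inputs.csv: the truncation-statistics array in utils.py is zero-size, and NumPy reductions without an identity element (like `max`) fail on empty arrays. A minimal reproduction:

```python
import numpy as np

# When assert_model_inputs.csv contains only the header, the array of
# truncation counts built in utils.py is empty.
trunc_both = np.array([])

# mean() on an empty array emits "RuntimeWarning: Mean of empty slice"
# and returns nan, matching the warnings in the log above.
print(trunc_both.mean())

# max() has no identity element, so on a zero-size array it raises the
# ValueError seen in the traceback.
try:
    trunc_both.max()
except ValueError as e:
    print(e)
```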

I have also tried first running one of the provided datasets (evosuite_reaching_tests) so that the missing file gets generated.
With that in place, TOGA runs; however, as expected, assert_model_inputs.csv is overwritten and left empty.
And since assertion_preds.csv is never generated for the dataset actually being used, TOGA completes but cannot infer any assertions.

In summary:
I would like to run TOGA on a project that contains no assert statements at all and have it generate all of the needed assertions. Is this possible in any way?

Thank you,
Daniel Bento
