Description
When running TOGA with the provided datasets (for example, evosuite_reaching_tests), TOGA generates two pairs of files:
- except_model_inputs.csv and exception_preds.csv
- assert_model_inputs.csv and assertion_preds.csv
For my project, I am trying to use TOGA on a dataset similar to ATLAS' dataset, where all of the assertions have been replaced by a placeholder, for example, something like this:
public void testIssue705() throws Exception {
    Issue705Bean input = new Issue705Bean("key", "value");
    String json = MAPPER.writeValueAsString(input);
    // TEST ORACLE
}
where // TEST ORACLE is our placeholder (it could be replaced by something else).
The problem is that when I run TOGA on a dataset like this, it fails to write anything to assert_model_inputs.csv (the file ends up containing only the header). As a result, assertion_preds.csv is never generated, and TOGA then fails because it cannot find that file.
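As a quick diagnostic, one can check whether assert_model_inputs.csv actually contains any data rows before the assertion model is invoked. This is a hypothetical sketch, not part of TOGA; only the file name is taken from the run above:

```python
import csv
import os
import sys

def count_data_rows(path):
    """Count the rows after the header line in a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "assert_model_inputs.csv"
    if os.path.exists(path):
        n = count_data_rows(path)
        if n == 0:
            print(f"{path} contains only a header; the assertion model has no inputs")
        else:
            print(f"{path} contains {n} data rows")
```

In my runs this check reports zero data rows, which matches the empty-array failure in the stack trace below.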
This leads to the following stack trace:
preparing assertion model inputs
// Defects4J: flaky method
// public void testBug2849731() {
// TestIntervalCategoryDataset d = new TestIntervalCategoryDataset();
// d.addItem(2.5, 2.0, 3.0, "R1", "C1");
// d.addItem(4.0, 0.0, 0.0, "R2", "C1");
// ,
// DatasetUtilities.iterateRangeBounds(d));
// }
/**
* Another test for bug 2849731.
*/
public void testBug2849731_2() {
XYIntervalSeriesCollection d = new XYIntervalSeriesCollection();
XYIntervalSeries s = new XYIntervalSeries("S1");
s.add(1.0, Double.NaN, Double.NaN, Double.NaN, 1.5, Double.NaN);
d.addSeries(s);
Range r = DatasetUtilities.iterateDomainBounds(d);
// TEST ORACLE
}
[]
05/20/2022 21:39:40 - WARNING - __main__ - Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
Some weights of the model checkpoint at microsoft/codebert-base were not used when initializing RobertaForSequenceClassification: ['pooler.dense.weight', 'pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at microsoft/codebert-base and are newly initialized: ['classifier.out_proj.bias', 'classifier.out_proj.weight', 'classifier.dense.weight', 'classifier.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
05/20/2022 21:39:48 - INFO - __main__ - Training/evaluation parameters Namespace(data_dir='./', model_type='roberta', model_name_or_path='microsoft/codebert-base', task_name='assertion_classifier', output_dir='./models/assertion_classifier', config_name='', tokenizer_name='', cache_dir='', max_seq_length=200, do_train=False, do_eval=False, do_predict=True, evaluate_during_training=False, do_lower_case=False, per_gpu_train_batch_size=32, per_gpu_eval_batch_size=256, gradient_accumulation_steps=1, learning_rate=1e-05, weight_decay=0.0, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=8.0, max_steps=-1, warmup_steps=0, logging_steps=50, save_steps=50, eval_all_checkpoints=True, no_cuda=False, overwrite_output_dir=True, overwrite_cache=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, server_ip='', server_port='', train_file='train.txt', dev_file='valid.txt', test_file='assert_model_inputs.csv', pred_model_dir='model/assertions/pretrained/', test_result_dir='test_results.tsv', n_gpu=0, device=device(type='cpu'), output_mode='classification', start_epoch=0, start_step=0)
testing
05/20/2022 21:39:49 - INFO - __main__ - Creating features from dataset file at ./
05/20/2022 21:39:49 - INFO - utils - LOOKING AT ./assert_model_inputs.csv
/home/daniel/work/oracles/toga/model/assertions/utils.py:364: RuntimeWarning: Mean of empty slice.
print(f'total_samples {len(trunc_b)} total truncations {(trunc_both > 0).sum()}, mean_trunc {trunc_both.mean()}, median {np.median(trunc_both)}, max = {trunc_both.max()}')
/home/daniel/.local/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/home/daniel/.local/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
Traceback (most recent call last):
File "/home/daniel/work/oracles/toga/model/assertions/run_classifier.py", line 720, in <module>
main()
File "/home/daniel/work/oracles/toga/model/assertions/run_classifier.py", line 715, in main
evaluate(args, model, tokenizer, checkpoint=None, prefix='', mode='test')
File "/home/daniel/work/oracles/toga/model/assertions/run_classifier.py", line 236, in evaluate
eval_dataset, instances = load_and_cache_examples(args, eval_task, tokenizer, ttype='test')
File "/home/daniel/work/oracles/toga/model/assertions/run_classifier.py", line 421, in load_and_cache_examples
features = convert_examples_to_features(examples, label_list, args.max_seq_length, tokenizer, output_mode,
File "/home/daniel/work/oracles/toga/model/assertions/utils.py", line 364, in convert_examples_to_features
print(f'total_samples {len(trunc_b)} total truncations {(trunc_both > 0).sum()}, mean_trunc {trunc_both.mean()}, median {np.median(trunc_both)}, max = {trunc_both.max()}')
File "/home/daniel/.local/lib/python3.9/site-packages/numpy/core/_methods.py", line 40, in _amax
return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity
Result of running bash ./model/assertions/run_eval.sh assert_model_inputs.csv
CompletedProcess(args=['bash', './model/assertions/run_eval.sh', 'assert_model_inputs.csv'], returncode=1)
Traceback (most recent call last):
File "/home/daniel/work/oracles/toga/toga.py", line 218, in <module>
main()
File "/home/daniel/work/oracles/toga/toga.py", line 83, in main
with open("assertion_preds.csv") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'assertion_preds.csv'
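For context, the ValueError in the first trace is just numpy's behavior on an empty array: with no input rows, trunc_both is zero-size, its mean is nan (with a warning), and max raises because the maximum reduction has no identity element. A minimal reproduction:

```python
import numpy as np

# what utils.py ends up reducing when the input CSV has no data rows
trunc_both = np.array([])

# mean of an empty array only warns and returns nan...
print(trunc_both.mean())  # RuntimeWarning: Mean of empty slice -> nan

# ...but max raises, since there is no identity for the maximum reduction
try:
    trunc_both.max()
except ValueError as e:
    print(e)  # zero-size array to reduction operation maximum which has no identity
```

So the crash is a downstream symptom of the empty assert_model_inputs.csv, not the root cause.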
I have also tried first running one of the provided datasets (evosuite_reaching_tests) so that the missing file gets generated.
With this, I can run TOGA; however, as expected, assert_model_inputs.csv is overwritten and left empty.
And since assertion_preds.csv is not generated for the dataset being used, TOGA, despite running, cannot infer any assertions.
In summary:
I would like to use TOGA on a project that has no assert statements at all and have it generate all of the needed assertions. Is this possible in any way?
Thank you,
Daniel Bento