The encode_labels() function in senteval/sick.py, taken from this article, is meant for labels in [1, ..., K] (see section 4.2 of the paper).
The STSBenchmarkEval class inherits from SICKRelatednessEval, so it also inherits its encode_labels() function. However, the STSBenchmark task has labels from 0 to 5.
Thus, by construction, a model trained this way can never correctly predict data with a label in [0, 1).
It is easy to check this issue by running examples/bow.py on the STSBenchmark task and printing min(results['STSBenchmark']['yhat']), which will always be greater than 1!
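The problem can also be seen without training anything. Below is a minimal pure-Python sketch of the sparse target encoding for labels in [1, ..., K]; it is an illustration written for this issue, not the code from senteval/sick.py, and the names encode_label and expected_score are hypothetical. It assumes the predicted score is recovered as the expectation of the class indices 1..K under the predicted distribution.

```python
import math


def encode_label(y, nclass=5):
    """Sparse target distribution over classes 1..nclass for a real score y.

    Illustrative sketch only; it assumes y lies in [1, nclass].
    """
    p = [0.0] * nclass
    lo = math.floor(y)
    for i in range(1, nclass + 1):
        if i == lo + 1:
            p[i - 1] = y - lo        # mass on the class just above y
        if i == lo:
            p[i - 1] = lo - y + 1    # mass on the class just below y
    return p


def expected_score(p):
    """Score recovered from a distribution: sum of i * p_i for i in 1..nclass."""
    return sum((i + 1) * p_i for i, p_i in enumerate(p))


in_range = encode_label(3.6)      # mass split between classes 3 and 4
below_range = encode_label(0.4)   # class 0 does not exist, so mass is lost
zero = encode_label(0.0)          # no class receives any mass

print(sum(in_range), expected_score(in_range))
print(sum(below_range), sum(zero))
# The smallest score any valid distribution over classes 1..5 can produce:
print(expected_score([1.0, 0.0, 0.0, 0.0, 0.0]))
```

For a label inside [1, 5] the target sums to 1 and decodes back to the label. For a label below 1 the target no longer sums to 1 (it is all zeros for label 0), and since the decoded score is a weighted average of 1..5, it cannot go below 1.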
An easy way to fix this could be shifting the original labels in senteval/sts.py:
sick_data['y'] = [float(s)+1 for s in sick_data['y']]
and then fixing the ranges in the rest of the code. However, that would probably break the code for the SICK task, which is currently correct.
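One way to keep the SICK path untouched might be to make the label range an explicit parameter, so that the shift and the number of classes are chosen per task. The sketch below is only an illustration under that assumption: encode_label, encode_shifted, decode_shifted, min_label and max_label are hypothetical names, not part of SentEval.

```python
import math


def encode_label(y, nclass):
    """Sparse target distribution over classes 1..nclass, for y in [1, nclass]."""
    p = [0.0] * nclass
    lo = math.floor(y)
    for i in range(1, nclass + 1):
        if i == lo + 1:
            p[i - 1] = y - lo        # mass on the class just above y
        if i == lo:
            p[i - 1] = lo - y + 1    # mass on the class just below y
    return p


def encode_shifted(y, min_label, max_label):
    """Shift y so the smallest label maps to class 1, then encode."""
    nclass = max_label - min_label + 1
    return encode_label(y - min_label + 1, nclass)


def decode_shifted(p, min_label):
    """Expected score, with class indices shifted back to the original range."""
    return sum((i + min_label) * p_i for i, p_i in enumerate(p))


# STSBenchmark-style range 0..5 needs 6 classes.
sts_zero = encode_shifted(0.0, min_label=0, max_label=5)
sts_low = encode_shifted(0.4, min_label=0, max_label=5)
sts_max = encode_shifted(5.0, min_label=0, max_label=5)

# SICK-style range 1..5 keeps 5 classes and the original behaviour.
sick_mid = encode_shifted(3.6, min_label=1, max_label=5)

print(decode_shifted(sts_zero, 0), decode_shifted(sts_low, 0), decode_shifted(sts_max, 0))
print(len(sick_mid), decode_shifted(sick_mid, 1))
```

With min_label=1 and max_label=5 the shift is zero and the encoding has 5 classes, so the SICK task would see the same targets as before, while the 0 to 5 range gets 6 classes and labels below 1 become representable.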