I noticed that in version 1.9 the compression levels "overlap": some of them produce basically the same result.
I took the Silesia corpus* and this is the result:

| Compressor name | Compr. size (bytes) | Ratio (%) | Filename |
|---|---|---|---|
| memcpy | 211957760 | 100.00 | silesia.tar |
| libdeflate 1.9 -1 | 73503035 | 34.68 | silesia.tar |
| libdeflate 1.9 -2 | 71070103 | 33.53 | silesia.tar |
| libdeflate 1.9 -3 | 70170668 | 33.11 | silesia.tar |
| libdeflate 1.9 -4 | 69471739 | 32.78 | silesia.tar |
| libdeflate 1.9 -5 | 68171764 | 32.16 | silesia.tar |
| libdeflate 1.9 -6 | 67510595 | 31.85 | silesia.tar |
| libdeflate 1.9 -7 | 67141683 | 31.68 | silesia.tar |
| libdeflate 1.9 -8 | 66766242 | 31.50 | silesia.tar |
| libdeflate 1.9 -9 | 66716614 | 31.48 | silesia.tar |
| libdeflate 1.9 -10 | 64786046 | 30.57 | silesia.tar |
| libdeflate 1.9 -11 | 64710756 | 30.53 | silesia.tar |
| libdeflate 1.9 -12 | 64687172 | 30.52 | silesia.tar |

Compression at levels 8 and 9, and at levels 11 and 12, is almost the same: a ratio difference of 0.02 and 0.01 percentage points is hardly noticeable. Level 10 is not much different from 11 either. The difference across levels 10-12 is ~0.05 points, while the difference between 9 and 10 is almost 1 point (0.91).
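
To make the "overlap" easier to check, here is a small Python sketch that recomputes the ratios and the gaps between adjacent levels from the compressed sizes in the table above. It is only an illustration: the helper names (`ratio`, `adjacent_gaps`) are mine and are not part of libdeflate or lzbench.

```python
# Sizes in bytes for libdeflate 1.9 on silesia.tar, copied from the table above.
ORIGINAL_SIZE = 211957760
SIZES_1_9 = {
    1: 73503035, 2: 71070103, 3: 70170668, 4: 69471739,
    5: 68171764, 6: 67510595, 7: 67141683, 8: 66766242,
    9: 66716614, 10: 64786046, 11: 64710756, 12: 64687172,
}


def ratio(compressed, original=ORIGINAL_SIZE):
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed / original


def adjacent_gaps(sizes):
    """Ratio gain (in percentage points) from each level to the next one."""
    levels = sorted(sizes)
    return {
        (lo, hi): ratio(sizes[lo]) - ratio(sizes[hi])
        for lo, hi in zip(levels, levels[1:])
    }


if __name__ == "__main__":
    # Small gaps (8 -> 9, 10 -> 11, 11 -> 12) are the "overlapping" levels.
    for (lo, hi), gap in adjacent_gaps(SIZES_1_9).items():
        print(f"{lo:>2} -> {hi:<2}: {gap:.2f}")
```

Running it shows that almost all of the gain above level 9 happens in the single step from 9 to 10.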
First I decided to leave levels 6, 9, and 12 as they are and spread the levels in between by ratio, partly because level 9 is now the boundary between the `lazy/2` and `near_optimal` algorithms. At first I thought an even spread would be good, but then I realised that a "logarithmic" spread, or something like it, would be better because it resembles the existing levels. So I calculated the new target ratios at 1/2, 3/4 (and 1) of the gap between levels 6 and 9 (31.85 - 31.48) and between levels 9 and 12 (31.48 - 30.52).

| lv | x | x+1 | x+2 | y |
|---|---|---|---|---|
| x + k(y-x), k = | 0 | 1/2 | 3/4 | 1 |

| lv | 6 | 7 | 8 | 9 |
|---|---|---|---|---|
| ratio | 31.85 | 31.66 | 31.57 | 31.48 |

| lv | 9 | 10 | 11 | 12 |
|---|---|---|---|---|
| ratio | 31.48 | 31.00 | 30.76 | 30.52 |

These are the "ideal" ratios to aim for.
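
The target ratios can be reproduced with a few lines of Python. This is just a sketch of the arithmetic described above; `target_ratios` is a name I made up for illustration, and the fractions 0, 1/2, 3/4, 1 are the ones from the first table.

```python
def target_ratios(start, end, fractions=(0.0, 0.5, 0.75, 1.0)):
    """Interpolate between two anchor ratios.

    `start` and `end` are the ratios (in %) of the anchor levels that stay
    unchanged; each fraction says how much of the gap is covered at that level.
    """
    return [start + f * (end - start) for f in fractions]


if __name__ == "__main__":
    # Anchor ratios for levels 6, 9 and 12, taken from the 1.9 results.
    for levels, (start, end) in {
        (6, 7, 8, 9): (31.85, 31.48),
        (9, 10, 11, 12): (31.48, 30.52),
    }.items():
        for level, value in zip(levels, target_ratios(start, end)):
            print(f"level {level:>2}: {value:.4f}")
```

Note that the level 7 target is 31.665 before rounding, which the table lists as 31.66.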
I started with the level 9-12 range and checked what the results were for v1.8. After some tweaking I got them to match almost perfectly. Levels 6-9 were not that easy, but I brought them to an acceptable point. The results now look like this:

| Compressor name | Compr. size (bytes) | Ratio (%) | Filename |
|---|---|---|---|
| memcpy | 211957760 | 100.00 | silesia.tar |
| libdeflate 1.10-1 -1 | 73503035 | 34.68 | silesia.tar |
| libdeflate 1.10-1 -2 | 71070103 | 33.53 | silesia.tar |
| libdeflate 1.10-1 -3 | 70170668 | 33.11 | silesia.tar |
| libdeflate 1.10-1 -4 | 69471739 | 32.78 | silesia.tar |
| libdeflate 1.10-1 -5 | 68171764 | 32.16 | silesia.tar |
| libdeflate 1.10-1 -6 | 67510595 | 31.85 | silesia.tar |
| libdeflate 1.10-1 -7 | 67155164 | 31.68 | silesia.tar |
| libdeflate 1.10-1 -8 | 66850226 | 31.54 | silesia.tar |
| libdeflate 1.10-1 -9 | 66716614 | 31.48 | silesia.tar |
| libdeflate 1.10-1 -10 | 65724812 | 31.01 | silesia.tar |
| libdeflate 1.10-1 -11 | 65030245 | 30.68 | silesia.tar |
| libdeflate 1.10-1 -12 | 64685969 | 30.52 | silesia.tar |

The deltas calculated from these results (the fraction of each gap covered at every level) are as follows:

| lv | 6 | 7 | 8 | 9 |
|---|---|---|---|---|
| d(6-9) | 0 | 0.46 | 0.84 | 1 |

| lv | 9 | 10 | 11 | 12 |
|---|---|---|---|---|
| d(9-12) | 0 | 0.49 | 0.83 | 1 |

The results are spread a bit more "evenly", and the gap between levels 9 and 10 is halved (from 0.91 to 0.47 percentage points).
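
For anyone who wants to check the deltas, here is a hedged Python sketch that computes them from the rounded ratios in the result tables. `gap_fractions` is a hypothetical helper name; it also prints the same deltas for the 1.9 results so the two spreads can be compared.

```python
def gap_fractions(ratios):
    """Fraction of the gap between the first and last ratio covered at each level."""
    start, end = ratios[0], ratios[-1]
    return [(start - r) / (start - end) for r in ratios]


# Ratios (%) for levels 6-9 and 9-12, copied from the two result tables.
RATIOS = {
    "1.9    lv 6-9 ": [31.85, 31.68, 31.50, 31.48],
    "1.9    lv 9-12": [31.48, 30.57, 30.53, 30.52],
    "1.10-1 lv 6-9 ": [31.85, 31.68, 31.54, 31.48],
    "1.10-1 lv 9-12": [31.48, 31.01, 30.68, 30.52],
}

if __name__ == "__main__":
    for name, ratios in RATIOS.items():
        deltas = ", ".join(f"{d:.2f}" for d in gap_fractions(ratios))
        print(f"{name}: {deltas}")
```

With the 1.9 numbers, level 10 already covers about 95% of the gap between 9 and 12, which is the overlap this proposal tries to remove.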
The results for other data sets are similar.
I think you should consider adjusting the compression levels to this or something similar.
Here are the changes to `deflate_compress.c` as a diff. I can open a pull request if you are interested.
* I chose it because it is a diverse, non-homogeneous, relatively big corpus that resembles real-life data; in my opinion it is the best fit for a general-purpose compressor. I tested other corpora available on the net and the results were very similar, almost the same.
They include enwik, the Lukas medical images, and my "own" sets, namely app (mozilla 64-bit executables from the Silesia corpus, Google Earth 32-bit for Windows and Firefox for Linux), png-dec (a bunch of decompressed PNG images) and html/css/js (a bunch of site styles and scripts; something that imitates HTML pages).
** To produce the results I used lzbench with libdeflate-1.9 support.