Compression levels adjustment

I noticed that in version 1.9 some compression levels "overlap": several of them produce nearly the same output size.
I ran every level on the [silesia corpus](http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia)\* and these are the results:

| Compressor name    | Compr. size | Ratio | Filename |
| ----------------   | ----------- | ----- | -------- |
| memcpy             | 211957760 |100.00 | silesia.tar |
| libdeflate 1.9 -1  |  73503035 | 34.68 | silesia.tar |
| libdeflate 1.9 -2  |  71070103 | 33.53 | silesia.tar |
| libdeflate 1.9 -3  |  70170668 | 33.11 | silesia.tar |
| libdeflate 1.9 -4  |  69471739 | 32.78 | silesia.tar |
| libdeflate 1.9 -5  |  68171764 | 32.16 | silesia.tar |
| libdeflate 1.9 -6  |  67510595 | 31.85 | silesia.tar |
| libdeflate 1.9 -7  |  67141683 | 31.68 | silesia.tar |
| libdeflate 1.9 -8  |  66766242 | 31.50 | silesia.tar |
| libdeflate 1.9 -9  |  66716614 | 31.48 | silesia.tar |
| libdeflate 1.9 -10 |  64786046 | 30.57 | silesia.tar |
| libdeflate 1.9 -11 |  64710756 | 30.53 | silesia.tar |
| libdeflate 1.9 -12 |  64687172 | 30.52 | silesia.tar |


Levels 8 and 9, and levels 11 and 12, compress almost identically: ratio differences of 0.02 and 0.01 percentage points are hardly noticeable. Level 10 is not much different from 11 either. The whole 10-12 range spans only ~0.05 percentage points, while the step from 9 to 10 is almost 1 percentage point.
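
The gaps quoted above can be recomputed from the compressed sizes in the table. This is only a small Python sketch: the sizes are copied from the 1.9 table, while the names `ratio` and `gap` are made up for illustration.

```python
# Compressed sizes for silesia.tar, copied from the libdeflate 1.9 table.
ORIGINAL_SIZE = 211957760
SIZES_1_9 = {
    1: 73503035, 2: 71070103, 3: 70170668, 4: 69471739,
    5: 68171764, 6: 67510595, 7: 67141683, 8: 66766242,
    9: 66716614, 10: 64786046, 11: 64710756, 12: 64687172,
}

def ratio(size, original=ORIGINAL_SIZE):
    """Compressed size as a percentage of the original size."""
    return 100.0 * size / original

def gap(sizes, lo, hi):
    """Ratio gained (in percentage points) when going from level lo to level hi."""
    return ratio(sizes[lo]) - ratio(sizes[hi])

# The level pairs discussed in the text.
for lo, hi in [(8, 9), (11, 12), (10, 12), (9, 10)]:
    print(f"levels {lo} -> {hi}: {gap(SIZES_1_9, lo, hi):.2f} percentage points")
```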

First I decided to leave levels 6, 9, and 12 as they are and spread the levels in between by ratio. Level 9 is also a natural anchor because it is now the boundary between the `lazy/2` and `near_optimal` algorithms. At first I thought an even spread would be good, but then I realised that a "logarithmic" spread, or something like it, would be better because it resembles the existing levels. So I set the target ratios at 1/2, 3/4 (and 1) of the gap between levels 6 and 9 (31.85 - 31.48) and between levels 9 and 12 (31.48 - 30.52).

| lv | x | x+1 | x+2 | y |
| -- | - | - | - | - |
| f in x + f(y-x) | 0 | 1/2 | 3/4 | 1 |
| lv | 6 | 7 | 8 | 9 |
| ratio | 31.85 | 31.66 | 31.57 | 31.48 |
| lv | 9 | 10 | 11 | 12 |
| ratio | 31.48 | 31.00 | 30.76 | 30.52 |

These are the "ideal" ratios to aim for.
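
For reference, the target ratios in the table follow from a plain interpolation between the anchor levels. A minimal Python sketch, assuming the fractions 0, 1/2, 3/4 and 1 described above; the function name `target_ratios` is hypothetical.

```python
def target_ratios(start, end, fractions=(0.0, 0.5, 0.75, 1.0)):
    """Interpolate between two anchor ratios: start + f * (end - start)."""
    return [start + f * (end - start) for f in fractions]

# Anchor ratios taken from the libdeflate 1.9 results for levels 6, 9 and 12.
for name, (start, end) in {"6-9": (31.85, 31.48), "9-12": (31.48, 30.52)}.items():
    print(name, [f"{r:.3f}" for r in target_ratios(start, end)])
```

The level 7 target comes out at 31.665, which the table lists as 31.66.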

I started with the 9-12 range and checked what the results were for v1.8. After some tweaking I got to the point where they matched the targets almost perfectly. Levels 6-9 were not as easy, but I brought them to an acceptable point. The results now look like this:

| Compressor name    | Compr. size | Ratio | Filename |
| ----------------   | ----------- | ----- | -------- |
| memcpy             | 211957760 |100.00 | silesia.tar |
| libdeflate 1.10-1 -1 | 73503035 | 34.68 | silesia.tar |
| libdeflate 1.10-1 -2 | 71070103 | 33.53 | silesia.tar |
| libdeflate 1.10-1 -3 | 70170668 | 33.11 | silesia.tar |
| libdeflate 1.10-1 -4 | 69471739 | 32.78 | silesia.tar |
| libdeflate 1.10-1 -5 | 68171764 | 32.16 | silesia.tar |
| libdeflate 1.10-1 -6 | 67510595 | 31.85 | silesia.tar |
| libdeflate 1.10-1 -7 | 67155164 | 31.68 | silesia.tar |
| libdeflate 1.10-1 -8 | 66850226 | 31.54 | silesia.tar |
| libdeflate 1.10-1 -9 | 66716614 | 31.48 | silesia.tar |
| libdeflate 1.10-1 -10 | 65724812 | 31.01 | silesia.tar |
| libdeflate 1.10-1 -11 | 65030245 | 30.68 | silesia.tar |
| libdeflate 1.10-1 -12 | 64685969 | 30.52 | silesia.tar |

The deltas calculated from these results, i.e. each level's position within the gap between its anchor levels, are as follows:

| lv | 6 | 7 | 8 | 9 |
| -- | - | - | - | - |
| d(6-9) | 0 | 0.46 | 0.84 | 1 |
| lv | 9 | 10 | 11 | 12 |
| d(9-12) | 0 | 0.49 | 0.83 | 1 |

The results are spread a bit more "evenly", and the gap between levels 9 and 10 is halved.
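
The deltas can be reproduced as each level's normalized position between its two anchor levels. The Python sketch below uses the rounded ratios from the results table, which is what gives the listed values; computing from the raw byte counts gives slightly different numbers. The name `deltas` is made up for illustration.

```python
# Rounded ratios for levels 6-12, copied from the adjusted results table.
RATIOS_NEW = {6: 31.85, 7: 31.68, 8: 31.54, 9: 31.48,
              10: 31.01, 11: 30.68, 12: 30.52}

def deltas(ratios, first, last):
    """Fraction of the first-to-last ratio gap already covered at each level."""
    span = ratios[first] - ratios[last]
    return {lv: (ratios[first] - ratios[lv]) / span
            for lv in range(first, last + 1)}

for first, last in [(6, 9), (9, 12)]:
    d = deltas(RATIOS_NEW, first, last)
    print(f"d({first}-{last}):", {lv: round(v, 2) for lv, v in d.items()})
```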

Results for other data sets are similar.

I think you should consider adjusting the compression levels to these settings or something similar.

Here are the changes to `deflate_compress.c` [as a diff](https://pastebin.com/SbHpX7Hd). I can open a pull request if you are interested.

----
\* I chose it because it is a diverse, non-homogeneous, relatively large corpus that resembles real-life data, which in my opinion makes it the best fit for a general-purpose compressor. I tested other corpora available online and the results were very similar, almost the same.
They include [enwik](http://www.mattmahoney.net/dc/textdata.html), [lukas medical images](http://www.data-compression.info/Corpora/LukasCorpus/index.html) and my "own" sets, namely app (mozilla - 64-bit executables from the silesia corpus, Google Earth 32-bit for Windows and Firefox for Linux), png-dec (a set of decompressed PNG images) and html/css/js (a set of site styles and scripts; something that imitates HTML pages).

\*\* To produce results I used lzbench with [libdeflate-1.9 support](https://github.com/inikep/lzbench/pull/113).

