## [RevEng][Math] Data compression and entropy, part II

Another example. I published a list of 891,190 unique RSA-2k moduli, in decimal form. How well can xz compress it? The final size is 242704140 bytes. Let's divide that by the number of moduli and convert bytes to bits:

% bc

scale=3
242704140/891190
272.337

272*8
2176


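Yes, that is close to the ~2048 bits each number (modulus) actually has. RSA moduli exhibit a high level of entropy, so there is little left to squeeze out beyond the decimal encoding itself. In other words, xz compresses numbers in decimal form almost perfectly: about 272 bytes per modulus versus the 256 bytes of the raw binary form.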
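To make the arithmetic explicit, here is a small Python sketch of the same calculation. The two sizes are taken from the text above; the constant names are mine, and the 256-byte "ideal" assumes every modulus is a full 2048-bit number stored in raw binary.

```python
import math

MODULI = 891_190          # number of moduli in the list (from the text)
XZ_BYTES = 242_704_140    # size of the xz output in bytes (from the text)
BITS = 2048               # nominal size of an RSA-2k modulus (assumption)

# A 2048-bit number needs about 2048 * log10(2) decimal digits.
decimal_digits = math.ceil(BITS * math.log10(2))

# Raw binary size of one modulus vs. what xz actually spent on it.
ideal_bytes = BITS / 8
actual_bytes = XZ_BYTES / MODULI

# Relative overhead of xz over the raw binary form.
overhead = actual_bytes / ideal_bytes - 1

print(f"decimal digits per modulus: {decimal_digits}")
print(f"ideal bytes per modulus:    {ideal_bytes:.0f}")
print(f"actual bytes per modulus:   {actual_bytes:.3f}")
print(f"overhead over raw binary:   {overhead:.1%}")
```

So a modulus that takes several hundred decimal digits as text ends up only a few percent larger than its binary form after xz.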

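What about a random stream converted to base64 and then compressed? (The reads from /dev/random come back short here, so dd writes about 320 MiB rather than 10 GiB.)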

% dd if=/dev/random of=tmp bs=1024M count=10

% stat tmp
File: tmp
Size: 335544310

% base64 tmp > tmp.base64

% stat tmp.base64
File: tmp.base64
Size: 453279159

% xz tmp.base64

% stat tmp.base64.xz
File: tmp.base64.xz
Size: 348321524


Almost the same size as the original high-entropy file: 348321524 vs. 335544310 bytes, only about 3.8% larger.

It's as if xz had a base64 decoder and/or could recognize decimal numbers!
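The base64 experiment can be repeated on a smaller scale without touching the disk. This is only a sketch: it assumes Python's `lzma` module (default preset, .xz container) behaves closely enough to the `xz` command-line tool, and the function name `base64_xz_ratio` is mine.

```python
import base64
import lzma
import os

def base64_xz_ratio(n_bytes: int) -> float:
    """Return len(xz(base64(random))) / len(random) for n_bytes of random data."""
    raw = os.urandom(n_bytes)
    # encodebytes() wraps the output at 76 characters per line, like base64(1).
    encoded = base64.encodebytes(raw)
    # Default settings: .xz container, default preset.
    packed = lzma.compress(encoded)
    return len(packed) / len(raw)

if __name__ == "__main__":
    # 1 MiB of random input; a ratio slightly above 1.0 means the base64
    # expansion was almost entirely undone by the compressor.
    print(f"{base64_xz_ratio(1 << 20):.3f}")
```

Each base64 character carries 6 bits of information in an 8-bit byte, so an ideal compressor would bring the encoded file back down to the size of the original. The ratio printed here shows how close the compressor gets.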

This can be yet another test for compression algorithms.

Entropy level can be a quick-n-dirty metric of how good your password is:

% echo passpass | ent
Entropy = 1.836592 bits per byte.

% echo "l33tc0de" | ent
Entropy = 2.947703 bits per byte.

% echo "kewl_l33t_c0der" | ent
Entropy = 3.500000 bits per byte.

% echo "pA\$swORd" | ent
Entropy = 3.169925 bits per byte.

% echo "_coolpA\$sw0Rd" | ent
Entropy = 3.664498 bits per byte.

% echo "_c00lpA\$sw0Rd" | ent
Entropy = 3.467720 bits per byte.
( zero repeats )

% echo "_c00lpA\$sw0Rd%" | ent
Entropy = 3.589898 bits per byte.
( new character at the end )
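The figure that ent reports here can be reproduced with a few lines of Python. This is a minimal sketch of the Shannon entropy of the byte histogram, not ent's actual source; the function name `byte_entropy` is mine. Note that echo appends a newline, and that newline is counted as one more byte.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    # Sum of -p * log2(p) over every distinct byte value seen.
    return -sum(c / total * math.log2(c / total) for c in counts.values())

if __name__ == "__main__":
    for password in ("passpass", "l33tc0de", "kewl_l33t_c0der", "pA$swORd"):
        # Mimic `echo password | ent`: echo adds a trailing newline.
        data = password.encode() + b"\n"
        print(f"{password}: {byte_entropy(data):.6f} bits per byte")
```

This also shows why repeating a character lowers the score and adding a new one raises it: the metric looks only at how evenly the bytes are distributed, not at how hard the password is to guess.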


Also, an entropy metric is used in the Discourse forum software to detect titles and bodies that are too short or uninformative. The relevant configuration parameters are "body min entropy" and "title min entropy".

###### (this post was first published on 20220929.)