[RevEng][Math] Data compression and entropy

This is yet another blog post about entropy. This time we will measure the weight of the effective information in some data, that is, its payload (not in the exploit sense).

# generate RSA key
openssl genrsa -out keypair.pem 4096
# extract public key out of it
openssl rsa -in keypair.pem -pubout -out pubkey.pub
# dump secret key:
openssl rsa -noout -text -in keypair.pem > secret.txt
# dump public key:
openssl rsa -noout -text -inform PEM -pubin -in pubkey.pub > pub.txt

Let's start with the public key. Its modulus is 4096 bits of high-entropy data:

RSA Public-Key: (4096 bit)
Modulus:
    00:c9:e5:db:c9:f7:ae:d0:f6:6f:44:1a:1c:54:15:
    2d:50:69:93:7e:90:3f:c4:2b:e4:7d:33:1a:78:a9:
...
    e9:b1:f8:fa:fb:90:f0:55:d7:4c:46:12:04:9d:e5:
    b3:c2:7d
Exponent: 65537 (0x10001)
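Counting hex pairs by eye is tedious, so here is a way to read the bit length of every field off such a dump with a few lines of Python. This is a minimal sketch that assumes the layout shown above: a field name ending in a colon, followed by indented lines of colon-separated hex bytes, or an inline `name: decimal (0xhex)` value such as the exponent. `field_sizes` is a hypothetical helper (not part of OpenSSL), and `TOY` is a small made-up dump in the same layout, not a real key. It needs Python 3.9+ for the `dict[str, int]` annotation.

```python
import re

def field_sizes(dump: str) -> dict[str, int]:
    """Map each field of an `openssl rsa -text` style dump to its bit length."""
    sizes: dict[str, int] = {}
    name, digits = None, []
    # A trailing empty "sentinel" line makes sure the last hex field is flushed.
    for line in dump.splitlines() + [""]:
        if line.startswith(" "):                  # "    00:c9:e5:..." continuation line
            digits.append(line.strip().replace(":", ""))
            continue
        if name and digits:                       # the previous hex field just ended
            sizes[name] = int("".join(digits), 16).bit_length()
        name, digits = None, []
        inline = re.fullmatch(r"(\w+): (\d+) \(0x[0-9a-fA-F]+\)", line)
        if inline:                                # e.g. "Exponent: 65537 (0x10001)"
            sizes[inline.group(1)] = int(inline.group(2)).bit_length()
        elif re.fullmatch(r"\w+:", line):         # e.g. "Modulus:" (hex lines follow)
            name = line[:-1]
    return sizes

# Made-up toy dump in the same layout (NOT a real key).
TOY = """\
RSA Private-Key: (64 bit, 2 primes)
modulus:
    00:c9:e5:db:c9:f7:ae:d0:f6
publicExponent: 65537 (0x10001)
prime1:
    00:ff:67:d5:e8
"""
print(field_sizes(TOY))  # {'modulus': 64, 'publicExponent': 17, 'prime1': 32}
```

Calling `field_sizes(open("pub.txt").read())` should report the sizes of `Modulus` and `Exponent`; the leading `00:` byte that OpenSSL prints (to keep the DER integer positive) does not affect the bit length.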

Public key in PEM text format:

-----BEGIN PUBLIC KEY-----
MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAyeXbyfeu0PZvRBocVBUt
UGmTfpA/xCvkfTMaeKnU55jReRnP9+x7OghZkYkAQT8yHL6dnSZhjs5iSCPQp31q
m/uVLx2oo/vWDI/B2ZRlqXg0xPaDw2HVIhsnhELhF9zqoIaI8j9dflCUqaixnmGo
qEemvFSTWV72aJ4tflHEm+tqaJhstQAqyp7RKOqEJgLc4GeD72hfFGYsubAVslwN
NzYe8vJlpL2CxMcckBt18udIHuh13bTGBAcrrJUQfgQe2WwYPvhs+hFIVSWO3CzN
1XJJOqpZl5WIuDeDJmC3fCxNAUdswhk1XCD0cSzgS9SkZqZFEKlqOcEGU+y3fXD/
cYvB9WqVgRe5/s6xlnXf/YLWBsKgg66JHBpM7dts7kqwZ/mqOw7cx3aVtGJYUpoc
YwtkL4bIZ+XCCp8rSM9G0j1KhDHdVZG06uWP0sY7yugzbBHiJocuRhzLfbjGpQFZ
wOHsEv9XQR7PViX0JbTMaUuc889KhtncAQbs2Z4Euy6HToiqDuggqpBwm9gv22ae
xwO4b96vRiyQv9Fh7rxMEf4wlwh6/AGRZ7UqpeoNjIEXfizfnmMGt7LqZnsCzLjW
BKnqeH89mBK/YFmv+NtcVu9oFZtYEMNmi7L4jj9Uq8VxmdB1diT5l9I4lhjz7Sbp
sfj6+5DwVddMRhIEneWzwn0CAwEAAQ==
-----END PUBLIC KEY-----

Compress the PEM file with xz -z -9: the result is 740 bytes. Ideally it would be 4096/8 = 512 bytes. OK, xz is not an ideal compressor, and there is a lot of overhead: base64 encoding, ASN.1 tokens, the "BEGIN PUBLIC KEY" header and footer, the headers of the .xz container itself, etc. An ideal compressor in an ideal world would produce output as close to 512 bytes as possible.
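To see how much of the PEM file is packaging, one can undo the base64 and walk the ASN.1 (DER) structure by hand. A minimal sketch in Python 3.9+, assuming the standard SubjectPublicKeyInfo layout that `openssl rsa -pubout` writes; `read_tlv` is a hypothetical helper that handles only the short and long DER length forms needed for this key, not a general ASN.1 parser. The `lzma` module's preset 9 is used as a stand-in for `xz -z -9`.

```python
import base64
import lzma

# The public key from above, verbatim.
PEM = """\
-----BEGIN PUBLIC KEY-----
MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAyeXbyfeu0PZvRBocVBUt
UGmTfpA/xCvkfTMaeKnU55jReRnP9+x7OghZkYkAQT8yHL6dnSZhjs5iSCPQp31q
m/uVLx2oo/vWDI/B2ZRlqXg0xPaDw2HVIhsnhELhF9zqoIaI8j9dflCUqaixnmGo
qEemvFSTWV72aJ4tflHEm+tqaJhstQAqyp7RKOqEJgLc4GeD72hfFGYsubAVslwN
NzYe8vJlpL2CxMcckBt18udIHuh13bTGBAcrrJUQfgQe2WwYPvhs+hFIVSWO3CzN
1XJJOqpZl5WIuDeDJmC3fCxNAUdswhk1XCD0cSzgS9SkZqZFEKlqOcEGU+y3fXD/
cYvB9WqVgRe5/s6xlnXf/YLWBsKgg66JHBpM7dts7kqwZ/mqOw7cx3aVtGJYUpoc
YwtkL4bIZ+XCCp8rSM9G0j1KhDHdVZG06uWP0sY7yugzbBHiJocuRhzLfbjGpQFZ
wOHsEv9XQR7PViX0JbTMaUuc889KhtncAQbs2Z4Euy6HToiqDuggqpBwm9gv22ae
xwO4b96vRiyQv9Fh7rxMEf4wlwh6/AGRZ7UqpeoNjIEXfizfnmMGt7LqZnsCzLjW
BKnqeH89mBK/YFmv+NtcVu9oFZtYEMNmi7L4jj9Uq8VxmdB1diT5l9I4lhjz7Sbp
sfj6+5DwVddMRhIEneWzwn0CAwEAAQ==
-----END PUBLIC KEY-----
"""

def read_tlv(buf: bytes, pos: int) -> tuple[int, bytes, int]:
    """Read one DER tag-length-value at buf[pos]; return (tag, value, next_pos)."""
    tag, length = buf[pos], buf[pos + 1]
    pos += 2
    if length & 0x80:                      # long form: low 7 bits = count of length bytes
        count = length & 0x7F
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    return tag, buf[pos:pos + length], pos + length

# Strip the BEGIN/END lines and undo base64: what remains is the binary DER.
body = "".join(line for line in PEM.splitlines() if not line.startswith("-----"))
der = base64.b64decode(body)

_, spki, _ = read_tlv(der, 0)              # SubjectPublicKeyInfo SEQUENCE
_, _alg, pos = read_tlv(spki, 0)           # AlgorithmIdentifier (rsaEncryption + NULL)
_, bitstring, _ = read_tlv(spki, pos)      # BIT STRING wrapping the RSA key
_, rsa_key, _ = read_tlv(bitstring, 1)     # skip "unused bits" byte; RSAPublicKey SEQUENCE
_, n_bytes, pos = read_tlv(rsa_key, 0)     # modulus INTEGER
_, e_bytes, _ = read_tlv(rsa_key, pos)     # publicExponent INTEGER
n = int.from_bytes(n_bytes, "big")
e = int.from_bytes(e_bytes, "big")

payload_bits = n.bit_length() + e.bit_length()
print("PEM text:    ", len(PEM), "bytes")
print("DER binary:  ", len(der), "bytes")
print("payload:     ", payload_bits, "bits =", payload_bits / 8, "bytes")
print("xz -9 of PEM:", len(lzma.compress(PEM.encode("ascii"), preset=9)), "bytes")
print("xz -9 of DER:", len(lzma.compress(der, preset=9)), "bytes")
```

Decoding base64 removes one layer of overhead for free; the remaining difference between the DER size and the payload is ASN.1 framing (tags, lengths, the algorithm OID and the sign byte of the modulus).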

Now compress the text dump: xz -z -9 pub.txt gives 948 bytes. Worse, but OK.
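To see where the extra bytes of the text dump come from, here is a sketch that formats bytes roughly the way the dumps above look (4-space indent, colon-separated hex pairs, 15 per line; this layout is inferred from the output shown, not taken from OpenSSL's source) and compresses both forms with Python's `lzma` module at preset 9. The 512 pseudo-random bytes are only a stand-in for a 4096-bit modulus, not the real key, and `random.randbytes` needs Python 3.9+.

```python
import lzma
import random

def openssl_style_hex(data: bytes, per_line: int = 15) -> str:
    """4-space indent, colon-separated hex pairs, no colon after the last byte."""
    pairs = [f"{b:02x}" for b in data]
    lines = []
    for i in range(0, len(pairs), per_line):
        chunk = ":".join(pairs[i:i + per_line])
        is_last = i + per_line >= len(pairs)
        lines.append("    " + chunk + ("" if is_last else ":"))
    return "\n".join(lines) + "\n"

# Stand-in for a 4096-bit modulus: 512 pseudo-random bytes (fixed seed).
payload = random.Random(2022).randbytes(512)
text = openssl_style_hex(payload).encode("ascii")

for label, blob in (("raw bytes", payload), ("hex text", text)):
    packed = lzma.compress(blob, preset=9)   # .xz container, similar to `xz -z -9`
    print(f"{label:9}: {len(blob):5} bytes -> {len(packed):4} bytes after xz")
```

Both forms carry exactly the same 4096 bits; the text version is more than three times larger before compression, and whatever xz cannot squeeze out of the hex digits, colons and indentation shows up as extra compressed bytes.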

What is in the secret key?

RSA Private-Key: (4096 bit, 2 primes)
modulus:
    00:c9:e5:db:c9:f7:ae:d0:f6:6f:44:1a:1c:54:15:
...
    b3:c2:7d
publicExponent: 65537 (0x10001)
privateExponent:
    00:b1:e5:79:5e:62:81:84:da:3f:7c:10:4d:b9:c0:
...
    51:2f:39
prime1:
    00:ff:67:d5:e8:b4:85:7e:63:13:b0:e9:0f:ae:05:
...
    f3:4f
prime2:
    00:ca:5e:24:f5:c5:cc:52:f7:17:d1:09:b5:fd:fe:
...
    aa:73
exponent1:
    51:57:38:81:0c:3d:17:ab:66:32:09:87:bc:dc:54:
...
    71
exponent2:
    00:c1:9d:1d:23:7f:e1:23:27:81:43:e0:54:9c:f4:
...
    1d:fd
coefficient:
    00:b0:42:15:15:3e:f1:e3:65:44:18:bc:c8:6e:ce:
...
    c6:2a

With my comments:

RSA Private-Key: 4096 bits
modulus: 4096 bits
publicExponent: 17 bits (65537 = 0x10001 = 2^16+1)
privateExponent: 4096 bits
prime1: 2048 bits
prime2: 2048 bits
exponent1: 2048 bits
exponent2: 2048 bits
coefficient: 2048 bits

Sum: 18449 bits, or about 2307 bytes
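Adding such a list up by hand is easy to get wrong, so here is the same tally in Python. The sizes are the approximate per-field ones from the list; the public exponent's size is computed with `int.bit_length()` rather than typed in (65537 = 0x10001 = 2^16 + 1 needs 17 bits), and the byte count is rounded up.

```python
import math

# Approximate field sizes in bits for a 4096-bit, 2-prime RSA private key.
fields = {
    "modulus": 4096,
    "publicExponent": (65537).bit_length(),  # 0x10001 = 2**16 + 1 -> 17 bits
    "privateExponent": 4096,
    "prime1": 2048,
    "prime2": 2048,
    "exponent1": 2048,
    "exponent2": 2048,
    "coefficient": 2048,
}
total_bits = sum(fields.values())
total_bytes = math.ceil(total_bits / 8)
print(f"{total_bits} bits, or {total_bytes} bytes")  # 18449 bits, or 2307 bytes
```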

The PEM private key file compressed with xz is 2676 bytes. Not ideal, but you get the idea.

Now compress the dump, which consists mainly of hexadecimal digits and colons: xz -z -9 secret.txt gives 3172 bytes.
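A crude way to put a number on this is the order-0 Shannon entropy, H = -sum(p * log2(p)) over single-byte frequencies. The sketch below (Python 3.9+, with pseudo-random bytes standing in for key material; `order0_entropy` is a hypothetical helper) compares raw bytes with the same bytes written as colon-separated hex. The hex text has fewer bits per byte but three times as many bytes, and because an order-0 model does not notice that every third character is a colon, its estimate for the text comes out above the 4096 bytes of information actually encoded in it.

```python
import math
import random
from collections import Counter

def order0_entropy(data: bytes) -> float:
    """Shannon entropy of single-byte frequencies, in bits per byte."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

raw = random.Random(1).randbytes(4096)                     # stand-in for key material
hex_text = ":".join(f"{b:02x}" for b in raw).encode("ascii")

for label, data in (("raw bytes", raw), ("hex+colons", hex_text)):
    h = order0_entropy(data)
    print(f"{label:10}: {len(data):5} bytes, {h:.3f} bits/byte, "
          f"order-0 estimate {h * len(data) / 8:.0f} bytes")
```

A compressor like xz does better than order-0 because it models context (previous bytes, repeated strings), which is exactly what is needed to see that the colons carry no information.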

# Get my TLS certificate, signed by Let's Encrypt:
openssl s_client -connect yurichev.com:443 < /dev/null > tmp
# Dump it:
openssl x509 -in tmp -text > tmp2
# Compress with xz
xz -z -9 tmp2
# Get file size:
stat tmp2.xz

... 3352 bytes or 26816 bits.
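The compress-and-measure step can also be done without leaving Python, which is handy when measuring many files (keypair.pem, pub.txt, secret.txt, tmp2) in one go. A sketch using the standard `lzma` module: the `.xz` format at preset 9 is meant to mirror `xz -z -9`, but byte counts may differ slightly from a given `xz` build (for example if it compresses multithreaded). Unlike `xz -z`, it leaves the input file in place. `xz_size` is a hypothetical helper name.

```python
import lzma
import os
import tempfile

def xz_size(path: str) -> tuple[int, int]:
    """Return (bytes, bits) of the file's contents after .xz compression at preset 9."""
    with open(path, "rb") as f:
        packed = lzma.compress(f.read(), format=lzma.FORMAT_XZ, preset=9)
    return len(packed), len(packed) * 8

# Demo on a throwaway file; point xz_size() at tmp2, pub.txt, etc. instead.
sample = b"Certificate:\n" * 200
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
size_bytes, size_bits = xz_size(tmp.name)
os.remove(tmp.name)
print(size_bytes, "bytes or", size_bits, "bits")
```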

More information about entropy is in my book.

(The post was first published on 20220719.)

