[Crypto] Cracking AES-CTR with reused nonce

To understand CTR mode, keep in mind this scheme from Wikipedia.

When the same key and nonce is used, ciphertext1 XOR ciphertext2 = plaintext1 XOR plaintext2.

English text and 'sparse' packet

First example. There are two messages. One is in plain English. Second is what I call 'sparse packet' -- many zero bytes 'on background' with spots of random data. Packets like these are abundant in binary protocols and file formats. Including SSH. Both messages are to be encrypted with the same key and nonce.

def encrypt(msg):
    KEY=b'very secret key!'
    ctr = Crypto.Util.Counter.new(128, little_endian=False, initial_value=123456789012345678912345678900)
    t = Crypto.Cipher.AES.new(KEY, Crypto.Cipher.AES.MODE_CTR, counter = ctr)
    return t.encrypt(msg)

# from https://www.gutenberg.org/files/1661/1661-0.txt
msg1=b"Well, I followed you to your door, and so made sure that I was really an object of interest to the celebrated Mr. Sherlock Holmes."

c1=encrypt(msg1)
print (c1)

def zero_string(x):
    return b"\x00"*x

msg2=b""
msg2=msg2+os.urandom(4)+zero_string(4)
msg2=msg2+os.urandom(8)+zero_string(16)
msg2=msg2+os.urandom(4)+zero_string(3)
msg2=msg2+os.urandom(5)+zero_string(16)
msg2=msg2+os.urandom(4)+zero_string(10)
msg2=msg2+os.urandom(1)+zero_string(12)
msg2=msg2+os.urandom(4)+zero_string(4)
msg2=msg2+os.urandom(3)+zero_string(32)
hexdump.hexdump(msg2)

c2=encrypt(msg2)
print (c2)

The resulting 'random' 'sparse' block is:

00000000: BB 1F 7E B7 00 00 00 00  A3 1C 68 E6 1D 50 D0 82  ..~.......h..P..
00000010: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
00000020: D4 5D 4F C8 00 00 00 B2  43 6C 2A 12 00 00 00 00  .]O.....Cl*.....
00000030: 00 00 00 00 00 00 00 00  00 00 00 00 CF ED 9D 7D  ...............}
00000040: 00 00 00 00 00 00 00 00  00 00 72 00 00 00 00 00  ..........r.....
00000050: 00 00 00 00 00 00 00 D3  41 E8 41 00 00 00 00 B8  ........A.A.....
00000060: 23 70 00 00 00 00 00 00  00 00 00 00 00 00 00 00  #p..............
00000070: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
00000080: 00 00                                             ..

Now let the fun commence. Here we have only two ciphertexts. Nothing else! We don't know key, nonce. Heck, we don't even know encryption algorithm! We just XOR two ciphertexts and dump info.

#!/usr/bin/env python3

# https://pypi.org/project/hexdump/
import hexdump

c1=b'VC\xdb\x8d\xd5\x9b\x95z\x07\x83&\xf9\xbfU1\xb9\x84\xca\xc3\x9e\x06\x9e\x14\xb7}\x16\xe8\x98\xa9\x90!X\xc3;*D2\xb1~\xf6\xb9\xc8{\xd8\xb0B\x92\x8dp\xe5a\xfb\x95fg4l\xb3\x86c\xe1\xab\xc7\xfda\x9c\xdfO\xfc=u\xb9\xff?q.\x03\xd4\xb0\x1c\x80!\x9f\xe5\xdfC\x02\xec\r\xee\xd8n\xba\xe6\xbd\xe0M5|\x1e\xf2\x86Oi\xc8\xa6\xf3\xdad\nwS\xeb_\xecs\x8a\xe7_M\xcc\x15\x1e\xab\xbe#R\xdb\xed='
c2=b'!\xde\x9b!\xf9\xbb\xdcZP\xf5\xad\x88X\x85\xf8"\xa4\xb3\xac\xeb&\xea{\x97\x04y\x9d\xea\x89\xf4N76o\xde9\\\xd5^\xcd\xb4p\xbf\x94\xd4\'\xb2\xfe\x05\x97\x04\xdb\xe1\x0e\x06@L\xfa\xa6\x14\xca>C<\x04\xfd\xb3#\x85\x1d\x14\xd7\xdfP|Df\xb7\xc4<\xefG\xbf\x8c\xb17g@@\x00\x9fN\xce\x89\x9d!\xc3k\\}\x97\xea*\x0b\xba\xc7\x87\xbf\x00*:!\xc5\x7f\xbf\x1b\xef\x953"\xaf~>\xe3\xd1O?\xbe\x9e\x13'

# from https://stackoverflow.com/questions/29408173/byte-operations-xor-in-python
def XOR_two_blocks(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

xored=XOR_two_blocks(c1, c2)
hexdump.hexdump(xored)

And here we can see spots of English texts, at places where zero strings in second message are located:

00000000: 77 9D 40 AC 2C 20 49 20  57 76 8B 71 E7 D0 C9 9B  w.@., I Wv.q....
00000010: 20 79 6F 75 20 74 6F 20  79 6F 75 72 20 64 6F 6F   you to your doo
00000020: F5 54 F4 7D 6E 64 20 3B  0D B8 C4 4C 64 65 20 73  .T.}nd ;...Lde s
00000030: 75 72 65 20 74 68 61 74  20 49 20 77 2B 95 84 C1  ure that I w+...
00000040: 65 61 6C 6C 79 20 61 6E  20 6F 0D 6A 65 63 74 20  eally an o.ject
00000050: 6F 66 20 69 6E 74 65 AC  4D EE 47 20 74 6F 20 C1  of inte.M.G to .
00000060: 8E 5E 20 63 65 6C 65 62  72 61 74 65 64 20 4D 72  .^ celebrated Mr
00000070: 2E 20 53 68 65 72 6C 6F  63 6B 20 48 6F 6C 6D 65  . Sherlock Holme
00000080: 73 2E                                             s.

Two XML files

Got two XML files: 1, 2. First, please take a glance on them. Now encrypt them both with the same key and nonce.

XOR two and dump:

00000000: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
00000010: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
00000020: 00 00 00 00 00 00 00 00  21 33 31 20 27 29 26 4D  ........!31 ')&M
00000030: 7E 7F 4D 59 2D 31 00 00  00 00 1C 46 53 3B 2D 6A  ~.MY-1.....FS;-j
00000040: 46 65 1E 65 4D 4C 07 13  08 45 7C 37 17 00 02 1A  Fe.eML...E|7....
00000050: 10 1B 45 6B 4E 32 2F 38  29 36 02 25 4E 41 4D 45  ..EkN2/8)6.%NAME
00000060: 02 4B 72 74 69 73 68 4E  30 06 01 45 7A 5D 59 4F  .KrtishN0..Ez]YO
00000070: 57 09 13 6E 22 26 20 30  31 00 00 00 00 00 00 00  W..n"& 01.......
...

Nothing interesting so far, but you see a zero header. Why it's here? This is a sign that two plain texts have the same header. And this is true. The header is: "<?xml version="1.0" encoding="UTF-8"?>" plus another opening "<" at the second line. This information may me significant.

Oh, and you see the "NAME" word in clear. This means that one XML has "name" word at this position, and another -- four spaces at the same position. A lowercase character XORed with 0x20 will result in the same character in uppercase. I.e., XOR 0x20 operation switch case of Latin-1 character.

Now we may suspect that XML files have some popular tags. Maybe, "<description>". We will try to XOR this string at all possible positions.

c1=read_bin_file(sys.argv[1])
c2=read_bin_file(sys.argv[2])

# from https://stackoverflow.com/questions/29408173/byte-operations-xor-in-python
def XOR_two_blocks(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

def my_is_printable(c):
    return (c>=0x20 and c<=0x7E) or c==0xa or c==0xd

def all_printable(buf):
    return all(map(my_is_printable, buf))

xored=XOR_two_blocks(c1, c2)
hexdump.hexdump(xored)

def try_str(str_to_try):
    print ("trying string: ", str_to_try)
    for i in range(len(xored)-len(str_to_try)):
        tmp=XOR_two_blocks(xored[i:i+len(str_to_try)], str_to_try)
        if all_printable(tmp):
            print ("position 0x%x" % i)
            hexdump.hexdump(tmp)

try_str(b"<description>")
try_str(b"</description>")

Many noise blocks to be printed, but oops, something new:

...
position 0x7f
00000000: 3C 43 4F 55 4E 54 52 59  3E 55 53 41 3C           <COUNTRY>USA<
...
position 0x155
00000000: 20 20 3C 43 4F 4D 50 41  4E 59 3E 43 42             <COMPANY>CB
...
position 0x1a4
00000000: 33 20 3C 2F 43 44 3E 0A  20 20 3C 43 44           3 </CD>.  <CD
...
position 0x22c
00000000: 20 20 20 20 3C 50 52 49  43 45 3E 39 2E               <PRICE>9.
...
position 0x28d
00000000: 56 3E 0A 20 20 20 20 3C  41 52 54 49 53           V>.    <ARTIS
...
position 0x306
00000000: 45 3E 0A 20 20 20 20 3C  59 45 41 52 3E           E>.    <YEAR>
...
position 0x37d
00000000: 53 0D 4D 48 5E 4F 69 70  74 69 6F 6E 3E           S.MH^Oiption>
...

This information about new XML tags can be used again.

...
position 0x149
00000000: 3C 70 72 69 63 65 3E 0A  20                       <price>.
...
position 0x207
00000000: 2F 2F 6E 61 6D 65 3E 0A  20                       //name>.
...
position 0x8b
00000000: 3E 54 77 6F 20 6F 66 20  6F 75                    >Two of ou
...
position 0x42c
00000000: 61 6C 6F 72 69 65 73 3E  39 35                    alories>95
...
position 0x9a
00000000: 6F 75 73 20 42 65 6C 67  69                       ous Belgi
position 0xac
00000000: 60 20 77 69 74 68 20 70  6C                       ` with pl
...
position 0x395
00000000: 79 6C 65 20 42 72 65 61  6B 66                    yle Breakf
...
position 0x11b
00000000: 62 65 72 72 79 20 42 65                           berry Be
...
position 0x403
00000000: 2D 70 6F 70 75 6C 61 72                           -popular
position 0x414
00000000: 64 6E 73 3C 2F 64 65 73                           dns</des
...
position 0x1ec
00000000: 42 65 72 72 79 2D 42 65  72                       Berry-Ber
...
position 0x402
00000000: 6E 3E 70 6F 70 75 6C 61  72                       n>popular
...
position 0x19c
00000000: 64 20 63 72 65 61 6D                              d cream
...

You see, some English words are surfaced. By the way, you can try popular English words as well. With some effort, you can extract much more information.

Two English plain texts

def encrypt(msg):
    KEY=b'very secret key!'
    ctr = Crypto.Util.Counter.new(128, little_endian=False, initial_value=123456789012345678912345678900)
    t = Crypto.Cipher.AES.new(KEY, Crypto.Cipher.AES.MODE_CTR, counter = ctr)
    return t.encrypt(msg)

# from https://www.gutenberg.org/files/1661/1661-0.txt
msg1=b"Well, I followed you to your door, and so made sure that I was really an object of interest to the celebrated Mr. Sherlock Holmes."
msg2=b"We both thought the best resource was flight, when pursued by so formidable an antagonist; so you will find the nest empty when you"

# from https://stackoverflow.com/questions/29408173/byte-operations-xor-in-python
def XOR_two_blocks(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

c1=encrypt(msg1)
c2=encrypt(msg2)
hexdump.hexdump (XOR_two_blocks(c1, c2))
00000000: 00 00 4C 0E 43 54 21 00  12 07 03 19 08 1F 11 44  ..L.CT!........D
00000010: 54 11 0A 55 42 11 1C 54  59 1D 10 01 4F 11 1D 0C  T..UB..TY...O...
00000020: 17 0C 57 00 1D 44 46 1F  06 47 05 15 48 45 57 1B  ..W..DF..G..HEW.
00000030: 10 1C 45 50 01 1A 12 01  45 2D 00 15 18 53 53 1D  ..EP....E-...SS.
00000040: 45 07 03 1E 14 49 05 0F  42 03 07 4A 04 0D 54 41  E....I..B..J..TA
...

This has the following feature: since all Latin-1 texts missing 7th bit, XORed ciphertexts will miss it too. BTW, notice two zero bytes at the beginning. Indeed, both plain texts starts with "We".

Two Russian plain texts

A Cyrillic letter, when encoded in UTF-8, has the following format of a pair: 'D0 xx' or 'D1 xx'. So a Russian word is UTF-8-encoded as 'Dx xx Dx xx Dx xx Dx xx Dx xx ...", where Dx is D0 or D1. This information may be important.

msg1="Он говорил на том изысканном французском языке, на котором не только говорили, но и думали наши деды, и с теми тихими, покровительственными интонациями,
которые свойственны состаревшемуся в свете и при дворе значительному человеку. Он подошел к Анне Павловне, поцеловал ее руку, подставив ей свою надушенную и сияющую лысину, и покойно уселся на диване."
msg2="Все счастливые семьи похожи друг на друга, каждая несчастливая семья несчастлива по-своему. Все смешалось в доме Облонских. Жена узнала, что муж был в связи с бывшею в их доме француженкою-гувернанткой, и объявила мужу, что не может жить с ним в одном доме. Положение это продолжалось уже третий день и мучительно чувствовалось и самими супругами,"

# from https://stackoverflow.com/questions/29408173/byte-operations-xor-in-python
def XOR_two_blocks(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

c1=encrypt(msg1.encode("utf-8"))
c2=encrypt(msg2.encode("utf-8"))
hexdump.hexdump (XOR_two_blocks(c1, c2))
00000000: 00 0C 01 3C F0 65 93 01  3F 01 35 00 0E 00 01 01  .....e..?.5.....
00000010: 3A 00 00 F0 68 6D 62 61  AB 01 37 F0 6F 51 6C 95  :...hmba..7.oQl.
00000020: 00 04 01 3B 01 33 F1 51  6F 6A 6E 61 55 6D 6E 6D  ...;.3.QojnaUmnm
00000030: 66 6E 68 9C F0 65 55 51  51 53 60 63 9D 01 3B 01  fnh..eUQQS`c..;.
00000040: 33 F0 67 65 50 50 6B 53  6E 63 6C 90 FD AF 00 0D  3.gePPkSncl.....
00000050: 01 3B 00 0C 00 01 FC 90  01 32 F0 60 9D 00 0F 01  .;.......2.`....
00000060: 3F 00 05 00 0E 00 01 01  3C 00 07 F0 68 6D 62 65  ?...........hmbe
00000070: 90 00 0D F0 6F 51 6B 64  5C 6C 6B 5C 6F AF F0 63  ....oQkd\lk\o..c
00000080: 6D 6E 65 63 51 6F 56 50  60 69 51 6A 52 68 97 F0  mnecQoVP`iQjRh..
...

You'll see that there are many long strings, consisting of pairs like '00 xx' or '01 xx', where xx has two high bits stripped, i.e., xx<=0x3f. Indeed, two-bytes UTF-8 encoding is: '110xxxxx 10xxxxxx'. This informatiom may be important. We can deduce that at these positions there are Russian words, in both plain texts. These 'strings' are interrupted by other bytes -- that may mean that there are spaces, punctuation marks, Latin-1 characters. Or, two plain texts are 'out of sync' -- first UTF-8 pair is encoded at odd positions, second pair at even.

Other writing systems may require 3-byte UTF-8, and analysis may be even easier.

Moral of the story

An attacker can record all your traffic with a hope you'll once reuse nonce, maybe, mistakenly. He/she can patiently wait for years. And just try all packet pairs to find something familiar, like English words or magic numbers, etc. No significant computational resources are required to XOR two blocks and find strings in results.

This problem is similar to using one-time pad:

In particular, one-time use is absolutely necessary. If a one-time pad is used just twice, simple mathematical operations can reduce it to a running key cipher.
...
If both plaintexts are in a natural language (e.g., English or Russian), each stands a very high chance of being recovered by heuristic cryptanalysis, with possibly a few ambiguities. Of course, a longer message can only be broken for the portion that overlaps a shorter message, plus perhaps a little more by completing a word or phrase. The most famous exploit of this vulnerability occurred with the Venona project.

( https://en.wikipedia.org/wiki/One-time_pad#True_randomness )

All the files

Here

(the post first published at 20220831.)


List of my other blog posts.

Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.