To understand CTR mode, keep in mind this scheme from Wikipedia.
When the same key and nonce is used, ciphertext1 XOR ciphertext2 = plaintext1 XOR plaintext2.
First example. There are two messages. One is in plain English. Second is what I call 'sparse packet' -- many zero bytes 'on background' with spots of random data. Packets like these are abundant in binary protocols and file formats. Including SSH. Both messages are to be encrypted with the same key and nonce.
def encrypt(msg): KEY=b'very secret key!' ctr = Crypto.Util.Counter.new(128, little_endian=False, initial_value=123456789012345678912345678900) t = Crypto.Cipher.AES.new(KEY, Crypto.Cipher.AES.MODE_CTR, counter = ctr) return t.encrypt(msg) # from https://www.gutenberg.org/files/1661/1661-0.txt msg1=b"Well, I followed you to your door, and so made sure that I was really an object of interest to the celebrated Mr. Sherlock Holmes." c1=encrypt(msg1) print (c1) def zero_string(x): return b"\x00"*x msg2=b"" msg2=msg2+os.urandom(4)+zero_string(4) msg2=msg2+os.urandom(8)+zero_string(16) msg2=msg2+os.urandom(4)+zero_string(3) msg2=msg2+os.urandom(5)+zero_string(16) msg2=msg2+os.urandom(4)+zero_string(10) msg2=msg2+os.urandom(1)+zero_string(12) msg2=msg2+os.urandom(4)+zero_string(4) msg2=msg2+os.urandom(3)+zero_string(32) hexdump.hexdump(msg2) c2=encrypt(msg2) print (c2)
The resulting 'random' 'sparse' block is:
00000000: BB 1F 7E B7 00 00 00 00 A3 1C 68 E6 1D 50 D0 82 ..~.......h..P.. 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000020: D4 5D 4F C8 00 00 00 B2 43 6C 2A 12 00 00 00 00 .]O.....Cl*..... 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 CF ED 9D 7D ...............} 00000040: 00 00 00 00 00 00 00 00 00 00 72 00 00 00 00 00 ..........r..... 00000050: 00 00 00 00 00 00 00 D3 41 E8 41 00 00 00 00 B8 ........A.A..... 00000060: 23 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00 #p.............. 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000080: 00 00 ..
Now let the fun commence. Here we have only two ciphertexts. Nothing else! We don't know key, nonce. Heck, we don't even know encryption algorithm! We just XOR two ciphertexts and dump info.
#!/usr/bin/env python3 # https://pypi.org/project/hexdump/ import hexdump c1=b'VC\xdb\x8d\xd5\x9b\x95z\x07\x83&\xf9\xbfU1\xb9\x84\xca\xc3\x9e\x06\x9e\x14\xb7}\x16\xe8\x98\xa9\x90!X\xc3;*D2\xb1~\xf6\xb9\xc8{\xd8\xb0B\x92\x8dp\xe5a\xfb\x95fg4l\xb3\x86c\xe1\xab\xc7\xfda\x9c\xdfO\xfc=u\xb9\xff?q.\x03\xd4\xb0\x1c\x80!\x9f\xe5\xdfC\x02\xec\r\xee\xd8n\xba\xe6\xbd\xe0M5|\x1e\xf2\x86Oi\xc8\xa6\xf3\xdad\nwS\xeb_\xecs\x8a\xe7_M\xcc\x15\x1e\xab\xbe#R\xdb\xed=' c2=b'!\xde\x9b!\xf9\xbb\xdcZP\xf5\xad\x88X\x85\xf8"\xa4\xb3\xac\xeb&\xea{\x97\x04y\x9d\xea\x89\xf4N76o\xde9\\\xd5^\xcd\xb4p\xbf\x94\xd4\'\xb2\xfe\x05\x97\x04\xdb\xe1\x0e\x06@L\xfa\xa6\x14\xca>C<\x04\xfd\xb3#\x85\x1d\x14\xd7\xdfP|Df\xb7\xc4<\xefG\xbf\x8c\xb17g@@\x00\x9fN\xce\x89\x9d!\xc3k\\}\x97\xea*\x0b\xba\xc7\x87\xbf\x00*:!\xc5\x7f\xbf\x1b\xef\x953"\xaf~>\xe3\xd1O?\xbe\x9e\x13' # from https://stackoverflow.com/questions/29408173/byte-operations-xor-in-python def XOR_two_blocks(x, y): return bytes(a ^ b for a, b in zip(x, y)) xored=XOR_two_blocks(c1, c2) hexdump.hexdump(xored)
And here we can see spots of English texts, at places where zero strings in second message are located:
00000000: 77 9D 40 AC 2C 20 49 20 57 76 8B 71 E7 D0 C9 9B w.@., I Wv.q.... 00000010: 20 79 6F 75 20 74 6F 20 79 6F 75 72 20 64 6F 6F you to your doo 00000020: F5 54 F4 7D 6E 64 20 3B 0D B8 C4 4C 64 65 20 73 .T.}nd ;...Lde s 00000030: 75 72 65 20 74 68 61 74 20 49 20 77 2B 95 84 C1 ure that I w+... 00000040: 65 61 6C 6C 79 20 61 6E 20 6F 0D 6A 65 63 74 20 eally an o.ject 00000050: 6F 66 20 69 6E 74 65 AC 4D EE 47 20 74 6F 20 C1 of inte.M.G to . 00000060: 8E 5E 20 63 65 6C 65 62 72 61 74 65 64 20 4D 72 .^ celebrated Mr 00000070: 2E 20 53 68 65 72 6C 6F 63 6B 20 48 6F 6C 6D 65 . Sherlock Holme 00000080: 73 2E s.
Got two XML files: 1, 2. First, please take a glance on them. Now encrypt them both with the same key and nonce.
XOR two and dump:
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000020: 00 00 00 00 00 00 00 00 21 33 31 20 27 29 26 4D ........!31 ')&M 00000030: 7E 7F 4D 59 2D 31 00 00 00 00 1C 46 53 3B 2D 6A ~.MY-1.....FS;-j 00000040: 46 65 1E 65 4D 4C 07 13 08 45 7C 37 17 00 02 1A Fe.eML...E|7.... 00000050: 10 1B 45 6B 4E 32 2F 38 29 36 02 25 4E 41 4D 45 ..EkN2/8)6.%NAME 00000060: 02 4B 72 74 69 73 68 4E 30 06 01 45 7A 5D 59 4F .KrtishN0..Ez]YO 00000070: 57 09 13 6E 22 26 20 30 31 00 00 00 00 00 00 00 W..n"& 01....... ...
Nothing interesting so far, but you see a zero header. Why it's here? This is a sign that two plain texts have the same header. And this is true. The header is: "<?xml version="1.0" encoding="UTF-8"?>" plus another opening "<" at the second line. This information may me significant.
Oh, and you see the "NAME" word in clear. This means that one XML has "name" word at this position, and another -- four spaces at the same position. A lowercase character XORed with 0x20 will result in the same character in uppercase. I.e., XOR 0x20 operation switch case of Latin-1 character.
Now we may suspect that XML files have some popular tags. Maybe, "<description>". We will try to XOR this string at all possible positions.
c1=read_bin_file(sys.argv[1]) c2=read_bin_file(sys.argv[2]) # from https://stackoverflow.com/questions/29408173/byte-operations-xor-in-python def XOR_two_blocks(x, y): return bytes(a ^ b for a, b in zip(x, y)) def my_is_printable(c): return (c>=0x20 and c<=0x7E) or c==0xa or c==0xd def all_printable(buf): return all(map(my_is_printable, buf)) xored=XOR_two_blocks(c1, c2) hexdump.hexdump(xored) def try_str(str_to_try): print ("trying string: ", str_to_try) for i in range(len(xored)-len(str_to_try)): tmp=XOR_two_blocks(xored[i:i+len(str_to_try)], str_to_try) if all_printable(tmp): print ("position 0x%x" % i) hexdump.hexdump(tmp) try_str(b"<description>") try_str(b"</description>")
Many noise blocks to be printed, but oops, something new:
... position 0x7f 00000000: 3C 43 4F 55 4E 54 52 59 3E 55 53 41 3C <COUNTRY>USA< ... position 0x155 00000000: 20 20 3C 43 4F 4D 50 41 4E 59 3E 43 42 <COMPANY>CB ... position 0x1a4 00000000: 33 20 3C 2F 43 44 3E 0A 20 20 3C 43 44 3 </CD>. <CD ... position 0x22c 00000000: 20 20 20 20 3C 50 52 49 43 45 3E 39 2E <PRICE>9. ... position 0x28d 00000000: 56 3E 0A 20 20 20 20 3C 41 52 54 49 53 V>. <ARTIS ... position 0x306 00000000: 45 3E 0A 20 20 20 20 3C 59 45 41 52 3E E>. <YEAR> ... position 0x37d 00000000: 53 0D 4D 48 5E 4F 69 70 74 69 6F 6E 3E S.MH^Oiption> ...
This information about new XML tags can be used again.
... position 0x149 00000000: 3C 70 72 69 63 65 3E 0A 20 <price>. ... position 0x207 00000000: 2F 2F 6E 61 6D 65 3E 0A 20 //name>. ... position 0x8b 00000000: 3E 54 77 6F 20 6F 66 20 6F 75 >Two of ou ... position 0x42c 00000000: 61 6C 6F 72 69 65 73 3E 39 35 alories>95 ... position 0x9a 00000000: 6F 75 73 20 42 65 6C 67 69 ous Belgi position 0xac 00000000: 60 20 77 69 74 68 20 70 6C ` with pl ... position 0x395 00000000: 79 6C 65 20 42 72 65 61 6B 66 yle Breakf ... position 0x11b 00000000: 62 65 72 72 79 20 42 65 berry Be ... position 0x403 00000000: 2D 70 6F 70 75 6C 61 72 -popular position 0x414 00000000: 64 6E 73 3C 2F 64 65 73 dns</des ... position 0x1ec 00000000: 42 65 72 72 79 2D 42 65 72 Berry-Ber ... position 0x402 00000000: 6E 3E 70 6F 70 75 6C 61 72 n>popular ... position 0x19c 00000000: 64 20 63 72 65 61 6D d cream ...
You see, some English words are surfaced. By the way, you can try popular English words as well. With some effort, you can extract much more information.
def encrypt(msg): KEY=b'very secret key!' ctr = Crypto.Util.Counter.new(128, little_endian=False, initial_value=123456789012345678912345678900) t = Crypto.Cipher.AES.new(KEY, Crypto.Cipher.AES.MODE_CTR, counter = ctr) return t.encrypt(msg) # from https://www.gutenberg.org/files/1661/1661-0.txt msg1=b"Well, I followed you to your door, and so made sure that I was really an object of interest to the celebrated Mr. Sherlock Holmes." msg2=b"We both thought the best resource was flight, when pursued by so formidable an antagonist; so you will find the nest empty when you" # from https://stackoverflow.com/questions/29408173/byte-operations-xor-in-python def XOR_two_blocks(x, y): return bytes(a ^ b for a, b in zip(x, y)) c1=encrypt(msg1) c2=encrypt(msg2) hexdump.hexdump (XOR_two_blocks(c1, c2))
00000000: 00 00 4C 0E 43 54 21 00 12 07 03 19 08 1F 11 44 ..L.CT!........D 00000010: 54 11 0A 55 42 11 1C 54 59 1D 10 01 4F 11 1D 0C T..UB..TY...O... 00000020: 17 0C 57 00 1D 44 46 1F 06 47 05 15 48 45 57 1B ..W..DF..G..HEW. 00000030: 10 1C 45 50 01 1A 12 01 45 2D 00 15 18 53 53 1D ..EP....E-...SS. 00000040: 45 07 03 1E 14 49 05 0F 42 03 07 4A 04 0D 54 41 E....I..B..J..TA ...
This has the following feature: since all Latin-1 texts missing 7th bit, XORed ciphertexts will miss it too. BTW, notice two zero bytes at the beginning. Indeed, both plain texts starts with "We".
A Cyrillic letter, when encoded in UTF-8, has the following format of a pair: 'D0 xx' or 'D1 xx'. So a Russian word is UTF-8-encoded as 'Dx xx Dx xx Dx xx Dx xx Dx xx ...", where Dx is D0 or D1. This information may be important.
msg1="Он говорил на том изысканном французском языке, на котором не только говорили, но и думали наши деды, и с теми тихими, покровительственными интонациями, которые свойственны состаревшемуся в свете и при дворе значительному человеку. Он подошел к Анне Павловне, поцеловал ее руку, подставив ей свою надушенную и сияющую лысину, и покойно уселся на диване." msg2="Все счастливые семьи похожи друг на друга, каждая несчастливая семья несчастлива по-своему. Все смешалось в доме Облонских. Жена узнала, что муж был в связи с бывшею в их доме француженкою-гувернанткой, и объявила мужу, что не может жить с ним в одном доме. Положение это продолжалось уже третий день и мучительно чувствовалось и самими супругами," # from https://stackoverflow.com/questions/29408173/byte-operations-xor-in-python def XOR_two_blocks(x, y): return bytes(a ^ b for a, b in zip(x, y)) c1=encrypt(msg1.encode("utf-8")) c2=encrypt(msg2.encode("utf-8")) hexdump.hexdump (XOR_two_blocks(c1, c2))
00000000: 00 0C 01 3C F0 65 93 01 3F 01 35 00 0E 00 01 01 .....e..?.5..... 00000010: 3A 00 00 F0 68 6D 62 61 AB 01 37 F0 6F 51 6C 95 :...hmba..7.oQl. 00000020: 00 04 01 3B 01 33 F1 51 6F 6A 6E 61 55 6D 6E 6D ...;.3.QojnaUmnm 00000030: 66 6E 68 9C F0 65 55 51 51 53 60 63 9D 01 3B 01 fnh..eUQQS`c..;. 00000040: 33 F0 67 65 50 50 6B 53 6E 63 6C 90 FD AF 00 0D 3.gePPkSncl..... 00000050: 01 3B 00 0C 00 01 FC 90 01 32 F0 60 9D 00 0F 01 .;.......2.`.... 00000060: 3F 00 05 00 0E 00 01 01 3C 00 07 F0 68 6D 62 65 ?...........hmbe 00000070: 90 00 0D F0 6F 51 6B 64 5C 6C 6B 5C 6F AF F0 63 ....oQkd\lk\o..c 00000080: 6D 6E 65 63 51 6F 56 50 60 69 51 6A 52 68 97 F0 mnecQoVP`iQjRh.. ...
You'll see that there are many long strings, consisting of pairs like '00 xx' or '01 xx', where xx has two high bits stripped, i.e., xx<=0x3f. Indeed, two-bytes UTF-8 encoding is: '110xxxxx 10xxxxxx'. This informatiom may be important. We can deduce that at these positions there are Russian words, in both plain texts. These 'strings' are interrupted by other bytes -- that may mean that there are spaces, punctuation marks, Latin-1 characters. Or, two plain texts are 'out of sync' -- first UTF-8 pair is encoded at odd positions, second pair at even.
Other writing systems may require 3-byte UTF-8, and analysis may be even easier.
An attacker can record all your traffic with a hope you'll once reuse nonce, maybe, mistakenly. He/she can patiently wait for years. And just try all packet pairs to find something familiar, like English words or magic numbers, etc. No significant computational resources are required to XOR two blocks and find strings in results.
This problem is similar to using one-time pad:
In particular, one-time use is absolutely necessary. If a one-time pad is used just twice, simple mathematical operations can reduce it to a running key cipher. ... If both plaintexts are in a natural language (e.g., English or Russian), each stands a very high chance of being recovered by heuristic cryptanalysis, with possibly a few ambiguities. Of course, a longer message can only be broken for the portion that overlaps a shorter message, plus perhaps a little more by completing a word or phrase. The most famous exploit of this vulnerability occurred with the Venona project.
( https://en.wikipedia.org/wiki/One-time_pad#True_randomness )
Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.