I had to implement it by myself:
Whenever you call a function like finalize() or get_digest(), you think you get a hash of a buffer. But this can be rephrased. Actually, you get an internal state of hash function. It simply, copies all bits/bytes, as is.
And there was no problem with that. But this property lead to unexpected and undesired outcome: that state can be restored back to SHA1 function. And then you can continue hashing process. Without even knowing, what exactly has been hashed before.
Another little known fact: these hash functions use padding. When you call finalize function, the input buffer is aligned by 64-byte boundary. Or padded. The first byte of padding is 0x80. The last 8 bytes (64 bits) is replaced by count of bits processed. Other bytes are 0x00.
Let's see a usual SHA1 final function:
/* * SHA1 finalization. Ends an SHA1 message-digest operation, writing the * the message digest and zeroizing the context. */ void SHA1Final(void *digest, SHA1_CTX *context) { unsigned char bits[8]; u_int32_t index = (context->bcount[1] >> 3) & 0x3f; /* Save number of bits */ Encode(bits, context->bcount, 8); /* Pad out to 56 mod 64. */ SHA1Update(context, PADDING, ((index < 56) ? 56 : 120) - index); /* Append length (before padding) */ SHA1Update(context, bits, 8); /* Store state in digest */ Encode(digest, context->state, 20); /* Zeroize sensitive information. */ memset(context, 0, sizeof (*context)); }
( src )
You see, number of bits saved, padding added, and state is copied to 'digest' function argument.
Given that, I'll call the internal SHA1 function (that doesn't use padding) as sha1_core(). This function is very much like the pseudocode from Wikipedia, where padding operation is omitted.
Then I found some toyish pure Python implementation of SHA1 and reworked it a bit. I divided it to sha1_core() and padding operation. Also, I added another feature -- a state can be loaded during initialization.
#!/usr/bin/env python3 import struct, hexdump, hashlib, binascii # Stolen from https://codereview.stackexchange.com/questions/37648/python-implementation-of-sha1 # And reworked def sha1_core(init_hash, padded_data): print ("sha1_core(), loading state", init_hash) assert len(init_hash)==8*5 idx=0 h0=int(init_hash[idx:idx+8], 16) idx=idx+8 h1=int(init_hash[idx:idx+8], 16) idx=idx+8 h2=int(init_hash[idx:idx+8], 16) idx=idx+8 h3=int(init_hash[idx:idx+8], 16) idx=idx+8 h4=int(init_hash[idx:idx+8], 16) assert '%08x%08x%08x%08x%08x' % (h0, h1, h2, h3, h4) == init_hash def rol(n, b): return ((n << b) | (n >> (32 - b))) & 0xffffffff print ("sha1_core(), padded_data:") hexdump.hexdump(padded_data) print ("sha1_core(), len(padded_data)", len(padded_data)) thunks = [padded_data[i:i+64] for i in range(0, len(padded_data), 64)] for thunk in thunks: w = list(struct.unpack('>16L', thunk)) + [0] * 64 for i in range(16, 80): w[i] = rol((w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16]), 1) a, b, c, d, e = h0, h1, h2, h3, h4 # Main loop for i in range(0, 80): if 0 <= i < 20: f = (b & c) | ((~b) & d) k = 0x5A827999 elif 20 <= i < 40: f = b ^ c ^ d k = 0x6ED9EBA1 elif 40 <= i < 60: f = (b & c) | (b & d) | (c & d) k = 0x8F1BBCDC elif 60 <= i < 80: f = b ^ c ^ d k = 0xCA62C1D6 a, b, c, d, e = rol(a, 5) + f + e + k + w[i] & 0xffffffff, \ a, rol(b, 30), c, d h0 = h0 + a & 0xffffffff h1 = h1 + b & 0xffffffff h2 = h2 + c & 0xffffffff h3 = h3 + d & 0xffffffff h4 = h4 + e & 0xffffffff result='%08x%08x%08x%08x%08x' % (h0, h1, h2, h3, h4) print ("sha1_core() result", result) return result # from http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-3.html (3.3) def align2grain (i, grain): return ((i + grain-1) & ~(grain-1)) def calc_pad_bytes_to_add(size): return align2grain(size, 64)-size def add_padding(data, size): return data + b"\x80" + b"\x00"*(calc_pad_bytes_to_add(size)-(8+1)) + struct.pack('>Q', 8 * size) def sha1(init_hash, data): return sha1_core(init_hash, add_padding(data, len(data))) sha1_standard_init_state="67452301efcdab8998badcfe10325476c3d2e1f0" # self-test # TODO try longer strings TEST_MSG=b"hello, world" m=hashlib.sha1() buf_to_hash=TEST_MSG hexdump.hexdump(buf_to_hash) m.update(buf_to_hash) hash1=m.hexdigest() hash2=sha1(sha1_standard_init_state, TEST_MSG) assert hash1==hash2
Take a look on what is hashed by sha1_core() instead of the 'hello, world' string:
sha1_core(), loading state 67452301efcdab8998badcfe10325476c3d2e1f0 sha1_core(), padded_data: 00000000: 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64 80 00 00 00 hello, world.... 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 60 ...............` sha1_core(), len(padded_data) 64 sha1_core() result b7e23ec29af22b0b4e41da31e868d57226121c84
You see, the input message is aligned by 64 (0x40) bytes. You see the 0x80 byte after the input string. You see the last byte 0x60 (96), which is number of bits. I.e., 96/8=12 (input) bytes.
I would say that a MAC function is kind of 'symmetric' signature (as opposed to public-key signature). In a sense that sender and receiver shares the same key, unknown to attacker.
This is how MAC can be implemented naively. A sender 'signs' a packet with MAC: H(key || message). This is also called 'the secret-prefix construction' [mentioned at least in Jean-Philippe Aumasson -- Serious Cryptography: A Practical Introduction to Modern Encryption].
A receiver checks that 'signature'.
sender.py:
#!/usr/bin/env python3 import hashlib # Secret key known only to sender and receiver: secret_key=b"MegaSuperKey" def naive_MAC(key, message): m=hashlib.sha1() m.update(key) m.update(message) return m.digest() message=b"Some plaintext visible to attacker." f=open("packet.bin", "wb") f.write(message + naive_MAC(secret_key, message)) f.close()
The output binary file may looks like:
% xxd -g 1 packet.bin 00000000: 53 6f 6d 65 20 70 6c 61 69 6e 74 65 78 74 20 76 Some plaintext v 00000010: 69 73 69 62 6c 65 20 74 6f 20 61 74 74 61 63 6b isible to attack 00000020: 65 72 2e 14 f4 35 11 18 bb 10 02 1c 31 19 92 81 er...5......1... 00000030: 72 7c c2 9c 26 99 f1
You see the original message and 20 bytes of SHA1 at the end.
receiver.py:
#!/usr/bin/env python3 import hashlib # Secret key known only to sender and receiver: secret_key=b"MegaSuperKey" def naive_MAC(key, message): m=hashlib.sha1() m.update(key) m.update(message) return m.digest() f=open("packet.bin", "rb") buf=f.read() f.close() MAC_size=20 # SHA1 # Cut off MAC: MAC_from_file=buf[-MAC_size:] message=buf[0:len(buf)-MAC_size] # Check MAC: if naive_MAC(secret_key, message)==MAC_from_file: print ("MAC is correct") else: print ("Error: MAC is NOT correct!")
Now the question -- can an attacker in middle (during MITM attack) alter the packet so that receiver wouldn't notice MAC mismatch? Without knowing the secret key?
An attacker can't alter the message, but he can add some info after it. Because the attacker knows the MAC, it can reload it to sha1_core() and continue hashing.
This code does it. It's a bit hairy, because it needs to calculate padding sizes and lengths (in bits) that are stored in two packets. But there is nothing more hard or complex.
f=open("packet.bin", "rb") buf=f.read() f.close() MAC_size=20 # SHA1 # Cut off MAC: MAC_from_file=buf[-MAC_size:] message=buf[0:len(buf)-MAC_size] secret_key_len=12 # We know message length # We also know that the secret key has 12 bytes # So we can easily reconstruct the buffer that was passed to sha1_core() # But not secret key. Here it is shown as a string of "???..." original_buf_len=secret_key_len+len(message) original_buf_padded=add_padding(b"?"*secret_key_len + message, original_buf_len) print ("original_buf_padded") hexdump.hexdump(original_buf_padded) # We know that after hashing buf via sha1_core(), its state was MAC_from_file MAC_from_file_hex=binascii.hexlify(MAC_from_file).decode("utf-8") data_we_add=b"This is added by attacker." # Load that state and hash the data we want to add # But the final padding block must contain length of the whole buffer original_buf_padded_plus_our_data_len=len(original_buf_padded)+len(data_we_add) # As of padding: 8 bytes for final 64-bit word, 1 byte for 0x80, other bytes are 0x00 # 2nd argument: length in bytes. But it's embedded in padding block as a number of bits. def construct_padding(padding_len, length_at_the_end): return b"\x80" + b"\x00"*(padding_len-(8+1)) + struct.pack('>Q', length_at_the_end*8) pad_len=calc_pad_bytes_to_add(original_buf_padded_plus_our_data_len) new_MAC=sha1_core(MAC_from_file_hex, data_we_add + construct_padding(pad_len, original_buf_padded_plus_our_data_len)) print ("new_MAC", new_MAC) # Now construct a new message: pad_len=calc_pad_bytes_to_add(secret_key_len+len(message)) original_key_and_msg_len=secret_key_len+len(message) new_buf=message + construct_padding(pad_len, original_key_and_msg_len) + data_we_add print ("new_buf") hexdump.hexdump(new_buf) f=open("packet.bin", "wb") f.write(new_buf + bytes.fromhex(new_MAC)) f.close()
The attacker knows MAC and the corresponding buffer:
original_buf_padded 00000000: 3F 3F 3F 3F 3F 3F 3F 3F 3F 3F 3F 3F 53 6F 6D 65 ????????????Some 00000010: 20 70 6C 61 69 6E 74 65 78 74 20 76 69 73 69 62 plaintext visib 00000020: 6C 65 20 74 6F 20 61 74 74 61 63 6B 65 72 2E 80 le to attacker.. 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 78 ...............x
The '???...' string is in place of unknown 12-byte secret key.
Now the attacker calculates padding for his new 'addition'. This is important -- total number of bits in this message must count also number of bits in the original block (padded).
00000000: 54 68 69 73 20 69 73 20 61 64 64 65 64 20 62 79 This is added by 00000010: 20 61 74 74 61 63 6B 65 72 2E 80 00 00 00 00 00 attacker....... 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 D0 ................
Reconstruct new buffer with new MAC (the sha1_core() state after hashing two padded buffers):
new_MAC 3d9ae2467751fe8c8eaf64e1ae2e6f24d0338da5 new_buf 00000000: 53 6F 6D 65 20 70 6C 61 69 6E 74 65 78 74 20 76 Some plaintext v 00000010: 69 73 69 62 6C 65 20 74 6F 20 61 74 74 61 63 6B isible to attack 00000020: 65 72 2E 80 00 00 00 00 00 00 00 00 00 00 00 00 er.............. 00000030: 00 00 01 78 54 68 69 73 20 69 73 20 61 64 64 65 ...xThis is adde 00000040: 64 20 62 79 20 61 74 74 61 63 6B 65 72 2E d by attacker.
The resulting packet.bin file:
% xxd -g 1 packet.bin 00000000: 53 6f 6d 65 20 70 6c 61 69 6e 74 65 78 74 20 76 Some plaintext v 00000010: 69 73 69 62 6c 65 20 74 6f 20 61 74 74 61 63 6b isible to attack 00000020: 65 72 2e 80 00 00 00 00 00 00 00 00 00 00 00 00 er.............. 00000030: 00 00 01 78 54 68 69 73 20 69 73 20 61 64 64 65 ...xThis is adde 00000040: 64 20 62 79 20 61 74 74 61 63 6b 65 72 2e 3d 9a d by attacker.=. 00000050: e2 46 77 51 fe 8c 8e af 64 e1 ae 2e 6f 24 d0 33 .FwQ....d...o$.3 00000060: 8d a5
The receiver.py doesn't report any error!
Yes, padding had to be added. This cannot be overcome. However, the attacker can insert anything after the padding.
Funny thing, you can extend recursively. By running extend.py several times. I did so -- for testing. It must work.
At least three ways.
1. Hash truncation. Truncate output hash and the attacker couldn't reload the state. But brute-force is still theoretically possible. Truncated versions of SHA2 are: SHA-224, SHA-384, SHA-512/224 and SHA-512/256.
2. SHA3 (Keccak). Protected by design.
3. 'Serious' MAC function. Can be used with older hash functions. Today, HMAC is popular. Let's see how it's done (in basic form):
H(key || H(key || message))
( Wikipedia )
H() function called two times. And there is nothing to 'extend'. HMAC function looks so weird also because it is constructed to protect against this attack.
Among major high-profile protocols, HMAC is used at least in SSH and TLS, to 'sign' packets.
LEA is the attack against 'naive' MAC of the form: H(key || message).
Vulnerable hash functions: MD5, SHA1, SHA2 (untruncated).
You see, my code works on assumption that the secret key has 12 bytes. If the attacker don't know this for sure, it can just try all key sizes -- this is trivial.
I found this reading helpful: 1, 2.
Here. Python code.
Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.