[Crypto] Ethereum keystore - what is in it?

These are my notes about the Ethereum keystore. I just created a key (or address) with the password '1234'. Feel free to use it, but there is no ETH on this address.

 % ./geth account new
...
Your new account is locked with a password. Please give a password. Do not forget this password.
Password:
Repeat password:

Your new key was generated

Public address of the key:   0xEC1dC9454d4727294CFB22c341F5560F5293b974
Path of the secret key file: /home/i/.ethereum/keystore/UTC--2022-11-30T18-28-00.028017872Z--ec1dc9454d4727294cfb22c341f5560f5293b974

...

What is in this file?

{"address":"ec1dc9454d4727294cfb22c341f5560f5293b974","crypto":{"cipher":"aes-128-ctr","ciphertext":"33391ffea38bffa499221c34f15ebd8802544c1e357c545044235aa1e2e5d18c","cipherparams":{"iv":"796d401810173d1b2ada5ced48018e14"},"kdf":"scrypt","kdfparams":{"dklen":32,"n":262144,"p":1,"r":8,"salt":"2b4615e610fe3eada89c884fbb34f4894fdd348345dcac43ada4062041ca5176"},"mac":"fe231144df6d3568ee5caecd4c62ac9d0169af99c55e43937aa626ca12ff7eb4"},"id":"ff9f7c5d-d65c-4aa2-9374-d56d4f37a682","version":3}

Tidy version:

% cat UTC--2022-11-30T18-28-00.028017872Z--ec1dc9454d4727294cfb22c341f5560f5293b974 | jq
{
  "address": "ec1dc9454d4727294cfb22c341f5560f5293b974",
  "crypto": {
    "cipher": "aes-128-ctr",
    "ciphertext": "33391ffea38bffa499221c34f15ebd8802544c1e357c545044235aa1e2e5d18c",
    "cipherparams": {
      "iv": "796d401810173d1b2ada5ced48018e14"
    },
    "kdf": "scrypt",
    "kdfparams": {
      "dklen": 32,
      "n": 262144,
      "p": 1,
      "r": 8,
      "salt": "2b4615e610fe3eada89c884fbb34f4894fdd348345dcac43ada4062041ca5176"
    },
    "mac": "fe231144df6d3568ee5caecd4c62ac9d0169af99c55e43937aa626ca12ff7eb4"
  },
  "id": "ff9f7c5d-d65c-4aa2-9374-d56d4f37a682",
  "version": 3
}

Let's see what each field means.
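Before going field by field, it helps to see how big each hex-encoded value really is. The snippet below is only a small sketch: it parses the same JSON as above (pasted in as a string) and prints the decoded length of every hex field. The helper name hex_len is mine, not part of any keystore tooling.

```python
import json

# The keystore file shown above, pasted verbatim.
KEYSTORE = '''{"address":"ec1dc9454d4727294cfb22c341f5560f5293b974","crypto":{"cipher":"aes-128-ctr","ciphertext":"33391ffea38bffa499221c34f15ebd8802544c1e357c545044235aa1e2e5d18c","cipherparams":{"iv":"796d401810173d1b2ada5ced48018e14"},"kdf":"scrypt","kdfparams":{"dklen":32,"n":262144,"p":1,"r":8,"salt":"2b4615e610fe3eada89c884fbb34f4894fdd348345dcac43ada4062041ca5176"},"mac":"fe231144df6d3568ee5caecd4c62ac9d0169af99c55e43937aa626ca12ff7eb4"},"id":"ff9f7c5d-d65c-4aa2-9374-d56d4f37a682","version":3}'''

keystore = json.loads(KEYSTORE)
crypto = keystore["crypto"]

def hex_len(field):
    """Length in bytes of a hex-encoded field (hypothetical helper)."""
    return len(bytes.fromhex(field))

sizes = {
    "address": hex_len(keystore["address"]),
    "ciphertext": hex_len(crypto["ciphertext"]),
    "iv": hex_len(crypto["cipherparams"]["iv"]),
    "salt": hex_len(crypto["kdfparams"]["salt"]),
    "mac": hex_len(crypto["mac"]),
}
for name, size in sizes.items():
    print(f"{name:10} {size} bytes")
```

The ciphertext is exactly 32 bytes, the size of a secp256k1 private key, because CTR mode adds no padding. The IV is 16 bytes, one AES block.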

scrypt

You have probably heard about GPU/FPGA/ASIC mining rigs. Hash functions like SHA-1/SHA-2 run much faster on these platforms, because many independent hash computations can easily be run in parallel.

This is where so-called 'memory-hard functions' (MHFs) come into play. They use RAM extensively.

All these devices (CPU, GPU, FPGA and ASIC) can use DDR RAM without problems. But! RAM accesses are much harder to parallelize, so RAM becomes the bottleneck. No matter how much faster your GPU/FPGA/ASIC is, its DDR RAM performs about the same as if it were attached to a mainstream CPU.

This levels the field for all CPU/GPU/FPGA/ASIC owners/users. Basically, this is protection against brute-force attacks.

Well-known examples are scrypt and Argon2.
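How much RAM are we talking about? As a rough sketch: scrypt's big scratch array holds n blocks of 128*r bytes each, so the memory needed is about 128 * n * r bytes. The function name below is my own; the n and r values are the ones from the keystore file above.

```python
def scrypt_memory_bytes(n, r):
    # scrypt's scratch array holds n blocks of 128*r bytes each
    # (small per-call overhead is ignored here).
    return 128 * r * n

needed = scrypt_memory_bytes(n=262144, r=8)
print(needed, "bytes =", needed // 2**20, "MiB")
```

So every single password guess against this keystore needs roughly 256 MiB of RAM.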

Let's measure scrypt performance:

#!/usr/bin/env python3

import hashlib

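def key(password):
    # Same scrypt parameters as in the keystore file above.
    return hashlib.scrypt(
        bytes(password, 'utf-8'),
        salt=bytes("12345", "utf-8"),
        n=262144,
        r=8,
        p=1,
        maxmem=2000000000,  # n=262144 with r=8 needs about 256 MiB of RAM
        dklen=32
        )

# Call scrypt 60 times so that the total run time is easy to measure.
for _ in range(60):
    key("password")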

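My venerable Intel(R) Xeon(R) CPU E31220 @ 3.10GHz spends about one second on each scrypt call. Of course, this is Python, but hashlib.scrypt is only a thin wrapper around OpenSSL's C implementation, so a pure C version would not be much faster. Anyway, you get the idea.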

BTW, this is why you feel a lag when you 'unlock' your account in Geth.
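If you want a number per call instead of timing the whole script, here is a minimal sketch with time.perf_counter. The function name time_scrypt and the smaller n values are my additions for comparison; only n=2**18 (262144) matches the keystore. Your timings will differ from mine.

```python
import hashlib
import time

def time_scrypt(n, r=8, p=1):
    """Return the wall-clock seconds taken by one scrypt call."""
    start = time.perf_counter()
    hashlib.scrypt(b"password", salt=b"12345", n=n, r=r, p=p,
                   maxmem=2_000_000_000, dklen=32)
    return time.perf_counter() - start

# The cost should grow roughly linearly with n.
timings = {n: time_scrypt(n) for n in (2**14, 2**16, 2**18)}
for n, seconds in timings.items():
    print(f"n={n:6d}: {seconds:.3f} s")
```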

Unpacking keystore file

My code is a reworked version of the code by David Egan.

#!/usr/bin/env python3

import hashlib
import sys, json
from getpass import getpass
from Crypto.Cipher import AES
from Crypto.Util import Counter
from Crypto.Hash import keccak

def password_to_key(password, scrypt_params):
    key = hashlib.scrypt(
        bytes(password, 'utf-8'),
        salt=bytes.fromhex(scrypt_params["salt"]),
        n=scrypt_params["n"],
        r=scrypt_params["r"],
        p=scrypt_params["p"],
        maxmem=2000000000,
        dklen=scrypt_params["dklen"]
        )
    return key  

def verify_key(key, ciphertext, mac):
    # MAC = Keccak-256(second half of the derived key + ciphertext)
    validate = key[16:] + bytes.fromhex(ciphertext)
    k=keccak.new(digest_bits=256)
    k.update(validate)
    return mac == k.hexdigest()

def read_json(filename):
    with open(filename) as f_in:
        return(json.load(f_in))

def main(filename):
    # Named 'keystore' so that it does not shadow the json module.
    keystore = read_json(filename)
    data = keystore["crypto"]
    password = getpass()
    #password="1234"
    k=password_to_key(password, data["kdfparams"])
    print ("key=", k.hex())
    if (verify_key(k, data["ciphertext"], data["mac"])):
        print("Password verified.")
        iv_int = int(data["cipherparams"]["iv"], 16)
        ctr = Counter.new(AES.block_size * 8, initial_value=iv_int)
        dec_suite = AES.new(k[:16], AES.MODE_CTR, counter=ctr)
        decrypted_private_key = dec_suite.decrypt(bytes.fromhex(data["ciphertext"]))
        print("Private key:", decrypted_private_key.hex())
    else:
        print("Password NOT verified.")

filename = sys.argv[1]
main(filename)

Basically, the password is 'crunched' via scrypt into a 32-byte derived key. The first 16 bytes of it are used as the key for AES-128 in CTR mode.

You then decrypt 'ciphertext' using AES-128 and the IV from the JSON file. The decrypted plaintext is the private key for your Ethereum address.

But how can you be sure that the password is correct? CTR mode will happily 'decrypt' the ciphertext with any key and simply produce garbage for a wrong one.

The second half of the scrypt-derived key (bytes 16..31) is concatenated with 'ciphertext' and hashed using Keccak-256 (the original Keccak, often called SHA-3 in the Ethereum world, although it differs from the standardized SHA-3 in its padding). The resulting hash is called the MAC. It works like a checksum and is stored in the JSON file. Since it is the 'ciphertext' that is hashed, this is an 'encrypt-then-MAC' scheme: you can check the MAC before decrypting 'ciphertext'. Other 'checksum' constructions may be much more vulnerable, and some information about the encrypted private key may leak.
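To make the MAC step concrete without any third-party package, here is a sketch of Keccak-256 in pure Python, modelled on the well-known compact reference implementation of the sponge. It is slow and meant only as an illustration; the function names are mine. The derived key is the one the decrypt script prints for the password '1234' in the run below, and the ciphertext and MAC come from the keystore file. The only difference between Keccak-256 and hashlib's sha3_256 is the padding byte (0x01 versus 0x06).

```python
MASK = (1 << 64) - 1

def rol64(a, n):
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK

def keccak_f1600(lanes):
    """Keccak-f[1600] permutation on a 5x5 array of lanes, indexed lanes[x][y]."""
    r = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: round constants generated by a small LFSR
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes

def sponge_256(data, suffix):
    """256-bit sponge output; suffix 0x01 is Keccak-256, 0x06 is NIST SHA3-256."""
    rate = 136  # bytes absorbed per block (1088-bit rate, 512-bit capacity)
    padded = bytearray(data) + bytes([suffix])
    padded += bytes(-len(padded) % rate)
    padded[-1] ^= 0x80
    state = bytearray(200)
    for off in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[off + i]
        lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
                  for y in range(5)] for x in range(5)]
        lanes = keccak_f1600(lanes)
        for x in range(5):
            for y in range(5):
                state[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return bytes(state[:32])

def keccak256(data):
    return sponge_256(data, 0x01)

# Values from this post: scrypt output for password '1234', and the keystore fields.
derived_key = bytes.fromhex("6cd27944e3a46ec32e853f46f9ce5a7bc0681bc1a7268718324641d0d56b58a7")
ciphertext = bytes.fromhex("33391ffea38bffa499221c34f15ebd8802544c1e357c545044235aa1e2e5d18c")
stored_mac = "fe231144df6d3568ee5caecd4c62ac9d0169af99c55e43937aa626ca12ff7eb4"

computed_mac = keccak256(derived_key[16:] + ciphertext).hex()
print("MAC matches:", computed_mac == stored_mac)
```

Note that the MAC covers the ciphertext, not the plaintext, so a wrong password is detected without ever running AES.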

Let's run it with our JSON key:

% ./keystore_decrypt.py UTC--2022-11-30T18-28-00.028017872Z--ec1dc9454d4727294cfb22c341f5560f5293b974
Password:
key= 6cd27944e3a46ec32e853f46f9ce5a7bc0681bc1a7268718324641d0d56b58a7
Password verified.
Private key: 9a1fee826569e2b9c53ca973d031a90f50d70b78689d58a3c2b76bf7da32cded

This is the private key, meant to be kept secret.

Note: you can't use a password directly as an AES key, for many reasons (wrong length and too little entropy, for a start). Instead, key derivation functions (KDFs) are used. Scrypt is also a KDF.

Getting more information from private key

Let's see how to derive the public key and the address from the private key.

Ethereum uses the 'secp256k1' elliptic curve. Here we compute the public key point (X/Y) and then encode it as a byte string. It must not be 'compressed'. ( More on 'compressed' keys. )

A part of that encoding (not the whole of it: the leading 0x04 byte is dropped, leaving the 64 bytes of X and Y) is then hashed with Keccak-256 again. The last 20 bytes of the hash are the Ethereum address.

#!/usr/bin/env python3

import sys

pri_key=sys.argv[1]

# from https://docs.ethers.io/v5/api/utils/address/
#pri_key="b976778317b23a1385ec2d483eda6904d9319135b89f1d8eee9f6d2593e2665d"

# pip install fastecdsa
import fastecdsa.keys
import fastecdsa.curve
import fastecdsa.encoding.sec1

curve = fastecdsa.curve.secp256k1
private_key_raw = int(pri_key, base=16)
pubkey = fastecdsa.keys.get_public_key(private_key_raw, curve)
print ("pub key as EC point:")
print(pubkey)
#t = fastecdsa.encoding.sec1.SEC1Encoder().encode_public_key(pubkey, compressed=True)
t = fastecdsa.encoding.sec1.SEC1Encoder().encode_public_key(pubkey, compressed=False)
print ("not compressed encoded pub key:")
print("0x" + t.hex())

# pip install pysha3
import sha3

z=sha3.keccak_256()
z.update(t[1:])
print ("sha3 of pub key:")
print (z.hexdigest()[24:])

% ./prikey_to_address.py 9a1fee826569e2b9c53ca973d031a90f50d70b78689d58a3c2b76bf7da32cded
pub key as EC point:
X: 0xa28333c0d6a8362f839676a7abe5577c520067eace17613caacd82458f340e7c
Y: 0x6ce1fb092b4605b647c52546ab8ad2d0fac89c84f66d5f34aa3d4cf653b5ad14
(On curve <secp256k1>)
not compressed encoded pub key:
0x04a28333c0d6a8362f839676a7abe5577c520067eace17613caacd82458f340e7c6ce1fb092b4605b647c52546ab8ad2d0fac89c84f66d5f34aa3d4cf653b5ad14
sha3 of pub key:
ec1dc9454d4727294cfb22c341f5560f5293b974

ec1dc9454d4727294cfb22c341f5560f5293b974 is the same address we see in the JSON file.
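There is no magic inside fastecdsa, by the way. The public key is just the private key (a big integer) multiplied by the curve's generator point G. Below is a bare-bones sketch of that multiplication using plain Python integers (Python 3.8+ for the modular inverse via pow). The function names are mine, and the code is not constant-time, so treat it as an illustration and never use it with real keys.

```python
# secp256k1 domain parameters: field prime P and generator point G.
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Add two points on y^2 = x^3 + 7 over GF(P); None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a) = infinity
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P  # tangent line
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord line
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mult(k, point):
    """Double-and-add: compute k * point."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

private_key = int("9a1fee826569e2b9c53ca973d031a90f50d70b78689d58a3c2b76bf7da32cded", 16)
x, y = scalar_mult(private_key, G)

# Uncompressed SEC1 encoding: 0x04 prefix, then X and Y as 32-byte big-endian values.
uncompressed = b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")
print("0x" + uncompressed.hex())
```

This should print the same 'not compressed' public key as the fastecdsa script above.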

So basically, there is only one thing in your keystore: an EC private key, protected with a password.

(This post was first published on 20221201.)

