For instance, suppose you want to compute the remainder of some very large bignum: x = bignum % divisor. The bignum is so huge that you want to parallelize the process.
This is the kind of problem you find in Project Euler. Here I split a (random) bignum into 3 parts and compute 3 remainders. Can you write the missing function that reconstructs the 'whole' remainder?
#!/usr/bin/env python3

import random

CHUNK_SIZE=10000

for _ in range(30000):
    hi=random.randint(0, CHUNK_SIZE-1)
    mi=random.randint(0, CHUNK_SIZE-1)
    lo=random.randint(0, CHUNK_SIZE-1)
    dividend=hi*(CHUNK_SIZE**2) + mi*CHUNK_SIZE + lo

    divisor=random.randint(1, CHUNK_SIZE-1)

    # calculate 3 remainders in parallel:
    rem_hi=hi % divisor
    rem_mi=mi % divisor
    rem_lo=lo % divisor

    remainder_my=reconstruct_remainder(rem_hi, rem_mi, rem_lo, divisor)

    # self-test:
    remainder_real=dividend % divisor
    assert remainder_real==remainder_my
Stop reading here and try to work on this problem for a while. Scroll down when you want the solution.
And here it is, with comments:
#!/usr/bin/env python3

def reconstruct_remainder(rem_hi, rem_mi, rem_lo, divisor):
    # all these 4 methods work

    # simplest version:
    # two remainders are just shifted left
    # in other words, 3 remainders 'stretch'
    #x=rem_hi*(CHUNK_SIZE**2)
    #x=x+(rem_mi*CHUNK_SIZE)
    #x=x+rem_lo
    #x=x % divisor
    #return x

    # with an early modulo operation, so that all values stay under the divisor
    # this is faster
    #x=(rem_hi*((CHUNK_SIZE**2) % divisor)) % divisor
    #x=x+((rem_mi*(CHUNK_SIZE % divisor)) % divisor)
    #x=x+rem_lo
    #x=x % divisor
    #return x

    # but if the chunk size is known beforehand, we can precompute these constants:
    precomputed_mult_hi=(CHUNK_SIZE**2) % divisor
    precomputed_mult_mi=CHUNK_SIZE % divisor

    #x=rem_hi*precomputed_mult_hi
    #x=x+(rem_mi*precomputed_mult_mi)
    #x=x+rem_lo
    #x=x % divisor
    #return x

    # probably the most performant method,
    # with an early modulo operation:
    # all values always stay under the divisor
    x=(rem_hi*precomputed_mult_hi) % divisor
    x=x+((rem_mi*precomputed_mult_mi) % divisor)
    x=x+rem_lo
    x=x % divisor
    return x
This approach scales to any number of chunks, and hence can run on any number of cores/threads.
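To illustrate that generalization, here is a sketch of my own (the names `split_chunks` and `reconstruct_remainder_n` are mine, not from the scripts above) that splits an integer into N base-CHUNK_SIZE digits, computes per-chunk remainders (the part that could be farmed out to separate cores), and folds them together:

```python
#!/usr/bin/env python3
# Sketch: the 3-chunk trick generalized to N chunks.
# Hypothetical helper names; not part of the original scripts.
import random

CHUNK_SIZE = 10000

def split_chunks(n):
    # base-CHUNK_SIZE digits, least significant first
    chunks = []
    while n:
        chunks.append(n % CHUNK_SIZE)
        n //= CHUNK_SIZE
    return chunks or [0]

def reconstruct_remainder_n(rems, divisor):
    # rems[i] is chunk_i % divisor; chunk i carries weight CHUNK_SIZE**i
    x = 0
    mult = 1  # holds CHUNK_SIZE**i % divisor, updated incrementally
    for r in rems:
        x = (x + r * mult) % divisor
        mult = (mult * CHUNK_SIZE) % divisor
    return x

if __name__ == "__main__":
    for _ in range(1000):
        dividend = random.getrandbits(256)
        divisor = random.randint(1, CHUNK_SIZE - 1)
        # each chunk's remainder could be computed on a separate core:
        rems = [c % divisor for c in split_chunks(dividend)]
        assert reconstruct_remainder_n(rems, divisor) == dividend % divisor
```

Since the running multiplier is reduced modulo the divisor at every step, intermediate values stay small no matter how many chunks there are.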
Also, another fact from modular arithmetic helped me solve this problem:
#!/usr/bin/env python3

import random

CHUNK_SIZE=10000

for _ in range(10000):
    hi=random.randint(0, CHUNK_SIZE-1)
    lo=random.randint(0, CHUNK_SIZE-1)
    dividend=hi*CHUNK_SIZE + lo
    divisor=random.randint(1, 50)

    remainder=dividend % divisor

    rem_hi=hi % divisor
    rem_lo=lo % divisor

    # note: here we combine the remainder of the high part with the low part,
    # and we can still reduce the result modulo the divisor
    x=(rem_hi*CHUNK_SIZE + lo) % divisor
    assert x==remainder

    x=(((rem_hi*(CHUNK_SIZE % divisor)) % divisor)+rem_lo) % divisor
    assert x==remainder
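The fact above boils down to two basic congruences of modular arithmetic, which this short sketch of mine (not from the original scripts) checks directly: a sum or product can be reduced modulo m either before or after the operation, with the same result.

```python
# Sketch verifying the two congruences the whole trick relies on:
#   (a + b) % m == ((a % m) + (b % m)) % m
#   (a * b) % m == ((a % m) * (b % m)) % m
import random

for _ in range(10000):
    a = random.randint(0, 10**9)
    b = random.randint(0, 10**9)
    m = random.randint(1, 10**6)
    assert (a + b) % m == ((a % m) + (b % m)) % m
    assert (a * b) % m == ((a % m) * (b % m)) % m
```

Applying the product rule to rem_hi * CHUNK_SIZE is exactly what lets us replace CHUNK_SIZE with the precomputed CHUNK_SIZE % divisor.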
Actually, I have no need to parallelize such an operation; I did this to better understand modular arithmetic. Hopefully it can be of help to someone else.