 ## [Math] Parallelizing remainder calculation over bignum

For instance, suppose you want to calculate the remainder of some very large bignum: x = bignum % divisor. The bignum is so huge that you want to parallelize the process.

This is a Project Euler-style problem. Here I split a (random) bignum into 3 parts and calculate 3 remainders. Can you write the missing function that reconstructs the 'whole' remainder?

```python
#!/usr/bin/env python3
import random

CHUNK_SIZE=10000

for _ in range(30000):
    hi=random.randint(0, CHUNK_SIZE-1)
    mi=random.randint(0, CHUNK_SIZE-1)
    lo=random.randint(0, CHUNK_SIZE-1)

    dividend=hi*(CHUNK_SIZE**2) + mi*CHUNK_SIZE + lo

    divisor=random.randint(1, CHUNK_SIZE-1)

    # calculate 3 remainders in parallel:
    rem_hi=hi % divisor
    rem_mi=mi % divisor
    rem_lo=lo % divisor

    remainder_my=reconstruct_remainder(rem_hi, rem_mi, rem_lo, divisor)

    # self-test:
    remainder_real=dividend % divisor
    assert remainder_real==remainder_my
```


Stop reading here. Try to work on this problem for a while. To continue reading, scroll down for the solution.

And there it is, with comments:

```python
#!/usr/bin/env python3

def reconstruct_remainder(rem_hi, rem_mi, rem_lo, divisor):
    # all these 4 methods work

    # simplest version:
    # two remainders are just shifted left
    # in other words, 3 remainders 'stretch'
    #x=rem_hi*(CHUNK_SIZE**2)
    #x=x+(rem_mi*CHUNK_SIZE)
    #x=x+rem_lo
    #x=x % divisor
    #return x

    # with early modulo operations, so that all values stay under divisor
    # this is faster
    #x=(rem_hi*((CHUNK_SIZE**2) % divisor)) % divisor
    #x=x+((rem_mi*(CHUNK_SIZE % divisor)) % divisor)
    #x=x+rem_lo
    #x=x % divisor
    #return x

    # but if the chunk size is known beforehand, we can precompute
    # these constants (once per divisor):
    precomputed_mult_hi=(CHUNK_SIZE**2) % divisor
    precomputed_mult_mi=CHUNK_SIZE % divisor

    #x=rem_hi*precomputed_mult_hi
    #x=x+(rem_mi*precomputed_mult_mi)
    #x=x+rem_lo
    #x=x % divisor
    #return x

    # probably the most performant method
    # with early modulo operations
    # all values always stay under divisor
    x=(rem_hi*precomputed_mult_hi) % divisor
    x=x+((rem_mi*precomputed_mult_mi) % divisor)
    x=x+rem_lo
    x=x % divisor
    return x
```
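The same idea generalizes to any number of chunks via a Horner-style loop: shift the running value left by one chunk and fold in the next remainder, reducing modulo the divisor at every step. A sketch (the function name `reconstruct_remainder_n` is my own, not from the code above); chunks are given most-significant first:

```python
import random

CHUNK_SIZE=10000

def reconstruct_remainder_n(rems, divisor):
    # Horner's scheme: multiply by (CHUNK_SIZE % divisor) to 'shift left'
    # by one chunk, add the next chunk's remainder, reduce modulo divisor
    # at every step so intermediate values stay small
    x=0
    for r in rems:
        x=(x*(CHUNK_SIZE % divisor)+r) % divisor
    return x

# self-test against the direct computation:
for _ in range(1000):
    chunks=[random.randint(0, CHUNK_SIZE-1) for _ in range(5)]
    dividend=0
    for c in chunks:
        dividend=dividend*CHUNK_SIZE+c
    divisor=random.randint(1, CHUNK_SIZE-1)
    rems=[c % divisor for c in chunks]
    assert reconstruct_remainder_n(rems, divisor)==dividend % divisor
```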


The same scheme works with any number of chunks, so it can run on any number of cores/threads.
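A minimal sketch of what the parallel step could look like, using `concurrent.futures`. (Caveat: a thread pool is limited by the GIL for pure-Python arithmetic; a real speedup would need a process pool over much bigger chunks — this only shows the structure.)

```python
import random
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE=10000

def chunk_remainder(chunk, divisor):
    # each chunk's remainder is independent of the others,
    # so these calls can run on separate workers
    return chunk % divisor

chunks=[random.randint(0, CHUNK_SIZE-1) for _ in range(8)]
divisor=1237

with ThreadPoolExecutor() as pool:
    rems=list(pool.map(lambda c: chunk_remainder(c, divisor), chunks))

# sequential combine step: one multiply-add per chunk,
# cheap compared to computing the chunk remainders themselves
x=0
for r in rems:
    x=(x*(CHUNK_SIZE % divisor)+r) % divisor

# self-test against the direct computation:
dividend=0
for c in chunks:
    dividend=dividend*CHUNK_SIZE+c
assert x==dividend % divisor
```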

Also, here is another fact from modular arithmetic that helped me solve this problem:

```python
#!/usr/bin/env python3
import random

CHUNK_SIZE=10000

for _ in range(10000):
    hi=random.randint(0, CHUNK_SIZE-1)
    lo=random.randint(0, CHUNK_SIZE-1)

    dividend=hi*CHUNK_SIZE + lo

    divisor=random.randint(1, 50)

    remainder=dividend % divisor

    rem_hi=hi % divisor
    rem_lo=lo % divisor

    # note: here we combine the remainder of the high part with the (whole) low part,
    # and taking the result modulo divisor still gives the correct remainder
    x=(rem_hi*CHUNK_SIZE + lo) % divisor
    assert x==remainder

    x=(((rem_hi*(CHUNK_SIZE % divisor)) % divisor)+rem_lo) % divisor
    assert x==remainder
```
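Both assertions rest on the basic identities of modular arithmetic: reduction mod m commutes with addition and multiplication. A quick check (the numbers are arbitrary):

```python
a, b, m = 123456, 789, 17

# reducing operands early does not change the final remainder:
assert (a*b) % m == ((a % m)*(b % m)) % m
assert (a+b) % m == ((a % m)+(b % m)) % m
```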


Actually, I have no need to parallelize such an operation. But I did this to better understand modular arithmetic. Hopefully it can be of help to someone else.

###### (the post first published at 20230516.)